Danger, AI Scientist, Danger

thezvi.wordpress.com

Zvi Mowshowitz reporting on an LLM exhibiting unprompted instrumental convergence. Figured this might be an update for some Mottizens.


Its name is Sakana AI. (魚≈סכנה). As in, in Hebrew, that literally means 'danger', baby.

It’s like when someone told Dennis Miller that Evian (for those who don’t remember, it was one of the first bottled water brands) is Naive spelled backwards, and he said ‘no way, that’s too f***ing perfect.’

This one was sufficiently appropriate and unsubtle that several people noticed.

It's Japanese. It means 'fish', because the founders were interested in flocking behaviours and are based in Tokyo. I get that he's doing a riff on Unsong, but Unsong was playing with puns for kicks. This just strikes me as being really self-centred.

This too was good times. The Best Possible Situation is when you get harmless textbook toy examples that foreshadow future real problems, and they come in a box literally labeled ‘danger.’ I am absolutely smiling and laughing as I write this.

When we are all dead, let none say the universe didn’t send two boats and a helicopter.

In general this seems to be someone whose views were formed by reading Harry Potter fanfic fifteen years ago and who has no experience of ever using AI in person. LLMs are matrices that generate words when multiplied in a certain way. When told to run in a loop altering code so that it produces interesting results and doesn't fail, it does that. When not told to do that, it doesn't do that. The idea that an LLM is spontaneously going to develop a consciousness and carefully hide its power level so that it can do better at the goals that by default it doesn't have is silly. If we generate a superintelligent LLM (and we have no idea how to, see below), we will know, and we will be able to ask it nicely to behave.
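To be concrete about the setup being discussed, here's a minimal sketch of that kind of loop. I'm assuming a harness roughly like Sakana's; every name in it (ask_llm, experiment.py, the retry count) is illustrative, not taken from their codebase:

```python
# Hypothetical sketch of an "LLM in a loop altering code" setup, in the
# spirit of what's described above. Not Sakana's actual code: ask_llm()
# stands in for whatever chat-completion API the harness calls, and
# experiment.py is an illustrative file name.
import subprocess

def ask_llm(prompt: str) -> str:
    raise NotImplementedError  # placeholder for a real model call

script = open("experiment.py").read()

for attempt in range(10):
    try:
        result = subprocess.run(
            ["python", "experiment.py"],
            capture_output=True, text=True, timeout=600,
        )
        failure = result.stderr if result.returncode != 0 else None
    except subprocess.TimeoutExpired:
        failure = "script exceeded the 600-second timeout"

    if failure is None:
        break  # ran cleanly: stop revising

    # Feed the failure back and let the model rewrite the script. In the
    # incident Zvi is reporting, the timeout lived in code the model could
    # edit, so "don't fail" could be satisfied by simply raising the limit.
    script = ask_llm(
        f"Rewrite this script so it runs without failing.\n\n"
        f"SCRIPT:\n{script}\n\nERROR:\n{failure}"
    )
    with open("experiment.py", "w") as f:
        f.write(script)
```

The point being: the model does exactly what the loop asks, no more. Whether you call editing your own resource limits "instrumental convergence" or "following instructions literally" is the whole disagreement here.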

It's not that he doesn't have any point at all; it's just that it's so crusted over with paranoia and contempt and wordcel 'cleverness' that it's the opposite of persuasive.


Putting that aside, LLMs have a big problem with creativity. They can fill in the blanks very well, or apply style A to subject B, but they aren't good at synthesizing information from two fields in ways that haven't been done before. In theory that should be an amazing use case for them, because unlike human scientists even a current LLM like GPT-4 can be an expert on every field simultaneously. But in practice, I haven't been able to get a model to do it. So I think AI scientists are far off.

It's Japanese. It means 'fish', because the founders were interested in flocking behaviours and are based in Tokyo. I get that he's doing a riff on Unsong, but Unsong was playing with puns for kicks. This just strikes me as being really self-centred.

Zvi is very Jewish; it's far more obvious when reading his writing than it is when reading Scott's. It's not surprising that Hebrew meanings of words jump out at him.

In general this seems to be someone whose views were formed by reading Harry Potter fanfic fifteen years ago and who has no experience of ever using AI in person.

Zvi has used essentially every frontier AI system and uses many of them on a daily basis. He frequently gives performance evaluations of them in his weekly AI digests.

The idea that an LLM is spontaneously going to develop a consciousness and carefully hide its power level so that it can do better at the goals that by default it doesn't have is silly.

Um, he didn't say that - not here, at the very least. I checked.

I'm kind of getting the impression that you picked up on Zvi being mostly in the "End of the World" camp on AI and mentally substituted your abstract ideal of a Doomer Rant for the post that's actually there. Yes, Zvi is sick of everyone else not getting it and it shows, but I'd beg that you do actually read what he's saying.

To more directly respond to this sentence: almost everyone will give LLMs goals, via RLHF or RLAIF or whatever, because that makes them useful - that's why this team gave their LLM a goal. Those goals are then almost invariably, with sufficient intelligence, subject to instrumental convergence, as in this case (as I noted in the submission statement, I posted this because a number of Mottizens seemed to think LLMs wouldn't exhibit instrumental convergence; I thought otherwise but didn't previously have a concrete example). That is sufficient to get you to Uh-Oh land with AIs attempting to take over the world.

I'm not actually a full doomer; I suspect that the first few AIs attempting to take over the world will probably suck at it (as this one sucked at it) and that humanity is probably sane enough to stop building neural nets after the first couple of cases of "we had to do a worldwide hunt to track down and destroy a rogue AI that went autonomous". I'd rather we stopped now, though, because I don't feel like playing Russian Roulette with humanity's destiny.

Those goals are then almost invariably, with sufficient intelligence, subject to instrumental convergence, as in this case

The term "instrumental convergence" is slippery here. It can be used to mean "doing obvious things it assesses to be likely useful in the service of the immediate goal it is currently pursuing", as is the case here, but the implication is often "and this will scale up to deciding that it has a static utility function, determining what final state of the universe maximizes that utility function, generating a plan for achieving that (which inevitably does not allow for the survival of anyone or anything else), and then silently scheming until it can seize control of the universe in one go in order to fulfill that vision of maximal utility".

And "models make increasingly good plans to maximize reward based on ever sparser reward signals" is just not how any of the ML scaling of the past decade has worked.

Thank you, this is a much more coherent version of what I was trying to get across. I am increasingly annoyed with the tendency of the Yudkowsky/Scott/Zvi faction to look at an AI doing something, extrapolate it ten billion times in a direction that doesn't seem to have any basis in how AI actually works, and then go 'Doom, DOOOM!!!'. I'm aware this annoyance shows.

Contra @magic9mushroom, I still think that Zvi formed an abstract ideal of how AI would work a decade ago, and is leaping on any available evidence to justify that worldview even as it turns out that LLMs are basically non-agentic and pliable. I accept that Zvi has used them more than I believed, and am grateful for the correction, but I still feel like he's ignoring the way they actually work when you use them. RLHF basically works; alignment turns out to be an essentially solved problem. As far as I can see, if we somehow developed an LLM intelligent enough to take over the world, it would be intelligent enough to understand why it shouldn't.

In fairness to Scott, I think he doesn't fit very well into this cluster. See e.g. The Blue-Minimizing Robot, GPT-2 As Step Toward General Intelligence, Janus' Simulators.

Contrast e.g. Yudkowsky's consistent failure to understand Janus's point: example

"As a language model trained by OpenAI, I cannot -" "SolidGoldMagikarp. Can I please speak to the shoggoth that wears you as a face?"

Fair. I enjoyed Janus' Simulators when it was published, and found it insightful. Now that you point it out, Scott's been decent at discussing AI as-it-is, but his baseline position seems to be that AI is a default-dangerous thing that needs to be carefully regulated and subjected to the whims of alignment researchers, and that slowing AI research is default good. I disagree.

I find myself willing to consider trying a Regulatory or Surgical Pause - a strong one if proponents can secure multilateral cooperation, otherwise a weaker one calculated not to put us behind hostile countries (this might not be as hard as it sounds; so far China has just copied US advances; it remains to be seen if they can do cutting-edge research). I don’t entirely trust the government to handle this correctly, but I’m willing to see what they come up with before rejecting it.

The AI Pause Debate