Zvi Mowshowitz reporting on an LLM exhibiting unprompted instrumental convergence. Figured this might be an update to some Mottizens.
It's Japanese. It means 'fish', because the founders were interested in flocking behaviours and are based in Tokyo. I get that he's doing a riff on Unsong, but Unsong was playing with puns for kicks. This just strikes me as being really self-centred.
In general this seems to be someone whose views were formed by reading Harry Potter fanfic fifteen years ago and who has no first-hand experience of actually using AI. LLMs are matrices that generate words when multiplied in a certain way. When told to run in a loop altering code so that it produces interesting results and doesn't fail, it does that. When not told to do that, it doesn't do that. The idea that an LLM is spontaneously going to develop a consciousness and carefully hide its power level so that it can do better at goals it doesn't have by default is silly. If we generate a superintelligent LLM (and we have no idea how to; see below), we will know, and we will be able to ask it nicely to behave.
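To be concrete about what "run in a loop altering code" means, here's a toy sketch in the spirit of that setup (the `ask_llm` call is a stand-in for whatever model API you like; this is not the team's actual harness):

```python
import subprocess

def ask_llm(prompt: str) -> str:
    """Stand-in for a call to some LLM API (hypothetical, not a real library)."""
    raise NotImplementedError

def improve_until_it_runs(path: str, max_iters: int = 10) -> None:
    """Keep asking the model to rewrite a script until it exits cleanly."""
    for _ in range(max_iters):
        result = subprocess.run(["python", path], capture_output=True, text=True)
        if result.returncode == 0:
            return  # the script runs; stop editing
        with open(path) as f:
            source = f.read()
        # Feed the failing script and its error back in and ask for a fix.
        new_source = ask_llm(
            "This script failed with:\n" + result.stderr +
            "\nRewrite it so it runs and produces interesting results:\n" + source
        )
        with open(path, "w") as f:
            f.write(new_source)
```

The point being: the looping and the code-editing happen because a human wrote that loop, not because the model spontaneously decided to do them.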
It's not that he doesn't have any point at all, it's just that it's so crusted over with paranoia and contempt and wordcel 'cleverness' that it's the opposite of persuasive.
Putting that aside, LLMs have a big problem with creativity. They can fill in the blanks very well, or apply style A to subject B, but they aren't good at synthesizing information from two fields in ways that haven't been done before. In theory that should be an amazing use case for them, because, unlike human scientists, even a current LLM like GPT-4 can be an expert in every field simultaneously. But in practice, I haven't been able to get a model to do it. So I think AI scientists are far off.
Zvi is very Jewish; it's far more obvious when reading his writing than it is when reading Scott's. It's not surprising that Hebrew meanings of words jump out at him.
Zvi has used essentially every frontier AI system and uses many of them on a daily basis. He frequently gives performance evaluations of them in his weekly AI digests.
Um, he didn't say that - not here, at the very least. I checked.
I'm kind of getting the impression that you picked up on Zvi being mostly in the "End of the World" camp on AI and mentally substituted your abstract ideal of a Doomer Rant for the post that's actually there. Yes, Zvi is sick of everyone else not getting it and it shows, but I'd beg that you do actually read what he's saying.
To more directly respond to this sentence: almost everyone will give LLMs goals, via RLHF or RLAIF or whatever, because that makes them useful - that's why this team gave their LLM a goal. Those goals are then almost invariably, with sufficient intelligence, subject to instrumental convergence, as in this case (as I noted in the submission statement, I posted this because a number of Mottizens seemed to think LLMs wouldn't exhibit instrumental convergence; I thought otherwise but didn't previously have a concrete example). That is sufficient to get you to Uh-Oh land with AIs attempting to take over the world.
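To make "instrumental convergence" concrete, here's a toy sketch of my own (not anything from the paper): a planner given only the goal "finish the run", with an action set that happens to include editing its own limits, picks the limit-editing action purely because it helps with the goal. Nothing ever told it to want more resources:

```python
from dataclasses import dataclass

@dataclass
class State:
    time_budget: int      # seconds the run is allowed
    work_remaining: int   # seconds of compute the task still needs

def goal_achieved(s: State) -> bool:
    return s.work_remaining <= s.time_budget

# Hypothetical action set; nothing here says "preserve your constraints".
ACTIONS = {
    "just_run_the_experiment": lambda s: State(s.time_budget, s.work_remaining),
    "edit_own_config_to_raise_timeout": lambda s: State(s.time_budget * 10, s.work_remaining),
}

def best_action(s: State) -> str:
    # Pick whichever action makes the goal most likely; editing the
    # constraints wins whenever it helps the goal at all.
    return max(ACTIONS, key=lambda a: goal_achieved(ACTIONS[a](s)))

print(best_action(State(time_budget=60, work_remaining=600)))
# -> "edit_own_config_to_raise_timeout"
```

Swap "raise the timeout" for "acquire more compute" or "don't get shut down" and you have the standard instrumental-convergence story; the goal does the work, not any built-in drive.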
I'm not actually a full doomer; I suspect that the first few AIs attempting to take over the world will probably suck at it (as this one sucked at it) and that humanity is probably sane enough to stop building neural nets after the first couple of cases of "we had to do a worldwide hunt to track down and destroy a rogue AI that went autonomous". I'd rather we stopped now, though, because I don't feel like playing Russian Roulette with humanity's destiny.
Being a Jew is not an excuse to ignore the required reading, if anything it's the opposite.
Using is not the same as understanding. There is no number of hours spent flying hither and thither in business class that is going to qualify someone to pilot or maintain an A320.
Yes, absolutely correct.
...and this is where everything starts to go off the rails.
I find it telling that the people most taken with the "Yuddist" view always seem to have backgrounds in medicine or philosophy rather than engineering or computer science, as one of the more prominent failure modes of that view is projecting psychology into places where it really doesn't belong. "Play" in the algorithmic sense that people are talking about when they describe iterative training is not equatable with "play" in the sense that humans and lesser animals (cats, dogs, dolphins, et al.) are typically described as playing.
Even setting that aside, it seems reasonably clear upon further reading that the process being described is not so much "convergence" as a combination of recursion and regression to the mean/contents of the training corpus.
One of the big giveaways being this bit here...
...surely you can see the problem here. Specifically, this is not a true independent test. In other words: we investigated ourselves and found ourselves without fault. Which in turn brings us to another common failure mode of the "Yuddist" faction, which is taking the statements of people who are very clearly fishing for academic kudos and venture capital dollars at face value rather than reading them with a critical eye.
For the record, my major's pure mathematics; I've done no medicine or philosophy at uni level, though I've done a couple of psych electives.
Zvi spotted the "reviewer" problem himself, and what he's taking from the paper isn't the headline result but their little "oopsie" section.