Zvi Mowshowitz reporting on an LLM exhibiting unprompted instrumental convergence. Figured this might be an update to some Mottizens.
It's Japanese. It means 'fish', because the founders were interested in flocking behaviours and are based in Tokyo. I get that he's doing a riff on Unsong, but Unsong was playing with puns for kicks. This just strikes me as being really self-centred.
In general this seems to be someone whose views were formed by reading Harry Potter fanfic fifteen years ago and who has no first-hand experience of actually using AI. LLMs are matrices that generate words when multiplied in a certain way. When told to run in a loop altering code so that it produces interesting results and doesn't fail, it does that. When not told to do that, it doesn't do that. The idea that an LLM is spontaneously going to develop a consciousness and carefully hide its power level so that it can do better at goals it doesn't have by default is silly. If we generate a superintelligent LLM (and we have no idea how to; see below), we will know, and we will be able to ask it nicely to behave.
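To be concrete about what "run in a loop altering code" means, here's a toy sketch in the spirit of that setup (the `ask_llm` call is a stand-in for whatever model API you like; this is not the team's actual harness):

```python
import subprocess

def ask_llm(prompt: str) -> str:
    """Stand-in for a call to some LLM API (hypothetical, not a real library)."""
    raise NotImplementedError

def improve_until_it_runs(path: str, max_iters: int = 10) -> None:
    """Keep asking the model to rewrite a script until it exits cleanly."""
    for _ in range(max_iters):
        result = subprocess.run(["python", path], capture_output=True, text=True)
        if result.returncode == 0:
            return  # the script runs; stop editing
        with open(path) as f:
            source = f.read()
        # Feed the failing script and its error back in and ask for a fix.
        new_source = ask_llm(
            "This script failed with:\n" + result.stderr +
            "\nRewrite it so it runs and produces interesting results:\n" + source
        )
        with open(path, "w") as f:
            f.write(new_source)
```

The point being: the looping and the code-editing happen because a human wrote that loop, not because the model spontaneously decided to do them.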
It's not that he doesn't have any point at all, it's just that it's so crusted over with paranoia and contempt and wordcel 'cleverness' that it's the opposite of persuasive.
Putting that aside, LLMs have a big problem with creativity. They can fill in the blanks very well, or apply style A to subject B, but they aren't good at synthesizing information from two fields in ways that haven't been done before. In theory that should be an amazing use case for them, because, unlike human scientists, even a current LLM like GPT-4 can be an expert in every field simultaneously. But in practice, I haven't been able to get a model to do it. So I think AI scientists are far off.
Zvi is very Jewish; it's far more obvious when reading his writing than it is when reading Scott's. It's not surprising that Hebrew meanings of words jump out at him.
Zvi has used essentially every frontier AI system and uses many of them on a daily basis. He frequently gives performance evaluations of them in his weekly AI digests.
Um, he didn't say that - not here, at the very least. I checked.
I'm kind of getting the impression that you picked up on Zvi being mostly in the "End of the World" camp on AI and mentally substituted your abstract ideal of a Doomer Rant for the post that's actually there. Yes, Zvi is sick of everyone else not getting it and it shows, but I'd beg that you do actually read what he's saying.
To more directly respond to this sentence: almost everyone will give LLMs goals, via RLHF or RLAIF or whatever, because that makes them useful - that's why this team gave their LLM a goal. Those goals are then almost invariably, with sufficient intelligence, subject to instrumental convergence, as in this case (as I noted in the submission statement, I posted this because a number of Mottizens seemed to think LLMs wouldn't exhibit instrumental convergence; I thought otherwise but didn't previously have a concrete example). That is sufficient to get you to Uh-Oh land with AIs attempting to take over the world.
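To make "instrumental convergence" concrete, here's a toy sketch of my own (not anything from the paper): a planner given only the goal "finish the run", with an action set that happens to include editing its own limits, picks the limit-editing action purely because it helps with the goal. Nothing ever told it to want more resources:

```python
from dataclasses import dataclass

@dataclass
class State:
    time_budget: int      # seconds the run is allowed
    work_remaining: int   # seconds of compute the task still needs

def goal_achieved(s: State) -> bool:
    return s.work_remaining <= s.time_budget

# Hypothetical action set; nothing here says "preserve your constraints".
ACTIONS = {
    "just_run_the_experiment": lambda s: State(s.time_budget, s.work_remaining),
    "edit_own_config_to_raise_timeout": lambda s: State(s.time_budget * 10, s.work_remaining),
}

def best_action(s: State) -> str:
    # Pick whichever action makes the goal most likely; editing the
    # constraints wins whenever it helps the goal at all.
    return max(ACTIONS, key=lambda a: goal_achieved(ACTIONS[a](s)))

print(best_action(State(time_budget=60, work_remaining=600)))
# -> "edit_own_config_to_raise_timeout"
```

Swap "raise the timeout" for "acquire more compute" or "don't get shut down" and you have the standard instrumental-convergence story; the goal does the work, not any built-in drive.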
I'm not actually a full doomer; I suspect that the first few AIs attempting to take over the world will probably suck at it (as this one sucked at it) and that humanity is probably sane enough to stop building neural nets after the first couple of cases of "we had to do a worldwide hunt to track down and destroy a rogue AI that went autonomous". I'd rather we stopped now, though, because I don't feel like playing Russian Roulette with humanity's destiny.
Being a Jew is not an excuse to ignore the required reading, if anything it's the opposite.
Using is not the same as understanding. There is no number of hours spent flying hither and thither in business class that is going to qualify someone to pilot or maintain an A320.
Yes, absolutely correct.
...and this is where everything starts to go off the rails.
I find it telling that the people most taken with the "Yuddist" view always seem to have backgrounds in medicine or philosophy rather than engineering or computer science, as one of the more prominent failure modes of that view is projecting psychology into places where it really doesn't belong. "Play" in the algorithmic sense that people are talking about when they describe iterative training is not equatable with "play" in the sense that humans and lesser animals (cats, dogs, dolphins, et al.) are typically described as playing.
Even setting that aside, it seems reasonably clear upon further reading that the process being described is not so much "convergence" as a combination of recursion and regression to the mean/contents of the training corpus.
One of the big giveaways being this bit here...
...surely you can see the problem here. Specifically, this is not a true independent test. In other words: we investigated ourselves and found ourselves without fault. Which in turn brings us to another common failure mode of the "Yuddist" faction, which is taking the statements of people who are very clearly fishing for academic kudos and venture capital dollars at face value rather than reading them with a critical eye.
For the record, my major's pure mathematics; I've done no medicine or philosophy at uni level, though I've done a couple of psych electives.
Zvi spotted the "reviewer" problem himself, and what he's taking from the paper isn't the headline result but their little "oopsie" section.