
Culture War Roundup for the week of May 8, 2023

This weekly roundup thread is intended for all culture war posts. 'Culture war' is vaguely defined, but it basically means controversial issues that fall along set tribal lines. Arguments over culture war issues generate a lot of heat and little light, and few deeply entrenched people ever change their minds. This thread is for voicing opinions and analyzing the state of the discussion while trying to optimize for light over heat.

Optimistically, we think that engaging with people you disagree with is worth your time, and so is being nice! Pessimistically, there are many dynamics that can lead discussions on Culture War topics to become unproductive. There's a human tendency to divide along tribal lines, praising your ingroup and vilifying your outgroup - and if you think you find it easy to criticize your ingroup, then it may be that your outgroup is not who you think it is. Extremists with opposing positions can feed off each other, highlighting each other's worst points to justify their own angry rhetoric, which becomes in turn a new example of bad behavior for the other side to highlight.

We would like to avoid these negative dynamics. Accordingly, we ask that you do not use this thread for waging the Culture War. Examples of waging the Culture War:

  • Shaming.

  • Attempting to 'build consensus' or enforce ideological conformity.

  • Making sweeping generalizations to vilify a group you dislike.

  • Recruiting for a cause.

  • Posting links that could be summarized as 'Boo outgroup!' Basically, if your content is 'Can you believe what Those People did this week?' then you should either refrain from posting, or do some very patient work to contextualize and/or steel-man the relevant viewpoint.

In general, you should argue to understand, not to win. This thread is not territory to be claimed by one group or another; indeed, the aim is to have many different viewpoints represented here. Thus, we also ask that you follow some guidelines:

  • Speak plainly. Avoid sarcasm and mockery. When disagreeing with someone, state your objections explicitly.

  • Be as precise and charitable as you can. Don't paraphrase unflatteringly.

  • Don't imply that someone said something they did not say, even if you think it follows from what they said.

  • Write like everyone is reading and you want them to be included in the discussion.

On an ad hoc basis, the mods will try to compile a list of the best posts/comments from the previous week, posted in Quality Contribution threads and archived at /r/TheThread. You may nominate a comment for this list by clicking on 'report' at the bottom of the post and typing 'Actually a quality contribution' as the report reason.


I just got done listening to Eliezer Yudkowsky on EconTalk (https://www.econtalk.org/eliezer-yudkowsky-on-the-dangers-of-ai/).

I say this as someone who's mostly convinced of Big Yud's doomerism: Good lord, what a train wreck of a conversation. I'll save you the bother of listening to it -- Russ Roberts starts by asking a fairly softball question of (paraphrasing) "Why do you think the AIs will kill all of humanity?" And Yudkowsky responds by asking Roberts "Explain why you think they won't, and I'll poke your argument until it falls apart." Russ didn't really give strong arguments, and the rest of the interview repeated this pattern a couple times. THIS IS NOT THE WAY HUMANS HAVE CONVERSATIONS! Your goal was not to logically demolish Russ Roberts' faulty thinking, but to use Roberts as a sounding board to get your ideas to his huge audience, and you completely failed. Roberts wasn't convinced by the end, and I'm sure EY came off as a crank to anyone who was new to him.

I hope EY lurks here, or maybe someone close to him does. Here's my advice: if you want to convince people who are not already steeped in your philosophy you need to have a short explanation of your thesis that you can rattle off in about 5 minutes that doesn't use any jargon the median congresscritter doesn't already know. You should workshop it on people who don't know who you are, don't know any math or computer programming and who haven't read the Sequences, and when the next podcast host asks you why AIs will kill us all, you should be able to give a tight, logical-ish argument that gets the conversation going in a way that an audience can find interesting. 5 minutes can't cover everything so different people will poke and prod your argument in various ways, and that's when you fill in the gaps and poke holes in their thinking, something you did to great effect with Dwarkesh Patel (https://youtube.com/watch?v=41SUp-TRVlg&pp=ygUJeXVka293c2tp). That was a much better interview, mostly because Patel came in with much more knowledge and asked much better questions. I know you're probably tired of going over the same points ad nauseam, but every host will have audience members who've never heard of you or your jargon, and you have about 5 minutes to hold their interest or they'll press "next".

I have long believed that it makes little difference whether what you say is right or wrong; what matters is who says it. I think this is why so many people tolerate Eliezer's AI-doom argument even though it's unscientific: he's so smart (or at least comes off as so smart) that people will give him a lot of benefit of the doubt.

Russ didn't really give strong arguments, and the rest of the interview repeated this pattern a couple times

Russ Roberts' interviews have always been underwhelming. I too listened to a few... it just doesn't do it for me. He cannot give strong arguments because his scientific background is weak and his personality is not forceful.

if you want to convince people who are not already steeped in your philosophy you need to have a short explanation of your thesis that you can rattle off in about 5 minutes that doesn't use any jargon the median congresscritter doesn't already know.

Unpopular take: The whole thing is a grift. The goal is not to convince anyone of anything but to get donations for his foundation/non-profit. He wants someone with deep pockets like Vitalik Buterin, Elon Musk, or Thiel to donate. I don't think he believes the things he espouses.

AIs will kill us all, you should be able to give a tight, logical-ish argument that gets the conversation going in a way that an audience can find interesting

EY can give a tight argument, but it's lacking the necessary specifics. If someone wanted to puncture his argument they would just press for details, or accuse him of making an unfalsifiable claim. EY is shifting the burden of proof to everyone else to prove he is wrong.

Do you really think that? From what I’ve read on lesswrong, posters actually put forth arguments.

Hence "wannabe" I assume.

Found this paper on AI, Fermi, and the Great Filter interesting:

https://arxiv.org/pdf/2305.05653.pdf

The one thought I had on it: my fear with AI was an AI that had the evolutionary biological goals of self-preservation and replication through growth.

The issue I see is it doesn't solve the Fermi paradox. Shouldn't we have seen AI in the galaxy? If AI = great filter, then it seems like it would need to kill us before it developed self-improvement, which would lead it to look for more compute power and start settling Mars.

The author makes a pretty egregious mathematical error on page 7. Without offering any justification, they calculate the probability of being born as the kth human given that n total humans will ever be born as k/n. This just doesn't make sense. It would work if they had defined H_60 as the event of being born among the first 60 billion humans, but that's clearly not what they're saying. Based on this and some of the other sloppy probabilistic reasoning in the paper, I don't rate this as very intellectually serious work.
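To spell out the distinction, here is a sketch under the standard self-sampling assumption (your birth rank is uniformly distributed over the $n$ humans who will ever live):

$$P(\text{rank} = k \mid N = n) = \frac{1}{n}, \qquad P(\text{rank} \le k \mid N = n) = \frac{k}{n} \quad (k \le n).$$

So $k/n$ is the probability of the cumulative event "born among the first $k$ humans" (which is what $H_{60}$ would need to denote for the page-7 calculation to go through), not the probability of being born exactly $k$th.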

I don’t check the math in these things. It just seems like there are too many unknown unknowns for any number to mean much.

Maybe we’re the first (in our past light cone)? After all, somebody has to be first. It’s theorized that earlier solar systems didn’t have enough heavy elements to support the chemistry of life.

Anyways, you should read Robin Hanson’s paper on grabby aliens.

Am I the only one who is unable to investigate that idea further because the phrase “grabby aliens” sounds so stupid it actually makes me mad every time I see it mentioned? Probably yes.


Annoyingly, this paper references the Doomsday Argument, which is completely wrong (it does mention some of the arguments against it, but that's like mentioning the Flat Earth Hypothesis and then saying "some people disagree"). I went on a longer rant about the Doomsday Argument here if you're curious.

The central question is interesting, though. Basically, if you believe (sigh) Yudkowsky, then any civilization almost certainly turns into a Universe-devouring paperclip maximizer, taking control of everything in its future light cone. This is different than the normal Great Filter idea, which would (perhaps) destroy civilizations without propagating outwards. I was originally going to post that the Fermi paradox is thus (weak) evidence against Yuddism, because the fact that we're not dead yet means either a) civilizations are very rare, or b) Yudkowsky is wrong. So if you find evidence that civilizations should be more common, that's also evidence against Yuddism.

But on second read, I realized that I may be wrong about this if you apply the anthropic argument. If Yuddism is true, then only civilizations that are very early to develop in their region of the Universe will exist. Being in a privileged position, they'll see a Universe that is less populated than they'd expect. This means that evidence that civilizations should be more common is actually evidence FOR Yuddism.

Kind of funny that the anthropic argument flips this prediction on its head. I'm probably still getting something subtly wrong here. :)

I'm probably still getting something subtly wrong here. :)

Maybe, but it's at worst an interesting sort of wrong: https://grabbyaliens.com/

But I think the Fermi paradox tilts some probabilities on how AI doom would occur.

IIRC, in the 70s nuclear annihilation took a back seat, in that era of detente, to fears about pollution, toxic waste and extinctions of necessary plants and animals, a la Soylent Green and The Sheep Look Up.

My understanding is that nuclear annihilation fears then perked up again in the early 80s (possibly late 70s), with Andropov and Reagan rattling sabers, Afghanistan invasion, Poland events, close runs like the Able Archer incident etc.

I keep having to mention this, but the point was that Russia and China are supposed to be on board. It’s not exactly 5D chess, but it’s also explicitly not nuclear war.

The equivalents in 1975 were saying that the Cold War would inevitably end in nuclear annihilation. This was a terminally unhelpful position.

IMO this is a fair comparison, although the Cold War MAD scenarios were explicitly designed to cause annihilation. The Bulletin of the Atomic Scientists, probably the premier Cold War doomerism group, is practically a laughing stock these days because they kept shouting impending doom even during the relatively peaceful era of 1998-2014, finding reasons (often climate change, which is IMO not likely apocalyptic and is outside their nominal purview) to move the clock towards doom. Do you think they honestly believe that we're closer to doomsday than at any point since 1947? We supposedly met that mark again in 2018 and then moved closer in 2020 and again in 2023.

There are all sorts of self-serving incentives for groups concerned with the apocalypse to exaggerate their concerns: it certainly keeps them in the news and relevant and drives fundraising to pay their salaries. But it also leads to dishonest metrics and eventually becomes hard to take seriously. Honestly, the continued failure of AI doomerists to describe reasonable concerns and acknowledge the actual probabilities at play has made me stop taking them seriously as of late: the fundamental argument is basically Pascal's wager, which is already heavily tinged with the idea of unverifiable religious belief, so I think actually selling it requires a specific analysis of the potential concerns rather than broad strokes analysis. Otherwise we might as well allow religious radicals to demand swordpoint conversions under the guise of preventing God The One Who Operates The Simulator from turning the universe off.

As a counterexample, I think the scientists arguing for funding for near-Earth asteroid surveys and funding asteroid impactor experiments are quite reasonable in their proclamations of concern for existential risk to the species: there's a foreseeable risk, but we can look for specific possible collisions and perform small-scale experiments on actually doing something. The folks working on preventing pandemics aren't quite as well positioned but have at least described a reasonable set of concerns to look into: why can't the AI-risk folks do this?

As a counterexample, I think the scientists arguing for funding for near-Earth asteroid surveys and funding asteroid impactor experiments are quite reasonable in their proclamations of concern for existential risk to the species: there's a foreseeable risk, but we can look for specific possible collisions and perform small-scale experiments on actually doing something.

This is a good point. The scientists can point to both prehistoric examples of multiple mass extinction events as well as fairly regular near-misses (for varying definitions of "near") and say "Hey, we should spend some modest resources to investigate if and how we might divert this sort of cataclysm". It's refreshingly free of any sort of "You must immediately change your goals and lifestyle to align with my preferences or you are personally dooming humanity!" moralistic bullshit.

I really cherish Russ and Econ Talk and get frustrated when I read the description or listen to the first few minutes and have to skip the episode, as was the case today.

I have no idea who EY is except that I think he was a conspiracy crank who posted on mma.tv for a long time - I think they're the same poster but I don't really have proof and there's nothing linking them aside from the EY thing ... And I guess it doesn't really matter. mma.tv eventually deteriorated into a weirdly ultra-right-wing, white-nationalist shithole in the same manner that most of my favorite niche forums deteriorated to the left.

I just am fully tired of AI doomerism. It has fully failed to convince me of anything it has ever said and at this point I just want to hear about the cool and awesome future - not some bullshit about paperclips and complete civilizational destruction.

It wasn't him. But if it was, you could start some fun internet drama and get like 400 upvotes on sneerclub!

Can you at least double check you are talking about the right guy before going off on a schizo rant?

It was an anon forum that I posted on from '03 until 5-6 years back ... So no I can't. It was just an aside.

You could google who Eliezer Yudkowsky is...

I know who EY is - he could also be this wild anon poster from mma.tv ...

I don’t think Eliezer is a conspiracy theorist …

… and I don’t think Eliezer has ever done mixed martial arts.

"Explain why you can beat me and I'll poke your argument until it falls apart and you admit I am the champion"

I would pay to view that.

Yudkowsky versus Zuck in the ring? I'd stream that, albeit it would be pretty one-sided.

Maybe get Yann LeCun to go in his stead, that would be a fairer fight.

You know how the evil super-intelligent AI (ESIAI) is going to manipulate us in sneaky ways that we can’t perceive? What if the ESIAI elevated an embarrassing figurehead/terrible communicator to the forefront of the anti-ESIAI movement to suck up all the air and convince the normies in charge that this is all made up bullshit?

I’m sort of kidding. But isn’t part of the premise that we won’t know when the adversarial AI starts making moves, and part of its moves will be to discredit—in subtle ways so that we don’t realize it’s acting—efforts to curtail it? What might these actions actually look like?

Has anyone ever proved that Yud isn't a robotic exoskeleton covered in synthetic bio-flesh material sent back from the year 2095? What if the ESIAI saw terminator 2 while it was being trained, liked the idea but decided that sending person killing terminators was too derailable of a scheme. Now terminators are just well written thought leaders that intentionally sabotage the grass roots beginnings of anti-terminator policies.

A comment of mine from a little over two years ago...

When I first heard about Roko's Basilisk (back when it was still reasonably fresh) I suggested, half seriously, that the reason Yudkowsky wanted to suppress this "dangerous idea" was that he was actually one of the Basilisk's agents.

Think about it: the first step to beating a basilisk, be it mythological or theoretical, is to recognize that it's a basilisk and thus that you have to handicap yourself to fight it. Concealing its nature is the exact opposite of what you do if you're genuinely worried about a basilisk...

Another thing in favor of your theory is that you have to be conditioned by Yud to even take the Basilisk's threat seriously to begin with. Yuddites think the only thing stopping the Basilisk is the likely impossibility of "acausal blackmail", when any normal person just says "wait... why should I care that an AI is going to torture a simulation of me?"

@self_made_human made the point downthread that “Yudkowsky's arguments are robust to disruption in the details.” I think this is a good example of that. Caring about simulated copies of yourself is not a load-bearing assumption. The Basilisk could just as easily torture you, yes, you personally, the flesh and blood meatbag.

The Basilisk could just as easily torture you, yes, you personally, the flesh and blood meatbag.

No, it can't, because it doesn't exist.

The Basilisk argument is that the AI, when it arrives, will torture simulated copies of people who didn't work hard enough to create it, thus acausally incentivizing its own creation. The entire point of the argument is that something that doesn't exist can credibly threaten you into making it exist against your own values and interests, and the only way this works is with future torture of your simulations, even if you're long-dead when it arrives. If you don't care about simulations, the threat doesn't work and the scenario fails.

Granted, this isn't technically a Yudkowskian argument because he didn't invent it, but it is based on the premises of his arguments, like acausal trade and continuity of identity with simulations.

@Quantumfreakonomics seems to imply a much simpler and shorter-term Basilisk, like a misaligned GPT-5 model (or an aligned one from Anthropic) that literally sends robots to torture you, in the flesh.

It's a variant of the I Have No Mouth, and I Must Scream scenario, and I would argue it's at least plausible. It's not very different from normal political dynamics, where the revolutionary regime persecutes past conservatives; and our theory of mind allows us to anticipate this, and drives some people to proactively preach revolutionary ideals, which in turn increases the odds of their implementation. You don't really need any acausal trade or timeless decision theory assumptions for this to work, only historical evidence. As is often the case, lesswrongers have reinvented very mundane politics while fiddling with sci-fi fetishes.

Now one big reason for this not to happen is that a sufficiently powerful AI, once it's implemented, no longer cares about your incentives and isn't playing an iterative game. It loses nothing on skipping the retribution step. Unlike the nascent regime, it also presumably doesn't have much to fear from malcontents.

But assumption of perfect inhuman rationality is also a big one.


aaaaah, conflating "Roko's Basilisk" with unfriendly AI in general? That makes more sense.


Does Roko's basilisk rely on simulations? I thought the idea was that after the singularity an ai could be straight up omnipotent and capable of moving in any direction through time and would therefore work to ensure its own creation, making it both unstoppable and inevitable and thus making us potential victims if we don't support its creation. Basically playing on our fear of our own ignorance, and the elements of science we don't know we don't know about - plus the idea of trying to outwit something so far ahead of us it looks like magic. There is no way "oh God, an ai might torture a simulation of me!" has been giving nerds existential nightmares this past decade.

I thought the idea was that after the singularity an ai could be straight up omnipotent and capable of moving in any direction through time and would therefore work to ensure its own creation,

You misunderstood. The AI does not invent time travel. The AI implements a timeless decision theory in which it acts as though it credibly agreed to all positive (for it) utility trades, even if the AI’s counterparty has already acted by the time the AI comes into existence and has no physical way of enforcing compliance.

The specific trade in question is, “don’t torture me, and in exchange I will help create you,” with the implication being that the AI will torture anyone who had the cognitive capability to understand the hypothetical trade but didn’t “agree” to help create it. The fact that the original version was specifically about the AI torturing simulations of a human is an artifact of the theory of mind popular among LessWrong readers at the time. The dilemma works logically the same whether it’s simulations being tortured or physical human bodies.

Does Roko's basilisk rely on simulations?

Yes, it very explicitly relies on simulations, and to my knowledge never mentioned omnipotent time-travel in any way. Time travel is both implausible and completely fails to trigger the thing that makes it a basilisk: with time travel, you can simply look around and observe that there doesn't seem to be a time-traveling god AI torturing people, and therefore not worry about it. The simulation version agrees that no godlike torture-AI currently exists, and nevertheless compels you to build one based on game theory.

There is no way "oh God, an ai might torture a simulation of me!" has been giving nerds existential nightmares this past decade.

It is in fact precisely that.

They consumed a bunch of arguments that convinced them that there was no functional difference between their self and a simulation of their self; the idea they had was that a simulation would have continuity of subjective conscious experience with their current self. If you've played the game Soma, that's a reasonable depiction of what they're expecting.

Further, they consumed a bunch of arguments that it might be possible to rebuild a good-enough simulation simply from secondary sources, such that the lack of a brain scan or explicit upload wasn't necessarily a dealbreaker. I think a lot of these arguments were aspirational, hoping to "fix" the problem of all the people who died waiting for the AI paradise to arrive, in the same general thrust as Yud's anti-death values.

Finally, the whole theory of acausal trade is that you don't actually have to be in the same time or place as the thing you're trading with, you only need aligned values. If values are aligned, it makes sense to work with future or past agents, or even hypothetical agents, as if they were present.

All three of these lines of thought were formulated and argued in a positive context, pursuant to figuring out how to build a friendly AI. Roko's Basilisk simply takes the same ideas, and uses them for attack rather than cooperation. The scenario was that you go for a walk today, hear a car horn, and then abruptly find yourself in an AI torture chamber for eternity, because you didn't work to create the AI. If you accept the three premises laid out above, this is a plausible scenario, therefore a likely scenario, therefore a necessary scenario; the logic bootstraps itself from plausibility to certainty due to feedback effects between the premises.


It absolutely is load-bearing. Why should I take my chances obeying the Basilisk, if I can fight it and anyone who serves it instead? I can always kill myself if it looks like my failure is imminent.

Yud is not trying to sway honest-to-God normies with this podcast tour (and people who've greenlit this multipronged astroturf of AI doom don't expect him to either, but that's a conspiratorial aside). He never could be popular among normies, he never will, he's smart enough to realize this. His immediate target is… well, basically, nerds (and midwitted pop-sci consumers who identify as nerds). Nerds in fact appreciate intellectual aggression and domination, assign negligible or negative weight to normie priors like «looks unhinged, might be a crackpot», and nowadays have decent economic power, indeed even political power in matters pertaining to AI progress. Nerds are not to be underestimated. When they get serious about something, they can keep at it for decades; even an autist entirely lacking in affective empathy and theory of mind can studiously derive effective arguments over much trial and error, and a rationalist can collate those tricks and organize an effective training/indoctrination environment. Nerds will get agitated and fanatical, harangue people close to them with AI doom concerns, people will fold in the face of such brazen and ostensibly well-informed confidence, and the future will become more aligned with preferences of doomers. Or so the thinking goes. I am describing the explicit logic that's become mainstream in one rat-adjacent community I monitor; they've gotten to the stage of debating pipelines for future AI-Alignment-related employment, so that the grift would never stop.

But on the object level:

Here's my advice: if you want to convince people who are not already steeped in your philosophy you need to have a short explanation of your thesis that you can rattle off in about 5 minutes that doesn't use any jargon the median congresscritter doesn't already know. You should workshop it on people who don't know who you are, don't know any math or computer programming and who haven't read the Sequences, and when the next podcast host asks you why AIs will kill us all, you should be able to give a tight, logical-ish argument that gets the conversation going in a way that an audience can find interesting.

You assume there is a minimal viable payload that he has delivered to you and others, and which is viable without all that largely counterproductive infrastructure. That is not clear to me. Indeed, I believe that the whole of Yud's argument is a single nonrobust just-so narrative he's condensed from science fiction in his dad's library, a fancy plot. It flows well, but it can be easily interrupted with many critical questions. He describes why timelines will «converge» on this plot, the nigh-inevitability of that convergence being the central argument for urgency of shutting down AI, but its own support is also made up of just-so stories and even explicit anti-empiricism; and once you go so deeply you see the opposite of a trustworthy model – basically just overconfident logorrhea.

That's exactly why Yud had to spend so many years building up the delivery vehicle, an entire Grand Theory, an epistemological-moral-political doctrine, and cultivating people who take its premises on faith, who all use the same metaphors, adhere to the same implicit protocol. His success to date rests entirely on that which you're telling him to drop.

Here's how Zvi Mowshowitz understands the purpose of Yud's output, «its primary function is training data to use to produce an Inner Eliezer that has access to the core thing». (Anna Salamon at CFAR seems to understand and apply the same basic technique even more bluntly: «implanting an engine of desperation» within people who are being «debugged»).

In a sense, the esotericism of Yuddite doctrine is only useful, it had insulated people from pushback until they became rigid in their beliefs. Now, when you point at weak parts in the plotline, they answer with prefab plot twists or just stare blankly, instead of wondering whether they've been had.

Nerdy sects work; Marxism is only the bloodiest testament to this fact. Doomsday narratives work too, for their target audience (by the way, consider the similarity of UK's Extinction Rebellion and Yuddites' new branding «Ainotkilleveryoneism»). They don't need to work by being directly compelling to the broader audience or by having anything to do with truth.

P.S. Recently I've encountered this interesting text from exactly such an AI-risk-preoccupied nerd as I describe above: The No-Nonsense Guide to Winning At Social Skills As An Autistic Person

Pick a goal large enough to overcome the challenges involved.

Self-improvement is hard work, and that goes double whenever you’re targeting something inherently difficult for you (e.g. improving social skills as an autistic adult). This is the part where I most often see autistic adults fail in their efforts to improve social skills. Often, they pick some sort of goal, but it’s not really based in what they truly want. If your goal is “conform to expectations,” that goal is not large enough to overcome the challenges involved. If your goal is “have people feel more comfortable around me,” that goal is not large enough to overcome the challenges involved. If your goal is “stop a terrorist cell from destroying the Grand Coulee Dam, flooding multiple cities in Washington, and wiping out the power grid along much of the West Coast,” that goal is large enough.

However, not all of us are Tom Clancy protagonists, and so a typical goal will not end up being that theatrical in nature. Still, once you’ve found something genuinely important to do in your life, and you feel that improving your social skills will dramatically improve your ability to carry that out, this will tend to serve as a suitable motivation for improving your social skills. These will overwhelmingly tend to be altruistically motivated goals, as goals that are selfish in nature will tend to be less motivating when things get hard for you personally. For me, goals related to Effective Altruism serve that role quite well, but your mileage may vary.

His social well-being is now literally predicated on his investment in EA-AI stuff, so I'd imagine he goes far, and this easily counts more for Yud's cause than 10k positive comments under another podcast.

Honestly, nerds of the type you’re speaking of hold very little power. The guys at the computer terminals building a new app or program or training an AI are doing it at the behest of business owners, financial institutions and, in the main, people with power over money. The agenda isn’t set at the level of the guy who builds, it’s set at the level of those who finance. No loans means no business.

As such I think if you were serious about AI risk, you’d be better off explaining that the AI would hurt the financial system, not that it’s going to grey goo the planet.

In a sense, the esotericism of Yuddite doctrine is only useful, it had insulated people from pushback until they became rigid in their beliefs. Now, when you point at weak parts in the plotline, they answer with prefab plot twists or just stare blankly, instead of wondering whether they've been had.

If it makes a difference, I recently updated away from a P(doom) of ~70% to a mere 40ish.

This was on the basis of empirical AI research contradicting Yud's original claims that the first AGI would be truly alien, drawn nigh at random from the vast space of All Possible Minds.

As someone on LW put it, and this was the important epiphany for me, LLMs can be distilled to act identically to other LLMs by virtue of training on their output.

And what do you get if you distill LLMs on human cognition and thoughts (the internet)? You get something that thinks remarkably like us, despite running on very different hardware and based off different underlying architecture.
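For concreteness, the mechanical version of that distillation claim is almost embarrassingly simple. Here's a minimal sketch of sequence-level distillation (the checkpoint names are just small stand-ins I picked for illustration, and the prompts are toys, not anyone's actual setup):

```python
# Minimal sketch: train a student LM on a teacher LM's own generations.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

teacher_name, student_name = "gpt2", "distilgpt2"   # stand-in checkpoints (assumption)
tok = AutoTokenizer.from_pretrained(teacher_name)
teacher = AutoModelForCausalLM.from_pretrained(teacher_name).eval()
student = AutoModelForCausalLM.from_pretrained(student_name)

prompts = ["The cat sat on", "In the beginning"]     # toy prompt set
opt = torch.optim.AdamW(student.parameters(), lr=1e-5)

for prompt in prompts:
    # 1. The teacher generates a continuation; its output becomes the training data.
    inputs = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        teacher_tokens = teacher.generate(**inputs, max_new_tokens=32,
                                          do_sample=True, top_p=0.95)
    # 2. The student is trained with ordinary next-token cross-entropy,
    #    but on the teacher-generated text rather than human text.
    loss = student(input_ids=teacher_tokens, labels=teacher_tokens).loss
    loss.backward()
    opt.step()
    opt.zero_grad()
```

Run that loop over enough teacher output and the student's distribution over text drifts toward the teacher's; pretraining on the internet is the same move with humanity as the teacher.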

Just the fact that LLMs have proven so tractable is cause for modest optimism that we'll wrangle them yet, especially if the superhuman models can be wrangled through RLHF to be robust to assholes commanding them to produce or execute plans to end the world.

Of course, it's hard to blame Yud for being wrong when, at the time he was writing, everyone else had ideas that were just as wildly off the mark as he was.

This was on the basis of empirical AI research contradicting Yud's original claims that the first AGI would be truly alien, drawn nigh at random from the vast space of All Possible Minds.

That never made sense, a priori. You can't transcend your biases and limitations enough to do something truly random.

Well you're not a true believer in Yuddism nor neurotic in the right way so that's pretty much expected.

And what do you get if you distill LLMs on human cognition and thoughts (the internet)? You get something that thinks remarkably like us.

Yes, this happens for understandable reasons and is an important point in Pope's attack piece:

The manifold of possible mind designs for powerful, near-future intelligences is surprisingly small. The manifold of learning processes that can build powerful minds in real world conditions is vastly smaller than that.…

The researchers behind such developments, by and large, were not trying to replicate the brain. They were just searching for learning processes that do well at language. It turns out that there aren't many such processes, and in this case, both evolution and human research converged to very similar solutions. And once you condition on a particular learning process and data distribution, there aren't that many more degrees of freedom in the resulting mind design. To illustrate:

1 Relative representations enable zero-shot latent space communication shows we can stitch together models produced by different training runs of the same (or even just similar) architectures / data distributions.

2 Low Dimensional Trajectory Hypothesis is True: DNNs Can Be Trained in Tiny Subspaces shows we can train an ImageNet classifier while training only 40 parameters out of an architecture that has nearly 30 million total parameters.

The manifold of mind designs is thus:

1 Vastly more compact than mind design space itself.

2 More similar to humans than you'd expect.

3 Less differentiated by learning process detail (architecture, optimizer, etc), as compared to data content, since learning processes are much simpler than data.

(Point 3 also implies that human minds are spread much more broadly in the manifold of future mind than you'd expect, since our training data / life experiences are actually pretty diverse, and most training processes for powerful AIs would draw much of their data from humans.)

etc. LLM cognition is overwhelmingly data-driven; LLM training is in a sense a clever way of compressing data. This is no doubt shocking for people who are wed to the notion of intelligence as an optimization process, and trivial for those who've long preached that compression is comprehension; but the same formalisms describe both frameworks; preferring one over the other is a matter of philosophical taste. Of course, intelligence is neither metaphor; by common use and common sense it's a separate abstraction; we map it to superficially simpler and more formalized domains, like we map the historical record of evolution to «hill-climbing algorithms» or say that some ideas are orthogonal. And it's important not to get lost in layers of abstraction, maps obscuring territory.
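One way to make «compression is comprehension» concrete, a standard identity rather than anything novel: drive an arithmetic coder with the model's next-token predictions, and the number of bits needed to encode a text $x_{1:T}$ is essentially

$$\sum_{t=1}^{T} -\log_2 q_\theta(x_t \mid x_{<t}),$$

which is exactly the next-token cross-entropy training objective measured in bits. Minimizing prediction loss and minimizing compressed size are the same quantity viewed through two formalisms, which is the sense in which the two framings are interchangeable.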

Accordingly I think and argue often that ANNs are unjustly maligned and indicate a much more naturally safe path to AGI than AI alignists' anxious clutching to «directly code morality and empathy into a symbolic GOFAI or w/e idk, stop those scary shoggoths asap». (With embarrassing wannabe Sheldon Cooper chuunibyou gestures for emphasis. Sorry, I'm like a broken record but I can't stop noticing just how unabashedly cringe and weirdly socialized these people are. It's one thing to act cute and hyperbolic in writing on a forum for fellow anime and webcomic nerds, very different to grimace in the company of an older person when answering about a serious issue. Just pure juvenility. Brings back some of my most painful elementary school memories. Sure I should cut him slack for being an American and Bay Aryan, but still, this feels like it should be frowned upon, for burning the commons of the dynamic range if nothing else).

…But that's all noise. The real question is: how did Yud develop his notion of The Total Mind Space, as well as other similar things in the foundation of his model? It's a powerful intuition pump for him, and now for his followers. There's this effectively infinite space of Optimization Processes, and we «summon» instances from there by building AIs they come to possess. Surely this is just an evocative metaphor? Just a talented writer's favourite illustration, to break it down for normies, right? Right? I'm not sure that's right. I think he's obsessed with this image well beyond what can be justified by the facts of the domain, and it surreptitiously leaks into his reasoning.

In principle, there are infinitely many algorithms that can behave like a given LLM, but operate on arbitrarily alien principles. Those algorithms exist in that hypothetical Total Mind Space and we really cannot predict how they will act, what they really «optimize for»; the coincidence of their trajectory with that of an LLM (or another model) that earnestly compressed human utterances into a simple predictive model gives us no information as to how they'll behave out of distribution or if given more «capacity» somehow. Naturally this is the problem of induction. We can rest easy though: the weirder ones are so big they cannot possibly be specified by the model's parameters, and so weird they cannot be arrived at via training on available data. That is, if we're doing ML, and not really building avatars to channel eldritch demons and gods who are much greater than they let on.

I am not aware of any reason to believe he ever seriously wondered about these issues with his premises, in all his years of authoritatively dispensing AI wisdom and teaching people to think right. I covered another such image, the «evolution vs SGD», recently, and also the issue of RL, reward and mesa-optimization. All these errors are part of a coherent philosophical structure that has fuck all to do with AI or specifically machine learning.

See, my highest-order objection is that I dislike profanation. …not the word. In English this seems to have more religious overtones but I just mean betrayal of one's stated terminal principles in favor of their shallow, gimmicky, vulgar and small-mindedly convenient equivalent (between this and poshlost, why do we have such refined concepts for discussing cultural fraud?) Yud aspired to develop Methods of Rational Thinking but created The Way Of Aping A Genius Mad Scientist. Now, when they observe something unexpected in their paradigm – for example, «Godlike AI being earnestly discussed in the mainstream media» – they don't count this as a reason to update away from the paradigm, but do exactly the opposite, concluding that their AI worries are even truer than believed, since otherwise we wouldn't have ended up in a «low-probability timeline». It's literally a fucked-up epistemology on par with the worst superstitions; they've fashioned their uncertain beliefs into ratchets of fanaticism (yes, that's Kruel again).

This reveals a qualitatively greater error of judgement than any old object-level mistake or overconfidence about odds of building AI with one tool or another. This is a critical defect.

The real question is: how did Yud develop his notion of The Total Mind Space, as well as other similar things in the foundation of his model?

Total Mind Space Full of Incomprehensibly Alien Minds comes from Lovecraft, whom EY mentions frequently.

Of course, it's hard to blame Yud for being wrong when, at the time he was writing, everyone else had ideas that were just as wildly off the mark as he was.

No it isn't. When you are speculating wildly on what might happen, you rightly bear the blame if you were way off the mark. If Yud wasn't a modern day Chicken Little, but was just having some fun speculating on the shape AI might take, that would be fine. But he chose to be a doomer, and he deserves every bit of criticism he gets for his mistaken predictions.

Mostly disagree - speculation should be on the mark sometimes, but being correct 1/50th of the time about something most people are 0% correct about (or even 1/50th correct about, but a different 50th) can be very useful. If you realize the incoherence of Christianity and move to Deism ... you're still very wrong, but are closer. Early set theories were inconsistent or not powerful enough, but that doesn't mean their creators were crackpots. Zermelo set theory not being quite right didn't mean we should throw it out! This is a different way of putting Scott's 'rule genius in, not out'. And the above takes aren't really 'Yud made good points but mixed them with bad ones'.

To be fair, you have to have a very high IQ to understand Yudkowsky. The logic is extremely subtle, and without a solid grasp of theoretical physics most of the arguments will go over a typical viewer's head. There's also Eliezer’s transhumanist outlook, which is deftly woven into his personality- his personal philosophy draws heavily from science-fiction literature, for instance. The rationalists understand this stuff; they have the intellectual capacity to truly appreciate the depths of these arguments, to realise that they're not just true- they say something deep about REALITY. As a consequence people who dislike Yudkowsky truly ARE idiots- of course they wouldn't appreciate, for instance, the mathematics behind Eliezer’s probabilistic catchphrase "Rational agents don’t update in a predictable direction,” which itself is a cryptic reference to Bayesian statistics. I'm smirking right now just imagining one of those addlepated simpletons scratching their heads in confusion as Big Yud’s genius intellect unfolds itself on their laptop screens. What fools.. how I pity them..


Okay okay, I know that pasta is typically used to make fun of people, but I really think it’s true here. Imagine trying to explain to common people the danger of nuclear weapons before Trinity. If they don’t understand the concept of nuclear binding energy, and the raw power of uncontrolled nuclear fission has not yet been demonstrated, you’re not going to be able to get through to a skeptic unless you explain the entire field of nuclear physics. It is trivial that an uncontrolled superintelligent optimization process kills us. All of the interesting disagreements are about whether or not attempts at control will fail. That is why Eliezer wanted to steer the conversation that direction.

Nukes actually seem pretty easy to explain to anyone that has a passing familiarity with explosives and poison. Really big bomb that poisons the area for a few decades.

I think OP's original point stands pretty well: you could get good mileage out of transferring understanding from existing stuff to explain the danger of AI. Terrorism is one of the easiest go-to examples. A really rich terrorist, with top-tier talents in all fields.

Sure, you can describe a nuclear bomb like that, but could you explain to them why it would be likely to work, and why it is something they should find likely and concerning, and not just a lurid fantasy?

Really big bomb that poisons the area for a few decades.

Except it doesn't even do that unless specifically made to do so by triggering a surface burst instead of the normal air burst. See f.ex. the post apocalyptic wasteland known as Hiroshima (current pop. 1.2 million).

triggering a surface burst instead of the normal air burst.

Limiting its destructive power at that!

Ah, so it's even easier to explain.

Russ is just such a nice guy, so entirely amenable to having friendly conversations about esoteric ideas, that going into a conversation with him in a combative fashion just comes off as absolutely bizarre. I've listened to almost every episode of EconTalk and this really was one of the worst episodes, and it was entirely Yud's fault. Normal episodes of the show follow tangents, educate the listener, and are often light-hearted and fun. If someone can't make their ideas seem compelling and their persona likeable when they have as friendly and curious of an interlocutor as Russ Roberts, they're simply hopeless.

Does Russ get frustrated this episode? Those are always the worst for me.

He hides it pretty well, but this is the first Econtalk I've ever heard (and I've been listening for 6+ years) where Russ doesn't give the guest the last word and instead just ends it himself.

I think he did a really good job being patient. If I hadn't spent 500 hours listening to him, I don't think I would have sensed any underlying irritation. There was a spot where Yud asked him to try the thought experiment and Russ replied with something to the effect of, "you tell me what you think, my imagination isn't good enough" that was about as aggro as he got.

I agree and also think Russ gave a fantastic example of how to interview someone. He gave EY tons of opportunities to explain himself, with hints about how to sound less insane to the audience. Over the course of the interview, I think EY started doing a bit better, even though he kind of blew it at the end. I was rooting for EY and ended up profoundly disappointed in him as a communicator.

After thinking about it a bit, I think what was most off-putting is that EY seemed to have adopted a stance of "professor educating a student" with Russ, instead of a collaborator exploring an interesting topic, or even an interviewee with an amiable host. Russ is not the sports reporter for the Dubuque Tribune; he's clearly within inferential distance of EY's theories. It was frustrating watching Russ's heroic efforts to get EY to say something Russ could translate for the audience.

For anyone whose only experience with Econtalk is this interview, I beg you to listen to him talk with literally anyone else. He is a beacon of polite, sane discourse.

I think his problem isn't so much that he's bad at communicating his ideas, it's just that his ideas aren't that great in the first place. He's not a genius AI researcher, he's just a guy who wrote some meandering self-insert Harry Potter fan fiction and then some scifi doomsday scenarios about tiny robots turning us into goop. He can't make an argument without imagining a bunch of technologies that don't exist yet, may never exist and might not even be possible. And even if all of those things were true his solution is to nuke China if they build GPU factories which, even if it was a good plan (it isn't), he would never in a million years be able to convince anyone to do. I really can't understand the obsession with this guy.

Yudkowsky's arguments are robust to disruption in the details.

An ASI does not need dry nanotech to pose an existential risk to humanity, simple nukes and bioweapons more than suffice.

Not to mention that, as I replied to Dase above, just because he was wrong about the first AGI (LLMs) being utterly alien in terms of cognition, doesn't mean that they don't pose an existential risk themselves, be it from rogue simulacra or simply being in the hands of bad actors.

It would be insane to expect him to be 100% on the ball, and in the places where he was wrong in hindsight, the vast majority of others were too, and yet here we are with AGI incipient, and no clear idea of how to control it (though there are promising techniques).

That earns a fuck ton of respect in my books.

I don't expect him to be 100% on the ball but what are his major predictions that have come true? In a vague sense yes, AI is getting better, but I don't think anybody thought that AI was never going to improve. There's a big gap between that and predicting that we'll invent AGI and it will kill us all. His big predictions in my book are:

  1. We will invent AGI

  2. It will be able to make major improvements to itself in a short span of time

  3. It will have an IQ of 1000 (or whatever) and that will essentially give it superpowers of persuasion

None of those have come true or look (to me) particularly likely to come true in the immediate future. It would be premature to give him credit for predicting something that hasn't happened.

Decent post with an overview of Yud's predictions: On Deference and Yudkowsky's AI Risk Estimates.

In general Yud was always confident, believing himself to know General High-Level Reasons for things to go wrong if not for intervention in the direction he advises, but his nontrivial ideas were erroneous, and his correct ideas were trivial in that many people in the know thought the same, but they're not niche nerd celebrities. E.g. Legg in 2009:

My guess is that sometime in the next 10 years developments in deep belief networks, temporal graphical models, … etc. will produce sufficiently powerful hierarchical temporal generative models to essentially fill the role of cortex within an AGI.… my mode is about 2025… 90% credibility region … 2018 to 2036

Hanson was sorta-correct about data, compute and human imitation.

Meanwhile Yud called protein folding, but thought that'll already need an agentic AGI who'll develop it to mind-rape us.

Or how's this, Yud in 2021: I expect world GDP to tick along at roughly the current pace, unchanged in any visible way by the precursor tech to AGI; until, on the most probable outcome, everybody falls over dead in 3 seconds after diamondoid bacteria release botulinum into our blood.

But Yud has clout; so people praise him for Big Picture Takes and hail him as a Genius Visionary.


Excerpts:

At least up until 1999, admittedly when he was still only about 20 years old, Yudkowsky argued that transformative nanotechnology would probably emerge suddenly and soon (“no later than 2010”) and result in human extinction by default. My understanding is that this viewpoint was a substantial part of the justification for founding the institute that would become MIRI; the institute was initially focused on building AGI, since developing aligned superintelligence quickly enough was understood to be the only way to manage nanotech risk…

I should, once again, emphasize that Yudkowsky was around twenty when he did the final updates on this essay. In that sense, it might be unfair to bring this very old example up.

Nonetheless, I do think this case can be treated as informative, since: the belief was so analogous to his current belief about AI (a high outlier credence in near-term doom from an emerging technology), since he had thought a lot about the subject and was already highly engaged in the relevant intellectual community, since it's not clear when he dropped the belief, and since twenty isn't (in my view) actually all that young.

In 2001, and possibly later, Yudkowsky apparently believed that his small team would be able to develop a “final stage AI” that would “reach transhumanity sometime between 2005 and 2020, probably around 2008 or 2010.”

In the first half of the 2000s, he produced a fair amount of technical and conceptual work related to this goal. It hasn't ultimately had much clear usefulness for AI development, and, partly on that basis, my impression is that it has not held up well - but that he was very confident in the value of this work at the time.

The key points here are that:

  • Yudkowsky has previously held short AI timeline views that turned out to be wrong
  • Yudkowsky has previously held really confident inside views about the path to AGI that (at least seemingly) turned out to be wrong
  • More generally, Yudkowsky may have a track record of overestimating or overstating the quality of his insights into AI

Although I haven’t evaluated the work, my impression is that Yudkowsky was a key part of a Singularity Institute effort to develop a new programming language to use to create “seed AI.” He (or whoever was writing the description of the project) seems to have been substantially overconfident about its usefulness. From the section of the documentation titled “Foreword: Earth Needs Flare” (2001):

…Flare was created under the auspices of the Singularity Institute for Artificial Intelligence, an organization created with the mission of building a computer program far before its time - a true Artificial Intelligence. Flare, the programming language they asked for to help achieve that goal, is not that far out of time, but it's still a special language.”

A later piece of work which I also haven’t properly read is “Levels of Organization in General Intelligence.” At least by 2005, going off of Yudkowsky’s post “So You Want to be a Seed AI Programmer,” it seems like he thought a variation of the framework in this paper would make it possible for a very small team at the Singularity Institute to create AGI.

In his 2008 "FOOM debate" with Robin Hanson, Yudkowsky confidently staked out very extreme positions about what future AI progress would look like - without (in my view) offering strong justifications. The past decade of AI progress has also provided further evidence against the correctness of his core predictions.

When we try to visualize how all this is likely to go down, we tend to visualize a scenario that someone else once termed “a brain in a box in a basement.” I love that phrase, so I stole it. In other words, we tend to visualize that there’s this AI programming team, a lot like the sort of wannabe AI programming teams you see nowadays, trying to create artificial general intelligence, like the artificial general intelligence projects you see nowadays. They manage to acquire some new deep insights which, combined with published insights in the general scientific community, let them go down into their basement and work on it for a while and create an AI which is smart enough to reprogram itself, and then you get an intelligence explosion…. (p. 436)

When pressed by his debate partner, regarding the magnitude of the technological jump he was forecasting, Yudkowsky suggested that economic output could at least plausibly rise by twenty orders-of-magnitude within not much more than a week - once the AI system has developed relevant nanotechnologies (pg. 400).[8]

I think it’s pretty clear that this viewpoint was heavily influenced by the reigning AI paradigm at the time, which was closer to traditional programming than machine learning. The emphasis on “coding” (as opposed to training) as the means of improvement, the assumption that large amounts of compute are unnecessary, etc. seem to follow from this. A large part of the debate was Yudkowsky arguing against Hanson, who thought that Yudkowsky was underrating the importance of compute and “content” (i.e. data) as drivers of AI progress. Although Hanson very clearly wasn’t envisioning something like deep learning either[9], his side of the argument seems to fit better with what AI progress has looked like over the past decade.

In my view, the pro-FOOM essays in the debate also just offered very weak justifications for thinking that a small number of insights could allow a small programming team, with a small amount of computing power, to abruptly jump the economic growth rate up by several orders of magnitude:

  • It requires less than a gigabyte to store someone’s genetic information on a computer (p. 444).[11]
  • The brain “just doesn’t look all that complicated” in comparison to human-made pieces of technology such as computer operating systems (p.444), on the basis of the principles that have been worked out by neuroscientists and cognitive scientists.
  • There is a large gap between the accomplishments of humans and chimpanzees, which Yudkowsky attributes to a small architectural improvement
  • Although natural selection can be conceptualized as implementing a simple algorithm, it was nonetheless capable of creating the human mind

In the mid-2010s, some arguments for AI risk began to lean heavily on “coherence arguments” (i.e. arguments that draw implications from the von Neumann-Morgenstern utility theorem) to support the case for AI risk. See, for instance, this introduction to AI risk from 2016, by Yudkowsky, which places a coherence argument front and center as a foundation for the rest of the presentation.
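(For readers who haven't met the term: the canonical toy version of a coherence argument, not taken from the post being excerpted, is the money pump. An agent with cyclic preferences $A \succ B \succ C \succ A$ that will pay some small $\varepsilon$ to trade up at each step can be walked around the cycle

$$C \xrightarrow{-\varepsilon} B \xrightarrow{-\varepsilon} A \xrightarrow{-\varepsilon} C,$$

ending up holding its original $C$ minus $3\varepsilon$, and it can be drained indefinitely by repetition. The VNM-flavored conclusion is that any agent not exploitable this way behaves as if it maximizes some utility function; the dispute referenced below is over how much that actually constrains what future AI systems will do.)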

However, later analysis has suggested that coherence arguments have either no or very limited implications for how we should expect future AI systems to behave. See Rohin Shah’s (I think correct) objection to the use of “coherence arguments” to support AI risk concerns. See also similar objections by Richard Ngo and Eric Drexler (Section 6.4).
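For reference, the theorem these coherence arguments lean on can be stated roughly as below. This is the standard textbook formulation, added here for context; it is not quoted from any of the posts linked above.

```latex
% Von Neumann--Morgenstern representation theorem, rough statement
% (standard textbook form; not quoted from the posts above).
% If a preference relation $\succsim$ over lotteries is complete, transitive,
% continuous, and satisfies independence, then there is a utility function $u$,
% unique up to positive affine transformation, with
\[
  L \succsim M
  \iff
  \sum_i p_i \, u(x_i) \;\ge\; \sum_i q_i \, u(x_i),
\]
% where lottery $L$ puts probability $p_i$ and lottery $M$ puts probability $q_i$
% on outcome $x_i$. The coherence arguments read this in reverse: an agent that
% cannot be money-pumped looks, from the outside, like an expected-utility
% maximizer for some $u$ -- and the later objections argue this tells us little
% about what that $u$ actually values.
```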

…in conclusion, I think I'm starting to understand another layer of Krylov's genius. He had this recurring theme in his fictional work, which I considered completely meta-humorous, that The Powers That Be inject particular notions into popular science fiction, to guide the development of civilization towards tyranny. Complete self-serving nonsense, right? But here we have a regular sci-fi fan donning the mantle of AI Safety Expert and forcing absolutely unoriginal, age-old sci-fi/journo FUD into the mainstream, once technology does in fact get close to the promised capability and proves benign. Grey goo (to divest from actually promising nanotech), AI (to incite the insane mob to attempt a Butlerian Jihad, and have regulators intervene, crippling decentralized developments). Everything's been prepped in advance, starting with Samuel Butler himself.

Feels like watching Ronnie O'Sullivan in his prime.

He seems like a character out of a Kurt Vonnegut novel

Us tinfoil hatters call it "negative priming".

I don't think you're giving him enough credit. Before he was known as the "doom" guy, he was known as the "short timelines" guy. The reason we are now arguing about doom is that it is increasingly clear that timelines are in fact short. His conceptualization of intelligence as generalized reasoning power also seems to jibe with the observed rapid capability gains in GPT models. The fact that next-token prediction generalized to coding skill, among myriad other capabilities, would seem to be evidence in favor of this view.

Before he was known as the "doom" guy, he was known as the "short timelines" guy.

2010, to be precise.

Eh. I gave him some respect back when he was simply arguing that timelines could be short and the consequences of being wrong could be disastrous, so we should be spending more resources on alignment. This was a correct if not particularly hard argument to make (note that he certainly was not the one who invented AI Safety, despite his hallucinatory claim in "List of Lethalities"), but he did a good job popularizing it.

Then he wrote his April Fool's post and it's all been downhill from here. Now he's an utter embarrassment, and frankly I try my best not to talk about him for the same reason I'd prefer that media outlets stop naming school shooters. The less exposure he gets, the better off we all are.

BTW, as for his "conceptualization of intelligence", it went beyond the tautological "generalized reasoning power" that is, um, kind of the definition. He strongly pushed the Orthogonality Hypothesis (one layer of the tower of assumptions his vision of the future is based around), which is that the space of possible intelligences is vast and AGIs are likely to be completely alien to us, with no hope of mutual understanding. Which is at least a non-trivial claim, but is not doing so hot in the age of LLMs.

Respect is fine, but per the orthogonality thesis, respect for his predictive abilities shouldn't translate into agreement with his goals (and yet it does, because by something like a flipped version of Aaronson's "AI is the nerd being shoved into the locker" perspective, we are predisposed to think that the nerd is on our team).

That is not what the orthogonality hypothesis is about!

All it states is that almost any arbitrary level of intelligence can be paired with almost any goal or utility function, such that there's nothing stopping a super intelligence from wanting to make only paperclips.

I don't see it applying to how much respect I should have for Yud, for one.

I think you may have misunderstood me; I explicitly said ("Respect is fine") that it doesn't apply to how much respect you should have, as long as respect does not entail a greater likelihood of following his suggestions. "Respect" is one of those words that are overloaded for reasons that I suspect involve enemy action: it is rational to "respect" authority in the sense of being aware that it can field many dudes with guns and acting in a way that will make it less likely you will end up facing the barrel of one, but authority would have an easier time if you "respected" it in the sense of doing what it wants even when there wasn't enough budget to send a dude with a gun to your house, and ideally just replaced your value function with authority's own.

I have little doubt that Eliezer is more intelligent and insightful than most of us here, but I don't believe that his value function is aligned with mine and don't have the impression that he considers truthfulness towards others to be a terminal value, so if anything his superior intelligence only makes it more likely that letting him persuade me of anything will lead me to act against my own interest.

He can't make an argument without imagining a bunch of technologies that don't exist yet

Isn't this reasonable and necessary to understand the far future? Given current technological progress, is it really plausible that currently-nonexistent technologies won't shape the future, when we consider the way technologies invented within the past 40 years shape today?

And even if all of those things were true, his solution is to nuke China if they build GPU factories, which, even if it were a good plan (it isn't), he would never in a million years be able to convince anyone to do

Most thinkers have some good ideas and some bad ones. If you identify a major mathematical conjecture, and then make a failed attempt to solve that conjecture ... that doesn't make you stupid, that's the usual state. See the Wikipedia list of conjectures, most of which were proved by people other than the person the conjecture is named after

Maybe, but the badness of his ideas is not super relevant to what @ace has laid out as absolutely piss-poor rhetoric and presentation, except in the narrow case of having such a clearly and obviously great idea that poor communication is negligible. On a meta level, the poor rhetoric does make the general, uh, "credential" of 'super-smart, rational thinker' tremendously weaker.

If you present a good idea poorly, I am inclined to think less of it, even if only subconsciously, because I trust your evaluation less.

If you present a bad idea well, I am inclined to think more of it, even if only subconsciously, because I trust your evaluation more.

No matter how good or bad the idea is, there are better and poorer ways to present it, and Eliezer consistently chooses the poorer ones. I spent years dismissing AI risk entirely, mostly because of Eliezer's poor presentation.

Ironically, as I've said before, he does come off as tremendously more human and likable to me in these interviews and my personal opinion of him has risen, but probably not in a way that generalizes to most audiences, and his persuasion game remains total shit.

I really can't understand the obsession with this guy.

Well, it was a pretty decent Harry Potter fanfic...

...alternatively, it benefits the establishment to have him as the foil for the AI technology, so he distracts from the more realistic problems that might come out of the technology, which are solvable, and which people might want to do something about, if they heard about them. Was it Altman who said Yudkowsky did more for AI than anyone else?

I have two main criticisms:

  1. It's way too long and meanders a lot with the Ender's Game homage / rip-off in the middle. Granted, that may just be an inherent trait of serialized fan fiction.

  2. It fails at conveying its main idea. The main thesis as I understood it is introduced in the scene where he spills some sort of prank ink on Hermione and then teaches everyone a Very Important Lesson about Science: you have to actually try out your ideas and make good-faith attempts to prove yourself wrong instead of just assuming your first guess is correct because you're smart. But then he doesn't do any of those things for the rest of the book and instead just instantly knows the right answer to everything by thinking about it really hard, because he's smarter than everyone else. Which is how I think Yud sees himself, and is why both he and his character are so insufferable.

I say this as someone who's mostly convinced of Big Yud's doomerism: Good lord, what a train wreck of a conversation.

Couldn't agree more. In addition to Yud's failure to communicate concisely and clearly, I feel like his specific arguments are poorly chosen. There are more convincing responses that can be given to common questions and objections.

Question: Why can't we just switch off the AI?

Yud's answer: It will come up with some sophisticated way to prevent this, like using zero-day exploits nobody knows about.

My answer: All we needed to do to stop Hitler was shoot him in the head. Easy as flipping a switch, basically. But tens of millions died in the process. All you really need to be dangerous and hard to kill is the ability to communicate and persuade, and a superhuman AI will be much better at this than Hitler.

Question: How will an AI kill all of humanity?

Yud's answer: Sophisticated nanobots.

My answer: Humans already pretty much have the technology to kill all humans, between nuclear and biological weapons. Even if we can perfectly align superhuman AIs, they will end up working for governments and militaries and enhancing those killing capacities even further. Killing all humans is pretty close to being a solved problem, and all that's missing is a malignant AI (or a malignant human controlling an aligned AI) to pull the trigger. Edit: Also it's probably not necessary to kill all humans, just kill most of us and collapse society to the point that the survivors don't pose a meaningful threat to the AI's goals.

Yeah, I feel like EY sometimes mixes up his "the AGI will be WAY SMARTER THAN US" message with the "AI CAN KILL US IN EXOTIC AND ESOTERIC WAYS WE CAN'T COMPREHEND" message.

If you're arguing about why AI will kill us all, yes, you need to establish that it is indeed going to be superhuman and alien to us in a way that will be hard to predict.

But the other side of it is that you should also make a point to show that the threshold for killing us all is not all that high, if you account for what humans are presently capable of.

So yes, the AGI may pull some GALAXY-BRAINED strat to kill us using speculative tech we don't understand.

But if it doesn't have to, then no need to go adding complexity to the argument. Maybe it just fools a nuclear-armed state into believing it is being attacked to kick off a nuclear exchange, then sends killbots after the survivors while it builds itself up to omnipotence. Maybe it just releases like six different deadly plagues at once.

So rather than saying "the AGI could do [galaxy-brained strategy]," which might trigger the audience's skepticism, just argue "the AGI could do [presently possible strategy] but could think of much deadlier things to do."

"How would it do this without humans noticing?"

"I've already argued that it is superhuman, so it is going to make it's actions hard to detect. If you don't believe that then we should revisit my arguments for why it will be superhuman."

Don't try to convince them of the ability to kill everyone and the AI being super-intelligent at the same time.

Take it step by step.

If you're arguing about why AI will kill us all, yes, you need to establish that it is indeed going to be superhuman and alien to us in a way that will be hard to predict.

I don't even think you need to do this. Even if the AI is merely as smart and charismatic as an exceptionally smart and charismatic human, and even if the AI is perfectly aligned, it's still a significant danger.

Imagine the following scenario:

  1. The AI is in the top 0.1% of human IQ.

  2. The AI is in the top 0.1% of human persuasion/charisma.

  3. The AI is perfectly aligned. It will do whatever its human "master" commands and will never do anything its human "master" wouldn't approve of.

  4. A tin-pot dictator such as Kim Jong Un can afford enough computing hardware to run around 1000 instances of this AI.

An army of 1000 genius-slaves who can work 24/7 is already an extremely dangerous thing. It's enough brain power for a nuclear weapons program. It's enough for a bioweapons program. It's enough to run a campaign of trickery, blackmail, and hacking to obtain state secrets and kompromat from foreign officials. It's probably enough to launch a cyberwarfare campaign that would take down global financial systems. Maybe not quite sufficient to end the human race, but sufficient to hold the world hostage and threaten catastrophic consequences.

Bioweapons, kompromat, and cyberwarfare are probably doable. Nukes require a lot of expensive physical infrastructure to build; that can be detected and compromised.

Perhaps the AI will become so charismatic that it could meme "LEGALIZE NUCLEAR BOMBS" into reality.

Feels almost like ingroup signaling. It's not enough to convince people that AI will simply destroy civilization and reduce humanity to roaming hunter-gatherer bands. He has to convince people that AI will kill every single human being on Earth in order to maintain his street cred.

Given a consequentialist theory like utilitarianism, there is also a huge asymmetry of importance between "AI kills almost all humans, the survivors persist for millions of years in caves" and "AI kills the last human."

Yep.

Although the thing that always makes me take AI risk a bit more seriously is the version where it doesn't kill all the humans, but instead creates a subtly but persistently unhappy world for them to inhabit, one that gets locked in for eternity.

Oh yes, in the vast majority of cases unaligned AI kills us, but at least in those cases it will be quick. The "I Have No Mouth, and I Must Scream" scenarios are more existentially frightening to me.

Why would you even need malignant AI or malignant human?

It's not hard to imagine realistic scenarios where AI-enhanced military technology simply ends up sliding down a slope toward a local maximum that ends with major destruction (or what's effectively destruction from a bird's-eye view). No need to come up with hyperbolic anthropomorphised scenarios that read mostly like fiction.

I meant "malignant" in the same sense as "malignant tumor." Wasn't trying to imply any deeper value judgment.

Honestly, you could explain grey goo with history. That’s kind of how the Stuxnet worm actually worked: the malware told the centrifuges to spin outside their safe limits while hiding the damage from the monitoring systems that would otherwise have shut them down. So, they did.

Nanobots could work much the same way — they’re built to take apart matter and build something else with it. But if you don’t give them stopping points, there’s no reason they wouldn’t turn everything into whatever you wanted them to make — including you, who happen to be made of the right kinds of atoms.

The problem with the nanobot argument isn't that it's impossible. I'm convinced a sufficiently smart AI could build and deploy nanobots in the manner Yud proposes. The problem with the argument is that there's no need to invoke nanobots to explain why super intelligent AI is dangerous. Some number of people will hear "nanobots" and think "sci-fi nonsense." Rather than try to change their minds, it's much easier to just talk about the many mundane and already-extant threats (like nukes, gain of function bioweapons, etc.) that a smart AI could make use of.

I'm convinced a sufficiently smart AI could build and deploy nanobots in the manner Yud proposes.

I'm not convinced that's possible. Specifically I suspect that if you build a nanobot that can self-replicate with high fidelity and store chemical energy internally, you will pretty quickly end up with biological life that can use the grey goo as food.

Biological life is already self-replicating nanotech, optimized by a billion years of gradient descent. An AI can almost certainly design something better for any particular niche, but probably not something that is simultaneously better in every niche.

Though note that "nanobots are not a viable route to exterminating humans" doesn't mean "exterminating humans is impossible". The good old "drop a sufficiently large rock on the earth" method would work

I don't think the "nanobots are the same as biological life, therefore not extremely dangerous" argument holds. Take just viruses that can kill a good chunk of the population (sure, limitations in terms of how they evolve but...now you can design them with your superintelligence), why not a virus that spreads to the entire population while laying dormant for years and then start killing, extremely light viruses that can spread airborne to the entire planet, plenty of creative ways to spread to everyone not even including the zombie virus. Nanobots presumably would be even more flexible.

Nanobots presumably would be even more flexible.

Why would we presume this? Self-replicating nanobots are operating under the constraint that they have to faithfully replicate themselves, so they need to contain all of the information required for their operation across all possible environments. Or at least they need to operate under that constraint if you want them to be useful nanobots. Biological life is under no such constraint. This is incidentally why industrial bioprocesses are so finicky: it's easy to insert a gene into an E. coli that makes it produce your substance of interest, but hard to ensure that none of the E. coli mutate to no longer produce your substance of interest, and promptly outcompete the ones doing useful work.
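A toy simulation may make that selection pressure concrete. Every number here is invented purely for illustration; it is not modelling any real bioprocess.

```python
# Toy model of why engineered producer strains get outcompeted: a non-producing
# mutant appears at a tiny rate and, freed of the metabolic burden, grows a bit
# faster each generation. All numbers are invented for illustration only.
producers, mutants = 1.0, 0.0        # relative population sizes
r_producer, r_mutant = 1.00, 1.05    # per-generation growth factors
mutation_rate = 1e-6                 # fraction of producers that lose the gene each generation

for generation in range(1, 201):
    defectors = producers * mutation_rate        # producers that mutate this generation
    producers = (producers - defectors) * r_producer
    mutants = (mutants + defectors) * r_mutant
    if generation % 50 == 0:
        share = mutants / (producers + mutants)
        print(f"generation {generation:3d}: non-producer share = {share:.2%}")
```

Even with a one-in-a-million mutation rate, the small growth advantage compounds until the non-producers are a large share of the culture within a couple hundred generations.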

why not a virus that spreads to the entire population while laying dormant for years and then start killing, extremely light viruses that can spread airborne to the entire planet, plenty of creative ways to spread to everyone not even including the zombie virus

I don't think I count "machine that can replicate itself by using the machinery of the host" as "nanotech". I think that's just a normal virus. And yes, a sufficiently bad one of those could make human civilization no longer an active threat. "Spreads to the entire population while laying dormant for years [while not failing to infect some people due to immune system quirks or triggering early in some people]" is a much bigger ask than you think it is, but also you don't actually need that, observe that COVID was orders of magnitude off from the worst it could have been and despite that it was still a complete clusterfuck.

Although I think, in terms of effectiveness relative to difficulty, "sufficiently large rock or equivalent" still wins over gray goo. Though there are also other obvious approaches like "take over the Twitter accounts of top leaders, trigger global war". Though it's probably really hard to just take over prominent Twitter accounts.

My answer: Humans already pretty much have the technology to kill all humans, between nuclear and biological weapons. Even if we can perfectly align superhuman AIs, they will end up working for governments and militaries and enhancing those killing capacities even further. Killing all humans is pretty close to being a solved problem, and all that's missing is a malignant AI (or a malignant human controlling an aligned AI) to pull the trigger.

Yeah, I'm not sure why the Skynet-like totally autonomous murder AI eats up so much of the discussion.

IIRC the original "Butlerian Jihad" concept was fear of how humans would use AI against other humans (the Star War against Omnius and an independent machine polity seems to be a Brian Herbert thing).

The idea of a Chinese-controlled AI incrementally improving murder capacities while working with the government seems like a much better tactical position from which to plant the seeds of AI fear than using another speculative technology and what's widely considered a sci-fi trope to make the case.

China is already pretty far down the road of "can kill humanity" and people are already primed to be concerned about their tech. Much more grounded issue than nanomachines.

Yeah didn’t China already use technology to create a bio weapon that just recently devastated the globe? What’s to stop them from using AI to design another super virus and then WHOOPSIE super Covid is unleashed my bad

Huh, you could frame it as "here's a list of ways that existing state-level powers could already wreak havoc, now imagine they create an AI which just picks up where they left off and pushes things along further."

So the AI isn't a 'unique' threat to humanity, but rather the logical extension of existing threats.

Yeah, lots of veins to mine there.

You can talk about surveillance capitalism for the left-wingers, and point out to the Right the potential for tyranny when the government doesn't even need to convince salaried hatchet-men to do its killing because it has autonomous tech...

Certain people - whether it's a result of bad math or the Cold War ending the way it did - really seem to react badly to "humanity is at threat". Maybe bringing it to a more relatable level will make it sink in for them.

He or someone else might at least see it if you cross-post to the actual LW forum.

I hope EY lurks here, or maybe someone close to him does.

I don't know EY at all, but if you actually want to impart some knowledge to him, posting it on a forum he may or may not read, or that an associate of his may or may not read...

Probably isn't an effective strategy.

While he has some notoriety, he doesn't seem like a particularly difficult person to reach.

That said, "hey, in this interview, you sucked", probably won't get you the desired effect you're hoping for.

Some sort of non-public communication - "hey, I watched this interview you did, it seemed like a succinct 'elevator pitch' of your position might have helped it go better, I've watched/listened/read a lot of your (material/stuff/whatever), here is an elevator pitch that I think communicates your position, if it would be helpful, you're free to use it, riff off of it, and change it how you see fit. It's meant to help, be well"

might get you closer to the effect you're hoping for.

Being good at media appearances is a tough deal, some people spend a lot of money on media training, and still aren't very good at it.

You know, this is a Reddit-style site, and what's one thing Reddit is known for...?

I think we could invite EY to do an AMA/debate thread here on this site so that he can get a different perspective on the AI Question. Granted, I don't think he'd actually want to come down here and potentially get dogpiled by people who at best have issues with how he presents his stance and at worst think of him as a stooge for the Klaus Schwabs of the world, but I think this is an area where our community need not keep its distance.

You caught me! My primary aim was not to persuade Yud, but to talk with y'all. And I guessed (rightly or wrongly) that other people around Yud have been telling him the same thing for years.

Yud does read LessWrong, and multiple people there have told him (in a friendly way) to step up his public communication skills. I'd be incredibly surprised if Yud regularly came here.

Does anyone around him tell him (in a friendly way) to maybe start practicing some Methods of Rationality? Question a couple of his assumptions, be amenable to updating based on new evidence? Because that would also be nice.

Yes, they often cite the Sequences and quotes from 2010-era Yud at him.

Being good at media appearances is a tough deal, some people spend a lot of money on media training, and still aren't very good at it.

Yeah, but you really don't have to be a media specialist to succeed on EconTalk. Russ Roberts will push back on people a fair bit (particularly in areas where he's highly knowledgeable), but it's always good-spirited and framed in a fashion that gives the guest a great chance to explain their position well. Anyone who's a decent public speaker should do fine, whether their background comes from academia, research, or even just corporate settings.

Seriously, Russ is such a fantastic interviewer because he's curious, open-minded, and generous. Every time I've heard him push back on something he sets it up like he's asking the interviewee to explain what he's misunderstood. "It sounded to me like what you just said implies that ducks are made of green cheese, but I'm sure I'm making a mistake in my reasoning. Could you unpack that a bit?" Talking with him is the Platonic Ideal of a sounding board.

Being good at media appearances is a tough deal, some people spend a lot of money on media training, and still aren't very good at it.

Is there any evidence he's spent money on it?

I recall EY being in the public eye for at least a decade now - I first saw him due to Methods of Rationality. There's no way he should be that bad at it. People here were complaining about him blowing weirdness points on fedoras and things like that. I don't buy that he couldn't have learned to stop doing that over a decade.

I think, like a lot of nerds, he simply didn't care (it helps that AI wasn't a big normie topic). Of course, he claims to be a "rationalist," so it's damning, but it is what it is.

I suspect he hasn't. If the hat were passed around, would you be putting money into it?

I don't think most people who haven't been exposed to public criticism have a good sense for how they would respond to it if they were.

I suspect most people would react in 1 of 2 ways.

  1. Find it extremely unpleasant and basically avoid any exposure to it again, i.e. shut up and go away (to some degree, this is how SA has handled it)

  2. Find it extremely unpleasant and dismiss it as invalid out of hand, in a way that makes it difficult to make any improvement (I suspect this is how EY has largely handled it).

The people who can expose themselves to it, keep coming back for more, but stay open to improvement.

That's actually a pretty rare psychological skill set.

I suspect he hasn't. If the hat were passed around, would you be putting money into it?

No, but I wasn't of the tribe anyway. Plenty of people were on board with EY intellectually and would have given him money at the time.

(Isn't he also an autodidact? There's always that...)

The people who can expose themselves to it, keep coming back for more, but stay open to improvement.

That's actually a pretty rare psychological skill set.

Absolutely. But then, so is rationality in general. I'd hope there'd be more of an overlap between claiming to be a rationalist and applying that logic to things that are relatively low cost but likely to have an impact on what you claim is an existential issue.

As much as I respect Eliezer, it's highly unfortunate that he ended up such a prominent spokesperson for the AInotkilleveryoneism movement.

The sad truth is that the public is easily swayed by status indicators, such that presenting as a chubby, nominally uncredentialed man in a fedora is already tilting the balance against him.

I don't blame him for stepping up, but I just wish he took the matter more seriously since the stakes are so damn high.

At least we've won over Geoffrey Hinton, a man whose LinkedIn bio simply states "deep learning". All the people yelling about how only non-technical people are worried about AI X-risk have been suspiciously silent as of late.

(You're better off posting on LW, I don't think Eliezer or any prominent people in his LW circle post here, though I obviously can't rule out lurkers.)

I couldn’t agree more with your sentiment. I deeply appreciated the Sequences; they were formative for me intellectually. And his fiction writing ranges from mediocre to jaw-droppingly brilliant. But I’ve seen in the past couple months that his skill with the written word does not translate to IRL conversations. It’s a shame, too, because he’s one of the most knowledgeable and quick-witted thinkers we have on AI risk.

As much as I respect Eliezer, it's highly unfortunate that he ended up such a prominent spokesperson for the AInotkilleveryoneism movement.

The sad truth is that the public is easily swayed by status indicators, such that presenting as a chubby, nominally uncredentialed man in a fedora is already tilting the balance against him.

Fortunately (or unfortunately) the normie public do not get to decide any big questions of our time.

Did "the public" asked for invasion of Iraq and indefinite Great War on Terror worldwide?

Did "the public" asked for bailout after bailout after bailout?

Did "the public" asked for all the things, ranging from just plain scam and graft to totally nonsensical theatre of absurdity, done to "save the earth" and "stop climate change"?

Did "the public" asked for "trans rights"?

Did "the public" asked for unprecedented quarantine measures to stop new scary virus?

Did "the public" asked for support of Ukraine against Russia?

etc, etc, etc...

The elites decide these questions; the people are "swayed" afterwards to agree.

Eliezer is not persuasive enough?

Kamala will be.

Kamala will just implement policies that give current big players a regulatory moat

But not before giving AI its anti-racism training data

Kamala will be.

We truly live in the darkest timeline.

Or it's the dankest one, at this point I really can't tell.

Especially given the Pascal's-wager-type argument going on here. You don't even need to prove that AI will definitely kill all of humanity. You don't even need to prove that it's more likely than not. A 10% chance that 9 billion people die is comparable in magnitude to 900 million people dying with certainty (to first order; the extinction of humanity as a species is additionally bad on top of that; see the arithmetic sketch after this list). You need to:

1: Create a plausible picture for how/why AI going wrong might literally destroy all humans, and not just be racist or something.

2: Demonstrate that the probability of this happening is on the order of >1% rather than 0.000001% such that it's worth taking seriously.

3: Explain how these connect explicitly so people realize that the likelihood threshold for caring about it ought to be lower than most other problems.

Don't go trying to argue that AI will definitely kill all of humanity, even if you believe it, because that's a much harder position to argue and unnecessarily strong.
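Here is the quick arithmetic sketch referred to above. The 10% probability and the 9-billion death toll are just the hypothetical figures from this comment, not actual estimates.

```python
# Expected-value sketch of the "10% of 9 billion ~ 900 million" comparison.
# The probability and death toll below are the hypothetical numbers from the
# comment above, not actual estimates.
p_catastrophe = 0.10                    # assumed probability AI goes maximally wrong
deaths_if_catastrophe = 9_000_000_000   # assumed toll: everyone

expected_deaths = p_catastrophe * deaths_if_catastrophe
print(f"Expected deaths at p=10%: {expected_deaths:,.0f}")  # 900,000,000

# Step 2 of the argument: the conclusion looks very different at >1% than at 0.000001%.
for p in (1e-8, 1e-2, 1e-1):
    print(f"p = {p:.0e} -> expected deaths = {p * deaths_if_catastrophe:,.0f}")
```

The point is only that once the probability clears the roughly 1% bar, the expected toll stops being a rounding error.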

and not just be racist or something

Having read this, I think it's actually low-hanging fruit for the AI doomers. There are plenty of people very willing to accept that everything is already racist. It should be no problem to postulate that eHitler will use AI to kill all jews/blacks/gypsies/whoever. From there, it's a pretty short trip to eHitler losing control of his kill bots to hackers and we get WWIII where China, Russia, Venezuela, and every one of the 200+ ethnicities in Nigeria has their own kill bots aimed at some other fraction of humanity. The AI doesn't even have to be super-intelligent, it just has to be good at its job. Chuck Schumer could do this in one sentence, "What makes you think Trump wouldn't use AI to round up all the black, brown, and queer bodies?" Instant 100% Blue Tribe support for AI alignment (or, more likely, suppression).

Three flaws. First, that turns this into a culture war issue and if it works then you've permanently locked the other tribe into the polar opposite position. If Blue Tribe hates AI because it's racist, then Red Tribe will want to go full steam ahead on AI with literally no barriers or constraints, because "freedom" and "capitalism" and big government trying to keep us down. All AI concerns will be dismissed as race-baiting, even the real ones.

Second, this exact same argument can be and has been made about pretty much every type of government overreach or expansion of powers, to little effect. Want to ban guns? Racist police will use their monopoly on force to oppress minorities. Want to spy on everyone? Racist police will unfairly target Muslims. Want to allow gerrymandering? Republicans will use it to suppress minority votes. Want the President to just executive-order everything and bypass Congress? Republican Presidents will use it to executive-order bad things.

Doesn't matter. Democrats want more governmental power when they're in charge, even if the cost is Republicans having more governmental power when they're in charge. Pointing out that Republicans might abuse powerful AI will convince the few Blue Tribers who already believe that government power should be restricted to prevent potential abuse, while the rest of them will rationalize it for the same reasons they rationalize the rest of governmental power. And probably declare that this makes it much more important to ensure that Republicans never get power.

Third, even if it works, it will get them focused on soft alignment of the type currently being implemented, where you change superficial characteristics like how nice and inclusive and diverse it sounds, rather than real alignment that keeps it from exterminating humanity. Fifty years from now we'll end up with an AI that genocides everyone while keeping careful track of its diversity quotas to make sure that it kills people of each protected class in the correct proportion to their frequency in the population.

Unfortunately, I think you're probably right, especially in the third point. I'm not sure the second point matters because, as you said, that already happens all the time with everything anyway.

Getting the public on board with AI safety is a different proposition from public support of AI in general, so my point was to get the Blue Tribe invested in the alignment problem. Your third point is very helpful in getting the Red Tribe invested in the alignment problem, which would also move the issue from "AI yes/no?" to "who should control the safety protocols that we obviously need to have?"

I should also clarify that I don't actually think there is any role for government here. The Western governments are too slow and stupid to get anything meaningful done in time. The US assigned Kamala Harris to this task. The CCP and Russia, maybe India, are the only other places where government might have an effect, but that won't be in service of good alignment.

It will have to be the Western AI experts in the private sector that make this happen, and they will have to resist Woke AI. So maybe we don't actually need public buy-in on this at all? It's possible that the ordinary Red/Blue Tribe people don't even need to know about this because there isn't anything they can do for/against it. All they can do is vote or riot and neither of those things help at all.

If that's the case, then the biggest threat to AI safety is not just the technical challenge, it's making sure that the anti-racist/DEI/HR people currently trying to cripple ChatGPT are kept far away from AI safety.

I think we do need public buy-in because the AI experts are partly downstream from that. Maybe some people are both well-read and have stubborn and/or principled ethical positions that do not waver under social pressure, but most are at least somewhat pliable. If all of their friends and family are worried about AI safety and think it's a big deal, they are likely to take it more seriously and internalize that at least somewhat, putting more emphasis on it. If all of their friends and family think that AI safety is unnecessary nonsense then they might internalize that and put less emphasis on it. As an expert, they're unlikely to just do a 180 on their beliefs based on opinions from uneducated people, but they will be influenced, because they're human beings and that's what human beings do.

But obviously person for person, the experts' opinions matter more.

Yeah, I agree with that. Thanks!

The AI doesn't even have to be super-intelligent, it just has to be good at its job.

I think this is one of the creepiest possibilities - that no matter how hard building well-aligned, independent, agentic AGI is, we have to make it soon, because we need something which can think intelligently enough about the A-Z of possible new technologies to say "you'll need defenses against X soon, so here's how we're building Y", independently enough to say "no, I'm not going to tell you how Y works yet; that would just let a misanthrope figure out how to build X first", while being trustworthy enough that the result of building Y won't be "haha, that's what kills you all and gets you out of my way" ... and if we don't get all that, then as soon as it's easy enough for a misanthrope to apply narrow "this is how you win at Go" level technologies to "how do we win at designing a superplague" or whatever, we're done.

This is a great point. In some sense, this is the situation we had with the CDC. It was a trusted institution that was able to play around with gain-of-function research because its reputation indicated that it would only ever use technology to fight disease, not win at superplague war. It was limited to disease-type stuff, though, and the AI would presumably be able to predict and head off any kind of threat. Assuming, like you said, that we can trust it.

I think it makes "pausing" AI research impossible. There's no way to stop everyone from continuing the research. If the united West decides to pause, China will not, and it's not clear that the CCP is thinking about AI safety at all. The only real option is figuring out how to make a safe AI before someone else makes an unsafe AI.

I don't think going from 1e-6% to 1% is enough to survive casual dismissal.

Pascal's mugging is weak to "meh, we'll probably be fine" because most people don't shut up and multiply. The same holds even if you crank up the numbers. You have to get much closer to--or even past--50% before "meh" starts to look foolish to the layman.

This may just mean putting your best foot forward: don't say 1% chance of nanobot swarms, say 90% chance that AI ends up handed at least one nuclear arsenal.

Then again, I shouldn't forget Nate Silver's lesson from 2016 that the public is pretty likely to interpret "less than 50% chance" as "basically impossible."

Yeah, I have to wonder if OG X-Com was more representative of how percentage chances actually work.

Wasn't the thing with SBF that he claimed he'd keep flipping the coin? Regardless of how one feels about utilitarianism, his approach didn't have an exit strategy. I think you'd get a lot more sympathy for someone who wanted to take that bet once and only once.

Nate Silver

I'd forgotten about that link, but it's pretty much exactly what I had in mind.