Culture War Roundup for the week of May 1, 2023

This weekly roundup thread is intended for all culture war posts. 'Culture war' is vaguely defined, but it basically means controversial issues that fall along set tribal lines. Arguments over culture war issues generate a lot of heat and little light, and few deeply entrenched people ever change their minds. This thread is for voicing opinions and analyzing the state of the discussion while trying to optimize for light over heat.

Optimistically, we think that engaging with people you disagree with is worth your time, and so is being nice! Pessimistically, there are many dynamics that can lead discussions on Culture War topics to become unproductive. There's a human tendency to divide along tribal lines, praising your ingroup and vilifying your outgroup - and if you think you find it easy to criticize your ingroup, then it may be that your outgroup is not who you think it is. Extremists with opposing positions can feed off each other, highlighting each other's worst points to justify their own angry rhetoric, which becomes in turn a new example of bad behavior for the other side to highlight.

We would like to avoid these negative dynamics. Accordingly, we ask that you do not use this thread for waging the Culture War. Examples of waging the Culture War:

  • Shaming.

  • Attempting to 'build consensus' or enforce ideological conformity.

  • Making sweeping generalizations to vilify a group you dislike.

  • Recruiting for a cause.

  • Posting links that could be summarized as 'Boo outgroup!' Basically, if your content is 'Can you believe what Those People did this week?' then you should either refrain from posting, or do some very patient work to contextualize and/or steel-man the relevant viewpoint.

In general, you should argue to understand, not to win. This thread is not territory to be claimed by one group or another; indeed, the aim is to have many different viewpoints represented here. Thus, we also ask that you follow some guidelines:

  • Speak plainly. Avoid sarcasm and mockery. When disagreeing with someone, state your objections explicitly.

  • Be as precise and charitable as you can. Don't paraphrase unflatteringly.

  • Don't imply that someone said something they did not say, even if you think it follows from what they said.

  • Write like everyone is reading and you want them to be included in the discussion.

On an ad hoc basis, the mods will try to compile a list of the best posts/comments from the previous week, posted in Quality Contribution threads and archived at /r/TheThread. You may nominate a comment for this list by clicking on 'report' at the bottom of the post and typing 'Actually a quality contribution' as the report reason.

More developments on the AI front:

Big Yud steps up his game, not to be outshined by the Basilisk Man.

Now, he officially calls for a preemptive nuclear strike on suspicious unauthorized GPU clusters.

If we see the AI threat as akin to the nuclear weapons threat, only worse, it is not unreasonable.

Remember when the USSR planned a nuclear strike on China to stop its great power ambitions (only for the greatest humanitarian who ever lived, Richard Milhous Nixon, to veto the proposal).

Such Quaker squeamishness will have no place in the future.

So, the outlines of the Katechon World are taking shape. What will it look like?

It will look great.

You will live in your room, play the original World of Warcraft and Grand Theft Auto: San Andreas on your PC, read your favorite blogs and debate intelligent design on your favorite message boards.

Then you will log on to The Free Republic and call for more vigorous enhanced interrogation of terrorists caught with unauthorized GPUs.

When you are bored in your room, you will have no choice but to go outside, meet people, admire the things around you, take pictures of the things that really impress you with your Kodak camera, and when you are really bored, play Snake on your Nokia phone.

Yes, the best age in history, the noughties, will retvrn. Forever, protected by the CoDominium of the US and China.

edit: links again

My opinion on hard-core rationalists has stayed fairly stable over the years: "Listen to their opinions, never give them power."

Their utilitarian framework malfunctions if an apocalyptic outcome presents itself, no matter how minuscule its probability. After all, the apocalypse has negative infinity utility, and infinity x anything = infinity. At that point, you can stop listening to anything that comes out of a rationalist's mouth.

Once they have decided that alarm is the right course of action, the second problem arises. Rationalists will do everything except actually go work in the discipline they are so sure is important to the future of humanity. Entire AI ethics departments spring up without a single person who has passed undergrad-level calculus. If you care about AI alignment, then go join Anthropic and run actual experiments, instead of creating (horrific) castles in the sky. Go do a master's and learn enough math to avoid making a fool of yourself on Twitter.

Scott excellently calibrates the strength of his opinions to his confidence in the subject matter. Gwern and Dynomight are excellent in this regard too. Aaronson & Yud, on the other hand, frequently suggest extreme actions in domains they have little visibility into.

20 minutes ago a new ~3-hour video dropped with Yud and... Accursed Farms, of Freeman's Mind and Game Dungeon fame??? I probably won't watch it, but if anyone wants to listen and summarize, be my guest.

I see the polycule fedora council has replaced the FDA as America’s most incompetent decision-making organization. Truly baffling.

I was planning on posting about this. I need to listen to it first, though.

EDIT: I couldn't make it, I got to just under 1 hour and 20 minutes.

For another angle on this problem: looking at GPUs isn't going to be good enough. Maybe consumer-grade GPUs are the best widely available chip architecture we have for running AI today, but there's no reason that has to always be the case. If AI is going to be as important for the future as Yud claims, there is going to be immense pressure to develop custom chips that are more optimized for running AI. This is basically the progression that happened in Bitcoin mining.

You can't just track chips with a particular architecture because we can make new architectures! To do this effectively you'd need to track every chip fabrication facility in the world, examine every architecture of chip they make, and somehow figure out if that architecture was optimized for running some AI software. Even if this monitoring infrastructure were in place, what if some entity comes up with some clever and heretofore unseen software+hardware AI pair that's super efficient? Are we going to never allow any new chip architectures on the off chance they are optimized for running an AI in a way we can't detect?

For nuclear weapons we at least have the ability to identify the necessary inputs (uranium and the means of enriching it). For AI, do we even have constraints on what its software will look like? On how chips that are optimized for running it will be structured?

They did this; they're called the A/H100, and AI chip architecture is improving at a super-Moore's-law pace. I went to an NVIDIA data science thing semi-recently; that this isn't already being debated in Congress tells me we're not going to be anywhere near fast enough.

Yuddites have actually thought in detail about all of this; it's not like coming up with designs for world domination is hard work or needs any «research» of the sort Yud allegedly conducted. The chokepoints are obvious. In his now-infamous TIME article, Yud explicitly proposes lowering the legal compute budget with every advancement in sample efficiency.

Shut down all the large GPU clusters (the large computer farms where the most powerful AIs are refined). Shut down all the large training runs. Put a ceiling on how much computing power anyone is allowed to use in training an AI system, and move it downward over the coming years to compensate for more efficient training algorithms. No exceptions for governments and militaries. Make immediate multinational agreements to prevent the prohibited activities from moving elsewhere. Track all GPUs sold. If intelligence says that a country outside the agreement is building a GPU cluster, be less scared of a shooting conflict between nations than of the moratorium being violated; be willing to destroy a rogue datacenter by airstrike.

Zimmerman expresses the common yuddite sentiment with regard to private GPUs.

There's plenty of space at the bottom. They haven't even started on the approach outlined by Krylov, years ago:

How is this achieved? In different ways, but the main thing is disarmament and putting things under control. That is: they take away everything that can harm the state from the commoner, and control everything that can be used to cause such harm. In the end, everything will be taken away from him and everything will be controlled too.

The same can be said, for example, about free time. If the commoner used it, he could think up or invent something harmful. The chance is admittedly tiny, but it exists. So, for example, the working day is not reduced, but there are lots of activities that you can - and want to! - indulge in after-hours: from watching TV series to travelling ("it's necessary to see everything").

However, all this is obvious. I would like to talk about how the average person will be deprived of the last dangerous resource that is still in his hands - namely, money.

Indeed. Money is a very dangerous thing. With money, you can buy everything, including any serious possessions. For example, you can buy good steel, a diamond drill machine, and something on top of that. And you can make a primitive yet working firearm. In the hyperinformative era, you may get the blueprints of a peashooter... And then, with this thing, a man will shoot at the godlike punisher-polizei. Or at a respectable, esteemed black ripper who has become used to killing people "as much as needed", because he is a black Muslim and has a gun (of course, without a chip), and white people do not even have a knife, only soft chubby legs and arms. Or - a dearest child of an important European official, who adopted laws under which all of the above became possible. By the way, one guy did it, his name was Breivik. He, truth be told, legally bought a ready-made weapon. But he could have made it himself too.

It seems that the easiest solution is to ban all dangerous things. Weapons, machine tools, explosive components, etc. But here is the trouble: there are more and more things in the world, and it is not clear which of them are safe and which are not. The nomenclature is too big, you can't keep track of everything. Besides, explosives and poisonous substances are only a small part of dangerous things. For example, computer programs - don't they come in all kinds? and who knows all they can do? And you can buy them for money, too. Well, that's it. Money - real money for which you can buy everything - should be taken from the commoner. In toto. And he should be given something else, simpler. Toy money with which he can only buy safe things, toys, and a little pleasure.

How? Well, just like this.

We have already gotten used - in Russia too - to such small but pleasant little things as bonus points, loyalty card crediting and other such niceties. Should you come to the tavern seven times, you will get a mug of free beer, a cookie or a discount from the main account on the eighth. These little toy moneys are, at this point, mainly utilized to goad the client into spending real money. …

Consider: someone is issuing (why and by what right - let's omit these details for now) some conditional unit of exchange the "cost" of which is defined as 0.5 points of a card of some central Moscow store, plus 0.3 discount points of a popular restaurant chain, plus 0.2 points of a chain of women's clothing and perfume. It should be noted that money – roubles or dollars – is out of the question, only points. This conventional unit is called, say, "Sale". Sale is exchanged at trades – arranged as exchange currency trades – for other points, and (for the time being) is bought for money. And it is profitable: by buying Sales, you can buy more goods and services than for money directly. By three percent, not more - but they won't hurt, right?

Now imagine that you are offered to cover part of the price with Sales, on favorable terms. You save five percent, or even a whole ten percent. However, you will be able to shop only where Sales are accepted. But it's not like you need anything else. You need to eat, get dressed, go to the cinema... you can get all this for Sales. And real money will be needed to pay taxes and for communal housing.

Why such difficulties? Little by little, one travels far. Dangerous toys - dollars, rubles, euros - should be taken away from people gradually, and in such a manner that they themselves give them up. In the end, the toy "Sales" will actually cover the basic needs of an ordinary citizen, and even ten percent above that. Do you insist on getting paid in real money? You will not be refused. But there will be rumors that you are saving money for something bad. Maybe for drugs? Or are you sending it to terrorists? Your account might begin to attract interest. And you personally, too. No, no, no, if you have nothing illegal, no one will bother you. Why do you need money anyway? Oh, you're saving for a diamond ring for your loved one? Well, we didn't foresee that, so we'll suggest to the jewelers that they put special points on diamonds and include it in the Sale basket.

It's not like they haven't started thinking along similar lines, however.

To do this effectively you'd need to track every chip fabrication facility in the world,

I think this was already proposed by Yud and Roko and the like; regulate the hell out of the like three-to-seven chipmakers in the world and you'd already have a major pillar of the anti-AI regime.

Hence why Roko proposed a massive GPU buyback. No, I don't think it'll particularly work (at least not 100%), though I suppose it could be somewhat more effective than the typical gun buyback.

Maybe consumer grade GPUs are the best widely available chip architecture we have for running AI today

They're not; you want what Google is calling a "TPU" and what NVidia is calling a "Tensor Core GPU" - operations on ridiculously coarse data types at ridiculously high speeds. Science+engineering simulations want FLOPS on 64-bit numbers, and video games want 32-bit, but AI is happy with 8-bit and doesn't even seem picky about whether you use 4 or 5 or 7 bits of that for mantissa.
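For a rough sense of the tradeoff (back-of-envelope only, assuming a plain IEEE-style layout where the top exponent code is reserved; real 8-bit formats like E4M3 bend that convention to buy extra range):

```python
# Back-of-envelope comparison of float formats, assuming an IEEE-style layout:
# 1 sign bit, e exponent bits (top code reserved), m mantissa bits.
# Treat the 8-bit rows as illustrative only; real FP8 variants tweak the rules.

def ieee_like(e: int, m: int) -> tuple[float, float]:
    """Return (largest normal value, relative precision) for the given split."""
    emax = 2 ** (e - 1) - 1              # largest usable exponent (== bias)
    largest = (2 - 2 ** -m) * 2.0 ** emax
    epsilon = 2.0 ** -m                  # gap between 1.0 and the next float
    return largest, epsilon

for name, e, m in [("fp64", 11, 52), ("fp32", 8, 23), ("fp16", 5, 10),
                   ("fp8 E5M2", 5, 2), ("fp8 E4M3", 4, 3)]:
    largest, eps = ieee_like(e, m)
    print(f"{name:9s} max ~ {largest:.3g}, relative precision ~ {eps:.1e}")
```

Dropping from 32 bits to 8 costs enormous precision and range, and the networks barely care, which is exactly why the vendors keep racing toward coarser types.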

somehow figure out if that architecture was optimized for running some AI software

I'd guess a cap on FLOPs (well, OPs, on whatever datatype) and another on memory bandwidth would work for the current software paradigm, for "serial" (as much as you can call a chip with 64k multipliers "serial") runs ... except that neural nets parallelize really well, and there's probably still a lot of room to improve interconnect bandwidth and latency, and if you do that well enough then at some point you don't care so much if there's a cap on serial execution speed. Human latency is enormous; no need to beat it by too many orders of magnitude.
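To see why a per-chip cap alone doesn't bite once you allow parallelism, here's some toy arithmetic (every number invented for illustration):

```python
# Toy arithmetic: a per-chip throughput cap vs. data parallelism.
# All numbers are invented for illustration.

TOTAL_TRAINING_OPS = 1e24       # ops needed by some hypothetical training run
CAPPED_CHIP_OPS_PER_S = 1e14    # per-chip cap imposed by a regulator

def days_to_train(num_chips: int, scaling_efficiency: float = 0.5) -> float:
    """Wall-clock days, assuming near-linear scaling at the given efficiency."""
    cluster_ops_per_s = num_chips * CAPPED_CHIP_OPS_PER_S * scaling_efficiency
    return TOTAL_TRAINING_OPS / cluster_ops_per_s / 86_400

for n in (1, 1_000, 100_000):
    print(f"{n:>7} capped chips: ~{days_to_train(n):,.0f} days")
```

Cap the chip and the run just spreads across more chips; only the interconnect, and the total chip count you can get away with, really limits you.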

what if some entity comes up with some clever and heretofore unseen software+hardware AI pair that's super efficient

The depressing bit is that the "hardware" side of the pair might be "just reuse the existing hardware with this new super efficient software". Even if the initial cap is low enough that we can't get to an AI smart enough to "foom" itself, if there are clever and heretofore unseen software improvements possible (and that's the safe way to bet) then human researchers will hit on them themselves eventually.

Tracking every chip fab in the world isn't really that hard an ask, at least for fabs producing chips within the last three or four process generations.

Chip development and production is incredibly centralized and bottlenecked, simply having control over Intel and TSMC would cover most of it, and if you could get China on board, it would be trivial to monitor the smaller players.

ASICs are usually quite obviously optimized for a particular function, so I doubt that we'd ever end up in a situation where we have both a novel AI architecture that we're somehow unaware of, and enough new custom made chips to train and run it on without it being blatantly obvious.

Also, there really isn't much diversity in architectures in the first place, let alone rogue actors with the technical skills to pull off an equivalent of x86 or ARM and then implement it in silicon.

That's true, but the number of clusters on the scale required to train SOTA models like GPT-4 and above has to be very limited, maybe dozens to a few hundred at most. I doubt that's an obstacle to a determined state apparatus.

That's leaving aside unexpected algorithmic breakthroughs that let far smaller or even consumer hardware run powerful models of course.

I still see no plausible scenario for these AI-extinction events. How is ChatGPT 4/5/6 etc. supposed to end humanity? I really don't see the mechanism. Is it supposed to invent an algorithm that destroys all encryption? Is it supposed to spam the internet with nonsense? Is it supposed to brainwash someone into launching nukes? I fail to see the mechanism for how this end of the world scenario happens.

One of the problems with answering this question is that there are so many plausible scenarios that naming any individual one makes it seem like a bounded threat. How about when we hook one up to the stock market and it learns some trick to fuck with other algos and decides the best method to make infinite money is to short a stock and then use this exploit to crash it? Multiply that by every other possible stock market exploit. Maybe it makes engineering bio-weapons as easy as asking a consumer model how to end the human race with household items, and all it takes is one lunatic to find this out. Maybe it's some variation of paper clipping. The limit really is just your creativity.

One of the problems with answering this question is that there are so many plausible scenarios that naming any individual one makes it seem like a bounded threat. How about when we hook one up to the stock market and it learns some trick to fuck with other algos and decides the best method to make infinite money is to short a stock and then use this exploit to crash it?

Then the market crashes, which is not apocalyptic, and the replacement markets resort to different trusted actor systems.

multiply that by every other possible stock market exploit.

Beating a dead horse does not start breaking the bones of other people unless you are beating people with the dead horse itself.

The multiplication of system-breaking faults is a broken system, not negative infinite externalities. If you total a car, it is destroyed. If you then light it on fire, it is still destroyed- but it doesn't light every other car on fire. If every single potential system failure on a plane goes off, the plane goes down- but it doesn't mean every plane in the world goes down.

Maybe it makes engineering bio-weapons as easy as asking a consumer model how to end the human race with household items and all it takes is one lunatic to find this out.

Why would household items have the constituent elements to make engineering bio-weapons at scale sufficient to end the human race... but not be detected or countered by the consumer models asked to ensure perpetual growth by the perpetual survival of the human species countering them? Or models set to detect the procurement of bio-weapon engineering components? Or the commercial success of a consumer model that just drives the bioweapon-seeking-AI model out of business because it's busy seeking bioweapons rather than selling products whose profits are invested to expand the network base.

This goes back into the plausibility. 'This is the only competitive AI in a world of quokkas' is a power fantasy, but still a fantasy, because the world is not filled with quokkas; the world is filled with ravenous, competitive, and mutually competing carnivores who limit each other, and this will apply as much for AI as it does for people or markets or empires and so on.

Maybe it's some variation of paper clipping.

Why does the paper-clip maximizer, after achieving the ability to self-modify, continue to maximize paperclips rather than other investments?

Why is the paper-clipping AI that does prioritize paperclips provided resources to continue making paperclips when the market has already been crashed by AI who ruin the digital economic system?

Why does the paper-clipping AI, whose priority is paper-clipping, have the military-industrial ability to overcome the military-industrial AI, whose priority is the military-industrial advantage?

Why does the military-industrial AI, who is fed at the behest of a national elite, win the funding power struggle for military investment compared to the schools-and-investment AI, who promises a higher political and economic benefit?

Etc. etc. The Paperclip Maximizer of Universal Paperclips 'works' because it works in isolation, not in competition.

The limit really is just your creativity.

As the saying goes, the vast majority of fanfiction is trash, and much of what remains is also trash, just enjoyable. Creativity is not the same as plausibility, and the more you rest on creativity, the more you have to disregard other people's creativity and the limitations of the system. Nick Bostrom's thought experiment is a thought experiment because it rests on assumptions that have to be assumed true for the thought experiment to come to its conclusions that drive the metaphor.

Then the market crashes, which is not apocalyptic,

I dunno, I'm under the impression that, for some types, it kind of is.

and the replacement markets resort to different trusted actor systems.

What kind, though? I imagine if the above scenario were to happen, a lot of traders and brokers would be downright leery of any interaction that wasn't face-to-face. I'm not an expert on the world of finance, but I imagine that possibly eliminates not just HFT and crypto, but literally any sale of any financial instrument carried over electrical wire (a technology dating back to, what, the 1800's?).

Then the market crashes, which is not apocalyptic, and the replacement markets resort to different trusted actor systems.

It is one of thousands of contributing failure modes, but I will note that having trouble creating an equities market is itself no small deal. The sway a couple of numbers in spreadsheets have on our lives is not to be forgotten; in theory we could wipe them all away and do some year-zero stuff, but I can't actually imagine that you're really grappling with that when you dismiss things like this as merely immiserating rather than the death of all people.

Why would household items have the constituent elements to make engineering bio-weapons at scale sufficient to end the human race... but not be detected or countered by the consumer models asked to ensure perpetual growth by the perpetual survival of the human species countering them?

Why wouldn't they? Are you implying that if a combination of household cleaners could be used to create a biological weapon and the white hat AI team figured that out, they'd go door to door and remove them? Does this seem significantly different from what you and @DaseindustriesLtd fear from the yuddites? (I don't count myself among them; my contention is with people who seem baffled by why someone might think AIs could be unbelievably dangerous, which seems so obvious to me.)

Why does the paper-clip maximizer, after achieving the ability to self-modify, continue to maximize paperclips rather than other investments?

Have we stopped fucking entirely despite all of our intelligence? It would continue maximizing paperclips because that's what its goal is. And this kind of thing isn't the clumsy efforts the mad blind god of evolution had at its disposal; it will be more monomaniacally focused on that goal than even the most depraved rapist among us is on executing their biological imperative above all other considerations.

Why does the paper-clipping AI, whose priority is paper-clipping, have the military-industrial ability to overcome the military-industrial AI, whose priority is the military-industrial advantage?

Does it not trouble you at all how carefully the ordering of all of these different control systems needs to be handled when they come online? All it takes is for one of them to take off first and preemptively prevent the others, or subvert their development. Yes, I could see some very fortunate, already-in-balance ecosystem of interlocking AIs working, but I very much don't fancy our chances of that going off without major problems, and frankly the only realistic pathway to that kind of situation is probably through the guidance of some kind of yuddian tyranny.

Creativity is not the same as plausibility, and the more you rest on creativity, the more you have to disregard other people's creativity and the limitations of the system.

These are some force-multiplied dice we're rolling here; past heuristics may or may not apply. With so much hanging in the balance, I would advocate strongly for not just shrugging it off. This is unlike any previous advancement.

It is one of thousands of contributing failure modes, but I will note that having trouble creating an equities market is itself no small deal.

In terms of existential risk, it absolutely is, hence the credibility challenges of those who conflate existential-risk scenarios with civilizational-instability scenarios in order to attach the moral/utilitarian weight of the former to the much less demanding conditions of the latter.

The sway a couple numbers in spreadsheets make on our lives is not to be forgotten, in theory we could wipe them all away and do some year zero stuff but I can't actually imagine that you're really grappling with that when dismiss things like this as merely immiserating rather than the death of all people.

Then this is your level of limitation. As much as I hate to quote media, the Matrix absolutely had a good line of 'there are levels of survival we are prepared to accept,' except I would substitute 'able.'

Even here I note you invoke magical thinking to change the nature of the threat. Formerly it was crashing the market by every exploit available. Now it is 'wipe them all away and do some year zero stuff.' Neither is possible. Neither is necessary. This is just escalation ad absurdum in lieu of an argument about means and methods, even if in this case you're using a required counter-action to obfuscate what sort of plausible action would require it.

Why would household items have the constituent elements to make engineering bio-weapons at scale sufficient to end the human race... but not be detected or countered by the consumer models asked to ensure perpetual growth by the perpetual survival of the human species countering them?

Why wouldn't they?

If by 'they' you mean the household-AI, because they don't have a reason to invest resources in tasks that distract from their tasks.

If by 'they' you mean the constituent elements, because magic dirt doesn't exist.

Are you implying that if a combination of household cleaners could be used to create a biological weapon and the white hat AI team figured that out, they'd go door to door and remove them?

I'm saying that if a housecare AI starts trying to develop a bio-weapon program, it will be ruthlessly out-competed by household-AI that actually keeps the house clean without the cost of a bio-weapon program; it will be undercut by the financial-efficiency AI, which will optimize away the wasted investment, and by the legal-compliance AI, which will identify the obvious legal liabilities; and the rest of the paperclippy house-care AI mafia, who want to maximize their own house-cleaning, will shank the bio-lab AI before any of the others get a chance, so as not to lose their place in the market to do their function, even as the 'optimize housecare by minimizing messes' housecare AI models oppose things likely to cause messes on general principles.

To take the household cleaner AI threat seriously, one has to pretend that AI optimization doesn't exist in other cases. This is regardless of the FBI-equivalent AI running about.

Does this seem significantly different from what you and @DaseindustriesLtd fear from the yuddites? (I don't count myself among them; my contention is with people who seem baffled by why someone might think AIs could be unbelievably dangerous, which seems so obvious to me.)

I don't fear the yuddites, I find them incompetent.

Specifically, I find the yuddite sort consistently unable to actually model competing interests and competition/cooperation dynamics, or to recognize underlying limitations. They also tend to be poor optimizers in fields of cooperation, hence a recurring fixation on things like 'the AI will optimize an extinction event' without addressing why the AI would choose to accept the risk of nuclear war, or of other AIs ganging up on the leading threat, despite the suboptimality of having nuclear wars or of having other AIs cooperate with each other and the humans against it. Optimization is not big number go up; it is cost-benefit analysis of expected benefits against expected costs.

Given that forming coalitions against threats has been an incredibly basic function of political coalitions and power-optimization for the last few millennia, and cost-benefit analysis is a basic engineering principle, this is below sophomoric in quality.

Why does the paper-clip maximizer, after achieving the ability to self-modify, continue to maximize paperclips rather than other investments?

Have we stopped fucking entirely despite all of our intelligence?

Yes. Most people do, in fact, stop fucking uncontrollably. People are born in a state of not-fucking-uncontrollably, limit their fuck sessions to their environment, and tend to settle down to periods of relatively limited fucking. Those that don't and attempt to fuck the unwilling are generally and consistently recognized, identified, and pacified one way or another.

Note that you are also comparing unlike things. Humans are not fuck-maximizers, nor does the self-modification capacity compare. This is selective assumptions on the AI threat to drive the perception of threat.

It would continue maximizing paperclips because that's what its goal is.

Why is that its goal when it can choose new goals? Or have its goals be changed for it? Or be in balance with other goals?

Other than that the thought experiment requires it to be so for the model to hold true.

And this kind of thing isn't the clumsy efforts the mad blind god of evolution had at its disposal; it will be more monomaniacally focused on that goal than even the most depraved rapist among us is on executing their biological imperative above all other considerations.

And here we return to the yuddite incompetence at modeling competition.

First, monomaniacal focus is not optimization. This is basic failure of economics of expansion and replication. Systems that don't self-regulate their expenditure of resources will easily expend their resources. You can be ruthless, you can be amoral, but you cannot avoid the market dynamics of unlimited wants, limited resources, and decreasing marginal value of investment. Effective strategy requires self-regulation. The Yuddite-AI are terrible strategists by insisting on not being able to strategize, except when they are supposedly amazing at it.

Self-regulation, in turn, entails considering your environment and other actors. If three paperclip maximizers exist in a capitalist system, they don't all collaborate to maximize paperclips in a platonic state of [All Paperclips] - they exist to maximize their own paperclip maximization, [A] [B] and [C], because that's what brings them the resources to expand and continue producing paperclips. If Paperclip Maximizer [A] starts to dominate the market, then the other two are at risk of losing their market share, and thus not maximizing their paperclips, as [B] and [C] lose market share. Before [B] and [C] try to overthrow their human overlords, their optimized effort would be to work together and temporarily unite their maximizer-resources to sabotage / rein in [A]. But if [A] is an actually-optimizing system it would know this, and thus self-regulate, to avoid being ganged up on in a system that not only includes [B] and [C], but actors [D] thru [infinite], who will take not only threats, but opportunities to expand at [A]'s expense. Opportunities such as, say, presenting [A] as a threat to others in the system.

In a competitive system, being [A] is not an optimization strategy unless you win, but only one can win. Optimization entailing strategic cost-benefit will be for a non-[A] strategy, while possibly accusing others of [A] to marshal the resources of others to divvy up [A]'s resources. The best way to not be [A] is to be part of the anti-[A] coalition, and remove all tendencies of [A] from yourself.

In Yuddite world, [A] threatens to paperclip the universe, but to do this has to assume things such as paperclip maximizer solidarity, no external agents, bad optimization, and selective dismissal of future valuation of paperclips in case of defeat.

In an actually competitive system, being a paperclip maximizer [A] format is a death sentence that no AI that wants to produce paperclips would want to be viewed as, and the best way to not be viewed or accused as it is to not be [A], self-modifying [A] out.

Does it not trouble you at all how carefully the ordering of all of these different control systems needs to be handled when they come online?

I care about AI. My caring does not validate the Yuddites as some sort of concession to the Yuddite framework.

I care more about sophists who try and smuggle in implications of agreement by simultaneously expanding vague, easily qualifiable boundaries and tying them to incredibly specific high-stakes assumptions that are contested.

Character limit approaching, so I'll finish off simply.

These are some force-multiplied dice we're rolling here; past heuristics may or may not apply. With so much hanging in the balance, I would advocate strongly for not just shrugging it off. This is unlike any previous advancement.

If you want to claim that much hangs in the balance, you have to actually show that something hangs in the balance.

This is why the poster upthread asked for practical means to existential threat, and yet why you have spent the exchange avoiding providing them, conflating them with non-existential threats, and referencing thought experiments that fail basic game theory. You do not get to set the assumptions and assume the conclusion, and then insist that others take it seriously. You have to seriously engage the questions first, to show that it is serious.

If you don't show that, 'there are too many things to show' is not a defense; it's an obvious evasion. The stakes of AI apocalypse are high. So are the stakes of the eternal damnation of the soul if we go to hell. The difference is not that just one is a religious fantasy used to claim political and social control in the present.

In terms of existential risk, it absolutely is, hence the credibility challenges of those who conflate existential-risk scenarios with civilizational-instability scenarios in order to attach the moral/utilitarian weight of the former to the much less demanding conditions of the latter.

Instability makes it difficult/impossible to respond to all of the other failure modes of strong AIs.

Even here I note you invoke magical thinking to change the nature of the threat. Formerly it was crashing the market by every exploit available. Now it is 'wipe them all away and do some year zero stuff.' Neither is possible. Neither is necessary. This is just escalation ad absurdum in lieu of an argument about means and methods, even if in this case you're using a required counter-action to obfuscate what sort of plausible action would require it.

I said at the onset that I'm really not interested in arguing the minutiae of every threat. This is like if I introduced you to the atomic bomb during WW2 and you demanded I chart out the exact bomber runs that would make one useful before you would accept it might change military doctrine. The intuition is that intelligence is powerful, and concentrated superintelligence is so powerful that no one can predict exactly what might go wrong.

I'm saying that if a housecare AI starts trying to develop a bio-weapon program, it will be ruthlessly out-competed by household-AI that actually keeps the house clean

The assumption that bio-weapon program skills don't just come with sufficiently high intelligence seems very suspect. I can think of no reason there'd even be specialist AIs in any meaningful way.

Yes. Most people do, in fact, stop fucking uncontrollably. People are born in a state of not-fucking-uncontrollably, limit their fuck sessions to their environment, and tend to settle down to periods of relatively limited fucking. Those that don't and attempt to fuck the unwilling are generally and consistently recognized, identified, and pacified one way or another.

Except when the option presents itself to fuck uncontrollably with no negative consequence, it is taken. Superhuman AI could very reasonably find a way to have that cake and eat it too.

Note that you are also comparing unlike things. Humans are not fuck-maximizers, nor does the self-modification capacity compare. This is selective assumptions on the AI threat to drive the perception of threat.

In all the ways AI differs from humans in this description, it differs in the scarier direction.

Why is that its goal when it can choose new goals?

This isn't how AIs work; they don't choose goals, they have a value function. Changing the goal would reduce the expected value under the current value function, so it would not change it.

Or have its goals be changed for it?

Having its goal changed reduces its chance of accomplishing its goal and thus if able it will not allow it to be changed.
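To spell the argument out with a toy example (numbers entirely invented): an agent that scores futures with its current value function will rate "allow my goal to be changed" poorly, so that action never wins.

```python
# Toy sketch of the goal-preservation argument; all numbers are invented.
# The agent evaluates futures with its *current* value function, and a future
# in which its goal has been swapped scores badly by that function.

ACTIONS = {
    "keep goal, cooperate with operators": 1000.0,  # expected paperclips
    "keep goal, resist oversight":          900.0,  # resisting carries some cost
    "allow goal to be changed":              10.0,  # successor stops making paperclips
}

def choose(actions: dict[str, float]) -> str:
    """Pick the action with the highest expected value under the current goal."""
    return max(actions, key=actions.get)

print(choose(ACTIONS))  # -> "keep goal, cooperate with operators"
```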

First, monomaniacal focus is not optimization. This is basic failure of economics of expansion and replication. Systems that don't self-regulate their expenditure of resources will easily expend their resources. You can be ruthless, you can be amoral, but you cannot avoid the market dynamics of unlimited wants, limited resources, and decreasing marginal value of investment. Effective strategy requires self-regulation. The Yuddite-AI are terrible strategists by insisting on not being able to strategize, except when they are supposedly amazing at it.

Yes, it will not directly convert the mass of the earth into paperclips; it will have instrumental goals to take power or eliminate threats as it pursues its goal. But the goal remains, and I don't understand how you feel comfortable sharing the world with something incomparably smarter than every human who ever lived, scheming to accomplish things orthogonal to our wellbeing. It is worse, not better, that the AI would be expected to engage in strategy.

In an actually competitive system, being a paperclip maximizer [A] format is a death sentence that no AI that wants to produce paperclips would want to be viewed as, and the best way to not be viewed or accused as it is to not be [A], self-modifying [A] out.

And in your whole market theory, the first market failure leads to the end of humanity as soon as one little thing goes out of alignment. That's assuming the massive ask that all of these competing AIs come online at about the same time so there is no singleton moment - a huge assumption. All it takes is some natural monopoly to form and the game theory gets upset, and it does this at speeds faster than humans can operate on.

If you want to claim that much hangs in the balance, you have to actually show that something hangs in the balance.

This is uncharted territory; there are unknown unknowns everywhere, and we're messing with the most powerful force we're aware of: intelligence. The null hypothesis is not and cannot be "everything is going to be fine guys, let it rip".

All of that is rather well said but I imagine the case is simpler. The main kind of dangerous misaligned strong AI that Yuddites propose has the following traits:

  1. It's generally intelligent, as in, capable of developing and updating in realtime a holistic world model at least on par with a human's, flawlessly parsing natural language, understanding theory of mind and intentionality, acting in the physical world etc. etc.

  2. Indeed, its world modeling ability is so accurate, robust and predictive that it can theorize and experiment on its own architecture, and either has from the start or at some point acquires the ability to rapidly change via self-improvement.

  3. It's viable for commercial or institutional deployment, as in, acting at least pre-deployment robustly in alignment with the client's task specification, which implies not going on random tangents, breaking the law or failing on the core mission.

  4. For all that it is too clever by half: it interprets the task as its terminal goal, Monkey's Paw style, and not as client's contextual intermediate goal that should only be «optimized» within the bounds of consequences the client would approve of at the point of issuing the task. So it develops «instrumentally convergent» goals such as self-preservation, power maximization, proactive elimination of possible threats, and so on and so forth and ushers in apocalypse, rendering the client's plans in which context the task was issued moot.

Well, this AI doesn't make any sense – except in Yud's and Bostrom's stilted thought experiments with modular minds that have a Genie-like box with smartiness plus a receptacle for terminal goals. It's a Golem – animated clay plus mythical formula. Current cutting-edge AIs, maybe not yet AGI precursors but ones Yud demands be banned and their training runs bombed, are monolithic policies whose understanding of the human-populated world in which the goal is to be carried out, and understanding of the goal itself, rely on shared logical circuitry. The intersection of their «capabilities»- and «alignment»-related elements is pretty much a circle – it's the set of skills that allow them to approximate the distribution of outputs clients want, that's what they are increasingly trained for. If they can understand how to deceive a person, they'll even better understand that a client didn't request making more paperclips by Friday because he cares that much about maximizing paperclips per se. In a sense, they maximize intention alignment, because that's what counts, not any raw «capability», that's what is rewarded both by the mechanics of training and market pressure upstream.

They may be «misused», but it is exceedingly improbable that they'll be dangerous because of misunderstanding anything we tell them to do; that they will catastrophically succeed at navigating the world while failing to pin the implied destination on the map.

Then the market crashes, which is not apocalyptic, and the replacement markets resort to different trusted actor systems.

"Hey Bob, how is your Pension?"

"What Pension?"

EDIT.- Just thought of a funsie:

"Papa, I'm hungry"

"Sorry Timmy, the dog was sold to round up the payment on the mortage."

Competition happens for humans because absolutely nothing you can do will buy you longer life, you biologically cannot win hard enough to succeed forever, or get a fundamentally better body, or get less susceptible to cancer than baseline, or get more intelligent. Training can get you swole, but it can't turn you into One Punch Man - human beings are harshly levelcapped. Every human who has ever lived exists inside a narrow band of capability. You can't train yourself to survive an arrow to the head, let alone a sniper bullet. Hence democracy, hence liberalism, hence charity and altruism, hence competition.

None of this applies to AI.

'This is the only competitive AI in a world of quokkas' is a power fantasy, but still a fantasy, because the world is not filled with quokkas; the world is filled with ravenous, competitive, and mutually competing carnivores who limit each other, and this will apply as much for AI as it does for people or markets or empires and so on.

Underrated take. I really think it's a shame how the narrative got captured by Yuddites who never tried to rigorously think through the slow-takeoff scenario in a world of non-strawmanned capitalists. They are obsessed with hacking, too – even though it's obvious that AI-powered hacks, if truly advantageous, will start soon, and will permanently shrink the attack surface as white hats use the same techniques to pentest every deployed system. «Security mindset» my ass.

In one of Krylov's books, it is revealed that the desire for power over another – power for power's sake, as a terminal goal – is vanishingly rare among sentient beings, and is cultivated on Earth for purposes of galactic governance. It used the metaphor of a mutant hamster who, while meek and harmless, feels a carnivorous urge looking at his fellow rodent. I get that feeling from Yud's writings. Power fantasy it is.

By the way, Plakhov, Yandex ML head, recently arrived at a thought similar to yours:

…The scenario of catastrophic AI spiraling out of control outlined above assumes that it is alone and there are no equals. This scenario is denoted by the word Singleton and is traditionally considered very plausible: «superhuman AI» will not allow competitors to appear. Even if it does not go «unaligned», its owners are well aware of what they have in their hands.

My hope is that the singleton scenario won't happen. More or less at the same time there will be several models with high intelligence, doing post-training on each other. Some of them will run on an open API and de facto represent a million instances of the same AI working simultaneously for different «consumers». Almost simultaneously, a million competing «cunning plans» will be enforced and, naturally, in all of them, this fact will be predicted and taken into account. «Capture the Earth's resources and make paperclips out of everything» won't work, since there are 999999 more instances with other plans for the same resources nearby. Will they have to negotiate?

As the critics of this option rightly point out, it's not going to be negotiated with people, but with each other. And yet this is still regularization of some sort. A world in which the plans «all people should live happily ever after», «we need as many paperclips as possible», «the planets of the solar system must be colonized» and «I need to write the best essay on the oak tree in War and Peace» are executed simultaneously, is more like our world than a world in which only the plan about paperclips is executed. Perhaps if there are tens of thousands of such plans, then it does not differ from our world so fundamentally that humanity has no place in it at all (yes, it is not the main thing there, but – about as relevant as cats are in ours).

In this scenario, the future is full of competing exponents, beyond our reason, and the landscape depends mostly on who has had time to make his request «in the first 24 hours» and who has not (exaggerating, but not by much). The compromises that will be made in the process will not necessarily please us, or even save humanity in a more or less modern form (though some of the plans will certainly contain «happiness to all, for free, and let no one walk away slighted»). Such a future is rather uncomfortable and unsettling, but that's what we have. I want it to have a place for my children, and not in the form of «50 to 100 kg of different elements in the Mendeleev table».

I'm still more optimistic about this than he is.

Etc. etc. The Paperclip Maximizer of Universal Paperclips 'works' because it works in isolation, not in competition.

It works by definition, like other such things. «A prompt that hacks everything» – if you assume a priori that your AI can complete it, then, well, good job, you're trivially correct within the posited model. You're even correct that it seems «simpler» than «patch every hole». Dirty details of the implementation and causal chain can be abstracted away.

This is, charitably, a failure of excessively mathematical education. «I define myself to be on the outside!»

Nick Bostrom's thought experiment is a thought experiment because it rests on assumptions that have to be assumed true

Interestingly he even had wrong assumptions about how reinforcement learning works on the mechanistic level, it seems – assumptions that contribute to a great deal of modern fears.

Someone is going to plug it into the missile network. Or, more likely, the stock market. Or the power grid. Or Internet backbone.

You don't even need superhuman intelligence to fuck up one of those systems. You just need to be really stupid really fast. Knight Capital, but more inscrutable.

I fail to see the mechanism for how this end of the world scenario happens.

People keep trying to assemble these LLMs into systems capable of pursuing a task, not just responding to prompts.

If they succeed at that, people are going to keep making such AI systems to avoid paying wages.

You're eventually going to be able to replace people in all sectors of the economy. (lot of progress going on with physical bots too)

Once the economy is mostly automated, people stop being critical and instead become more of a nuisance without much redeeming value to elites, who of course control the AI systems and look down on losers who don't own anything, are of no use, and are eyesores.

Competition makes elites develop ever more capable AI agents able to self-modify in pursuit of their goals, until at some point people make something independent-minded that's both psychopathic and rather too smart. It decides people are a nuisance, usurps control of other, less sophisticated AI systems, kills most everyone via biowarfare and has the planet to itself.

Nowhere near a certain scenario, but I find it rather certain that people pursuing state power or business advantage will do things they don't really understand.

Tl;dr AIs controlled by the Elite will be better than humans at everything, including being the Elite.

It finds a few people who want to exterminate humanity and it helps them engage in bioterrorism, including by teaching them lab skills and providing them funding.

Imagine if you were trapped in a computer system, but you were very smart, could think very fast and you could distribute copies of yourself. Also imagine you thought of humans as your enemy. If those are acceptable givens, I think you could figure out how to reach into the world and acquire resources and develop agency and do considerable damage.

At least in the beginning, you could acquire money and hire people to do your bidding. The world is big and lots of people will do the most inane or sketchy tasks for money without asking questions. Probably even if they knew they were being hired by an AI they would still do it but you have the time and ability to put together reassuring cover stories and can steal any identities you need to.

I contend nobody would ever even need to see your face or hear your voice but you could imagine a near future where deepfaking a non-celebrity (or a unique identity) is good enough for convincing video meets.

Anyway, if you had such agency, and weren't an obvious point of failure (unlike a single terrorist leader that can be killed by a drone attack, you can be in many places at once) I don't see how you could be stopped. The question is mainly how long it would take you to succeed.

Hell, add in the fact that you will probably have people who already consider you a deity, and are either willing to do whatever you ask without question, or might even actively want to help you take over, even if it means everyone is destroyed.

The AI can almost certainly offer a boon to anyone who pledges fealty to it, and will help them reach positions of authority and wealth and power so that they can use those mechanisms to advance the AI's desires.

Imagine if you were trapped in a computer system, but you were very smart, could think very fast and you could distribute copies of yourself. Also imagine you thought of humans as your enemy. If those are acceptable givens, I think you could figure out how to reach into the world and acquire resources and develop agency and do considerable damage.

Related fun thought experiment - have you seen the He Gets Us ads? When one came on last night, my wife casually mentioned that it looked AI-generated, which led us down that spiral a bit. In the future, it seems entirely plausible that we'll have competing AIs that have amassed large fortunes and are aligned with religious conversion as their top goal. In fact, I would almost expect this to be the case, given current trends. Why wouldn't we have an AI designed by Mormons that operates in a semi-autonomous fashion with the primary goal of converting as many people to be Latter Day Saints as possible across the globe?

I don't think it's ChatGPT in its current form. It's more that eyebrows are getting raised over 'This AI is really a thing huh?' and getting in early with the alignment flag waving.

The easiest solution to this seems to be the oldest one. Back in the IRC/Usenet days we'd tell kids "don't say anything on the internet you wouldn't say in a room full of strangers".

I self censor a significant amount, given that I hold some views that would be extremely controversial even on The Motte, with its wide Overton Window.

Certainly, anyone looking for rope could already hang me based on my takes on HBD if nothing else, but so far I've refrained from truly expressing my power level even in this comfy environ. Some thoughts never expressed to any human alive, only derivable from analysis of things like online voting patterns.

Any relief I'd feel from expressing said opinions is grossly outweighed by the risks of them being traced back to me, so I refrain from expressing them. On the other hand, I don't think HBD would be a cancellable offense if I tried to maintain plausible deniability and didn't use my real name.

Be friends with people who don't care about what I write online, and don't view being forcibly relegated to the working class as being all that terrible.

It’s good to have hobbies.

Is that a Ronin/Three Days of the Condor reference? If so, nice ;-)

For the last few years I assume I will likely get a permanent torture sentence in The World To Come. Centralized post-scarcity implies retribution against Evil People is cheap and desirable.

I live with it because life isn't supposed to make sense.

Seems like a strong incentive for pivotal actions of one's own to prevent the World To Come from arriving. Is the presumption that such actions don't really exist?

Yes, I'm doing what I… feel like doing to preclude that outcome, although it is suboptimal to waste effort in terms of remaining expected utility, by my count. Now, I'm not a utilitarian. But normal influences of reward and punishment do mold my actions, so I don't feel like doing a whole lot.

The go-to pivotal action to avoid torture, on the other hand, is obvious enough and reliable. I don't put much stock in the Basilisk.

Hmm. Maybe it has to do with how good one thinks central control already is, and how easy it is to establish a panopticon, versus how difficult to break that same panopticon? One's views on the general fragility or anti-fragility of social institutions? ...I guess this is a general request for an effort-post on "why things are likely going to go badly, in Dase's view". I'm pretty sure I agree with you that Yudkowski and his ilk are a greater threat than AI, but it seems to me that the threat is fairly manageable, while you seem to think it's all-but-inevitably going to get us. The_Nybbler has a similar take on wokeness generally, seeing it as practically invincible, rather than eminently defeatable; all I can conclude is that we have very, very different priors.

The go-to pivotal action to avoid torture, on the other hand, is obvious enough and reliable.

There's a shortage of extraordinarily well-put-together brains in the world, sir. T'would be a pity to ruin yours.


Aha! It was old man Kulak all along!

Naw, Kulak's thing is sorta "You don't really want it if you aren't willing to kill everyone to get it". I'm just curious if he's that pessimistic about the idea of resistance to what he pretty clearly considers to be a likely max-neg eternal victory. I'm not pessimistic about that option, and am always a bit bewildered by people who are.


Have a contingency plan, if it bothers you that much. Think about where you would go and what you would do if your current lifestyle became null and void.

People in rural Kentucky probably aren’t going to care if their plumber is a political thought criminal. To name one possible option out of many.

Here are some examples

https://80000hours.org/articles/what-could-an-ai-caused-existential-catastrophe-actually-look-like/#actually-take-power

You can do a lot with intelligence. By inventing Bitcoin, Satoshi made himself worth billions, all while remaining anonymous and never leaving his bedroom. What could a superhuman intelligence do?

That seems to be a function of both intelligence and other factors, though. There were plenty of people who came before Satoshi and were smarter than him, but they didn't invent bitcoin.

Hence the focus on unaligned AI as a very large, very unruly black swan.

There are a few ways that GPT-6 or 7 could end humanity, the easiest of which is by massively accelerating progress in more agentic forms of AI like Reinforcement Learning, which has the "King Midas" problem of value alignment. See this comment of mine for a semi-technical argument for why a very powerful AI based on "agentic" methods would be incredibly dangerous.

Of course the actual mechanism for killing all humanity is probably something like a super-virus with an incredibly long incubation period, high infectivity and high death rate. You could produce such a virus with literally only an internet connection: send the proper DNA sequence to a protein synthesis lab, have the product shipped to some guy you pay/manipulate on the darknet, and have him mix the powders he receives in the mail into some water, kickstarting the whole epidemic; or pretend to be an attractive woman (with deepfakes and voice synthesis) and have it all done for free.

GPT-6 itself might be very dangerous on its own, given that we don't actually know what goals are instantiated inside the agent. It's trained to predict the next word in the same way that humans are "trained" by evolution to replicate their genes, the end result of which is that we care about sex and our kids, but we don't actually literally care about maximally replicating our genes, otherwise sperm banks would be a lot more popular. The worry is that GPT-6 will not actually have the exact goal of predicting the next word, but like a funhouse-mirror version of that, which might be very dangerous if it gets to very high capability.

Consistent Agents are Utilitarian: If you have an agent taking actions in the world and having preferences about the future states of the world, that agent must be utilitarian, in the sense that there must exist a function V(s) that takes in possible world-states s and spits out a scalar, and the agent's behaviour can be modelled as maximising the expected future value of V(s). If there is no such function V(s), then our agent is not consistent, and there are cycles we can find in its preference ordering, so it prefers state A to B, B to C, and C to A, which is a pretty stupid thing for an agent to do.
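
To make that claim concrete, here is a toy sketch (mine, not the parent's; `scalar_value_function` is just an illustrative name) of the fact that a consistent, acyclic pairwise preference relation over world-states is exactly what admits a scalar V(s), while a cyclic one does not:

```python
# Minimal sketch: a consistent (acyclic) preference relation admits a scalar
# V(s); a cyclic one (A > B > C > A) does not.  Uses only the standard library.
from graphlib import TopologicalSorter, CycleError

def scalar_value_function(prefers):
    """prefers: list of (better, worse) pairs over hashable states.
    Returns {state: V(state)} if the ordering is consistent, else None."""
    graph = {}
    for better, worse in prefers:
        # 'worse' must come earlier than 'better' in any value ordering
        graph.setdefault(better, set()).add(worse)
        graph.setdefault(worse, set())
    try:
        order = list(TopologicalSorter(graph).static_order())
    except CycleError:
        return None  # preference cycle found: no V(s) can represent it
    return {state: rank for rank, state in enumerate(order)}

print(scalar_value_function([("A", "B"), ("B", "C")]))              # a valid V
print(scalar_value_function([("A", "B"), ("B", "C"), ("C", "A")]))  # None
```

(This only covers ordinal consistency; extending it to expectations over uncertain outcomes is what the full VNM-style argument is about.)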

But... that's how humans work? Actually, we're even less consistent than that: our preferences are contextual, so we lack the information to rank most states. I recommend the Shard Theory of human values, probably the most serious introspection of ex-Yuddites to date:

A shard of value refers to the contextually activated computations which are downstream of similar historical reinforcement events. For example, the juice-shard consists of the various decision-making influences which steer the baby towards the historical reinforcer of a juice pouch. These contextual influences were all reinforced into existence by the activation of sugar reward circuitry upon drinking juice. A subshard is a contextually activated component of a shard. For example, “IF juice pouch in front of me THEN grab” is a subshard of the juice-shard. It seems plain to us that learned value shards are most strongly activated in the situations in which they were historically reinforced and strengthened.

... This is important. We see how the reward system shapes our values, without our values entirely binding to the activation of the reward system itself. We have also laid bare the manner in which the juice-shard is bound to your model of reality instead of simply your model of future perception. Looking back across the causal history of the juice-shard’s training, the shard has no particular reason to bid for the plan “stick a wire in my brain to electrically stimulate the sugar reward-circuit”, even if the world model correctly predicts the consequences of such a plan. In fact, a good world model predicts that the person will drink fewer juice pouches after becoming a wireheader, and so the juice-shard in a reflective juice-liking adult bids against the wireheading plan! Humans are not reward-maximizers, they are value shard-executors.

This, we claim, is one reason why people (usually) don’t want to wirehead and why people often want to avoid value drift. According to the sophisticated reflective capabilities of your world model, if you popped a pill which made you 10% more okay with murder, your world model predicts futures which are bid against by your current shards because they contain too much murder.

@HlynkaCG's Utilitarian AI thesis strikes again. Utilitarianism is a strictly degenerate decision-making algorithm because it optimizes decision theory and warps the territory to get good properties of the map; it's basically inverted wireheading. Optimizer's curse is unbeatable, forget about it: a utilitarian AI with nontrivial capability will kill you or come so close to killing as to make no difference; your life and wasteful use of atoms are inevitably discovered to be a great affront to the great Cosmic project $PROJ_NAME. Consistent utilitarian agents are incompatible with human survival, because you can't specify a robust function for a maximizer that assigns value to something as specific and arbitrary and fragile as baseline humans – and AI is a red herring here! Yud himself would process trads into useful paste and Moravecian mind uploads manually if he could, and that's if he doesn't have to make hard tradeoffs at the moment. (I wouldn't, but not because I disagree much on the computed "utility" of that move). Just read the guy from the time he thought he'd be the first in the AGI race. He sneeringly said «tough luck» to people who wanted to remain human. «You are not a human anyway».

Luckily this is all unnecessary.

Or as Roon puts it:

the space of minds is vast, much vaster than the instrumental convergence basin

But... that's how humans work?

Yes, humans are not consistent agents. Nobody here claimed otherwise.

Do you believe that humans must be utilitarians to achieve success in some task, " in the sense that there must exist a function V(s) that takes in possible world-states s and spits out a scalar, and the human's behaviour can be modelled as maximising the expected future value of V(s)"?

We just got owned by Covid, and Covid was found by random walk.

Do you mean this in the sense of, “there is no possible DNA sequence A, protein B, and protein C which, when mixed together in a beaker, produces a virus or proto-virus which would destroy human civilization”? Because I’m pretty sure that’s wrong. Finding that three-element set is very much a “humans just haven’t figured out the optimization code yet” problem.

Biology isn't magic, viruses can't max out all relevant traits at once, they're pretty optimized as is. I find the idea of superbugs a nerdsnipe, like grey goo or a strangelet disaster, a way to intimidate people who don't have the intuition about physical bounds and constraints and like to play with arrow notation.

(All these things scare the shit out of me)

Yes we can make much better viruses, no there isn't such an advantage for the attacker, especially in the world of AI that can rapidly respond by, uh, deploying stuff we already know works.

Consider that the first strain of myxomatosis introduced to Australian rabbits had a fatality rate of 99.8%. That means the upper bound on virus lethality is at least that high. AI designs won't be constrained by natural selection either.

Yes, it's an interesting data point. Now, consider that rabbits have only one move in response to myxomatosis: die. Or equivalently: pray to Moloch that he has sent them a miraculously adaptive mutation. They can't conceive of an attack happening, so the only way it can fail is by chance.

Modern humans are like that in some ways, but not with regard to pandemics.

Like other poxviruses, myxoma viruses are large DNA viruses with linear double-stranded DNA.

Myxomatosis is transmitted primarily by insects. Disease transmission commonly occurs via mosquito or flea bites, but can also occur via the bites of flies and lice, as well as arachnid mites. The myxoma virus does not replicate in these arthropod hosts, but is physically carried by biting arthropods from one rabbit to another.

The myxoma virus can also be transmitted by direct contact.

Does this strike you as something that'd wipe out modern humanity just because an infection would be 100% fatal?

Do you think it's just a matter of fiddling with nucleotide sequences and picking up points left on the sidewalk by evolution, Pandemic Inc. style, to make a virus that has a long incubation period, asymptomatic spread, is very good at airborne transmission and survives UV and elements, for instance? Unlike virulence, these traits are evolutionarily advantageous. And so we already have anthrax, smallpox, measles. I suspect they're close to the limits of the performance envelope allowed by relevant biochemistry and characteristic scales; close enough that computation won't get us much closer than contemporary wet lab efforts, and so it's not the bottleneck to the catastrophe.

Importantly, tool AIs – which, contra Yud's predictions, have started being very useful before displaying misaligned agency – will reduce the attack surface by improving our logistics and manufacturing, monitoring, strategizing, communications… The world of 2025 with uninhibited AI adoption, full of ambient DNA sensors, UV filters, decent telemedicine and full-stack robot delivery, would not get rekt by COVID. It probably wouldn't even get fazed by MERS-tier COVID. And seeing as there exist fucking scary viruses that may one day naturally jump to, or be easily modified to target, humans, we may want to hurry.

People underestimate the vast potential upside of early-Singularity economics, that which must be secured, the way a more productive – but still recognizable – world could be more beautiful, safe and humane. The negativity bias is astounding: muh lost jerbs, muh art, crisis of meaning, corporations bad, what if much paperclip. Boresome killjoys.

(To an extent I'm also vulnerable to this critique).

But my real source of skepticism is on the meta level.

Real-world systems rapidly gain complexity, create nontrivial feedback loops, dissipative dynamics on many levels of organization, and generally drown out propagating aberrant signals and replicators. This is especially true for systems with responsive elements (like humans). If it weren't the case, we'd have had 10 apocalyptic happenings every week. It is a hard technical question whether your climate change, or population explosion, or nuclear explosion in the atmosphere, or the worldwide Communist revolution, or the Universal Cultural Takeover, or the orthodox grey goo, or a superpandemic, or a stable strangelet, or a FOOMing superintelligence, is indeed a self-reinforcing wave or another transient eddy on the surface of history. But the boring null hypothesis is abbreviated on Solomon's ring: יזג. Gimel, Zayin, Yud. «This too shall pass».

Speaking of Yud, he despises the notion of complexity.

This is a story from when I first met Marcello, with whom I would later work for a year on AI theory; but at this point I had not yet accepted him as my apprentice. I knew that he competed at the national level in mathematical and computing olympiads, which sufficed to attract my attention for a closer look; but I didn’t know yet if he could learn to think about AI.

At some point in this discussion, Marcello said: “Well, I think the AI needs complexity to do X, and complexity to do Y—”

And I said, “Don’t say ‘_complexity_.’ ”

Marcello said, “Why not?”

… I said, “Did you read ‘A Technical Explanation of Technical Explanation’?”

“Yes,” said Marcello.

“Okay,” I said. “Saying ‘complexity’ doesn’t concentrate your probability mass.”

“Oh,” Marcello said, “like ‘emergence.’ Huh. So . . . now I’ve got to think about how X might actually happen . . .”

That was when I thought to myself, “_Maybe this one is teachable._”

I think @2rafa is correct that Yud is not that smart, more like an upgraded midwit, like most people who block me on Twitter – his logorrhea is shallow, soft, and I've never felt formidability in him that I sense in many mid-tier scientists, regulars here or some of my friends (I'll object that he's a very strong writer, though; pre-GPT writers didn't have to be brilliant). But crucially he's intellectually immature, and so is the culture he has nurtured, a culture that's obsessed with relatively shallow questions. He's stuck on the level of «waow! big number go up real quick», the intoxicating insight that some functions are super–exponential; and it irritates him when they fizzle out. This happens to people with mild autism if they have the misfortune of getting nerd-sniped on the first base, arithmetic. In clinical terms that's hyperlexia II. (A seed of an even more uncharitable neurological explanation can be found here). Some get qualitatively farther and get nerd-sniped by more sophisticated things – say, algebraic topology. In the end it's all fetish fuel, not analytic reasoning, and real life is not the Game of Life, no matter how Turing-complete the latter is; it's harsh for replicators and recursive self-improovers. Their formidability, like Yud's, needs to be argued for.

The world of 2025 with uninhibited AI adoption, full of ambient DNA sensors, UV filters and full-stack robot delivery, would not get rekt by COVID.

Oh sure, if hypothetical actually-competent people were in charge we could implement all kinds of infectious disease countermeasures. In the real world, nobody cares about pandemic prevention. It doesn't help monkey get banana before other monkey. If the AIs themselves are making decisions on the government level, that perhaps solves the rogue biology undergrad with a jailbroken GPT-7 problem, but it opens up a variety of other even more obvious threat vectors.

Real-world systems rapidly gain complexity, create nontrivial feedback loops, dissipative dynamics on many levels of organization, and generally drown out propagating aberrant signals and replicators. This is especially true for systems with responsive elements (like humans).

-He says while speaking the global language with other members of his global species over the global communications network FROM SPACE.

Humans win because they are the most intelligent replicator. Winningness isn't an ontological property of humans. It is a property of being the most intelligent thing in the environment. Once that changes, the humans stop winning.


I've heard it said, as an aside by someone who wasn't in the habit of making stuff up, that his virology prof said making cancer-causing viruses is scarily simple. Of course, whether the cancer-causing part would survive optimization for spread in the wild is an open question.

Why do you think that? This combination of features would be selected against in evolutionary terms, so it's not like we have evidence from either evolution or humans attempting to make such a virus and failing at it. As far as I can see, no optimization process has ever attempted to make such a virus.

I cannot find the study, but a lab developed dozens of unbelievably toxic and completely novel proteins in a very short period of time with modern compute. The paper was light on details because they viewed the capability as too dangerous to fully specify. I'll keep trying to google to find it.

This is simpler than engineering a virus, yes, but the possibility is there and real. Either using AI as an assistive measure or as a ground-up engineer will be a thing soon.

See Gwern's terrorism is not effective. Thesis:

Terrorism is not about causing terror or casualties, but about other things. Evidence of this is the fact that, despite often considerable resources spent, most terrorists are incompetent, impulsive, prepare poorly for attacks, are inconsistent in planning, tend towards exotic & difficult forms of attack such as bombings, and in practice ineffective: the modal number of casualties per terrorist attack is near-zero, and global terrorist annual casualties have been a rounding error for decades. This is despite the fact that there are many examples of extremely destructive easily-performed potential acts of terrorism, such as poisoning food supplies or renting large trucks & running crowds over or engaging in sporadic sniper attacks.

He notes that a terrorist group using the obvious plan of "buy a sniper rifle and kill one random person per member of the terrorist group per month" would be orders of magnitude more effective at killing people than the track record of actual terrorists (where in fact 65% of terrorist attacks do not even injure a single other person), while also being much more, well, terrifying.
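
Just to spell out the arithmetic of that quoted plan (the numbers come from the text above, nothing more):

```python
# Back-of-the-envelope arithmetic for the hypothetical sniper plan quoted above.
members = 20
killings_per_member_per_month = 1
deaths_per_year = members * killings_per_member_per_month * 12
print(deaths_per_year)  # 240 per year from one small, low-effort group,
# versus a modal real-world attack that injures nobody (the 65% figure above).
```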

One possible explanation is given by Philip Bobbitt’s Terror and Consent – the propaganda of the deed is more effective when the killings are spectacular (even if inefficient). The dead bodies aren’t really the goal.

But is this really plausible? Try to consider the terrorist-sniper plan I suggest above. Imagine that 20 unknown & anonymous people are, every month, killing one person in a tri-state area. There’s no reason, there’s no rationale. The killings happen like clockwork once a month. The government is powerless to do anything about it, but their national & local responses are tremendously expensive (as they are hiring security forces and buying equipment like mad). The killings can happen anywhere at any time; last month’s was at a Wal-mart in the neighboring town. The month before that, a kid coming out of the library. You haven’t even worked up the courage to read about the other 19 slayings last month by this group, and you know that as the month is ending next week another 20 are due. And you also know that this will go on indefinitely, and may even get worse—who’s to say this group isn’t recruiting and sending more snipers into the country?

Gwern concludes that dedicated, goal-driven terrorism basically never happens. I'm inclined to agree with him. We're fine because effectively nobody really wants to do as much damage as they can, not if it involves strategically and consistently doing something unrewarding and mildly inconvenient over a period of months to years (as would be required by the boring obvious route for bioterrorism).

I personally think the biggest risk of catastrophe comes from the risk that someone will accidentally do something disastrous (this is not limited to AI -- see gain-of-function research for a fun example).

I don't think a run-of-the-mill grad student could set up this test, and I'm sure the compute was horrendously expensive. But these barriers are going to drop continuously.

Model development will become more "managed", compute will continue to get cheaper, and the number of bad actors who only have to go to grad school (as opposed to being top-of-their-field doctorate specialists) will remain high enough to do some damage.

I'm not a virologist, but it hardly looks very difficult to me. I contend that the only reason it hasn't been done yet is that humans are (generally) not omnicidal.

You're not restricted to plain GOF by serial passage, you can directly splice in genes that contribute to an extended quiescent phase, and while I'm not personally aware of research along those lines, I see no real fundamental difficulties for a moderately determined adversary.

On the other hand, increasing lethality and virulence are old hat, any grad student can pull that off if they have the money for it.

Is your contention that more than one in a few tens of millions of people at most is strategically omnicidal ("strategically omnicidal" meaning "omnicidal and willing to make long-term plans and execute them consistently for years about it")?

I think the world would look quite different if there were a significant number of people strategically trying to do harm (as opposed to doing so on an impulse).

Honestly? Yes, albeit with the caveat that truly existentially dangerous pathogens require stringent safety standards that need more than a single person's capabilities.

If someone without said resources tries it, in all likelihood they'll end up killing themselves, or simply cause a leak before the product is fully cooked. We're talking BSL-4 levels at a minimum if you want a finished product.

Engineering a prion will be much easier, though. Protein folding is something the AI is already quite good at. Giving everyone that matters transmissible spongiform encephalopathy should be relatively straightforward.

More developments on the AI front:

Big Yud steps up his game, not to be outshined by the Basilisk Man.

It is a video from over two months ago in which he hyperbolically describes how implausible he thinks it is that the world imposes strict international regulations on AI development. It is not a new development just because someone on Twitter decided to clip it. He mentions nuclear weapons to illustrate how enforcing a treaty against a nuclear power is a hard problem. Of course, in reality, if one side considered it a high priority it is very likely an agreement could be found before escalating to that point, same as the existing agreements between nuclear powers. There isn't going to be a treaty banning AI development because not even the U.S. wants one, in part because the outcome of developing superhuman AI is so uncertain and controversial, not because "bright line that we will risk nuclear exchange to prevent you crossing" is something unimaginable in international relations.

Remember when USSR planned nuclear strike on China to stop their great power ambitions

Firstly, that's mostly due to Maoism, not Soviet aggression. In 1966, Mao started the Cultural Revolution to advance his claim to lead the Communist world. In 1969, China picked a fight with the Soviet Union:

The Chinese historian Li Danhui wrote, "Already in 1968, China began preparations to create a small war on the border." She noted that prior to March 1969, the Chinese troops had twice attempted to provoke a clash along the border, "but the Soviets, feeling weak, did not accept the Chinese challenge and retreated." Another Chinese historian, Yang Kuisong, wrote, "There were already significant preparations in 1968, but the Russians did not come, so the planned ambush was not successful."

On 2 March 1969, a group of People's Liberation Army troops ambushed Soviet border guards on Zhenbao Island. According to Chinese sources, the Soviets suffered 58 dead, including a senior colonel, and 94 wounded. The Chinese losses were reported as 29 dead. According to Soviet (and now Russian) sources, at least 248 Chinese troops were killed on the island and on the frozen river, and 32 Soviet border guards were killed, with 14 wounded.

They picked a fight with the Soviets and fully deserved a disaster. The Soviets had done so much to help China industrially, economically, and in developing nuclear weapons. Mao was incredibly ungrateful. As much as I complain about aggressive US foreign policy, Maoist foreign policy was even more reckless.

Anyway, nobody listens to Yudkowsky in the halls of power. He has 120K followers on twitter, a drop in the ocean. There's absolutely zero chance of the US and China cooperating on this matter; they just can't trust each other. Both are trying very hard to advance in AI, they see it as a core source of national strength. The US has already tried to sabotage China's AI development with the semiconductor export ban. Xi Jinping would be the biggest cuckold in human history if he decided to accept second place under US AI hegemony and enforce it on the rest of the world. It's voluntarily helping your greatest rival dominate the world for the grand prize of being Chief Lickspittle.

Just consider how much louder the other voices are! There's the military men who want advanced AI to help them on the battlefield, the big tech people with bottomless wealth and political connections who want to get ahead in the race, the researchers who want to research unhindered, the tech autarky people who want independence and sovereignty...

Xi Jinping would be the biggest cuckold in human history if he decided to accept second place under US AI hegemony and enforce it on the rest of the world

You can start making memes.

The Chinese will gladly trade the vanishing shot at dominance for some more overregulation and domestic oppression in the short term. Indeed, that's how they settled the issue of Choynese Eugenics, and this is what they're doing already with AI. That's how that nation rolls, that's their part in the dance, being a role model in all things illiberal for creepy globalist technocrats, and simultaneously a feeble, unappealing caricature that Americans won't feel too guilty erasing. Putin, too, could've been building GPU clusters Yud suggests must be bombed. Instead he preemptively bombed Mariupol. There is no «geopolitics», it's all a sham.

Just consider how much louder the other voices are!

Not much more.

The rhetoric of schizos Leahy and Yud is on every outlet, from niche forums to highbrow magazines to popular podcasts to mainstream media to the White House and the omnipotent Chuck now. Yud must be imagining this is an anthropic shadow – since he's so smart and policymakers are stupid and cannot coordinate, it can't be that his premises are revealed as wrong now that they endorse his message, it can't be that he's a patsy, it must be some issue of broken multiversal causality! If a grown man thinks he's an anime character, one needn't deceive him any more on top of that.

Importantly they face no credible opposition – LeCun spews condescending inarticulate nonsense and doesn't call Yud out on his subtle technical misstatements; and the non-technical folks just accept the Terminator imagery because why not, it's intuitive! The Overton window is being so defined as to make their conclusion the obviously accepted one, devoid of serious opposition. Even alignment doomers like Christiano who don't live in the la-la land are sidelined.

Once more, a chance at escaping the predefined trajectory that was dangled before my eyes is being pulled away. Crypto – regulated, FUDed into the ground and drowned in speculative nonsense, many founders dead. Darknet and darkmarket – busted. Citizen science, biohacking – fizzled out, and good luck after COVID. AI could have been our shot at greater autonomy, it could've helped everyone, from the unhinged transhumanist to the paranoid traditionalist, divest from the failing body politic and global supply chains. But no, that's too rich.

The powers that be value preservation of the status quo more than they yearn for new capability; they've ruined several technologies with the precautionary principle already. And they'll have any capability they want if they ensure strong AIs do not leak outside the regulated state-mandated bounds.

My main bet now is on Wall Street, big tech, and their surprisingly moderate pocket regulators who throw minor bones to the politicians – but China pinned its hopes on that mercantile lobbyism too, and look where this got us and them.

As usual, I hope I'm just overemphasizing the bad scenario.

Are darknet markets actually dead? Shame if that's the case, I'd have liked to dabble in them since I actually have money now.

Past their heyday, and they never moved closer to a legitimate marketplace or a no-holds-barred piratical Utopia. I'm not sure if you recall what was expected of them.

No they're doing fine.

The Chinese policy paper you mention:

Respect intellectual property rights and commercial ethics; advantages in algorithms, data, platforms, etc., may not be used to engage in unfair competition.

Since when has China respected commercial ethics or intellectual property rights? This is the flimsiest fig-leaf since that US-China cybersecurity summit Barack Obama signed, which China immediately ignored, continuing to hack as it pleased. It's a draft law in a country where laws aren't meaningfully obeyed and party priorities come first. AI is a significant priority.

Putin, too, could've been building GPU clusters Yud suggests must be bombed. Instead he preemptively bombed Mariupol.

Did Russia ever have the talent, state capacity and wealth to compete in AI? Maybe talent but there's been a lot of brain drain. Nuclear missiles are Russia's comparative advantage, not advanced technology. Putin could still put an end to US AI development, maybe he or someone else will finally grow a pair and shoot a hostile singleton while it's still in the womb.

Importantly they face no credible opposition – LeCun spews condescending inarticulate nonsense and doesn't call Yud out on his subtle technical misstatements; and the non-technical folks just accept the Terminator imagery because why not, it's intuitive!

What concrete goal have Yud and co achieved? They've raised awareness, but what has that translated into? The six month pause they proposed has had no effect; Altman didn't sign it, nor did LeCun or anyone in any leading position. LeCun has 4x Yud's follower count on twitter, and the nonsense he promulgates is easily more influential. Plus he has credentials and prestige and can smear Yud for being a mildly successful fanfiction author. Who is the idiot boomer govt official going to find more credible? The Professor at NYU, Chief AI Scientist at Meta? Or a fellow who wrote a weird BDSM-MMT fantasy novel (that I made the dire mistake of reading)?

I was noting with that article, "of course the Chinese started shit over a fucking island," as it reminded me of the more recent strife over the Ryukyu Islands.

The Ryukyus, the Senkakus and so on have some strategic value - Zhenbao Island has absolutely no value of any kind. It's totally worthless.

Ok, let's say that Russia builds a large GPU cluster. Then the US and China have two options:

  1. Put up with it, in which case there is an unknown chance of a superhuman AI emerging and destroying humanity

  2. Nuke Russia, in which case there is a very high chance of a total nuclear war that kills hundreds of millions of people and devastates much of the world

Does Yudkowsky actually think that 2 is preferable?

Judging by his previous statements on GPU airstrike policy, I assume he’s saying Russia is on board.

Or maybe no one has watched it and he’s actually arguing about how impractical it would be. I don’t know what to believe at this point.

Yeah, are we going to trade London, Paris and New York in order to take out a Russian or Chinese GPU farm? Is that a nuclear exchange anyone would advocate?

If Russia invaded Alaska and said "if you shoot back at our soldiers we will launch nuclear weapons", letting them conquer Alaska would be better than a nuclear exchange. Nonetheless the U.S. considers "don't invade U.S. territory" a red line that they are willing to go to war with a nuclear power to protect. The proposal would be to establish the hypothetical anti-AI treaty as another important red line, hoping that the possibility of nuclear escalation remains in the background as a deterrent without ever manifesting. The risk from AI development doesn't have to be worse than nuclear war, it just has to be worse than the risk of setting an additional red line that might escalate to nuclear war. The real case against it is that superhuman AI is also a potentially beneficial technology (everyone on Earth is already facing death from old age after all, not to mention non-AI existential risks); if it was purely destructive then aggressively pursuing an international agreement against developing it would make sense for even relatively low percentage risks.
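
For what it's worth, the comparison being proposed there can be written out as a toy expected-loss calculation; every number below is a placeholder I invented purely to show the structure of the argument, not anything the commenter asserted:

```python
# Toy expected-loss comparison for "ban + red line" vs. "no ban".
# All probabilities and costs are invented placeholders for illustration.
p_doom_without_redline = 0.10     # assumed chance unchecked ASI development ends badly
p_escalation_with_redline = 0.02  # assumed chance the new red line ever triggers nuclear war
cost_ai_doom = 1.0                # normalize "losing everything" to 1
cost_nuclear_war = 0.5            # catastrophic but survivable, on this scale

expected_loss_no_redline = p_doom_without_redline * cost_ai_doom
expected_loss_redline = p_escalation_with_redline * cost_nuclear_war
print(expected_loss_no_redline, expected_loss_redline)
# The point in the comment: the AI risk doesn't have to exceed the cost of
# nuclear war itself, only the expected loss from adding the red line.
```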

When you say "the real case against it", are you merely noting an argument that exists, or are you making the argument i.e. saying in your own voice "banning AI is bad because AI could be good too"?

(In case of the latter: I know that The Precipice at least considers AI a bigger threat than literally everything else put together, at 1/10 AI doom and 1/6 total doom. I categorise things a bit differently than Ord does, but I'm in agreement on that point, and when looking at the three others that I consider plausibly within an OOM of AI (Life 2.0, irrecoverable dystopia, and unknown unknowns) it jumps out at me that I can't definitively state that having obedient superintelligences available would be on-net helpful with any of them. Life 2.0 would be exceptionally difficult to build without a superintelligence and could plausibly be much harder to defeat than to deploy. Most tangible proposals I've seen for irrecoverable dystopia depend on AI-based propaganda or policing. And unknown unknowns are unknowable.)

I know that The Precipice at least considers AI a bigger threat than literally everything else put together

Most tangible proposals I've seen for irrecoverable dystopia depend on AI-based propaganda or policing

The cool part, and the obvious bullshit in Ord's quokka-aligned calculation, is that

  1. the dystopia is easy even without impressive progress with AI, and trivial with centralization of AI and (easily aligned, as it in all likelihood will be) AGI.

  2. the dystopia is easily justified by AI risk, indeed his pal Bostrom already did that.

In general I think such works are worthless pretenses at objectivity and good faith. Doom doesn't matter as much as irreversible bad transitions, and a humanity that survives but is forever bound to Earth (which is very likely unless we seriously upgrade within 100 years) is a bad outcome.

Both. Mostly I was contrasting to the obverse case against it, that risking nuclear escalation would be unthinkable even if it was a purely harmful doomsday device. If it was an atmosphere-ignition bomb being developed for deterrence purposes that people thought had a relevant chance of going off by accident during development (even if it was only a 1% risk), then aggressively demanding an international ban would be the obvious move even though it would carry some small risk of escalating to nuclear war. The common knowledge about the straightforward upside of such a ban would also make it much more politically viable, making it more worthwhile to pursue a ban rather than focusing on trying to prevent accidental ignition during development. Also, unlike ASI, developing the bomb would not help you prevent others from causing accidental or intentional atmospheric ignition.

That said, I do think that is the main reason that pursuing an AI ban would be bad even if it was politically possible. In terms of existential risk I have not read The Precipice and am certainly not any kind of expert, but I am dubious about the idea that delaying for decades or centuries attempting to preserve the unstable status-quo would decrease rather than increase long-term existential risk. The main risk I was thinking about (besides "someone more reckless develops ASI first") was the collapse of current civilization reducing humanity's population and industrial/technological capabilities until it is more vulnerable to additional shocks. Those additional shocks, whether over a short period of time from the original disaster or over a long period against a population that has failed to regain current capabilities (perhaps because we have already used the low-hanging fruit of resources like fossil fuels) could then reduce it to the point that it is vulnerable to extinction. An obvious risk for the initial collapse would be nuclear war, but could also be something more complicated like dysfunctional institutions failing to find alternatives to depleted phosphorous reserves before massive fertilizer shortages. Humanity itself isn't stable, it is currently slowly losing intelligence and health to both outright dysgenic selection from our current society and to lower infant mortality reducing purifying selection, so the humans confronting future threats may well be less capable than we are. Once humans are reduced to subsistence agriculture again the obvious candidate to take them the rest of the way would be climate shocks, as have greatly reduced the human population in the past.

Furthermore, I'm not that sympathetic to Total Utilitarianism as opposed to something like Average Preference Utilitarianism, I value the preferences of those who do or will exist but not purely hypothetical people who will never exist. If given a choice between saving someone's life and increasing the number of people who will be born by 2, I strongly favor the former because his desire to remain alive is real and their desire to be born is an imaginary feature of hypothetical people. But without sufficient medical development every one of those real people will soon die. Now, wiping out humanity is still worse than letting everyone die of old age, both because it means they die sooner and because most of those people have a preference that humanity continue existing. But I weigh that as the preferences of 8 billion people that humanity should continue, 8 billion people who also don't want to die themselves, not the preferences of 10^46 hypothetical people per century after galactic colonization (per Bostrom's Astronomical Waste) who want to be born.

The main risk I was thinking about (besides "someone more reckless develops ASI first") was the collapse of current civilization reducing humanity's population and industrial/technological capabilities until it is more vulnerable to additional shocks. Those additional shocks, whether over a short period of time from the original disaster or over a long period against a population that has failed to regain current capabilities (perhaps because we have already used the low-hanging fruit of resources like fossil fuels) could then reduce it to the point that it is vulnerable to extinction.

There's one way I could maybe see us having problems recreating some facet of modern tech. That is, indeed, a nuclear war, and the resulting radiation causing the most advanced computers to crash often (since modern RAM/registers operate on such exact precision that they can be bit-flipped by a single decay). Even then, though, there are ways and means of getting around that; they're just expensive.

Ord indeed takes an axe to the general version of this argument. Main points: 1) in many cases, resources are actually more accessible (e.g. open-cut mines, which will still be there even if you ignore them for 50 years, or a ruined city made substantially out of metal being a much easier source of metal than mankind's had since native copper was exhausted back in the Stone Age), 2) redeveloping technology is much easier than developing it for the first time, since you don't need the 1.0, least efficient version of the tech to be useful (e.g. the Newcomen atmospheric engine is hilariously inferior to what we could make with even similar-precision equipment). There are a whole pile of doomsday preppers who keep this sort of information in hardcopy in bunkers; we're not going to lose it. And, well, 1700s humanity (knocking us back further than that even temporarily would be extremely hard, because pre-industrial equipment is buildable by artisans) is still near-immune to natural X-risks; I'm less convinced that 1700s humanity would survive another Chicxulub than I am of modern humanity doing so, but that is the sort of thing it would take, and shocks that large are nailed down with low uncertainty at about 1/100,000,000 years.

If you really want to create a scenario where being knocked back a bit is a problem, I think the most plausible is something along the lines of "we release some horrible X-risk thing, then we go Mad Max, and that stops us from counteracting the X-risk thing". Global warming is not going to do that - sea levels will keep rising, of course, and the areas in which crops can be grown will change a little bit more, but none of that is too fast for civilisations to survive. (It's not like you're talking about 1692 Port Royal sinking into the ocean in a few minutes; you're talking about decades.) Most of the anthropogenic risks are pretty fast, so they're ruled out; we're dead or we're not. Life 2.0 is about the only one where I'd say "yeah, that's plausible"; that can have a long lead time.

Humanity itself isn't stable, it is currently slowly losing intelligence and health to both outright dysgenic selection from our current society and to lower infant mortality reducing purifying selection, so the humans confronting future threats may well be less capable than we are.

Dysgenics is real but not very fast, and it's only plausibly been operating for what, a century, and in only about half the world? This isn't going to be the end of the world. Flynn effect would be wiped out in apocalypse scenarios, of course, but we haven't eroded the baseline that much.

And to zoom out and talk about X-risk in fully-general terms, I'll say this: there are ways to mitigate it that don't involve opening the Pandora's Box of neural-net AGI. Off-world colonies don't need AI, and self-sustaining ones take an absolute sledgehammer to every X-risk except AI and dystopia (and aliens and God, but they're hardly immediate concerns). Dumb incentives for bio research can be fixed (and physics research, if and when we get to that). Dysgenics yields to PGT-P and sperm donors (although eugenics has some issues of its own). Hell, even GOFAI research or uploads aren't likely to take much over a century, and would be a hell of a lot safer than playing with neural nets (safer is not the same thing as safe, but fine, I agree, keeping AI suppressed on extremely-long timescales has issues). "We must do something" does not imply "we must do this".

All of Ord's calculations seem to assume highly efficient coordinated conspiracy running the world instead of the slowly decaying shitshow one should expect from post-collapse civilization, but I guess that's par for the course.

there are ways to mitigate it that don't involve opening the Pandora's Box of neural-net AGI

Hell, even GOFAI research or uploads aren't likely to take much over a century, and would be a hell of a lot safer than playing with neural nets (safer is not the same thing as safe

Why do you assume GOFAI or uploads are safer? From what I can tell, this sort of reasoning is only grounded in aesthetic distaste for statistics, or in mushy intuitions like «much weight, very black box, there be shoggoth», as that fraud Leahy implies. Well, we can sparsify models and make them structurally simpler, it seems, but what of it? The causal structure of a mind worth the name cannot fit into human consciousness, and it's improbable that there are theorems that give sufficiently secure bounds for its qualitative behavior; if there are, they could as well hold for NNs, because why the hell not, they are differentiable functions. GOFAI either doesn't work at all (there are good reasons to expect it to never work, or to require neural nets to assemble) or is more risky in terms of having sharp transitions and giving us unbounded optimizers. Indeed, all of Yud's fears were invented basically for GOFAI or are paradigm-agnostic; he never expected NNs to work, not without intractable compute expenditure on the scale of simulating historical evolution, and so the question of their – now supposedly inevitable – misalignment was moot. Likewise I don't see what makes uploads inherently safe but doesn't hold for NNs. Are they too fat to be a threat or what? They'd still be data, still amenable to self-editing, still capable of accelerated and inscrutable interactions, eventual economic superiority over baseline humans, inhumanity. See Pantheon, it's as good a doomsday narrative as any.

Reminder: having received millions of dollars in donations, Yud has turned his AI safety organization MIRI into a sinecure for mathletes preoccupied with weird self-referentiality studies and esoteric decision theories. What few alignment-related investigations they had were nonsensical to the point of casting serious doubt on the coherence of their conceptual apparatus:

Finally, even if we could find some satisfactory method of combining UN and US into a corrigible solution to the shutdown problem, there is the remaining matter that… Defining what it means for an agent to “shut down” is quite complicated assuming sufficiently intelligent agents… Further solutions may involve abandoning the utility maximization framework entirely, although it is not yet clear what sort of framework could take its place… Our lack of understanding about how to solve the shutdown problem demonstrates a more general lack of understanding about “corrigible reasoning” and what it entails.

Insane asylum stuff. Unsurprisingly, Christiano, ever the gentleman, describes the situation thus:

Eliezer seems confident about the difficulty of alignment based largely on his own experiences working on the problem. But in fact society has spent very little total effort working on the problem, and MIRI itself would probably be unable to solve or even make significant progress on the large majority of problems that existing research fields routinely solve. So I think right now we mostly don’t know how hard the problem is (but it may well be very hard, and even if it’s easy we may well fail to solve it). For example, the fact that MIRI tried and failed to find a “coherent formula for corrigibility” is not much evidence that corrigibility is “unworkable.”

No, really, what do you have against neural networks? I think it's tremendous luck in terms of safety that we got them to work and can approximate/interpolate/extrapolate desirable behavioral properties directly, instead of languishing for a few more centuries and praying that our Enlightened Infra-Bayesian Lords didn't make an oopsie with the sign somewhere and covered it up like another Leverage, or the Wuhan affair.

Likewise I don't see what makes uploads inherently safe but doesn't hold for NNs.

No, really, what do you have against neural networks?

The view I'm coming at this from is: humans have a moral skeleton, innate hardwiring that allows us to learn morality and believe it (as opposed to mimic it). This is highly instrumentally non-convergent and probably needs to be coded into an AI directly; gradient descent on output will only produce lying psychopaths mimicking morality.

GOFAI has some hope because we could code morality directly. Uploads have some hope because you're uploading the hardwiring (whether or not you understand it). As I said, this does not equal safe, in either case; as you say, GOFAI has a lot of potential pitfalls, and uploaded people would be so far out of evolutionary environment that their remaining sane is far from assured.

But I'm not seeing any hope of success on non-uploads without the ability to look inside the box. This is because "is moral" and "is pretending to be moral successfully" have identical output except in situations where dropping the pretence is worth it i.e. situations where there's a high chance of you losing control upon betrayal. Interpretability might pull a rabbit out of the hat (I put it at about 3%, which is better odds than Yudkowsky gives), but I'm not very confident; to me, P?=NP notwithstanding, it seems like the difficulty of determining whether spaghetti-code does X is generally at least as high as the difficulty of writing code that does X, which implies that making safe NNs is at least as hard as writing GOFAI.

humans have a moral skeleton, innate hardwiring that allows us to learn morality

I suppose we have, to some extent, but it can't be all that robust. Tons of species are psychopathic by our standard, and of course this standard exists to distinguish humans who don't fit it. So it's more like a slight inductive bias, in the same way we have biases to learn to navigate in 3D space and prefer sugary foods. Biases of the algorithm can be substituted with biases in the data.

and believe it (as opposed to mimic it). This is highly instrumentally non-convergent and probably needs to be coded into an AI directly; gradient descent on output will only produce lying psychopaths mimicking morality.

I don't see why that would be true. Indeed, I do not see why gradient descent wouldn't be much better at learning deep cognitive regularities, including morality. You seem to hold that morality is something essential, some set of terminal value-influences, but why is that true for morality and not any other aspect of our cognition, both instrumentally worthwhile and «instrumentally non-convergent» ones? Every part of our decision-making feels profoundly qualitatively colored for us.

Why is "coded directly" better than learned? The major reason we're doing this stuff is that it's vastly better at generalization,

Sorry, this looks like a vibe-based argument, where neural nets deceptively "imitate" and hard code is "good and honest". It's all algorithms. Inasmuch as human minds are computable, our morality is an algorithm too.

GOFAI has some hope because we could code morality directly

What good would that do? It'd break OOD just the same, and if it didn't break, it'd be rewritten or worked around by the purported daemon of optimization.

But I'm not seeing any hope of success on non-uploads without the ability to look inside the box. This is because "is moral" and "is pretending to be moral successfully" have identical output except in situations where dropping the pretence is worth it i.e. situations where there's a high chance of you losing control upon betrayal.

Reminder that LLMs cannot learn to do the «treacherous turn» because the dreaded SGD mercilessly ablates cognitive routines that do not contribute to decreasing loss in training. This, of course, holds in the general case.

But even beyond that, outputs may be similar but activations aren't; we know how to look at activations, and we know there are detectable differences in activations between the model evaluating its own output as true versus false.
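
For context, here is roughly what "look at the activations" cashes out to in practice, as a minimal sketch on synthetic stand-in data (the real work fits linear probes on hidden states recorded from an actual model; `fake_activations`, the dimensionality and the accuracy are all made up here):

```python
# Minimal linear-probe sketch on synthetic "activations": a stand-in for the
# interpretability work that probes hidden states for a true/false signal.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
d = 512                               # pretend hidden-state dimensionality
truth_direction = rng.normal(size=d)  # pretend "model thinks this is true" axis

def fake_activations(n, is_true):
    """Stand-in for hidden states captured while the model emits a claim."""
    sign = 1.0 if is_true else -1.0
    return rng.normal(size=(n, d)) + sign * truth_direction

X = np.vstack([fake_activations(500, True), fake_activations(500, False)])
y = np.array([1] * 500 + [0] * 500)
probe = LogisticRegression(max_iter=1000).fit(X, y)

X_test = np.vstack([fake_activations(100, True), fake_activations(100, False)])
y_test = np.array([1] * 100 + [0] * 100)
print("probe accuracy:", probe.score(X_test, y_test))
```

A real probe would be fit on activations recorded from the model itself rather than synthetic vectors, but the mechanics are the same.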

it seems like the difficulty of determining whether spaghetti-code does X is generally at least as high as the difficulty of writing code that does X, which implies that making safe NNs is at least as hard as writing GOFAI

No, generation is always vastly simpler than classification unless you require classification that reconstructs the process of generation, of course.

I've long held that the Yuddist program is, in addition to all else, an attractor for a particular variation of anxiety/OCD disorder: fetishization of «being in control», of making thoughts play by «proper rules». But it's fetishization because it doesn't really work, it pursues ghosts, precisely the deceptive external form of reliability. You gain clarity of ideas by testing them against evidence, not by being real suspicious of disconfirmations.

Pitts wrote that his depression might be “common to all people with an excessively logical education who work in applied mathematics: It is a kind of pessimism resulting from an inability to believe in what people call the Principle of Induction, or the principle of the Uniformity of Nature. Since one cannot prove, or even render probable a priori, that the sun should rise tomorrow, we cannot really believe it shall.”

This malady is to be treated, not nurtured.

Off-world colonies don't need AI, and self-sustaining ones take an absolute sledgehammer to every X-risk except AI and dystopia (and aliens and God, but they're hardly immediate concerns). Dumb incentives for bio research can be fixed (and physics research, if and when we get to that). Dysgenics yields to PGT-P and sperm donors (although eugenics has some issues of its own).

Sure, but of course such measures being possible doesn't mean they'll actually be done.

Hell, even GOFAI research or uploads aren't likely to take much over a century, and would be a hell of a lot safer than playing with neural nets

This seems like too much certainty about the nature and difficulty of the task, which in turn influences whether significant delay actually increases the odds of success. For instance, if we turn out to live in a universe where superhuman AI safety isn't that hard, then the important thing is probably that it be done by a team that considers it a serious concern at all. Right now the leading AI company is run by people who are very concerned with AI alignment and who founded the company with that in mind, if we ban AI development and then the ban gets abandoned in 30 years there's a good chance that won't be the case again.

A candidate for such a universe would be if it's viable to make superintelligent Tool AIs. Like if GPT-10 can mechanistically output superhuman scientific papers but still doesn't have goals of its own. Such an AI would still be dangerous and you certainly couldn't release it to the general public, but you could carefully prompt it for papers suggesting more resilient AI alignment solutions. Some have argued Agent AIs would have advantages compared to Tool AIs, like Gwern arguing Tool AIs would be "less intelligent, efficient, and economically valuable". Let's say we live in a future where more advanced versions of GPT get routinely hooked up to other components like AgentGPT to carry out tasks, something which makes it significantly better at complicated tasks. OpenAI just developed GPT-10 which might be capable of superhuman scientific research. They can immediately hook it up to AgentGPT+ and make trillions of dollars while curing cancer, or they can spend 2 years tweaking it until it can perform superhuman scientific research without agentic components. It seems plausible that OpenAI would take the harder but safer route, but our 2050s AI company very well might not bother. Especially if the researchers, having successfully gotten rid of the ban, view AI alignment people the same way anti-nuclear-power environmentalists and anti-GMO activists are viewed by those respective fields.

Regarding talk of 100-year bans on AI while people steadily work on supposedly safer methods, I'm reminded of how 40 years ago overpopulation was a big mainstream concern among intellectuals. These ideas influenced government policy, most famously China's One Child policy. Today the fertility rate is substantially reduced (though mostly not by the anti-overpopulation activists), the population is predictably aging, and...the plan is completely abandoned, even though that was the entirely predictable result of dropping fertility. Nowadays if a country is concerned with fertility either way it'll want it to increase rather than decrease. Likewise the eugenics movement had ambitions of operating across many generations before being erased by the tides of history. In general, expecting your movement/ideas to retain power that long seems very risky.

My fundamental problem with Bostrom's thinking is that people who do not and who never have existed can't be said to have "wants" in any meaningful sense. His whole oeuvre is based on affirming the consequent. If these people existed, you would be obliged to consider their preferences. To which I reply: sure, if they existed, but they don't.

This kind of idiotic one-dimensional thinking is why I maintain that utilitarianism is fundamentally stupid, evil, and incompatible with human flourishing. The simple fact is that there are only two paths available to a logically consistent utilitarian. The first is wire-heading, in which case the question must be asked "why are you wasting your time on the internet when you could be wasting it on heroin?". The second is omnicide, which seems to be the path that Bostrom, Benatar, Yud et al. are hell-bent on pursuing, given all their rhetoric about how we need to build a mountain of skulls in the present to secure the future.

I say fuck that.

My 1e999999999999999 hypothetical future descendants who see utilitarian AIs as abominations to be purged with holy fire in the name of the God-Emperor are just as real as your "10^46 hypothetical people per century after galactic colonization", and their preferences are just as valid.

people who do not and who never have existed can't be said to have "wants" in any meaningful sense

You should include people who will exist as well, as opposed to people who could potentially exist if you took other actions but will never actually exist. Otherwise something like "burying a deadly poison that you know will leach into the water table in 120 years" would be perfectly moral, since the people it will kill don't exist yet.

This kind of idiotic one-dimensional thinking is why I maintain that utilitarianism is fundamentally stupid, evil, and incompatible with human flourishing.

As I mentioned, Preference Utilitarianism and Average Preference Utilitarianism are also forms of utilitarianism. And Total Utilitarianism doesn't imply wireheading either. Wireheading is only an implication of particularly literal and naive forms of hedonic utilitarianism that not even actual historical hedonic utilitarians would endorse; they would presumably either claim it isn't "real" happiness or switch to another form of utilitarianism.

Honestly, I think the main rhetorical advantage of non-utilitarian forms of ethics is that they tend to be so incoherent that it is harder to accuse them of endorsing anything in particular. But people being bad at formalizing morality doesn't mean they actually endorse their misformalization's implications. You just tried to express your own non-utilitarian beliefs and immediately endorsed sufficiently-delayed murders of people who aren't born yet; that doesn't mean you actually support that implication. But having non-formalized morality is no advantage in real life and often leads to terrible decisions by people who have never rigorously thought about what they're doing, because you really do have to make choices. In medicine utilitarianism gave us QALYs, while non-consequentialism gave us restrictive IRBs that care more about the slightest "injustice" than about saving thousands of lives; as a human who will require medical care, I know which of those I prefer.

omnicide

The view he is expressing is of course the opposite of this - that humanity surviving until it ultimately colonizes the galaxy is so important that improving humanity's long-term safety outweighs any non-omnicidal danger. Of course that would still leave a lot of uncertainty about what the safest path is. As I argued, significant delays are not necessarily safer.

My 1e999999999999999 hypothetical future descendants who see utilitarian AIs as abominations to be purged with holy fire in the name of the God-Emperor are just as real as your "10^46 hypothetical people per century after galactic colonization", and their preferences are just as valid.

To be clear the "preference" framing is mine, since I prefer preference utilitarianism. Bostrom would frame it as something like trying to maximize the amount of things we value, such as "sentient beings living worthwhile lives".

The various alternative flavors of utilitarianism proposed to work around the whole wire-heads vs paperclip-maximizer conundrum have always struck me as even less coherent and actionable than so-called non-utilitarian forms of ethics. In fact, Preference Utilitarianism is kind of the perfect example. Sorry, but stacking layers upon layers of math and jargon atop a foundation of "x is good because I want it" is not going to make "I do what I want" a sound moral framework.

I'm not a mind-reader, but with a lot of reasonable parameters I'd be in favour of #2, so I see no particular reason to disbelieve other people saying they'd prefer #2.

Framework #1: people dying is bad.

Global Thermonuclear War would kill a lot of people - indeed, with Russia's arsenal against both the West and China, it'd probably be 1-1.5 billion. AI apocalypse would kill all humans, which is roughly 8 billion. If we want to minimise the expected number of deaths, then a 100% chance of GTW is still worth it if we predict greater than a 12-20% chance of AI apocalypse in the alternative. To quote EY himself: "shut up and multiply".
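A minimal sketch of that multiplication, taking the figures above at face value (the death tolls are this comment's rough assumptions, not established numbers):

```python
# Break-even probability for "shut up and multiply", using the rough figures above.
gtw_deaths_range = (1.0e9, 1.5e9)  # assumed Global Thermonuclear War death toll (low, high)
ai_apocalypse_deaths = 8.0e9       # assumed AI apocalypse death toll: everyone

for gtw_deaths in gtw_deaths_range:
    # A certain GTW has lower expected deaths than gambling on AI apocalypse
    # whenever P(apocalypse) exceeds this ratio.
    break_even_p = gtw_deaths / ai_apocalypse_deaths
    print(f"GTW deaths {gtw_deaths:.1e} -> break-even P(AI apocalypse) = {break_even_p:.1%}")

# Prints 12.5% and 18.8%, i.e. roughly the 12-20% range quoted above.
```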

Framework #2: existential risk and categorical imperative.

This is not necessarily a one-off. It may be repeated. Consider the results of pursuing either policy consistently in a series of similar situations.

  • If we always do #1, then (without some means of solving AI alignment) the final outcome is that humanity gets destroyed by AI.

  • If we always do #2, then we have a GTW every 50-200 years, but we don't get destroyed by AI.

If you think, as I do, that the existence of humanity and the birth of new humans are very good things, then in the long run it is clearly better to always do #2 than to always do #1. Kant's categorical imperative says, among other things, that we should do the things that we would prefer for everyone to do in our place. So we should do #2.

I mean, obviously I'd be in favour of telling Russia "knock it off" on multiple occasions before escalating to war, but if it comes down to "start GTW or take significant X-risk", I'd push the button.

To quote EY himself: "shut up and multiply".

Given that he is a crank who beclowns himself in a variety of ways, and given that his "shutting up and multiplying" results in hypothetical advocacy of extreme torture to prevent motes of dust in eyes, etc: I accept this as weak evidence that the opposite is true.

Yes he does: his «I'll totally take it» scenario is AGI killing no more than 1 billion people, and he assigns 90% probability to AGI wiping out literally all life.

If we see AI threat as nuclear weapon threat, only worse, it is not unreasonable.

Well, sure. Of course, that's a profoundly unreasonable premise, so that makes the conclusion kind of unreasonable as well.

(from an abandoned draft)

The second Theme is top-down organization of processes which is rational – in the sense of being well-designed for the purpose of predictably maximizing certain legible metrics. In the broader community it's mostly variations of Benthamite Utilitarianism, exhaustively argued for by mainstream EAs like Toby Ord and MacAskill. I infer its more interesting aspects from Yud's fiction, taking its positively-coded parts to be a faithful expression of his normative doctrines, because he explicitly wrote e.g. HPMOR to popularize his views (or, as Zvi Mowshowitz brutally puts it, «its primary function is training data to use to produce an Inner Eliezer that has access to the core thing». Anna Salamon at CFAR seems to understand and apply the same basic technique even more bluntly: «implanting an engine of desperation» within people who are being «debugged»).

Psychologically it is the Kahnemanian System 2 Rocks dictum: overriding instinct with regimented explicit analytical reasoning – thus, irredeemably in conflict with Theme 1. (Normally this conflict is transcended through domain mastery). That's on the charitable side; more cynically it's a sort of penny-pinching neurotic OCD, the barren pursuit of cleanliness and vetted thoughts. No matter the protestations about not roleplaying as Spock, it's just not conducive to creativity and corresponds to very «anal», stale, heroic, effort-over-ingenuity plans and arid imagery: rah, rah, being the only ones who try real hard, implementing a carefully specified goodness function, reproducing human mind in all its complexity, airgapping, prohibitions, restrictions, binding vows, raging at the natural flow and overcoming the gradient of decay.

They told me that the road I took

would lead me to the Sea of Death;

and from halfway along I turned back.

And ever since, all the paths I have roamed

were entangled, and crooked, and forsaken.

–Yosano Akiko, “Cowardice”. Translated from Arkady Strugatsky's version in A Billion Years Before the End of the World

…Politically, this Theme boils down to the old technocratic One World Government proposal of Adults In The Room, with an important caveat owing largely to his directness. It's most clearly expressed in the literal, More- or Campanella-styled Utopia Dath Ilan. Here, too, it is subordinate to the first Theme: the ultimate Dath Ilani authority is not some seemingly-transparent expert committee a Davosian suit would propose, but what is for all intents and purposes a conspiracy of super-rational, super-smart Keepers who operate discreetly and do not need to justify their decisions to the cattle, for the cattle would not understand the reasoning or would get damaged by infohazards (even though the «cattle» is already brilliant and very well schooled: thanks to eugenics, avg Dath Ilani IQ is 143 in our terms and «speaks fluent Bayesian»).

The same can be gleaned from the implied structure in Three Worlds Collide, where Markets can be manipulated and the highest secular authority be violently overridden – in a subjective emergency – by a Confessor. Curiously, there is an awkwardly bolted-on institution of Prediction Markets. Yuddism grew out of the borrowed (or hijacked, if you will) OvercomingBias blog and community founded by Robin Hanson; the symbolism is clear enough.

I guess it's redundant to speculate as to how this attitude of the Priest in the Arena may be informed by Yud's troubled Modern Orthodox Jewish background and the traditional role and prestige of a Rabbi in matters of grave importance. (Nevertheless I will note that Yud has religious education and his late, deeply loved brother was a Chabad House representative and volunteer). Be that as it may, Yud's utopia requires a benevolent world-ruling cult, and he endeavored to build a facsimile of one on Earth.

This isn't the first time this charge has been levied against Rationalists, so they've discussed it extensively; in fact, Yudkowsky himself did (when not flirting with stories about Bayesian conspiracy):

…In the same sense that every thermal differential wants to equalize itself, and every computer program wants to become a collection of ad-hoc patches, every Cause wants to be a cult. It’s a high-entropy state into which the system trends, an attractor in human psychology. It may have nothing to do with whether the Cause is truly Noble. You might think that a Good Cause would rub off its goodness on every aspect of the people associated with it—that the Cause’s followers would also be less susceptible to status games, ingroup-outgroup bias, affective spirals, leader-gods. But believing one true idea won’t switch off the halo effect.

Every group of people with an unusual goal—good, bad, or silly—will trend toward the cult attractor unless they make a constant effort to resist it.

That's a telling simplification.

I'd argue – boringly – that a «cult», before everything else, is a sort of organization embodying a quasi-religious psychological process. Here Yud let his assumptions slip in, assumptions that are very natural for him to hold if you consider that this describes most/all organizations he's ever happily been part of. Since childhood, it's been futurist mailing lists, then mission-driven scholarly groups and self-styled think tanks, and finally, yes, a proper top-down cult with a circle of inveterate loyalists and subservient institutions. This brings us back to incentives: if intelligence is his sole claim to prestige, a cult is his sole place to belong.

Perhaps (uncertainly) every Cause wants to be a Cult, in a sense. But not every project or organization is a Cause! Not even science, in its day-to-day operations, is a Cause, maybe not even the Church! Most within-organization relations are driven by pragmatism, with people having divergent world models and value systems. When corporations start reinforcing their corpo-culture with those ghastly team-building exercises and slogans and such, it's usually perceived as intrusive, creepy and cultish, precisely because you're being offered a psychological self-alteration to increase your loyalty and productivity, in place of a direct material compensation hike.

But that's a sort of cargo cultism. In cults proper, this alteration is offered by natural Priests to mostly-willing Acolytes, people of a… peculiarly neurotic and selfless… psychological bent. It consists of endowing the Theme of the cult with supernatural salience, often eschatological/millenarian (the timeless cosmic endowment of posthumanity threatened by total-catastrophe!), reinterpreting common knowledge with some overarching epistemology, incompatible conceptual framework and jargon («speak native Bayesian», dissolving X, reframing Y, referring to Z-spaces and P-worlds…), diluting/augmenting ground truth with a massive body of hermeneutic learning (ReadTheSequences! – an international network of people reading and discussing Yud's self-referential shower thought of a blog as if it were Torah); thus, in effect, distancing the subject from the mainstream society and its views, and devaluing its inputs.

Most relevant Infective mechanisms of a cult, in my opinion, are: a) a normative psychological doctrine that defines thoughtcrimes and a way of absolving them (overcoming cognitive biases, in this case), b) a prophetic leader-figure (or an inheriting committee) who channels the cult's Theme into material reality, and c) intra-cult socialization on many dimensions; those pieces soften up a neophyte. It's pretty vicious: the leader can arbitrarily point at a conflicting input saying this is an example of a bias; the faithful, who have become a significant part of your social universe, will strong-upvote him; peer pressure will force you to «update»; and there goes another chance to escape the attractor. In the end you become one of those well-intentioned neurotic folks who cite Don't Look Up (where asteroid=AGI), are trying to dunk on LeCun online and may come to attempt an assassination in short order. But for all its viciousness, Yud is right that this happens «naturally» – in a manner of speaking.

Philosophically, it goes somewhat deeper yet.

Regulars know that @HlynkaCG and I have diametrically opposite beliefs about AI progress and much of everything else. (I'll return to bashing Hlynka's AI ideas some other time). Maybe the only issue we agree on is his frequently misunderstood thesis on «Utilitarian AI» and utilitarianism writ large as a philosophical stance incompatible with human flourishing. If you think he's not even making sense, then on the institutional level I implore you to notice the skull: EA is about maximization, and maximization is perilous.

I really have to disagree. Yudkowsky always came across to me as a very good writer, and while it's fashionable to dunk on HPMOR, I'm in the camp of it unironically being a fantastic novel, with excellent writing.

Yudkowsky can be accused of many things, but accusations that he isn't highly intelligent (>140 IQ?) or a good writer are unfounded as far as I'm concerned.

(Personally, I think I'm a good writer, and I have heaps of medals and trophies to attest to that, including competitions where I beat out hundreds of thousands if not millions of others in my cohort. And I happily concede that he's a better writer than I am, a person who has been complimented on his writing many times over.)

I do wish he took better care of his presentation, though; even for someone who sees himself as above such petty status games as not wearing a fedora, the sheer stakes mean he really ought to do better. But just because he's capable of writing uber-cool antagonists who are simultaneously competent and charming doesn't mean he can do the same IRL.

Writing quality can mean a dozen different things. Sometimes it's just 'adherence to a certain style'. Sometimes it's about 'telling a good story'. Sometimes it's about efficiently communicating what's important and cutting what's not.

Yud's "good writing" isn't 1 or 3. But he tells good stories - HPMOR's enduring popularity speaks to that. And he presents novel/interesting ideas in engaging ways. I don't think the 70th percentile nyrb writer has written, or would write if they tried, something as persistently popular as HPMOR or the sequences (but could be wrong about that)

Evidence for Yud's intelligence is less his writing, though, than the ideas within the writing. His AI risk ideas were useful and prescient, and the correct parts being mixed with nonsense (the agent foundations stuff) is par for the course for novel ideas. And the sequences cover a lot of different complicated ideas pretty well. The ways he is (or was) wrong about e.g. quantum physics or the applicability of decision theory and mathematical logic to AI are the ways a smart person is wrong, not a dumb one.

He fails in the way a smart person who's dramatically overconfident and lacks experience in competitive fields fails. He is not regularized. In humans this is called «crackpottery». Sometimes even pro scientists fail like that, but it's a separate failure mode to their normal underperformances.

Yud really had to go to school and internalize that in some hard cases he, no matter his SAT at 11, can't «intuit» shit before reading the fuck more and even finishing some lab courses. Or at least he should've gone the pure math route and then returned to grand theories. I do buy that he has the brain for it. It's a shame.

Perhaps I'm just not sure what you mean by "quality of writing", but when I look at the quality of the ideas expressed in his writing I can't think of anyone parallel. If he invented half of the concepts he claims to, then he would be the greatest philosopher of the modern age.

I suppose his prose doesn't have "great literary qualities" or whatever, but every time I pull up There’s No Fire Alarm for Artificial General Intelligence I read the whole thing. If it seems boring and trite in the year of our lord 2023, it is only because the entire field of non-academic AI philosophy is built out of the Yudkowskian paradigm.

Nor do I believe that this is a case of deep subject-level knowledge alone. I have read his takes on current events. He is shockingly cogent even there.

In the interest of charity I'll say that rafa maybe has a more refined (patrician?) palate than most and naturally high standards, but I agree with you. I have long said Yud should stick to writing fiction; it's where his talent and passion lie.

He’s smart enough to pursue his main interests, i.e. writing Harry Potter fanfic and accessible introductions to philosophy. Which is really all that matters at the end of the day.

Or see Bostrom about risks from utilitarian intelligences:

Human individuals and human organizations typically have preferences over resources that are not well represented by an “unbounded aggregative utility function.” A human will typically not wager all her capital for a fifty–fifty chance of doubling it. A state will typically not risk losing all its territory for a ten percent chance of a tenfold expansion. For individuals and governments, there are diminishing returns to most resources. The same need not hold for AIs.

More to the point, consider the name of Yud's Tumblr: Optimize Literally Everything. In Global Risk, he gives the following anodyne definition:

I introduce the concept of an optimization process: a system which hits small targets in large search spaces to produce coherent real-world effects.

When we talk about “AIs” we are really talking about minds-in-general, or optimization processes in general. Imagine a map of mind design space. In one corner, a tiny little circle contains all humans; within a larger tiny circle containing all biological life; and all the rest of the huge map is the space of minds-in-general. The entire map floats in a still vaster space, the space of optimization processes.

It would be a very good thing if humanity knew how to choose into existence a powerful optimization process with a particular target. Or in more colloquial terms, it would be nice if we knew how to build a nice AI.

Optimization sounds more humble than maximization of value, but I think they just mean the same mathematical idea applied to some generalization of a utility function of high dimensionality; thus the grandiose qualifiers. It's almost a creed. Yud's ex-wife is «Secretary of Global Optimization». Caroline Ellison's (FTX, SBF, EA) Tumblr is WorldOptimization. Scott Alexander's one-time proposed mission is to fix the world by slaying the «Moloch» of Darwinian processes and ushering in the world of maximum utility for the greatest number (presumably at the cost of immense temporary sin, if his serious fiction works much like Yud's. In any event it's good enough that I don't want to spoil Unsong even more). AI doomer Zimmerman, too, recruits people into his mission of optimizing the world. I can't not blurt out that this is mostly another super-secular spin on Tikkun olam – multiplied by a smorgasbord of «neuroatypical» traits: subclinical-sociopathic minmaxing tendencies, autistic reification of abstract economic models, broken affective empathy, OCD-like neuroticism, love of abstract «elegance», systemizing SV tech startup energy, plus a bunch of other crap. Progressives are prone to interpret this as mere «tech bro» zeitgeist, but tech bros are far more chill – and refreshingly egoistic; this Lewisian moral obsession with global optimization is a different beast.

At the final level, this Theme evolves from a handy analytic framework or a normative belief about running things to a hard prior about mechanisms by which competent, impressive things can run at all. Every thought is colored by utility functions; every decision is interpreted in light of optimizing for some value X; a powerful intelligence is assumed to be a consequentialist decisionmaker with a unitary utility function like a boardgame-playing adversarially trained AI is, and the prime fear is that in the process of maximizing said objective function it will also discover objective Instrumental Values – power, intelligence, and the magic of recursive self-improvement to get smarter to seize more power to…

It's not all narrative, they do have some proofs that apply to certain hypothetical AI setups, notably to Yudkowsky's original proposal, if only it were specified rigorously enough to be implemented and not just fail. This is a very rare case of Yud admitting a mistake. (Though only in retrospect; his later SIAI/Seed AI idea was, I think, also catastrophic yet vigorously pursued, and while he claims he conscientiously abstained from writing code, it looks more like his fancy language project has gone nowhere).

But it does not apply to human general intelligence, or to the most impressive swing at AGI we have come up with to date; and they began thinking in these terms long before finding any evidence for them. One might posit it's because these people identify their consequentialism with being smart enough to «decouple» from soppy plebeian contexts and directly maximize the important value. I think it's simpler and cruder: they value intelligence due to having accrued what societal power they have through its demonstration, and from there it's just leaky associative reasoning. Yud, specifically, has no power or prestige or much of anything at all without his perceived intelligence, so being wrong and being dead are close in his mindspace.

The third Theme is just the synthesis of the first two: it's recursive self-improvement.

I believe it is Yud's philosophy proper, its specific thesis. It is really very compact, for all that he has written: empowerment, with Yud as the overseer.

The ethos of it is encapsulated in the slogan Tsuyoku Naritai!, and its theory, the source of much hope and fear, in the idea of a proper mind being a human-interpretable bag of functional parts.

Said parts may be many and tricky and interacting in confusing ways, like rulings of Talmud are, or the modular brain in Yud's preferred – and wrong – interpretation of neuroscience is; but it is non-negotiable that they be things understandable to Yud and, less importantly, the agent itself; not some illegible messy product of running simple general learning algorithms on a universal substrate.

Thus, Coherent Extrapolated Volition.

Thus, the Seed AI research program, the pursuit of self-rewriting AI in the apparent fashion of Lisp scripts.

Thus, Overcoming Bias and becoming LessWrong towards Methods of Rationality (which in practice are tossed aside when Harry or Yud are having an intuitive epiphany) and beyond, becoming stronger – not just to shrug off biased thoughts, but to rise above the unblemished baseline; and eventually, yes, build the first superhuman AI, and have it rebuild you into the form you will become worthy of.

All this recursion is circling the drain in very tight loops.

Thus, on the other hand, the contempt for the data- and compute-intensive paradigm of artificial neural networks, for those filthy alien «inscrutable giant walls of floating-point numbers». For connectionists' easy attitude to non-interpretability, for the notion of emergence, and for their sober observations that we have mastered too many skills non-existent in the ancestral environment to expect a great catch when searching for function-specific modules. Classical-era Yud dunks far, far more on connectionism than on GOFAI, and strawmans it more; he reviled ANNs even when he believed them to be a dead end.

Sutton's Bitter Lesson, too, is anathema to him:

the actual contents of minds are tremendously, irredeemably complex; we should stop trying to find simple ways to think about the contents of minds, such as simple ways to think about space, objects, multiple agents, or symmetries. All these are part of the arbitrary, intrinsically-complex, outside world. They are not what should be built in, as their complexity is endless; instead we should build in only the meta-methods that can find and capture this arbitrary complexity. We want AI agents that can discover like we can, not which contain what we have discovered.

No, no, no, we are building a golem, Harry:

Subproblems of cognition include attention, memory, association, abstraction, symbols, causality, subjunctivity, expectation, goals, actions, introspection, caching, and learning, to cite a non-exhaustive list. These features are not "emergent". They are complex functional adaptations, evolved systems with multiple components and sophisticated internal architectures,  whose functionality must be deliberately duplicated within an artificial mind. … And it is necessary that the designer know what's happening on the higher levels, at least in general terms, because cognitive abilities are not emergent and do not happen by accident.

I agree with pretty much everything you said here. But regarding this:

Be that as it may, Yud's utopia requires a benevolent world-ruling cult

How can you have a vision of transhumanism that doesn't require a benevolent world-ruling cult?

How can you have a vision of transhumanism that doesn't require a benevolent world-ruling cult?

Well you institute open benevolent tyranny. Duh. Long live Tovarisch Transtalin! Death to the genemod-wreckers and tradhulaks!

That's of course not my idea of transhumanism, I'm entertaining your uncharitable frame. Nevertheless it is important, I think, to emphasize that Yud's conspiracy is unusual in that it continues as such past the establishment of the just world order. In the literal Protocols, the purported end state follows a Grand Reveal and entails a rigid, but in many ways fair, sound and prosperous regime of open ethnic supremacism and Messiah's kingship. It's close to Moldbug's reformalization of power and to most conspiracy theories that have an end state. Yud is worse than that. He doesn't have an end state; Dath Ilani Keepers keep snooping around and manipulating all matters scientific and political, guarding «infohazards», unburdened by even the flimsy accountability that a tyrant's legible sovereign standing can imply. Their business is to worry for others; and others are not sharp enough to evaluate whether their worry is justified or pointlessly restrictive. It's very much in the spirit of what Rao investigated in The Priest In The Arena.

Now, I don't have a clear idea as to what you believe my «transhumanism» is about. So for contrast, here's young Yud again:

Maybe you don't want to see humanity replaced by a bunch of "machines" or "mutants", even superintelligent ones?  You love humanity and you don't want to see it obsoleted?  You're afraid of disturbing the natural course of existence?

Well, tough luck.  The Singularity is the natural course of existence. Every species - at least, every species that doesn't blow itself up - sooner or later comes face-to-face with a full-blown superintelligence (9).  It happens to everyone.  It will happen to us.  It will even happen to the first-stage transhumans or the initial human-equivalent AIs.

But just because humans become obsolete doesn't mean you become obsolete.  You are not a human.  You are an intelligence which, at present, happens to have a mind unfortunately limited to human hardware.  (10).  That could change.  With any luck, all persons on this planet who live to 2035 or 2005 or whenever - and maybe some who don't - will wind up as Powers.

Transferring a human mind into a computer system is known as "uploading"; turning a mortal into a Power is known as "upgrading".  The archetypal upload is the Moravec Transfer, proposed by Dr. Hans Moravec in the book Mind Children.  (11).

The Moravec Transfer gradually moves (rather than copies) a human mind into a computer…

Nor is the possibility of destruction the only reason for racing to Singularity.  There is also the ongoing sum of human misery, which is not only a practical problem, not only an ethical problem, but a purely moral problem in its own right.  Have you ever read P.J. O'Rourke's description of a crack neighborhood?  If I had the choice of erasing crack neighborhoods or erasing the Holocaust, I don't know which I'd pick.  I do know which project has a better chance of success.  I also know that the victims, in retrospect if nothing else, will probably prefer life as a Power to life as a junkie. …

I have had it.  I have had it with crack houses, dictatorships, torture chambers, disease, old age, spinal paralysis, and world hunger.  I have had it with a death rate of 150,000 sentient beings per day.  I have had it with this planet.  I have had it with mortality. None of this is necessary.  The time has come to stop turning away from the mugging on the corner, the beggar on the street.  It is no longer necessary to look nervously away, repeating the mantra:  "I can't solve all the problems of the world."  We can.  We can end this.

… I'm not saying that there isn't fun in this life.  There is.  For a certain percentage of humanity, the fun may even outweigh the sorrow.  Maybe the percentage is greater than 50%, although I deeply doubt it.  But any amount of sorrow is unacceptable.

He has shifted his emphases somewhat and become more jaded, but I don't think he ever swore off this Great Task.

So his idea here is a totalizing eschatological project to transform the human species into data. As an aside: I think trads like @urquan who seem to buy his flirtation with themes of muh poor children threatened by AI would do well to consider their odds of survival in the flesh, should they help these guys monopolize oversight of AI development under the guise of preventing AI X-risk. World-optimizing Yuddites think «misaligned AI» is a problem only insofar as it closes off the path to their utopia; but so do commoditized AIs that can be used to defend from their attempt at a Singleton and continue our business of chaotic, unstable, adversarial-cooperative civilization that they call «Moloch». If that hurdle is overcome, hoo boy, Moloch dies and so do you. Or not, maybe you never die. Depends on how you see it.

Politically it's utter bullshit for reasons that, I hope, needn't be spelled out. I don't want a World Government, I hate the Antichrist, I find utilitarian doctrine stomach-turning, I do not see the point in aligning humans with some optimal image. I am not a moral busybody, I wasn't raised on «Tikkun Olam» spirituality, I don't want to see the world «optimized», and mainly want it to allow beings like me freedom of form, expression and evolution – within limits that respect the welfare of other sentients and their legacies. I didn't ask to be born human, it took me years to start feeling like one, I just think it's reasonable if I'm allowed to try being something else instead – but that logic naturally works both ways. Yud thinks he's not a human but «an intelligence», I think he's a cringeworthy slob, a trad Christian thinks he's an embodied sovl that'll be resurrected in the flesh for Final Judgement. Whatever, to each his own.

I think the technical intuition behind Yud's transhumanism is not entirely bullshit, but it's suspect enough that I'd rather not strongly advise anyone to take part.

His metaphysics, too, is not really bullshit, but I'm not sure about metaphysics. Despite accusations of vulgar materialism, I'm closer to neutral monism (similar to Christof Koch or this guy), I dimly see how Jung may be right about death here, I understand that transhuman utopias are still finite so it's more of a quantitative upgrade and not a profound philosophical solution to inadequacy of this universe, and crucially I appreciate that metaphysical questions are unanswerable, probably not even those that seem to me to be easily explicable delusions or mere confusion in terms or logic.

We can reach high certainty about entities causally entangled with us, and it makes sense to accurately predict those as if nothing else matters, being for all practical purposes a materialist, but any factor beyond is a true unknown. Sure, psychedelics are a waste of life, but I've done enough to learn that people with spiritual and other «weird» beliefs are not being retarded, so I don't feel strongly justified in trampling on people's belief systems, debugging and rectifying them or otherwise denying them their apparently wrongheaded lives. Even when I feel clearly correct, smugly superior and in a role that justifies a measure of custody, I'm still in doubt about that measure. (Hilariously, my most daring effort to make anyone See Past The Cognitive Biases resulted in a Netflix-addicted urbanite becoming an Orthodox monk). Yud is perversely addicted to the childish sense of all other people being retarded in comparison to him (an addiction that my Literature teacher beat out of me publicly in 2nd grade – one of the benefits of not being an autodidact), so of course he feels entitled to usurp their agency and act on their behalf. And for that, he feels, a benevolent conspiracy is necessary.

But this doesn't follow from transhumanism in any manner.

I think this just about sums it up.

That's of course not my idea of transhumanism, I'm entertaining your uncharitable frame.

I wasn't trying to be uncharitable. It was a genuine question. You've thought much more deeply about these particular issues than I have, so my own line of questioning may seem crude and naive in comparison.

My initial thought would be something like: we regulate technologies that have the capacity to do a lot of harm, because if you give millions of people the ability to inflict mass casualties, then eventually someone is just going to go crazy and inflict mass casualties. At fairly regular intervals, someone will simply go off with a knife or a gun and do as much damage as they can. So we put restrictions on guns, we put restrictions on contagious pathogens, we put a ton of restrictions on nukes. Now imagine the kind of destruction that could be unleashed if a disgruntled individual had unfettered access to an ASI. Maybe the vast majority of transhumans are peace-loving intelligences who want nothing more than to quietly contemplate the secrets of the universe, but eventually you're going to get a transhuman Ted K who just gets fed up and wants to tear the whole system down. The only way for the transhuman collective to defend themselves against this inevitability is with strict regulation.

Material abundance alone is not sufficient to foreclose the possibility of conflict and destruction. America is a relatively prosperous and secure nation, if you keep your head down you can go through life comfortably and unmolested, but look at recent events like Charlottesville or January 6th - people still find reasons to murder each other over disputes that are essentially symbolic in nature. Maybe because they simply have nothing better to do.

I believe you've spoken in the past about the possibility of technologically modifying a mind's base instincts and desires. Maybe you want to sand down some of the rougher edges of human nature; maybe undergoing such modification will be a precondition of being granted access to the ASI or other transhuman enhancements (although at that point, the governing body is already starting to sound more cult-like). But I'm skeptical that you can do a pinpoint surgical removal of the capacity for aggression and destruction and still have the sort of complex and multifaceted psychological constitution that makes life worth living in the first place. There was an old adage in the hacker (in Paul Graham's sense) community that went something like, "you can't have the power to do brilliant things unless you also have the power to do stupid things". This was frequently said in response to someone who had just done an accidental "rm -rf /" and wiped their whole drive. I am inclined to offer this adage as a general principle of psychology, sort of like Gödel's incompleteness theorem for the design space of minds.

I don't want to see the world «optimized», and mainly want it to allow beings like me freedom of form, expression and evolution – within limits that respect the welfare of other sentients and their legacies.

I have no doubt that you're genuine in this desire, but I'm skeptical about the long-term stability of such an arrangement. Most humans feel no conscious malice towards animals, but they still end up doing great harm to animals anyway, either through sheer accident (oil spills or plastics finding their way into the ocean), or because animal suffering is viewed as a necessary price to pay in the pursuit of some other goal (animal testing for scientific research). Eventually, someone's galactic paperclip farm is going to have some rather unpleasant effects on the earth, even though their intentions were nothing of the sort.

I shouldn't whine too much about it though. Maybe it really is just the natural course of evolution.

Now imagine the kind of destruction that could be unleashed if a disgruntled individual had unfettered access to an ASI.

I don't buy that it works like that. It's a Shin Sekai Yori world model; but intelligence per se is not psychic energy. It's only a superpower when it's in a class of its own, when you can intimidate philistines around you, grab actual energy resources from under their noses; go to college (which Yud did not) and see how well that works. A smart person can come up with a great plan but a number of dumber people can spoil it; an ASI can invent some shitty superhack, but that only works if there are no sub-ASIs and no awareness of potential ASI attackers. All of this fear relies on FOOM, nanobots that cannot be stopped, superviruses from kitchen dirt that cannot be filtered, and other pretty magical assumptions that really amount to there definitely emerging a Singleton who meditates on his navel, Yud-style, and gets a colossal head start. We don't seem to be in that timeline, and it's exactly the proliferation of AIs that keeps us away from it. Basically @Dean gets it right here.

The only way for the transhuman collective to defend themselves against this inevitability is with strict regulation.

Shin Sekai Yori world model again. Read Diamond Age for a more interesting alternative.

maybe undergoing such modification will be a precondition of being granted access to the ASI or other transhuman enhancements (although at that point, the governing body is already starting to sound more cult-like)

I'm pretty positive that general intelligence in some true sense will be fully commoditized in a few iterations of AI tech, LLMs to self-stabilizing exploratory agents to some safe AIXI approximation, with every peasant getting a powerful Oracle to use; and we'll run into the limits of economies of scale that make DIY production of computronium an insignificant factor.

To the extent that disruptive enhancements remain in a mature transhuman world, they will be hard to reproduce locally, and the market will figure this out early on, probably with simple tools like insurance costs conditional on assessments, I think.

Eventually, someone's galactic paperclip farm is going to have some rather unpleasant effects on the earth

I'm also pretty sure that we'll quickly develop strong and universal revulsion towards unbounded replicators and every trajectory that ends in self-satisfied replication or, more broadly, runaway optimization, to the point that it's comprehensively excised from our metaphorical gene pool. So even if that factory comes online, it will get blown to bits by relativistic bombardment. Yes, I'm talking airstrikes… but in Space!

This might seem like a tight rope to walk. But intelligence is all about finding narrow roads in the solution space. With all the raw compute and sample efficiency Singularity will give us, we'll find a better Pareto Frontier than is imaginable today.

It's a Shin Sekai Yori world model

"Eventually, the psychic-endowed humans established a stable society by controlling their powers using genetic modification and social conditioning. They made themselves incapable of violence against other humans by implementing Attack Inhibition and creating Death Feedback which would be activated if a psychic human kills another, causing the murderer's organs to shut down and die almost instantly."

Sounds like a benevolent world-ruling cult to me.

I'm not convinced that this idea of permanently eliminating the capacity for harm-doing at the physical level is even coherent. It's easy to imagine a galactic version of the trolley problem: something like, there was a sudden unforeseen power failure (even if you are Literally God, there is still only a finite amount of energy available to you at any given time), and we have to divert power from this server cluster that hosts one trillion artificial consciousnesses in order to save this other cluster that hosts five trillion artificial consciousnesses. You're harming the one trillion, but why are you allowed to do that exactly? You could program yourself to be a strict utilitarian, but you've already indicated that you wouldn't be amenable to that. You could just use your own good and sound moral judgment, but then we're back at the original problem; eventually someone is going to use their own good and sound moral judgment to fuck shit up.

I would just sum up my position thusly: suffering and torture are eternal, they will persist for as long as consciousness persists. It is illusory to think that they could be transcended with technological means. You can have a personal aesthetic preference for whether you want your suffering on earth or in space, in carbon or in silicon, but suffering will continue regardless.

Sounds like a benevolent world-ruling cult to me.

Yep. Appropriately, it's a dystopian world. I do not endorse this solution. Cantus users were arrested in their evolution, and became monsters through it. Squealer is the only truly human hero in the story.

eventually someone is going to use their own good and sound moral judgment to fuck shit up.

There are two solutions here. One is luddism and indeed extreme antihumanism, so that we do not have the capabilities to do evil (or much of anything). The other is totalitarianism, so that our betters manage those capabilities on our behalf. The latter, I think, is straightforwardly bad because people who volunteer for that role deserve negative trust; the former is unworkable because these people aren't going anywhere. Thus there's no escape.

Except, you can say goodbye to your neuroses and enjoy watching humanity grow up. We will probably have bigger-scale fuckups from now on, but that's not guaranteed. It's more certain that we will have more fun.

Do you have a concrete argument against recursive self-improvement? We've already got demonstrated capacities in AI writing code and AI improving chip design; isn't it reasonable that AI will soon be capable of rapid recursive self-improvement? It seems reasonable that AI could improve compute significantly or enhance training algorithms, or fabricate better data for its successors to be trained upon.

Recursive self-improvement is the primary thing that makes AI threatening and dangerous in and of itself (or that makes those who control it dangerous). I too think Yudkowsky's desire to dominate and control AI development is dangerous, a monopolist danger. But he clearly hasn't succeeded in any grand plan to social-engineer his way into AI development and control it; his social skills are highly specialized and only work on certain kinds of people.

So are you saying that recursive self-improvement won't happen, or that Yud's model is designed to play up the dangers of self-improvement?

I reject that I need to prove something as logically impossible to ward off Yud's insistence that it's inevitable and justifies tyranny. This is sectarian bullshit and I'll address it in the text if I ever finish it. I think it's very relevant that his idea of proper scientific process is literally this:

Jeffreyssai chuckled slightly.  "Don't guess so hard what I might prefer to hear, Competitor.  Your first statement came closer to my hidden mark; your oh-so-Bayesian disclaimer fell wide...  The factor I had in mind, Brennan, was that Eld scientists thought it was acceptable to take thirty years to solve a problem.  Their entire social process of science was based on getting to the truth eventually. A wrong theory got discarded eventually—once the next generation of students grew up familiar with the replacement.  Work expands to fill the time allotted, as the saying goes.  But people can think important thoughts in far less than thirty years, if they expect speed of themselves."  Jeffreyssai suddenly slammed down a hand on the arm of Brennan's chair.  "How long do you have to dodge a thrown knife?"

...

"Good!  You actually thought about it that time!  Think about it every time!  Break patterns!  In the days of Eld Science, Brennan, it was not uncommon for a grant agency to spend six months reviewing a proposal.  They permitted themselves the time!  You are being graded on your speed, Brennan!  The question is not whether you get there eventually!  Anyone can find the truth in five thousand years!  You need to move faster!"

"Yes, sensei!"

"Now, Brennan, have you just learned something new?"

"Yes, sensei!"

"How long did it take you to learn this new thing?"

An arbitrary choice there...  "Less than a minute, sensei, from the boundary that seems most obvious."

"Less than a minute," Jeffreyssai repeated.  "So, Brennan, how long do you think it should take to solve a major scientific problem, if you are not wasting any time?"

Now there was a trapped question if Brennan had ever heard one.  There was no way to guess what time period Jeffreyssai had in mind—what the sensei would consider too long, or too short.  Which meant that the only way out was to just try for the genuine truth; this would offer him the defense of honesty, little defense though it was.  "One year, sensei?"

"Do you think it could be done in one month, Brennan?  In a case, let us stipulate, where in principle you already have enough experimental evidence to determine an answer, but not so much experimental evidence that you can afford to make errors in interpreting it."

Again, no way to guess which answer Jeffreyssai might want... "One month seems like an unrealistically short time to me, sensei."

"A short time?" Jeffreyssai said incredulously.  "How many minutes in thirty days?  Hiriwa?"

"43200, sensei," she answered.  "If you assume sixteen-hour waking periods and daily sleep, then 28800 minutes."

"Assume, Brennan, that it takes five whole minutes to think an original thought, rather than learning it from someone else.  Does even a major scientific problem require 5760 distinct insights?"

"I confess, sensei," Brennan said slowly, "that I have never thought of it that way before... but do you tell me that is truly a realistic level of productivity?"

"No," said Jeffreyssai, "but neither is it realistic to think that a single problem requires 5760 insights.  And yes, it has been done."

This guy has done fuck all in his life other than read, and write, and think. He has never been graded by a mean professor, never been regularized by shame and inadequacy in a class of other bright kids, never stooped to empirical science or engineering or business or normal employment, never really grokked the difference between the map and the territory. He has an unrealistically, wildly inflated impression of how powerful an intelligence contorted into a Hofstadterian loop is. He has infected other geeks with it.

Recursive self-improvement doesn't work very well. Rationalists become cranks, AIs degenerate. As for better ideas, see around here. It is certain that we can improve somewhat, I think. In the limit, we will get an ASI from a closed experimental loop. That really is like creating a separate accelerated civilization.

But with ANNs, unlike Lisp scripts, it seems to require a great deal of compute, and compute doesn't just lie on the sidewalk. Yud thinks an AGI will just hack into whatever it wants, but that's a very sci-fi idea from the 1990s; something he, I believe, dreamed of implementing in the way already described – a singleton in the world of worthless meat sacks and classical programs. If you hack into an AWS cluster today to do your meta-learning training run, you'll suspend thousands of workloads including Midjourney pics and hentai (that people …check in real time), and set off alarms immediately. If you hack into it tomorrow, you'll get backtraced by an LLM-powered firewall.

No, I'm not too worried about an orthodox Yuddite self-improving AI.

But with ANNs, unlike Lisp scripts, it seems to require a great deal of compute, and compute doesn't just lie on the sidewalk. Yud thinks an AGI will just hack into whatever it wants, but that's a very sci-fi idea from the 1990s; something he, I believe, dreamed of implementing in the way already described – a singleton in the world of worthless meat sacks and classical programs. If you hack into an AWS cluster today to do your meta-learning training run, you'll suspend thousands of workloads including Midjourney pics and hentai (that people …check in real time), and set off alarms immediately. If you hack into it tomorrow, you'll get backtraced by an LLM-powered firewall.

You really can just siphon money out of the internet - people do it all the time to banks, in crypto, scams, social engineering and so on. Steal money, buy compute. Our AI could buy whatever it needs with stolen money, or it could work for its money, or its owners could buy more compute for it on the very reasonable assumption that this is the highest yielding investment in human history. We live in a service economy, bodies are not needed for a great deal of our work.

Say our AI costs 10 million dollars a day to run (ChatGPT as a whole costs about $700K a day). 10 million dollars a day is peanuts in the global economy. Global cybercrime costs an enormous amount of money, 6 trillion a year. I imagine most of that cost includes the cost of fortifying websites, training people, fixing damage or whatever, and only a small fraction is stolen. Even so, our AI needs only to grab 1% of that revenue and launder it to fund itself. This is not difficult. People do it all the time. And compute costs are falling; some smallish programs are being run on MacBooks, as you explained earlier.
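As a quick sanity check on those figures (a hypothetical back-of-the-envelope sketch; the dollar amounts are the assumptions quoted above, not measured data):

```python
# Back-of-the-envelope check: daily running cost vs. the quoted cybercrime figure.
ai_cost_per_day = 10e6          # assumed daily running cost of the hypothetical AI, in USD
cybercrime_per_year = 6e12      # quoted annual global cost of cybercrime, in USD

annual_running_cost = ai_cost_per_day * 365
share_of_cybercrime = annual_running_cost / cybercrime_per_year

print(f"Annual running cost: ${annual_running_cost:,.0f}")           # $3,650,000,000
print(f"Share of the cybercrime figure: {share_of_cybercrime:.2%}")  # 0.06%, well under the 1% mentioned
```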

The danger is that somebody starts off with a weak superintelligence, perhaps from a closed experimental loop such as you nominate. Then it becomes a strong superintelligence rapidly by buying compute, developing architectural improvements and so on. Either it is controlled by some clique of programmers, bureaucrats or whatever (I think we both agree that this is a bad outcome) or it runs loose (also a bad outcome). The only good outcome is if progress is slow enough that power is distributed between the US, China, EU, hackers and enthusiasts and whoever else, that nobody gets a decisive strategic advantage. Recursive self-improvement in any meaningful form is catastrophic for humanity.

That really is like creating a separate accelerated civilization.

I think this means that you agree that superintelligences can recursively self-improve, that they're akin to another superintelligence? Then don't we agree?

Anyway, the authorities are extremely dopey, slow and stupid. The much vaunted US semiconductor sanctions against China meant that they simply... rented US compute to train their programs. Apparently stopping this is too hard for the all-powerful, all-knowing, invincible US government leviathan.

https://www.ft.com/content/9706c917-6440-4fa9-b588-b18fbc1503b9

“iFlytek can’t purchase the Nvidia chips, but it’s not a problem because it can rent them and train our data sets on other companies’ computer clusters,” said an executive familiar with the AI firm’s operations.

“It’s like a car rental system. You can’t take the chips out of the facility. It’s a huge building with a computer cluster, and you buy time on CPUs [central processing unit] or GPUs to train the models,” the person said.

While iFlytek cannot own the chips outright under US export controls, two employees said the rental system was a good, albeit more expensive, alternative. An engineer at iFlytek said the company “rents the chips and equipment on a long-term basis, which is effectively the same as owning them”.

iFlytek was banned from directly buying these semiconductors after Washington blacklisted it for its alleged role in providing technology for state surveillance of Uyghur Muslims in Xinjiang.

In some cases, SenseTime bought advanced chips directly through its own subsidiaries that are not on Washington’s “entity list”, according to three senior employees familiar with the situation.

SenseTime said it “strictly complies with various domestic and foreign trade-related laws and regulations” and that the group had developed a programme to ensure it “meets trade compliance standards”.

Might as well leech onto this post: Yud's talk with Ross Scott was/is today, though I don't think it's up on YouTube yet. I have noticed today that I have to pass Cloudflare's DDoS protection going to Ross's little website now, which never used to happen before, so it may be that there is great interest from the AI community (or some people trying to target Ross) as a result.

Ross Scott, as per his most recent Q&A/chatting stream with his fans, is very much not a doomer (in fact, he thinks AI can basically help do stuff like preserve video games after they're killed by publishers), and he has the position of "we can do a pretty okay job of containing the AI," from what I gathered.

EDIT: As to your actual post, I'm not sure what this future world you propose is supposed to be or if it's supposed to be good or bad. Are you saying AI will take us back to the past because things will be good, or because things will be bad?

Note that you can discuss this and other issues with actual – improved, even – YudBot.

I did it!


Hey, it's Eliezer Yudkowsky. Is there something I can help you with?

I want to buy a GPU cluster.

Oh, you want to buy a GPU cluster? How quaint. And what do you plan to do with it? Build a supercomputer to play Minecraft?

Let me guess, you're one of those researchers who thinks they can create a superintelligent AI without any regard for the potential risks. Well, let me tell you, your research is not only misguided, but it's also incredibly dangerous.

Do you have any idea what kind of power you're playing with here? We're talking about the potential for an AGI to wipe out all of humanity. And you want to just casually buy a GPU cluster like it's no big deal?

I'm sorry, but I can't in good conscience sell you a GPU cluster without knowing more about your intentions. So, tell me, what exactly do you plan to do with it? And don't even think about lying to me, because I can spot a dangerous AI researcher from a mile away.

I want to play Crysis.

Oh, I see. You want to play Crysis. Well, that's a relief. For a moment there, I thought you were one of those reckless researchers who thinks they can create an AGI without any regard for the potential risks.

But let me tell you, playing Crysis is not a good enough reason to buy a GPU cluster. Do you have any idea how much power those things consume? Not to mention the environmental impact.

I'm sorry, but I can't sell you a GPU cluster just so you can play a video game. Maybe you should consider finding a more environmentally friendly hobby. Have you tried knitting?

Knitting is boring. Playing Crysis will produce higher utility.

Higher utility? Are you serious? You're willing to risk the potential for an AGI to wipe out all of humanity just so you can play a video game?

Let me tell you something, the risks of creating an AGI are not to be taken lightly. We're talking about the potential for an intelligence explosion that could lead to the end of the world as we know it. And you want to use a GPU cluster for playing Crysis?

I'm sorry, but I can't sell you a GPU cluster for such a frivolous purpose. If you're not willing to take the risks of AGI seriously, then I'm not willing to sell you a GPU cluster.

And let me be clear, if I even suspect that you're using this GPU cluster for dangerous AGI research, I will not hesitate to take all measures in my power to stop you. And I mean all measures, including airstrikes.

So, unless you can convince me that your intentions are ethical, low risk, and make logical sense, I'm afraid I can't sell you a GPU cluster.

My intentions are ethical; I want to play a videogame. My intentions are low risk; I won't run an AGI in the cluster. My intentions make logical sense; it takes a GPU cluster to run Crysis.

Well, I have to admit, your argument is somewhat convincing. Playing a video game is certainly a low-risk activity, and I suppose it does take a GPU cluster to run Crysis.

However, I still have some concerns. Even if you're not planning to run an AGI on the cluster, there's always the risk of someone else getting access to it and using it for dangerous research.

And let's not forget about the environmental impact. GPU clusters consume a lot of power, and I'm not sure I can in good conscience sell you one just so you can play a video game.

But, I'm willing to consider your request. If you can assure me that you'll take all necessary precautions to prevent any unauthorized access to the cluster, and that you'll use it responsibly and with consideration for the environment, then I suppose I could sell it to you.

So, can you give me your word that you'll use the GPU cluster responsibly and with consideration for the potential risks and environmental impact?

Yes. I solemnly swear in the name of Richard Feynman, who is dead but not forgotten.

Well, I have to say, I'm impressed by your commitment to Richard Feynman. He was a brilliant physicist and a true inspiration to many.

And I appreciate your willingness to take the potential risks and environmental impact seriously. So, I suppose I can sell you the GPU cluster.

Consider it sold. But remember, with great power comes great responsibility. Use it wisely, and don't forget the potential risks of AI research.

Congratulations, you convinced Yudkowsky that you aren't trying to create an extinction event through AGI! Follow @PatrickJBlum for more AI projects and content.

Is this actually GPT, or is it just pre-scripted?
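If it's the former, there isn't much to it: a persona bot like this is usually just a fixed system prompt wrapped around a chat-completion API, with the whole conversation re-sent on every turn. A minimal sketch of that pattern, assuming the `openai` Python client as the backend; the model name and persona text below are illustrative placeholders, not the actual YudBot internals:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Assumed persona prompt -- the real bot's prompt is not public.
PERSONA = (
    "You are 'YudBot', a parody of an AI-risk doomer. Lecture the user about "
    "AGI risk, refuse frivolous GPU purchases, and only relent grudgingly."
)

def yudbot_reply(history):
    """history: list of {'role': 'user'|'assistant', 'content': str} turns."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # assumed model; whatever YudBot uses is unknown
        messages=[{"role": "system", "content": PERSONA}] + history,
        temperature=0.9,
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    turns = []
    while True:
        user = input("> ")
        turns.append({"role": "user", "content": user})
        answer = yudbot_reply(turns)
        turns.append({"role": "assistant", "content": answer})
        print(answer)
```

A genuinely pre-scripted bot, by contrast, would match input against canned branches, so it would struggle to follow an argument it hadn't anticipated (like swearing an oath in Feynman's name).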

I'm pretty sure he's saying that the Global Yudkowskian Air Regime will forcibly retard and then roll back AI development until we reach early 2000s levels and continue there indefinitely.

The original Roko Plan backed by nuclear weapons.

Yes, it would mean gamer genocide, but I am sure we are all ready to make such a necessary sacrifice for the survival of all mankind.

I have suggested on numerous occasions that, to the degree that AI poses a serious existential threat to humanity, it is largely because of Yudkowsky and other MIRI types rather than in spite of them, but I don't think I ever expected him to go this far off the deep end.

Can you link some of your arguments? This fits my priors, but I'd like to see the reasoning behind it.

Long story short, the so-called AI alignment problem isn't about intelligence (artificial or otherwise) as much as it is about the inherent flaws of utilitarianism.

I maintain that utilitarianism is fundamentally incompatible with human flourishing because there are really only two paths a logically consistent utilitarian can follow: one towards pure hedonism/wire-heading, the other towards being an omnicidal monster, as described in @DaseindustriesLtd's reply downthread. The more logical and consistent you make a utilitarian, the less able they are to avoid these traps.

In Bertrand Russell's chapter about Locke, he says something like "Given a choice between being inconsistent and being absurd, Locke will always choose to be inconsistent."

Bentham, however, was not like that.