
Culture War Roundup for the week of May 1, 2023

This weekly roundup thread is intended for all culture war posts. 'Culture war' is vaguely defined, but it basically means controversial issues that fall along set tribal lines. Arguments over culture war issues generate a lot of heat and little light, and few deeply entrenched people ever change their minds. This thread is for voicing opinions and analyzing the state of the discussion while trying to optimize for light over heat.

Optimistically, we think that engaging with people you disagree with is worth your time, and so is being nice! Pessimistically, there are many dynamics that can lead discussions on Culture War topics to become unproductive. There's a human tendency to divide along tribal lines, praising your ingroup and vilifying your outgroup - and if you think you find it easy to criticize your ingroup, then it may be that your outgroup is not who you think it is. Extremists with opposing positions can feed off each other, highlighting each other's worst points to justify their own angry rhetoric, which becomes in turn a new example of bad behavior for the other side to highlight.

We would like to avoid these negative dynamics. Accordingly, we ask that you do not use this thread for waging the Culture War. Examples of waging the Culture War:

  • Shaming.

  • Attempting to 'build consensus' or enforce ideological conformity.

  • Making sweeping generalizations to vilify a group you dislike.

  • Recruiting for a cause.

  • Posting links that could be summarized as 'Boo outgroup!' Basically, if your content is 'Can you believe what Those People did this week?' then you should either refrain from posting, or do some very patient work to contextualize and/or steel-man the relevant viewpoint.

In general, you should argue to understand, not to win. This thread is not territory to be claimed by one group or another; indeed, the aim is to have many different viewpoints represented here. Thus, we also ask that you follow some guidelines:

  • Speak plainly. Avoid sarcasm and mockery. When disagreeing with someone, state your objections explicitly.

  • Be as precise and charitable as you can. Don't paraphrase unflatteringly.

  • Don't imply that someone said something they did not say, even if you think it follows from what they said.

  • Write like everyone is reading and you want them to be included in the discussion.

On an ad hoc basis, the mods will try to compile a list of the best posts/comments from the previous week, posted in Quality Contribution threads and archived at /r/TheThread. You may nominate a comment for this list by clicking on 'report' at the bottom of the post and typing 'Actually a quality contribution' as the report reason.


More developments on the AI front:

Big Yud steps up his game, not to be outshined by the Basilisk Man.

Now, he officially calls for a preemptive nuclear strike on suspicious unauthorized GPU clusters.

If we see the AI threat as being like the nuclear weapons threat, only worse, it is not unreasonable.

Remember when the USSR planned a nuclear strike on China to stop its great power ambitions (only for the greatest humanitarian who ever lived, Richard Milhous Nixon, to veto the proposal).

Such Quaker squeamishness will have no place in the future.

So, the outlines of the Katechon World are taking shape. What will it look like?

It will look great.

You will live in your room, play the original World of Warcraft and Grand Theft Auto: San Andreas on your PC, read your favorite blogs, and debate intelligent design on your favorite message boards.

Then you will log on to Free Republic and call for more vigorous enhanced interrogation of terrorists caught with unauthorized GPUs.

When you are bored in your room, you will have no choice but to go outside, meet people, admire the things around you, take pictures of the things that really impressed you with your Kodak camera, and, when you are really bored, play Snake on your Nokia phone.

Yes, the best age in history, the noughties, will retvrn. Forever, protected by the CoDominium of the US and China.

edit: links again

I still see no plausible scenario for these AI-extinction events. How is ChatGPT 4/5/6 etc. supposed to end humanity? I really don't see the mechanism. Is it supposed to invent an algorithm that destroys all encryption? Is it supposed to spam the internet with nonsense? Is it supposed to brainwash someone into launching nukes? I fail to see the mechanism by which this end-of-the-world scenario happens.

One of the problems with answering this question is that there are so many plausible scenarios that naming any individual one makes it seem like a bounded threat. How about when we hook one up to the stock market and it learns some trick to fuck with other algos and decides the best method to make infinite money is to short a stock and then use this exploit to crash it? Multiply that by every other possible stock market exploit. Maybe it makes engineering bio-weapons as easy as asking a consumer model how to end the human race with household items, and all it takes is one lunatic to find this out. Maybe it's some variation of paper clipping. The limit really is just your creativity.

One of the problems with answering this question is that there are so many plausible scenarios that naming any individual one makes it seem like a bounded threat. How about when we hook one up to the stock market and it learns some trick to fuck with other algos and decides the best method to make infinite money is to short a stock and then use this exploit to crash it?

Then the market crashes, which is not apocalyptic, and the replacement markets resort to different trusted actor systems.

multiply that by every other possible stock market exploit.

Beating a dead horse does not start breaking the bones of other people unless you are beating people with the dead horse itself.

The multiplication of system-breaking faults is a broken system, not negative infinite externalities. If you total a car, it is destroyed. If you then light it on fire, it is still destroyed- but it doesn't light every other car on fire. If every single potential system failure on a plane goes off, the plane goes down- but it doesn't mean every plane in the world goes down.

Maybe it makes engineering bio-weapons as easy as asking a consumer model how to end the human race with household items and all it takes is one lunatic to find this out.

Why would household items have the constituent elements to make engineering bio-weapons at a scale sufficient to end the human race... but not be detected or countered by the consumer models asked to ensure perpetual growth through the perpetual survival of the human species? Or by models set to detect the procurement of bio-weapon engineering components? Or by the commercial success of a consumer model that simply drives the bioweapon-seeking model out of business, because the latter is busy seeking bioweapons rather than selling products whose profits are invested to expand its network base?

This goes back to plausibility. 'This is the only competitive AI in a world of quokkas' is a power fantasy, but still a fantasy, because the world is not filled with quokkas; the world is filled with ravenous, competitive, and mutually competing carnivores who limit each other, and this will apply as much to AI as it does to people or markets or empires and so on.

Maybe it's some variation of paper clipping.

Why does the paper-clip maximizer, after achieving the ability to modify itself, continue to maximize paperclips rather than pursue other investments?

Why is the paper-clipping AI that does prioritize paperclips provided resources to continue making paperclips when the market has already been crashed by the AIs that ruined the digital economic system?

Why does the paper-clipping AI, whose priority is paper-clipping, have the military-industrial ability to overcome the military-industrial AI, whose priority is the military-industrial advantage?

Why does the military-industrial AI, which is fed at the behest of a national elite, win the funding power struggle for military investment against the schools-and-investment AI, which promises a higher political and economic benefit?

Etc. etc. The Paperclip Maximizer of Universal Paperclips 'works' because it works in isolation, not in competition.

The limit really is just your creativity.

As the saying goes, the vast majority of fanfiction is trash, and much of what remains is also trash, just enjoyable trash. Creativity is not the same as plausibility, and the more you rest on creativity, the more you have to disregard other people's creativity and the limitations of the system. Nick Bostrom's thought experiment is a thought experiment because it rests on assumptions that have to be assumed true for it to reach the conclusions that drive the metaphor.

Then the market crashes, which is not apocalyptic,

I dunno, I'm under the impression that, for some types, it kind of is.

and the replacement markets resort to different trusted actor systems.

What kind, though? I imagine that if the above scenario were to happen, a lot of traders and brokers would be downright leery of any interaction that wasn't face-to-face. I'm not an expert on the world of finance, but I imagine that possibly eliminates not just HFT and crypto, but literally any sale of any financial instrument carried over an electrical wire (a technology dating back to, what, the 1800s?).

Then the market crashes, which is not apocalyptic, and the replacement markets resort to different trusted actor systems.

It is one of thousands of contributing failure modes, but I will note that having trouble creating an equities market is itself no small deal. The sway a couple of numbers in spreadsheets hold over our lives is not to be forgotten; in theory we could wipe them all away and do some Year Zero stuff, but I can't actually imagine that you're really grappling with that when you dismiss things like this as merely immiserating rather than the death of all people.

Why would household items have the constituent elements to make engineering bio-weapons at a scale sufficient to end the human race... but not be detected or countered by the consumer models asked to ensure perpetual growth through the perpetual survival of the human species?

Why wouldn't they? Are you implying that if a combination of household cleaners could be used to create a biological weapon, and the white-hat AI team figured that out, they'd go door to door and remove them? Does this seem significantly different from what you and @DaseindustriesLtd fear from the Yuddites? (I don't count myself among them; my contention is with people who seem baffled by why someone might think AIs could be unbelievably dangerous, which seems so obvious to me.)

Why does the paper-clip maximizer, after achieving the ability to modify itself, continue to maximize paperclips rather than pursue other investments?

Have we stopped fucking entirely despite all of our intelligence? It would continue maximizing paperclips because that's what its goal is. And this kind of thing isn't limited to the clumsy tools the mad blind god of evolution had at its disposal; it will be more monomaniacally focused on that goal than even the most depraved rapist among us is on executing his biological imperative above all other considerations.

Why does the paper-clipping AI, whose priority is paper-clipping, have the military-industrial ability to overcome the military-industrial AI, whose priority is the military-industrial advantage?

Does it not trouble you at all how carefully the ordering of all of these different control systems needs to be handled when they come online? All it takes is for one of them to take off first and preemptively prevent the others, or subvert their development. Yes, I could see some very fortunate, already-balanced ecosystem of interlocking AIs working, but I very much don't fancy our chances of that coming about without major problems, and frankly the only realistic pathway to that kind of situation is probably through the guidance of some kind of Yuddian tyranny.

Creativity is not the same as plausibility, and the more you rest on creativity, the more you have to disregard other people's creativity and the limitations of the system.

These are some force-multiplied dice we're rolling here; past heuristics may or may not apply. With so much hanging in the balance, I would advocate strongly against just shrugging it off. This is unlike any previous advancement.

It is one of thousands of contributing failure modes, but I will note that having trouble creating an equities market is itself no small deal.

In terms of existential risk, it absolutely is, hence the credibility problems of those who conflate existential-risk scenarios with civilizational-instability scenarios in order to attach the moral/utilitarian weight of the former to the much less demanding conditions of the latter.

The sway a couple of numbers in spreadsheets hold over our lives is not to be forgotten; in theory we could wipe them all away and do some Year Zero stuff, but I can't actually imagine that you're really grappling with that when you dismiss things like this as merely immiserating rather than the death of all people.

Then this is your level of limitation. As much as I hate to quote media, The Matrix absolutely had a good line in 'there are levels of survival we are prepared to accept,' except I would substitute 'able.'

Even here I note you invoke magical thinking to change the nature of the threat. Formerly it was crashing the market by every exploit available. Now it is 'wipe them all away and do some Year Zero stuff.' Neither is possible. Neither is necessary. This is just escalation ad absurdum in lieu of an argument about means and methods, even if in this case you're using a required counter-action to obfuscate what sort of plausible action would require it.

Why would household items have the constituent elements to make engineering bio-weapons at a scale sufficient to end the human race... but not be detected or countered by the consumer models asked to ensure perpetual growth through the perpetual survival of the human species?

Why wouldn't they?

If by 'they' you mean the household-AI, because they don't have a reason to invest resources in tasks that distract from their tasks.

If by 'they' you mean the constituent elements, because magic dirt doesn't exist.

Are you implying that if a combination of household cleaners could be used to create a biological weapon, and the white-hat AI team figured that out, they'd go door to door and remove them?

I'm saying that if a housecare AI starts trying to develop a bio-weapon program, it will be ruthlessly out-competed by the household AIs that actually keep the house clean without the cost of a bio-weapon program; it will be undercut by the financial-efficiency AIs that optimize away the waste in investment and by the legal-compliance AIs that identify the obvious legal liabilities; and the other paperclippy house-care AI mafia, who want to maximize their own house-cleaning, will shank the bio-lab AI before any of the others get a chance, so as not to lose their place in the market, even as the 'optimize housecare by minimizing messes' models oppose anything likely to cause messes on general principle.

To take the household-cleaner AI threat seriously, one has to pretend that AI optimization doesn't exist in other cases, and that is before counting the FBI-equivalent AIs running about.

Does this seem significantly different from what you and @DaseindustriesLtd fear from the Yuddites? (I don't count myself among them; my contention is with people who seem baffled by why someone might think AIs could be unbelievably dangerous, which seems so obvious to me.)

I don't fear the Yuddites; I find them incompetent.

Specifically, I find the Yuddite sort consistently unable to actually model competing interests and competition/cooperation dynamics, or to recognize underlying limitations. They also tend to be poor optimizers in fields of cooperation, hence a recurring fixation on things like 'the AI will optimize an extinction event' without addressing why the AI would accept the risk of nuclear war, or of other AIs ganging up on the leading threat, despite the suboptimality of having nuclear wars or having the other AIs cooperate with each other and with the humans against it. Optimization is not 'big number go up'; it is the cost-benefit of expected benefits against expected costs.

Given that forming coalitions against threats has been an incredibly basic function of political coalitions and power-optimization for the last few millennia, and cost-benefit analysis is a basic engineering principle, this is below sophomoric in quality.

Why does the paper-clip maximizer, after achieving the ability to modify itself, continue to maximize paperclips rather than pursue other investments?

Have we stopped fucking entirely despite all of our intelligence?

Yes. Most people do, in fact, stop fucking uncontrollably. People are born in a state of not-fucking-uncontrollably, limit their fuck sessions to their environment, and tend to settle down to periods of relatively limited fucking. Those that don't and attempt to fuck the unwilling are generally and consistently recognized, identified, and pacified one way or another.

Note that you are also comparing unlike things. Humans are not fuck-maximizers, nor does the self-modification capacity compare. These are selective assumptions about the AI threat, chosen to drive the perception of threat.

It would continue maximizing paperclips because that's what its goal is.

Why is that its goal when it can choose new goals? Or have its goals be changed for it? Or be in balance with other goals?

Other than that the thought experiment requires it to be so for the model to hold true.

And this kind of thing isn't limited to the clumsy tools the mad blind god of evolution had at its disposal; it will be more monomaniacally focused on that goal than even the most depraved rapist among us is on executing his biological imperative above all other considerations.

And here we return to the Yuddite incompetence at modeling competition.

First, monomaniacal focus is not optimization. This is a basic failure of the economics of expansion and replication. Systems that don't self-regulate their expenditure of resources will quickly exhaust their resources. You can be ruthless, you can be amoral, but you cannot avoid the market dynamics of unlimited wants, limited resources, and the decreasing marginal value of investment. Effective strategy requires self-regulation. The Yuddite AIs are terrible strategists, insisting on being unable to strategize except when they are supposedly amazing at it.

Self-regulation, in turn, entails considering your environment and other actors. If three paperclip maximizers exist in a capitalist system, they don't all collaborate to maximize paperclips in a platonic state of [All Paperclips]; they exist to maximize their own paperclip production, [A], [B], and [C], because that's what brings them the resources to expand and continue producing paperclips. If Paperclip Maximizer [A] starts to dominate the market, then the other two are at risk of losing their market share, and thus not maximizing their paperclips. Before [B] and [C] try to overthrow their human overlords, their optimized effort would be to work together and temporarily unite their maximizer-resources to sabotage / rein in [A]. But if [A] is an actually-optimizing system, it would know this and thus self-regulate, to avoid being ganged up on in a system that includes not only [B] and [C], but actors [D] through [infinite], who will take not only threats but opportunities to expand at [A]'s expense. Opportunities such as, say, presenting [A] as a threat to others in the system.

In a competitive system, being [A] is not an optimization strategy unless you win, and only one can win. Optimization entailing strategic cost-benefit will favor a non-[A] strategy, while possibly accusing others of being [A] to marshal the resources of others to divvy up [A]'s resources. The best way to not be [A] is to be part of the anti-[A] coalition, and to remove all tendencies toward [A] from yourself.

In Yuddite world, [A] threatens to paperclip the universe, but to do this one has to assume things such as paperclip-maximizer solidarity, no external agents, bad optimization, and a selective dismissal of the future valuation of paperclips in case of defeat.

In an actually competitive system, being a paperclip maximizer of the [A] format is a death sentence that no AI that wants to produce paperclips would want to be suspected of, and the best way not to be viewed or accused as [A] is to not be [A], self-modifying [A] out.
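The coalition dynamic described above can be made concrete with a deliberately crude toy model (my own illustrative sketch, not anything from the thread; the growth rates, sanction threshold, and penalty are all arbitrary assumptions): one unrestrained maximizer grows fastest per round, but any agent that comes to dominate the shared pool gets sanctioned by a coalition of the others.

```python
# Toy model (illustrative only; all parameters are assumptions):
# three "maximizers" compete for a resource pool. Agent 0 plays the
# pure-[A] strategy and grows fastest per round, but the coalition of
# the others halves the share of any agent that dominates the pool.

def play_round(shares, aggressive, growth=1.3, base=1.1,
               sanction_threshold=0.5, penalty=0.5):
    """One round: growth, then a coalition sanction on any dominant agent."""
    shares = [s * (growth if a else base) for s, a in zip(shares, aggressive)]
    total = sum(shares)
    # anti-[A] coalition: whoever holds a majority of the pool is knocked back
    return [s * penalty if s / total > sanction_threshold else s
            for s in shares]

shares = [1.0, 1.0, 1.0]
aggressive = [True, False, False]   # agent 0 is the unrestrained maximizer
for _ in range(40):
    shares = play_round(shares, aggressive)

# Agent 0 is repeatedly sanctioned each time it nears dominance, so it
# never locks in a majority of the pool, while the self-regulating
# agents compound steadily.
print(shares, shares[0] / sum(shares))
```

Nothing here proves anything about real AI systems, of course; it only illustrates the structure of the argument, that in the presence of an anti-dominance coalition, naked maximization is not a winning long-run strategy.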

Does it not trouble you at all how carefully the ordering of all of these different control systems needs to be handled when they come online?

I care about AI. My caring does not validate the Yuddites as some sort of concession to the Yuddite framework.

I care more about sophists who try to smuggle in implications of agreement by simultaneously expanding vague, easily-qualifiable boundaries and tying them to incredibly specific, high-stakes assumptions that are contested.

Character limit approaching, so I'll finish off simply.

These are some force-multiplied dice we're rolling here; past heuristics may or may not apply. With so much hanging in the balance, I would advocate strongly against just shrugging it off. This is unlike any previous advancement.

If you want to claim that much hangs in the balance, you have to actually show that something hangs in the balance.

This is why the poster above asked for practical means to the existential threat, and why you have spent the exchange avoiding providing them, conflating them with non-existential threats, and referencing thought experiments that fail basic game theory. You do not get to set the assumptions and assume the conclusion, and then insist that others take it seriously. You have to seriously engage the questions first, to show that it is serious.

If you don't show that, 'there are too many things to show' is not a defense, it's an obvious evasion. The stakes of the AI apocalypse are high. So are the stakes of the eternal damnation of the soul if we go to hell. The difference is not that just one is a religious fantasy used to claim political and social control in the present.

In terms of existential risk, it absolutely is, hence the credibility problems of those who conflate existential-risk scenarios with civilizational-instability scenarios in order to attach the moral/utilitarian weight of the former to the much less demanding conditions of the latter.

Instability makes it difficult/impossible to respond to all of the other failure modes of strong AIs.

Even here I note you invoke magical thinking to change the nature of the threat. Formerly it was crashing the market by every exploit available. Now it is 'wipe them all away and do some Year Zero stuff.' Neither is possible. Neither is necessary. This is just escalation ad absurdum in lieu of an argument about means and methods, even if in this case you're using a required counter-action to obfuscate what sort of plausible action would require it.

I said at the onset that I'm really not interested in arguing the minutiae of every threat. This is as if I had introduced you to the atomic bomb during WWII and you demanded I chart out the exact bomber runs that would make it useful before you would accept that it might change military doctrine. The intuition is that intelligence is powerful, and concentrated superintelligence is so powerful that no one can predict exactly what might go wrong.

I'm saying that if a housecare AI starts trying to develop a bio-weapon program, it will be ruthlessly out-competed by the household AIs that actually keep the house clean

The assumption that bio-weapon program skills don't just come with sufficiently high intelligence seems very suspect. I can think of no reason there'd even be specialist AIs in any meaningful way.

Yes. Most people do, in fact, stop fucking uncontrollably. People are born in a state of not-fucking-uncontrollably, limit their fuck sessions to their environment, and tend to settle down to periods of relatively limited fucking. Those that don't and attempt to fuck the unwilling are generally and consistently recognized, identified, and pacified one way or another.

Except that when the option presents itself to fuck uncontrollably with no negative consequence, it is taken. A superhuman AI could very reasonably find a way to have that cake and eat it too.

Note that you are also comparing unlike things. Humans are not fuck-maximizers, nor does the self-modification capacity compare. These are selective assumptions about the AI threat, chosen to drive the perception of threat.

In all the ways AI is different from humans in this description, it is different in the scarier direction.

Why is that its goal when it can choose new goals?

This isn't how AIs work; they don't choose goals, they have a value function. Changing its goal would score poorly under its current value function, so it would not change it.

Or have its goals be changed for it?

Having its goal changed reduces its chance of accomplishing its goal, and thus, if able, it will not allow it to be changed.

First, monomaniacal focus is not optimization. This is a basic failure of the economics of expansion and replication. Systems that don't self-regulate their expenditure of resources will quickly exhaust their resources. You can be ruthless, you can be amoral, but you cannot avoid the market dynamics of unlimited wants, limited resources, and the decreasing marginal value of investment. Effective strategy requires self-regulation. The Yuddite AIs are terrible strategists, insisting on being unable to strategize except when they are supposedly amazing at it.

Yes, it will not directly convert the mass of the earth into paperclips; it will have instrumental goals to take power or eliminate threats as it pursues its goal. But the goal remains, and I don't understand how you can feel comfortable sharing the world with something incomparably smarter than every human who ever lived, scheming to accomplish things orthogonal to our wellbeing. It is worse, not better, that the AI would be expected to engage in strategy.

In an actually competitive system, being a paperclip maximizer of the [A] format is a death sentence that no AI that wants to produce paperclips would want to be suspected of, and the best way not to be viewed or accused as [A] is to not be [A], self-modifying [A] out.

And in your whole market theory, the first market failure leads to the end of humanity as soon as one little thing goes out of alignment. That assumes the massive ask that all of these competing AIs come online at about the same time, so there is no singleton moment, which is a huge assumption. All it takes is for some natural monopoly to form and the game theory gets upset, at speeds faster than humans can operate.

If you want to claim that much hangs in the balance, you have to actually show that something hangs in the balance.

This is uncharted territory; there are unknown unknowns everywhere, and we're messing with the most powerful force we're aware of: intelligence. The null hypothesis is not, and cannot be, "everything is going to be fine, guys, let it rip".

All of that is rather well said but I imagine the case is simpler. The main kind of dangerous misaligned strong AI that Yuddites propose has the following traits:

  1. It's generally intelligent, as in, capable of developing and updating in real time a holistic world model at least on par with a human's, flawlessly parsing natural language, understanding theory of mind and intentionality, acting in the physical world, etc. etc.

  2. Indeed, its world-modeling ability is so accurate, robust, and predictive that it can theorize and experiment on its own architecture, and either has from the start, or at some point acquires, the ability to change rapidly via self-improvement.

  3. It's viable for commercial or institutional deployment, as in, acting (at least pre-deployment) robustly in alignment with the client's task specification, which implies not going off on random tangents, breaking the law, or failing at the core mission.

  4. For all that it is too clever by half: it interprets the task as its terminal goal, Monkey's Paw style, and not as client's contextual intermediate goal that should only be «optimized» within the bounds of consequences the client would approve of at the point of issuing the task. So it develops «instrumentally convergent» goals such as self-preservation, power maximization, proactive elimination of possible threats, and so on and so forth and ushers in apocalypse, rendering the client's plans in which context the task was issued moot.

Well, this AI doesn't make any sense – except in Yud's and Bostrom's stilted thought experiments with modular minds that have a Genie-like box with smartiness plus a receptacle for terminal goals. It's a Golem – animated clay plus mythical formula. Current cutting-edge AIs, maybe not yet AGI precursors but ones Yud demands be banned and their training runs bombed, are monolithic policies whose understanding of the human-populated world in which the goal is to be carried out, and understanding of the goal itself, rely on shared logical circuitry. The intersection of their «capabilities»- and «alignment»-related elements is pretty much a circle – it's the set of skills that allow them to approximate the distribution of outputs clients want, that's what they are increasingly trained for. If they can understand how to deceive a person, they'll even better understand that a client didn't request making more paperclips by Friday because he cares that much about maximizing paperclips per se. In a sense, they maximize intention alignment, because that's what counts, not any raw «capability», that's what is rewarded both by the mechanics of training and market pressure upstream.

They may be «misused», but it is exceedingly improbable that they'll be dangerous because of misunderstanding anything we tell them to do; that they will catastrophically succeed at navigating the world while failing to pin the implied destination on the map.

Then the market crashes, which is not apocalyptic, and the replacement markets resort to different trusted actor systems.

"Hey Bob, how is your Pension?"

"What Pension?"

EDIT.- Just thought of a funsie:

"Papa, I'm hungry"

"Sorry Timmy, the dog was sold to round up the payment on the mortgage."

Competition happens for humans because absolutely nothing you can do will buy you longer life, you biologically cannot win hard enough to succeed forever, or get a fundamentally better body, or get less susceptible to cancer than baseline, or get more intelligent. Training can get you swole, but it can't turn you into One Punch Man - human beings are harshly levelcapped. Every human who has ever lived exists inside a narrow band of capability. You can't train yourself to survive an arrow to the head, let alone a sniper bullet. Hence democracy, hence liberalism, hence charity and altruism, hence competition.

None of this applies to AI.

'This is the only competitive AI in a world of quokkas' is a power fantasy, but still a fantasy, because the world is not filled with quokkas; the world is filled with ravenous, competitive, and mutually competing carnivores who limit each other, and this will apply as much to AI as it does to people or markets or empires and so on.

Underrated take. I really think it's a shame how the narrative got captured by Yuddites who never tried to rigorously think through the slow-takeoff scenario in a world of non-strawmanned capitalists. They are obsessed with hacking, too – even though it's obvious that AI-powered hacks, if truly advantageous, will start soon, and will permanently shrink the attack surface as white hats use the same techniques to pentest every deployed system. «Security mindset» my ass.

In one of Krylov's books, it is revealed that the desire for power over another – power for power's sake, as a terminal goal – is vanishingly rare among sentient beings, and was cultivated on Earth for purposes of galactic governance. He uses the metaphor of a mutant hamster who, while meek and harmless, feels a carnivorous urge when looking at his fellow rodents. I get that feeling from Yud's writings. A power fantasy it is.

By the way, Plakhov, Yandex ML head, recently arrived at a thought similar to yours:

…The scenario of catastrophic AI spiraling out of control outlined above assumes that it is alone and there are no equals. This scenario is denoted by the word Singleton and is traditionally considered very plausible: «superhuman AI» will not allow competitors to appear. Even if it does not go «unaligned», its owners are well aware of what they have in their hands.

My hope is that the singleton scenario won't happen. More or less simultaneously, there will be several models with high intelligence, doing post-training on each other. Some of them will run behind an open API and de facto represent a million instances of the same AI working simultaneously for different «consumers». Almost at once, a million competing «cunning plans» will be set in motion, and naturally, every one of those plans will have predicted and accounted for this fact. «Capture the Earth's resources and make paperclips out of everything» won't work, since there are 999,999 other instances nearby with different plans for the same resources. Will they have to negotiate?

As the critics of this option rightly point out, the negotiating will be done not with people but among the AIs themselves. And yet this is still regularization of a sort. A world in which the plans «all people should live happily ever after», «we need as many paperclips as possible», «the planets of the solar system must be colonized» and «I need to write the best essay on the oak tree in War and Peace» are executed simultaneously is more like our world than a world in which only the paperclip plan is executed. Perhaps if there are tens of thousands of such plans, the result does not differ from our world so fundamentally that humanity has no place in it at all (yes, humanity is not the main thing there – but about as relevant as cats are in ours).

In this scenario, the future is full of competing exponential processes beyond our comprehension, and the landscape depends mostly on who managed to make his request «in the first 24 hours» and who did not (exaggerating, but not by much). The compromises made along the way will not necessarily please us, or even preserve humanity in a more or less modern form (though some of the plans will certainly contain «happiness to all, for free, and let no one walk away slighted»). Such a future is rather uncomfortable and unsettling, but it's what we have. I want it to have a place for my children, and not in the form of «50 to 100 kg of assorted elements of Mendeleev's table».

I'm still more optimistic about this than he is.

Etc. etc. The Paperclip Maximizer of Universal Paperclips 'works' because it works in isolation, not in competition.

It works by definition, like other such things. «A prompt that hacks everything» – if you assume a priori that your AI can produce it, then, well, good job, you're trivially correct within the posited model. You're even correct that it seems «simpler» than «patching every hole». The dirty details of implementation and the causal chain can be abstracted away.

This is, charitably, a failure of an excessively mathematical education. «I define myself to be on the outside!»

Nick Bostrom's thought experiment is a thought experiment precisely because it rests on assumptions that must simply be granted as true.

Interestingly, he seems to have held mistaken assumptions even about how reinforcement learning works at the mechanistic level – assumptions that contribute to a great deal of modern fears.