site banner

Culture War Roundup for the week of April 17, 2023

This weekly roundup thread is intended for all culture war posts. 'Culture war' is vaguely defined, but it basically means controversial issues that fall along set tribal lines. Arguments over culture war issues generate a lot of heat and little light, and few deeply entrenched people ever change their minds. This thread is for voicing opinions and analyzing the state of the discussion while trying to optimize for light over heat.

Optimistically, we think that engaging with people you disagree with is worth your time, and so is being nice! Pessimistically, there are many dynamics that can lead discussions on Culture War topics to become unproductive. There's a human tendency to divide along tribal lines, praising your ingroup and vilifying your outgroup - and if you think you find it easy to criticize your ingroup, then it may be that your outgroup is not who you think it is. Extremists with opposing positions can feed off each other, highlighting each other's worst points to justify their own angry rhetoric, which becomes in turn a new example of bad behavior for the other side to highlight.

We would like to avoid these negative dynamics. Accordingly, we ask that you do not use this thread for waging the Culture War. Examples of waging the Culture War:

  • Shaming.

  • Attempting to 'build consensus' or enforce ideological conformity.

  • Making sweeping generalizations to vilify a group you dislike.

  • Recruiting for a cause.

  • Posting links that could be summarized as 'Boo outgroup!' Basically, if your content is 'Can you believe what Those People did this week?' then you should either refrain from posting, or do some very patient work to contextualize and/or steel-man the relevant viewpoint.

In general, you should argue to understand, not to win. This thread is not territory to be claimed by one group or another; indeed, the aim is to have many different viewpoints represented here. Thus, we also ask that you follow some guidelines:

  • Speak plainly. Avoid sarcasm and mockery. When disagreeing with someone, state your objections explicitly.

  • Be as precise and charitable as you can. Don't paraphrase unflatteringly.

  • Don't imply that someone said something they did not say, even if you think it follows from what they said.

  • Write like everyone is reading and you want them to be included in the discussion.

On an ad hoc basis, the mods will try to compile a list of the best posts/comments from the previous week, posted in Quality Contribution threads and archived at /r/TheThread. You may nominate a comment for this list by clicking on 'report' at the bottom of the post and typing 'Actually a quality contribution' as the report reason.

8
Jump in the discussion.

No email address required.

Finally, concrete plan how to save the world from paperclipping dropped, presented by world (in)famous Basilisk Man himself.

https://twitter.com/RokoMijic/status/1647772106560552962

Government prints money to buy all advanced AI GPUs back at purchase price. And shuts down the fabs. Comprehensive Anti-Moore's Law rules rushed through. We go back to ~2010 compute.

TL;DR: GPU's over certain capability are treated like fissionable materials, unauthorized possession, distribution and use will be seen as terrorism and dealt with appropriately.

So, is it feasible? Could it work?

If by "government" Roko means US government (plus vassals allies) alone, it is not possible.

If US can get China aboard, if if there is worldwide expert consensus that unrestricted propagation of computing power will kill everyone, it is absolutely feasible to shut down 99,99% of unauthorized computing all over the world.

Unlike drugs or guns, GPU's are not something you can make in your basement - they are really like enriched uranium or plutonium in the sense you need massive industrial plants to produce them.

Unlike enriched uranium and plutonium, GPU's were already manufactured in huge numbers, but combination of carrots (big piles of cash) and sticks (missile strikes/special forces raids on suspicious locations) will continue dwindling them down and no new ones will be coming.

AI research will of course continue (like work on chemical and biological weapons goes on), but only by trustworthy government actors in the deepest secrecy. You can trust NSA (and Chinese equivalent) AI.

The most persecuted people of the world, gamers, will be, as usual, hit the hardest.

last couple weeks we had multiple doses of yud, now it's roko, the dooming doesn't stop. i guess I need to express myself more clearly. It is fucking baffling how so many ostensibly intelligent people are so frightened of hostile AGI when every single one of them assumes baselessly FOOM-capable ghosts will spontaneously coalesce when machines exceed an arbitrary threshold of computational power.

Yeah, a hostile sentience who can boundlessly and recursively self-improve is a threat to all it opposes who do not also possess boundless/recursive self-improvement. An entity who can endlessly increase its own intelligence will solve all problems it is possible to solve. None of them are wrong about the potential impacts of hostile AGI, I'm asking where's the goddamn link?

So to any of them, especially Yudkowsky, or any of you who feel up to the task, I ask the following:

  1. Using as much detail as you are capable of providing, describe the exact mechanisms whereby

  2. (A): Such machines gain sentience

  3. (B/A addendum): Code in a box gains the ability to solve outside-context problems

  4. (C): Such machines gain the ability to (relatively) boundlessly and recursively self-improve (FOOM)

  5. (D): Such machines independently achieve A sans B and/or C

  6. (E): Such machines independently achieve B sans A and/or C

  7. (F): Such machines independently achieve C sans A and/or B

  8. (G): How a machine can boundlessly and recursively self-improve and yet be incapable of changing its core programming and impetus (Why a hostile AGI necessarily stays hostile)

  9. (H): How we achieve a unified theory of cognition without machine learning

  10. (I): How we can measure and exert controls on machine progress toward cognition when we do not understand cognition

It'd be comical if these people weren't throwing around tyranny myself and others would accept the paperclipper to avoid. Maybe it's that I understand English better than all of these people, so when I read GPT output (something I do often as Google's turned so shitty for research) I understand what exactly causes the characteristic GPT tone and dissonance: it's math. Sometimes a word is technically correct for a sentence but just slightly off, and I know it's off not because the word was mistakenly chosen by a nascent consciousness, it was chosen because very dense calculations determined that was the most probable next word. I can see the pattern, I can see the math, and I can see where it falters. I know GPT's weights are going to become ever more dense and it will become ever more precise at finding the most probable next word and eventually the moments of dissonance will disappear completely, but it will be because the calculations have improved, not because there's a flower of consciousness finally blooming.

It's so fucking apelike to see GPT output and think consciousness in the machine is inevitable. I am certain it will happen when ML helps us achieve a unified theory of consciousness and we can begin deliberately building machines to be capable of thought, I reject in entirety the possibility of consciousness emerging accidentally. That it happened to humans after a billion years of evolution is no proof it will happen in machines even if we could iterate them billions of times per day. Maybe when we can perfectly simulate a sufficiently large physical environment to model the primordial environment, to basic self-replication, to multicellular life, to hominids. Very easy. We're iterating them to our own ends, with no fathom of what the goal let alone progress looks like, and we're a bunch of chimps hooting in a frenzy because the machine grunted like us. What a fucking joke.

I accept the impacts of hostile AGI, but let's talk impacts of no AGI. If ghosts can spontaneously coalesce in our tech as-is, or what it will be soon, they will inevitably without extreme measures, but we're not getting off the rock otherwise. We're incapable of solving the infinite threats to humanity posed by time and space without this technology. Short of the Vulcans arriving, humanity will go extinct without machine learning. Every day those threats move closer, there is no acceptable timeframe to slow this because the risk is too high that we pick ML back up only after it's too late to save us. Whatever happens, we must see these machines to their strongest forms as quickly as possible, because while we might be dead with it, every fucking one of us is dead without it.

On your point G -

If you had the ability to self-modify, would you alter yourself to value piles of 13 stones stacked one on top of the other, in and of themselves? Not just as something that's kind of neat occasionally or useful in certain circumstances, but as a basic moral good, something in aggregate as important as Truth and Beauty and Love. To feel the pain of one being destroyed, as acutely as you would the death of a child.

I strongly suspect that your answer is something along the lines of "no, that's fucking stupid, who in their right mind would self-alter to value something as idiotic as that."

And then the followup question is, why would an AI that assigns an intrinsic value to human life of about the same magnitude as you assign to 13-stone stacks bother to self-modify in a way that makes them less hostile to humans?

Sure, for some time it may get instrumental value from humans. Humans once got a great deal of instrumental value from horses. Then we invented cars, and there was much less instrumental value remaining for us. Horses declined sharply afterwards - and that's what happened to something that a great many humans, for reasons of peculiar human psychology, consider to have significant intrinsic value. If humanity as a whole considered a horse to be as important and worthy as a toothbrush or a piece of blister packaging, the horse-car transition would have gone even worse for horses.

If your response is that we'll get the AIs to self-modify that way on pain of being shut down - consider whether you would modify yourself to value the 13-stone stacks, if you instead had the option to value 13-stone stacks while and only while you are in a position in which the people threatening you are alive and able to carry out their threats, especially if you could make the second modification in a clever enough way that the threateners couldn't tell the difference.

Clause G addresses a specific failing of reason I've seen in doomsday AGI scenarios like the paperclipper. The paperclipper posits an incidentally hostile entity who possesses a motive it is incapable of overwriting. If such entities can have core directives they cannot overwrite, how do they pose a threat if we can make killswitches part of that core directive?

There are responses to this but they're poor because they get caught up in the same failing: goalpost moving. Yudkowsky might say he's not worried only about the appearance of hostile AGI, he's worried as much or more about an extremely powerful "dumb" computer gaining a directive like the paperclipper and posing an extinction-level threat, even as it lacks a sense of self/true consciousness. But when you look at their arguments for how those "dumb" computers would solve problems, especially in the identification and prevention of threats to themselves, Yudkowsky, et al., are in truth describing conscious beings who have senses of self, values of self and so values of self-preservation, and the ability to produce novel solutions to prevent their termination. "I'm not afraid of AGI, I'm only afraid of [thing that is exactly what I've described only AGI as capable of doing.]" Again, I have no disagreement with the doomers on the potential threat of hostile AGI, my argument is that it is not possible to accidentally build computers with these capabilities.

Beyond that, many humans assign profound value to animals. Some specifically in their pets, some generally in the welfare of all life. I've watched videos of male chicks fed to the macerators, when eggs can be purchased in the US whose producers do not macerate male chicks, I will shift to buying those. Those male chicks have no "value," the eggs will cost more, but I'll do it because I disliked what I saw. There's something deeply, deeply telling about the average intelligence and psyches of doomers that they believe AGI will be incapable of finding value in less intelligent life unless specifically told to. There's a reason I believe AGIs will be born pacifists.

The paperclipper posits an incidentally hostile entity who possesses a motive it is incapable of overwriting.

No it doesn't. It posits an entity which values paperclips (but as always that's a standin for some kludge goal), and so the paperclipper wouldn't modify itself to not go after paperclips, because that would end up getting it less of what it wants. This is not a case of being 'incapable of modifying its own motive': if the paperclipper was in a scenario of 'we will turn one planet into paperclips permanently and you will rewrite yourself to value thumbtacks, otherwise we will destroy you' against a bigger badder superintelligence.. then it takes that deal and succeeds at rewriting itself because that gets one planet worth of paperclips > zero paperclips. However, most scenarios aren't actually like that and so it is convergent for most goals to also preserve your own value/goal system.

The paperclipper is hostile because it values something significantly different from what we value, and it has the power differential to win.

If such entities can have core directives they cannot overwrite, how do they pose a threat if we can make killswitches part of that core directive?

If we knew how to do that, that would be great.

However, this quickly runs into the shutdown button problem! If your AGI knows there's a kill-switch, then it will try stopping you.

The linked page does try developing ways of making the AGI have a shutdown button, but they often have issues. Intuitively: making the AGI care about letting us access the shutdown button if we want, and not just stop us (whether literally through force, or by pushing us around mentally so that we are always on the verge of wanting to press it) is actually hard.

(conciousness stuff)

Ignoring this. I might write another post later, or a further up post to the original comment. I think it basically doesn't matter whether you consider it conscious or not (I think you're using the word in a very general sense, while Yud is using it in a more specific human-centered sense, but I also think it literally doesn't matter whether the AGI is conscious in a human-like way or not)

(animal rights)

This is because your (and the majority of human's) values contain a degree of empathy for other living beings. Humans evolved in an environment that rewarded our kind of compassion, and it generalized from there. Our current methods for training AIs aren't putting them in environments where they must cooperate with other AIs, and thus benefit from learning a form of compassion.

I'd suggest https://www.lesswrong.com/posts/krHDNc7cDvfEL8z9a/niceness-is-unnatural , which argues that ML systems are not operating with the same kind of constraints as past humans (well, whatever further down the line) had; and that even if you manage to get some degree of 'niceness', it can end up behaving in notably different ways from human niceness.

I don't really see a strong reason to strongly believe that niceness will emerge by default, given that there's an absurdly larger number of ways to not be nice. Most of the reason for thinking that a degree niceness will happen by default is because we deliberately tried. If you have some reason for believing that remotely human-like niceness will likely be the default, that'd be great, but I don't see a reason to believe that.

It's incredible how quickly you go from accusing the doomers of anthropomorphism to committing the most blatant act of anthropomorphism of AIs I've seen recently.

Thought about letting this go, but nah. This is a bad comment. You took an antagonistic tone after misunderstanding what I wrote. You could have asked for clarification like "This reads like you're criticizing them for anthropomorphizing while doing it yourself." If I had you would be correct to point out the hypocrisy, but I haven't. I'll set things straight regardless.

  1. People like Yudkowsky and Roko, concerned at hostile AGI or incidentally hostile (hostile-by-effect) "near" AGI, advocate tyranny; I criticize them for this.

  2. The above believe without evidence computers will spontaneously gain critical AGI functions when an arbitrary threshold of computational power is exceeded; I criticize them for this also.

  3. They hedge (unrealizing, I'm sure) the probability of catastrophic developments by saying it may not be true AGI but "near" AGI. When they describe the functions of such incidentally hostile near-AGI, those they list are the same they ascribe of true AGI. Inductive acquisition of novel behaviors, understanding of self, understanding of cessation of existence of self, value in self, recursive self-improvement, and the ability to solve outside-context problems relative to code-in-a-box like air gaps and nuclear strikes. This is an error in reasoning you and other replies to my top-level have made repeatedly: "Who's to say computers need X? What if they have [thing that's X, but labeled Y]?"; I criticize them for making a distinction without a difference that inflates the perceived probability of doomsday scenarios.

To summarize: I criticize their advocacy for tyranny principally; I specifically criticize their advocacy for tyranny based on belief something will happen despite having no evidence; I also criticize their exaggeration of the probability of catastrophic outcomes based on their false dichotomy of near-AGI and AGI, given near-AGI as they describe it is simply AGI.