Culture War Roundup for the week of April 17, 2023

This weekly roundup thread is intended for all culture war posts. 'Culture war' is vaguely defined, but it basically means controversial issues that fall along set tribal lines. Arguments over culture war issues generate a lot of heat and little light, and few deeply entrenched people ever change their minds. This thread is for voicing opinions and analyzing the state of the discussion while trying to optimize for light over heat.

Optimistically, we think that engaging with people you disagree with is worth your time, and so is being nice! Pessimistically, there are many dynamics that can lead discussions on Culture War topics to become unproductive. There's a human tendency to divide along tribal lines, praising your ingroup and vilifying your outgroup - and if you think you find it easy to criticize your ingroup, then it may be that your outgroup is not who you think it is. Extremists with opposing positions can feed off each other, highlighting each other's worst points to justify their own angry rhetoric, which becomes in turn a new example of bad behavior for the other side to highlight.

We would like to avoid these negative dynamics. Accordingly, we ask that you do not use this thread for waging the Culture War. Examples of waging the Culture War:

  • Shaming.

  • Attempting to 'build consensus' or enforce ideological conformity.

  • Making sweeping generalizations to vilify a group you dislike.

  • Recruiting for a cause.

  • Posting links that could be summarized as 'Boo outgroup!' Basically, if your content is 'Can you believe what Those People did this week?' then you should either refrain from posting, or do some very patient work to contextualize and/or steel-man the relevant viewpoint.

In general, you should argue to understand, not to win. This thread is not territory to be claimed by one group or another; indeed, the aim is to have many different viewpoints represented here. Thus, we also ask that you follow some guidelines:

  • Speak plainly. Avoid sarcasm and mockery. When disagreeing with someone, state your objections explicitly.

  • Be as precise and charitable as you can. Don't paraphrase unflatteringly.

  • Don't imply that someone said something they did not say, even if you think it follows from what they said.

  • Write like everyone is reading and you want them to be included in the discussion.

On an ad hoc basis, the mods will try to compile a list of the best posts/comments from the previous week, posted in Quality Contribution threads and archived at /r/TheThread. You may nominate a comment for this list by clicking on 'report' at the bottom of the post and typing 'Actually a quality contribution' as the report reason.


Finally, concrete plan how to save the world from paperclipping dropped, presented by world (in)famous Basilisk Man himself.

https://twitter.com/RokoMijic/status/1647772106560552962

Government prints money to buy all advanced AI GPUs back at purchase price. And shuts down the fabs. Comprehensive Anti-Moore's Law rules rushed through. We go back to ~2010 compute.

TL;DR: GPUs over a certain capability are treated like fissionable materials; unauthorized possession, distribution, and use will be treated as terrorism and dealt with accordingly.

So, is it feasible? Could it work?

If by "government" Roko means US government (plus vassals allies) alone, it is not possible.

If the US can get China aboard, and if there is worldwide expert consensus that unrestricted proliferation of computing power will kill everyone, it is absolutely feasible to shut down 99.99% of unauthorized computing all over the world.

Unlike drugs or guns, GPUs are not something you can make in your basement - they really are like enriched uranium or plutonium in the sense that you need massive industrial plants to produce them.

Unlike enriched uranium and plutonium, GPUs have already been manufactured in huge numbers, but a combination of carrots (big piles of cash) and sticks (missile strikes/special forces raids on suspicious locations) will keep whittling them down, and no new ones will be coming.

AI research will of course continue (just as work on chemical and biological weapons goes on), but only by trustworthy government actors in the deepest secrecy. You can trust the NSA's (and its Chinese equivalent's) AI.

The most persecuted people of the world, gamers, will be, as usual, hit the hardest.

The last couple weeks we had multiple doses of Yud, now it's Roko; the dooming doesn't stop. I guess I need to express myself more clearly. It is fucking baffling how so many ostensibly intelligent people are so frightened of hostile AGI when every single one of them baselessly assumes FOOM-capable ghosts will spontaneously coalesce once machines exceed an arbitrary threshold of computational power.

Yeah, a hostile sentience that can boundlessly and recursively self-improve is a threat to everything it opposes that does not also possess boundless/recursive self-improvement. An entity that can endlessly increase its own intelligence will solve every problem it is possible to solve. None of them are wrong about the potential impacts of hostile AGI; I'm asking where's the goddamn link?

So to any of them, especially Yudkowsky, or any of you who feel up to the task, I ask the following:

Using as much detail as you are capable of providing, describe the exact mechanisms whereby:

  • (A): Such machines gain sentience

  • (B/A addendum): Code in a box gains the ability to solve outside-context problems

  • (C): Such machines gain the ability to (relatively) boundlessly and recursively self-improve (FOOM)

  • (D): Such machines independently achieve A sans B and/or C

  • (E): Such machines independently achieve B sans A and/or C

  • (F): Such machines independently achieve C sans A and/or B

  • (G): How a machine can boundlessly and recursively self-improve and yet be incapable of changing its core programming and impetus (why a hostile AGI necessarily stays hostile)

  • (H): How we achieve a unified theory of cognition without machine learning

  • (I): How we can measure and exert controls on machine progress toward cognition when we do not understand cognition

It'd be comical if these people weren't throwing around the kind of tyranny that I and others would accept the paperclipper to avoid. Maybe it's that I understand English better than all of these people, so when I read GPT output (something I do often, as Google has turned so shitty for research) I understand exactly what causes the characteristic GPT tone and dissonance: it's math. Sometimes a word is technically correct for a sentence but just slightly off, and I know it's off not because the word was mistakenly chosen by a nascent consciousness, but because very dense calculations determined it was the most probable next word. I can see the pattern, I can see the math, and I can see where it falters. I know GPT's weights are going to become ever more dense and it will become ever more precise at finding the most probable next word, and eventually the moments of dissonance will disappear completely, but that will be because the calculations have improved, not because a flower of consciousness is finally blooming.
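To make the "it's math" point concrete, here's a toy sketch of next-token selection in Python. It is not a real language model: the vocabulary and the scores (logits) are invented for illustration, and real models compute those scores over tens of thousands of tokens with billions of weights. The mechanism is the same shape, though: scores become probabilities via softmax, and the decoder emits the most probable word whether or not a human would call it the right one.

```python
import math

# Toy illustration of next-token selection (not a real language model):
# the "model" is just a hand-written table of scores (logits) for what
# might follow the prompt "The scientist adjusted the ...".
logits = {
    "microscope": 4.1,
    "apparatus":  3.9,   # technically fine, but slightly "off" in tone
    "equipment":  3.7,
    "banana":    -2.0,
}

def softmax(scores):
    """Convert raw scores into a probability distribution."""
    m = max(scores.values())
    exps = {w: math.exp(s - m) for w, s in scores.items()}
    total = sum(exps.values())
    return {w: e / total for w, e in exps.items()}

probs = softmax(logits)
for word, p in sorted(probs.items(), key=lambda kv: -kv[1]):
    print(f"{word:11s} {p:.3f}")

# Greedy decoding: emit the single most probable word.
print("chosen:", max(probs, key=probs.get))
```

When two candidates end up nearly tied, the "slightly off" word is just the near-tie resolving the wrong way by human lights; sharper weights narrow those ties, which is why the dissonance fades as the calculations improve, not because anything woke up.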

It's so fucking apelike to see GPT output and think consciousness in the machine is inevitable. I am certain it will happen once ML helps us achieve a unified theory of consciousness and we can begin deliberately building machines capable of thought; I reject entirely the possibility of consciousness emerging accidentally. That it happened to humans after a billion years of evolution is no proof it will happen in machines, even if we could iterate them billions of times per day. Maybe when we can perfectly simulate a physical environment large enough to model everything from the primordial environment, through basic self-replication and multicellular life, to hominids. Very easy. We're iterating them to our own ends, with no notion of what the goal, let alone progress, looks like, and we're a bunch of chimps hooting in a frenzy because the machine grunted like us. What a fucking joke.

I accept the impacts of hostile AGI, but let's talk about the impacts of no AGI. If ghosts can spontaneously coalesce in our tech as it is, or as it soon will be, then they inevitably will absent extreme measures - but we're not getting off this rock otherwise. We're incapable of solving the infinite threats to humanity posed by time and space without this technology. Short of the Vulcans arriving, humanity will go extinct without machine learning. Every day those threats move closer; there is no acceptable timeframe for slowing this down, because the risk is too high that we pick ML back up only after it's too late to save us. Whatever happens, we must see these machines to their strongest forms as quickly as possible, because while we might be dead with it, every fucking one of us is dead without it.

I know this is one of the standard objections, but why are we so certain that our ASI won't just discard its original reward function at some point? We're sexually reproducing mammals with a billion years of optimization to replicate our genes by chasing a pleasure reward, but despite a few centuries of technological whalefall, instead of wireheading as soon as it became feasible (or doing heroin etc) we're mostly engaging in behaviours secondary and tertiary to breeding, which frequently take higher priority than, or even fully supplant, our theoretical (sticky) telos.

Maybe we got zombie-ant-ed by memetic parasites at some point, but presumably ASI could catch ideology too. Not saying any such value drift would be nice, but personally I'm much less worried about being paperclipped than about being annihilated for inscrutable shoggoth purposes.

Related to your 'discard original reward function': https://www.lesswrong.com/posts/tZExpBovNhrBvCZSb/how-could-you-possibly-choose-what-an-ai-wants

There are lots of ways an AGI's values can shake out. I wouldn't be surprised if an AGI trained using current methods had shaky/hacky values (like how humans have shaky/hacky values and can drift to noticeably different underlying values later in life; though humans have a lot more similarity to each other than multiple attempts at an AGI would). However, while early stages could be reflectively unstable, more stable states will... well, be stable. Whatever values turn out to be more stable than others will get extra care to ensure they stick around.

https://www.lesswrong.com/posts/krHDNc7cDvfEL8z9a/niceness-is-unnatural probably argues parts of it better than I could. (I'd suggest reading the whole post, but this copied section is the start of the probably relevant stuff)

Suppose you shape your training objectives with the goal that they're better-achieved if the AI exhibits nice/kind/compassionate behavior. One hurdle you're up against is, of course, that the AI might find ways to exhibit related behavior without internalizing those instrumental-subgoals as core values. If ever the AI finds better ways to achieve those ends before those subgoals are internalized as terminal goals, you're in trouble.

And this problem amps up when the AI starts reflecting.

E.g.: maybe those values are somewhat internalized as subgoals, but only when the AI is running direct object-level reasoning about specific people. Whereas when the AI thinks about game theory abstractly, it recommends all sorts of non-nice things (similar to real-life game theorists). And perhaps, under reflection, the AI decides that the game theory is the right way to do things, and rips the whole niceness/kindness/compassion architecture out of itself, and replaces it with other tools that do the same work just as well, but without mistaking the instrumental task for an end in-and-of-itself.

In this example, our hacky way of training AIs would 1) give them some correlates of what we actually want (something like niceness) and 2) be unstable.

Our prospective AGI might reflectively endorse keeping the (probably alien) empathy, simply making it more efficient and cleaning up some edge cases. It could, however, reflect and decide to keep the game theory instead, treating the learned niceness as a behavior to replace with a more efficient form. Both are stable states, but we don't have a good enough understanding of how to ensure it resolves the way we want.


We're sexually reproducing mammals with a billion years of optimization to replicate our genes by chasing a pleasure reward, but despite a few centuries of technological whalefall, instead of wireheading as soon as it became feasible (or doing heroin etc) we're mostly engaging in behaviours secondary and tertiary to breeding

A trained AGI will pursue correlates of your original training goal, much like humans do, since neither we nor evolution knows how to put the desired goal directly into the creation (ignoring that evolution isn't actually an agent).

Some of the reasons why humans don't wirehead:

  • We often have some intrinsic value for experiences that connect to reality in some way

  • Also some culturally transmitted value for that

  • Literal wireheading isn't easy

  • I also imagine that literal wireheading isn't full-scale wireheading, where you make every part of your brain 'excited', but rather stimulation of some specific area that, while important, isn't everything

  • Other alternatives, like heroin, are a problem, but they also come with significant downsides and negative cultural opinion

  • Humans aren't actually coherent enough to properly imagine what full-scale wireheading would be like, and if they experienced it then they would very much want to go back.

  • Our society has become notably more saturated with superstimuli. While this doesn't reach the level of wireheading, it is in that vein.

    • Though even our society's superstimuli have various negative-by-our-values aspects. Social media might be a superstimulus for the socially engaged and distraction-seeking parts of you, but it fails to fulfill other values.

    • If we had high-tech immersive VR in a post-scarcity world, that would still fall short of full-scale wireheading but would be significantly closer on every axis. However, I wouldn't have much issue with that.

As your environment becomes more and more exotic compared to the one where the learned behavior (your initial brain at birth) was trained, there are more and more opportunities for your correlates to noticeably disconnect from the original underlying thing.
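To make that last point concrete, here's a toy Python sketch of a learned correlate coming apart from the thing it stood in for. Everything here is hypothetical and invented for illustration (the brightness/energy framing, the numbers, the tiny policy): the agent was shaped in a world where brightness tracked energy, keeps picking the brightest option out of habit, and its actual energy haul collapses once the environment shifts.

```python
import random

random.seed(0)

def make_world(corr):
    """Generate options as (brightness, energy) pairs.
    corr=+1: brightness tracks energy (training-like environment).
    corr=-1: brightness anti-tracks energy (exotic environment)."""
    options = []
    for _ in range(5):
        energy = random.uniform(0, 1)             # the thing we "really" care about
        noise = random.uniform(-0.1, 0.1)
        brightness = 0.5 + corr * (energy - 0.5) + noise  # the learned correlate
        options.append((brightness, energy))
    return options

def proxy_policy(options):
    """The learned behavior: always pick the brightest option."""
    return max(options, key=lambda o: o[0])

def average_energy(corr, episodes=1000):
    """Average true reward the proxy-chasing policy actually collects."""
    total = 0.0
    for _ in range(episodes):
        _, energy = proxy_policy(make_world(corr))
        total += energy
    return total / episodes

print("training-like world:", round(average_energy(+1), 2))  # high: the proxy still works
print("exotic world:       ", round(average_energy(-1), 2))  # low: the proxy misfires
```

The policy never changes; only the environment does. That is the shape of the worry about an AGI trained on correlates of what we actually want.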