
Culture War Roundup for the week of May 1, 2023

This weekly roundup thread is intended for all culture war posts. 'Culture war' is vaguely defined, but it basically means controversial issues that fall along set tribal lines. Arguments over culture war issues generate a lot of heat and little light, and few deeply entrenched people ever change their minds. This thread is for voicing opinions and analyzing the state of the discussion while trying to optimize for light over heat.

Optimistically, we think that engaging with people you disagree with is worth your time, and so is being nice! Pessimistically, there are many dynamics that can lead discussions on Culture War topics to become unproductive. There's a human tendency to divide along tribal lines, praising your ingroup and vilifying your outgroup - and if you think you find it easy to criticize your ingroup, then it may be that your outgroup is not who you think it is. Extremists with opposing positions can feed off each other, highlighting each other's worst points to justify their own angry rhetoric, which becomes in turn a new example of bad behavior for the other side to highlight.

We would like to avoid these negative dynamics. Accordingly, we ask that you do not use this thread for waging the Culture War. Examples of waging the Culture War:

  • Shaming.

  • Attempting to 'build consensus' or enforce ideological conformity.

  • Making sweeping generalizations to vilify a group you dislike.

  • Recruiting for a cause.

  • Posting links that could be summarized as 'Boo outgroup!' Basically, if your content is 'Can you believe what Those People did this week?' then you should either refrain from posting, or do some very patient work to contextualize and/or steel-man the relevant viewpoint.

In general, you should argue to understand, not to win. This thread is not territory to be claimed by one group or another; indeed, the aim is to have many different viewpoints represented here. Thus, we also ask that you follow some guidelines:

  • Speak plainly. Avoid sarcasm and mockery. When disagreeing with someone, state your objections explicitly.

  • Be as precise and charitable as you can. Don't paraphrase unflatteringly.

  • Don't imply that someone said something they did not say, even if you think it follows from what they said.

  • Write like everyone is reading and you want them to be included in the discussion.

On an ad hoc basis, the mods will try to compile a list of the best posts/comments from the previous week, posted in Quality Contribution threads and archived at /r/TheThread. You may nominate a comment for this list by clicking on 'report' at the bottom of the post and typing 'Actually a quality contribution' as the report reason.


More developments on the AI front:

Big Yud steps up his game, not to be outshone by the Basilisk Man.

Now he officially calls for a preemptive nuclear strike on suspicious unauthorized GPU clusters.

If we see the AI threat as a nuclear weapons threat, only worse, this is not unreasonable.

Remember when the USSR planned a nuclear strike on China to stop its great-power ambitions (only for the greatest humanitarian who ever lived, Richard Milhous Nixon, to veto the proposal)?

Such Quaker squeamishness will have no place in the future.

So, the outlines of the Katechon World are taking shape. What will it look like?

It will look great.

You will live in your room, play original World of Warcraft and Grand Theft Auto: San Andreas on your PC, read your favorite blogs and debate intelligent design on your favorite message boards.

Then you will log on to Free Republic and call for more vigorous enhanced interrogation of terrorists caught with unauthorized GPUs.

When you are bored in your room, you will have no choice but to go outside, meet people, admire things around you, take pictures of the things that really impress you with your Kodak camera, and, when you are really bored, play Snake on your Nokia phone.

Yes, the best age in history, the noughties, will retvrn. Forever, protected by the CoDominium of the US and China.


I still see no plausible scenario for these AI-extinction events. How is ChatGPT 4/5/6 etc. supposed to end humanity? I really don't see the mechanism. Is it supposed to invent an algorithm that destroys all encryption? Is it supposed to spam the internet with nonsense? Is it supposed to brainwash someone into launching nukes? I fail to see how this end-of-the-world scenario is supposed to happen.

There are a few ways that GPT-6 or 7 could end humanity, the easiest of which is by massively accelerating progress in more agentic forms of AI like Reinforcement Learning, which has the "King Midas" problem of value alignment. See this comment of mine for a semi-technical argument for why a very powerful AI based on "agentic" methods would be incredibly dangerous.

Of course, the actual mechanism for killing all of humanity would probably be something like a super-virus with an incredibly long incubation period, high infectivity and a high death rate. You can produce such a virus with literally only an internet connection by sending the proper DNA sequence to a Protein Synthesis lab, having it shipped to some guy you pay or manipulate on the darknet, and having him mix the powders he receives in the mail into some water, kickstarting the whole epidemic. Or you can pretend to be an attractive woman (with deepfakes and voice synthesis) and get it done for free.

GPT-6 might also be very dangerous on its own, given that we don't actually know what goals are instantiated inside the agent. It's trained to predict the next word in the same way that humans are "trained" by evolution to replicate their genes, and the end result of that is that we care about sex and our kids, but we don't literally care about maximally replicating our genes; otherwise sperm banks would be a lot more popular. The worry is that GPT-6 will not actually have the exact goal of predicting the next word, but something like a funhouse-mirror version of it, which might be very dangerous if it reaches very high capability.

Consistent Agents are Utilitarian: If you have an agent taking actions in the world and having preferences about the future states of the world, that agent must be utilitarian, in the sense that there must exist a function V(s) that takes in possible world-states s and spits out a scalar, and the agent's behaviour can be modelled as maximising the expected future value of V(s). If there is no such function V(s), then our agent is not consistent, and there are cycles we can find in its preference ordering, so it prefers state A to B, B to C, and C to A, which is a pretty stupid thing for an agent to do.
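A minimal sketch of the finite, deterministic version of that claim (no lotteries, so it covers only the ordinal part, not the "expected" part): treat each strict preference as an edge from the worse state to the better one. If the graph contains a cycle like A > B > C > A, no V(s) can agree with it; otherwise a topological order hands you one. The function and state names are made up for illustration, and this is not the commenter's own formalism.

```python
from graphlib import CycleError, TopologicalSorter


def utility_from_preferences(prefs):
    """prefs: list of (better, worse) pairs, each meaning "strictly prefers better to worse".

    Returns a dict V with V[better] > V[worse] for every pair,
    or raises graphlib.CycleError if the preferences are cyclic.
    """
    ts = TopologicalSorter()
    for better, worse in prefs:
        # The worse state is registered as a predecessor of the better one,
        # so worse states come out earlier in the static order.
        ts.add(better, worse)
    order = list(ts.static_order())  # raises CycleError on A > B > C > A
    # Later (preferred) states get larger scalars: this is a V(s) consistent with prefs.
    return {state: rank for rank, state in enumerate(order)}


# Hypothetical usage: a consistent agent gets a utility function...
V = utility_from_preferences([("A", "B"), ("B", "C")])
print(V["A"] > V["B"] > V["C"])  # True

# ...a cyclic one does not.
try:
    utility_from_preferences([("A", "B"), ("B", "C"), ("C", "A")])
except CycleError as err:
    print("no consistent V(s); cycle:", err.args[1])
```

The cyclic case is the one the comment calls stupid: an agent with those preferences would pay a little to swap C for B, B for A, then A for C, ending up where it started but poorer.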

But... that's how humans work? Actually, we're even less consistent than that: our preferences are contextual, so we lack the information to rank most states. I recommend The Shard Theory of Human Values, probably the most serious introspection by ex-Yuddites to date:

A shard of value refers to the contextually activated computations which are downstream of similar historical reinforcement events. For example, the juice-shard consists of the various decision-making influences which steer the baby towards the historical reinforcer of a juice pouch. These contextual influences were all reinforced into existence by the activation of sugar reward circuitry upon drinking juice. A subshard is a contextually activated component of a shard. For example, “IF juice pouch in front of me THEN grab” is a subshard of the juice-shard. It seems plain to us that learned value shards are most strongly activated in the situations in which they were historically reinforced and strengthened.

... This is important. We see how the reward system shapes our values, without our values entirely binding to the activation of the reward system itself. We have also laid bare the manner in which the juice-shard is bound to your model of reality instead of simply your model of future perception. Looking back across the causal history of the juice-shard’s training, the shard has no particular reason to bid for the plan “stick a wire in my brain to electrically stimulate the sugar reward-circuit”, even if the world model correctly predicts the consequences of such a plan. In fact, a good world model predicts that the person will drink fewer juice pouches after becoming a wireheader, and so the juice-shard in a reflective juice-liking adult bids against the wireheading plan! Humans are not reward-maximizers, they are value shard-executors.

This, we claim, is one reason why people (usually) don’t want to wirehead and why people often want to avoid value drift. According to the sophisticated reflective capabilities of your world model, if you popped a pill which made you 10% more okay with murder, your world model predicts futures which are bid against by your current shards because they contain too much murder.
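A toy sketch of the wireheading argument quoted above. Everything in it is hypothetical: the plan names, the made-up predicted numbers, and the two shards are illustrations, not anything taken from the Shard Theory post. The point it tries to show is that shards bid on what the world model predicts about the world (juice pouches, murders), never on the reward signal itself, so the wireheading plan loses even though it predicts the most reward, while a plain reward-maximiser picks it.

```python
from dataclasses import dataclass
from typing import Callable, Dict

Future = Dict[str, float]

# Hypothetical world model: predicted consequences of each plan (numbers are made up).
PREDICTIONS: Dict[str, Future] = {
    "keep drinking juice":    {"juice_pouches": 10, "reward_signal": 10,  "murders": 0},
    "wirehead sugar circuit": {"juice_pouches": 1,  "reward_signal": 100, "murders": 0},
    "take the murder pill":   {"juice_pouches": 10, "reward_signal": 10,  "murders": 3},
}


@dataclass(frozen=True)
class Shard:
    name: str
    activates_in: Callable[[str], bool]  # contextual activation
    bid: Callable[[Future], float]       # how strongly it backs a predicted future


# Hypothetical shards: neither one looks at "reward_signal".
SHARDS = [
    Shard("juice", lambda ctx: "juice" in ctx or ctx == "reflecting",
          lambda f: f["juice_pouches"]),
    Shard("anti-murder", lambda ctx: ctx == "reflecting",
          lambda f: -100.0 * f["murders"]),
]


def shard_choice(context: str) -> str:
    """Pick the plan whose predicted future gets the largest total bid from active shards."""
    active = [s for s in SHARDS if s.activates_in(context)]
    return max(PREDICTIONS, key=lambda plan: sum(s.bid(PREDICTIONS[plan]) for s in active))


def reward_maximizer_choice() -> str:
    """Contrast: an agent that maximises the predicted reward signal itself."""
    return max(PREDICTIONS, key=lambda plan: PREDICTIONS[plan]["reward_signal"])


print(shard_choice("reflecting"))  # keep drinking juice
print(reward_maximizer_choice())   # wirehead sugar circuit
```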

@HlynkaCG's Utilitarian AI thesis strikes again. Utilitarianism is a strictly degenerate decision-making algorithm because it optimizes decision theory and warps the territory to get good properties of the map; it's basically inverted wireheading. The optimizer's curse is unbeatable, forget about it: a utilitarian AI with nontrivial capability will kill you or come so close to killing you as to make no difference; your life and wasteful use of atoms are inevitably discovered to be a great affront to the great Cosmic project $PROJ_NAME. Consistent utilitarian agents are incompatible with human survival, because you can't specify a robust function for a maximizer that assigns value to something as specific and arbitrary and fragile as baseline humans – and AI is a red herring here! Yud himself would process trads into useful paste and Moravecian mind uploads manually if he could, and that's if he doesn't have to make hard tradeoffs at the moment. (I wouldn't, but not because I disagree much on the computed "utility" of that move.) Just read the guy from the time when he thought he'd be the first in the AGI race. He sneeringly said «tough luck» to people who wanted to remain human. «You are not a human anyway».
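The optimizer's curse itself is easy to see in a small simulation, sketched here under assumed numbers (10 options, Gaussian estimation noise, every true value equal to zero): picking the option with the highest noisy estimate systematically overrates it, and more so the more options you search over. This only illustrates the bias; it says nothing either way about whether the curse is "unbeatable".

```python
import random
import statistics


def optimizers_curse(n_options=10, noise_sd=1.0, trials=10_000, seed=0):
    """Average amount by which an argmax chooser overrates its chosen option.

    Every option has the same true value; the agent only sees
    true value + Gaussian noise and picks the highest estimate.
    """
    rng = random.Random(seed)
    true_value = 0.0
    overestimates = []
    for _ in range(trials):
        estimates = [true_value + rng.gauss(0.0, noise_sd) for _ in range(n_options)]
        chosen_estimate = max(estimates)  # the option the optimizer picks
        overestimates.append(chosen_estimate - true_value)  # its true value is still 0
    return statistics.mean(overestimates)


print(optimizers_curse())               # roughly 1.5, although every option is worth 0
print(optimizers_curse(n_options=100))  # larger: searching harder makes the bias worse
```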

Luckily this is all unnecessary.

Or as Roon puts it:

the space of minds is vast, much vaster than the instrumental convergence basin

But... that's how humans work?

Yes, humans are not consistent agents. Nobody here claimed otherwise.

Do you believe that humans must be utilitarians to achieve success in some task, "in the sense that there must exist a function V(s) that takes in possible world-states s and spits out a scalar, and the human's behaviour can be modelled as maximising the expected future value of V(s)"?