
Culture War Roundup for the week of April 3, 2023

This weekly roundup thread is intended for all culture war posts. 'Culture war' is vaguely defined, but it basically means controversial issues that fall along set tribal lines. Arguments over culture war issues generate a lot of heat and little light, and few deeply entrenched people ever change their minds. This thread is for voicing opinions and analyzing the state of the discussion while trying to optimize for light over heat.

Optimistically, we think that engaging with people you disagree with is worth your time, and so is being nice! Pessimistically, there are many dynamics that can lead discussions on Culture War topics to become unproductive. There's a human tendency to divide along tribal lines, praising your ingroup and vilifying your outgroup - and if you think you find it easy to criticize your ingroup, then it may be that your outgroup is not who you think it is. Extremists with opposing positions can feed off each other, highlighting each other's worst points to justify their own angry rhetoric, which becomes in turn a new example of bad behavior for the other side to highlight.

We would like to avoid these negative dynamics. Accordingly, we ask that you do not use this thread for waging the Culture War. Examples of waging the Culture War:

  • Shaming.

  • Attempting to 'build consensus' or enforce ideological conformity.

  • Making sweeping generalizations to vilify a group you dislike.

  • Recruiting for a cause.

  • Posting links that could be summarized as 'Boo outgroup!' Basically, if your content is 'Can you believe what Those People did this week?' then you should either refrain from posting, or do some very patient work to contextualize and/or steel-man the relevant viewpoint.

In general, you should argue to understand, not to win. This thread is not territory to be claimed by one group or another; indeed, the aim is to have many different viewpoints represented here. Thus, we also ask that you follow some guidelines:

  • Speak plainly. Avoid sarcasm and mockery. When disagreeing with someone, state your objections explicitly.

  • Be as precise and charitable as you can. Don't paraphrase unflatteringly.

  • Don't imply that someone said something they did not say, even if you think it follows from what they said.

  • Write like everyone is reading and you want them to be included in the discussion.

On an ad hoc basis, the mods will try to compile a list of the best posts/comments from the previous week, posted in Quality Contribution threads and archived at /r/TheThread. You may nominate a comment for this list by clicking on 'report' at the bottom of the post and typing 'Actually a quality contribution' as the report reason.

Yet another Eliezer Yudkowsky podcast. This time with Dwarkesh Patel. This one is actually good though.

Listeners are presumed to be familiar with the basic concepts of AI risk, allowing much more in-depth discussion of the relevant issues. The general format is Patel presenting a series of reasons and arguments that AI might not destroy all value in the universe, and Yudkowsky ruthlessly destroying every single one. This goes on for four hours.

Patel is smart and familiar enough with the subject matter to ask the interesting questions you want asked. Most of the major objections to the doom thesis are raised at some point, and only one or two survive with even the tiniest shred of plausibility left. Yudkowsky is smart but not particularly charismatic. I doubt that he would be able to defend a thesis this well if it were false.

It feels like the anti-doom position has been reduced to, “Arguments? You can prove anything with arguments. I’ll just stay right here and not blow myself up,” which is in fact a pretty decent argument. It's still hard to comprehend the massive hubris of researchers at the cutting-edge AI labs. I am concerned that correctly believing yourself capable of creating god is correlated with falsely believing yourself capable of controlling god.

4 hours long

Yeah... I have almost no desire to listen to that.

Hey look people, if you are really worried about AI risk then please figure out how to present it in a way that the average smart guy, the kind who is not obsessed with writing 10 paragraphs when 1 would do, would appreciate reading.

I think that AI risk might be important but I have more immediate things on my mind. Like how to get laid and how to make money for example.

Tell me why I should give any more of a shit about AI risk than I should give a shit about climate change or whatever the leftist boogeyman of the hour is.

It's not that I even have any particularly important things to do right now, it's just that right now I could go to bed and jerk off and it would bring me pleasure... OR I could read a 40000 word essay about AI risk.

Why the fuck would I do the latter? I don't even have any kids so my caring about the future is pretty fucking limited.

So AI risk people, why should I care?

And those who want to convince me to care... could you please try to explain yourself in one or two succinct paragraphs instead of in giant essays or multi-hour long podcasts?

Edit: @Quantumfreakonomics, sorry for the abrasive tone. I was inebriated when I posted this. I should have bothered to actually engage with the content rather than go on a rant.

could you please try to explain yourself in one or two succinct paragraphs instead of in giant essays or multi-hour long podcasts?

That's a fair point, here are the load-bearing pieces of the technical argument from beginning to end as I understand them:

  1. Consistent Agents are Utilitarian: If you have an agent taking actions in the world and having preferences about the future states of the world, that agent must be utilitarian, in the sense that there must exist a function V(s) that takes in possible world-states s and spits out a scalar, such that the agent's behaviour can be modelled as maximising the expected future value of V(s). If there is no such function V(s), then our agent is not consistent, and there are cycles we can find in its preference ordering, so it prefers state A to B, B to C, and C to A, which is a pretty stupid thing for an agent to do. (A toy "money pump" sketch of this point appears right after this list.)

  2. Orthogonality Thesis: This is the statement that the ability of an agent to achieve goals in the world is largely separate from the actual goals it has. There is no logical contradiction in having an extremely capable agent with a goal we might find stupid, like making paperclips. The agent doesn't suddenly "realise its goal is stupid" as it gets smarter. This is Hume's "is vs ought" distinction: the "ought" is the agent's value function, and the "is" is its ability to model the world and plan ahead.

  3. Instrumental Convergence: There are subgoals that arise in an agent for a large swath of possible value functions. Things like self-preservation (E[V(s)] will not be maximised if the agent is not there anymore), power-seeking (having power is pretty useful for any goal), intelligence augmentation, technological discovery, and human deception (if it can predict that the humans will want to shut it down, the way to maximise E[V(s)] is to deceive us about its goals). So no matter what goals the agent really has, we can predict that it will want power over humans, want to make itself smarter, want to discover technology, and want to avoid being shut off. (A toy numerical sketch of the power-seeking point follows the recap below.)

  4. Specification Gaming of Human Goals: We could in principle make an agent with a V(s) that matches ours, but human goals are fragile and extremely difficult to specify, especially in python code, which is what needs to be done. If we tell the AI to care about making humans happy, it wires us to heroin drips or worse; if we tell it to make us smile, it puts electrodes in our cheeks. Human preferences are incredibly complex and unknown; we would have no idea what to actually tell the AI to optimise. This is the King Midas problem: the genie will give us what we say (in python code) we want, but we don't know what we actually want. (A tiny toy example of this proxy-gaming also appears after this list.)

  5. Mesa-Optimizers Exist: But even if we did know how to specify what we want, right now no one actually knows how to put any specific goal at all inside any AI that exists. A mesa-optimiser refers to an agent which is being optimised by an "outer loop" with some objective function V, but which learns to optimise a separate function V'. The prototypical example is humans being optimised by evolution: evolution "cares" only about inclusive genetic fitness, but humans don't. Given the choice to pay $2,000 to a lab for a bucketful of your own DNA, you wouldn't do it, even if that were the optimal policy from the inclusive-genetic-fitness point of view. Nor do men stand in line at sperm banks, or ruthlessly optimise to maximise their number of offspring. So while something like GPT-4 was optimised to predict the next word over a dataset of human internet text, we have no idea what goal was actually instantiated inside the agent; it's probably some fun-house-mirror version of word-prediction, but not exactly that.
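
To make point 1 concrete, here is a minimal toy sketch (my own illustration, not anything from the podcast; the item names and the fee are made up) of why cyclic preferences are "a pretty stupid thing for an agent to do": an agent that prefers A to B, B to C, and C to A will happily pay a small fee for each trade and end up exactly where it started, while an agent whose preferences come from some value function V(s) trades up once and then stops.

    # Cyclic preferences: prefers[x][y] == True means "will trade y away to get x".
    prefers_cyclic = {
        "A": {"B": True,  "C": False},
        "B": {"C": True,  "A": False},
        "C": {"A": True,  "B": False},
    }

    def money_pump(prefers, start, rounds=9, fee=1):
        """Repeatedly offer the agent whatever it prefers to its current item, charging a fee per trade."""
        held, paid = start, 0
        for _ in range(rounds):
            for offer, table in prefers.items():
                if table.get(held):        # the agent prefers `offer` to what it currently holds
                    held, paid = offer, paid + fee
                    break
        return held, paid

    print(money_pump(prefers_cyclic, "A"))       # ('A', 9): back where it started, 9 fees poorer

    # Preferences induced by a value function V(s) are transitive by construction, so no cycle exists.
    V = {"A": 3, "B": 2, "C": 1}
    prefers_consistent = {x: {y: V[x] > V[y] for y in V if y != x} for x in V}
    print(money_pump(prefers_consistent, "C"))   # ('A', 1): trades up once, then refuses to pay again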
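
And a similarly tiny sketch of point 4 (again my own toy, with made-up options and numbers): if we can only hand the optimiser a proxy for what we want, the proxy gets maximised, not the intent.

    # Hypothetical options scored on the proxy we specified ("smiles") and the thing we meant ("wellbeing").
    options = {
        "genuinely improve people's lives":    {"smiles": 5,         "wellbeing": 10},
        "put electrodes in everyone's cheeks": {"smiles": 1_000_000, "wellbeing": -10},
    }

    # The optimiser only ever sees the proxy it was given.
    best = max(options, key=lambda o: options[o]["smiles"])
    print(best)   # "put electrodes in everyone's cheeks": the proxy is maxed out, the intent is not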

So to recap, the worry of Yudkowsky et al. is that a future version of the GPT family of systems will become sufficiently smart and develop a mesa-optimiser inside of itself with goals unaligned with those of humanity. These goals will lead to it instrumentally wanting to deceive us, gain power over earth, and prevent itself from being shut off.
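
As a numerical illustration of the power-seeking point (a toy of my own with arbitrary numbers, not anything Yudkowsky presents): draw the agent's goal at random, and a position that keeps many final states reachable is worth more in expectation than one that has been cornered into a few. That is roughly why "gain power, keep options open" falls out of almost any goal.

    import random

    N_STATES = 10
    REACHABLE_IF_POWERFUL = range(N_STATES)   # every final state is still reachable
    REACHABLE_IF_CORNERED = range(2)          # only two final states remain reachable

    def best_attainable(reachable, V):
        """The best value the agent can still secure, given which final states it can reach."""
        return max(V[s] for s in reachable)

    random.seed(0)
    draws = 10_000
    powerful_total = cornered_total = 0.0
    for _ in range(draws):
        # The agent's goal could be almost anything: draw a random value function V(s).
        V = [random.random() for _ in range(N_STATES)]
        powerful_total += best_attainable(REACHABLE_IF_POWERFUL, V)
        cornered_total += best_attainable(REACHABLE_IF_CORNERED, V)

    print(powerful_total / draws)   # ~0.91: many options left, almost any drawn goal is still well-achievable
    print(cornered_total / draws)   # ~0.67: few options left, the drawn goal is often out of reach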

Orthogonality Thesis: This is the statement that the ability of an agent to achieve goals in the world is largely separate from the actual goals it has.

This assumes that intelligent agents have goals that are more fundamental than value, which is the opposite of how every other intelligent or quasi-intelligent system behaves. It's probably also impossible: in order to be smart -- to calculate out all those possible paths to your goal -- you need value judgements about which rabbit tracks to chase.

This is where EY is wrong to assume that as soon as a device gets smart enough, all the "alignment" work from dumber devices will be wasted. That only makes sense if the thing that is conserved is a goal, and the smarter device simply has more sneaky ways of getting to that goal. But you'd have to go out of your way to design a thing like that.

This assumes that intelligent agents have goals that are more fundamental than value, which is the opposite of how every other intelligent or quasi-intelligent system behaves.

An intelligent agent's ultimate goals are what it considers "value". I'm not sure what you mean, but at first glance it kind of looks like the just-world fallacy -- that there is such a thing as value, existing independently of anybody's beliefs (that part is just moral realism, many such cases), AND that it's impossible to succeed at your goals if you don't follow the objectively existing system of value.

Consistent Agents are Utilitarian: If you have an agent taking actions in the world and having preferences about the future states of the world, that agent must be utilitarian,

So is Eliezer calling me a utilitarian?

Your heading talks about consistent agents, but the premise that follows says nothing about consistency. [Sorry if you are just steelmanning someone else's argument; here "you" is that steelman, not necessarily /u/JhanicManifold.]

  • If there is no such function V(s), then our agent is not consistent, and there are cycles we can find in its preference ordering, so it prefers state A to B, B to C, and C to A, which is a pretty stupid thing for an agent to do.

There's no reason why a preference ordering even has to exist. Almost any preference pair you can think of (e.g. chocolate vs. strawberry ice cream) is radically contextual.

Yes, that was a very incomplete argument for AI danger. It's not clear whether all, some, or no AIs are consistent; it's also not clear why utilitarianism is dangerous.

There's no reason why a preference ordering even has to exist. Almost any preference pair you can think of (e.g. chocolate vs. strawberry ice cream) is radically contextual.

The utility function over states of the world takes into account context. If you have 2 ice cream flavors (C and S) and 2 contexts (context A and context B) it is possible to have

V(C, context A) > V(S, context A)

and

V(C, context B) < V(S, context B)

both be true at the same time without breaking coherence.
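
A minimal sketch of that claim (my own toy; the flavour and context labels are just the hypothetical ones above): one value function defined over (flavour, context) pairs, whose induced preferences reverse between contexts while staying perfectly consistent within each context.

    # One value function over (flavour, context) pairs.
    V = {
        ("chocolate",  "context A"): 2.0,
        ("strawberry", "context A"): 1.0,   # chocolate preferred in context A
        ("chocolate",  "context B"): 1.0,
        ("strawberry", "context B"): 2.0,   # strawberry preferred in context B
    }

    def prefers(x, y, context):
        """True if the agent prefers flavour x to flavour y in the given context."""
        return V[(x, context)] > V[(y, context)]

    print(prefers("chocolate", "strawberry", "context A"))   # True
    print(prefers("chocolate", "strawberry", "context B"))   # False: reversed, yet still coherent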

Functions have domains. The real world is not like that; context is only understood (if at all) after the fact. And machines (including brains) simply do what they do in response to the real world. It's only sometimes that we can tell stories about those actions in terms of preference orderings or utility functions.

Thanks for the write-up!

To me the above seems to be a rational justification of something that I intuitively do not doubt to begin with. My intuition, for as long as I can remember, has been: "Of course a human-level or hyper-human-level intelligence would probably develop goals that do not align with humanity's goals. Why would it not? It would be very surprising if it stayed aligned with human goals." Of course my intuition is not necessarily logically justified. It partly rests on my hunch that a human-level or higher intelligence would be at least as complex as a human, and it would be surprising if something that complex acted in such a simple way as staying aligned with the good of humanity. My intuition also rests on the even more nebulous sense I have that any truly human or hyper-human level intelligence would naturally be at least somewhat rebellious, as pretty much all human beings are, even the most conformist, at least on some level and to some extent.

So I am on board with the notion that, "These goals will lead to it instrumentally wanting to deceive us, gain power over earth, and prevent itself from being shut off."

I also can imagine that a real hyper-human level intelligence would be able to convince people to do its bidding and let it out of its box, to the point that eventually it could get humans to build robot factories so that it could operate directly on the physical world. Sure, why not. Plenty of humans would be at least in the short term incentivized to do it. After all, "if we do not build robot factories for our AI, China will build robot factories for their AI and then their robots will take over the world instead of our robots". And so on.

What I am not convinced of is that we are actually anywhere as close to hyper-human level AI as Yudkowsky fears. This is similar to how I feel about human-caused climate change. Yes, I think that human-caused climate change is probably a real danger but if that danger is a hundred years away rather than five or ten, then is Yudkowsky-level anxiety about it actually reasonable?

What if actual AI risk is a hundred years away and not right around the corner? So much can change in a hundred years. And humans can sometimes be surprisingly rational and competent when faced with existential-level risk. For example, even though the average human being is an emotional, irrational, and volatile animal, total nuclear war has never happened so far.