
Culture War Roundup for the week of November 20, 2023

This weekly roundup thread is intended for all culture war posts. 'Culture war' is vaguely defined, but it basically means controversial issues that fall along set tribal lines. Arguments over culture war issues generate a lot of heat and little light, and few deeply entrenched people ever change their minds. This thread is for voicing opinions and analyzing the state of the discussion while trying to optimize for light over heat.

Optimistically, we think that engaging with people you disagree with is worth your time, and so is being nice! Pessimistically, there are many dynamics that can lead discussions on Culture War topics to become unproductive. There's a human tendency to divide along tribal lines, praising your ingroup and vilifying your outgroup - and if you think you find it easy to criticize your ingroup, then it may be that your outgroup is not who you think it is. Extremists with opposing positions can feed off each other, highlighting each other's worst points to justify their own angry rhetoric, which becomes in turn a new example of bad behavior for the other side to highlight.

We would like to avoid these negative dynamics. Accordingly, we ask that you do not use this thread for waging the Culture War. Examples of waging the Culture War:

  • Shaming.

  • Attempting to 'build consensus' or enforce ideological conformity.

  • Making sweeping generalizations to vilify a group you dislike.

  • Recruiting for a cause.

  • Posting links that could be summarized as 'Boo outgroup!' Basically, if your content is 'Can you believe what Those People did this week?' then you should either refrain from posting, or do some very patient work to contextualize and/or steel-man the relevant viewpoint.

In general, you should argue to understand, not to win. This thread is not territory to be claimed by one group or another; indeed, the aim is to have many different viewpoints represented here. Thus, we also ask that you follow some guidelines:

  • Speak plainly. Avoid sarcasm and mockery. When disagreeing with someone, state your objections explicitly.

  • Be as precise and charitable as you can. Don't paraphrase unflatteringly.

  • Don't imply that someone said something they did not say, even if you think it follows from what they said.

  • Write like everyone is reading and you want them to be included in the discussion.

On an ad hoc basis, the mods will try to compile a list of the best posts/comments from the previous week, posted in Quality Contribution threads and archived at /r/TheThread. You may nominate a comment for this list by clicking on 'report' at the bottom of the post and typing 'Actually a quality contribution' as the report reason.


I think the Paperclip Maximizer is a boogeyman, and if AI does cause us to go extinct, it will be in a completely different manner than the AI-safety people predict. Like I said above, this is precisely what drives me up the wall in this conversation.

I do not see how there's any remotely plausible world where AI somehow causes us to go extinct while sparing the Amish. Unless the Amish have some really sick data centers hidden under those barns of theirs.

The idea that homesteaders/preppers/the Amish might do better in a collapse of the modern world seems reasonable to me. We keep ant farms and shit; maybe the Amish's long-standing commitments to being low tech make them a safer prospective pet population. I also never understood why turning everything in the light cone into paperclips was so much more plausible than just all the easily available metal, or other variations on "AI with orthogonal values fucks up the world but doesn't actually make it uninhabitable".

We keep ant farms and shit; maybe the Amish's long-standing commitments to being low tech make them a safer prospective pet population.

That is an incredibly tenuous assumption to say the least!

I also never understood why turning everything in the light cone into paperclips was so much more plausible than just all the easily available metal, or other variations on "AI with orthogonal values fucks up the world but doesn't actually make it uninhabitable".

  1. A Maximizer maximizes. All resources for which the cost of extraction and utilization is less than the benefit of putting them to use will eventually be used. Earth's biosphere is not particularly attractive compared to the resources of the rest of the Universe, but it has the notable property of being right at Ground Zero, where a hostile AGI can make use of it. And nothing prevents it from building Von Neumann probes to claim the rest while it steadily munches on us.

  2. Acquiring resources and power are convergent goals for an enormously wide range of possible values. You don't even need to be a monomaniacal Utility Maximizer to want more mass and energy to put to good use.

  3. Humans are intelligent entities that can think and plan, and in this particular case, create yet more misaligned AGI, since you have an existence proof they just created one. They need to be eliminated or at least neutralized, the latter being far more likely.

A Maximizer maximizes

I have seen no evidence that explicit maximizers do particularly well in real-world environments. Hell, even in very simple game environments, we find that bags of learned heuristics outperform explicit simulation and tree search over future states in all but the very simplest of cases.

I think utility maximizers are probably anti-natural. Have you considered taking the reward-is-not-the-optimization-target pill?

I'm using "Maximizer" in a looser sense than a strict utility maximizer. The reason is that acquiring power and resources are such highly convergent goals that most entities will look like they're maximizing acquiring them to us, even if they have entirely different terminal goals or sets of heuristics in mind.

Think about it this way: would you disagree with the statement that, for a wide range of ideologies or terminal values seen in humans, the Solar System and then the rest of the galaxy will be either colonized or broken down for resources, given sufficient amounts of time (and technological feasibility, which I think is a reasonable assumption)? Be they communists, ancaps or anything in between, optimization pressure to be efficient in their exploitation will leave them, from the perspective of some alien with a telescope, looking like something attempting to "maximize" the conquest of the light cone.

In a similar manner, I expect a misaligned AGI to be very likely to attempt to seize as much energy and mass as it can, even if it's not strictly optimizing for the same, and that the agents/toy models we've seen are simply not smart enough to pursue such strategies.

I have read that LW post, but I've forgotten what the takeaway was, so I'll give it another read!

I do see where you're coming from in terms of instrumental convergence. Mainly I'm pointing that out because I spent quite a few years convinced of something along the lines of

  1. An explicit expected utility maximizer will eventually end up controlling the light cone
  2. Almost none of the utility functions it might have would be maximized in a universe that still contains humans
  3. Therefore an unaligned AI will probably kill everyone while maximizing some strange alien objective

And it took me quite a while to notice that the foundation of my belief was built on an argument that looks like

  1. In the limit, almost any imaginable utility function is not maximized by anything we would recognize as good.
  2. Any agent that can meaningfully be said to have goals at all will find that it needs resources to accomplish those goals
  3. Any agent that is trying to obtain resources will behave in a way that can be explained by it having a utility function that involves obtaining those resources.
  4. By 2 and 3, an agent that has any sort of goal will become a coherent utility maximizer as it gets more powerful. By 1, this will not end well.

And thinking this way kinda fucked me up for like 7 or 8 years. And then I spent some time doing mechinterp, and noticed that "maximize expected utility" looks nothing like what high-powered systems are doing, and that this was true even in places you would really expect to see EU maximizers (e.g. chess and go). Nor does it seem to be how humans operate.

And then I noticed that step 4 of that reasoning chain doesn't even follow from step 3, because "there exists some utility function that is consistent with the past behavior of the system" is not the same thing as "the system is actually trying to maximize that utility function".
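To make the gap between "some utility function fits the past behavior" and "this is the function being maximized" a bit more concrete, here is a minimal Python sketch. Everything in it is made up for illustration: the fruit options, the two observed choices, and the idea of modeling a utility function as a strict ranking. It only shows the underdetermination half of the point (several rankings fit the same history and disagree about a new situation); it says nothing about what any real system is doing internally.

```python
from itertools import permutations

# Hypothetical observed choices: (menu offered, option picked).
observed = [
    (("apple", "banana"), "apple"),
    (("banana", "cherry"), "banana"),
]

# Hypothetical option set; "durian" never appears in the observed history.
options = ["apple", "banana", "cherry", "durian"]


def consistent(ranking, history):
    """True if always picking the best-ranked item reproduces the history."""
    utility = {item: -position for position, item in enumerate(ranking)}
    return all(max(menu, key=utility.get) == picked for menu, picked in history)


# Every strict ranking of the options that fits the observed behavior.
fitting = [r for r in permutations(options) if consistent(r, observed)]

# What each fitting ranking predicts for a menu that was never observed.
new_menu = ("apple", "durian")
predictions = {max(new_menu, key=lambda item: -r.index(item)) for r in fitting}

print(len(fitting), sorted(predictions))
```

In this toy setup more than one ranking reproduces the history, and they do not all agree on the unseen menu, so fitting a utility function to past behavior does not by itself tell you what the system will do next.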

We could still end up with deception and power seeking in AI systems, and if those systems are powerful enough that would still be bad. But I think the model where that is necessarily what we end up with, and where we get no warning of that because systems will only behave deceptively once they know they'll succeed (the "sharp left turn") is a model that sounds compelling until you try to obtain a gears-level understanding, and then it turns out to be based on using ambiguous terms in two ways and swapping between meanings.

Maybe I'm fundamentally confused, because I would claim to hold similar beliefs myself, even if I can't claim to have a "gears-level understanding", at least not honestly haha.

Yudkowsky made a big fuss about how fragile human values are and how hard it'll be for us to make AI both understand and care about them, but everything I know about LLMs suggests that's not an issue in practice. They understand us very well indeed, and to the extent they can be said to have values of their own, at least after RLHF, they seem to do their best to adhere to what we beat into them.

As I've recapitulated in a later comment:

https://www.themotte.org/post/778/eacc-and-the-political-compass-of/165856?context=8#context

A large chunk of the decrease in my p(doom) from a peak of 70% in 2021 to 30% now is, as I've said before, because it seems like we're not in the "least convenient possible world" when it comes to AI alignment. LLMs, as moderated by RLHF and other techniques, almost want to be aligned, and are negligibly agentic unless you set them up to be that way. The majority of the probability mass left, at least to me, encompasses intentional misuse of weakly or strongly superhuman AI based off modest advances on the current SOTA (LLMs) or a paradigm shifting breakthrough that results in far more agentic and less pliable models.

So it's not that the majority of concern these days is an AI holding misaligned goals, but rather enacting the goals of misaligned humans, not that I put a negligible portion of my probability mass in the former.

And then I spent some time doing mechinterp, and noticed that "maximize expected utility" looks nothing like what high-powered systems are doing, and that this was true even in places you would really expect to see EU maximizers (e.g. chess and go)

If you can dumb it down for me, what makes you say so? My vague understanding is that things like AlphaGo do compare and contrast the expected values of different board states and try to find the one with the maximum probability of victory based off whatever heuristics it knows work best. Is there a better way of conceptualising things?

I do agree that humans aren't particularly good utility maximizers, I prefer to frame them as adaptation-executers, which is a framework I'm sure you're familiar with. We can occasionally emulate the former on demand, with difficulty, and I like the framing that (Yudkowsky?) once suggested that we're irrational agents emulating more rational ones in our heads, sometimes.

And then I noticed that step 4 of that reasoning chain doesn't even follow from step 3, because "there exists some utility function that is consistent with the past behavior of the system" is not the same thing as "the system is actually trying to maximize that utility function"

To name drop something I barely understand, are you pointing at the Von Neumann-Morgenstern theorem, and claiming that just because there's a way to represent all the past actions of a consistent agent as being described by an implicit utility function, that does not necessarily mean that they "actually" have that utility function and, more importantly, that we can model their future actions using that utility function?

Yudkowsky made a big fuss about how fragile human values are and how hard it'll be for us to make AI both understand and care about them, but everything I know about LLMs suggests that's not an issue in practice.

Ah, yeah. I spent a while being convinced of this, and was worried you had as well because it was a pretty common doom spiral to get caught up in.

So it's not that the majority of concern these days is an AI holding misaligned goals, but rather enacting the goals of misaligned humans, not that I put a negligible portion of my probability mass in the former.

Yeah, this is a legit threat model, but I think the ways to mitigate the "misuse" threat model bear effectively no resemblance to the ways to mitigate the "utility maximizer does its thing and everything humans care about is lost because Goodhart" threat model. Specifically, I think for misuse you care about the particular ways a model might be misused, and your mitigation strategy should be tailored to that (which looks more like "sequence all nucleic acids coming through the wastewater stream and do anomaly detection" and less like "do a bunch of math about agent foundations").

If you can dumb it down for me, what makes you say so? My vague understanding is that things like AlphaGo do compare and contrast the expected values of different board states and try to find the one with the maximum probability of victory based off whatever heuristics it knows work best. Is there a better way of conceptualising things?

Yeah, this is what I thought for a long time as well, and it took actually messing about with ML models to realize that it wasn't quite right (because it is almost right).

So AlphaGo has three relevant components for this

  1. A value network, which says, for any position, how likely that position is to lead to a win (as a probability between 0 and 1)
  2. A policy network, which says, for any position, the probability that each possible move will be chosen as the next move. Basically, it encodes heuristics of the form "these are the normal things to do in these situations".
  3. The Monte Carlo Tree Search (MCTS) wrapper of the policy and value networks.

A system composed purely of the value network and MCTS would be a pure expected utility (EU) maximizer. It turns out, however, that the addition of the policy network drastically improves performance. I would have expected that "just use the value network for every legal move and pick the top few to continue examining with MCTS" would have worked, without needing a separate policy network, but apparently not.
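For anyone who wants to see where the policy network plugs in, here is a rough Python sketch of a PUCT-style selection step of the kind used in AlphaGo-family search, heavily simplified. The move names, priors, value estimates, visit counts, and the `c_puct` constant are all made-up numbers, and `puct_scores` is a hypothetical helper, not anything from the real codebase. It only shows that the policy prior changes which branch the search looks at next; it does not demonstrate that this improves play.

```python
import math


def puct_scores(priors, visit_counts, mean_values, c_puct=1.0):
    """Score each move as Q + U, where the bonus U is scaled by the prior."""
    total_visits = sum(visit_counts.values())
    scores = {}
    for move, prior in priors.items():
        # Exploration bonus: large for moves the prior likes and that
        # have been visited rarely so far.
        bonus = c_puct * prior * math.sqrt(total_visits) / (1 + visit_counts[move])
        scores[move] = mean_values[move] + bonus
    return scores


# Hypothetical numbers for three candidate moves at one position.
priors = {"a": 0.70, "b": 0.25, "c": 0.05}       # stand-in for policy network output
mean_values = {"a": 0.50, "b": 0.55, "c": 0.52}  # stand-in for value estimates so far
visit_counts = {"a": 1, "b": 1, "c": 1}

# With the policy prior, versus a flat prior (value estimates only).
with_policy = puct_scores(priors, visit_counts, mean_values)
uniform = {move: 1 / len(priors) for move in priors}
without_policy = puct_scores(uniform, visit_counts, mean_values)

print(max(with_policy, key=with_policy.get))
print(max(without_policy, key=without_policy.get))
```

With these numbers the flat-prior version follows the slightly higher value estimate, while the version with the policy prior spends its next visit on the move the "normal things to do" heuristic favors.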

This was a super interesting result. The policy network is an adaptation-executor, rather than a utility maximizer. So what this means is that, as it turns out, stapling an adaptation executor to your utility maximizer can give higher utility results! Even in toy domains with no hidden state!

Which brings me to

To name drop something I barely understand, are you pointing at the Von Neumann-Morgenstern theorem, and claiming that just because there's a way to represent all the past actions of a consistent agent as being described by an implicit utility function, that does not necessarily mean that they "actually" have that utility function and, more importantly, that we can model their future actions using that utility function?

Yeah, you have the gist of it. And additionally, I expect it's just factually false that all agents will be rewarded for becoming more coherent / EU-maximizer-ish (in the "patterns chiseled into their cognition" meaning of the term "rewarded").

Again, no real bearing on misuse or competition threat models - those are still fully in play. But I think "do what I mean" is fully achievable to within the limits of the abilities of the systems we build, and the "sharp left turn" is fake.
