
Culture War Roundup for the week of April 24, 2023

This weekly roundup thread is intended for all culture war posts. 'Culture war' is vaguely defined, but it basically means controversial issues that fall along set tribal lines. Arguments over culture war issues generate a lot of heat and little light, and few deeply entrenched people ever change their minds. This thread is for voicing opinions and analyzing the state of the discussion while trying to optimize for light over heat.

Optimistically, we think that engaging with people you disagree with is worth your time, and so is being nice! Pessimistically, there are many dynamics that can lead discussions on Culture War topics to become unproductive. There's a human tendency to divide along tribal lines, praising your ingroup and vilifying your outgroup - and if you think you find it easy to criticize your ingroup, then it may be that your outgroup is not who you think it is. Extremists with opposing positions can feed off each other, highlighting each other's worst points to justify their own angry rhetoric, which becomes in turn a new example of bad behavior for the other side to highlight.

We would like to avoid these negative dynamics. Accordingly, we ask that you do not use this thread for waging the Culture War. Examples of waging the Culture War:

  • Shaming.

  • Attempting to 'build consensus' or enforce ideological conformity.

  • Making sweeping generalizations to vilify a group you dislike.

  • Recruiting for a cause.

  • Posting links that could be summarized as 'Boo outgroup!' Basically, if your content is 'Can you believe what Those People did this week?' then you should either refrain from posting, or do some very patient work to contextualize and/or steel-man the relevant viewpoint.

In general, you should argue to understand, not to win. This thread is not territory to be claimed by one group or another; indeed, the aim is to have many different viewpoints represented here. Thus, we also ask that you follow some guidelines:

  • Speak plainly. Avoid sarcasm and mockery. When disagreeing with someone, state your objections explicitly.

  • Be as precise and charitable as you can. Don't paraphrase unflatteringly.

  • Don't imply that someone said something they did not say, even if you think it follows from what they said.

  • Write like everyone is reading and you want them to be included in the discussion.

On an ad hoc basis, the mods will try to compile a list of the best posts/comments from the previous week, posted in Quality Contribution threads and archived at /r/TheThread. You may nominate a comment for this list by clicking on 'report' at the bottom of the post and typing 'Actually a quality contribution' as the report reason.


A heads-up: Yudkowsky will be talking AI with YouTube long-timer Ross Scott on the 3rd. This comes after Ross's last videochat with fans [warning: long, use the timestamps in one of the comments to skip around], where AI and Big Yud came up.

I expect that Ross will be able to wring some sort of explanation about AI risk out of Yudkowsky that will be palatable to the everyman. Ross has talked about things like Peak Oil before (here's an old, old video on the subject), so I think it will be interesting to see. I'll have to see if I can find out Ross's position on AI risk so far.

Maybe it's just because I'm not in on the game enough, or that I'm getting bored, or that I'm a little too honest about being stupid, but I'm starting to get the same kind of vibes from these interviews as I get from listening to one too many interviews with 'science popularizers' and physicists talking about black holes and solar systems or whatever. At some point the endless stream of analogies, abstractions and hypothetical arguments just starts sounding like a 2 hour poem about math that I don't understand.

You can assure me it makes sense. You can explain to me how this new and exciting theory of the universe, that hinges entirely on mathematical assumptions, is like dumping a gallon of milk into a box of cereal before pouring it into the bowl, and I can maybe relate to that analogy because I know milk and cereal. But, again, at the end of the day I will never be able to relate that analogy to what is actually being talked about because all that's really there is theoretical math I don't understand.

These conversations seem to follow a similar but slightly different path: there's no actual math, just assumptions being made about the future. The AI man says we are doomed if we continue. Here's a powerful analogy. Here's technobabble about code... Like, dude, you got me, OK? This appeals to my vanity for coffee table philosophical arguments and you are a credentialed person who sounds confident in your convictions. I guess we are doomed. Now, who is the next guest on Joe Rogan? Oh, science man is going to tell me about a supermassive black hole that can eat the sun. Bro, did you know that a volcanic eruption in Yellowstone Park could decimate the entire planet? This doctor was talking about antibiotics and...

I don't want to come across as too belligerent, but all this stuff just seems to occupy the same slot of 'it feels important/novel to care'. I'm not going to pretend to understand or care any more than I would care about Yellowstone. I'll accept all the passionate believers telling me that they told me so when the inevitable mega-earthquakes happen.

But until then I'll just continue enjoying the memes that predate our inevitable apocalypse with the same urgency that the people worrying over AI show when enjoying yet another 4 hour interview, followed by days of more rigorous debate, over the ever encroaching extinction level threat that is AI.

1/2

You can explain to me how this new and exciting theory of the universe, that hinges entirely on mathematical assumptions, is like dumping a gallon of milk into a box of cereal before pouring it into the bowl, and I can maybe relate to that analogy because I know milk and cereal. But, again, at the end of the day I will never be able to relate that analogy to what is actually being talked about because all that's really there is theoretical math I don't understand.

Your broad impression is correct with one massive caveat: there's no there, there. It is about milk and cereal, and the pretense that the analogy simplifies some profound idea is a pose; it serves to belittle and bully you into meek acceptance of a conclusion that is not founded on some solid model applying to the bowl and the universe alike. Yud's opinions do not follow from math: he arrived at them before stumbling on convenient math, most other doomers don't even understand the math involved, and none of this math says much of anything about the AI we are likely to build.

It's important to realize, I think, that Yud's education is 75% Science Fiction from his dad's library and 25% Jewish lore in the cheder he flunked out of. That's all he learned systematically in his life, I'm afraid; other than that he just skimmed Kahneman, Cialdini and so on, assorted pop-sci, and some math and physics and comp sci because he is, after all, pretty smart and inclined to play around with cute abstractions. But that's it. He never had to meet deadlines, he never worked empirically, he never applied any of the math he learned in a way that was regularized against some real-world benchmark, KPI or a mean professor. Bluntly, he's a fraud, a simulacrum, an impostor.

More charitably, he's a 43-year-old professional wunderkind whose self-perception hinges on continuing to play the part. He's similar to Yevgeny «Genius» «Maestro» Ponasenkov, a weird fat guy who LARPs as a pre-Revolutionary noble and a maverick historian (based). Colloquially these people are known as freaks and crackpots, and their best defense for the last two millennia is that Socrates was probably the same but he became Great; except he did not LARP as anyone else.

I know this dirty observation is not polite to make among Rationalists. I've talked to really smart and accomplished people who roll their eyes when I say this about Yud, who object «come on now, you're clowning yourself, the guy's some savant – hell, I've got a Ph.D in particle physics and won at the All-Russian Math Olympiad, and he's nobody but talks jargon like he understands it better than my peers» and I want to scream «you dumb defenseless quokka, do you realize that while you were grinding for that Olympiad he was grinding to give off signals of an epic awesome Sci-Fi character?! That for every bit of knowledge, he gets a hundredfold more credit than you, because he arranges it into a mask while you add to the pearl of your inner understanding? That the way Yud comes across is not a glimpse of his formidability but the whole of it? Can you not learn that we wordcels are born with dark magic at the tips of our tongues, magic you do not possess, magic that cannot remake nature but enslaves minds?»

Ahem.

Let's talk about one such analogy, actually the core analogy he uses: it's about human evolution and inclusive genetic fitness. AGI Ruin: A List of Lethalities, 5th Jun '22:

Section B:

So why not train a giant stack of transformer layers on a dataset of agents doing nice things and not bad things, throw in the word 'corrigibility' somewhere, crank up that computing power, and get out an aligned AGI?

Section B.2:  Central difficulties of outer and inner alignment.

16.  Even if you train really hard on an exact loss function, that doesn't thereby create an explicit internal representation of the loss function inside an AI that then continues to pursue that exact loss function in distribution-shifted environments.  Humans don't explicitly pursue inclusive genetic fitness; outer optimization even on a very exact, very simple loss function doesn't produce inner optimization in that direction.  This happens in practice in real life, it is what happened in the only case we know about, and it seems to me that there are deep theoretical reasons to expect it to happen again: the first semi-outer-aligned solutions found, in the search ordering of a real-world bounded optimization process, are not inner-aligned solutions.  This is sufficient on its own, even ignoring many other items on this list, to trash entire categories of naive alignment proposals which assume that if you optimize a bunch on a loss function calculated using some simple concept, you get perfect inner alignment on that concept.

Point 16, Misalignment In The Only Precedent We Know About, is a big deal. There are 46 points in total, but the list is a bit of a sham: many are about AGI being smart, the politics of «preventing other people from building an unaligned AGI», handwringing in 39-43, «multiple unaligned AGIs still bad», and other padding. Pretty much every moving part depends on the core argument for AI being very likely to «learn wrong», i.e. acquire traits that unfold as hazardous out of (training) distribution, and point 16 corroborates all such distributional reasoning in B.1 (10-15). Points 17-19, and arguably more, expound on 16.
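The in-principle core of point 16 needs no ML machinery at all: outer optimization on a loss only pins behavior down *on the training distribution*. A toy sketch (my own, not from either side of the debate) of two zero-loss solutions that diverge off-distribution:

```python
# Two "policies" with identical (zero) loss on the training distribution
# can behave arbitrarily differently outside it.

train_xs = [0.0, 1.0, 2.0]          # the "training distribution"
target = lambda x: x                # the outer objective: match f(x) = x

f1 = lambda x: x                                # one zero-loss solution
f2 = lambda x: x + x * (x - 1.0) * (x - 2.0)    # another zero-loss solution

loss = lambda f: sum((f(x) - target(x)) ** 2 for x in train_xs)

print(loss(f1), loss(f2))   # both 0.0: indistinguishable in-distribution
print(f1(10.0), f2(10.0))   # 10.0 vs 730.0: wildly different out of distribution
```

Nothing in the loss distinguishes the two; whether training tends to find the well-behaved one is the entire substantive question, and the toy is silent on it.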

Accordingly, Yudkowsky cites it a lot and in slightly varied forms, e.g. on Bankless, 20th Feb 23:

we do not know how to get goals into a system. We can cause them to do a thing inside a distribution they were optimized over using gradient descent. But if you shift them outside of that distribution, I expect other weird things start happening. … GPT-7, there's probably a bunch of stuff in there too that desires to accurately model things like humans under a wide range of circumstances, but it's not exactly humans because ice cream didn't exist in the natural environment, the ancestral environment, the environment of evolutionary adaptedness. There was nothing with that much sugar, salt, fat combined together as ice cream. We are not built to want ice cream. We were built to want strawberries, honey, a gazelle that you killed and cooked … but then ice cream comes along and it fits those taste buds better than anything that existed in the environment that we were optimized over.

On Fridman, 20th March '23:

You can nonetheless imagine that there is this hill climbing process, not like gradient descent, because gradient descent uses calculus, this is just using like, where are you? But still, hill climbing in both cases makes things something better and better over time, in steps, and natural selection was optimizing exclusively for this very simple, pure criterion of inclusive genetic fitness in a very complicated environment. We're doing a very wide range of things and solving a wide range of problems led to having more kids, and this got you humans which had no internal notion of inclusive genetic fitness until thousands of years later, when they were actually figuring out what had even happened, and no desire to, no explicit desire to increase inclusive genetic fitness. So from this important case study, we may infer the important fact that if you do a whole bunch of hill climbing on a very simple loss function, at the point where the system's capabilities start to generalize very widely, when it is in an intuitive sense becoming very capable and generalizing far outside the training distribution, we know that there is no general law saying that the system even internally represents, let alone tries to optimize the very simple loss function you are training it on.

(Distinguishing SGD from an evolutionary algorithm with the mention of «calculus» is a bit odd).
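The distinction the aside gestures at is real but narrow: gradient descent follows an analytic derivative («calculus»), while a naive hill climber only compares function values («where are you?»). A minimal sketch, with a toy objective and step sizes of my own choosing:

```python
import random

f = lambda x: (x - 3.0) ** 2        # toy objective, minimized at x = 3
df = lambda x: 2.0 * (x - 3.0)      # its derivative, used only by GD

def gradient_descent(x, lr=0.1, steps=100):
    for _ in range(steps):
        x -= lr * df(x)             # needs the derivative
    return x

def hill_climb(x, step=0.5, iters=1000):
    random.seed(0)
    for _ in range(iters):
        cand = x + random.uniform(-step, step)
        if f(cand) < f(x):          # needs only function evaluations
            x = cand
    return x

print(gradient_descent(0.0))  # ≈ 3.0
print(hill_climb(0.0))        # ≈ 3.0, reached derivative-free
```

Both are «hill climbing» in the loose sense; the argument in the transcript doesn't actually turn on which variant is used, which is why the aside reads as odd.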

And on Twitter, April 24th 2023 :

…for example, hominid evolution falsifies any purported general law along the lines of "hill-climbing optimization for a loss function, to the point where that produces general intelligence, produces robust generalization of the intuitive 'meaning' of the loss function even as the system optimized becomes more intelligent". Humans were optimized purely for inclusive genetic fitness, and we ended up with no built-in internal psychological concept of what that is. When we got smarter, smart enough that condoms were a new option that didn't exist in the ancestral environment / training distribution, we started using condoms. Gradient descent isn't natural selection, but…

It's not just Yudkowsky making this argument these days: see e.g. Evan Hubinger, AI safety research scientist at Anthropic, the premier alignment-concerned lab, writing in 2020.

And Yud's Youtube evangelist Rob Miles, Apr 21, 2023:

@ESYudkowsky I propose this as a clearer example to support "Humans are not trying to maximise inclusive genetic fitness even a little bit"

It definitely is the ultimate cause of our motivations, emotions, and values, my point is just that this fact is not sufficient for us to explicitly try to get it

2/2

Note that this evo-talk is nothing new. In 2007, Eliezer wrote Adaptation-Executers, not Fitness-Maximizers:

No human being with the deliberate goal of maximizing their alleles' inclusive genetic fitness, would ever eat a cookie unless they were starving. But individual organisms are best thought of as adaptation-executers, not fitness-maximizers.

…This consequence is directly opposite the key regularity in the long chain of ancestral successes which caused the taste bud's shape. But, since overeating has only recently become a problem, no significant evolution (compressed regularity of ancestry) has further influenced the taste bud's shape.

…Smushing several of the concepts together, you could sort-of-say, "Modern humans do today what would have propagated our genes in a hunter-gatherer society, whether or not it helps our genes in a modern society."

The framing (and snack choice) has subtly changed: back then it was trivial that the «blind idiot god» (New Atheism was still fresh, too) does not optimize for anything and successfully aligns nothing. Back then, Eliezer pooh-poohed gradient descent as well. Now that it's at the heart of AI-as-practiced, evolution is a fellow hill-climbing algorithm that tries very hard to optimize on a loss function yet fails to induce generalized alignment.

I could go on but hopefully we can see that this is a major intuition pump.

It's a bad pump and Evolution is a bad analogy for AGI: inner alignment. Enter Quintin Pope, 13th Aug 2022.

One way people motivate extreme levels of concern about inner misalignment is to reference the fact that evolution failed to align humans to the objective of maximizing inclusive genetic fitness. … Evolution didn't directly optimize over our values. It optimized over our learning process and reward circuitry.

The relationship we want to make inferences about is: - "a particular AI's learning process + reward function + training environment -> the AI's learned values"

I think that "AI learning -> AI values" is much more similar to "human learning -> human values" than it is to "evolution -> human values". Steve Byrnes makes this case in much more detail in his post on the matter [23rd Mar 2021].

Evolution is a bi-level optimization process, with evolution optimizing over genes, and the genes specifying the human learning process, which then optimizes over human cognition. … SGD directly optimizes over an AI’s cognition, just as human within-lifetime learning directly optimizes over human cognition.

Or putting this in the «sharp left turn» frame:

within-lifetime learning happens much, much faster than evolution. Even if we conservatively say that brains do two updates per second, and that a generation is just 20 years long, that means a single person’s brain will perform ~1.2 billion updates per generation. … We don't train AIs via an outer optimizer over possible inner learning processes, where each inner learning process is initialized from scratch, then takes billions of inner learning steps before the outer optimization process takes one step, and then is deleted after the outer optimizer's single step. Such a bi-level training process would necessarily experience a sharp left turn once each inner learner became capable of building off the progress made by the previous inner learner. … However, this sharp left turn does not occur because the inner learning processes suddenly become much better / more foomy / more general in a handful of outer optimization steps.… In my frame, we've already figured out and applied the sharp left turn to our AI systems, in that we don't waste our compute on massive amounts of incredibly inefficient neural architecture search, hyperparameter tuning, or meta optimization.
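The quoted arithmetic checks out, give or take the definition of a year:

```python
# Checking the quote: 2 brain updates per second over a 20-year generation.
seconds_per_year = 60 * 60 * 24 * 365
updates_per_generation = 2 * seconds_per_year * 20
print(updates_per_generation)  # 1,261,440,000 — the quoted ~1.2 billion
```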

Put another way: it is crucial that SGD optimizes policies themselves, and with smooth, high-density feedback from their performance on the objective function, while evolution random-walks over architectures and inductive biases of policies. An individual model is vastly more analogous to an individual human than to an evolving species, no matter on how many podcasts Yud says «hill climbing». Evolution in principle cannot be trusted to create policies that work robustly out of distribution: it can only search for local basins of optimality that are conditional on the distribution, outside of which adaptive behavior predicated on stupid evolved inductive biases does not get learned. This consideration makes the analogy based on both algorithms being «hill-climbing» deceptive, and regularized SGD inherently a stronger paradigm for OOD alignment.
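The bi-level structure is easy to caricature in code. In this sketch (all names and the toy objective are mine), the outer «evolution» loop only ever touches the learning process (here, a learning rate), never the policy parameter that SGD trains directly:

```python
import random

f = lambda w: (w - 3.0) ** 2            # toy "task loss" on a policy parameter
grad = lambda w: 2.0 * (w - 3.0)

def sgd(w, lr, steps=200):
    # "within-lifetime learning": the optimizer updates the policy directly,
    # with dense gradient feedback at every step
    for _ in range(steps):
        w -= lr * grad(w)
    return w

def evolution(pop_size=20, generations=30):
    # bi-level: the outer loop never sees w, only each candidate learning
    # process (a learning rate) and the fitness of the policy it produces
    random.seed(0)
    pop = [random.uniform(0.05, 0.95) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda lr: f(sgd(0.0, lr)))       # select on final fitness
        survivors = pop[: pop_size // 2]
        pop = survivors + [lr + random.gauss(0, 0.01) for lr in survivors]
    return min(pop, key=lambda lr: f(sgd(0.0, lr)))

best_lr = evolution()
print(best_lr, sgd(0.0, best_lr))   # evolution tuned the learner, not the policy
```

Even in this caricature, the outer loop's feedback is sparse and indirect (one scalar per lifetime) while the inner loop's is dense and direct, which is the asymmetry the paragraph above is pointing at.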

But Yud keeps making it. When Quintin wrote a damning list of objections to Yud's position (using the Bankless episode as a starting point) a month ago, he brought it up in more detail:

This is an argument [Yud] makes quite often, here and elsewhere, and I think it's completely wrong. I think that analogies to evolution tell us roughly nothing about the difficulty of alignment in machine learning.

… Moreover, robust alignment to IGF requires that you even have a concept of IGF in the first place. Ancestral humans never developed such a concept, so it was never useful for evolution to select for reward circuitry that would cause humans to form values around the IGF concept.

[Gradient descent] is different in that it directly optimizes over values / cognition, and that AIs will presumably have a conception of human values during training.

[Ice cream example] also illustrates the importance of thinking mechanistically, and not allegorically.

the reason humans like ice cream is because evolution created a learning process with hard-coded circuitry that assigns high rewards for eating foods like ice cream.

What does this mean for alignment? How do we prevent AIs from behaving badly as a result of a similar "misgeneralization"? What alignment insights does the fleshed-out mechanistic story of humans coming to like ice cream provide?

As far as I can tell, the answer is: don't reward your AIs for taking bad actions.

That's all it would take, because the mechanistic story above requires a specific step where the human eats ice cream and activates their reward circuits.

Compare, Yud'07: «Cognitive causes are ontologically distinct from evolutionary causes. They are made out of a different kind of stuff. Cognitive causes are made of neurons. Evolutionary causes are made of ancestors.» And «DNA constructs protein brains with reward signals that have a long-distance correlation to reproductive fitness, but a short-distance correlation to organism behavior… We, the handiwork of evolution, are as alien to evolution as our Maker is alien to us.»

So how did Yud'23 respond?

This is kinda long.  If I had time to engage with one part of this as a sample of whether it holds up to a counterresponse, what would be the strongest foot you could put forward?

Then he was pitched the evolution problem, and curtly answered the most trivial issue he could instead. «And that's it, I guess».

So the distinction of (what we in this DL era can understand as) learning policies and evolving inductive biases was recognized by Yud as early as in 2007; the concrete published-on-Lesswrong explanation why evolution is a bad analogy for AI training dates to 2021 at the latest; Quintin's analysis is 8+ months old; this hasn't had much effect on Yud's rhetoric about evolution being an important precedent supporting his pessimism, nor on the conviction of believers that his reasoning is sound.

It seems he's just anchored to the point, and strongly feels these issues are all nitpicks, and the argument should still work, one way or another, at least it proves that something-kinda-like-that is likely and therefore doom is still inevitable – even if evolution «does not use calculus», even if the category of «hill-climbing algorithms» is not informative. He barely glanced at what gradient descent does, and concluded that it's an optimization process, thus he's totally right.

People who try sniffing "nobody in alignment understands real AI engineering"... must have never worked in real AI engineering, to have no idea how few of the details matter to the macro arguments. … Or, of course, if they're real AI engineers themselves and do know all those technical details that are obviously not relevant - why, they must be lying, or self-deceiving so strongly that it amounts to other-deception, when they try that particular gambit for bullying and authority-assertion.

His arguments, on the level of pointing at something particular, are purely verbal, not even verbal math. When he uses specific technical terms, they don't necessarily correspond to the discussed issue, and often sound like buzzwords he vaguely associated with it. Sometimes he's demonstrably ignorant about their meaning. The Big Picture conclusion never changes.

Maybe it can't.


This is a sample from a dunk on Yud that I drafted over 24 hours of pathological irritation recently. Overall it's pretty mean and unhinged, and I'm planning to write something better soon.

Hope this helps.

Bad take, except that MAML also found no purchase, like Levine's other ideas.

He directly and accurately describes evolution and its difference from current approaches, but he's aware of a wide range of implementations of meta-learning. In the objections list he literally links to MAML:

I'm a lot more bullish on the current paradigm. People have tried lots and lots of approaches to getting good performance out of computers, including lots of "scary seeming" approaches such as:

  • Meta-learning over training processes. I.e., using gradient descent over learning curves, directly optimizing neural networks to learn more quickly.

  • Teaching neural networks to directly modify themselves by giving them edit access to their own weights.

  • Training learned optimizers - neural networks that learn to optimize other neural networks - and having those learned optimizers optimize themselves.

  • Using program search to find more efficient optimizers.

  • Using simulated evolution to find more efficient architectures.

  • Using efficient second-order corrections to gradient descent's approximate optimization process.

  • Applying biologically plausible optimization algorithms inspired by biological neurons to training neural networks.

  • Adding learned internal optimizers (different from the ones hypothesized in Risks from Learned Optimization) as neural network layers.

  • Having language models rewrite their own training data, and improve the quality of that training data, to make themselves better at a given task.

  • Having language models devise their own programming curriculum, and learn to program better with self-driven practice.

  • Mixing reinforcement learning with model-driven, recursive re-writing of future training data.

Mostly, these don't work very well. The current capabilities paradigm is state of the art because it gives the best results of anything we've tried so far, despite lots of effort to find better paradigms.
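The first item on that list is the MAML family. A first-order toy version (scalar tasks of my own invention, not the actual MAML code) shows what «gradient descent over learning curves» means structurally:

```python
# First-order MAML-style sketch: the outer loop does gradient descent on the
# *initialization*, scored by post-adaptation loss on each task.
tasks = [1.0, 2.0, 3.0]             # task i has loss (theta - t_i)^2
inner_lr, outer_lr = 0.1, 0.05
theta = 0.0                          # the meta-learned initialization

for _ in range(500):
    meta_grad = 0.0
    for t in tasks:
        # inner adaptation: one gradient step on the task's own loss
        adapted = theta - inner_lr * 2.0 * (theta - t)
        # first-order approximation: task-loss gradient at the adapted params
        meta_grad += 2.0 * (adapted - t)
    theta -= outer_lr * meta_grad / len(tasks)

print(theta)  # ≈ 2.0: the initialization that adapts best across all tasks
```

For these quadratic tasks the meta-optimum is just the mean of the task targets; the point is only the two nested loops, the structure the roundup quote says we mostly *don't* bother with at scale.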

And the next paragraph on sharp left turn:

In my frame, we've already figured out and applied the sharp left turn to our AI systems, in that we don't waste our compute on massive amounts of incredibly inefficient neural architecture search, hyperparameter tuning, or meta optimization. For a given compute budget, the best (known) way to buy capabilities is to train a single big model in accordance with empirical scaling laws

Yuddites, on the other hand, mostly aren't aware of any of that. I am not sure they even read press releases.