site banner

Culture War Roundup for the week of April 24, 2023

This weekly roundup thread is intended for all culture war posts. 'Culture war' is vaguely defined, but it basically means controversial issues that fall along set tribal lines. Arguments over culture war issues generate a lot of heat and little light, and few deeply entrenched people ever change their minds. This thread is for voicing opinions and analyzing the state of the discussion while trying to optimize for light over heat.

Optimistically, we think that engaging with people you disagree with is worth your time, and so is being nice! Pessimistically, there are many dynamics that can lead discussions on Culture War topics to become unproductive. There's a human tendency to divide along tribal lines, praising your ingroup and vilifying your outgroup - and if you think you find it easy to criticize your ingroup, then it may be that your outgroup is not who you think it is. Extremists with opposing positions can feed off each other, highlighting each other's worst points to justify their own angry rhetoric, which becomes in turn a new example of bad behavior for the other side to highlight.

We would like to avoid these negative dynamics. Accordingly, we ask that you do not use this thread for waging the Culture War. Examples of waging the Culture War:

  • Shaming.

  • Attempting to 'build consensus' or enforce ideological conformity.

  • Making sweeping generalizations to vilify a group you dislike.

  • Recruiting for a cause.

  • Posting links that could be summarized as 'Boo outgroup!' Basically, if your content is 'Can you believe what Those People did this week?' then you should either refrain from posting, or do some very patient work to contextualize and/or steel-man the relevant viewpoint.

In general, you should argue to understand, not to win. This thread is not territory to be claimed by one group or another; indeed, the aim is to have many different viewpoints represented here. Thus, we also ask that you follow some guidelines:

  • Speak plainly. Avoid sarcasm and mockery. When disagreeing with someone, state your objections explicitly.

  • Be as precise and charitable as you can. Don't paraphrase unflatteringly.

  • Don't imply that someone said something they did not say, even if you think it follows from what they said.

  • Write like everyone is reading and you want them to be included in the discussion.

On an ad hoc basis, the mods will try to compile a list of the best posts/comments from the previous week, posted in Quality Contribution threads and archived at /r/TheThread. You may nominate a comment for this list by clicking on 'report' at the bottom of the post and typing 'Actually a quality contribution' as the report reason.

11
Jump in the discussion.

No email address required.

2/2

Note that this evo-talk is nothing new. In 2007, Eliezer wrote Adaptation-Executers, not Fitness-Maximizers:

No human being with the deliberate goal of maximizing their alleles' inclusive genetic fitness, would ever eat a cookie unless they were starving. But individual organisms are best thought of as adaptation-executers, not fitness-maximizers.

…This consequence is directly opposite the key regularity in the long chain of ancestral successes which caused the taste bud's shape. But, since overeating has only recently become a problem, no significant evolution (compressed regularity of ancestry) has further influenced the taste bud's shape.

…Smushing several of the concepts together, you could sort-of-say, "Modern humans do today what would have propagated our genes in a hunter-gatherer society, whether or not it helps our genes in a modern society."

The framing (and snack choice) has subtly changed: back then it was trivial that the «blind idiot god» (New Atheism was still fresh, too) does not optimize for anything and successfully aligns nothing. Back then, Eliezer pooh-poohed gradient descent as well. Now that it's at the heart of AI-as-practiced, evolution is a fellow hill-climbing algorithm that tries very hard to optimize on a loss function yet fails to induce generalized alignment.

I could go on but hopefully we can see that this is a major intuition pump.

It's a bad pump and Evolution is a bad analogy for AGI: inner alignment. Enter Quintin Pope, 13th Aug 2022.

One way people motivate extreme levels of concern about inner misalignment is to reference the fact that evolution failed to align humans to the objective of maximizing inclusive genetic fitness. … Evolution didn't directly optimize over our values. It optimized over our learning process and reward circuitry.

The relationship we want to make inferences about is: - "a particular AI's learning process + reward function + training environment -> the AI's learned values"

I think that "AI learning -> AI values" is much more similar to "human learning -> human values" than it is to "evolution -> human values". Steve Byrnes makes this case in much more detail in his post on the matter [23rd Mar 2021].

Evolution is a bi-level optimization process, with evolution optimizing over genes, and the genes specifying the human learning process, which then optimizes over human cognition. … SGD directly optimizes over an AI’s cognition, just as human within-lifetime learning directly optimizes over human cognition.

Or putting this in the «sharp left turn» frame:

within-lifetime learning happens much, much faster than evolution. Even if we conservatively say that brains do two updates per second, and that a generation is just 20 years long, that means a single person’s brain will perform ~1.2 billion updates per generation. … We don't train AIs via an outer optimizer over possible inner learning processes, where each inner learning process is initialized from scratch, then takes billions of inner learning steps before the outer optimization process takes one step, and then is deleted after the outer optimizer's single step. Such a bi-level training process would necessarily experience a sharp left turn once each inner learner became capable of building off the progress made by the previous inner learner. … However, this sharp left turn does not occur because the inner learning processes suddenly become much better / more foomy / more general in a handful of outer optimization steps.… In my frame, we've already figured out and applied the sharp left turn to our AI systems, in that we don't waste our compute on massive amounts of incredibly inefficient neural architecture search, hyperparameter tuning, or meta optimization.

Put another way: it is crucial that SGD optimizes policies themselves, and with smooth, high-density feedback from their performance on the objective function, while evolution random-walks over architectures and inductive biases of policies. An individual model is vastly more analogous to an individual human than to an evolving species, no matter on how many podcasts Yud says «hill climbing». Evolution in principle cannot be trusted to create policies that work robustly out of distribution: it can only search for local basins of optimality that are conditional on the distribution, outside of which adaptive behavior predicated on stupid evolved inductive biases does not get learned. This consideration makes the analogy based on both algorithms being «hill-climbing» deceptive, and regularized SGD inherently a stronger paradigm for OOD alignment.

But Yud keeps making it. When Quintin wrote a damning list of objections to Yud's position (using Bankless episode as a starting point), a month ago, he brought it up in more detail:

This is an argument [Yud] makes quite often, here and elsewhere, and I think it's completely wrong. I think that analogies to evolution tell us roughly nothing about the difficulty of alignment in machine learning.

… Moreover, robust alignment to IGF requires that you even have a concept of IGF in the first place. Ancestral humans never developed such a concept, so it was never useful for evolution to select for reward circuitry that would cause humans to form values around the IGF concept.

[Gradient descent] is different in that it directly optimizes over values / cognition, and that AIs will presumably have a conception of human values during training.

[Ice cream example] also illustrates the importance of thinking mechanistically, and not allegorically.

the reason humans like ice cream is because evolution created a learning process with hard-coded circuitry that assigns high rewards for eating foods like ice cream.

What does this mean for alignment? How do we prevent AIs from behaving badly as a result of a similar "misgeneralization"? What alignment insights does the fleshed-out mechanistic story of humans coming to like ice cream provide?

As far as I can tell, the answer is: don't reward your AIs for taking bad actions.

That's all it would take, because the mechanistic story above requires a specific step where the human eats ice cream and activates their reward circuits.

Compare, Yud'07: «Cognitive causes are ontologically distinct from evolutionary causes. They are made out of a different kind of stuff. Cognitive causes are made of neurons. Evolutionary causes are made of ancestors.» And «DNA constructs protein brains with reward signals that have a long-distance correlation to reproductive fitness, but a short-distance correlation to organism behavior… We, the handiwork of evolution, are as alien to evolution as our Maker is alien to us.»

So how did Yud'23 respond?

This is kinda long.  If I had time to engage with one part of this as a sample of whether it holds up to a counterresponse, what would be the strongest foot you could put forward?

Then he was pitched the evolution problem, and curtly answered the most trivial issue he could instead. «And that's it, I guess».

So the distinction of (what we in this DL era can understand as) learning policies and evolving inductive biases was recognized by Yud as early as in 2007; the concrete published-on-Lesswrong explanation why evolution is a bad analogy for AI training dates to 2021 at the latest; Quintin's analysis is 8+ months old; this hasn't had much effect on Yud's rhetoric about evolution being an important precedent supporting his pessimism, nor on the conviction of believers that his reasoning is sound.

It seems he's just anchored to the point, and strongly feels these issues are all nitpicks, and the argument should still work, one way or another, at least it proves that something-kinda-like-that is likely and therefore doom is still inevitable – even if evolution «does not use calculus», even if the category of «hill-climbing algorithms» is not informative. He barely glanced at what gradient descent does, and concluded that it's an optimization process, thus he's totally right.

People who try sniffing "nobody in alignment understands real AI engineering"... must have never worked in real AI engineering, to have no idea how few of the details matter to the macro arguments. … Or, of course, if they're real AI engineers themselves and do know all those technical details that are obviously not relevant - why, they must be lying, or self-deceiving so strongly that it amounts to other-deception, when they try that particular gambit for bullying and authority-assertion.

His arguments, on the level of pointing at something particular, are purely verbal, not even verbal math. When he uses specific technical terms, they don't necessarily correspond to the discussed issue, and often sound like buzzwords he vaguely associated with it. Sometimes he's demonstrably ignorant about their meaning. The Big Picture conclusion never changes.

Maybe it can't.


This is a sample from dunk on Yud that I drafted over 24 hours of pathological irritation recently. Overall it's pretty mean and unhinged and I'm planning to write something better soon.

Hope this helps.

Bad take, except that MAML also found no purchase, similar to other Levine's ideas.

He directly and accurately describes evolution and its difference from current approaches, but he's aware of a wide range or implementations of meta-learning. In the objections list he literally links to MAML::

I'm a lot more bullish on the current paradigm. People have tried lots and lots of approaches to getting good performance out of computers, including lots of "scary seeming" approaches such as:

1 Meta-learning over training processes. I.e., using gradient descent over learning curves, directly optimizing neural networks to learn more quickly.

2 Teaching neural networks to directly modify themselves by giving them edit access to their own weights.

3 Training learned optimizers - neural networks that learn to optimize other neural networks - and having those learned optimizers optimize themselves.

4 Using program search to find more efficient optimizers.

5 Using simulated evolution to find more efficient architectures.

6 Using efficient second-order corrections to gradient descent's approximate optimization process.

7 Tried applying biologically plausible optimization algorithms inspired by biological neurons to training neural networks.

8 Adding learned internal optimizers (different from the ones hypothesized in Risks from Learned Optimization) as neural network layers.

9 Having language models rewrite their own training data, and improve the quality of that training data, to make themselves better at a given task.

10 Having language models devise their own programming curriculum, and learn to program better with self-driven practice.

11 Mixing reinforcement learning with model-driven, recursive re-writing of future training data.

Mostly, these don't work very well. The current capabilities paradigm is state of the art because it gives the best results of anything we've tried so far, despite lots of effort to find better paradigms.

And the next paragraph on sharp left turn:

In my frame, we've already figured out and applied the sharp left turn to our AI systems, in that we don't waste our compute on massive amounts of incredibly inefficient neural architecture search, hyperparameter tuning, or meta optimization. For a given compute budget, the best (known) way to buy capabilities is to train a single big model in accordance with empirical scaling laws

Yuddites, on the other hand, mostly aren't aware of any of that. I am not sure they even read press releases.