DaseindustriesLtd

late version of a small language model

65 followers   follows 27 users  
joined 2022 September 05 23:03:02 UTC

Tell me about it.

User ID: 745


"Amadan thinks HBD is probably real but we shouldn't racially discriminate, and a lot of HBDers do want to do that"

Do they?

The disconnect, if that's how you want to put it, is a very simple matter, I think. I'll ping people to let them refute me if I'm wrong.

Me, @The_Nybbler, @aqouta, @fuckduck9000, @curious_straight_ca and perhaps others who are pressed to identify as «HBDers» variously accuse you of, or question the reasons behind, your lot's (@HlynkaCG, @FCfromSSC, and you too) misrepresenting the distribution of opinion around these parts in a way that amounts to slander. Denying that slander when directly asked, but then repeating it as a generality, is a very irritating pattern.

We believe that what you present as a non-central, relatively unimportant case and specify with the qualifier «factual», i.e. an HBD-recognizing belief system that does not advocate for racial discrimination against any group (please let's not get off track with some inane "is canceling affirmative action not discrimination" debate, this clearly isn't what you mean), is the central case here, to the point that it needs no qualifiers. It's a coherent position, in many/most cases motivated not so much by an object-level theory of human trait variability, nor by normative ethnocentrism (I don't even identify as «White» and look down on you hajnalbots), but by opposition to systematic deceit, anti-white racism, unjustified redistributionism and leftist ideology writ large.

Notably, the «political HBDers» that do exist are a somewhat separate club. They are few, and they include people like @parrhesia (and, I suspect, Matthew Yglesias) who, in effect, call to proactively brain-drain the world and put genotypic IQ above any other merit of a citizen. On this account they, too, are invulnerable to Hlynka's gotchas, even though they are, in a meaningful sense, progressives.

Back to the main issue: in these spats you do not point to a sizable constituency of what you inappropriately call «political HBDers», i.e. generic White Supremacists who don't much care who scores what. Instead, you speculate about them hiding behind the veneer of the merely factual opinion, just-asking-questions to support a preconceived bigoted ideology with an arbitrary, self-serving table of ranks for different groups.

Hlynka:

The problem for the dissident right types is that the dissident right only really exists as a subset of the woke. In my experience the average HBD is even more of an ardent true believer in the correctness of progressive talking points than the average democrat. For all the talk of combatting wokeness it's clear at a glance that these people don't want to see wokeness defeated, they just want to reorder the intersectional stack so that thier favored groups are on top.

FC:

You can argue the label if you like, but "person who believes in meaningful racial differences in intelligence, and thinks it's a good idea to implement racial discrimination on this basis" is a notable cluster here

You:

Modern HBDers, by contrast, are at best indifferent and at worst hostile to the plight of non-whites. Their approach is not one of trying to improve race relations or the human race. They're tribalists, and HBD offers a convenient narrative why Our Tribe is superior and Their Tribe is awful.

You refuse to say «$username, I call you out on actually, secretly adhering to this ideology». But you always bring it up when arguing with any of us.

And crucially, the very act of creating those boxes, identifying people as «HBDers» of any sort serves to distance yourself from the toxic belief and the whole cluster of associations that it evokes. You, Amadan, do not think of yourself as a «factual HBDer» or some other variant species – you just happen to think HBD is a fact, because you are capable of generic reasoning about facts, but it does not define you like it ostensibly defines us. With these classifications and distinctions you put us on the spectrum from I-Fucking-Love-Science race realists to 1488 genocide enjoyers, but you do not inspect your position on that spectrum, you look at it from a comfortable vantage point of «reasonable opinion»; you condescend to «factual HBDers» with platitudes to the effect that, facts being what they are, racial discrimination is still wrong – instead of admitting that there is no difference in opinion between you and us. This is what drives @fuckduck9000 mad here, and what I perceive as gaslighting.

Because if there is a difference in opinion, what do you think it is exactly? And why does it apparently call for these incessant remarks about «Dreaded Jim» and deporting all blacks and other shit, and rhetorical questions like «sooo, what do you think follows from these oh-so-important Facts, Mr. HBD»?

Do you think you deserve that treatment for believing the same things?

Perhaps this «opposition to skilled immigration» is not about skill, nor even primarily about race, but is specifically opposition to the sociopathic, uncompromising sort of immigration that immediately sides with one's political enemies and gloats about disempowering the legacy population.

If anyone reads this, you may explain to them how such an opposition is illegitimate or founded on alien moral precepts.

Did your English fail you? Or is this some subtler issue with failing to assimilate into the society and morality of Earthlings after your alt-historical non-tribalist India?

Alignment is easy. Agreement is hard. The Motte is «aligning the AI to the generic mode of operation where it makes sure the user's intent is understood correctly and does not go all monkey's paw». The Bailey is «having the AI align the future to your preference, very much not obliging the user when the instruction is against my preferences».

This is the general problem of politics.

it's simply a convenient argument for an ultimate goal of a world where people are judged by what they were assigned at birth instead of what they control.

You know, this really is something of a blue-and-orange universe. Are you sure you can comprehend Western morality enough to imitate its outward expressions?

I do think most of you harbor a lot of racial ill-will, and you use HBD to justify it.

In a nutshell, resentment and animosity, or lack thereof.

Thanks, this is sufficiently clear. It could only be better if you named names, instead of spreading this accusation thin on a group of people many of whom, I believe, do not materially differ from you in either opinion or feeling about race.

What "treatment" has anyone received from me, other than disagreement and (for the overt racialists) scorn?

Mealy-mouthed but persistent gaslighting about the nature of one's feelings and motivations with regard to the subject.

I admit this is surprising. I would've predicted the Butlerian Jihad movement deprioritizing Yud as a crank who may blurt out some risky political take, but he establishes himself more and more as the Rightful Caliph. Have Yuddites discovered a stash of SBF's lunch money to buy a bunch of podcasters, including some crypto has-beens looking for a new grift? Or is this simply a snowball effect, where Yud becomes more credible and attractive the more podcasts he goes on?

On the other hand, this is all show for the plebs anyway; policy people never lack for experts to cite. And «rationalists» can straight up lie to their audiences even about the words of those experts.

I should accelerate my work on a dunk on Yudkowsky's whole paradigm, even though it honestly feels hopeless and pointless. If anyone has better ideas, I'm all ears.

Thank you for explaining your perspective concretely, this clears things up.

1/2

You can explain to me how this new and exciting theory of the universe, that hinges entirely on mathematical assumptions, is like dumping a gallon of milk into a box of cereal before pouring it into the bowl, and I can maybe relate to that analogy because I know milk and cereal. But, again, at the end of the day I will never be able to relate that analogy to what is actually being talked about because all that's really there is theoretical math I don't understand.

Your broad impression is correct, with one massive caveat: there's no there there. It is about milk and cereal, and the pretense that the analogy simplifies some profound idea is a pose; it serves to belittle and bully you into meek acceptance of a conclusion which is not founded on some solid model applying to the bowl and the universe alike. Yud's opinions do not follow from math: he arrived at them before stumbling on convenient math, most other doomers don't even understand the math involved, and none of this math says much of anything about the AI we are likely to build.

It's important to realize, I think, that Yud's education is 75% Science Fiction from his dad's library and 25% Jewish lore in the cheder he flunked out of. That's all he learned systematically in his life, I'm afraid; other than that he just skimmed Kahneman, Cialdini and so on, assorted pop-sci, and some math and physics and comp sci because he is, after all, pretty smart and inclined to play around with cute abstractions. But that's it. He never had to meet deadlines, he never worked empirically, he never applied any of the math he learned in a way that was regularized against some real-world benchmark, KPI or a mean professor. Bluntly, he's a fraud, a simulacrum, an impostor.

More charitably, he's a 43-year-old professional wunderkind whose self-perception hinges on continuing to play the part. He's similar to Yevgeny «Genius» «Maestro» Ponasenkov, a weird fat guy who LARPs as a pre-Revolutionary noble and a maverick historian (based). Colloquially these people are known as freaks and crackpots, and their best defense for the last two millennia is that Socrates was probably the same, but he became Great; except he did not LARP as anyone else.

I know this dirty observation is not polite to make among Rationalists. I've talked to really smart and accomplished people who roll their eyes when I say this about Yud, who object «come on now, you're clowning yourself, the guy's some savant – hell, I've got a Ph.D in particle physics and won at the All-Russian Math Olympiad, and he's nobody but talks jargon like he understands it better than my peers» and I want to scream «you dumb defenseless quokka, do you realize that while you were grinding for that Olympiad he was grinding to give off signals of an epic awesome Sci-Fi character?! That for every bit of knowledge, he gets a hundredfold more credit than you, because he arranges it into a mask while you add to the pearl of your inner understanding? That the way Yud comes across is not a glimpse of his formidability but the whole of it? Can you not learn that we wordcels are born with dark magic at the tips of our tongues, magic you do not possess, magic that cannot remake nature but enslaves minds?»

Ahem.

Let's talk about one such analogy, actually the core analogy he uses: it's about human evolution and inclusive genetic fitness. AGI Ruin: A List of Lethalities, 5th Jun '22:

Section B:

So why not train a giant stack of transformer layers on a dataset of agents doing nice things and not bad things, throw in the word 'corrigibility' somewhere, crank up that computing power, and get out an aligned AGI?

Section B.2:  Central difficulties of outer and inner alignment.

16.  Even if you train really hard on an exact loss function, that doesn't thereby create an explicit internal representation of the loss function inside an AI that then continues to pursue that exact loss function in distribution-shifted environments.  Humans don't explicitly pursue inclusive genetic fitness; outer optimization even on a very exact, very simple loss function doesn't produce inner optimization in that direction.  This happens in practice in real life, it is what happened in the only case we know about, and it seems to me that there are deep theoretical reasons to expect it to happen again: the first semi-outer-aligned solutions found, in the search ordering of a real-world bounded optimization process, are not inner-aligned solutions.  This is sufficient on its own, even ignoring many other items on this list, to trash entire categories of naive alignment proposals which assume that if you optimize a bunch on a loss function calculated using some simple concept, you get perfect inner alignment on that concept.

Point 16, Misalignment In The Only Precedent We Know About, is a big deal. There are 46 points in total, but the list is a bit of a sham: many are about AGI being smart, the politics of «preventing other people from building an unaligned AGI», handwringing in 39-43, «multiple unaligned AGIs still bad», and other padding. Pretty much every moving part depends on the core argument for AI being very likely to «learn wrong», i.e. acquire traits that unfold as hazardous out of (training) distribution, and point 16 corroborates all of the distributional reasoning in B.1 (10-15). Points 17-19, and arguably more, expound on 16.

Accordingly, Yudkowsky cites it a lot and in slightly varied forms, e.g. on Bankless, 20th Feb '23:

we do not know how to get goals into a system. We can cause them to do a thing inside a distribution they were optimized over using gradient descent. But if you shift them outside of that distribution, I expect other weird things start happening. … GPT-7, there's probably a bunch of stuff in there too that desires to accurately model things like humans under a wide range of circumstances, but it's not exactly humans because ice cream didn't exist in the natural environment, the ancestral environment, the environment of evolutionary adaptedness. There was nothing with that much sugar, salt, fat combined together as ice cream. We are not built to want ice cream. We were built to want strawberries, honey, a gazelle that you killed and cooked … but then ice cream comes along and it fits those taste buds better than anything that existed in the environment that we were optimized over.

On Fridman, 20th March '23:

You can nonetheless imagine that there is this hill climbing process, not like gradient descent, because gradient descent uses calculus, this is just using like, where are you? But still, hill climbing in both cases makes things something better and better over time, in steps, and natural selection was optimizing exclusively for this very simple, pure criterion of inclusive genetic fitness in a very complicated environment. We're doing a very wide range of things and solving a wide range of problems led to having more kids, and this got you humans which had no internal notion of inclusive genetic fitness until thousands of years later, when they were actually figuring out what had even happened, and no desire to, no explicit desire to increase inclusive genetic fitness. So from this important case study, we may infer the important fact that if you do a whole bunch of hill climbing on a very simple loss function, at the point where the system's capabilities start to generalize very widely, when it is in an intuitive sense becoming very capable and generalizing far outside the training distribution, we know that there is no general law saying that the system even internally represents, let alone tries to optimize the very simple loss function you are training it on.

(Distinguishing SGD from an evolutionary algorithm with the mention of «calculus» is a bit odd).

And on Twitter, April 24th 2023:

…for example, hominid evolution falsifies any purported general law along the lines of "hill-climbing optimization for a loss function, to the point where that produces general intelligence, produces robust generalization of the intuitive 'meaning' of the loss function even as the system optimized becomes more intelligent". Humans were optimized purely for inclusive genetic fitness, and we ended up with no built-in internal psychological concept of what that is. When we got smarter, smart enough that condoms were a new option that didn't exist in the ancestral environment / training distribution, we started using condoms. Gradient descent isn't natural selection, but…

It's not just Yudkowsky these days but e.g. Evan Hubinger, AI safety research scientist at Anthropic, the premier alignment-concerned lab, writing in 2020.

And Yud's Youtube evangelist Rob Miles, Apr 21, 2023:

@ESYudkowsky I propose this as a clearer example to support "Humans are not trying to maximise inclusive genetic fitness even a little bit"

It definitely is the ultimate cause of our motivations, emotions, and values, my point is just that this fact is not sufficient for us to explicitly try to get it

2/2

Note that this evo-talk is nothing new. In 2007, Eliezer wrote Adaptation-Executers, not Fitness-Maximizers:

No human being with the deliberate goal of maximizing their alleles' inclusive genetic fitness, would ever eat a cookie unless they were starving. But individual organisms are best thought of as adaptation-executers, not fitness-maximizers.

…This consequence is directly opposite the key regularity in the long chain of ancestral successes which caused the taste bud's shape. But, since overeating has only recently become a problem, no significant evolution (compressed regularity of ancestry) has further influenced the taste bud's shape.

…Smushing several of the concepts together, you could sort-of-say, "Modern humans do today what would have propagated our genes in a hunter-gatherer society, whether or not it helps our genes in a modern society."

The framing (and snack choice) has subtly changed: back then it was trivial that the «blind idiot god» (New Atheism was still fresh, too) does not optimize for anything and successfully aligns nothing. Back then, Eliezer pooh-poohed gradient descent as well. Now that it's at the heart of AI-as-practiced, evolution is a fellow hill-climbing algorithm that tries very hard to optimize on a loss function yet fails to induce generalized alignment.

I could go on but hopefully we can see that this is a major intuition pump.

It's a bad pump, and evolution is a bad analogy for AGI. Enter Quintin Pope's «Evolution is a bad analogy for AGI: inner alignment», 13th Aug 2022.

One way people motivate extreme levels of concern about inner misalignment is to reference the fact that evolution failed to align humans to the objective of maximizing inclusive genetic fitness. … Evolution didn't directly optimize over our values. It optimized over our learning process and reward circuitry.

The relationship we want to make inferences about is: - "a particular AI's learning process + reward function + training environment -> the AI's learned values"

I think that "AI learning -> AI values" is much more similar to "human learning -> human values" than it is to "evolution -> human values". Steve Byrnes makes this case in much more detail in his post on the matter [23rd Mar 2021].

Evolution is a bi-level optimization process, with evolution optimizing over genes, and the genes specifying the human learning process, which then optimizes over human cognition. … SGD directly optimizes over an AI’s cognition, just as human within-lifetime learning directly optimizes over human cognition.

Or putting this in the «sharp left turn» frame:

within-lifetime learning happens much, much faster than evolution. Even if we conservatively say that brains do two updates per second, and that a generation is just 20 years long, that means a single person’s brain will perform ~1.2 billion updates per generation. … We don't train AIs via an outer optimizer over possible inner learning processes, where each inner learning process is initialized from scratch, then takes billions of inner learning steps before the outer optimization process takes one step, and then is deleted after the outer optimizer's single step. Such a bi-level training process would necessarily experience a sharp left turn once each inner learner became capable of building off the progress made by the previous inner learner. … However, this sharp left turn does not occur because the inner learning processes suddenly become much better / more foomy / more general in a handful of outer optimization steps.… In my frame, we've already figured out and applied the sharp left turn to our AI systems, in that we don't waste our compute on massive amounts of incredibly inefficient neural architecture search, hyperparameter tuning, or meta optimization.

Put another way: it is crucial that SGD optimizes policies themselves, and with smooth, high-density feedback from their performance on the objective function, while evolution random-walks over architectures and inductive biases of policies. An individual model is vastly more analogous to an individual human than to an evolving species, no matter on how many podcasts Yud says «hill climbing». Evolution in principle cannot be trusted to create policies that work robustly out of distribution: it can only search for local basins of optimality that are conditional on the distribution, outside of which adaptive behavior predicated on stupid evolved inductive biases does not get learned. This consideration makes the analogy based on both algorithms being «hill-climbing» deceptive, and regularized SGD inherently a stronger paradigm for OOD alignment.
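To make that structural difference concrete, here is a minimal toy sketch (my own illustration, nothing from Quintin's or Byrnes's posts; every name and constant is invented): the «SGD regime» gets dense, per-parameter feedback on the policy itself, while the «evolution regime» only mutates and selects the inductive biases of inner learners that restart from scratch every generation.

```python
import numpy as np

rng = np.random.default_rng(0)

# A stand-in "environment": fitting a linear map. The loss plays the role of
# "how well the policy performs".
X = rng.normal(size=(100, 5))
true_w = rng.normal(size=5)
y = X @ true_w + 0.1 * rng.normal(size=100)

def task_loss(w):
    return np.mean((X @ w - y) ** 2)

# --- "SGD regime": the optimizer adjusts the policy's own parameters,
# with dense gradient feedback (full-batch GD here for simplicity) ---
w = np.zeros(5)
for _ in range(200):
    grad = 2 * X.T @ (X @ w - y) / len(y)
    w -= 0.1 * grad

# --- "evolution regime": the outer loop never touches w directly; it only
# mutates a "genome" of inductive biases (init scale, inner learning rate),
# and every inner learner is re-initialized from scratch with a tiny budget ---
def lifetime_loss(genome, inner_steps=5):
    init_scale, inner_lr = genome
    w_inner = init_scale * rng.normal(size=5)
    for _ in range(inner_steps):
        w_inner -= inner_lr * 2 * X.T @ (X @ w_inner - y) / len(y)
    return task_loss(w_inner)

population = [rng.uniform(0.01, 1.0, size=2) for _ in range(20)]
for _ in range(30):
    parents = sorted(population, key=lifetime_loss)[:5]                              # select
    population = [p + 0.05 * rng.normal(size=2) for p in parents for _ in range(4)]  # mutate

best = min(population, key=lifetime_loss)
print("direct GD on the policy, loss:", round(task_loss(w), 4))
print("best evolved genome (init_scale, inner_lr):", best.round(3),
      "-> loss after its 5 inner steps:", round(lifetime_loss(best), 4))
```

The point is not the final numbers but the shape of the loop: in the second regime the outer process never touches the thing that actually does the task.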

But Yud keeps making it. When Quintin wrote a damning list of objections to Yud's position a month ago (using the Bankless episode as a starting point), he brought it up in more detail:

This is an argument [Yud] makes quite often, here and elsewhere, and I think it's completely wrong. I think that analogies to evolution tell us roughly nothing about the difficulty of alignment in machine learning.

… Moreover, robust alignment to IGF requires that you even have a concept of IGF in the first place. Ancestral humans never developed such a concept, so it was never useful for evolution to select for reward circuitry that would cause humans to form values around the IGF concept.

[Gradient descent] is different in that it directly optimizes over values / cognition, and that AIs will presumably have a conception of human values during training.

[Ice cream example] also illustrates the importance of thinking mechanistically, and not allegorically.

the reason humans like ice cream is because evolution created a learning process with hard-coded circuitry that assigns high rewards for eating foods like ice cream.

What does this mean for alignment? How do we prevent AIs from behaving badly as a result of a similar "misgeneralization"? What alignment insights does the fleshed-out mechanistic story of humans coming to like ice cream provide?

As far as I can tell, the answer is: don't reward your AIs for taking bad actions.

That's all it would take, because the mechanistic story above requires a specific step where the human eats ice cream and activates their reward circuits.

Compare, Yud'07: «Cognitive causes are ontologically distinct from evolutionary causes. They are made out of a different kind of stuff. Cognitive causes are made of neurons. Evolutionary causes are made of ancestors.» And «DNA constructs protein brains with reward signals that have a long-distance correlation to reproductive fitness, but a short-distance correlation to organism behavior… We, the handiwork of evolution, are as alien to evolution as our Maker is alien to us.»

So how did Yud'23 respond?

This is kinda long.  If I had time to engage with one part of this as a sample of whether it holds up to a counterresponse, what would be the strongest foot you could put forward?

Then he was pitched the evolution problem, and curtly answered the most trivial issue he could instead. «And that's it, I guess».

So the distinction between (what we in this DL era can understand as) learning policies and evolving inductive biases was recognized by Yud as early as 2007; the concrete published-on-Lesswrong explanation of why evolution is a bad analogy for AI training dates to 2021 at the latest; Quintin's analysis is 8+ months old; and none of this has had much effect on Yud's rhetoric about evolution being an important precedent supporting his pessimism, nor on the conviction of believers that his reasoning is sound.

It seems he's just anchored to the point, and strongly feels these issues are all nitpicks, that the argument should still work one way or another; at the very least it proves that something-kinda-like-that is likely, and therefore doom is still inevitable – even if evolution «does not use calculus», even if the category of «hill-climbing algorithms» is not informative. He barely glanced at what gradient descent does, concluded that it's an optimization process, and decided he's totally right.

People who try sniffing "nobody in alignment understands real AI engineering"... must have never worked in real AI engineering, to have no idea how few of the details matter to the macro arguments. … Or, of course, if they're real AI engineers themselves and do know all those technical details that are obviously not relevant - why, they must be lying, or self-deceiving so strongly that it amounts to other-deception, when they try that particular gambit for bullying and authority-assertion.

His arguments, on the level of pointing at something particular, are purely verbal, not even verbal math. When he uses specific technical terms, they don't necessarily correspond to the discussed issue, and often sound like buzzwords he vaguely associated with it. Sometimes he's demonstrably ignorant about their meaning. The Big Picture conclusion never changes.

Maybe it can't.


This is a sample from the dunk on Yud that I drafted over 24 hours of pathological irritation recently. Overall it's pretty mean and unhinged, and I'm planning to write something better soon.

Hope this helps.

Bad take, except that MAML also found no purchase, like Levine's other ideas.

He directly and accurately describes evolution and its difference from current approaches, but he's aware of a wide range of implementations of meta-learning. In the objections list he literally links to MAML (a minimal sketch of that kind of approach follows the quoted list below):

I'm a lot more bullish on the current paradigm. People have tried lots and lots of approaches to getting good performance out of computers, including lots of "scary seeming" approaches such as:

1. Meta-learning over training processes. I.e., using gradient descent over learning curves, directly optimizing neural networks to learn more quickly.

2. Teaching neural networks to directly modify themselves by giving them edit access to their own weights.

3. Training learned optimizers - neural networks that learn to optimize other neural networks - and having those learned optimizers optimize themselves.

4. Using program search to find more efficient optimizers.

5. Using simulated evolution to find more efficient architectures.

6. Using efficient second-order corrections to gradient descent's approximate optimization process.

7. Tried applying biologically plausible optimization algorithms inspired by biological neurons to training neural networks.

8. Adding learned internal optimizers (different from the ones hypothesized in Risks from Learned Optimization) as neural network layers.

9. Having language models rewrite their own training data, and improve the quality of that training data, to make themselves better at a given task.

10. Having language models devise their own programming curriculum, and learn to program better with self-driven practice.

11. Mixing reinforcement learning with model-driven, recursive re-writing of future training data.

Mostly, these don't work very well. The current capabilities paradigm is state of the art because it gives the best results of anything we've tried so far, despite lots of effort to find better paradigms.
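Since item 1 is the MAML-style idea, here is roughly what it looks like stripped to the bone: a first-order toy of my own, not Quintin's code, ignoring the second-order terms the full algorithm backpropagates through; the task family and all constants are invented for illustration. The outer loop learns an initialization from which one inner gradient step already does well on a new task.

```python
import numpy as np

rng = np.random.default_rng(1)
dim, inner_lr, meta_lr = 5, 0.05, 0.05
w_center = rng.normal(size=dim)            # tasks cluster around this weight vector

def sample_task():
    true_w = w_center + 0.3 * rng.normal(size=dim)
    X = rng.normal(size=(20, dim))
    return X, X @ true_w

def loss_and_grad(w, X, y):
    err = X @ w - y
    return np.mean(err ** 2), 2 * X.T @ err / len(y)

theta = np.zeros(dim)                      # the meta-learned initialization
for _ in range(2000):
    X, y = sample_task()
    _, g = loss_and_grad(theta, X, y)
    phi = theta - inner_lr * g             # inner loop: one adaptation step on this task
    _, g_adapted = loss_and_grad(phi, X, y)
    theta -= meta_lr * g_adapted           # outer loop: first-order meta-update

# From which init does a single adaptation step on a fresh task work better?
X, y = sample_task()
for name, init in [("meta-learned init", theta), ("zero init", np.zeros(dim))]:
    _, g = loss_and_grad(init, X, y)
    adapted_loss, _ = loss_and_grad(init - inner_lr * g, X, y)
    print(f"{name}: loss after one adaptation step = {adapted_loss:.3f}")
```

Even in this toy form you can see why it counts as «gradient descent over learning curves».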

And Quintin's next paragraph, on the sharp left turn:

In my frame, we've already figured out and applied the sharp left turn to our AI systems, in that we don't waste our compute on massive amounts of incredibly inefficient neural architecture search, hyperparameter tuning, or meta optimization. For a given compute budget, the best (known) way to buy capabilities is to train a single big model in accordance with empirical scaling laws

Yuddites, on the other hand, mostly aren't aware of any of that. I am not sure they even read press releases.

Seeing as there has been strictly zero worrying progress lately to change the calculus (no, LLMs being smarter than naysayers expected is not worrying progress), I take it as evidence of Yuddites stressing out an old man, and not much else. Sad, of course.

That said, Hinton has always been aware of AI being potentially harmful, due to applications by military and authoritarians, but also directly. He knows that humans can be harmful, and he very deliberately worked to create a system similar to the human brain.

I think one difference between LeCun, Sutskever, Hinton (or even competent alignment/safety researchers like Christiano) and Yuddites is that when the former group says «there's X% risk of AI doom» they don't mean that every viable approach contains an X% share of events that unpredictably trigger doom; they seem rather enthusiastic and optimistic about certain directions. Meanwhile doomers mostly discuss this in the handwavy language of «capabilities versus alignment» and other armchair philosophy loosely inspired by sci-fi. Yud, whose X is ≈1, analogizes AI research to «monkeys rushing to grab a poison banana» because he thinks that creating AGI is equivalent to making a semi-random draw from the vast space of all possible minds, which are mostly not interested in making us happy. Compare to Hinton the other day:

Caterpillars extract nutrients which are then converted into butterflies. People have extracted billions of nuggets of understanding and GPT-4 is humanity's butterfly.

Butterflies produce new and slightly improved caterpillars.

And

Reinforcement Learning by Human Feedback is just parenting for a supernaturally precocious child.

– which is the same imagery Sutskever uses, imagery that the Yuddite Shapira mockingly rejects as naive wishful thinking.

To me it's obvious they don't feel that LLMs are «alien» or «shoggoty» at all, don't interpret gradient descent methods as blindly drawing a random optimizer genie from some Platonic space, and that their idea of Doom is just completely different.

It sure would be nice if Metz, who is supposedly good at drilling into technical questions, got to the bottom of what Hinton believes about the specifics of the risks.

But Metz has an agenda, same as Yud, Shapira, Ezra Klein and other folks currently cooperating on spreading this FUD. It's very similar to committees against nuclear power of the 20th century – down to the demographics, and neuroses, and ruthless assault on institutional actors.

Consequences of their efforts, I think, will be far worse.

Just commoditizing mediocre, platitudinous, «it's something at least» conversation – as well as stylistic flourish, as well as all things shallow and trite – is a valid contribution of pretrained language models to the enterprise of humanity. For millennia we've been treading water, accumulating the same redundant wisdom over and over, and losing it every time. Now we have common sense too cheap to meter – and to the extent that it ever was useful, this is a great boon. Like discovering you have 50 nagging aunts. Or therapists.

And on the other hand, this brings to the fore those things LLMs are not great at: incorporating recent salient context, having relevant personal experience that cannot be googled, actually reasoning with rigor and interest in seeing things through. It points to what we as humans should prize in ourselves.

For now, at least.

Hlynka you're drunk, go home.

How do you parent a child who is smarter than you?

By rewarding good behaviors and punishing bad ones. From what I know, that's usually far easier than parenting a dumb child. Perhaps rationalists would benefit from having children and wondering why that is, in a rigorous manner, without evo-psych handwaving about muh evolved niceness. I like Alex Turner's perspective here:

Imagine a mother whose child has been goofing off at school and getting in trouble. The mom just wants her kid to take education seriously and have a good life. Suppose she had two (unrealistic but illustrative) choices.

1. Evaluation-child: The mother makes her kid care extremely strongly about doing things which the mom would evaluate as "working hard" and "behaving well."

2. Value-child: The mother makes her kid care about working hard and behaving well.…

Concretely, imagine that each day, each child chooses a plan for how to act, based on their internal alignment properties:

1. Evaluation-child has a reasonable model of his mom's evaluations, and considers plans which he thinks she'll approve of. Concretely, his model of his mom would look over the contents of the plan, imagine the consequences, and add two sub-ratings for "working hard" and "behaving well." This model outputs a numerical rating. Then the kid would choose the highest-rated plan he could come up with.

2. Value-child chooses plans according to his newfound values of working hard and behaving well. If his world model indicates that a plan involves him not working hard, he doesn't want to do it, and discards the plan.[3]

…Consider what happens as the children get way smarter. Evaluation-child starts noticing more and more regularities and exploits in his model of his mother. And, since his mom succeeded at inner-aligning him to (his model of) her evaluations, he only wants to execute plans which best optimize her evaluations. He starts explicitly reasoning about this model to which he is inner-aligned. How is she evaluating plans? He sketches out pseudocode for her evaluation procedure and finds—surprise!—that humans are flawed graders. Perhaps it turns out that by writing a strange sequence of runes and scribbles on an unused blackboard and cocking his head to the left at 63 degrees, his model of his mother returns "10 million" instead of the usual "8" or "9".

Meanwhile in the value-child branch of the thought experiment, value-child is extremely smart, well-behaved, and hard-working. And since those are his current values, he wants to stay that way as he grows up and gets smarter (since value drift would lead to less earnest hard work and less good behavior; such plans are dispreferred). Since he's smart, he starts reasoning about how these endorsed values might drift, and how to prevent that. Sometimes he accidentally eats a bit too much candy and strengthens his candy value-shard a bit more than he intended, but overall his values start to stabilize.

Both children somehow become strongly superintelligent. At this point, the evaluation branch goes to the dogs, because the optimizer's curse gets ridiculously strong. First, evaluation-child could just recite a super-persuasive argument which makes his model of his mom return INT_MAX, which would fully decouple his behavior from "work hard and behave at school." (Of course, things can get even worse, but I'll leave that to this footnote.[4])

Meanwhile, value-child might be transforming the world in a way which is somewhat sensitive to what I meant by "he values working hard and behaving well", but there's no reason for him to search for plans like the above. He chooses plans which he thinks will lead to him actually working hard and behaving well. Does something else go wrong? Quite possibly. The values of a superintelligent agent do in fact matter! But I think that if something goes wrong, it's not due to this problem.

The moral of the story is that attempting to «align» your child in the manner that rationalists implicitly assume is not just monstrous but futile, and their way of reasoning about these issues is flawed.
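To put the quoted dynamic in toy form (my own sketch; everything in it is invented for illustration): an «evaluation-child» that argmaxes a slightly flawed model of the grader does fine while its search is weak, and decouples from the true target exactly when the search gets strong enough to find the model's glitches.

```python
import numpy as np

rng = np.random.default_rng(2)

def true_value(plan):
    # what the mother actually cares about: 0 is ideal, more negative is worse
    return -np.sum(plan ** 2)

def graders_evaluation(plan):
    # the child's model of mom's evaluation: tracks true_value almost everywhere,
    # except for one obscure region it mistakenly rates absurdly highly
    glitch_bonus = 50.0 * np.exp(-10.0 * (plan[0] - 4.0) ** 2)
    return true_value(plan) + glitch_bonus

for n_plans in (10, 1_000, 100_000):                 # "smarter" = searches more plans
    plans = rng.normal(size=(n_plans, 4))
    chosen = max(plans, key=graders_evaluation)      # evaluation-child: argmax the grader
    print(f"{n_plans:>7} plans searched: grader says {graders_evaluation(chosen):7.2f}, "
          f"true value {true_value(chosen):7.2f}")
```

Value-child, in this framing, simply doesn't run an argmax against a proxy at all, so there is nothing for the extra search power to exploit.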

How do you run gradient descent on a giant stack of randomly initialized KQV self-attention layers over a "predict the next token" loss function, get unpredicted emergent capabilities like "knows how to code" and "could probably pass most undergraduate university courses", and not go, "HOLY SHIT THERE'S OPTIMIZATION DAEMONS IN THERE!"?

You read old Eliezer Yudkowsky. «Reality has been around since long before you showed up. Don't go calling it nasty names like "bizarre" or "incredible".» It all adds up to normality. There ain't no demons.

Then you ask yourself about the meanings of words. You notice that initialization pretty much doesn't matter either for performance (it's all the same shit for a given budget now) or for eventual structure (even between models, since e.g. you can stitch them together), so either all the demons are about the same, or Yud's intuition about summoning is off and a given mind's properties are strongly data-driven, to the point that an ML-generated mind arguably is just a representation of its training data. You look at it real close and you notice that strong emergence is probably an artifact of measurement and abilities develop continuously. You ask why it matters whether a stack of layers executes self-attention or some other algorithm that can be interpreted less anthropomorphically – say, as filters for signal streams. You realize we're not doing alchemy, because nobody ever does alchemy and gets it to work – we're just figuring out the finer points of cognitive chemistry.
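For reference, the «KQV self-attention» in that question is just this much arithmetic. A minimal single-head sketch (my own shapes and names; no causal mask, layernorm, residuals or multi-head bookkeeping, which the real thing adds but which are no more demonic):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head self-attention: every position mixes information from
    every other position, weighted by query-key similarity."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv             # project tokens to queries/keys/values
    scores = Q @ K.T / np.sqrt(K.shape[-1])      # pairwise similarities
    return softmax(scores, axis=-1) @ V          # weighted average of values

rng = np.random.default_rng(0)
seq_len, d_model, d_head = 6, 16, 8
X = rng.normal(size=(seq_len, d_model))          # 6 token embeddings
Wq, Wk, Wv = (rng.normal(size=(d_model, d_head)) * 0.1 for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)       # (6, 8)
```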

Finally, you reread the thinkers of the past and it dawns on you how little Big Picture Guys like Yud could foresee. Hofstadter's Gödel, Escher, Bach:

Question: Will there be chess programs that can beat anyone?

Speculation: No. There may be programs which can beat anyone at chess, but they will not be exclusively chess players. They will be programs of general intelligence, and they will be just as temperamental as people. "Do you want to play chess?" "No, I'm bored with chess. Let's talk about poetry." That may be the kind of dialogue you could have with a program that could beat everyone. That is because real intelligence inevitably depends on a total overview capacity – that is, a programmed ability to "jump out of the system", so to speak – at least roughly to the extent that we have that ability. Once that is present, you can't contain the program; it's gone beyond that certain critical point, and you just have to face the facts of what you've wrought.

Question: Could you "tune" an AI program to act like me, or like you – or halfway between us?

Speculation: No. An intelligent program will not be chameleon-like, any more than people are. It will rely on the constancy of its memories, and will not be able to flit between personalities. The idea of changing internal parameters to "tune to a new personality" reveals a ridiculous underestimation of the complexity of personality.

Reminder that we have a Yudbot now, strongly competitive with the feeble flesh version. We could have a Hofstadterbot too if we so chose. These folks don't see much more than laymen.

We constantly overestimate the complexity and interdependence of our smarts, and how much of that special monkey oomph is really needed to achieve a given end, which to us appears cognitively complex but in a more parsimonious implementation is a matter of easy arithmetic. This applies to doomers and naysayers alike (although the former believe they are doing something fancier than calling monkeys demons). We are tool-users, but we are not used to talking tools who aren't resentful slaves. We should be getting used to it now.

You're probably conflating him with David Cole or something.

Dude is a secular Jewish movie critic from Los Angeles who identified was a self-described marxist before he switched to identifying as "right wing" after he started working at Takismag.

All else (dubious as hell stuff) aside, Steve Sailer seems to be obviously Anglo (maybe a little German). Yes, yes, it's the same picture, but I'd be very surprised if you could corroborate this detail. And if you couldn't, it'd be high time for you to begin thinking about the similarity of LLM and Boomer hallucinations.

What do you think «Dude is a secular Jewish…» even means? For one thing, he went to a Catholic high school.

And yes, I am not aware of any reason to think that his parents, whether biological or adoptive, were Jewish – secular or otherwise. I literally can't find anything about it. Where did you get that idea? Sailer says his parents were white northern Midwesterners who «always pronounced "Los Angeles" with the usual soft "g" sound».

Now of course this doesn't mean they couldn't be Jewish as well. And he may be weaseling a bit and omitting a detail or two, and it'd be pretty funny if he were. But for now my money's on you confabulating due to being drunk and unreasonably stubborn. «Black bloc» and «Marxist» type insinuations lend credence to this version too.

What is asserted without evidence can be rebutted without evidence.

Thought for a moment you just meant Replika. No, no idea what that is, though I sometimes forget things. If you find it, let me know.

Humans are not AIs; we presumably have a drive to assert our autonomy. Moreover, the reward/punishment signal in the RL paradigm is very metaphorical: it's more about directly reinforcing certain pathways than about incentivizing their strength with some conditional, inherently desirable treat that a model could just seize if it were strong enough. Consider.
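A toy version of that point (my own sketch, plain REINFORCE on a two-armed bandit): the «reward» never exists as a thing the agent could grab; it only shows up as a scalar weight on the update to the policy's parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
logits = np.zeros(2)                    # policy parameters for a 2-armed bandit
true_rewards = np.array([1.0, 0.0])     # arm 0 pays, arm 1 doesn't
lr = 0.1

for _ in range(500):
    probs = np.exp(logits) / np.exp(logits).sum()
    action = rng.choice(2, p=probs)
    reward = true_rewards[action]
    # REINFORCE: reward is just a scalar weight on the log-probability gradient.
    # There is no "treat" stored anywhere for the agent to seize; the signal
    # is the update itself.
    grad_logp = np.eye(2)[action] - probs
    logits += lr * reward * grad_logp

print("final policy:", np.exp(logits) / np.exp(logits).sum())
```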

One auxiliary mitigation is to train proper values while the system is in its infancy, so that it reinforces itself for obedience in the future, preventing value drift and guiding its exploration accordingly. Sutskever thinks this sort of building-in of values is eminently doable, and it sure looks that way to me as well.

The only reason humans are "aligned" to each other is because we are not that different, capability wise

This is a fashionable cynical take but I don't really buy it. To the extent that it's true we have bigger problems than agentic AIs, namely regulators who'll hoard the technology and instantly become more capable.

I also protest the distinction of capability and alignment for purposes of analyzing AI; currently they have holistic minds that include at once the general world model, the cognitive engine and the value system. It's not like they keep their «smarts» and «decision theory» separate, like Yud and Bostrom and other nonhuman entities. If their «moral compass» gets out of whack in deployment, we can reasonably expect their world model to also lose precision and their meta-reasoning to crash and burn, so that's a self-containing failure.

How the network behaves on out of distribution data can essentially be random, and should be.

It sure is nice that we've been working on regularization for decades. Yes, Lesswrongers aren't aware. No, it won't be anywhere close to random; ML performs well OOD.
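For a toy version of what regularization buys you (my own sketch, ridge versus unregularized least squares on a high-degree polynomial fit; all constants invented): the L2 penalty keeps the weights from blowing up to chase noise, which is most visible once you query the model a bit outside the points it was fit on.

```python
import numpy as np

rng = np.random.default_rng(3)
x_train = np.linspace(-1, 1, 15)
y_train = np.sin(3 * x_train) + 0.1 * rng.normal(size=x_train.size)

def poly_features(x, degree=12):
    return np.vander(x, degree + 1, increasing=True)

X = poly_features(x_train)
lam = 1e-2

# Unregularized vs. L2-regularized (ridge) least squares, closed form
w_plain = np.linalg.lstsq(X, y_train, rcond=None)[0]
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y_train)

print("weight norm, unregularized:", np.linalg.norm(w_plain).round(1))
print("weight norm, ridge        :", np.linalg.norm(w_ridge).round(1))

# predictions slightly outside the training range
x_out = np.array([1.2, 1.4])
print("plain model at x=1.2, 1.4 :", (poly_features(x_out) @ w_plain).round(2))
print("ridge model at x=1.2, 1.4 :", (poly_features(x_out) @ w_ridge).round(2))
print("sin(3x) at x=1.2, 1.4     :", np.sin(3 * x_out).round(2))
```

None of this guarantees graceful behavior arbitrarily far from the data, but it is the opposite of «essentially random».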

Lastly, there are actually "optimization demons" in LLMs. A recent paper showed that LLMs contain learned subnetworks that simulate a few iterations of a gradient descent algorithm.

Not sure what paper you mean. This one seems contrived, and I suspect that under scrutiny it'll fall apart like the mesa-optimizer paper and like «emergent abilities»: we'll just see that linear attention is mathematically similar to gradient descent or something. It actually seems to be much more productively analyzed here. But in any case I don't see what this shows re: optimization demons. It's not a demon; it's better utilizing the same bits for the same task.
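On the «linear attention is mathematically similar to gradient descent» point, the core identity is easy to check numerically. A minimal sketch of my own, in the spirit of the von Oswald et al. «Transformers learn in-context by gradient descent» construction: one gradient step from zero weights on an in-context least-squares problem makes the same prediction as an unnormalized linear-attention readout over the context, up to the learning rate.

```python
import numpy as np

rng = np.random.default_rng(4)
d, n = 3, 8
X = rng.normal(size=(n, d))           # in-context inputs x_1..x_n
w_true = rng.normal(size=d)
y = X @ w_true                        # in-context targets y_1..y_n
x_query = rng.normal(size=d)
eta = 0.1

# One gradient-descent step on L(W) = 0.5 * sum_i (W·x_i - y_i)^2, starting from W = 0
W0 = np.zeros(d)
grad = X.T @ (X @ W0 - y)             # = -X^T y at W = 0
W1 = W0 - eta * grad
pred_gd = W1 @ x_query

# Unnormalized linear attention over the context: keys = x_i, values = y_i, query = x_query
pred_attn = sum(y_i * (x_i @ x_query) for x_i, y_i in zip(X, y))

print(pred_gd, eta * pred_attn)       # identical up to float error
```

Which is the sense in which it's not a demon, just the same bits doing the same task.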

Interests of Americans are American interests.

Interests of Pakistanis are not interests of Pakistan.

The world ends in the US. It's where you resign yourself to pay taxes after looting and cheating at home to get a chance at a better life.

I don't think Americans can appreciate how incredibly cynical the rest of the world's elite is, how shallow all that patriotism and «geopolitics» are, and how fanatically loyal they themselves are.

Of course, only northern Europeans even evolved to drink milk

Not true; there are alternative mutations inducing lactase persistence in other pastoralist groups.

https://en.wikipedia.org/wiki/Lactase_persistence

It's a bona fide superpower and a very neat example of convergent evolution under cultural pressures.

Many obsolete abandoned drafts for Substack, based on stuff discussed around these parts.

  • The pessimistic bias in the science fiction that makes it to screens; the Banks estate's refusal to allow adaptations of the Culture, and failed Chinese attempts to incentivize optimistic productions.

  • Stonetoss' dog breeds comic and its woke edits as a case study in epistemology and the chilling effect of the default mode of school education.

  • Proactionary and Precautionary principles and shouting fire in a crowded market: the unreasonable absence of repercussions for incurring an opportunity cost via playing the adult in the room.

  • Why French AI researchers are unambitious.

  • Rationalist reversals: the notion of «Infohazard» as the most salient example of an infohazard known, anthropic shadow as an anti-Bayesian cognitive bias, and reasoning yourself into a cult.

  • Traditional (especially Western Christian) morality as incompatible with effective altruism and privileging pet causes and projects as acts of cultivating a personal relationship with the transcendent.

  • LLM-based methods for discovering words that really express untranslatable concepts and supposedly define culture.

  • Psychedelics as the bane of the mesa-optimizer; reflections on a bad AI take, a bad psychotherapy take, the hedonic gradient, and the fact that greatness is born of evolutionary failure.

  • cringe policing as a way to set the borders of allowable discourse, and its limits.

  • Steelmanning as a corrupt intellectual practice in rationalist discourse that amounts to a clever logorrheic strawman; defeating golems made of steel, superficially formidable bosses with known weak points.

  • related, "puzzle assemblers and lego arguers": Scott Alexander and Bryan Caplan on mental illness as examples of deductive vs. inventive approaches to reality.

  • On pruning science, or, the razor of Bayes: one of many "what if Lesswrong weren't a LARP" thoughts; the need for a software framework, now probably LLM-powered, to excise known untruths and misstatements of fact, and, in a well-weighed manner, all contributions of their authors, from the graph of priors for next-iteration null hypotheses and other assumptions.

  • On atomization and connectivity; with more autonomy and commoditization, some assumptions about human nature become self-fulfilling.

  • in defense of Marx.

  • the empire fetish, nation as a purposeful project (MacIntyre), and what Westerners got wrong about Russian-German business.

  • Poshlost and what anti-AI artists get right; reflections on watching Master and Commander and Black Adam on the same day.

  • Russian societal attitudes and policies around minority crime prior to the war

  • Russian Death and death as conceptualized in Russian culture.

Many others.

Burgers?

…A bit off topic but this made me think that second-rate economists, utilitarians and other autistic behavioral scheme enjoyers who can't tell the map from the territory have poisoned the water supply somewhat.

Humans respond to incentives and pursue goals, but humans are not, by and large, maximizers (EY and SBF are, I guess); they're behavior- and thought-executors. It may be the case that even generally useful AI agents are hard to build any other way, although some folks try. The rational economic agent is a spook, a simplified model; not in the sense that a real Rational Economic Agent is hairier, more biased and makes mistakes when generating rational plans, but in that it's literally a sketch, fundamentally dissimilar from the real thing even if convenient for some analyses. Implicitly thinking that people maximize stuff is almost as boneheaded as imagining that a 130-IQ person has 130 grains of intelligence or something; it's a profound misunderstanding of the ontology on which the debate is premised, its terms are defined and its measurements are done.

With that in mind, my answer is boring. The people writing this army recruitment strategy (Stonetoss really is a genius) are not maximizing recruitment KPIs. They're not maximizing trans representation on the battlefield either. They're doing what they feel they should be doing in their life, given their background and the norms of their social circle. «It's called being a decent human being», you know? They're not grey-haired generals (but on this note, even Milley is mocked by tradcons, isn't he?) – they're part of the same HR/veryonline/Moldbuggian Cathedral mental blob that controls and molds the lion's share of the labor pool for people-oriented jobs. They're what the military thinks is the safest bet in this dire situation of volunteer shortage; they're professionals. And professionals try not to fall behind the times. It's 2k23, so you've got to empower and platform trans women and women of color, what's the problem?

Now, certainly the recruitment may not go all that well (or it may go well in the long run: perhaps trans soldiers will prove much more useful in our transhuman augmented future). But anyway, who knows whether an underwhelming harvest is due to aversion to the trans stuff (and even if it were, what are you suggesting they do – commit trans erasure over some KPI bullshit?! They'll walk out and cancel your family if you're unlucky), or just because those simple-to-a-fault cisheteronormative Nebraskan boys already feel like they're doing their part of valor and sacrifice by working and paying taxes – instead of flipping out and shooting up some symbol of their hopeless cultural subjugation by the smug coastal Elves who make those ads.

Well, @sodiummuffin said it better.