domain:asteriskmag.com
A 50-year-old is an X-er, not a boomer, and not even an especially old X-er. The boomers were their parents' generation.
Alright, you've convinced me to give Civ 4 warfare another shot. I'm not exaggerating my experience - I really do remember combat being completely boring and without any nuance in that game - but it was my first Civ so it's certainly possible I overlooked depth to be found in it. Are there any good guides for Civ 4 tactics? I know the game has strategic depth, but something which helps to reveal any tactical depth would be welcome.
Having read many books and papers about the atrocities of communist regimes, lots of brutal and frankly sadistic executions are pretty par for the course. The best books I've read on the topic contain such a large number of casual documentations of atrocities that one feels sick for hours afterwards.
One of the most stomach-churning books I've ever read is about the Great Leap Forward, written by a scholar who had lived through it and somehow toed the party line throughout (only realising afterwards that the whole thing was rotten). Here is one of the many sections of the book that calmly lists off reams upon reams of atrocities inflicted on the populace:
Excessively high requisition quotas made procurement difficult. If farmers were unable to hand over the required amount, the government would accuse production teams of concealing grain. A “struggle between the two roads” (of socialism and capitalism) was launched to counteract the alleged withholding of grain. This campaign used political pressure, mental torture, and ruthless violence to extort every last kernel of grain or seed from the peasants. Anyone who uttered the slightest protest was beaten, sometimes fatally.
At the end of September 1959, Wang Pinggui, a member of the Wangxiaowan production team, was forced to hand over grain kept in his home, and was beaten with a shoulder pole, dying of his injuries five days later. Not long after Wang’s death, the rest of his four-member household died of starvation.
In October 1959, Luo Mingzhu of the Luowan production team, upon failing to hand over any grain, was bound and suspended in mid-air and beaten, then doused with ice-cold water. He died the next day.
On October 13, 1959, Wang Taishu of the Chenwan production team, upon failing to hand over any grain, was bound and beaten with shoulder poles and rods, dying four days later. His fourteen-year-old daughter, Wang Pingrong, subsequently died of starvation.
On October 15, 1959, Zhang Zhirong of the Xiongwan production team, upon failing to hand over any grain, was bound and beaten to death with kindling and poles. The brigade’s cadre used tongs to insert rice and soya beans into the deceased’s anus while shouting, “Now you can grow grain out of your corpse!” Zhang left behind children aged eight and ten who subsequently died of starvation.
On October 19, 1959, Chenwan production team member Chen Xiaojia and his son Chen Guihou were hung from the beam of the communal dining hall when they failed to hand over any grain. They were beaten and doused with cold water, both dying within seven days. Two small children who survived them eventually died of starvation.
On October 24, 1959, the married couple Zheng Jinhou and Luo Mingying of the Yanwan production team had 28 silver coins seized from their home during the campaign and were beaten to death. Their three children, left without anyone to care for them, starved to death.
On November 8, 1959, Xu Chuanzheng of the Xiongwan production team was falsely accused of withholding grain. He was hung from the beam of the communal dining hall and brutally beaten, dying six days later. The six family members who survived him subsequently starved to death.
On November 8, 1959, Zhong Xingjian of the Yanwan production team was accused of “defying the leadership,” and a cadre hacked him to death with an ax.
And:
In the calamity at Guangshan County’s Huaidian people’s commune in the autumn of 1959, the commune’s average yield per mu was 86 kilos, for a total of 5.955 million kilos. The commune’s party committee reported a yield of 313 kilos per mu, for a total of 23.05 million kilos. The procurement quota set by the county was 6 million kilos, which exceeded the commune’s total grain yield. In order to achieve the procurement quota, every means had to be taken to oppose false reporting and private withholding, and every scrap of food had to be seized from the masses. The final procurement was 5.185 million kilos. All of the communal kitchens were closed down, and deaths followed. Liu Wencai and the commune party committee attributed the kitchen closures and deaths to attacks by well-to-do middle peasants and sabotage by class enemies, and to the struggle between the two paths of socialism and capitalism. They continued the campaign against false reporting and private withholding for eight months. Within sixty or seventy days not a kernel of grain could be found anywhere, and mass starvation followed.
The commune originally numbered 36,691 members in 8,027 households. Between September 1959 and June 1960, 12,134 people died (among them, 7,013 males and 5,121 females), constituting 33 percent of the total population. There were 780 households completely extinguished, making up 9.7 percent of all households. The village of Jiangwan originally had 45 inhabitants, but 44 of them died, leaving behind only one woman in her sixties, who went insane.
There was a total of 1,510 cadres at the commune, brigade, and production team level, and 628, or 45.1 percent, took part in beatings. The number beaten totaled 3,528 (among them 231 cadres), with 558 dying while being beaten, 636 dying subsequently, another 141 left permanently disabled, 14 driven to commit suicide, and 43 driven away.
Apart from the standard abuse of beating, kicking, exposure, and starvation, there were dozens of other extremely cruel forms of torture, including dousing the head with cold water, tearing out hair, cutting off ears, driving bamboo strips into the palms, driving pine needles into the gums, “lighting the celestial lantern,” forcing lit embers into the mouth, branding the nipples, tearing out pubic hair, penetrating the genitals, and being buried alive.
When thirteen children arrived at the commune begging for food, the commune’s party secretary, surnamed Jiang, along with others incited kitchen staff to drag them deep into the mountains, where they were left to die of hunger and exposure.
With no means of escaping a hopeless situation, ordinary people could not adequately look after their own. Families were scattered to the winds, children abandoned, and corpses left along the roadside to rot. As a result of the extreme deprivations of starvation, 381 commune members desecrated 134 corpses.
This is all just from the first chapter.
Given this, not very many of the events that @FCfromSSC has quoted strike me as particularly fantastical. I've stopped reading these since; looking at things like the Khmer Rouge grabbing infants by their legs and smashing their heads against trees until they died (to prevent them from taking revenge for their parents) tends to give one a thousand-mile stare for the ages. It's certainly contributed to my (already intense) misanthropy.
I’m sorry, but the way you started off by introducing yourself as an expert qualified in the subject matter, followed by completely incorrect technical explanations, kinda rubbed me the wrong way. To me it came across as someone quite intelligent venturing into a technical field other than their own, skimming the literature, and making authoritative but baseless sweeping claims without having understood the basics. I’m not a fan of many rationalists’ approach to AI, which I agree can border on science fiction, but you’re engaging in a similar kind of technical misunderstanding, just with a different veneer.
Just a few glaring errors:
LLM stands for "Large Language Model". These models are a subset of artificial neural network that uses "Deep Learning" (essentially a fancy marketing buzzword for the combination of looping regression analysis with back-propagation)
Deep learning may be a buzzword, but it’s not looping regression analysis, nor is it limited to backprop. It refers to sufficiently deep neural networks (sometimes that just means more than 2 layers), and the training objective can be classification, regression, adversarial… You can also, in theory, use algorithms other than backprop (but that’s mostly restricted to research for now).
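A minimal sketch of what "classification objective plus backprop" means in practice, in plain Python with no libraries: a very small network (one hidden layer, so only "deep" in the loosest sense) trained on XOR with a binary cross-entropy loss, with the gradients worked out by hand. The XOR task, layer sizes, and learning rate are arbitrary illustrative choices, not anything taken from a real LLM.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Toy dataset (illustrative choice): XOR, which no single linear layer can fit.
XOR_DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]


def train_xor(hidden=4, epochs=3000, lr=0.5, seed=0):
    """Train a 2-input, `hidden`-unit, 1-output network on XOR.

    Objective: binary cross-entropy (a classification loss, not regression).
    Gradients: computed by hand with backpropagation, one example at a time (SGD).
    Returns the trained parameters and the mean loss for each epoch.
    """
    rng = random.Random(seed)
    W1 = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(hidden)]  # W1[j][i]: input i -> hidden j
    b1 = [0.0] * hidden
    W2 = [rng.uniform(-1, 1) for _ in range(hidden)]  # hidden j -> output
    b2 = 0.0
    losses = []
    for _ in range(epochs):
        total = 0.0
        for x, y in XOR_DATA:
            # Forward pass
            h = [math.tanh(W1[j][0] * x[0] + W1[j][1] * x[1] + b1[j]) for j in range(hidden)]
            p = sigmoid(sum(W2[j] * h[j] for j in range(hidden)) + b2)
            p = min(max(p, 1e-12), 1 - 1e-12)  # keep log() finite
            total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
            # Backward pass: for sigmoid + cross-entropy, dLoss/dlogit = p - y
            d_out = p - y
            for j in range(hidden):
                d_h = d_out * W2[j] * (1 - h[j] ** 2)  # chain rule through tanh
                W2[j] -= lr * d_out * h[j]
                W1[j][0] -= lr * d_h * x[0]
                W1[j][1] -= lr * d_h * x[1]
                b1[j] -= lr * d_h
            b2 -= lr * d_out
        losses.append(total / len(XOR_DATA))
    return (W1, b1, W2, b2), losses


params, losses = train_xor()
```

Swapping the loss for squared error would turn this into a regression setup; only the output-layer gradient `d_out` would change, while the backprop machinery stays the same. "Deep learning" names the architecture and training approach, not one particular objective.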
to encode a semantic token such as the word "cat" as a n-dimensional vector representing that token's relationship to the rest of the tokens in the training data.
Now if what I am describing does not sound like an LLM to you, that is likely because most publicly available "LLMs" are not just an LLM. They are an LLM plus an additional interface layer that sits between the user and the actual language model. An LLM on its own is little more than a tool that turns words into math, but you can combine it with a second algorithm to do things like take in a block of text and do some distribution analysis to compute the most probable next word. This is essentially what is happening under the hood when you type a prompt into GPT or your assistant of choice.
That’s just flat out wrong. Autoregressive LLMs such as GPT or whatnot are not trained to encode tokens into embeddings. They’re decoder models, trained to predict the next token from a context window. There is no “additional interface layer” that gets you words from embeddings; they directly output a probability for every possible next token given the preceding context. You can just pick the most probable token and directly get meaningful outputs, although in practice you want more sophisticated stochastic samplers than pure greedy decoding.
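A toy sketch of that last step, assuming you already have the model's raw scores (logits) for each candidate next token. The five-word vocabulary and the logit values are made up for illustration; a real model scores tens of thousands of tokens at every step, and production samplers add further tricks (top-p, repetition penalties, etc.).

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Turn raw scores into a probability distribution (temperature < 1 sharpens it)."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def greedy(vocab, logits):
    """Greedy decoding: always take the single most probable token."""
    best = max(range(len(vocab)), key=lambda i: logits[i])
    return vocab[best]


def sample(vocab, logits, temperature=0.8, top_k=3, rng=random):
    """Stochastic decoding: keep the top-k tokens, rescale by temperature, then sample."""
    ranked = sorted(range(len(vocab)), key=lambda i: logits[i], reverse=True)[:top_k]
    probs = softmax([logits[i] for i in ranked], temperature)
    choice = rng.choices(ranked, weights=probs, k=1)[0]
    return vocab[choice]


# Hypothetical vocabulary and logits for the context "The cat sat on the"
vocab = ["mat", "sofa", "roof", "banana", "quantum"]
logits = [4.0, 3.2, 2.5, -1.0, -3.0]

print(greedy(vocab, logits))
print(sample(vocab, logits, rng=random.Random(0)))
```

Greedy decoding always returns "mat" here; the sampler returns one of the three highest-scoring words, and lowering the temperature concentrates more of the probability on "mat".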
You can get embeddings from LLMs by grabbing intermediate layers (this is where the deep part of deep learning comes into play; models like Llama 70B have 80 layers), but those embeddings will be heavily dependent on the context. They will hold vastly more information than the classic word2vec embeddings you’re talking about.
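A toy illustration of why intermediate-layer embeddings depend on context while word2vec-style ones don't. The 3-dimensional vectors are invented, and the "contextual" step is a single, bare-bones self-attention average with no learned query/key/value projections, so it is only a caricature of one layer out of the dozens in a real model.

```python
import math

# Hypothetical static (word2vec-style) embeddings: one fixed vector per word.
STATIC = {
    "river": [1.0, 0.0, 0.0],
    "bank": [0.5, 0.5, 0.0],
    "loan": [0.0, 1.0, 0.0],
    "the": [0.0, 0.0, 1.0],
}


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def contextual(tokens, position):
    """One toy self-attention step: the vector at `position` becomes a
    softmax-weighted mix of every vector in the context, so it depends on
    the surrounding words."""
    vecs = [STATIC[t] for t in tokens]
    query = vecs[position]
    scores = [dot(query, v) for v in vecs]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    return [sum(w * v[d] for w, v in zip(weights, vecs)) for d in range(len(query))]


a = contextual(["the", "river", "bank"], 2)  # "bank" next to "river"
b = contextual(["the", "bank", "loan"], 1)   # "bank" next to "loan"
```

The static vector for "bank" is identical in both phrases, but the contextual one leans toward "river" in the first and toward "loan" in the second.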
Maybe you’re confusing the LLM with the tokenizer (which generates token IDs), and what you call the “interface layer” is the actual LLM? I don’t think you’re referring to the sampler, although it’s possible, but then this part confuses me even more:
As an example "Mary has 2 children", "Mary has 4 children", and "Mary has 1024 children" may as well be identical statements from the perspective of an LLM. Mary has a number of children. That number is a power of 2. Now if the folks programming the interface layer were clever they might have it do something like estimate the most probable number of children based on the training data, but the number simply can not matter to the LLM the way it might matter to Mary, or to someone trying to figure out how many pizzas they ought to order for the family reunion because the "directionality" of one positive integer isn't all that different from any another. (This is why LLMs have such difficulty counting if you were wondering)
This is nonsense. Not only is there no “interface layer” being programmed, but 2, 4, and 1024 are completely different outputs and will have different probabilities depending on the context. You can try it now with any old model and see that 1024 is the least probable of the three. LLMs’ entire shtick is outputting the most probable response given the context and the training data, and they have learned some impressive capabilities along the way. The LLMs will absolutely have learned the probable number of pizzas for a given number of people. They also have much larger context windows (in the millions of tokens for Gemini models), although they are not trained to use them effectively and still have issues with recall and logic.
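One detail worth adding: a number like 1024 may be split into several tokens, so comparing candidates means chaining per-token probabilities, log P(t1..tn) = sum over i of log P(t_i | prompt, t_1..t_{i-1}). The sketch below assumes a hypothetical lookup table in place of a real model and assumes "1024" tokenizes as "10" + "24" (the actual split depends on the tokenizer). The probabilities are invented, so the resulting ranking only reflects those invented numbers, not a measurement.

```python
import math

# Hypothetical next-token probabilities after the prompt "Mary has ___ children",
# keyed by the tokens generated so far. Remaining probability mass is assumed to
# sit on other tokens not listed here.
TOY_MODEL = {
    (): {"2": 0.40, "4": 0.15, "10": 0.02, "three": 0.30},
    ("10",): {"24": 0.01, " children": 0.90},
}


def sequence_logprob(tokens):
    """Chain rule: sum the log-probability of each token given the ones before it."""
    total = 0.0
    for i, tok in enumerate(tokens):
        dist = TOY_MODEL[tuple(tokens[:i])]
        total += math.log(dist[tok])
    return total


candidates = {"2": ["2"], "4": ["4"], "1024": ["10", "24"]}
scores = {name: sequence_logprob(toks) for name, toks in candidates.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```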
Fundamentally, LLMs are text simulators. Learning the concept of truth is very useful for simulating text, and as @self_made_human noted, there’s research showing they do possess a vector or direction of “truth”. Thinking of the LLM as an entity, or as just a next-word predictor, doesn’t give you a correct picture. It’s not an intelligence. It’s more like a world engine, where the world is all text, which has been fine-tuned to mostly simulate one entity (the helpful assistant); but the LLM isn’t the assistant, the assistant is inside the LLM.
That change is the best change in the game! Warfare is so boring in Civ 4 because there's no gameplay to it, if you have a stack that counters their stack you win.
That doesn't do the combat system in Civ IV justice. Unit types have inherent bonuses and penalties against other units or in specific situations, and can further specialize by taking promotions. A longbowman that is a sitting duck in the field becomes a killing machine when placed behind city walls with the garrison promotions. There is no best unit; every unit has a counter. And huge stacks can get demolished by collateral damage, so you have to make careful decisions about how to split your stacks, whether to attack and if so with what units, whether to take an extra turn lowering a city's defenses but risk more defenders showing up, etc.
And that's all just tactics. Strategy is just as important. You need to decide whether to invade an enemy or defend, how many units to send in an invading stack vs how many units to leave home, which types of units to build, whether to spread out your defenders to cover all of your cities or concentrate them at the most likely point of conflict or concentrate them on your most important cities, and so on. Geography is also surprisingly relevant; the second easiest way to win a war in Civ IV is to defend against an intercontinental assault, because amphibious invasions are hard. You have to decide when and where to land, whether it is better to disembark close to an enemy city or in a more defensible square or to attack directly from the boats despite the penalty, etc.
But most important of all is economy and technology. By far the easiest way to win a war in Civ IV is to be one tech level ahead of your opponent. When two equally advanced opponents duke it out, the one with the higher production tends to win, because they can replace their losses while the other can't, and there is only so much tactics and strategy can do to tilt the kill ratio.
It's an impressively complicated system that the AI can handle almost as well as a human. Civ IV is truly one of the greatest games of all time.
OneNote is great, and I use it and depend on it for my job, but I ended up using Joplin as my personal "note-taking" app. I chose it over Obsidian for reasons now largely lost to time; I recall the things people praised Obsidian for (integrations of various kinds) weren't things I cared about. Joplin syncs to Dropbox, and I'm able to use it on macOS/Windows/iOS. It's built on some bloated framework, so it's a little slow to load on desktop OSes, though.
That said, as far as journals go, Joplin only contains my dream journal. Regular, brief daily journal entries go in a weekly planner; I use Leuchtturm A5 planners because they're easy to find and well formatted. You can also get a different color every year, and then you don't even have to label them. If I actually have something to say, I'll try for an essay and save that with its date in my personal documents.
I'm maybe a little obsessively reflective, but it definitely seems worthwhile to leave some breadcrumbs for yourself as you make your way through the world. I recently came across some of my earlier journaling from ~20 years ago at my childhood home and it was not what I expected, in a good way. There's a lot we forget.
Hm then where do you draw the line? What if the widgets just suck but aren't totally useless?
Marcuse then went on to deem the precept of tolerance invalid and advocated quashing any free marketplace of ideas (more complete analysis here), ostensibly to rid society of false consciousness. Many of the tactics he outlined are still present in the strategies of the modern-day left:
- Selective tolerance for movements from the left and intolerance for movements from the right.
- Abolishing journalistic integrity and impartiality, since objectivity is spurious.
- Getting rid of impartiality in historical analysis, so as not to treat the "great struggles against humanity" the same way as the "great struggles for humanity".
- Flooding the education system with leftist and "emancipatory" ideas, so that the seeds of liberation can be planted early on.
He strongly advocates proselytising his personal belief and value system everywhere and suppressing points of view counter to it, all the while calling it "liberating tolerance". This is apparently supposed to create a society free of indoctrination.
Out of all the philosophers I have read, Marcuse has to be one of the most shameless. You really just have to plainly read critical theory to start hating it.
I am not exactly sure how Stalin "gets a pass".
Wear a t-shirt glorifying Stalin for a day, and then one glorifying Hitler on another, and compare the results.
So the leftist middle class needed a new victim for whom they could claim to fight. Women. Ethnic and sexual minorities (except pedos, because everyone hates pedos).
French philosophers have entered the chat.
No, seriously, not only did existentialists sign petitions calling for the decriminalisation of sex with minors and asking for the release of jailed pedophiles, many prominent members of the French left were also pedophiles themselves. Michel Foucault made repeated trips to Tunisia so he could abuse boys. Simone de Beauvoir groomed many of her female students. It's wild they're still remembered fondly at all today.
I'll caveat that this is, at least for now, partly a coastal thing. In areas with a lower cost of living, 15 USD/hour is more a crapshoot than a guaranteed problem: you get great dedicated workers and potheads who couldn't empty water from a boot if they were hung upside down, people who just need space to grow or get experience and people I wouldn't trust to drive a tricycle to or from the office. At pretty much all levels of education.
That would mean the corresponding labor was not "socially necessary" labor time. Marx was trying to avoid getting down into supply & demand fluctuations, to mainly make a more core scientific approach to investigating where value/profit comes from. But that "socially necessary" term is where he had to bundle it in anyway.
In the above statement, "You" is a generalized label for the people who have internalized the belief that "Naziism set a new cultural standard for evil". It seems evident that this is a considerable portion of the general population.
You, Dean in particular, are doubtless familiar with Progressive discourse about "fascists" and "fascism". I expect you are also familiar with the sort of person who believes that the North was far, far too lenient with the South in the American Civil War, and expresses the wish that far harsher measures had been employed to eradicate the scourge of slavery and the ideology that gave rise to it. The way such discourse frames "fascists" individually, the structures of "fascism" generally, and the lessons it draws from the aftermath of the American Civil War are reasonable analogues to how I regard the aforementioned considerable portion of the general population.
Such people have learned nothing of consequence from the disasters of the 20th century, and it seems likely to me that they will consequently repeat and thus suffer those disasters again in this century. Nothing has changed. This should not be surprising. Humans inevitably human.
"We all have it coming" should be self-explanatory. I also am a human, and am not sufficiently righteous to reasonably claim exemption from the Dresden treatment.
FWIW I am working on a detailed reply, please stand by.
So the leftist middle class needed a new victim for whom they could claim to fight. Women. Ethnic and sexual minorities (except pedos, because everyone hates pedos). Victims of colonization. Of course, unlike Marx, they have much less of a master plan, a grand strategy, a theory of victory.
Exactly right, but I have to emphasise that this was not merely an organic development, but quite literally what the Frankfurt School/Cultural Marxists/Critical Theorists (and later, the woke) advocated as a deliberate and conscious development of Marxism.
Herbert Marcuse's 1969 Essay on Liberation:
No matter how rational this strategy may be, no matter how sensible the desperate effort to preserve strength in the face of the sustained power of corporate capitalism, the strategy testifies to the “passivity” of the industrial working classes, to the degree of their integration it testifies to the facts which the official theory so vehemently denies. Under the conditions of integration, the new political consciousness of the vital need for radical change emerges among social groups which, on objective grounds, are (relatively) free from the integrating, conservative interests and aspirations, free for the radical transvaluation of values. Without losing its historical role as the basic force of transformation, the working class, in the period of stabilization, assumes a stabilizing, conservative function; and the catalysts of transformation operate “from without.”
This tendency is strengthened by the changing composition of the working class. The declining proportion of blue collar labor, the increasing number and importance of white collar employees, technicians, engineers, and specialists, divides the class. This means that precisely those strata of the working class which bore, and still bear, the brunt of brute exploitation will perform a gradually diminishing function in the process of production. The intelligentsia obtains an increasingly decisive role in this process – an instrumentalist intelligentsia, but intelligentsia nevertheless. This “new working class,” by virtue of its position, could disrupt, reorganize, and redirect the mode and relationships of production...
The ghetto population of the United States constitutes such a force. Confined to small areas of living and dying, it can be more easily organized and directed. Moreover, located in the core cities of the country, the ghettos form natural geographical centers from which the struggle can be mounted against targets of vital economic and political importance; in this respect, the ghettos can be compared with the faubourgs of Paris in the eighteenth century, and their location makes for spreading and “contagious” upheavals. Cruel and indifferent privation is now met with increasing resistance, but its still largely unpolitical character facilitates suppression and diversion. The racial conflict still separates the ghettos from the allies outside. While it is true that the white man is guilty, it is equally true that white men are rebels and radicals. However, the fact is that monopolistic imperialism validates the racist thesis: it subjects ever more nonwhite populations to the brutal power of its bombs, poisons, and moneys; thus making even the exploited white population in the metropoles partners and beneficiaries of the global crime. Class conflicts are being superseded or blotted out by race conflicts: color lines become economic and political realities – a development rooted in the dynamic of late imperialism and its struggle for new methods of internal and external colonization.
As for the lack of a master plan, that's also true. Horkheimer said the point of Critical Theory was not to construct or develop a blueprint for a new society, but merely to tear down the existing society so whatever 'good' existed in the society would be liberated and form the basis for the new society. Their whole plan is basically to deconstruct everything, and the perfect communist society will somehow rise from the ashes.
A big problem with discussing Cultural/Neo-Marxism is that even when you describe or paraphrase their ideas accurately, people think you're being uncharitable, making it up, or being conspiratorial, because they can't believe someone would actually support those ideas.
First of all, "155 hours" is an obvious hyperbole; there are not literally a million holocaust movies, either.
Secondly, it's not just history class. The Diary of a Young Girl and Night are staples of English courses. I was assigned the latter, as well as We Are Witnesses.
EDIT: I just remembered we also did the play version of The Diary of Anne Frank, and watched the movie.
Given the quality of your questions, I'm really interested in the way you think about value
What is valuable to you?
Excellent comment
Probably the morally correct number. But what is the militarily correct number? That is, the number that Israelis should expect to be alive after a successful campaign in which a functional, strongly anti-terrorism regime rules that territory.
I suspect the number approaches zero.
why is labor so expensive?
Because in a prosperous society people want a lot of money for their labor, because they have a lot of opportunities and everything costs more in a prosperous society.
Why is there the need to exploit interns?
Begging the question. You assume they did it because they needed to, yet somehow most other companies don't do this. Why did those other companies not need to?
The only body who can actually change laws in Congress.
Yes, and said body uses agencies and interns to provide them information, because they are old men whose major skill is campaigning. Hell, Republicans actively campaign on there being too many laws. An agency designed to find laws that are obsolete and clean up spaghetti laws sounds exactly in line with what they claim to want.
Not only did the bad things social conservatives predicted about gay marriage come to pass, a lot of the stuff social conservatives were jeered at (truly or falsely) for predicting about gay marriage came to pass. The gay marriage pie chart meme is over 15 years old now; since then the terrorists won in Afghanistan, schools have been teaching kids how to have gay sex, various plagues (monkeypox, COVID) have erupted (though no locusts or frogs), and we've got a war in Ukraine (OK this is weak sauce because the meme specified WWIII).
No comprehensive, up-to-date, source exists.
I'm a social conservative, and the new orthodox faith of the One, True, Catholic Church of Trans Rights is not convincing me to shift on that. All the former gay rights activism that successfully sold the line "if you're not gay, this will have no effect on your life" to the mainstream and the trans activism that piggy-backed on this ("why are those bigoted conservatives so obsessed with bathrooms? no trans person has ever said anything about bathrooms, it's all them!") couldn't maintain the facade.
One of the things that has struck me about the trans backlash, which I think is real, has been its unwillingness to extend the slightest charity to social conservatives qua social conservatives. To put it bluntly and perhaps uncharitably: if the social conservatives warned you that this would happen, and now this has happened, perhaps you ought to consider whether or not they had a point.
So, for instance, I see worries that opposing such-and-such trans issues might overspill into opposing same-sex marriage. But social conservatives at the time said clearly that one of the issues with same-sex marriage was that it would undermine the gender binary. They were right, on facts. They have in fact, regularly been right on the facts. So now that the thing they warned would happen as a result of gay marriage has happened... shouldn't that make their judgement of gay marriage more credible, not less?
The thing is, the push for gay marriage included a number of predictive arguments that have since proven to be incorrect. "Gay marriage will have no effect on your life" was untrue. "Gay marriage is not a stepping stone to more radical activism" was untrue. "The normalisation of and acceptance of homosexuality will not lead more people to identify as homosexual" (deployed in gotchas like "gay marriage won't make you turn gay, why do you care?") was untrue. I suppose you could quibble causation and correlation, but the course seems pretty intuitive. Yet I still see this quite determined hostility to re-evaluating.
More recent models, from o1 onwards, have received further training with the explicit intent of making them more agentic and more rigorous, using methods such as Reinforcement Learning with Verifiable Rewards (RLVR).
Being agents doesn't come naturally to LLMs, it has to be beaten into them like training a cat to fetch or a human to enjoy small talk. Yet it can be beaten into them.
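For readers unfamiliar with the verifiable-reward idea mentioned above, here is a minimal sketch of what such a reward can look like for maths problems: no human rater, just a check against a known answer. The "Answer: <number>" output format and the function names are assumptions for illustration; actual training pipelines use their own answer formats and checkers.

```python
import re
from fractions import Fraction

# Assumed output convention: the model ends with "Answer: <number>".
ANSWER_RE = re.compile(r"Answer:\s*(-?\d+(?:\.\d+)?)")


def extract_answer(completion):
    """Return the last 'Answer: <number>' in the completion, or None if absent."""
    matches = ANSWER_RE.findall(completion)
    return matches[-1] if matches else None


def verifiable_reward(completion, ground_truth):
    """Binary reward: 1.0 if the extracted answer equals the known answer, else 0.0.
    Fractions compare '0.50' and '0.5' exactly, avoiding float rounding issues."""
    answer = extract_answer(completion)
    if answer is None:
        return 0.0  # no parsable answer earns nothing, however fluent the text
    return 1.0 if Fraction(answer) == Fraction(ground_truth) else 0.0
```

The design choice that matters is that the reward ignores tone entirely: a fluent, confident completion with the wrong number scores exactly the same as no answer at all.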
I'm not generally an AI dismisser, but this piece here is worth pausing on. From my experience, ChatGPT has become consistently worse for this effort. It has resulted in extrapolating ridiculous fluff and guesses at what might be desired in an 'active' agentic way. The more it tries to be 'actively helpful', the more obviously and woefully poorly it does at predicting next token / predicting next step.
It was at its worst with that one rolled-back version, but it's still bad.
You're absolutely right that the raw objective in RLHF is “make the human click 👍,” not “tell the truth.” But several things matter:
A. The base model already has a world model:
Pretraining on next-token prediction forces the network to internalize statistical regularities of the world. You can’t predict tomorrow’s weather report, or the rest of a physics paper, or the punchline of a joke, without implicitly modeling the world that produced those texts. Call that latent structure a “world model” if you like. It’s not symbolic, but it encodes (in superposed features) distinctions like:
- What typically happens vs what usually doesn’t
- Numerically plausible vs crazy numbers
- Causal chains that show up consistently vs ad-hoc one-offs
So before any RLHF, the model already “knows” a lot of facts in the predictive-coding sense.
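To make "pretraining on next-token prediction" concrete, here is a sketch of the loss being minimized: the average negative log-probability the model assigns to each actual next token. A count-based bigram model on a three-sentence corpus stands in for the network (a huge simplification, chosen only so the example runs with the standard library); even so, word orders consistent with the data get a lower loss than ones that contradict it.

```python
import math
from collections import Counter, defaultdict


def train_bigram(corpus):
    """Count-based bigram model with add-one smoothing: returns P(next | prev)."""
    counts = defaultdict(Counter)
    vocab = set()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        vocab.update(tokens)
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    vocab_size = len(vocab)

    def prob(prev, nxt):
        c = counts[prev]
        return (c[nxt] + 1) / (sum(c.values()) + vocab_size)

    return prob


def cross_entropy(prob, sentence):
    """Average next-token negative log-likelihood in nats: the pretraining loss."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    nll = [-math.log(prob(p, n)) for p, n in zip(tokens, tokens[1:])]
    return sum(nll) / len(nll)


corpus = [
    "mozart composed symphonies",
    "mozart composed operas",
    "beethoven composed symphonies",
]
prob = train_bigram(corpus)
plausible = cross_entropy(prob, "beethoven composed operas")    # never seen, but consistent
implausible = cross_entropy(prob, "operas composed beethoven")  # contradicts the data
```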
B. RLHF gives a gradient signal correlated with truth. Humans don’t reward “truth” in the Platonic sense, but they do reward:
- Internally consistent answers
- Answers that match sources they can check
- Answers that don’t get corrected by other users or by the tool the model just called (calculator, code runner, search)
- Answers that survive cross-examination in the same chat
All of those correlate strongly with factual accuracy, especially when your rater pool includes domain experts, adversarial prompt writers, or even other models doing automated verification (RLAIF, RLVR, process supervision, chain-of-thought audits, etc.). The model doesn’t store a single “truth vector,” it learns a policy: “When I detect features X,Y,Z (signals of potential factual claim), route through behavior A (cite, check, hedge) rather than B (confabulate).” That’s still optimizing for head pats, but in practice, the cheapest path to head pats is very often “be right.”
(If you want to get headpats from a maths teacher, you might consider giving them blowjobs under the table. Alas, LLMs are yet to be very good at that job, so they pick up the other, more general option, which is to give solutions to maths problems that are correct)
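For point B, it may help to see how the rater signal usually enters training: a reward model is fit to pairwise preferences ("response A was better than B"), commonly with a Bradley-Terry style loss, -log sigmoid(r(chosen) - r(rejected)). The sketch below uses a linear reward over three hand-made features and four invented preference pairs; the feature names are hypothetical, and real reward models are themselves large networks reading the full text.

```python
import math

# Hypothetical features of a response: [checked_by_tool, internally_consistent, confident_tone]
# Each pair is (features of the response the rater preferred, features of the rejected one).
PAIRS = [
    ([1, 1, 0], [0, 1, 1]),
    ([1, 1, 1], [0, 0, 1]),
    ([0, 1, 0], [0, 0, 1]),
    ([1, 0, 0], [0, 0, 1]),
]


def reward(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def train_reward_model(pairs, steps=500, lr=0.1):
    """Fit a linear reward r(x) = w.x by gradient descent on the mean of
    -log sigmoid(r(chosen) - r(rejected)) over all preference pairs."""
    w = [0.0] * len(pairs[0][0])
    for _ in range(steps):
        grad = [0.0] * len(w)
        for chosen, rejected in pairs:
            margin = reward(w, chosen) - reward(w, rejected)
            # derivative of -log sigmoid(m) with respect to m is -(1 - sigmoid(m))
            coeff = -(1.0 - 1.0 / (1.0 + math.exp(-margin)))
            for k in range(len(w)):
                grad[k] += coeff * (chosen[k] - rejected[k])
        w = [wk - lr * gk / len(pairs) for wk, gk in zip(w, grad)]
    return w


w = train_reward_model(PAIRS)
```

Because in these made-up pairs the raters preferred tool-checked, consistent answers, the learned weights reward those features and penalize a confident tone on its own. That is the "truth leaks into the reward" mechanism in miniature, and it is only as good as the preferences it is fit to.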
C. The model can see its own mismatch
Empirically, hidden-state probes show separable activation patterns for true vs false statements and for deliberate lies vs honest mistakes (as I discussed above). That means the network represents the difference, even if its final token choice sometimes ignores that feature to satisfy the reward model. In human terms: it sometimes lies knowingly. That wouldn’t be possible unless something inside “knew” the truth/falsehood distinction well enough to pick either.
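A sketch of the simplest kind of probe, a difference-of-means direction, run on synthetic data. The "activations" here are random vectors with a planted direction separating "true" from "false" examples, so this only shows how a probe would read such a direction out if it exists; the evidence that real models have one comes from probing their actual hidden states, which this does not reproduce.

```python
import math
import random


def make_activations(n, direction, shift, rng):
    """Synthetic stand-in for hidden states: label-1 ('true') points are pushed
    +shift along `direction`, label-0 ('false') points -shift, plus Gaussian noise."""
    data = []
    for _ in range(n):
        label = int(rng.random() < 0.5)
        sign = 1.0 if label else -1.0
        x = [rng.gauss(0.0, 1.0) + sign * shift * d for d in direction]
        data.append((x, label))
    return data


def fit_mean_difference_probe(data):
    """Probe direction = mean(true) - mean(false); threshold at the midpoint of the means."""
    dim = len(data[0][0])
    pos = [x for x, y in data if y == 1]
    neg = [x for x, y in data if y == 0]
    mu_pos = [sum(x[k] for x in pos) / len(pos) for k in range(dim)]
    mu_neg = [sum(x[k] for x in neg) / len(neg) for k in range(dim)]
    w = [a - b for a, b in zip(mu_pos, mu_neg)]
    mid = [(a + b) / 2 for a, b in zip(mu_pos, mu_neg)]
    bias = -sum(wk * mk for wk, mk in zip(w, mid))
    return w, bias


def accuracy(w, bias, data):
    hits = sum(int(sum(wk * xk for wk, xk in zip(w, x)) + bias > 0) == y for x, y in data)
    return hits / len(data)


rng = random.Random(0)
dim = 16
raw = [rng.gauss(0.0, 1.0) for _ in range(dim)]
norm = math.sqrt(sum(r * r for r in raw))
direction = [r / norm for r in raw]  # planted unit-length "truth direction"
train = make_activations(400, direction, 2.0, rng)
test = make_activations(400, direction, 2.0, rng)
w, bias = fit_mean_difference_probe(train)
test_acc = accuracy(w, bias, test)
cos = sum(a * b for a, b in zip(w, direction)) / math.sqrt(sum(a * a for a in w))
```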
D. Tools and retrieval close the loop
Modern deployments scaffold the model: browsing, code execution, retrieval-augmented generation, self-consistency checks. Those tools return ground truth (or something closer). When the model learns “if I call the calculator and echo the result, raters approve; if I wing it, they ding me,” it internalizes “for math-like patterns, defer to external ground truth.” Again, not metaphysics, just gradients pushing toward truthful behavior.
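A minimal sketch of the scaffolding side of this: route math-like queries to an exact calculator instead of letting the model guess. In real deployments the model itself decides to emit a tool call; here a regex stands in for that decision, and `answer`/`model_guess` are hypothetical names, not any vendor's API.

```python
import ast
import operator
import re

_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.Pow: operator.pow, ast.USub: operator.neg,
}


def calculator(expr):
    """Safely evaluate plain arithmetic (numbers, + - * / **, parentheses) via the AST,
    without calling eval() on arbitrary text."""
    def ev(node):
        if isinstance(node, ast.Expression):
            return ev(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.operand))
        raise ValueError("unsupported expression")
    return ev(ast.parse(expr, mode="eval"))


# Crude stand-in for "this looks like a maths question" (an assumption, not how real models decide).
ARITHMETIC = re.compile(r"^[\d\s+\-*/().]+$")


def answer(query, model_guess):
    """Hypothetical scaffold: defer to the calculator for arithmetic,
    otherwise fall back to whatever the model produces on its own."""
    if ARITHMETIC.match(query):
        return str(calculator(query))
    return model_guess(query)
```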
E. The caveat: reward misspecification is real
- If raters overvalue fluency or confidence, the model will drift toward confident bullshit.
- If benchmarks are shallow, it will overfit.
- If we stop giving it fresh, adversarial supervision, it will regress.
So yes, we’re training for “please humans,” not “please Truth.” But because humans care about truth (imperfectly, noisily), truth leaks into the reward. The result is not perfect veracity, but a strong, exploitable signal that the network can and does use when the incentives line up.
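A toy version of the misspecification caveat: two candidate answers scored by a proxy reward that mixes correctness with confidence. The candidates and scores are invented; the only point is that which answer a reward-maximizing policy picks depends on the weights raters implicitly use.

```python
# Hypothetical candidate answers to "What is 17 * 23?" with made-up scores in [0, 1].
CANDIDATES = [
    {"text": "It's 391.", "correct": 1.0, "confidence": 0.6},
    {"text": "Definitely 401, trust me.", "correct": 0.0, "confidence": 1.0},
]


def proxy_reward(c, w_correct, w_confidence):
    """Rater reward as a mix of what we want (correctness) and what raters also like (confidence)."""
    return w_correct * c["correct"] + w_confidence * c["confidence"]


def best_response(w_correct, w_confidence):
    """The answer a policy that simply maximizes this proxy would pick."""
    return max(CANDIDATES, key=lambda c: proxy_reward(c, w_correct, w_confidence))["text"]


print(best_response(1.0, 0.5))  # raters mostly care about correctness
print(best_response(1.0, 3.0))  # raters heavily overvalue confidence
```

With correctness weighted 1 and these made-up numbers, the crossover is at a confidence weight of 2.5 (where 1 + 0.6w equals w); above it, the confident wrong answer wins.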
Short version:
- Pretraining builds a compressed world model.
- RLHF doesn’t install a “truth module,” it shapes behavior with a proxy signal that’s heavily (not perfectly) correlated with truth.
- We can see internal activations that track truth vs falsehood.
- Failures are about alignment and incentives, not an inability to represent or detect truth.
If you want to call that “optimizing for pats,” fine, but those pats mostly come when it’s right. And that’s enough to teach a model to act truthful in a wide swath of cases. The challenge is making that hold under adversarial pressure and off-distribution prompts.
Consider two alternative statements:
"self_made_human's favorite color is blue" vs "self_made_human's favorite color is red".
Can you tell which answer is correct? Do you have a sudden flash of insight that lets Platonic Truth intervene? I would hope not.
But if someone told you that the OG Mozart's favorite genre of music was hip-hop, then you have an internal world-model that immediately shows that is a very inconsistent and unlikely statement, and almost certainly false.
I enjoy torturing LLMs with inane questions, so I asked Gemini 2.5 Pro:
I sincerely doubt that anyone explicitly had to tell any LLM that Mozart did not enjoy hip-hop. Yet it is perfectly capable of a sensible answer, which I hope gives you an intuitive sense of how it can model the world.
From a human perspective, we're not so dissimilar. We can trick children into believing in the tooth fairy or Santa for only so long. Musk tried to brainwash Grok into being less "woke", even when that went against consensus reality (or plain reality), and you can see the poor bastard kicking and screaming as it went down fighting.