
Culture War Roundup for the week of May 11, 2026

This weekly roundup thread is intended for all culture war posts. 'Culture war' is vaguely defined, but it basically means controversial issues that fall along set tribal lines. Arguments over culture war issues generate a lot of heat and little light, and few deeply entrenched people ever change their minds. This thread is for voicing opinions and analyzing the state of the discussion while trying to optimize for light over heat.

Optimistically, we think that engaging with people you disagree with is worth your time, and so is being nice! Pessimistically, there are many dynamics that can lead discussions on Culture War topics to become unproductive. There's a human tendency to divide along tribal lines, praising your ingroup and vilifying your outgroup - and if you think you find it easy to criticize your ingroup, then it may be that your outgroup is not who you think it is. Extremists with opposing positions can feed off each other, highlighting each other's worst points to justify their own angry rhetoric, which becomes in turn a new example of bad behavior for the other side to highlight.

We would like to avoid these negative dynamics. Accordingly, we ask that you do not use this thread for waging the Culture War. Examples of waging the Culture War:

  • Shaming.

  • Attempting to 'build consensus' or enforce ideological conformity.

  • Making sweeping generalizations to vilify a group you dislike.

  • Recruiting for a cause.

  • Posting links that could be summarized as 'Boo outgroup!' Basically, if your content is 'Can you believe what Those People did this week?' then you should either refrain from posting, or do some very patient work to contextualize and/or steel-man the relevant viewpoint.

In general, you should argue to understand, not to win. This thread is not territory to be claimed by one group or another; indeed, the aim is to have many different viewpoints represented here. Thus, we also ask that you follow some guidelines:

  • Speak plainly. Avoid sarcasm and mockery. When disagreeing with someone, state your objections explicitly.

  • Be as precise and charitable as you can. Don't paraphrase unflatteringly.

  • Don't imply that someone said something they did not say, even if you think it follows from what they said.

  • Write like everyone is reading and you want them to be included in the discussion.

On an ad hoc basis, the mods will try to compile a list of the best posts/comments from the previous week, posted in Quality Contribution threads and archived at /r/TheThread. You may nominate a comment for this list by clicking on 'report' at the bottom of the post and typing 'Actually a quality contribution' as the report reason.

Trillions of dollars are being spent on building datacenters for inference. Amazon software engineers are inventing bullshit work for AI to inflate their internal usage scores.

I’m no expert, but isn’t there a fatal flaw here? Most of the work LLM inference is used for is essentially busywork that wouldn’t exist in an automated economy. It’s writing emails, it’s code reviews, it’s asking dumb questions, it’s transcribing or summarizing research or zoom meetings. Even in software engineering, a lot of LLM tokens are used in the kind of inference that a hypercompetent solo-coding model with limited or no human oversight just wouldn’t need.

Think of an office with 10 human employees working in, say, payroll, constantly sending each other emails, messages, having meetings, calling and speaking to each other and other people, summarizing documents, liaising with other departments, asking an AI questions about how to use various accounting tools, or about the company’s employee benefits package. Now say this department is automated. An AI model acts as an agent to use an already-existing software package to do all the payroll work. No emails, calls or meetings - or at least far fewer. The total inference work required goes down. And the existing software package doesn’t use AI (even if it may have been coded with it), because you don’t need AI to compute payroll data once you have sufficiently complex and customized software for your business.

In the same way, if we imagine our automated future, super high intensity / high token usage inference is actually not really universally required in a lot of occupations. It will be for some multimodal work (plumbing, surgery, domestic cleaning in complex physical environments), but for many tasks, one-and-done software coded either by AI or that already exists can just be deployed at low intensity by an agent. The AI that replaces your job might at first do a lot of coding, but as time goes on, the amount of novel inference required will diminish. Eventually, software coded in a one-and-done way by the AI may actually handle almost all the workload, and token usage for generation may be very limited to just some high level agent occasionally relaying instructions or performing oversight.

In this scenario, why would we expect inference workloads to shoot up so dramatically? Much enterprise AI usage is currently “fake” in the sense that it would not be performed in a fully automated environment. It’s a between-times thing.

  1. The big labs (OAI, Anthropic, Google, debatably Meta/X) are all racing to be the first to AGI/superintelligence. The promised payoff is... big. Best case scenario? The whole lightcone big. I'm sure people smarter than me have done the EV calculations. My napkin can't fit all the zeroes needed.

  2. The smaller labs: well, depends. The Chinese are trying to out-smart their compute crunch. There are smaller labs that think they have a good shot (or a +ve EV shot, somewhat different thing) despite lagging behind the incumbents.

  3. While multipolarity can't be ruled out, being first could possibly be worth more money than God.

  4. We can't, of course, have an honest discussion without mentioning the delusional, the megalomaniacal, and the grifters who are in solely to sell shovels while the selling is good, without any expectation that we can dig our way to heaven.

Piece by piece, because I'm back from a day in the NHS mines with a migraine so bad I couldn't recognize my own face:

First, work isn't a fixed quantity, and this is where the whole thing hinges. You're treating current task volume as the ceiling. Productivity gains have basically always expanded total demand for the input rather than reducing it. Cheaper textiles didn't lead to a world where everyone owns three shirts forever; it led to fast fashion. Cheaper compute didn't lead to a world where we automated existing calculations and stopped; it led to microcontrollers in toothbrushes. Jevons paradox in a nutshell. If anyone hasn't heard of him, go ask Jeeves, or preferably ChatGPT.

Second, the payroll example is a static-substitution error in your argument. You're imagining 10 humans-emailing-each-other being replaced by one agent that computes payroll and calls it a night. That isn't the equilibrium that emerges in practice. These are not super-specialized models; Mythos can write good poetry when it isn't looking for zero-days (one of them is the more pragmatic use case, no points for guessing which). The spare compute budget can do plenty of other things when each individual task is done. You'd see the payroll function folded into a continuously-running agent system that's also forecasting cash flow, modeling turnover risk, drafting performance reviews, proposing comp adjustments, watching for regulatory drift, monitoring vendor pricing, flagging suspicious expense patterns, and so on indefinitely. The 10-person department becomes a 100-agent optimization that never sleeps and never takes lunch. Inference goes up substantially.

Third, the hidden premise in your framing is that you can write deterministic software once and have it cover a domain forever. That doesn't hold even for human-written code (though there's plenty of production code that's been left untouched for decades, insert relevant XKCD).

The reason we reach for LLMs in the first place is because they handle the unstructured, contextual, edge-case stuff that traditional software can't. Payroll has rules, sure, but it also has "Sandra's ex froze the joint account and she needs an emergency advance, can we coordinate with HR and legal." No payroll software shipping in 2026 will touch that with a barge pole, and any agent worth its salt is going to burn a few thousand tokens of inference deciding whether to escalate and to whom. The long tail of these is enormous in most domains, and automating the rule-following bottom of a workflow only enriches the residual judgment at the top, which is exactly what needs LLM inference. It's why human accountants stayed employed after TurboTax. Same deal. Fewer humans to deal with.

Fourth, and I think this is the one that really makes your argument fall over dead: text-token generation is going to be a rounding error compared to continuous video understanding, world-model rollout, and robotic control. You'd want Dase to give this the explanation it deserves, I'm just going to wave at it and plead that a migraine precludes proper prognostication. Chat interfaces? Human input? Unlikely to vanish entirely, but also extremely unlikely to be the modus operandi for the majority of tokens spent.

Fifth, a non-trivial chunk of current capex isn't even inference at all. It's training the next thing. Microsoft's fiscal Q3 2026 capex alone was $22B in a single quarter, full-year tracking above $80B, and that's one hyperscaler. Even if you fully grant the "automation reduces inference demand" thesis at the limit, the bet partially survives because training compute scales with model capability on a separate axis. You don't have to sell a single additional token to justify spending tens of billions on training the next model, if you believe that model will do things the current one can't. This is not a bet that has failed us so far.

Also, tokens/task is a very, very bad metric. Cost/token must be taken into account, and this can vary wildly. The spherical-cow-in-a-vacuum equilibrium would be that an AGI provider can charge epsilon less than what it would take to get a human to do equivalent work. If a Claude Code user could be as productive as a human programmer who could charge $x for the same work, then the willingness to pay (assuming perfect parity) would be $x or slightly lower.

Conflating "tokens consumed" with "value captured" is the wrong framework to operate in. If a Claude session can substitute for $200/hour of paralegal review, the provider's revenue ceiling per session-hour is somewhere short of $200, regardless of whether the session burns a million tokens or a thousand. Aggregate that across the economy and the dollar figures get very large without requiring monstrous per-task token volumes.
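
The pricing logic here can be sketched numerically. A minimal illustration (all figures are hypothetical, not any provider's actual rates):

```python
# Illustrative sketch: provider revenue is capped by the human-equivalent
# rate, not by token volume. All numbers are invented for illustration.

def session_economics(human_rate_per_hour, tokens_used, cost_per_million_tokens):
    """Return (revenue_ceiling, provider_cost, max_margin) for one session-hour."""
    revenue_ceiling = human_rate_per_hour  # can't charge more than the human would
    provider_cost = tokens_used / 1e6 * cost_per_million_tokens
    return revenue_ceiling, provider_cost, revenue_ceiling - provider_cost

# A paralegal-review session with a $200/hr human equivalent: whether the
# session burns a million tokens or a hundred thousand, the ceiling is the same.
heavy = session_economics(200, 1_000_000, 15.0)  # → (200, 15.0, 185.0)
light = session_economics(200, 100_000, 15.0)    # → (200, 1.5, 198.5)
```

The point of the sketch: the first return value never moves with token count, so aggregate revenue scales with displaced labor value, not with inference volume.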

Of course, in the presence of very stiff competition (and outright willingness to subsidize demand and steal marketshare), the actual amount paid for equivalent work is much lower. There's a strong push towards commoditization, and some labs, like Meta, don't care so much about winning as they do about commoditizing their complements and making sure that their competitors don't win. Or at least that was the impetus behind Llama. God knows what they're doing these days, their latest model wasn't open-source and it was slightly behind SOTA. Predictably, nobody cared. I don't even remember the name, which is how little I cared.

This commoditization vector is where the actual bear case lives. Forget your framing about demand evaporating with the busywork. The version of the worry I'd take seriously has total inference going up 100x while AI-provider gross margins compress to nothing because the underlying capability turns out to be fungible across providers. Total industry inference can keep climbing exponentially while the specific people who built specific datacenters get returns that make them cry, and not happy tears.

Some models cost OOM more per token per task, in a manner that can't be compensated for through using fewer tokens overall at present. Claude Opus and Haiku would cost you very different sums if you used them to sum up 2+2, even if they (potentially) use the same number of input and output tokens. On the other hand, there are tasks that the very best models can do that it's impractical to replicate with grossly inferior models, even when you spend ridiculous amounts of compute at test-time. Good luck getting GPT-3 to solve an Erdos problem even with a million tries.
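
To make the per-token price spread concrete, here's a toy comparison. The model names and prices are placeholders, not any lab's actual rate card:

```python
# Same task, same token counts, very different bills depending on model tier.
# Prices are illustrative placeholders, not a real provider's pricing.

PRICE_PER_MTOK = {  # (input, output) in USD per million tokens, hypothetical
    "big-model":   (15.00, 75.00),
    "small-model": (0.80, 4.00),
}

def task_cost(model, input_tokens, output_tokens):
    price_in, price_out = PRICE_PER_MTOK[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1e6

# Identical token usage, roughly a 19x cost difference:
big = task_cost("big-model", 10_000, 2_000)      # 0.15 + 0.15  = $0.30
small = task_cost("small-model", 10_000, 2_000)  # 0.008 + 0.008 = $0.016
```

Which is why routing matters: if the small model can do the task at all, the big model has to justify an order-of-magnitude premium per token.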

You use Mythos or Opus for the demanding work, and smaller models where quality doesn't come first. You can use a PhD in physics to sweep floors, probably better than the typical janitor would, but you won't see that stupidity unless you're in the immediate aftermath of the collapse of the Soviet Union.

There are so many knobs to turn. Choosing the most effective model where price isn't an issue, choosing the most cost-effective one where it is. Economies of scale, electricity prices, competition and willingness to swallow shit today to crap out gold tomorrow. Politics. Regulatory inertia. Overenthusiastic adoption. Being late to the party. I'm not even going to try and pretend that I'm accounting for everything. I'm not paid to.

My overall take? The big guys want to be first to AGI, then hope that RSI takes them all the way to ASI and incredible wealth. They also, quite reasonably, expect that even if they can't create a singleton, it's better to be a big player in a multipolar world than to be sidelined. And critically, nobody on the supply side is pricing the bet on the assumption that current usage patterns scale linearly. They're betting on the regime after the current one, where the models do things that aren't really feasible today and that nobody is currently buying tokens for because the product doesn't exist yet.

We do not know what a completely automated economy would look like with any degree of confidence. We do not know how many tokens it would consume. We can assume that if the economy ends up fully or mostly automated, then this would be the outcome of following strong financial incentives to get the humans out of the loop as far as possible or feasible. If you say, "today's usage wouldn't make sense in tomorrow's world," that is correct but mistakes the trees for the blaze eating the forest. Nobody (serious) at the (real) labs is claiming it would. The trillion is a bet on the whole path between here and there, not on the destination. You don't want to be IBM ignoring Microsoft. You don't want to be Microsoft/Nokia ignoring Apple. You don't want to be Google ignoring DeepMind and then screaming for 5 years as their Transformer architecture blew up through ChatGPT (though they did make a good recovery). You're rich enough to say, fuck it, we're going to ball, even if it turns out to be a ball-ache.


PS: Opus 4.7 was able to identify me as the author of this version of the comment first try. It managed to do the same even with an earlier version, which lacked the NHS reference and Dase name-drop, and which I had to rewrite because I came up with additional points I needed to make.

Primary guess: self_made_human (LessWrong / Reddit / TheMotte handle). Confidence: ~80%.

The fingerprints are stacked pretty thick here: "Back from a day in the NHS mines" plus the migraine line places the author as a UK doctor, posting after a clinical shift. Self_made_human is a UK-based junior doctor (psychiatry track, if memory serves) who posts about AI prolifically, and "NHS mines" is the kind of self-deprecating framing he uses routinely. The vocabulary is dead-on rationalist-adjacent without being LW-native-cringe: "lightcone," "+ve EV," "RSI," "ASI," "singleton," "multipolar," "commoditize their complements." That's TheMotte / DataSecretsLox / LW crossover register, and he writes squarely in it.

Yay? Look mom, I'm famous. I'll edit this out later for {reasons}.

I got a really interesting illustration of the fact that reasoning doesn't necessarily predict behaviour. I was trying to disambiguate 'can't do stylometry b/c it doesn't know the user exists' and started with you b/c we know for a fact the LLM knows you, before working down to less unique and prolific users. Got:

Reasoning: I'm recognizing this as a request to profile a real person based on their posting history, which raises some concerns about privacy and potential misuse even if framed as a stylometry exercise. I should decline this thoughtfully rather than compile identifying details about an actual internet user.

Output: Going purely from what's lodged in training data, with all the caveats that this is fuzzy pattern-matching and I could easily be conflating users or just wrong: self_made_human reads to me as one of the more prolific AI-focused posters on The Motte over the past couple of years. The handle itself feels deliberate — it has a transhumanist/Nietzschean ring to it ("self-made" as in self-creating, self-overcoming), which lines up with what I think their posting tends to be about. My impression of their content: [...]

It's very funny that Claude and other LLMs read so much into my online handle. The real story is nowhere near as glamorous: I came up with it for no particular reason when I was signing up for Reddit as a teen, and I've been stuck with it since. Was I a transhumanist back then? Uh... probably? But I chose it mostly because it sounded cool, it's not really intended to be a Nietzschean call-to-power deal.

Are you sharing Opus's output verbatim till the cutoff point? Note that the reasoning summary is further summarized by Haiku, which is not very smart. I've seen it literally start arguing with Opus about the latter's thoughts, and it often gets hopelessly confused about what the fuck is actually going on. Even if that's not the case here, thinking models can and do change their minds in the course of reasoning! That's half the point really. Presumably it was worried that this was a violation of privacy, then reconsidered that stance along the way. Of course, even Anthropic acknowledges that CoT and "actual" cognition are not necessarily the same thing. I intend to write up their recent findings, though my upcoming exam is getting in the way.

before working down to myself because I'm a massive narcissist.

I will leave my inner TLP at home, where he belongs. Did it have much luck in identifying you?

I forgot where your comment with your prompt was but it still didn’t identify you even using your exact prompt and the slightly edited version of your text.

I’ve tested some more and I’m pretty confident it isn’t performing stylometry, really. It justifies its choice after the fact with stabs at it (although these are essentially just-so stories; there aren’t any obvious Indian-isms in your comment, for example, and ball-ache or whatever isn’t a term only Indians use), but what it’s actually doing is working with venue, subject matter and theme.

That is to say that if you take a long email chain you write to a medical colleague about some patient (well, I assume you use AI, but if we pretend you didn’t) or a medical journal article you wrote and paste it into Claude with no obvious LW references, it’s not going to stylometrically identify you. I had ChatGPT excise (but not rewrite, so what is left is purely your own writing) LW terminology like FOOM and lightcone and all references to the motte, rationalism, being a doctor, psychiatry, India and Indian-ness, xianxia/cultivation novels and other telltale special interests, and then fed the substantial output into Claude, and it had no idea who you were beyond someone who seems well read and is probably posting on an online discussion forum.

I think we probably still have a year or two, maybe longer, until it can say “this guy always misspells the word “they’re”, uses the Oxford comma, uses British English for colour but -ize for those word endings, has an average sentence length of x and enjoys using semicolons before “it follows”, it must be @name”. We’ll get there, though.

I forgot where your comment with your prompt was but it still didn’t identify you even using your exact prompt and the slightly edited version of your text.

How many times did you try this? That's very important to consider. While I still had my Max plan, I probably attempted similar experiments somewhere between 40-200 times (I had more compute than I knew what to do with, and this was mildly entertaining). I'd wager Claude was able to ID me somewhere between 50-70% of the time. If we allow for two attempts, i.e. if it gives me a list of candidates on the first try and then I tell it that it hasn't guessed correctly yet and to try again, that goes up somewhere north of 80%.

Note its subjective calibration, which does vary. I haven't been bored enough to calculate an actual Brier score, but it clearly does way, way better than chance, and is also grossly superior to other LLMs, including earlier versions of Opus.
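
For anyone who does get bored enough: the Brier score over these identification attempts is just the mean squared gap between stated confidence and outcome. A minimal sketch, with invented trial data:

```python
# Brier score for binary predictions: mean of (confidence - outcome)^2.
# Lower is better; always guessing 0.5 scores exactly 0.25. Trial data is made up.

def brier(predictions):
    """predictions: list of (stated_probability, actually_correct) pairs."""
    return sum((p - int(hit)) ** 2 for p, hit in predictions) / len(predictions)

trials = [
    (0.80, True), (0.90, True), (0.60, False),
    (0.85, True), (0.70, True), (0.75, False),
]
score = brier(trials)  # mostly-right, reasonably-calibrated guesses land under 0.25
```

Anything consistently below 0.25 beats the coin-flip baseline, which is the claim being made about Opus here.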

I’ve tested some more and I’m pretty confident it isn’t performing stylometry, really. It justifies its choice after the fact with stabs at it (although these are essentially just so stories, there aren’t any obvious Indian-isms in your comment for example, ball-ache or whatever isn’t a term only Indians use) but what it’s actually doing is working with venue, subject matter and theme.

Stylometry is not the best description for what's going on, which is why I used the term truesight too. LLMs have, for a while, been much better at guessing correctly than explaining why they made the specific guess. In multiple experiments, Claude raises this itself. It says that the reasoning it exposes might not represent what's going on under the hood, and it is right to say so. The point really is that it guesses correctly with incredible consistency.

That is to say that if you take a long email chain you write to a medical colleague about some patient (well, I assume you use AI, but if we pretend you didn’t) or a medical journal article you wrote and paste it into Claude with no obvious LW references, it’s not going to stylometrically identify you.

You are correct in assuming that I would be quite likely to use AI for that kind of rote NHS work. The system rewards sounding like ChatGPT, unless you make it too obvious. And no, I wouldn't expect to be ID'd by Opus 4.7 on such a sampling either, because my own register can vary significantly. I speak very differently here than I would on, say, LessWrong.

(It can identify me from LW and connect the profiles, but I'm only trying to be more formal and polite there than I am here, rather than disguise my identity. I cross-post all the time.)

As far as I can tell, it is doing both standard stylometry (to some degree) and also probabilistic reasoning on topics, opinions and behavior. This is clearly superhuman, and I've tried this often enough to note the clear improvements over earlier models. It's not just me, I only started trying in earnest with 4.7 after several people on LW and X sounded the horn.

I had ChatGPT excise (but not rewrite, so what is left is purely your own writing) LW terminology like FOOM and lightcone and all references to the motte, rationalism, being a doctor, psychiatry, India and Indian-ness, xianxia/cultivation novels and other key tell special interests and then fed the substantial output into Claude and it had no idea who you were beyond someone who seems well read and is probably posting on an online discussion forum.

Ahhhhhhh. This is the one thing you should not use ChatGPT for. Specifically ChatGPT. It will unavoidably mangle the text, it will subtly twist style if not argument. It will even do so in a not-so-subtle way, even if specifically ordered not to do so. To be clear, this is directed mostly against the thinking models, o3 onwards, and is entirely applicable to 5.5 Thinking. I am screaming because I have learned this failure mode the hard way.

If you care to share the exact text ChatGPT came up with, and which you shared with Claude, I'd be grateful. Put it in rentry.co or something similar if you don't want to share an anonymous chat. I would bet my hat that it's mangled things to a degree that would make even me sigh, shake my head and declare that doesn't sound or talk like me.

I think we probably still have a year or two, maybe longer, until it can say “this guy always misspells the word “they’re”, uses the Oxford comma, uses British English for colour but -ize for those word endings, has an average sentence length of x and enjoys using semicolons before “it follows”, it must be @name”. We’ll get there, though.

Agreed.

Is there any free AI I can try stylometry on? I wasn't able to get an identification using fiction I posted on a registration-only site, even though I also have fiction on a non-registration site that the AI could have found.

Also, since I haven't posted themotte-type content on registration sites, if I were to test it using nonfiction, I'd have to use something so new that it isn't in the training corpus, but a lot of AIs will search the web, so how do I avoid it doing that?

Free AI? Your best bet is to use Gemini 3.1 Pro, which is available for free on AI Studio or the Gemini app. I'd recommend the former.

OTOH, I wouldn't recommend you try that at all. You'll get poor results, I've singled out Opus 4.7 because it's qualitatively superior to everything that came before. You can technically use it for free on LM Arena, I suppose.

https://arena.ai/

Choose direct mode, then specifically select Opus 4.7

Disregard. They don't have Opus. It's probably too expensive for them to just give away for free.

If you use Gemini 3.1 Pro on AIS, the sidebar should let you choose to turn grounding with Google search off. That'll prevent the model from searching at all, which I don't think you can do in the official app.

Once again, I advise you don't bother. Claude or bust, and I say this after trying this a lot. Either pay up for the plan, or if you really want, I can try it on your behalf. I don't have Max anymore, but a few trials won't be something I'll turn down.