Culture War Roundup for the week of March 30, 2026

This weekly roundup thread is intended for all culture war posts. 'Culture war' is vaguely defined, but it basically means controversial issues that fall along set tribal lines. Arguments over culture war issues generate a lot of heat and little light, and few deeply entrenched people ever change their minds. This thread is for voicing opinions and analyzing the state of the discussion while trying to optimize for light over heat.

Optimistically, we think that engaging with people you disagree with is worth your time, and so is being nice! Pessimistically, there are many dynamics that can lead discussions on Culture War topics to become unproductive. There's a human tendency to divide along tribal lines, praising your ingroup and vilifying your outgroup - and if you think you find it easy to criticize your ingroup, then it may be that your outgroup is not who you think it is. Extremists with opposing positions can feed off each other, highlighting each other's worst points to justify their own angry rhetoric, which becomes in turn a new example of bad behavior for the other side to highlight.

We would like to avoid these negative dynamics. Accordingly, we ask that you do not use this thread for waging the Culture War. Examples of waging the Culture War:

  • Shaming.

  • Attempting to 'build consensus' or enforce ideological conformity.

  • Making sweeping generalizations to vilify a group you dislike.

  • Recruiting for a cause.

  • Posting links that could be summarized as 'Boo outgroup!' Basically, if your content is 'Can you believe what Those People did this week?' then you should either refrain from posting, or do some very patient work to contextualize and/or steel-man the relevant viewpoint.

In general, you should argue to understand, not to win. This thread is not territory to be claimed by one group or another; indeed, the aim is to have many different viewpoints represented here. Thus, we also ask that you follow some guidelines:

  • Speak plainly. Avoid sarcasm and mockery. When disagreeing with someone, state your objections explicitly.

  • Be as precise and charitable as you can. Don't paraphrase unflatteringly.

  • Don't imply that someone said something they did not say, even if you think it follows from what they said.

  • Write like everyone is reading and you want them to be included in the discussion.

On an ad hoc basis, the mods will try to compile a list of the best posts/comments from the previous week, posted in Quality Contribution threads and archived at /r/TheThread. You may nominate a comment for this list by clicking on 'report' at the bottom of the post and typing 'Actually a quality contribution' as the report reason.


The proliferation of models and harnesses, multiplied by individual work styles, preferences, and use cases, creates an exponentially large space, making it a futile endeavor to diagnose the reason for your experience (Opus vs. Sonnet? Claude vs. GPT? Claude Code vs. Codex? Tool-use configuration? Sunspots?) or to give any advice. And besides, why engage in big-picture futuristic forecasting? Frontier labs should shill their product on their own dime, and their thesis will be proven right or wrong soon enough.

There's a clear object-level flaw in your writeup, however, and ironically it's exactly the sort of confident slop we've come to associate with LLMs when they fall short of the standard of human reasoning over novel context. This isn't to dunk: that standard, see, is very high; humans often need conscious effort to meet it. That models can ever touch it is miraculous enough.

I mean this part:

The second happening is the ARC prize people releasing version 3 of their AGI test suite, a series of puzzle games. They released it within a few hours of Jensen Huang saying he thinks the latest and greatest models are capable of AGI. Humans were capable of solving 100% of the puzzles. The highest scoring AI couldn't complete more than 0.5%.

Here are the AGI puzzles for anyone interested in trying them out: https://arcprize.org/arc-agi/3

You've played the games and thought of making them an argument, but you weren't curious enough to read up on the actual scoring rule. It's contentious enough that Chollet has had to make excuses for it on Hacker News.

To be clear:

  1. Each submission (i.e., an attempt to output a solution for a task) counts as one action. Internal reasoning steps do not count.
  2. For each task, the baseline is the second‑best number of actions taken by humans who attempted the task for the first time. Using the second‑best reduces the impact of luck.
  3. If the AI solves the task: (human_baseline_actions / ai_actions)², yes squared. If the AI fails: 0.
  4. Maximum per task is 1.0, which is to say even if the AI beats the human baseline, the score is capped at one point zero.
  5. Scores are weighted by task difficulty (later tasks in a game count more), then averaged across all games.

So if the human baseline is 10 actions:

  • AI uses 10 actions → (10/10)² = 1.0
  • AI uses 20 actions → (10/20)² = 0.25
  • AI uses 100 actions → (10/100)² = 0.01
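
In code, the whole rule looks roughly like this. A minimal sketch: the function names are mine, not the official harness's, and since the exact difficulty weights aren't published (as far as I know), they're left as a parameter.

```python
import math

# Sketch of the per-task scoring rule described above; names are mine,
# and the official harness may differ in details.
def task_score(human_baseline: int, ai_actions: int, solved: bool) -> float:
    """(baseline / actions)^2 if solved, capped at 1.0; 0.0 on failure."""
    if not solved:
        return 0.0
    return min(1.0, (human_baseline / ai_actions) ** 2)

def game_score(task_scores: list[float], weights: list[float]) -> float:
    """Difficulty-weighted average within one game; later tasks get
    larger weights, but the exact weighting isn't public."""
    return sum(s * w for s, w in zip(task_scores, weights)) / sum(weights)

# Reproduces the worked example (human baseline of 10 actions):
assert math.isclose(task_score(10, 10, True), 1.0)
assert math.isclose(task_score(10, 20, True), 0.25)
assert math.isclose(task_score(10, 100, True), 0.01)
```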

A score of 0.5% (0.005) does not mean that the best AI only solves 1/200th of the problems. The same score is reached by an AI that solves every task but is about 14 times less sample-efficient than the human baseline (√(1/0.005) ≈ 14.1). But should we care? How much opportunity cost comes with wasted AI samples? A white-collar professional in the US earns a Claude Max (x10) subscription in 1-2 hours; Claude Max will generate ≈2 OOMs more tokens in a month than said professional can; even if those tokens are 1000 times less useful apiece, that's a massive bargain. We already routinely afford AI that's this inefficient. It will get more efficient soon, though.
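
That figure is just the scoring formula inverted, under the counterfactual assumption that every task was solved with the same uniform inefficiency:

```python
# If s = (baseline / actions)^2 uniformly on every task, then
# actions / baseline = 1 / sqrt(s).
s = 0.005
print((1 / s) ** 0.5)  # 14.142...: ~14x more actions than the human baseline
```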

Which is not to say it'll be cheaper. Consider that car rentals in American cities go for roughly one monthly Claude Max subscription per day. Sure, the US is a tough place and a car spares you from being stabbed in the neck on public transport, but we can quantify the micromorts and assign a cost to them; after that, does a car for a day provide as much economic value as a fully exploited Claude Max for a month, i.e. tens of millions of Opus-grade output tokens? Seeing how fast OpenAI's and especially Anthropic's revenues have been growing, what do you think their asking price will be once all the dumping from also-rans is rendered irrelevant?
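
To sketch what that quantification could look like: every number below is an assumption pulled out of the air for illustration (a commonly cited ~$10M value of a statistical life, a guessed per-ride violence risk), not a sourced figure.

```python
# All inputs are illustrative assumptions, not sourced statistics.
VSL = 10_000_000                   # assumed value of a statistical life, USD
micromort_price = VSL / 1_000_000  # ~$10 per one-in-a-million death risk

rides = 2 * 22                     # assumed: 2 transit rides/day, 22 workdays
risk_per_ride_mm = 0.01            # assumed micromorts of violence per ride
monthly_cost = rides * risk_per_ride_mm * micromort_price
print(f"${monthly_cost:.2f}/month of stabbing risk")  # $4.40
```

Even if my risk guess is off by an order of magnitude, the safety premium stays small next to the rental bill.
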
Right now the cost of tokens is suppressed by the lingering user-acquisition phase, by hardware gains, by rapid competitive model churn and, most importantly, by the threat of cheap open-weights models, mostly Chinese and increasingly Nvidia's. Should those fall far enough behind, together with the other minor competitors, we enter Frontier Cartel territory, with $1000/month subscriptions as the baseline expectation. (This, FYI, is implicitly Dario Amodei's theory of victory; see him invoking Cournot equilibrium on Dwarkesh's podcast.) I pray we don't get there. But people would pay. These systems aren't a joking matter; being shut out of them will be quite literally existentially threatening for many businesses soon enough.

The best model+harness combination scores 36%, by the way. But I'm more impressed by the 12.58% scored by a 4-layer CNN called StochasticGoose. Read this piece; it contains some pretty neat analysis.

Chollet's evals are neat too, but he's pushing a narrative against machine superintelligence from within the deep learning paradigm, and he's getting embarrassingly biased, offering ever more abstract justifications for denying what looks like inevitability.