This weekly roundup thread is intended for all culture war posts. 'Culture war' is vaguely defined, but it basically means controversial issues that fall along set tribal lines. Arguments over culture war issues generate a lot of heat and little light, and few deeply entrenched people ever change their minds. This thread is for voicing opinions and analyzing the state of the discussion while trying to optimize for light over heat.
Optimistically, we think that engaging with people you disagree with is worth your time, and so is being nice! Pessimistically, there are many dynamics that can lead discussions on Culture War topics to become unproductive. There's a human tendency to divide along tribal lines, praising your ingroup and vilifying your outgroup - and if you think you find it easy to criticize your ingroup, then it may be that your outgroup is not who you think it is. Extremists with opposing positions can feed off each other, highlighting each other's worst points to justify their own angry rhetoric, which becomes in turn a new example of bad behavior for the other side to highlight.
We would like to avoid these negative dynamics. Accordingly, we ask that you do not use this thread for waging the Culture War. Examples of waging the Culture War:
- Shaming.
- Attempting to 'build consensus' or enforce ideological conformity.
- Making sweeping generalizations to vilify a group you dislike.
- Recruiting for a cause.
- Posting links that could be summarized as 'Boo outgroup!' Basically, if your content is 'Can you believe what Those People did this week?' then you should either refrain from posting, or do some very patient work to contextualize and/or steel-man the relevant viewpoint.
In general, you should argue to understand, not to win. This thread is not territory to be claimed by one group or another; indeed, the aim is to have many different viewpoints represented here. Thus, we also ask that you follow some guidelines:
- Speak plainly. Avoid sarcasm and mockery. When disagreeing with someone, state your objections explicitly.
- Be as precise and charitable as you can. Don't paraphrase unflatteringly.
- Don't imply that someone said something they did not say, even if you think it follows from what they said.
- Write like everyone is reading and you want them to be included in the discussion.
On an ad hoc basis, the mods will try to compile a list of the best posts/comments from the previous week, posted in Quality Contribution threads and archived at /r/TheThread. You may nominate a comment for this list by clicking on 'report' at the bottom of the post and typing 'Actually a quality contribution' as the report reason.

Trillions of dollars are being spent on building datacenters for inference. Amazon software engineers are inventing bullshit work for AI to inflate their internal usage scores.
I'm no expert, but isn't there a fatal flaw here? Most of the work LLM inference is used for is essentially busywork that wouldn't exist in an automated economy: writing emails, code reviews, answering dumb questions, transcribing or summarizing research or Zoom meetings. Even in software engineering, a lot of LLM tokens go to the kind of inference that a hypercompetent solo-coding model with limited or no human oversight just wouldn't need.
Think of an office with 10 human employees working in, say, payroll: constantly sending each other emails and messages, having meetings, calling and speaking to each other and to other people, summarizing documents, liaising with other departments, asking AI questions about how to use various accounting tools or about the company's employee benefits package. Now say this department is automated. An AI model acts as an agent that uses an already-existing software package to do all the payroll work. No emails, calls or meetings - or at least far fewer. The total inference work required goes down. And the existing software package doesn't use AI (even if it may have been coded with it), because you don't need AI to compute payroll data once you have sufficiently complex and customized software for your business.
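To make that concrete, here's a toy sketch (every name in it is hypothetical): the payroll computation itself is ordinary deterministic code that burns zero tokens per run, and inference shrinks to one short oversight call.

```python
from dataclasses import dataclass

@dataclass
class PayrollRecord:
    employee_id: str
    hours: float
    hourly_rate: float

def run_payroll(records: list[PayrollRecord]) -> dict[str, float]:
    # Plain deterministic computation: zero LLM tokens per run.
    return {r.employee_id: round(r.hours * r.hourly_rate, 2) for r in records}

def oversight_note(pay: dict[str, float]) -> str:
    # Stand-in for the one place inference would happen: a short agent
    # call to sanity-check the output - a few hundred tokens, not a
    # department's worth of emails and meetings.
    return f"Payroll computed for {len(pay)} employees, total ${sum(pay.values()):,.2f}."

records = [PayrollRecord("e1", 160, 35.0), PayrollRecord("e2", 152, 41.5)]
print(oversight_note(run_payroll(records)))
```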
In the same way, if we imagine our automated future, high-intensity, high-token-usage inference isn't actually required in a lot of occupations. It will be for some multimodal work (plumbing, surgery, domestic cleaning in complex physical environments), but for many tasks, one-and-done software - coded by AI, or already existing - can just be deployed at low intensity by an agent. The AI that replaces your job might at first do a lot of coding, but as time goes on, the amount of novel inference required will diminish. Eventually, software coded in a one-and-done way by the AI may handle almost all the workload, and token usage may be limited to a high-level agent occasionally relaying instructions or performing oversight.
In this scenario, why would we expect inference workloads to shoot up so dramatically? Much enterprise AI usage is currently “fake” in the sense that it would not be performed in a fully automated environment. It’s a between-times thing.
While it's true that high AI performance - and thus automation making certain busywork obsolete - will cause some demand destruction, there are so many other ways to use tokens.
It's been a slow day. I've "used" something like 75 million tokens. Of those, maybe 72 million were cache reads, true, but also about 2.5M input and 500K output. If you look at modern benchmarks like Artificial Analysis or MathArena, you'll see that even the very best models use tens of thousands of tokens to solve problems. We have enough problems. The cheaper intelligence is, the more problems become economical to solve by throwing tokens at them.
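To see why the cache-read split matters, here's a quick back-of-the-envelope with made-up per-million-token prices (the ratio between the tiers is the point, not the exact figures):

```python
# Back-of-the-envelope for the day's usage above. Prices are assumed
# placeholders (dollars per million tokens); plug in your provider's rates.
PRICE_INPUT = 1.00       # uncached input (assumed)
PRICE_CACHE_READ = 0.10  # cache reads (assumed ~10x discount)
PRICE_OUTPUT = 4.00      # output (assumed)

cache_m, input_m, output_m = 72.0, 2.5, 0.5  # millions of tokens, from above

cost = cache_m * PRICE_CACHE_READ + input_m * PRICE_INPUT + output_m * PRICE_OUTPUT
print(f"~${cost:.2f} for ~75M tokens")  # ~$11.70 under these assumed prices
```

Cache reads dominate the volume but barely register in the cost, which is why that kind of daily total is affordable at all.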
Better yet, look here.
How are you using it? I'm currently somewhat constrained by limits in my startup - I tend to stick to the quotas provided by my various $20 services, so I get the models to do specific constrained tasks using up to 100k tokens, write what they did to a new log, and spin up a new instance - roughly the loop sketched below.
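A rough sketch of what I mean (`call_model` is a hypothetical stand-in for whatever client you actually use):

```python
from pathlib import Path

TOKEN_BUDGET = 100_000  # per-task cap, matching the quota strategy above

def call_model(prompt: str, max_tokens: int) -> str:
    # Stand-in for your actual API call (Claude, GPT, whatever).
    return f"[model output for: {prompt[-40:]}]"

def run_constrained_task(task: str, log: Path) -> None:
    context = log.read_text() if log.exists() else ""  # prior instances' notes
    result = call_model(f"{context}\n\nTask: {task}", max_tokens=TOKEN_BUDGET)
    # Each instance writes what it did and exits; the next one starts fresh
    # from the log instead of inheriting a bloated context window.
    with log.open("a") as f:
        f.write(f"## {task}\n{result}\n")

for task in ["refactor parser", "write tests", "update docs"]:
    run_constrained_task(task, Path("worklog.md"))
```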
I can make a case for massively increasing our AI budget if using 100x the tokens would have a genuine effect, but my impression so far is that Claude tends to get out of control and wander off up the garden path when you let it think too much. I'd be very interested to hear whether you're getting better results by just letting 'er rip - and if so, how you're doing it, and on what kinds of problems.
I mean aggregated across multiple models. I don't use Claude much anymore (only occasionally via the API on OpenRouter) and don't think their offering is economically viable. Right now, GPT 5.5 is the orchestrator and DeepSeek V4 Pro/Flash is the workhorse. Their 1M context and new prices, especially context caching, make long agentic projects basically free. The cache persists for a whole day too, so speed isn't an issue if the top-level plan is reasonable.
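Schematically, the split looks something like this - a rough sketch, not my actual harness, and the model ID strings are placeholders for whatever your provider lists:

```python
from openai import OpenAI

# OpenRouter exposes an OpenAI-compatible API.
client = OpenAI(base_url="https://openrouter.ai/api/v1", api_key="...")

def ask(model: str, prompt: str) -> str:
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

# The expensive model plans once; the cheap model executes each step.
# With context caching on the workhorse side, the repeated shared prefix
# is billed at the much lower cache-read rate.
plan = ask("openai/gpt-5.5", "Break this project into independent steps: ...")
for step in plan.splitlines():
    if step.strip():
        ask("deepseek/deepseek-v4", f"<shared project context>\n\nDo: {step}")
```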
Echoing Corvos' question
I'm currently on the $200 OpenAI Pro plan due to my thirst for tokens, but I'm about to drop to the $100 plan. I'm barely even using 5.5 at this point, since I'm largely having AI do non-coding activities for me now, so I can get away with way cheaper models for "summarize all my to-do notes for today and file them". Having DeepSeek do my filesystem work would be nice (but I love the Codex app).
I see, thank you. Are you physically organising this yourself, or do you use a program where GPT can spin up Deepseek agents to do the heavy lifting automatically? What harness do you use?