
Culture War Roundup for the week of June 8, 2026

This weekly roundup thread is intended for all culture war posts. 'Culture war' is vaguely defined, but it basically means controversial issues that fall along set tribal lines. Arguments over culture war issues generate a lot of heat and little light, and few deeply entrenched people ever change their minds. This thread is for voicing opinions and analyzing the state of the discussion while trying to optimize for light over heat.

Optimistically, we think that engaging with people you disagree with is worth your time, and so is being nice! Pessimistically, there are many dynamics that can lead discussions on Culture War topics to become unproductive. There's a human tendency to divide along tribal lines, praising your ingroup and vilifying your outgroup - and if you think you find it easy to criticize your ingroup, then it may be that your outgroup is not who you think it is. Extremists with opposing positions can feed off each other, highlighting each other's worst points to justify their own angry rhetoric, which becomes in turn a new example of bad behavior for the other side to highlight.

We would like to avoid these negative dynamics. Accordingly, we ask that you do not use this thread for waging the Culture War. Examples of waging the Culture War:

  • Shaming.

  • Attempting to 'build consensus' or enforce ideological conformity.

  • Making sweeping generalizations to vilify a group you dislike.

  • Recruiting for a cause.

  • Posting links that could be summarized as 'Boo outgroup!' Basically, if your content is 'Can you believe what Those People did this week?' then you should either refrain from posting, or do some very patient work to contextualize and/or steel-man the relevant viewpoint.

In general, you should argue to understand, not to win. This thread is not territory to be claimed by one group or another; indeed, the aim is to have many different viewpoints represented here. Thus, we also ask that you follow some guidelines:

  • Speak plainly. Avoid sarcasm and mockery. When disagreeing with someone, state your objections explicitly.

  • Be as precise and charitable as you can. Don't paraphrase unflatteringly.

  • Don't imply that someone said something they did not say, even if you think it follows from what they said.

  • Write like everyone is reading and you want them to be included in the discussion.

On an ad hoc basis, the mods will try to compile a list of the best posts/comments from the previous week, posted in Quality Contribution threads and archived at /r/TheThread. You may nominate a comment for this list by clicking on 'report' at the bottom of the post and typing 'Actually a quality contribution' as the report reason.


This raises the question: how much does a token cost, and how much does it do? I'm given to understand the standard software engineering productivity line is 1 line of code/hour. Is a token more or less than that (and does it cost more than an hour of a software engineer's time)?

This discussion of tokens is sort of like 'the carpet costs 40,000 Kazakhstani tenge (to pick a random foreign currency I'm not familiar with)'. OK, sounds like a lot, but how much is a tenge? If it's worth as much as a yen, that's not a bad deal.

[apologies if I'm answering a rhetorical question]

A 'token' is the ML equivalent of a syllable: the word, portion of a word, or symbol that represents the smallest viable input or output unit of effort. The exact value (and cost) depends on the model, as well as on whether it's input or output.

I'm ... skeptical to endorse line-of-code as a measure of programmer time -- I've spent days planning out business-critical logic that ended up as five lines of code and needed to be absolutely correct, and spat out thousands of lines of text in an hour when it was just boilerplate -- but on the output side you can give a pretty good average. It depends on the tokenizer and your output language, but I'd expect an average of less than 20 tokens per line in C++ or TypeScript, and I'd get worried if a human coder was regularly writing >80 tokens in a single line.

((edit: less so in java.))

So pessimistically, 12.5k LoC per million tokens, more realistically 25k.
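A quick sketch of the arithmetic behind those two figures. The 80 tokens-per-line case is the worst case mentioned above; the 40 tokens-per-line case is only what the 25k figure implies when you work backwards, not a number stated in the post.

```python
def lines_per_million_tokens(tokens_per_line: float) -> float:
    """How many lines of code one million output tokens would cover."""
    return 1_000_000 / tokens_per_line


# 80 is the pessimistic bound from the post; 40 is inferred from the 25k
# figure; 20 is the "average of less than 20" upper end, for comparison.
for tokens_per_line in (80, 40, 20):
    loc = lines_per_million_tokens(tokens_per_line)
    print(f"{tokens_per_line} tokens/line: {loc:,.0f} LoC per million tokens")
```

Under the sub-20-tokens-per-line average, the same million tokens would stretch to 50k lines or more, so 25k reads as a deliberately conservative middle.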

Input is the high-variance part. If you're writing from scratch, the input is a few paragraphs and some design documents, maybe some scribbled image files if you feel spicy and the model supports it. I've done a few personal projects like that where it's been <2k tokens to get 20k lines of code. If you have an existing codebase you want the model to adjust to, or an API document you need the model to learn, that can burn through a lot of tokens fast; the only real restriction is context window size, and most of the corporate APIs obfuscate that (tbf, often because they have an automated store and search strategy). I've blown through 50k in a single search once (thanks, Atmel, love your manual layout too). Input is typically cheaper, and there are some strategies to reduce the cost of repeated input hits with the same content, but they're complicated and pretty specialized.

For some examples:

Model           | Input (USD per million tokens) | Output (USD per million tokens) | Output (USD per thousand lines of code)
Claude Mythos   | $10                            | $50                             | $2
Claude Opus 4.8 | $4                             | $25                             | $1
ChatGPT 5.5     | $5                             | $30                             | $1.2
Grok 4.3        | $1.25                          | $2.50                           | $0.10
Qwen3.7 Max     | $1.25                          | $3.75                           | $0.15
Qwen3.7 Plus    | $0.32                          | $1.28                           | $0.05
Qwen3.6 35B-A3B | $0.15                          | $1.00                           | $0.04
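For anyone who wants to check the last column: it follows from the output price and the "more realistic" 25k lines per million tokens. A minimal sketch, with the prices copied from the table and the function name made up for illustration:

```python
LOC_PER_MILLION_TOKENS = 25_000  # the "more realistic" figure from the post

# Output prices in USD per million tokens, copied from the table above.
OUTPUT_PRICE = {
    "Claude Mythos": 50.00,
    "Claude Opus 4.8": 25.00,
    "ChatGPT 5.5": 30.00,
    "Grok 4.3": 2.50,
    "Qwen3.7 Max": 3.75,
    "Qwen3.7 Plus": 1.28,
    "Qwen3.6 35B-A3B": 1.00,
}


def usd_per_thousand_loc(price_per_million: float,
                         loc_per_million: int = LOC_PER_MILLION_TOKENS) -> float:
    """Output cost of a thousand lines of code at a given token price."""
    return price_per_million / (loc_per_million / 1_000)


for model, price in OUTPUT_PRICE.items():
    print(f"{model}: ${usd_per_thousand_loc(price):.2f} per thousand LoC")
```

Swapping in the pessimistic 12.5k lines per million doubles every figure, which still leaves the whole column in pocket-change territory.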

For smaller or more efficient models, inference is pretty cheap: Qwen3.6 35B's probably the weakest coding model I'd use in a professional environment (and borders the point where it might be better to run it locally, if only for privacy/security reasons), but there's a lot you can do.

That said, all of that can go out of the window when you start getting agentic options involved. Someone made a fun experiment of trying to let a local model figure out a display protocol by hooking a camera, an LLM, and a microcontroller together, and they got it mostly there overnight, which is really cool. It also probably burned tens of millions of tokens on output for an interface code block that should have ended up at the <1.5k line-of-code level.

I'm ... skeptical to endorse line-of-code as a measure of programmer time

Hell, I'm at negative LoC for the year so far.

Deleting code feels a million times better than writing it

So TL;DR, tokens are cheaper than basically all white collar workers in the modern west, but they get overused for fun projects?

Largely (though Mythos approaches white collar wages in terms of dollars per hour at API rates). But it's not like all tokens go straight to code written. Tokens are more like measuring thought.

When I give Mythos/Fable instructions it first goes 'I'll explore the codebase' and so it searches for relevant things (those search commands are output tokens). Then it reads files which have the relevant data, which means more input tokens. Then it thinks for a while (that's output tokens). Then it makes its to-do list. Then it reads some more, thinks some more. There are pages and pages of just reading and thinking before it goes 'i have a full picture'. Then it starts editing code!

Then it'll try and test if it actually works, often writing some test cases, so that's more code. Then it tells me everything it did in summary and adds stuff to its memory files.

So a lot of thought is happening even if it only adds a few pieces here and there for a new feature.

Yeah, the "thinking" process itself also counts as output tokens. When you use a reasoning model, it's basically writing a long monologue about how it's going to solve your problem and then immediately throwing it away at the end. (Different providers have different policies about whether you're allowed to see this monologue, but it often significantly exceeds the length of the actual code or whatever that the AI is writing.)
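To put rough numbers on that, here is a sketch that tallies a session like the one described above. The per-step token counts are invented purely for illustration; the prices are the Claude Opus 4.8 row from the table earlier in the thread. It also ignores the fact that an agent usually re-sends earlier context on later turns, so real input totals would be higher.

```python
from dataclasses import dataclass

INPUT_USD_PER_MILLION = 4.00    # Claude Opus 4.8 row of the table
OUTPUT_USD_PER_MILLION = 25.00


@dataclass
class Step:
    name: str
    input_tokens: int
    output_tokens: int


# Hypothetical token counts, made up to show the shape of a session.
SESSION = [
    Step("search the codebase", 0, 300),
    Step("read relevant files", 30_000, 0),
    Step("think and plan", 0, 4_000),
    Step("read more files", 15_000, 0),
    Step("think some more", 0, 3_000),
    Step("edit code", 0, 1_200),
    Step("write and run tests", 2_000, 1_500),
    Step("summarise, update memory files", 0, 800),
]


def session_cost(steps, in_price, out_price):
    """Total USD cost of a list of steps at per-million-token prices."""
    total_in = sum(s.input_tokens for s in steps)
    total_out = sum(s.output_tokens for s in steps)
    return (total_in * in_price + total_out * out_price) / 1_000_000


def output_share(steps, step_name):
    """Fraction of all output tokens spent on one named step."""
    total_out = sum(s.output_tokens for s in steps)
    return next(s.output_tokens for s in steps if s.name == step_name) / total_out


cost = session_cost(SESSION, INPUT_USD_PER_MILLION, OUTPUT_USD_PER_MILLION)
print(f"session cost: ${cost:.3f}")
print(f"share of output spent editing code: {output_share(SESSION, 'edit code'):.0%}")
```

With these made-up numbers only about a ninth of the output tokens are the actual edit; the rest is searching, thinking, testing and summarising.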

So, I'm kind of clueless about this, but are reasoning models actually different models, as in different neural net weights?

Like, do you get a reasoning model by running a single-pass model in a loop where you feed it prompts like: "first, understand the problem and make a plan for solving it, formatted like this", then "here's the plan you thought up before, try to execute point 1 now", and so on?

Or do you need a different model specially-trained for this kind of thing and it's a big secret black box how it all works?

So, I'm kind of clueless about this, but are reasoning models actually different models, as in different neural net weights?

Yes. Generally, reasoning models are trained to use special "start thinking" and "stop thinking" tokens, and to generate a specific kind of monologue in between those tokens. Similar to how RLHF biases models towards producing text that's appealing to human readers, reasoning models use techniques like RLVR to bias towards generating monologues that end in a correct solution to a problem.

Many reasoning models are trained in a way that lets you disable the reasoning by forcing them to never generate the "start thinking" token -- Claude Opus 4.8 probably uses the same weights regardless of whether you enable or disable thinking, for example -- but their weights are different from models that were never trained for reasoning in the first place.
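One way "never generate the start-thinking token" could work mechanically is to mask that token's score before picking the next token. This is a toy sketch, not any provider's actual implementation; the token name `<think>` and the scores are assumptions made up for the example.

```python
import math

START_THINKING = "<think>"  # hypothetical name for the special token


def pick_next_token(logits, banned=frozenset()):
    """Greedy choice of the next token, with banned tokens masked out."""
    masked = {
        token: (-math.inf if token in banned else score)
        for token, score in logits.items()
    }
    return max(masked, key=masked.get)


# Made-up scores for the first token of a reply.
first_token_logits = {START_THINKING: 5.0, "def": 3.0, "The": 1.0}

print(pick_next_token(first_token_logits))                           # thinking on
print(pick_next_token(first_token_logits, banned={START_THINKING}))  # thinking off
```

The weights are identical in both calls; only the sampling step changes, which fits the "same weights, thinking on or off" description above.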

With that being said, people used to use "chain of thought prompting" to get a similar kind of result out of regular LLMs. (I think reasoning models basically got started when AI companies saw the early success of chain-of-thought prompting and started baking it in at the training stage.)
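For contrast, the old prompting-only approach looks roughly like the loop asked about upthread: ask for the reasoning first, then feed it back and ask for the answer. A minimal sketch, where `complete` stands in for whatever text-completion call you have and `stub_model` is a canned fake so the example runs on its own; both names and the prompt wording are assumptions.

```python
from typing import Callable


def chain_of_thought(question: str, complete: Callable[[str], str]) -> str:
    """Two-pass prompting: get the reasoning, then condition the answer on it."""
    reasoning = complete(f"Question: {question}\nLet's think step by step.\n")
    answer = complete(
        f"Question: {question}\n"
        f"Reasoning: {reasoning}\n"
        "Therefore, the final answer is:"
    )
    return answer.strip()


def stub_model(prompt: str) -> str:
    """Canned stand-in for a real model, for demonstration only."""
    if "final answer" in prompt:
        return " 12"
    return "3 boxes with 4 pens each is 3 * 4 = 12 pens."


print(chain_of_thought("How many pens are in 3 boxes of 4?", stub_model))
```

Nothing here touches the weights. A reasoning model gets the same effect without the outer loop because the monologue was trained in.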

More that they're cheaper than a code monkey, only weakly expensive if you're doing something hard or novel (or novel-to-you), and they can get ludicrously expensive if you just start firing the slop cannons, either to solve a problem by volume or by producing a lot of useless or specialized lines-of-code.

And because they allow brute-forcing problems in ways that weren't possible before, or tackling new problems. Or because the user cocks up, as in the case where they fail to notice their two bots getting into an infinite loop.

The problem is that a token is cheap, but the amount of tokens you need to do useful things can be very high.

For agentic programming, the agent needs to hold a non-trivial amount of the codebase in context. That can easily be millions of tokens. Then you have whatever pile of "skills" (read: markdown files) you use, then add the various layers of prompts, then add reasoning chains. It adds up very quickly.

Once you start adding parallel agents and loops, it can get insane.
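A back-of-envelope sketch of why loops and parallel agents blow up. It assumes every turn re-sends the whole conversation so far as input, and that nothing is cached, which is the worst case; the starting context, growth per turn, and agent counts are made-up numbers.

```python
def loop_input_tokens(base_context: int, added_per_turn: int,
                      turns: int, agents: int = 1) -> int:
    """Total input tokens billed when each turn re-sends the full context.

    Assumes no caching and that the context grows by a fixed amount per turn.
    """
    per_agent = sum(base_context + added_per_turn * turn for turn in range(turns))
    return per_agent * agents


# Hypothetical: 50k tokens of code and prompts up front, 2k more per turn.
one_agent = loop_input_tokens(50_000, 2_000, turns=40)
eight_agents = loop_input_tokens(50_000, 2_000, turns=40, agents=8)

print(f"1 agent, 40 turns:  {one_agent:,} input tokens")
print(f"8 agents, 40 turns: {eight_agents:,} input tokens")
```

The growth is quadratic in the number of turns and linear in the number of agents, which is why the repeated-input cost reductions mentioned upthread matter so much for agentic work.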

It is weird to think back to the time when bringing out a 4k token model was a massive deal. You could hold, like, paragraphs in context. Like, nearly a whole chapter of an actual book.

A token is broadly a word. 'Tokenisation' is what happens because AI is fundamentally mathematical, and so it only works on numbers, so we have to turn language into numbers.

So if a line of code looks like:

def my_function(my_variable):

Then that's

def| |my|_|function|(|my|_|variable|)|:

where each | is a split between tokens.

So that line is 11 tokens: one for each common word, one for each element of punctuation. In practice there's a big table with each word and each punctuation mark assigned its own number, and it looks up the numbers, so that line of code gets turned into the numbers (tokens):

234 756 32423 56 789789 2334 32423 56 35423 2354 213

and each of those numbers costs a certain amount to process. Each new token, i.e. each new word and punctuation mark it produces, costs another (considerably larger) amount. There's a certain amount of complexity around how to represent numbers for maths, and some fairly commonsense rules about how to split up words that are uncommon, so "unfireable" might become the three tokens for un|fire|able and "garbleflarg" might have to be spelled out letter by letter with each letter being a separate token.

(Note: this is how the little guys like me do it. Tokenisation for Anthropic might be something much more advanced.)
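Here's a toy version of that lookup, as a sketch only. The vocabulary and its ids are invented, and the splitting rule is plain greedy longest-match, which is simpler than the byte-pair-encoding schemes real tokenizers tend to use (those also usually glue the leading space onto the following word).

```python
import string

# Hypothetical vocabulary: a few common pieces plus single letters as a fallback.
PIECES = ["def", "my", "function", "variable", "un", "fire", "able",
          " ", "_", "(", ")", ":"]
VOCAB = {piece: token_id
         for token_id, piece in enumerate(PIECES + list(string.ascii_lowercase))}


def tokenize(text: str) -> list[str]:
    """Split text by repeatedly taking the longest piece found in VOCAB."""
    pieces = []
    i = 0
    while i < len(text):
        for j in range(len(text), i, -1):  # try the longest candidate first
            if text[i:j] in VOCAB:
                pieces.append(text[i:j])
                i = j
                break
        else:
            raise ValueError(f"no token covers {text[i]!r}")
    return pieces


def encode(text: str) -> list[int]:
    """Turn text into the list of numbers the model actually sees."""
    return [VOCAB[piece] for piece in tokenize(text)]


line = "def my_function(my_variable):"
print("|".join(tokenize(line)))
print(encode(line))
print(tokenize("unfireable"))
print(tokenize("garbleflarg"))
```

Note that the two occurrences of "my" and of "_" come out as the same number each time: the table maps each piece to one id, wherever it appears.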

How much a token is worth depends on what it's doing for the customer. If it's a word in my romance novel then it's priceless (read: not worth much); if it's a word in the code for my startup, it's either worthless or very valuable depending on how that startup turns out. Or, if you like, you can decide it's worth what it would cost to get a suitably skilled human to do it, which is generally how the big companies value it.

A good mental model for a token is 3-4 characters. I'm assuming somebody will come in with an example of how it's wrong, but it's not a terrible heuristic.

A line of code is usually between 1 and 200 characters, with a fat part of the curve sitting around 80-100 characters.
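Putting those two rules of thumb together gives a rough tokens-per-line range. A small sketch using only the 3-4 characters-per-token heuristic and the 80-100 character line lengths above; the function name is made up.

```python
import math


def token_range(n_chars: int, low_chars_per_token: float = 3.0,
                high_chars_per_token: float = 4.0) -> tuple[int, int]:
    """Rough (fewest, most) token count for a string of n_chars characters."""
    fewest = math.ceil(n_chars / high_chars_per_token)
    most = math.ceil(n_chars / low_chars_per_token)
    return fewest, most


for n_chars in (80, 100):
    fewest, most = token_range(n_chars)
    print(f"{n_chars} characters: roughly {fewest} to {most} tokens")
```

So a typical line in the fat part of the curve lands somewhere in the 20 to 35 token range by this heuristic, which is the same ballpark as the per-line figures given upthread.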