@stuckinbathroom comments on "Culture War Roundup for the week of April 20, 2026"

Culture War Roundup for the week of April 20, 2026

This weekly roundup thread is intended for all culture war posts. 'Culture war' is vaguely defined, but it basically means controversial issues that fall along set tribal lines. Arguments over culture war issues generate a lot of heat and little light, and few deeply entrenched people ever change their minds. This thread is for voicing opinions and analyzing the state of the discussion while trying to optimize for light over heat.

Optimistically, we think that engaging with people you disagree with is worth your time, and so is being nice! Pessimistically, there are many dynamics that can lead discussions on Culture War topics to become unproductive. There's a human tendency to divide along tribal lines, praising your ingroup and vilifying your outgroup - and if you think you find it easy to criticize your ingroup, then it may be that your outgroup is not who you think it is. Extremists with opposing positions can feed off each other, highlighting each other's worst points to justify their own angry rhetoric, which becomes in turn a new example of bad behavior for the other side to highlight.

We would like to avoid these negative dynamics. Accordingly, we ask that you do not use this thread for waging the Culture War. Examples of waging the Culture War:

Shaming.
Attempting to 'build consensus' or enforce ideological conformity.
Making sweeping generalizations to vilify a group you dislike.
Recruiting for a cause.
Posting links that could be summarized as 'Boo outgroup!' Basically, if your content is 'Can you believe what Those People did this week?' then you should either refrain from posting, or do some very patient work to contextualize and/or steel-man the relevant viewpoint.

In general, you should argue to understand, not to win. This thread is not territory to be claimed by one group or another; indeed, the aim is to have many different viewpoints represented here. Thus, we also ask that you follow some guidelines:

Speak plainly. Avoid sarcasm and mockery. When disagreeing with someone, state your objections explicitly.
Be as precise and charitable as you can. Don't paraphrase unflatteringly.
Don't imply that someone said something they did not say, even if you think it follows from what they said.
Write like everyone is reading and you want them to be included in the discussion.

On an ad hoc basis, the mods will try to compile a list of the best posts/comments from the previous week, posted in Quality Contribution threads and archived at /r/TheThread. You may nominate a comment for this list by clicking on 'report' at the bottom of the post and typing 'Actually a quality contribution' as the report reason.


stuckinbathroom 1mo ago

LLMs play chess entirely through text. It's the equivalent of asking a person to play a game of correspondence chess, but they can't recreate the game physically, they can't have any drawings of the game, all they can do is have a record of moves already made. Outside of literal chess masters, how many humans would get through such a game without making a mistake?

But LLMs* also have a massive advantage over an unassisted human: they have access to the internet, or at least to a sandboxed Python interpreter or similar coding environment. So the fact that they are constrained to text-based I/O should really be no excuse: there’s absolutely nothing, in principle, preventing the LLM from “thinking” to itself “Hmm, I’m being asked to provide answers about a formal system. Let me create a computer program to record the state of the game and make sure I don’t make any illegal moves.” But current SOTA LLMs never think to do that, even when all the tools are at their disposal.

In other words, the equivalent human activity is not playing correspondence chess with nothing but a record of all past moves, but rather playing correspondence chess with a book of chess rules plus pen and paper (or text editor and Python interpreter, if you like).

*OK, I admit I am playing fast and loose with the definition of “LLM” here. In the very strict sense, language models do only transform one sequence of tokens into another, as you said. But in the colloquial sense, which is also the more relevant one for discussing the abilities of SOTA consumer-facing AI, “LLM” refers to a product like ChatGPT, Claude, etc. consisting of a core language model (in the strict sense) together with tools that it can invoke to solve problems.


faul_sname Fuck around once, find out once. Do it again, now it's science. stuckinbathroom 1mo ago

An LLM with access to a sandboxed coding environment (and instructed to use it) will generally not make illegal moves in a chess game.

stuckinbathroom faul_sname 1mo ago · Edited 1mo ago

I admit I have never tried this, but I’ll take your word for it; it does seem plausible that Opus 4.6 or equivalent would be able to one-shot a simple program that computes the state of the board after a given sequence of past moves and validates that a proposed next move is legal.

Still, this raises 2 questions, one rather surface-level/product focused and one deeper and more architectural.

Firstly, why should I as the user have to prompt the AI to make a program to ensure that it doesn’t go off the rails? Why can’t it figure that out for itself? For example when I ask a modern SOTA AI to answer trivia questions, I don’t have to tell it to go to such and such website; I don’t even have to tell it to search the internet. It just “knows” without prompting that a Google search is the right tool for the job. Why can’t it do the same thing for chess? Or for that matter, for the old “number of Rs in ‘strawberry’” question that it kept stumbling on last year? There are any number of common natural-language queries that really boil down to a problem of logic or some other formal system—it should be the AI’s job, not mine, to identify them, come up with the right formalism, and then use it to solve the problem.
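
To make the "strawberry" case concrete: once the question is recognized as a counting problem over characters, the formalism is a one-liner. This is only a sketch of what the tool call could look like; the function name `count_letter` is made up for illustration and is not anything a particular product actually runs.

```python
def count_letter(word: str, letter: str) -> int:
    """Count case-insensitive occurrences of a single letter in a word."""
    if len(letter) != 1:
        raise ValueError("letter must be a single character")
    return word.lower().count(letter.lower())


print(count_letter("strawberry", "r"))  # prints 3
```

The hard part is not the code but noticing that the natural-language question maps onto it.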

I suspect this shortcoming may be trivially resolved by adding something like “Always consider whether you can map this question, or some piece of it, to a problem that can be solved in Python and remember that you have access to a sandboxed Python environment” to CLAUDE.md or the system prompt or whatever. Fair enough. But this gets us to the second and more fundamental question: for a given AI and problem, it’s not always obvious what the best formalism or representation of that problem is. Let’s go back to the chess example; suppose the AI writes a Python program to keep track of the board state and ensure all of its moves are legal. On some level, the “game loop” then becomes something like:

1. I type in a (legal) move in algebraic notation.
2. AI appends that move to a text file.
3. AI runs the program to print some representation of the board, after all the moves recorded in the text file, to its internal context.
4. AI decides on its move and appends it to the text file. It could decide either by simply treating the board state as another sequence of input tokens and emitting the corresponding output, or by running some other program of its own devising; for the sake of discussion assume the former, as the latter presupposes the ability to one-shot Stockfish, which AFAIK is beyond the current SOTA.
5. AI runs the program again to confirm the move is legal; if not, erase the last move from the text file and go to 3.
6. AI prints its move from above to the screen so I can see it.
7. Go to 1.
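
A minimal sketch of one pass through that loop, under some loud assumptions: the move log is an in-memory list rather than a text file, and `render_board`, `propose_move`, and `is_legal` are hypothetical callbacks standing in for the board-printing program, the model's token-to-token move choice, and the legality checker. None of them is implemented as real chess logic here; the point is only the control flow of the retry.

```python
from typing import Callable, List


def ai_turn(
    move_log: List[str],
    opponent_move: str,
    render_board: Callable[[List[str]], str],
    propose_move: Callable[[str], str],
    is_legal: Callable[[List[str], str], bool],
    max_retries: int = 5,
) -> str:
    """Run the record/render/propose/validate steps once; return a validated move."""
    move_log.append(opponent_move)  # record the user's move
    for _ in range(max_retries):
        board_text = render_board(move_log)   # print the board into "context"
        candidate = propose_move(board_text)  # model picks a move from the text
        # Checking before appending is equivalent to append-then-erase.
        if is_legal(move_log, candidate):
            move_log.append(candidate)
            return candidate                  # this is what gets shown to the user
        # Illegal move: nothing was recorded, so loop back and re-render.
    raise RuntimeError("no legal move found within the retry budget")


# Toy stand-ins so the sketch runs; this is NOT real chess validation.
proposals = iter(["Ke9", "Nf6"])  # the first proposal is deliberately illegal
log: List[str] = []
reply = ai_turn(
    log,
    "d4",
    render_board=lambda moves: " ".join(moves),
    propose_move=lambda board: next(proposals),
    is_legal=lambda moves, move: move in {"Nf6", "d5"},
)
print(reply, log)
```

The retry budget is my addition: without it, a model that keeps proposing the same illegal move would spin forever.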

Let’s drill into step 3: what is the optimal representation of the board that the AI should be using for its own benefit? It’s a bit of a trick question: “optimal” here means something like “maximizing the probability of the AI winning the game” but perhaps also “minimizing the probability of making illegal moves which cause it to waste time looping through steps 3-5 again”. As a human I can certainly come up with various representations; an obvious one would be to render the board as an 8x8 CSV or Markdown table. But I have no idea whether this is “optimal” for the AI in this case, and in general I may not even know what “optimal” should mean. Again, it should be the AI’s job to figure all of this out—otherwise it’s not worthy of the name AGI in my book.
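
For concreteness, here is roughly what the Markdown-table and CSV renderings could look like. This sketch assumes the tracking program already has the position as a FEN string (FEN is my assumption, not something specified above), and the function names are made up. It says nothing about which rendering is "optimal" for the model; it only shows that producing either one is cheap.

```python
from typing import List


def fen_to_grid(fen: str) -> List[List[str]]:
    """Expand the piece-placement field of a FEN string into an 8x8 grid.

    Row 0 is rank 8; empty squares are rendered as '.'.
    """
    placement = fen.split()[0]
    grid = []
    for rank in placement.split("/"):
        row: List[str] = []
        for ch in rank:
            if ch.isdigit():
                row.extend(["."] * int(ch))  # a digit means that many empty squares
            else:
                row.append(ch)               # a letter is a piece (uppercase = White)
        if len(row) != 8:
            raise ValueError(f"bad rank in FEN: {rank!r}")
        grid.append(row)
    if len(grid) != 8:
        raise ValueError("FEN must describe exactly 8 ranks")
    return grid


def grid_to_markdown(grid: List[List[str]]) -> str:
    """Render the grid as a Markdown table with file and rank labels."""
    lines = [
        "|   | a | b | c | d | e | f | g | h |",
        "|---|---|---|---|---|---|---|---|---|",
    ]
    for i, row in enumerate(grid):
        lines.append(f"| {8 - i} | " + " | ".join(row) + " |")
    return "\n".join(lines)


def grid_to_csv(grid: List[List[str]]) -> str:
    """Render the grid as 8 comma-separated lines, rank 8 first."""
    return "\n".join(",".join(row) for row in grid)


# Position after 1. d4, written by hand for this example.
after_d4 = "rnbqkbnr/pppppppp/8/8/3P4/8/PPP1PPPP/RNBQKBNR b KQkq - 0 1"
print(grid_to_markdown(fen_to_grid(after_d4)))
```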

One last thing: I don’t actually care whether AI uses a sandboxed coding environment or whatever to solve problems. Perhaps it will turn out to be the case that just scaling up—more compute, more RLHF, bigger transformers, bigger context windows—will suffice to get LLMs to the point where they can (e.g.) play and win games they have never seen before purely by transforming tokens to tokens, without the use of external tools, or to the point of one-shotting algorithmic solvers like Stockfish. If so, great; one is reminded of the old Deng Xiaoping quote about the color of the cat. But based on what I’ve seen so far, it looks like there’s a ton of low-hanging fruit in the direction of “just use the tools already at your disposal” at our current levels of model complexity.

The reason I said "if you instruct Claude to use the programming env" is that Claude will generally do things similar to those that were evaluated well in the past, and most chess-like evals would have forbidden tool use or anything else that human players wouldn't consider "fair play". I expect that putting "always consider what tools you have available and make use of them where it makes sense unless explicitly told not to" in your user instructions will work well enough that you just never run into this in practice.

Bluntly, I don't think it matters how the board state is represented, as long as the answer isn't "Claude is trying to reconstruct the entire board state from the move sequence".

FWIW I tried the prompt

Play good chess.

d4

and Opus 4.7, at various points in the opening, dumped a snapshot of the board state into the chat.

I didn't play out the full game. On the first attempt, Claude blundered, then spent 20 minutes thinking and writing janky minimax code before hitting the compaction limit and erroring out. On the second attempt it spent another 15 minutes thinking and almost hit the compaction limit again. But you can see that it does in fact use tools.

Anyway, to answer the question:

Firstly, why should I as the user have to prompt the AI to make a program to ensure that it doesn’t go off the rails? Why can’t it figure that out for itself?

The AI has no memory. Every conversation is a fresh new world. As a rule of thumb, I expect AI to significantly outperform me at anything I've never done before, but that for any task that hasn't been the subject of absurd amounts of RL (and some tasks that have), I'll very quickly be able to identify the places that AI is likely to fail and steer it around those pitfalls. Because I can learn, and the AI can't.
