This weekly roundup thread is intended for all culture war posts. 'Culture war' is vaguely defined, but it basically means controversial issues that fall along set tribal lines. Arguments over culture war issues generate a lot of heat and little light, and few deeply entrenched people ever change their minds. This thread is for voicing opinions and analyzing the state of the discussion while trying to optimize for light over heat.
Optimistically, we think that engaging with people you disagree with is worth your time, and so is being nice! Pessimistically, there are many dynamics that can lead discussions on Culture War topics to become unproductive. There's a human tendency to divide along tribal lines, praising your ingroup and vilifying your outgroup - and if you think you find it easy to criticize your ingroup, then it may be that your outgroup is not who you think it is. Extremists with opposing positions can feed off each other, highlighting each other's worst points to justify their own angry rhetoric, which becomes in turn a new example of bad behavior for the other side to highlight.
We would like to avoid these negative dynamics. Accordingly, we ask that you do not use this thread for waging the Culture War. Examples of waging the Culture War:
- Shaming.
- Attempting to 'build consensus' or enforce ideological conformity.
- Making sweeping generalizations to vilify a group you dislike.
- Recruiting for a cause.
- Posting links that could be summarized as 'Boo outgroup!' Basically, if your content is 'Can you believe what Those People did this week?' then you should either refrain from posting, or do some very patient work to contextualize and/or steel-man the relevant viewpoint.
In general, you should argue to understand, not to win. This thread is not territory to be claimed by one group or another; indeed, the aim is to have many different viewpoints represented here. Thus, we also ask that you follow some guidelines:
- Speak plainly. Avoid sarcasm and mockery. When disagreeing with someone, state your objections explicitly.
- Be as precise and charitable as you can. Don't paraphrase unflatteringly.
- Don't imply that someone said something they did not say, even if you think it follows from what they said.
- Write like everyone is reading and you want them to be included in the discussion.
On an ad hoc basis, the mods will try to compile a list of the best posts/comments from the previous week, posted in Quality Contribution threads and archived at /r/TheThread. You may nominate a comment for this list by clicking on 'report' at the bottom of the post and typing 'Actually a quality contribution' as the report reason.

This just isn't a good model of how LLMs work. If it were doing some naive averaging of all the text it was trained on for a subject, shouldn't it randomly insert words in Spanish or Chinese? But it doesn't. If you ask an LLM whether it's a man or a woman (one without "as an AI language model" post-training), it doesn't present itself as the hermaphroditic average of the people described in its training set, it chooses one and at least tries to stick to its answer. Now, either way it's incorrect, obviously, but it's clearly not an average; a mode, perhaps. But it doesn't just naively take the mode either: If you ask it whether Harry Potter is a real person it will correctly tell you he's fictional, despite the overwhelming majority of the text concerning Harry Potter -- How many billions of words of Harry Potter fanfiction are there? -- treating him as real.
A lot of people argue that LLMs are incapable of understanding context or judging the quality of sources, but that's just... obviously untrue? Ask Gemini whether magic is real, and it'll tell you about sleight of hand and historical beliefs about witchcraft, but conclude the answer is very likely 'no.' Ask it what the spell Create or Destroy Water does and it'll quote the 5th edition rulebook. It understands what was meant by each question perfectly. And it does understand: respond to the second with 'But magic isn't real, right?' and it'll explain the implied category error as well as you could wish.
It's not that it doesn't learn the incorrect ideas in its training set -- tell it to emulate a Yahoo Answers poster and it can do so -- it just also learns contextual information about those ideas (such as that they're false) much as we do. Tell it you want a good answer (which is largely what post-training does) and it'll know to discount those sources. It doesn't do so perfectly, but the notion they lack the capacity altogether is not credible.
Regarding @dr_analog's point:
This is true so far as I know; did you actually try it? LLMs are bad at tasks that require strict precision, accuracy, and rigor but can't be objectively and automatically judged. There's a huge disconnect between performance on math/coding, where it's trivial to generate good/bad responses for DPO and similar post-training, and subjects like law, where it isn't. @dr_analog is right: LLMs are currently much better at math and coding specifically than they are at essay writing, purely because it's so much easier to generate high-quality synthetic data for the former.
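To make that contrast concrete, here's a minimal sketch of why a verifiable task makes preference-pair generation nearly free. The `sample_model` function and the toy arithmetic task are hypothetical stand-ins, not anyone's actual pipeline:

```python
# Hedged sketch: building DPO-style preference pairs for a task that has an
# automatic verifier. Everything here is a toy stand-in.
import random

def sample_model(prompt: str) -> str:
    """Stand-in for sampling one completion from a model; sometimes slips up."""
    a, b = [int(x) for x in prompt.split("+")]
    return str(a + b + random.choice([0, 0, 0, 1]))

def build_preference_pair(prompt: str, verify) -> tuple[str, str] | None:
    """Sample until we have one verified-good and one verified-bad answer."""
    good, bad = None, None
    for _ in range(32):
        answer = sample_model(prompt)
        if verify(prompt, answer):
            good = good or answer
        else:
            bad = bad or answer
        if good and bad:
            return good, bad   # (chosen, rejected) pair for DPO-style training
    return None                # couldn't build a pair; skip this prompt

# For "2+2" the verifier is trivial; for an essay on contract law there is
# no verify() function to write, which is the whole point.
print(build_preference_pair("2+2", lambda p, ans: ans == str(eval(p))))
```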
I actually see a fair bit of Chinese in longer conversations - not enough to make it unreadable, but enough for me to notice.
Take a look at the attached image. That's about a week old. Once you've looked at it, go look up that ticker. (Thanks to @ToaKraka for pointing out the image feature, BTW). That one was a pretty big shock to me from Gemini 3 fast. It doesn't do it every time, but it's done it more than once for that exact ticker.
/images/17711967195902364.webp
Huh, are you giving it any Chinese characters in the prompt? Which model(s)? I think I've seen this from a commercial model exactly once (Gemini 2 Pro), when I was asking some pretty in-the-weeds questions about Shinto and Japanese Buddhism and it gave me quotes in Japanese without translating them, and even there, its own words were in English. The Deepseek R1 paper mentions language confusion in reasoning blocks was a problem before post-training, but I never encountered it with the final model. I have seen it from some small open weights models, but they're kind of dumb all around.
Yeah, that doesn't shock me, though it's not quite the case I meant. The reason code specifically is special is that they can use that generate-and-automatically-verify process, which works very well. The reason normal prose hasn't seen nearly as much improvement is that judging prose takes skilled human labor to do well, and these huge models are so data-hungry that it's just not feasible to get enough of it. (I also suspect a lot of these companies like their models bland and obsequious -- customer support scripts have the same qualities, and those at least were written by real people.) So you only really see these big gains for code and math (for which a similar process can be developed).
This specific example is kind of borderline. It's a dynamic table, right? Something the model made up to answer your prompt? While it got things objectively wrong in a manner that's in principle possible to automatically check, setting up automatic checking for any claim of fact is not as easy as running pylint, which really will catch any syntax error. I imagine they do try to DPO for cases like this, but it's a lot harder.
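For the code side, the automatic check really is close to a one-liner. A small illustrative sketch, using Python's own parser as a stand-in for pylint rather than any lab's actual training harness:

```python
# Illustrative only: syntax errors in generated code are mechanically
# checkable; factual errors in a generated table are not.
import ast

def passes_syntax_check(generated_code: str) -> bool:
    """Cheap automatic check: does the snippet even parse as Python?"""
    try:
        ast.parse(generated_code)
        return True
    except SyntaxError:
        return False

print(passes_syntax_check("def f(x): return x + 1"))  # True
print(passes_syntax_check("def f(x) return x + 1"))   # False

# There's no equivalent one-liner for a claim of fact pulled out of a
# generated table; you'd need retrieval against a trusted source, and
# building that is the hard part.
```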
Models are prone to just making stupid errors occasionally on even the most basic tasks, and I don't know if we're going to be able to find a real solution to that. Something that does help (and is often used on benchmarks) is taking the consensus result of several runs, but that massively inflates inference costs for a relatively small reduction in error rate. It does seem to be a hard problem, in that it's only gotten a bit better over the past year or so. (There was more improvement in 2024, which I take as a bad sign; they've already tried the easy stuff.)
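For what "consensus of several runs" looks like in practice, a rough sketch; the model call here is a simulated stand-in, not a real API:

```python
# Hedged sketch of majority voting over several independent runs
# (self-consistency style). query_model is a hypothetical stand-in that is
# right ~80% of the time; swap in a real API call to use it for real.
import random
from collections import Counter

def query_model(prompt: str) -> str:
    return "4" if random.random() < 0.8 else random.choice(["3", "5"])

def consensus_answer(prompt: str, n_samples: int = 5) -> str:
    """Sample the same prompt several times and keep the most common answer."""
    answers = [query_model(prompt) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]

print(consensus_answer("What is 2 + 2?"))
# n_samples runs costs roughly n_samples times as much inference for a
# modest cut in the error rate, which is why this mostly shows up on
# benchmarks rather than in everyday use.
```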
I'm not giving Chinese characters in the prompts. I don't speak a lick of Chinese. I've seen it in Gemini 3 fast, thinking, and pro. Usually it's for questions about electronics, though it's come up for questions about music theory as well.