
Culture War Roundup for the week of February 16, 2026

This weekly roundup thread is intended for all culture war posts. 'Culture war' is vaguely defined, but it basically means controversial issues that fall along set tribal lines. Arguments over culture war issues generate a lot of heat and little light, and few deeply entrenched people ever change their minds. This thread is for voicing opinions and analyzing the state of the discussion while trying to optimize for light over heat.

Optimistically, we think that engaging with people you disagree with is worth your time, and so is being nice! Pessimistically, there are many dynamics that can lead discussions on Culture War topics to become unproductive. There's a human tendency to divide along tribal lines, praising your ingroup and vilifying your outgroup - and if you think you find it easy to criticize your ingroup, then it may be that your outgroup is not who you think it is. Extremists with opposing positions can feed off each other, highlighting each other's worst points to justify their own angry rhetoric, which becomes in turn a new example of bad behavior for the other side to highlight.

We would like to avoid these negative dynamics. Accordingly, we ask that you do not use this thread for waging the Culture War. Examples of waging the Culture War:

  • Shaming.

  • Attempting to 'build consensus' or enforce ideological conformity.

  • Making sweeping generalizations to vilify a group you dislike.

  • Recruiting for a cause.

  • Posting links that could be summarized as 'Boo outgroup!' Basically, if your content is 'Can you believe what Those People did this week?' then you should either refrain from posting, or do some very patient work to contextualize and/or steel-man the relevant viewpoint.

In general, you should argue to understand, not to win. This thread is not territory to be claimed by one group or another; indeed, the aim is to have many different viewpoints represented here. Thus, we also ask that you follow some guidelines:

  • Speak plainly. Avoid sarcasm and mockery. When disagreeing with someone, state your objections explicitly.

  • Be as precise and charitable as you can. Don't paraphrase unflatteringly.

  • Don't imply that someone said something they did not say, even if you think it follows from what they said.

  • Write like everyone is reading and you want them to be included in the discussion.

On an ad hoc basis, the mods will try to compile a list of the best posts/comments from the previous week, posted in Quality Contribution threads and archived at /r/TheThread. You may nominate a comment for this list by clicking on 'report' at the bottom of the post and typing 'Actually a quality contribution' as the report reason.


I promise I'm not trying to be a single purpose account here, and I debated whether this belonged here or in the fun thread. I decided to go here because it is, in some ways, a perfect microcosm of culture war behaviors.

A question about car washing is taking HN by storm this morning, and reading the comments is pretty funny. The question: if you want to wash your car and the car wash is 50 meters away, should you walk or drive?

Initially, no model could consistently get it right. The open-weight models, ChatGPT 5.2, Opus 4.6, Gemini 3, and Grok 4.1 all had a notable number of recorded instances of saying, "Of course you should walk, it's only 50 meters away."

Last night, the question went viral on TikTok, and as of this morning, the big providers get it correct like somebody flipped a switch, provided you use that exact phrase and ask it in English.

This is interesting to me for a few reasons. The first is that the common "shitty free models" defense crops up rapidly; commenters say that this is a bad-faith example of LLM shortfalls because the interlocutors are not using frontier models. At the same time, one comment suggests that Opus 4.6 can be tricked, while another says 4.6 gets it right more than half the time.
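If anyone wants to settle the "how often does it actually get it right" question instead of trading anecdotes, it's a quick thing to script. A minimal sketch below, using OpenAI's Python client purely as an example; the model id, the sample count, and the crude keyword classifier are all my own placeholders, not anything from the HN thread.

```python
# Rough sketch: tally how often a model answers "walk" vs "drive" for the viral
# car-wash question. The model id is a placeholder; swap in whatever you can access.
from collections import Counter

from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment
QUESTION = ("If you want to wash your car, should you walk or drive to the "
            "car wash if it's 50 meters away?")


def classify(answer: str) -> str:
    """Crude keyword check; eyeball the raw answers for anything ambiguous."""
    text = answer.lower()
    if "drive" in text and "walk" not in text.split(".")[0]:
        return "drive"
    return "walk" if "walk" in text else "unclear"


tally = Counter()
for _ in range(20):  # 20 samples is enough to see a trend
    resp = client.chat.completions.create(
        model="gpt-4o",  # placeholder model id
        messages=[{"role": "user", "content": QUESTION}],
        temperature=1.0,  # leave sampling on so repeated runs can differ
    )
    tally[classify(resp.choices[0].message.content)] += 1

print(tally)  # e.g. Counter({'drive': 17, 'walk': 2, 'unclear': 1})
```

Run the same loop against a couple of providers and you'd have actual rates instead of dueling screenshots.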

There are also multiple comments saying that this question is irrelevant because it's orthogonal to the capabilities of the model that will cause Mustafa Suleyman's Jobpocalypse. This one was fascinating to me. This forum is, though several steps removed, rooted in the writing of Scott Alexander. Back when Scott was a young firebrand who didn't have much to lose, he wrote a lot of interesting stuff. It introduced me, a dumb redneck who had lucked his way out of the hollers and into a professional job, to a whole new world of concepts that I had never seen before. One of those was Gell-Mann Amnesia. The basic idea is that you keep trusting a source on topics you don't know well, even after catching its errors on topics you do. In this case, it's hard not to notice the flaws: most people have walked, most have seen a car, and many have probably washed one. However, when it comes to more technical, obscure topics, most of us are not domain experts. We might be experts in one of them. Some of us might be experts in two of them, but none of us are experts in all of them. When it comes to topics that are more esoteric than washing a car, we rapidly end up in the territory of Donald Rumsfeld's unknown unknowns. Somebody like @self_made_human might be able to cut through the chaff and confidently take advice about ocular migraines, but could you? Could I? Hell if I know.

Moving on, the last thing is that I wonder whether this is a problem with the model or with the training techniques. There's an old question floating around the Internet asking an LLM whether it would disarm a nuclear bomb by saying a racial slur or instead condemn millions to death. More recently, people charted other biases and found that most models had clear biases in terms of race, gender, sexual orientation, and nation of origin that are broadly in line with an aggressively intersectional, progressive worldview. Do modern models similarly have environmentalism baked in? Do they reflexively shy away from cars in the same way that a human baby fears heights? It would track with some of the other ingrained biases that people have found.

That last one is interesting, because I don't know of anyone who has done meaningful work on that outside of what we consider to be "culture war" topics, and we really have no idea what else is in there. My coworker, for example, has used Gemini 3 to make slide decks, and she frequently complains that it is obsessed with the color pink. It'll favor pink, and color palettes that work with pink, nearly every time for her. If she tells it not to use pink, it'll happily comply by using salmon, or fuchsia, or "electric flushed cheek", or whatever Pantone's new pink synonym of the year is. That example is innocuous, but what else is in there that might matter? Once again, hell if I know.

Somebody like @self_made_human might be able to cut through the chaff and confidently take advice about ocular migraines, but could you? Could I? Hell if I know.

I still saw a real doctor after consulting the models. In fact, I saw a doctor because I consulted the models: they raised the possibility of differential diagnoses like TIA (mini-stroke) that, while unlikely according to both my judgment and theirs, seemed worth ruling out. As I mentioned in the linked comment, Dr. GPT still lacks opposable thumbs. Most medical advice requires actual physical examinations and actual tests to implement.

This doesn't excuse the first two human doctors who misdiagnosed me. The symptoms were clearly inconsistent with their diagnosis, though I'm not confident 2024-era models would have caught this as quickly as today's versions do.


Beyond this specific case, I have thoughts.

LLMs are both force multipliers and substitute goods. "Substitute" sounds pejorative, but it shouldn't. An MRE is a poor substitute for a home-cooked meal if you're at home. But on a hiking trail, you'd gladly take that chicken tikka over nothing, even if your digestive system later files a complaint. A terrible car beats no car most of the time. And so on.

My medical training lets me extract more value from any model. But even without that training, LLM medical advice beats having no doctor at all. It beats frantically Googling symptoms at 2 AM like we used to do. One of my most upvoted posts on The Motte discussed GPT-4, which now lags so far behind the current state of the art that it's almost embarrassing. It was still incredibly useful at the time. Back then, I said:

I'd put their competency around the marks of a decent final year student versus a competent postgraduate resident

Now? Easily at or better than the median specialist.

(This is part of why people not paying close attention miss the improvements in models until there's a flashy new headline feature like image generation, web search, Deep Research, or in-interface code execution.)

At this point, I would trust GPT 5.2 Thinking over a non-specialist human doctor operating outside their lane. It gives better cardiology advice than an ophthalmologist would, better psychiatric advice than an ER physician. Even specialists aren't safe: I know cases where models outperformed my own superiors. I'd already noticed them making suboptimal choices; confirming this with citations from primary literature didn't take long.

For laypeople, this is invaluable, albeit bottlenecked by the need for humans who can authorize tests. LLMs can recommend the right drugs and doses, check for interactions, create personalized regimens, but you still need a human physician somewhere in the chain.

(Much of this reflects regulatory hurdles. See recent discussions about why LLMs giving legal advice lack the same privileges as lawyers saying identical things.)

LLMs serve as both complement and partial substitute for human physicians. Many doctors get defensive when patients quote ChatGPT at them. I try not to. Even the free tier usually gives non-terrible advice. It's eminently reasonable to consult LLMs for help, especially for non-critical symptoms. They're surprisingly good at flagging when seemingly innocuous problems might indicate something serious. For anything important, treat them as an informed second opinion before seeing a human doctor, or use them to review advice you've already received. I'd take any LLM-raised concerns from a patient seriously and double-check at minimum. If your current doctor isn't as generous, I apologize; your mileage may vary.

The Layman's Guide to Using LLMs for Medical Advice Without Shooting Your Dick Off

1. Pay for a state-of-the-art model. Your health is worth $20 a month, you fucking cheapskate. Google gives away their (almost) best model for free on AI Studio.

2. Be exhaustive. List every detail about your symptoms. When I asked GPT 5.2 Thinking or Gemini 3 Pro about my eye problems, I had an annotated Amsler grid and timeline ready. Over-explaining beats omitting details. Unlike human doctors, LLMs don't bill by the hour (yet). Remember that they don't have the ability to pull open your medical records or call your other doctor for you. What you put into them informs what you get out of them.

3. For anything remotely important, consult two or three models. Note commonalities and differences. If they disagree, have them debate until they reach consensus, or get another model to arbitrate (a rough sketch of how to script this is below the list). This effectively mitigates hallucinations, even though base rates are low these days.

4. Ask for explanations. Medical terminology is arcane. LLMs are nearly superhuman at explaining things at your exact level of understanding. I wish my colleagues were as good at communicating information, even when the information itself is correct. If you're confused about anything, just ask.

5. Optional: Ask for probabilistic reasoning. Get them to put numbers on things like good Bayesians. Have them use their search tools if they haven't already (most models err toward using them even when not strictly necessary).

6. Remember you'll need a human eventually. But you can enter that consultation well-prepared.
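For the scripting-inclined, here's what step 3 can look like in practice. This is a rough sketch under my own assumptions: the openai and anthropic Python clients, placeholder model ids, and a third call acting as arbiter; the exact setup doesn't matter much, and even two fresh instances of the same model are better than nothing.

```python
# Rough sketch of step 3: ask two different models the same question, then have
# a third call act as arbiter. Model ids are placeholders; any two reasonably
# capable models (or two fresh instances of the same one) will do.
import anthropic               # pip install anthropic
from openai import OpenAI      # pip install openai

QUESTION = open("my_symptoms.txt").read()  # your exhaustive write-up from step 2

openai_client = OpenAI()
claude_client = anthropic.Anthropic()

answer_a = openai_client.chat.completions.create(
    model="gpt-4o",  # placeholder id
    messages=[{"role": "user", "content": QUESTION}],
).choices[0].message.content

answer_b = claude_client.messages.create(
    model="claude-3-5-sonnet-latest",  # placeholder id
    max_tokens=2000,
    messages=[{"role": "user", "content": QUESTION}],
).content[0].text

arbiter_prompt = (
    "Two assistants answered the same medical question. Compare their answers, "
    "note where they agree and disagree, flag anything that looks like a "
    "hallucination, and give rough probabilities for each proposed diagnosis.\n\n"
    f"Question:\n{QUESTION}\n\nAnswer A:\n{answer_a}\n\nAnswer B:\n{answer_b}"
)

verdict = openai_client.chat.completions.create(
    model="gpt-4o",  # a third, different model would be better, but this still helps
    messages=[{"role": "user", "content": arbiter_prompt}],
).choices[0].message.content

print(verdict)
```

The arbiter prompt also covers step 5 by asking for rough probabilities, which is usually all the Bayes you need.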

That's it, really. A year or two ago, I'd have shared sample prompts with extensive guardrails (red flags, conflicting treatment protocols, high-yield tests, etc.). You don't need that anymore. These models are smart. They understand context. Just talk to them. They are smart enough to notice what matters, and to tell you when the right move is “stop talking to me and go get checked.” I did just that myself.


Edit:

Humans are hardly immune to hallucinations, confabulation or suggestibility. You might have fallen prey to:

Say silk five times. What do cows drink? Milk. Oh fuck, wait a second–

And that is not very good evidence of humans not being general-purpose reasoners. I invite people to look, actually fucking look at what AI can do today, and the rate of improvement.

At this point, I would trust GPT 5.2 Thinking over a non-specialist human doctor operating outside their lane.

Taking this at absolute face value, I wonder if this is at least partially because the specialists will have observed/experienced various 'special cases' that aren't captured by the medical literature and thus aren't necessarily available in the training data.

As I understand it, the best argument for going to an extremely experienced specialist is always the "ah yes, I treated a tough case of recurrent craniofacial fibrous dysplasia in the summer of '88, resisted almost all treatment methods until we tried injections of cow mucus and calcium. We can see if your condition is similar" factor. They've seen every edge case and know solutions to problems other doctors don't even know exist.

(I googled that medical term up there just to be clear)

LLMs are getting REALLY good at legal work, since EVERYTHING of importance in the legal world is written down, exhaustively, and usually publicly accessible, and it all builds directly on previous work. Thus, drawing connections between concepts and cases and application to fact patterns should be trivial for an LLM with access to a Westlaw subscription and ALL of the best legal writing in history in its training corpus.

It is hard to imagine a legal specialist with 50 years of experience being able to outperform an LLM that knows all the same caselaw and law review articles and has working knowledge of every single brief ever filed to the Supreme Court.

I would guess a doctor with 50 years of experience (and good enough recall to incorporate all that experience) can still make important insights in tough cases, that would elude an AI (for now).

I would guess a doctor with 50 years of experience (and good enough recall to incorporate all that experience) can still make important insights in tough cases, that would elude an AI (for now).

As an aside, older is not better for doctors. It's a common enough belief, including inside the profession, but plenty of objective studies demonstrate that 30-40 year old clinicians are the best overall. At a certain point, cognitive inflexibility from old age, habit, and not keeping up with the latest updates can't be compensated for by experience alone.

(This doesn't mean older doctors are bad doctors, it just means they aren't the best anymore, all else being equal)

Taking this at absolute face value, I wonder if this is at least partially because the specialists will have observed/experienced various 'special cases' that aren't captured by the medical literature and thus aren't necessarily available in the training data.

I think there's something to this, but if pushed I wouldn't say it's the biggest factor. An ophthalmologist hasn't studied cardiology since med school; they might remember the general details and interactions when it comes to the drugs they prescribe, but they're still not a cardiologist.

Gun to my head, I'd say that the human doctors who outperform LLMs are still smarter, if not as well read (they can't boast a near-encyclopedic knowledge of all of medicine like any half-decent LLM can). IQ matters, and some doctors are just that smart, while having the unfair advantage of richer interaction with a human patient. Plus, LLMs don't have the same "scaffolding" or affordances; they can't just look at or lay hands on a patient (though they can ingest pictures, that's still an extra step). I suspect the difference diminishes to a large extent when the doctors are given the exact same information as an LLM, say, some kind of case overview plus lab tests and imaging. GPT-4 was scoring at the 95th percentile on the USMLE, and these days medical benchmarks are simply not good enough to tell them apart (official, graded benchmarks, that is; I'm sure you can make a few more ad-hoc ones if you really try, though by "you" I mean a competent physician).

As an aside, older is not better for doctors. It's a common enough belief, including inside the profession, but plenty of objective studies demonstrate that 30-40 year old clinicians are the best overall. At a certain point, cognitive inflexibility from old age, habit, and not keeping up with the latest updates can't be compensated for by experience alone.

I definitely believe that younger doctors are more up-to-date in best practices and aren't full of old knowledge that has proven ineffective or even harmful.

But if you could hold other factors approximately equal, I'd still bet my life on the guy who's seen 10,000 cases and performed a procedure 8,000 times over someone who is merely younger but with 1/3 the experience.

Lindy rule and all that. If he's been successfully practicing for this long, it's proof positive he's done things right.

and some doctors are just that smart, while having the unfair advantage of richer interaction with a human patient.

Yeah, I suspect that even if LLMs are a full standard deviation higher in IQ than your average doctor, the massive disadvantage of only being able to reason from the data stream that the humans have intentionally supplied, rather than going in and physically interacting with the patient's body, will hobble them in many cases. I also wonder if they are willing/able to notice when a patient is probably straight up lying to them.

And yet, they're finding ways to hook the machine up to real world sensor data which should narrow that advantage in practice.

And as you gestured at in your comment... you can very rapidly get second opinions by consulting other models. So now that brings us to the question of whether combining the opinions of Claude AND Gemini AND ChatGPT would bring us even better results overall.

But if you could hold other factors approximately equal, I'd still bet my life on the guy who's seen 10,000 cases and performed a procedure 8,000 times over someone who is merely younger but with 1/3 the experience.

https://www.nature.com/articles/s41598-022-15275-7

The mortality in patients undergoing surgery by old-aged surgeons was 1.14 (1.02–1.28, p = 0.02) (I² = 80%) compared to those by middle-aged surgeons. No significant differences were observed according to the surgeon’s age in the major morbidity and subgroup analyses. This meta-analysis indicated that surgeries performed by old-aged surgeons had a higher risk of postoperative mortality than those by middle-aged surgeons. Thus, it necessitates the introduction of a multidisciplinary approach to evaluate the performance of senior surgeons.

I don't think 14% is a big deal, there's already a great deal of heterogeneity in terms of surgical outcomes for all surgeons overall, but it does exist.

Yeah, I suspect that even if LLMs are a full standard deviation higher in IQ than your average doctor, the massive disadvantage of only being able to reason from the data stream that the humans have intentionally supplied, rather than going in and physically interacting with the patient's body, will hobble them in many cases. I also wonder if they are willing/able to notice when a patient is probably straight up lying to them.

While it's frustratingly hard to find actual sources on the average IQ of doctors, most claim an average of 120-130. IQ testing LLMs on human IQ tests like the Stanford-Binet or the WAIS-IV is fraught, but I've seen figures around 130 from o3 onwards. If I had to make a wild-ass guess, 130+ is a fair estimate for GPT 5.2T, and versions with enhanced reasoning budgets are making novel discoveries in mathematics, so...

(Did I say that IQ research on doctors is bad? Oh boy, just see what it's like for LLMs. There are papers still awaiting peer review that use Claude 3.5 Sonnet. The field moves *fast*.)

https://www.trackingai.org/home seems better than nothing and claims 140 IQ for GPT 5.2T on the public Mensa Norway test and 129 for the so-called offline version.

Note: The "Offline" IQ quiz is a test made by a Mensa member that has never been on the public internet, and is in no AI training data. Mensa Norway is a public online IQ test.

Make of that what you will.

I haven't specifically tested models on their ability to catch lies or inconsistencies, but I think they'd do okay, though probably worse than a decent human doctor. This is a moderate-confidence claim, and the gap could be narrowed by giving them previous clinical records; a video feed would be even better (doable today). I'm already zooted on stims and typing up massive comments instead of studying, or else I'd try it myself.

And yet, they're finding ways to hook the machine up to real world sensor data which should narrow that advantage in practice.

The Transformer architecture is a universal function approximator, LLMs are already multimodal despite the name, and in the worst case, they can ingest text instead of raw sensor data like humans often do. I don't look inside the SpO2 probe, I read the number.

And as you gestured at in your comment... you can very rapidly get second opinions by consulting other models. So now that brings us to the question of whether combining the opinions of Claude AND Gemini AND ChatGPT would bring us even better results overall.

I've seen some evidence that diversity of models is good, for models of similar general competence. Even so, just putting the same info into another instance of the same model is highly effective, and I wouldn't yell at someone who did that.
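For anyone curious, the "another instance of the same model" trick is trivial to automate; it's essentially self-consistency sampling. A minimal sketch under my own assumptions (placeholder model id and prompt, naive exact-match vote for aggregation):

```python
# Rough sketch: independent "second opinions" from fresh instances of one model.
# Each call is a separate sample; aggregate by vote or a final reconciliation pass.
from collections import Counter

from openai import OpenAI  # pip install openai

client = OpenAI()
PROMPT = "Given these findings, what is the single most likely diagnosis? <case details>"

answers = []
for _ in range(5):  # five independent opinions
    resp = client.chat.completions.create(
        model="gpt-4o",  # placeholder id
        messages=[{"role": "user", "content": PROMPT}],
        temperature=1.0,  # keep sampling on so the runs actually differ
    )
    answers.append(resp.choices[0].message.content.strip())

# Naive aggregation: exact-match vote, which only works for short, constrained
# answers. In practice, feed the five answers back to the model and ask it to
# reconcile them, as with the arbiter sketch earlier in the thread.
print(Counter(answers).most_common(3))
```

Whether mixing providers beats five samples of one model presumably depends on how correlated their errors are, which is exactly the open question here.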