This weekly roundup thread is intended for all culture war posts. 'Culture war' is vaguely defined, but it basically means controversial issues that fall along set tribal lines. Arguments over culture war issues generate a lot of heat and little light, and few deeply entrenched people ever change their minds. This thread is for voicing opinions and analyzing the state of the discussion while trying to optimize for light over heat.
Optimistically, we think that engaging with people you disagree with is worth your time, and so is being nice! Pessimistically, there are many dynamics that can lead discussions on Culture War topics to become unproductive. There's a human tendency to divide along tribal lines, praising your ingroup and vilifying your outgroup - and if you think you find it easy to criticize your ingroup, then it may be that your outgroup is not who you think it is. Extremists with opposing positions can feed off each other, highlighting each other's worst points to justify their own angry rhetoric, which becomes in turn a new example of bad behavior for the other side to highlight.
We would like to avoid these negative dynamics. Accordingly, we ask that you do not use this thread for waging the Culture War. Examples of waging the Culture War:
- Shaming.
- Attempting to 'build consensus' or enforce ideological conformity.
- Making sweeping generalizations to vilify a group you dislike.
- Recruiting for a cause.
- Posting links that could be summarized as 'Boo outgroup!' Basically, if your content is 'Can you believe what Those People did this week?' then you should either refrain from posting, or do some very patient work to contextualize and/or steel-man the relevant viewpoint.
In general, you should argue to understand, not to win. This thread is not territory to be claimed by one group or another; indeed, the aim is to have many different viewpoints represented here. Thus, we also ask that you follow some guidelines:
- Speak plainly. Avoid sarcasm and mockery. When disagreeing with someone, state your objections explicitly.
- Be as precise and charitable as you can. Don't paraphrase unflatteringly.
- Don't imply that someone said something they did not say, even if you think it follows from what they said.
- Write like everyone is reading and you want them to be included in the discussion.
On an ad hoc basis, the mods will try to compile a list of the best posts/comments from the previous week, posted in Quality Contribution threads and archived at /r/TheThread. You may nominate a comment for this list by clicking on 'report' at the bottom of the post and typing 'Actually a quality contribution' as the report reason.

I definitely believe that younger doctors are more up-to-date in best practices and aren't full of old knowledge that has proven ineffective or even harmful.
But if you could hold other factors approximately equal, I'd still bet my life on the guy who's seen 10,000 cases and performed a procedure 8,000 times over someone who is merely younger but with a third of the experience.
Lindy rule and all that. If he's been successfully practicing for this long, it's proof positive he's done things right.
Yeah, I suspect that even if LLMs are a full standard deviation higher in IQ than your average doctor, the massive disadvantage of only being able to reason from the data stream that humans have intentionally supplied, without being able to go in and physically interact with the patient's body, will hobble them in many cases. I also wonder if they are willing/able to notice when a patient is probably straight-up lying to them.
And yet, they're finding ways to hook the machine up to real world sensor data which should narrow that advantage in practice.
And as you gestured at in your comment... you can very rapidly get second opinions by consulting other models. So that brings us to the question of whether combining the opinions of Claude AND Gemini AND ChatGPT would bring us even better results overall.
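The "second opinions from multiple models" idea is essentially an ensemble vote. A minimal sketch of what that aggregation step could look like, assuming you've already collected free-text answers from each model (the model names and diagnosis strings below are purely hypothetical placeholders, not real API output):

```python
from collections import Counter

def majority_opinion(opinions):
    """Return the most common answer among independent model opinions,
    along with the fraction of models that agreed on it.

    In a real setup each entry would come from a separate LLM queried
    with the same case notes; here they are hard-coded placeholders.
    """
    counts = Counter(opinions)
    diagnosis, votes = counts.most_common(1)[0]
    return diagnosis, votes / len(opinions)

# Hypothetical answers from three different models given the same case:
opinions = {
    "Claude": "community-acquired pneumonia",
    "Gemini": "community-acquired pneumonia",
    "ChatGPT": "acute bronchitis",
}
diagnosis, agreement = majority_opinion(list(opinions.values()))
print(diagnosis, agreement)
```

In practice you'd also want the models' reasoning, not just the label, since a low agreement fraction is itself a useful signal that the case is ambiguous.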
https://www.nature.com/articles/s41598-022-15275-7
I don't think 14% is a big deal; there's already a great deal of heterogeneity in surgical outcomes across surgeons overall. But the effect does exist.
While it's frustratingly hard to find actual sources on the average IQ of doctors, most claim an average of 120-130. IQ testing LLMs on human IQ tests like the Stanford-Binet or the Wechsler (WAIS-IV) is fraught, but I've seen figures around 130 from o3 onwards. If I had to wild-ass-guess, 130+ is a fair estimate for GPT 5.2T, and versions with enhanced reasoning budgets are making novel discoveries in mathematics, so...
(Did I say that IQ research in doctors is bad? Oh boy, just see what it's like for LLMs. There are papers still awaiting peer review that use Claude 3.5 Sonnet. The field moves *fast*.)
https://www.trackingai.org/home seems better than nothing and claims 140 IQ for GPT 5.2T on the public Mensa Norway test and 129 for the so-called offline version.
Make of that what you will.
I haven't specifically tested models on their ability to catch lies or inconsistencies, but I think they'd do okay, though probably worse than a decent human doctor. This is a moderate-confidence claim; it could be ameliorated by giving them previous clinical records, and a video feed would be even better (doable today). I'm already zooted on stims and typing up massive comments instead of studying, or else I'd try it myself.
The Transformer architecture is a universal function approximator, LLMs are already multimodal despite the name, and in the worst case, they can ingest text instead of raw sensor data, just like humans often do. I don't look inside the SpO2 probe; I read the number.
I've seen some evidence that diversity of models is good, for models of similar general competence. Even so, just putting the same info into another instance of the same model is highly effective, and I wouldn't yell at someone who did that.
I'd also be suspicious that this could be an artifact of the older surgeons being handed the tougher cases, or handling older patients such that complications are somewhat more likely to arise.
Similar logic to that study about black babies getting 'worse' outcomes when treated by white doctors... which dissolves when accounting for the fact that the white doctors were getting the toughest cases of any given race.
At any rate, I'm sure there are more direct ways to assess a surgeon's skills from the outside (although apparently just asking for their IQ results is out?), but finding one that's a reliable, hard-to-fake signal is the challenge.
The main thing I appreciate about LLMs is that I can ask them to detail the source of their knowledge, and they can generally cite and point to it so I can double-check myself, whereas I'd guess most doctors would scoff if you tried to "undermine their credibility" in such a way.