
Culture War Roundup for the week of April 24, 2023

This weekly roundup thread is intended for all culture war posts. 'Culture war' is vaguely defined, but it basically means controversial issues that fall along set tribal lines. Arguments over culture war issues generate a lot of heat and little light, and few deeply entrenched people ever change their minds. This thread is for voicing opinions and analyzing the state of the discussion while trying to optimize for light over heat.

Optimistically, we think that engaging with people you disagree with is worth your time, and so is being nice! Pessimistically, there are many dynamics that can lead discussions on Culture War topics to become unproductive. There's a human tendency to divide along tribal lines, praising your ingroup and vilifying your outgroup - and if you think you find it easy to criticize your ingroup, then it may be that your outgroup is not who you think it is. Extremists with opposing positions can feed off each other, highlighting each other's worst points to justify their own angry rhetoric, which becomes in turn a new example of bad behavior for the other side to highlight.

We would like to avoid these negative dynamics. Accordingly, we ask that you do not use this thread for waging the Culture War. Examples of waging the Culture War:

  • Shaming.

  • Attempting to 'build consensus' or enforce ideological conformity.

  • Making sweeping generalizations to vilify a group you dislike.

  • Recruiting for a cause.

  • Posting links that could be summarized as 'Boo outgroup!' Basically, if your content is 'Can you believe what Those People did this week?' then you should either refrain from posting, or do some very patient work to contextualize and/or steel-man the relevant viewpoint.

In general, you should argue to understand, not to win. This thread is not territory to be claimed by one group or another; indeed, the aim is to have many different viewpoints represented here. Thus, we also ask that you follow some guidelines:

  • Speak plainly. Avoid sarcasm and mockery. When disagreeing with someone, state your objections explicitly.

  • Be as precise and charitable as you can. Don't paraphrase unflatteringly.

  • Don't imply that someone said something they did not say, even if you think it follows from what they said.

  • Write like everyone is reading and you want them to be included in the discussion.

On an ad hoc basis, the mods will try to compile a list of the best posts/comments from the previous week, posted in Quality Contribution threads and archived at /r/TheThread. You may nominate a comment for this list by clicking on 'report' at the bottom of the post and typing 'Actually a quality contribution' as the report reason.


Dr. GPT will see you now

I've often noted around these parts that GPT-4 is a really good clinician. It would be hard for it not to be one; I'd be bloody amazing too if I had quasi-eidetic memory of most relevant diagnostic criteria and guidelines. That being said, the guidelines alone are next to useless: you need the crystallized intelligence to apply them in a specific context.

I'd go so far as to say that it's a better, more competent doctor than the majority of human ones, even if it isn't outright superhuman (I've seen some ridiculously good human ones). I read with some humor an article about how it was tested by some bigshot hospital in the US (Harvard Med?) as a clinical adjunct, but by the end of it, the Clinical Director was praising its bedside manner, in the sense that it was kind, compassionate, patient and understanding, and outright suggested human doctors try and emulate that(!).

However, I have one resounding advantage over it, namely that thanks to decades of regulatory bullshit, I have a certificate saying I can legally prescribe meds and it can't. L Bozo 💅, as the kids these days say haha.

Still, it would be remiss of me not to make use of it while my own skills are still relevant, so of late I've been using it to hone them. I have it write clinical vignettes that I solve, and then it grades me on my performance.

So far, it's done an excellent job, and funnily enough, I feel a tingling of pride when it says I did a good job. Why wouldn't I? It's a smarter and more decent entity than most humans I've had the displeasure of meeting.

Here's an example:

Clinical Vignette: A 42-year-old male patient presents to the emergency department with sudden onset severe, colicky pain in the right flank and lower abdomen that began 6 hours ago. He also reports nausea, vomiting, and an urgent need to urinate. The patient has a history of hypertension, which is controlled with medication. His vital signs are: blood pressure of 145/95 mmHg, pulse rate of 100 bpm, respiratory rate of 20 breaths/min, and temperature of 99.1°F (37.3°C). Upon physical examination, there is tenderness to percussion on the right side of the lower abdomen and right flank, and the patient appears restless, frequently shifting positions.

What is your diagnosis and treatment plan?

My response:

My diagnosis is a case of a stone in the right ureter.

In terms of management, I'll start IV fluids and insert a catheter. IV antibiotics are to be started according to hospital guidelines.

IV morphine should be given as pain control, along with IV antiemetics.

An X-ray KUB or USG should be ordered to identify the size of the stone, and then further treatment depends on said size. A wait-and-watch policy is appropriate for sub-2 mm stones, while larger ones might need surgical removal.


Grade: A

Your response demonstrates a strong understanding of the clinical presentation and management of ureteral stones.

Diagnosis: Your diagnosis of a stone in the right ureter is appropriate given the patient's sudden onset severe, colicky pain in the right flank and lower abdomen, nausea, vomiting, and urgency to urinate. The physical examination findings, such as tenderness to percussion and restlessness, also support this diagnosis.

Treatment Plan: Your plan to start IV fluids and insert a catheter is appropriate for initial management, monitoring, and providing relief to the patient. Initiating IV antibiotics may not be necessary unless there are signs of infection or sepsis; however, following hospital guidelines is always a wise choice.

Administering IV morphine for pain control and IV antiemetics for nausea and vomiting are essential parts of the treatment plan.

Ordering an X-ray KUB (kidneys, ureters, and bladder) or ultrasound (USG) is appropriate to confirm the diagnosis and determine the size of the stone, which will guide further management. As you mentioned, a conservative approach of observation and pain management may be suitable for small stones (less than 5 mm) that are likely to pass spontaneously, while larger stones may require surgical removal or other intervention, such as extracorporeal shock wave lithotripsy (ESWL) or ureteroscopy.

Overall, your response demonstrates excellent clinical knowledge in identifying and managing a patient with suspected ureteral stone. Well done!

Honestly, as a pedagogical tool for medical education, it's unparalleled. I paid a great deal less attention in med school than I should have, from the month of anatomy classes I can't remember because I was getting frisky under the bench, to the period where I was severely depressed and barely dragging myself to class. I used to be very insecure about my knowledge base, but I did pull together and single-handedly teach myself most of the things I'd glossed over while preparing for the PLAB.

I've tried a similar method on GPT-3.5, and it was inadequate to the task. It made a lot more errors, and ended up confused often enough to be annoying. To date, I haven't seen 4 fumble the bag once. I'd put their respective competency at around the level of a decent final-year student versus a competent postgraduate resident.

I find this idea very scary. I'm sure you know about the LLM phenomenon that's been dubbed hallucination, so how can you trust the model with something as impactful as medicine? It might be better than a human doctor* 99% of the time, or even 99.99%, but like self-driving cars this is exactly the kind of case where being good but not quite as good as a human is really dangerous. Remember that the script of House, MD is in the training set right alongside WebMD with nothing to distinguish the truth from fiction. Are you going to catch when the model probabilities switch from being driven by Gray's Anatomy to Grey's Anatomy? There's no way to know mathematically, because the whole thing is a non-deterministic black box.

If the process is as simple as pattern-matching symptoms to a diagnosis from a list of previous diagnoses with their associated symptoms, why can't this be done with a good old stupid database? Something like this has to exist. Does the LLM have an advantage beyond writing the answer in nice prose?
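To make the "good old stupid database" idea concrete, here is a minimal sketch of what such a lookup could look like. Everything in it is an assumption for illustration: the `KNOWN_CASES` table, its entries, and the choice of Jaccard overlap as the matching score are all made up and are not clinical reference data or any existing system.

```python
# Hypothetical toy table: diagnosis -> set of associated symptoms.
# Entries are illustrative only, not clinical reference data.
KNOWN_CASES = {
    "ureteral stone": {"flank pain", "nausea", "vomiting", "urinary urgency"},
    "appendicitis": {"lower abdominal pain", "nausea", "vomiting", "fever"},
    "cystitis": {"urinary urgency", "dysuria", "lower abdominal pain"},
}


def jaccard(a, b):
    """Overlap between two symptom sets, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_diagnoses(symptoms, cases=KNOWN_CASES):
    """Return (score, diagnosis) pairs, best match first, ties broken by name."""
    observed = set(symptoms)
    scored = [(jaccard(observed, known), name) for name, known in cases.items()]
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))


ranking = rank_diagnoses(["flank pain", "nausea", "vomiting", "urinary urgency"])
```

With the symptoms entered exactly as the table spells them, the stone comes out on top. Type in the vignette's own wording instead, say "colicky pain in the right flank", and this sketch scores every diagnosis at zero, because it only matches exact strings.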

My warning to anyone trying to evaluate LLMs: your ability to judge it is inevitably compromised by the fact that the only optimization parameter the model has is how good the output sounds to a human, and you are not immune to being human!

*Edited, original was overall accuracy instead of relative to a human

It might be right 99% of the time, or even 99.99%, but like self driving cars this is exactly the kind of case where being good but not quite as good as a human is really dangerous.

Is getting things right 99.99%, or even 99% of the time not quite as good as a human doctor, rather than much better than a human doctor? I honestly don't know, and I'm not even sure how to quantify something like this to make like-for-like measurements. But surely at some % rate of success, the LLM would be getting things right at a rate better than could be expected of a typical (or even nth percentile) human doctor.

Which is scary in its own way. An LLM's "thought process" is currently completely unknowable, unlike that of a human doctor. So the types of mistakes it makes are likely to be more mysterious. Yet if it makes fewer mistakes and/or those mistakes are, on net, less harmful/unhealthy, then would it be our moral obligation to use those LLMs over doctors?

I was thinking in % as accurate as a human, but didn't say that. I'll correct my original post.

I was also assuming it wouldn't be as good as a human, because of course a general model wouldn't accidentally be better than a specialist... Or would it?

I don't think it would be that hard to devise an experiment to get at least a rough idea of its capabilities. Get some doctors, maybe professors, to devise questions of the sort OP was generating, present them to LLMs and real doctors (and maybe non-doctors with Google for an extra point of comparison), then have the professors grade the answers blind. I recall people giving LLMs math problems in this way, but I don't know if experiments have been performed with any rigor.
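The blinding step is the only fiddly part, and it can be sketched in a few lines. This is a rough outline under my own assumptions, not a description of any study that has been run: the function names, the `A000`-style codes, and the use of a plain mean score per group are all hypothetical choices.

```python
import random
from collections import defaultdict
from statistics import mean


def blind(answers, seed=0):
    """Shuffle answers and swap each source label for an opaque code.

    answers: list of (source, question_id, text) tuples.
    Returns (blinded, key): blinded is what the graders see,
    key maps each code back to its source and stays hidden from them.
    """
    rng = random.Random(seed)
    shuffled = list(answers)
    rng.shuffle(shuffled)
    blinded, key = [], {}
    for i, (source, question_id, text) in enumerate(shuffled):
        code = f"A{i:03d}"
        key[code] = source
        blinded.append((code, question_id, text))
    return blinded, key


def unblind_scores(grades, key):
    """Average the graders' scores per source once grading is finished.

    grades: dict mapping answer code -> numeric score.
    """
    by_source = defaultdict(list)
    for code, score in grades.items():
        by_source[key[code]].append(score)
    return {source: mean(scores) for source, scores in by_source.items()}
```

The graders would only ever be handed `blinded`; `key` is consulted after every score is in, so nobody knows while grading whether an answer came from the LLM, a doctor, or a layperson with Google.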

To your final point: what is the purpose of a doctor? Is it to heal at all costs, or is it to make people feel better? Many people go to doctors seeking specific prescription medication as a goal, when they could cure their ailment in a better but more laborious way. Some of them don't even have an ailment and only desire the effect of the meds. Many people also have procedures performed which physically make them less healthy, but fulfill their desires.

So, what should an LLM say to these people? "Don't take the pills, exercise" or "don't have surgery, improve your social life"?

I suppose it's not a new problem, but it does move the power and responsibility. Who is it moving to though?

If it interests you, GPT-4 successfully passed the USMLE, a pretty difficult exam that is intended to be a benchmark for the minimum level of competence a doctor needs to be able to practise. I'd say that counts for a lot.

I've thrown GPT-3.5 at a significant number of medical professionals, since I have several in my close and extended family. And even that antiquated model was sufficient to impress them, so I think the bar was already cleared before 4 showed up!

Just for the record, self made human is a real doctor, or a good enough liar to convincingly pass for one. His OP is basically him starting your test and being blown away by the initial results.