
Culture War Roundup for the week of April 8, 2024

This weekly roundup thread is intended for all culture war posts. 'Culture war' is vaguely defined, but it basically means controversial issues that fall along set tribal lines. Arguments over culture war issues generate a lot of heat and little light, and few deeply entrenched people ever change their minds. This thread is for voicing opinions and analyzing the state of the discussion while trying to optimize for light over heat.

Optimistically, we think that engaging with people you disagree with is worth your time, and so is being nice! Pessimistically, there are many dynamics that can lead discussions on Culture War topics to become unproductive. There's a human tendency to divide along tribal lines, praising your ingroup and vilifying your outgroup - and if you think you find it easy to criticize your ingroup, then it may be that your outgroup is not who you think it is. Extremists with opposing positions can feed off each other, highlighting each other's worst points to justify their own angry rhetoric, which becomes in turn a new example of bad behavior for the other side to highlight.

We would like to avoid these negative dynamics. Accordingly, we ask that you do not use this thread for waging the Culture War. Examples of waging the Culture War:

  • Shaming.

  • Attempting to 'build consensus' or enforce ideological conformity.

  • Making sweeping generalizations to vilify a group you dislike.

  • Recruiting for a cause.

  • Posting links that could be summarized as 'Boo outgroup!' Basically, if your content is 'Can you believe what Those People did this week?' then you should either refrain from posting, or do some very patient work to contextualize and/or steel-man the relevant viewpoint.

In general, you should argue to understand, not to win. This thread is not territory to be claimed by one group or another; indeed, the aim is to have many different viewpoints represented here. Thus, we also ask that you follow some guidelines:

  • Speak plainly. Avoid sarcasm and mockery. When disagreeing with someone, state your objections explicitly.

  • Be as precise and charitable as you can. Don't paraphrase unflatteringly.

  • Don't imply that someone said something they did not say, even if you think it follows from what they said.

  • Write like everyone is reading and you want them to be included in the discussion.

On an ad hoc basis, the mods will try to compile a list of the best posts/comments from the previous week, posted in Quality Contribution threads and archived at /r/TheThread. You may nominate a comment for this list by clicking on 'report' at the bottom of the post and typing 'Actually a quality contribution' as the report reason.


Well, the final report of the Cass Review just dropped. It's getting coverage in mainstream publications like the BBC. Surprising no one who paid attention to the interim report, it concludes that there is insufficient evidence in the realm of trans healthcare for children:

Cass told BBC Radio 4's Today programme that clinicians had been worried about having "no guidance, no evidence, no training".

She said "we don't have good evidence" that puberty blockers are safe to use to "arrest puberty", adding that what started out as a clinical trial had been expanded to a wider group of young people before the results of that trial were available.

"It is unusual for us to give a potentially life-changing treatment to young people and not know what happens to them in adulthood, and that's been a particular problem that we haven't had the follow-up into adulthood to know what the results of this are," she said.

Critics are already jumping on the fact that the report used the GRADE approach to categorize evidence, which only allows randomized control studies to be classified as "high quality of evidence" and which can drop non-blinded studies one level in assessed quality, thus preventing many non-blinded studies from qualifying as high quality evidence. (Bold is edit added later. See ArjinFerman's response below, and my response - original GRADE standards can be found here.) The critics point out that double-blinded randomized control studies just aren't possible in some areas of medicine. For a simple example, if the intervention is something like "cosmetic breast augmentation", then there's logically no sensible control group - since there's no placebo that can make people believe they got bigger breasts when they didn't. (It's worth pointing out that this criticism of GRADE isn't unique to trans activists. The Wikipedia page for GRADE mentions it is criticized in general when it comes to slowly progressing diseases like atherosclerosis, where observational studies are easier to perform than RCTs.)

As a result of the GRADE approach, we read things like this in the report:

Understanding intended benefits and risks of puberty blockers

[...]

There was one high quality study, 25 moderate quality studies and 24 low quality studies. The low quality studies were excluded from the synthesis of results.

My own opinion is that I can partially agree with Cass that I want to see higher quality studies around trans healthcare for children in general, but I think that her methodology (using GRADE) is of the sort that will always say we "don't have enough high quality studies", and so her arguments don't have legs to stand on. A problem I see a lot in studies is using some "industry standard" for investigating a topic, and coming to a result of some kind, but failing to justify why the "industry standard" was the best thing to use here. In a better version of the Cass Review, I would have liked to see a few paragraphs justifying the use of GRADE, and explaining why they used this standard and not some other standard.

I mean, isn't that a thing good scientific reports in general do at all steps of the process? Think of what a critic would claim about your model and methodology, and then explain why your model or methodology is the best one to use in this particular instance. Show that your findings are robust even if you used some slightly different model or methodology, and explain what conditions are necessary for your model or methodology to fail. A quick search through the Cass Review shows that it doesn't seem to have done this. It just used GRADE, didn't really justify the decision, and didn't discuss alternatives or why its arguments are robust under alternative assumptions about the data.

It's a bit circular to arbitrarily use a standard that will say, "there are basically no high quality studies in this medical field" no matter what, and then to conclude in your recommendations to the government, "We need more high quality studies before we do anything more in this medical field!"

The Endocrine Society's guidelines and WPATH's guidelines are the most cited and beloved by most activist groups.

Both of them also use GRADE.

The Endocrine Society's clearly indicate the strength of evidence for each of their guidelines, and nearly all of them are listed as "very low quality evidence" or "low quality evidence". In the evidence sections they often clearly mention and discuss the studies that support their findings.

WPATH is a lot less transparent in their evaluations. They say that anything listed as "we recommend" has high quality evidence, and nearly all of their statements are "we recommend". Most of the time things are just mentioned and then they list various citations, without any actual discussion of the specific study. Nevertheless, they say they are evaluating them with an adapted GRADE approach.

Neither of them spends much time justifying their use of GRADE.

I have never seen either of these criticized for using GRADE. Only when the Cass Review also uses GRADE, but reaches conclusions different from those someone already agreed with, is it called into question. This is clearly motivated and tendentious reasoning.

and which can drop non-blinded studies one level in assessed quality, thus preventing many non-blinded studies from qualifying as high quality evidence

Were there even any randomized, non-blinded studies cited? I skimmed the references and didn't see anything. And it'd make sense that there aren't any randomized trials of puberty blockers or hormones given the emotional weight everyone puts on the issue. I'm not sure how this is relevant unless there are specific 'non-blinded studies' that aren't classified as 'high quality'.

Or maybe you're referring to a more sophisticated criticism, that these "critics" are making. What critics? Where? May we have a link?

Why make this particular criticism? How does it tie into the main claims the report makes? Can you at least outline the core of the report, what it wants to tell us and how it attempts to support those claims, before you attempt to undermine it? This is like a twitter swipe - take a thing, point out a "flaw", write like this flaw is a critical flaw, and watch as everyone's satisfied that the bad guys were wrong again, without anyone involved understanding what the thing even is.

It'd be more interesting to explain the context behind the report - the politics and medical practice in youth transgender medicine in the UK for the past few years - and then explain what the report claims, and then go into the reactions it's gotten.

If you want a criticism, I think the best one is just: There are ethical (it's conversion therapy for the control group) and methodological reasons to not do RCTs in trans youth. Given that, we need to use the evidence we have, and a standard requiring RCTs is bad.

(edited because I used the wrong link)

I'd dispute that - there's a reason high quality evidence requires RCTs, it's because history has shown that observational studies are just not reliable. If you disagree, I'd suggest picking a specific study (not review) that this review considered low-quality but you think is good enough to form part of the foundation for a medical guideline, and we can critically examine it and see if it is. I don't think there are ethical reasons to not do RCTs for trans youth that wouldn't also apply to RCTs for treatments for deadly diseases, which we do all the time when it isn't clear if the treatment is beneficial or not. I think the methodological reasons are ... significant, but (guessing) not in fact worse than the problems with observational studies.

Were there even any randomized, non-blinded studies cited?

Not only were they cited, the study that got graded "high quality" wasn't even really randomized, let alone blinded. Here's a list of all the papers the systematic review of puberty blockers looked at, and a breakdown of their grades.

If existing scientific methods aren't enough to analyze this issue, imagine how awesome it would be if the transgender community and their vanguard managed to push for prediction markets to study this subject.

(I like prediction markets)

Not all randomized control trials are blinded randomized control trials. All you need for a randomized control trial is to randomly assign a group of patients that gets the treatment and a group that doesn't. As far as I know, no long-term randomized control study of gender transition has ever been conducted, in either children or adults.

Non-RCTs are, if anything, even worse than euphemisms like "moderate-quality" make them seem; reading something like Scott's ivermectin post might help give a sense of it. That's why fields like nutrition, where long-term randomized control trials are impractical, are so terrible despite far more quantity and quality of research than a small field like gender dysphoria.

As a result of the GRADE approach, we read things like this in the report:

There was one high quality study, 25 moderate quality studies and 24 low quality studies. The low quality studies were excluded from the synthesis of results.

No, it's way worse than that: the high/moderate/low quality ratings were based on the cited meta-study and seem if anything too lenient. Reading the meta-study, many of the studies only looked at physical outcomes like "is puberty suppressed"; they made no attempt to measure psychological outcomes to determine whether suppressing puberty actually provided any benefit. This is the supposed single "high-quality" study. It isn't a randomized control study; it compares patients who have been given puberty blockers to ones who just started the assessment process. (It also compares to a "cisgender comparison group"; such comparisons tend to be even more worthless.) Among other potential problems, this means the results are very plausibly just regression to the mean or benefits from the other mental-health care provided. If you think the parents of children with worse self-reported "internalizing, suicidality, and peer relations" are more likely to seek treatment than the parents of children who are currently doing fine, which the study itself shows, then improvement over time is the expected result even if you don't do anything. And then here are the detailed explanations of why they considered the other studies to be even worse.
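
For anyone who wants to see the regression-to-the-mean point concretely, here is a minimal simulation sketch. Everything in it is an assumption made up for illustration (the normal distributions, the referral threshold, the function name `simulate_referrals`); it is not a model of any study discussed here. Each simulated child has a stable underlying distress level plus independent measurement noise at two time points, nobody receives any treatment, and only children who score badly the first time are "referred".

```python
import random
from statistics import mean

def simulate_referrals(n=100_000, threshold=1.0, seed=0):
    """Mean distress score of referred children at referral and at follow-up.

    Hypothetical model: score = stable trait + independent noise.
    No treatment effect of any kind is included.
    """
    rng = random.Random(seed)
    at_referral, at_follow_up = [], []
    for _ in range(n):
        trait = rng.gauss(0.0, 1.0)           # stable underlying distress
        first = trait + rng.gauss(0.0, 1.0)   # noisy score at first measurement
        second = trait + rng.gauss(0.0, 1.0)  # noisy score later, untreated
        if first > threshold:                 # only high scorers get referred
            at_referral.append(first)
            at_follow_up.append(second)
    return mean(at_referral), mean(at_follow_up)

before, after = simulate_referrals()
print(f"mean score at referral:  {before:.2f}")
print(f"mean score at follow-up: {after:.2f}")
```

Under these assumptions the referred group scores lower at follow-up than at referral even though nothing was done to them, because selecting on a noisy high score also selects for unlucky noise that does not repeat. A before/after comparison without a randomized control arm cannot separate that from a real treatment effect.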

You will sometimes see a medical study described as a "blinded randomized controlled trial" and other medical studies as an "open label randomized controlled trial". Whether a study is blinded or open-label is a separate issue from having a control group and separate again from having random assignment to treatment. The Wikipedia page on GRADE doesn't mention blinding. On the website of the National Institute for Health and Care Excellence (NICE), the reference to randomized controlled trials does not mention blinding.

A study on breast augmentation can qualify as a randomized controlled trial, generating top quality evidence, with the following design: recruit 200 subjects. Randomly assign 100 to have the operation. The others are the control group. Researchers must keep in touch with all 200 to find out how things worked out for them.

Keeping in touch with all 200 might be tricky. Some of those in the trial group might be disappointed with the results of the surgery and, feeling disillusioned with medical intervention, may reject further contact with the trial. Some of those in the control group may interpret being rejected for surgery as being rejected from the trial and disappear. Such people are lost to follow-up. How to analyse a trial with large losses to follow-up is controversial. Do we blandly say "we don't know"? Should we interpret losses from the trial group as bad outcomes? One might vary the design. 100 breast augmentations. 50 get psychotherapy that aims to persuade them that they don't need breast augmentation. 50 get regular contact to keep in touch, but bland contact, merely reminding them that they also serve who only stand and wait.
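
To make the loss-to-follow-up question concrete, here is a small sketch of one simple way to report the uncertainty: compute the success rate for one trial arm under three assumptions about the people who vanished. The function name `success_rate_bounds` and the counts are hypothetical, and this is only one of several approaches, not a recommended standard.

```python
def success_rate_bounds(successes, failures, lost):
    """Return (worst, observed, best) success rates for one trial arm.

    successes, failures: subjects who were followed up, by outcome.
    lost: subjects whose outcome is unknown.
    """
    followed = successes + failures
    total = followed + lost
    observed = successes / followed    # pretend the lost subjects never existed
    worst = successes / total          # count every lost subject as a failure
    best = (successes + lost) / total  # count every lost subject as a success
    return worst, observed, best

# Hypothetical arm of 100: 60 satisfied, 15 dissatisfied, 25 lost to follow-up.
worst, observed, best = success_rate_bounds(60, 15, 25)
print(f"worst case {worst:.0%}, observed {observed:.0%}, best case {best:.0%}")
```

The gap between the worst and best case is the honest width of "we don't know". If the two arms' ranges overlap, the losses alone are enough to swallow the apparent effect.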

Scott's deep dive into Alcoholics Anonymous is my go-to article for the practical importance of having a control group. Or should that be the disappointing effect of control groups?

If a trial does not have randomization, it is vulnerable to Simpson's Paradox. One may find that a medical treatment is beneficial overall but, on partitioning the data into two exhaustive and mutually exclusive subgroups, find that the treatment is harmful to one of the subgroups and also harmful to the other subgroup. Wut? The analysis may collapse into baffling incoherence. Actually it is worse than that. The laws of arithmetic are chaotic evil, and permit that a conclusion that has been reversed by one partition may yet be swapped back by a finer one (if the Chrome browser objects to a faulty certificate, using incognito mode will work).
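
A worked example of the reversal may help. The counts below are illustrative only (not from any study discussed here), and the subgroup names and helper functions are my own labels. The setup is a non-randomized comparison in which the treated group happens to consist mostly of mild cases.

```python
# (recovered, total) per subgroup; illustrative counts only.
treated = {"mild": (234, 270), "severe": (55, 80)}
untreated = {"mild": (81, 87), "severe": (192, 263)}

def rate(recovered, total):
    """Recovery rate within a single subgroup."""
    return recovered / total

def pooled(arm):
    """Recovery rate with the subgroups lumped together."""
    recovered = sum(r for r, _ in arm.values())
    total = sum(t for _, t in arm.values())
    return recovered / total

for group in treated:
    print(f"{group:>6}: treated {rate(*treated[group]):.1%}, "
          f"untreated {rate(*untreated[group]):.1%}")
print(f"pooled: treated {pooled(treated):.1%}, "
      f"untreated {pooled(untreated):.1%}")
```

With these counts the treated group recovers more often when everything is pooled, yet less often within both the mild and the severe subgroup. The pooled comparison is really measuring who got the treatment (mostly mild cases, who recover more often anyway), which is exactly the imbalance that random assignment is meant to prevent.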

The two issues, of needing a control group, and of needing randomization, are widely understood; I would not expect Dr Hilary Cass to restate the arguments in her report.

Edited to fix link to Simpson's Paradox, spotted error way too late :-(

The critics point out that randomized control studies just aren't possible in some areas of medicine.

And the critics are wrong. If you randomly give a treatment to one group and do not give it to another, that's still an RCT. Or you can offer an alternative treatment to the control group. It's a plus when you can blind a patient to what they're getting, but it's not a strict necessity. In this case it's probably just as important to blind the researchers when they're assessing results as to blind the patients themselves.

In a better version of the Cass Review, I would have liked to see a few paragraphs justifying the use of GRADE, and explaining why they used this standard and not some other standard.

That demand seems arbitrary to me, and "that's what we use for everything" is a perfectly fine justification.

It's a bit circular to arbitrarily use a standard that will say, "there are basically no high quality studies in this medical field" no matter what, and then to conclude in your recommendations to the government.

Where did you get the idea that the decision was arbitrary? That studies with a control group are better than studies without one, or that random samples are better than self-selected surveys, isn't something that Cass came up with on the spot. The critics are more than happy to use the same strict standards when dismissing studies they don't like (see: ROGD).

This is all without going into the juicy details about some of these studies, some of which are hilariously bad. The "Dutch study" that kicked off the whole puberty blockers thing would most likely get the same result, even if you replaced blockers with farting in the patient's face, for example.

My opinion is that trans activists and researchers wildly oversold the scientific basis for the interventions they were promoting, and sometimes they were outright lying ("puberty blockers are reversible"). They could have just not done that, and tried to gradually accumulate stronger evidence. But the way things are, gender medicine should have never seen such widespread adoption, and people who allowed it should probably be punished.

Addressing the other parts of your post:

That demand seems arbitrary to me, and "that's what we use for everything" is a perfectly fine justification.

I agree it's "fine" from a CYOA point of view, as in, no one will be able to blame you for using a standard tool used across the industry. But from the perspective of trying to perform a Bayesian update based on the final report, I'm not sure I agree.

A lot of the scientific method in general is a heuristic crystallization of Bayesian approaches, and so I have no doubt that a lot of what is present in GRADE is justifiable across a wide swath of evidence, and comes to largely the same answer as a Bayesian approach would. But I think that if GRADE systematically downgrades some kinds of evidence from being "high quality", which in a proper Bayesian approach wouldn't require any serious adjustment, that can lead to certain evidence being ignored or de-emphasized compared to where it should.

My opinion is that trans activists and researchers wildly oversold the scientific basis for the interventions they were promoting, and sometimes they were outright lying ("puberty blockers are reversible"). They could have just not done that, and tried to gradually accumulate stronger evidence. But the way things are, gender medicine should have never seen such widespread adoption, and people who allowed it should probably be punished.

I think absent any other evidence, just the existence of the Replication Crisis is enough to call a lot of medicine into doubt, and I see no reason why this wouldn't apply to trans healthcare. That the evidence is weaker than often claimed, is almost certainly true. (I'm not sure that that isn't the case for a wide variety of healthcare fields as well though - is trans healthcare uniquely bad, or is it just as bad as medicine as a whole, and do we need to adopt a whole swath of reforms to deal with things like p-hacking, the file drawer effect, small sample sizes, etc.)

I agree with Cass' conclusion, even if I question her methodologies, because I want to see higher quality medical evidence around trans issues, and especially trans kids. I want the medical research to be beyond reproach, whatever conclusions it comes to.

The basic problem with medicine, across the board, is that we're routinely doing barbaric things to people, and the only justification we can have is that the evidence shows it will have a better outcome for the patient. Chemotherapy involves poisoning a patient with the hope that the poison will kill the cancer faster than it kills the patient. Amputating a limb might be a tough decision sometimes, but it is most justified if a patient would likely die if you didn't do it.

I want the evidence we use in all instances, especially trans healthcare, to be airtight so that no one can say we're poisoning people or removing functional limbs or organs for no reason. It'll still be "barbaric", but if it can be justified as much as chemotherapy, then I think trans healthcare will be in a good place.

I agree it's "fine" from a CYOA point of view, as in, no one will be able to blame you for using a standard tool used across the industry. But from the perspective of trying to perform a Bayesian update based on the final report, I'm not sure I agree.

Well then I have 4 words for you: isolated demand for rigor. If you want to throw out all the published studies, and force the authors to do them right, I'm game. If we're supposed to apply the highest standards to Cass, and ignore the gaping holes in the literature published to date, I don't think you'll get a lot of people signing up for that.

if GRADE systematically downgrades some kinds of evidence from being "high quality"

As per my other comment, I've seen no indication that it does. The whole argument smells like a scramble to get some talking points out ASAP so the report doesn't get to circulate uncontested, even for just a few days.

The basic problem with medicine, across the board, is that we're routinely doing barbaric things to people, and the only justification we can have is that the evidence shows it will have a better outcome for the patient. Chemotherapy involves poisoning a patient with the hope that the poison will kill the cancer faster than it kills the patient. Amputating a limb might be a tough decision sometimes, but it is most justified if a patient would likely die if you didn't do it.

There are a few major differences between cancer/chemotherapy and dysphoria/GAC. For one, the risks of cancer are pretty well measured. A doctor can tell you "you have an X% chance of living Y months/years" and be mostly right. By contrast, a GAC doctor saying "would you rather have a happy daughter or dead son" is stoking fears that aren't justified by data at all. We are also open about the mechanism and effects of chemotherapy: every doctor will tell you it's basically poison, but the hope is it will kill your cancer before it kills you. By contrast, puberty blockers are declared to be a magical pause button, safe, and fully reversible. That's just an outright lie. We also have good data about the chances of chemotherapy working, but not for puberty blockers improving outcomes for dysphoria. Finally, even if the decision to undergo treatment is the right one based on available data, we only do it with informed consent, which we tend to not have in the case of GAC, by gender clinicians' own admission (see: WPATH Files).

You're right that there are issues in all of medicine, but we ensured there are some guardrails around it to minimize the barbarity. The guardrails were happily abolished for GAC at the insistence of trans activists, and the result is that "gender affirming care" is a lot more barbaric than other forms of medicine practiced today.

And the critics are wrong. If you randomly give a treatment to one group and do not give it to another, that's still an RCT. Or you can offer an alternative treatment to the control group. It's a plus when you can blind a patient to what they're getting, but it's not a strict necessity. In this case it's probably just as important to blind the researchers when they're assessing results as to blind the patients themselves.

You're right of course. I think the concerns are more nuanced in some areas of medicine.

I doubt it applies to trans medicine, but I have heard of cases where a treatment has such obvious positive effects for the treatment group early on that it becomes unconscionable not to provide it to the control group (mostly in cases involving terminal diseases with quick turnarounds). This would be one instance where a study initially meant to be an RCT for a terminal disease might turn into an observational study instead.

And I was clearly thinking of double-blinded RCTs being nearly impossible in some cases, which I believe is true in some areas of medicine, but I can admit that GRADE only requires RCTs, period, for evidence to be considered high quality. That said, reading through the actual GRADE handbook, it does seem like lack of blinding is considered a risk for study bias, which can drop a piece of evidence one level:

Example 3: High Risk of Bias due to lack of blinding (Downgraded by One Level)

RCTs of the effects of Intervention A on acute spinal injury measured both all-cause mortality and, based on a detailed physical examination, motor function. The outcome assessors were not blinded for any outcomes. Blinding of outcome assessors is less important for the assessment of all-cause mortality, but crucial for motor function. The quality of the evidence for the mortality outcome may not be downgraded. However, the quality may be downgraded for the motor function outcome.

I'm going to edit my original post to reflect this information, but I'll make clear what I'm adding. Basically, it appears to be the case that non-double-blinded RCTs cannot easily be high quality evidence according to GRADE.

Where did you get the idea that the decision was arbitrary?

I tried to search through the report, and they just used GRADE without really explaining why. I suppose "arbitrary" isn't quite the right word, but "unjustified within the report" is probably defensible.

And I was clearly thinking of double-blinded RCTs being nearly impossible in some cases

Ok, but what does that have to do with the Cass Review then?

I haven't even started reading the report yet, but I looked up the excerpt you quoted about only one study on puberty blockers being high quality. If you check the report, the excerpt references "Taylor et al: Puberty suppression", CTRL+Fing for that takes you to table 1 on page 53. The full title of the systematic review is Interventions to suppress puberty in adolescents experiencing gender dysphoria or incongruence: a systematic review, and the "one study" is Psychological Functioning in Transgender Adolescents Before and After Gender-Affirmative Care Compared With Cisgender General Population Peers. The "methods" section from the abstract says:

Methods: In this cross-sectional study, emotional and behavioral problems were assessed by the Youth Self-Report in a sample of 272 adolescents referred to a specialized gender identity clinic who did not yet receive any affirmative medical treatment and compared with 178 transgender adolescents receiving affirmative care consisting of puberty suppression and compared with 651 Dutch high school cisgender adolescents from the general population.

You can even look up the breakdown of its score in the supplemental material of the systematic review (it's the last entry in the table, "van der Miesen"). It only got dinged for "controls for co-interventions" and "assessment of outcome", and still got a final grade of "high quality".

So where is this idea that Cass was autistically demanding a double-blinded study, where it was not applicable, coming from?