Last year I wrote a piece about the BMA Council’s decision to publicly critique the Cass Review. In it, I made numerous criticisms of a preprint paper that the BMA relied upon in reaching that decision.
The BMA’s “task and finish” report was promised in January but still has not materialised. In the meantime, however, the preprint they cited approvingly has finally been published, and very little of substance has changed. It has been reformatted and reordered, some claims and citations have been updated and others toned down a little, but in essence it is broadly the same, so every criticism I made last year still applies. To summarise, though: the majority of the paper is directed at the methodology used by the University of York systematic reviews of the evidence base, seizing upon trivia to cast doubt on their results.
Since it has now been peer-reviewed, though, it is getting a second round of attention despite offering nothing new. So, rather than reiterate all the flaws I have previously drawn attention to, I want to address one of its foundational assumptions in detail.
Before proceeding, it is worth reiterating that the critique is heavily concerned with the use of the Newcastle-Ottawa Scale (NOS), claiming that it is an inappropriate tool and that the “adapted” version used by the York team invalidates the results:
The NOS has been criticised and the use of an adapted version negates previous attempts to validate the NOS.
I explained in my earlier piece why this is bunk: the NOS is very widely used, all tools are subject to criticism, replicating results with varied methods is a good thing, and the “adaptations” made to the NOS are completely standard ones, typically made so that the generic base questions make sense for the subject of the review.
For now, suffice it to say that this critique challenges not only the specific use of the NOS, but the decision to use the NOS at all. More on this later.
The real starting point of the critique is that, whatever methodology was used, the Cass Review and the University of York team proceeded from a faulty perspective.
That is, they approached the question by thinking about children and young people in distress, and investigated whether the treatments provided to them actually improved their mental health and wellbeing. This, according to the critique, is the wrong thing to do:
evaluating the efficacy of GAC based on psychosocial well-being alone is misguided
The authors believe this because, to them, the purpose of these allegedly “life-saving” interventions is not to improve mental wellbeing but to fulfil cosmetic goals, and it is this fulfilment, through realising one’s authentic self, that will in turn improve mental health:
The primary goal of GAC is to prevent or induce the appearance of certain physical characteristics, and their physiological efficacy is undisputed. Mental health benefits are a logical consequence of living authentically
This is somewhat contradictory: mental health improvements are promoted as a “logical consequence”, yet attempting to test whether this is true, by checking whether there are any mental health benefits at all, is “misguided”.
In any case, the reasoning here is that what matters is how the child or young person is approached. Providing treatment to alleviate distress is wrong, pathologising and paternalistic, and will harm mental wellbeing, while offering children unfettered access to cosmetic alterations to achieve their embodiment goals is right and guaranteed to improve wellbeing:
Given that transgender people have a care need rather than a disease and seek actualisation of their identities as opposed to a cure, this paternalistic lens is inappropriate and pathologising
By this point, the critique is no longer arguing about evidence or from any basis in science, but is making a purely ideological point. It demands that children in distress at their sexed bodies be thought of in a specific way and be given a specific kind of treatment on demand, and that any other viewpoint is wrong. Not only that, but any scientific evidence gathered from an alternative starting point is also wrong, and should be disregarded.
So, because the Cass Review did not accept the authors’ ideological perspective up front, it is suspect. The authors then cement this viewpoint by drawing an analogy with contraception:
[Gender Affirmative Care] should instead be considered through a similar lens as reproductive healthcare, akin to how healthcare providers and the public think about contraception, HRT, or fertility treatment.
This is quite fundamental to their critique: that Cass should have considered puberty blockers to be as straightforward as taking the contraceptive pill, and that, because the Review did not take this as its starting point, its findings and recommendations are flawed. The argument by analogy is that we do not pathologise young people seeking contraception, nor do we consider contraception an intervention to improve their mental health; instead, we see it as a choice relating to bodily autonomy. So, is that a valid comparison?
There is one citation for this sentence in the paper: an earlier article by one of the co-authors, Florence Ashley, in an ethics journal, which makes this argument from a rhetorical, ethical and philosophical standpoint. This is not a definitive medical consensus; rather, it is the strongly held opinion of one of the authors.
So let’s look at what that citation actually says.
The earlier paper is titled “Adolescent Medical Transition is Ethical: An Analogy with Reproductive Health” and states its premise up front, namely that transition is about attaining embodiment goals:
In this article, I argue that adolescent medical transition is ethical by analogizing it to abortion and birth control. The interventions are similar insofar as they intervene on healthy physiological states by reason of the person’s fundamental self-conception and desired life, and their effectiveness is defined by their ability to achieve patients’ embodiment goals.
This is claimed to be a legitimate comparison because the evidence of mental health benefits is similar for both types of intervention, and because there is no scientific evidence suggesting that transition is harmful:
Since the evidence of mental health benefits is comparable between adolescent medical transition, abortion, and birth control, disallowing transition-related interventions would betray an unacceptable double standard. While great enough risks can override autonomy over fundamental aspects of personal identity, I demonstrate that the available scientific evidence does not corroborate the view that adolescent medical transition is dangerous. Consequently, adolescent medical transition should be recognized as ethical and remain available.
The whole argument rests on the premise that medical transition is already proven to be safe and effective to the same degree as contraception, and thus an analogy can be drawn.
This is a problem, then, because the Cass Review went on to find that the evidence was incredibly poor. So Ashley’s argument against the Cass Review amounts to this: it found the evidence was weak because it didn’t consider transition in the same way as contraception.
Which is circular and begging the question.
So, how does Ashley arrive at the idea that the evidence is strong? By attacking previous research that found it to be weak, using the exact same circular argument against the two 2020 NICE reviews that predated the Cass Review’s findings and were prepared in support of it:
The effectiveness of adolescent medical transition was subsequently questioned in evidence reviews of puberty blockers and hormone therapy published by the National Institute for Health and Care Excellence (NICE 2020b; 2020a). The reports were prepared in support of the England-wide review of gender identity services for minors led by Dr. Hilary Cass. Both reviews looked at mental health benefits and concluded that evidence of effectiveness was of very low quality under the GRADE framework for summarizing evidence (Guyatt et al. 2008). As I argue in this paper, understanding effectiveness in terms of psychological benefits mischaracterizes the purpose of adolescent medical transition.
According to Ashley, earlier reviews that found the evidence base to be weak should be disregarded because they set out to evaluate effectiveness at alleviating distress rather than at straightforwardly facilitating “embodiment goals”.
Additionally, those earlier reviews assessed the evidence using GRADE, which by default rates evidence from randomized controlled trials as high certainty and evidence from observational studies as low. Since most of the existing literature in this area consists of nonrandomized cohort studies, Ashley takes issue with this approach as inappropriately downweighting the available evidence:
The reviews […] employed the GRADE framework for summarizing evidence
[…]
Upholding randomized controlled trials as an evidentiary norm regardless of context underestimates the value of other methodologies.
[…]
Under frameworks tailored to non-randomized studies, such as the Newcastle-Ottawa Quality Assessment Scale, evidence of mental health benefits would likely be assessed as moderate to high
So here we have Ashley saying that the earlier reviews were too harsh because they used GRADE, and claiming that an evidence review using the NOS would be more forgiving, and therefore that the evidence is, actually, strong.
We can check this, because Ashley lists several positive papers that would apparently have scored better under NOS:
(van der Miesen et al. 2020; Turban, King, et al. 2020; Carmichael et al. 2021; Moore 2018; Grannis et al. 2021)
And here is how the University of York team scored those using the NOS:
van der Miesen et al. 2020 (puberty blockers) - Rated high
Carmichael et al. 2021 (puberty blockers) - Rated moderate
Turban, King, et al. 2020 (puberty blockers) - Rated low (due to a lack of follow-up, and because it relies on participants’ self-reports of whether they even underwent treatment)
Grannis et al. 2021 (hormones) - Rated moderate
Moore 2018 is a doctoral dissertation, not a published study, so is excluded.
Apart from Turban and Moore (which were nothing like as strong as Ashley implied), three of the papers did, as predicted, score moderate to high under the NOS.
It has to be said, though, that according to a review by the Commission on Human Medicines, the York team bent over backwards to rate these studies generously:
In discussion of the CHM Core Group with Professor Hewitt from the University of York team we were informed that by usual standards the impacts identified as moderate quality evidence would usually be consistent with poor quality evidence, but were placed in this category as the overall quality was so poor they considered a need to provide some differentiation.
In any case, just as Ashley argued in 2022, the NOS is an appropriate tool for evaluating nonrandomized studies, and it did score relevant papers more generously than the previous systematic reviews that used GRADE. Despite this, though, the actual findings were themselves weak: for example, Carmichael et al. found no improvements, van der Miesen et al. found small improvements in some areas and none in others, and so on.
Unhappy that, despite the use of the NOS, the overall picture from the later studies is still one of weak and inadequate evidence, Ashley has now co-authored a paper arguing that the NOS shouldn’t have been used after all. Having not got the desired answer from the NOS, the authors insist that another tool would be better:
The ROBINS-I is an example of a more suitable tool
This was addressed in a response to the critique, published in the BMJ when the critique first appeared as a preprint, which noted that ROBINS-I is less forgiving than the NOS:
Our assessment shows that the assessment with the adapted NOS scale probably is more forgiving than the assessment with the ROBINS-I tool. Studies assessed as moderate quality would probably be of critical risk of bias due to its emphasis in confounding in non-randomized studies.
It might be apparent by this point that the authors of the critique are just throwing up any chaff they can think of, and that any approach that doesn’t end up with the desired result is going to be attacked. This is because the fundamental point of contention is not evidentiary, but philosophical.
Florence Ashley’s argument in this earlier paper is that requiring irreversible and experimental medical interventions on distressed children to show some sort of mental health benefit is a “double standard”, because that is not what they are for:
Adolescent medical transition is effective, fulfilling the physiological purpose of bringing the person’s sexual characteristics and gendered self-image into closer alignment. Like abortion and birth control, transition-related interventions do not seek to cure an illness but instead reflect autonomy over fundamental aspects of personal identity. Transition-related interventions are forms of definitional medical care. Proof of mental health benefits is not required of reproductive healthcare, another form of definitional medical care. Asking such proof for adolescent medical transition is an unacceptable double standard.
Ashley concludes that, from this standpoint, researchers should stop trying to gather any evidence of mental health benefits:
my argument suggests that studies should move beyond whether transition-related interventions confer mental health benefits and place greater focus on how to best meet individual embodiment goals.
This is not evidence-based medicine. It is a philosophical argument to conceptualise medical interventions not as actual healthcare, but as liberal, consumerist choices that cannot be denied. The customer, after all, is always right.
Ashley is, of course, perfectly entitled to make this argument. The issue here is that what is merely an opinionated rhetorical standpoint is cited in this latest critique as if it had been demonstrated to be true: as if it were some settled medical position that the Cass Review was remiss in failing to adopt, rather than an absolutely unhinged call for on-demand access to puberty blockers and cross-sex hormones.
To reiterate, this newly published critique of the Cass Review co-authored by Florence Ashley:
Condemns the University of York systematic reviews for using the Newcastle-Ottawa Scale to assess the quality of evidence for improved mental health outcomes, because improved mental health isn’t the aim; embodiment goals are.
This is based on a citation to Ashley’s own opinion in an essay which insists the evidence for safety and efficacy is good, and therefore youth transition can be considered analogous to contraception.
This claim that the evidence for safety and efficacy is good rests on arguing that the 2020 NICE systematic reviews only found the evidence to be poor because they didn’t use the Newcastle-Ottawa Scale.
It really is another case of the ol’ switcheroo, and frankly this sort of “rhetorical argument presented as fact by citation” is endemic in this field.
Florence Ashley is positing a specific conceptualisation of gender, gender identity and transition, one rooted in consumerism and justified by idealised notions of the self and personal autonomy. From Ashley’s perspective, this overrides the need for evidence; indeed, any attempt to collect evidence that doesn’t already accept this perspective is condemned. For this conceptualisation to hold, the existing evidence must be deemed sufficient already.
The most fundamental argument being made in this critique of the Cass Review is a philosophical one. So it was a fine display of hypocrisy when Ashley recently condemned the involvement of an MIT philosophy professor in a report on paediatric transition released by the US Department of Health and Human Services, stating on social media:
Yet Ashley’s work is itself a tangled web of philosophical claims, held together by self-citations, circular arguments and begged questions, that sits right at the heart of the current debate: the ethics, safety and rationale of paediatric transition. If clinicians can’t even agree why a treatment is being carried out, what it is supposed to do, and what a “good” outcome looks like, how is anyone supposed to evaluate whether it is working?
The Cass Review made the fatal mistake of diligently checking to see if a medical treatment actually had any sort of clinical justification, or led to any improvements whatsoever. As such, every piece of evidence gathered is suspect, because it starts from a perspective of pathology, of treating a condition, of alleviating distress, of minimising harm, of paternalism, of gatekeeping, of thinking doctors know best, of not believing children when they say who they are, of interfering with self-actualisation.
This perspective practically requires a philosopher to unpick it, and it is quite clear that the only evidence the authors will accept is that which agrees with and supports their philosophical starting point: the fulfilment of embodiment goals.