While causal models are gaining popularity in science studies due to their explanatory power over certain phenomena, they are often misused, creating a false sense of confidence. This letter discusses the limitations of such models through both internal and external critiques, showing some of the risks they entail.

In recent years, causal graph models, introduced by Pearl (2009), have taken an increasingly prominent position in the study of inequalities in science (Moffitt, 2005; Traag, 2021; Traag & Waltman, 2022; Woodward, 2007). This framework is regarded by some as the only way to disentangle the essential from the concomitant (Pearl & Mackenzie, 2018). By showing some limitations of this framework, I aim to warn about the problems of considering causal models as the sole option for the study of inequalities, while showing the dangers of their misuse.

By construction, within the context of inequalities in science, causal models are designed to answer a specific type of question: Is there enough evidence to prove direct discrimination? This question can only be answered within the limits of the data set in use and while controlling for all relevant confounding factors. This way of framing the question reflects an underlying worldview of the phenomena under study: it establishes a directionality in the burden of proof and a delimitation of what can be known and what falls beyond the scientific discussion. Both aspects can be considered external critiques. But first, I will examine the internal consistency of causal models for the study of inequalities in science.

As with any parametric statistical model, causal models rely on a series of assumptions. Among these, the completeness of the model is crucial (Pearl, 2009). There should be no missing variable relevant to explaining the phenomenon, or at least we should be able to expect that the omitted variables cancel each other out on average and can be treated as noise. If the model is incomplete, there is no way to measure the degree to which the conclusions of the ill-defined causal model are valid.
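To make the stakes of this assumption concrete, the following minimal sketch—simulated data only, with purely hypothetical variable names—shows how omitting a single relevant variable shifts the estimated effect away from the true one. It is an illustration of omitted-variable bias under assumed toy parameters, not an analysis of any real data set.

```python
# Minimal sketch of the completeness problem (assumption: simulated data,
# hypothetical variable names): omitting one relevant variable biases the estimate.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

resources = rng.normal(size=n)                     # unobserved structural factor
treatment = 0.8 * resources + rng.normal(size=n)   # partly driven by the unobserved factor
outcome = 1.0 * resources + rng.normal(size=n)     # true direct effect of treatment is zero

def ols(columns, y):
    """Ordinary least squares; returns the fitted coefficients (intercept first)."""
    X = np.column_stack([np.ones(len(y))] + columns)
    return np.linalg.lstsq(X, y, rcond=None)[0]

naive = ols([treatment], outcome)[1]                # omits 'resources'
adjusted = ols([treatment, resources], outcome)[1]  # includes it

print(f"estimated effect omitting the factor:  {naive:+.2f}")     # ~ +0.5, spurious
print(f"estimated effect including the factor: {adjusted:+.2f}")  # ~ +0.0, the truth
```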

The assumption of completeness is impossible to operationalize for two main reasons. The first is a technical limitation. Although we can measure, or at least conceptualize, a metric for many of the analytical categories involved in inequalities in science, a comprehensive data set that allows us to work with all of those variables at the same time does not exist, and will not exist in the foreseeable future. The second limitation is deeper, as many elements—such as prestige or habitus—are simply not directly measurable, yet remain a relevant part of structural inequalities that needs to be accounted for qualitatively and/or theoretically. The intertwined nature of inequalities in science implies that unobserved factors have a multiplying effect rather than offsetting each other. The lack of completeness affects all statistical models, not just causal analysis. Although causal graphs make assumptions more explicit than other regression methods, this limitation persists and calls for complementary approaches.

A second internal problem is that causal theory is built around causality structures that can be defined as directed acyclic graphs (Pearl & Mackenzie, 2018). Yet the seminal work on inequalities in science, the Matthew effect (Merton, 1968), already implies a cyclical mechanism in which old citations drive new citations. The cumulative nature of inequalities implies both that inequalities cannot be conceived as directed acyclic graphs and that any small—and possibly statistically undiscoverable—effect can eventually mount up to a big difference over time. In this sense, history presents a dual challenge for causal models. First, cumulative effects are associated with cyclic graphs, which cannot be properly modeled. Second, the mechanisms through which systemic inequalities take form also evolve over time, adding further complexity to the modeling process.
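As a purely illustrative sketch of this cumulative dynamic—toy parameters, simulated careers, and a deliberately simplistic feedback loop—the simulation below shows how a small recurring advantage, fed back through a citations-attract-citations mechanism, compounds into a sizeable end-of-career gap that no single year would reveal.

```python
# Illustrative sketch only (toy parameters, simulated careers): a small recurring
# advantage, compounded through a feedback loop in which existing citations attract
# new ones, grows into a large gap -- the kind of cyclic mechanism a DAG cannot encode.
import random
from statistics import mean

random.seed(1)

def career(years=40, growth=0.05, advantage=0.0, start=10.0):
    """Citations grow each year roughly in proportion to the current stock."""
    citations = start
    for _ in range(years):
        yearly_gain = citations * (growth + advantage) + random.gauss(1.0, 0.3)
        citations += max(yearly_gain, 0.0)
    return citations

baseline   = [career() for _ in range(2_000)]
advantaged = [career(advantage=0.01) for _ in range(2_000)]  # one extra point of yearly growth

print(f"baseline mean:   {mean(baseline):.0f} citations")
print(f"advantaged mean: {mean(advantaged):.0f} citations")
print(f"relative gap after 40 years: {mean(advantaged) / mean(baseline) - 1:.0%}")
```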

Alternatively, quasi-experimental designs can be defined for very specific case studies where it can be ensured that only the variables of interest are at play. This scenario, however, is limited in the external validity of its results. In this sense, it shares some common ground with case studies and qualitative analysis, which can provide valuable knowledge within their own scope, but whose conclusions should not be considered valid for the population as a whole.

One of the main harms that causal theory entails for the discussion of inequalities is that it redirects the focus of that discussion. Fairness, disparity, and bias need to be defined as independent concepts (Traag & Waltman, 2022), and in many cases the questions deemed valid focus only on bias (Cruz-Castro & Sanz-Menéndez, 2021; van den Besselaar, Mom et al., 2020). The historical forms of discrimination (by race, gender, nationality, or any other marker) change over time, and the lack of explicit discrimination does not imply an absence of systemic injustice. Social inequality is a historical process, and as Traag and Waltman (2022) acknowledge, the legacy of past discrimination is a central element in understanding current inequalities. Moreover, individual choices—in terms of fields, research topics, career path, work-life balance, etc.—are not made in a void, but can be explained as part of the system (Larregue & Bourihane, 2024). Individuals’ choices are endogenous to the system, as they are based on their past and present material possibilities, their cultural and academic capital, and the stereotypes they are subjected to (Bourdieu, 1975). Causal models can be useful for defining specific policy interventions, but if we do not acknowledge their limitations, they can be used to funnel the discussion towards the existence—or absence—of direct discrimination, which is only a limited part of the overarching problem.

A recent article published in QSS (Cruz-Castro & Sanz-Menéndez, 2023) proposed the following definition:

by gender disparity we understand a difference in the outcome of interest between male and female1 applicants; whereas by gender bias, we understand any difference between male and female applicants that is directly causally affected (and directly measured) by their gender; a gender disparity may be the result of an indirect causal pathway from someone’s gender to a particular outcome and may be affected by differences in merit, but a gender bias is a direct causal effect of the action of reviewers.

The competing explanations of gender disparity are thus direct explicit discrimination or a difference in merit between genders, as is also made clear in Figure 1, extracted from that same article. This oversimplification of systemic inequalities in science holds women responsible for their alleged lack of merit, while omitting all the different paths through which inequalities take place. Analysis of gender inequalities needs to be properly framed in theoretical work. In this sense, a starting point for any analysis of discrimination should be that people are not less capable of doing science because of their identity markers per se. Then, if causal modeling is used, the causal graph should not have a direct link between gender (or any other identity marker) and merit, because this so-called “merit” functions as an umbrella for all the hidden systemic inequalities that lead to a difference in productivity, visibility, or impact.

Figure 1. Causal design (taken from Cruz-Castro & Sanz-Menéndez, 2023).

The main problem our society faces nowadays is the cumulative nature of inequalities—within the life cycle and intergenerationally—rather than the direct discrimination an individual faces at a specific moment of their life. Therefore, the most needed work in this area is that which focuses on the systemic aspect rather than on the explicit discrimination that could be measured within a causal framework.

In a recent critical self-reflection on his field, the Nobel laureate in economics Angus Deaton considered that

The currently approved methods, randomized controlled trials, differences in differences, or regression discontinuity designs, have the effect of focusing attention on local effects, and away from potentially important but slow-acting mechanisms that operate with long and variable lags. Historians, who understand about contingency and about multiple and multidirectional causality, often do a better job than economists of identifying important mechanisms that are plausible, interesting, and worth thinking about, even if they do not meet the inferential standards of contemporary applied economics. (Deaton, 2024)

This is precisely the potential harm that causal models carry for the study of inequalities in science. The main drivers of inequality are slow-acting mechanisms of a cumulative nature that operate across generations and within each person’s life experience. Therefore, we cannot measure discrimination as a shock event of direct, blatant discrimination, which is the type of phenomenon that causal modeling is best equipped to identify. Focusing only on local effects risks missing the big picture. This is a problem shared by causal models and many other types of statistical models.

This does not mean we should completely discard causal models (or any other method), as they can bring valuable elements to the discussion. As long as we understand that these models cannot prove the existence or absence of systemic inequalities, they can help to disentangle concomitant mechanisms that are otherwise hard to study independently. At the same time, the construction of causal graphs can, in many situations, alert us to the presence of statistical artifacts or colliders, for which a control might obscure what we intend to study. In this sense, this is not a call to abandon causal models, but a call for their responsible use, acknowledging the historical processes that define current inequalities and the limitations the method entails, as shown in Traag and Waltman (2022), rather than proposing a false dichotomy between identity and individual merit, as seen in Cruz-Castro and Sanz-Menéndez (2023). As a field, it is important to recognize both the limitations and the potential of this method, in the same way that we do for qualitative, descriptive, and other inferential approaches. In this sense, it is important not to build a methodological hierarchy in which causal models are held in higher regard than other quantitative or qualitative approaches. Collective skepticism and a plurality of voices and methods are the only path through which we can move forward in the discussion of inequalities in science, and ultimately in the struggle against them.
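To illustrate the collider problem mentioned above, here is a minimal sketch (simulated data, hypothetical variable names) of how conditioning on a variable that is caused by two independent quantities—here, a selection step driven by both—manufactures an association that does not exist in the full population.

```python
# Minimal sketch of collider bias (assumption: simulated data, hypothetical names):
# two independent quantities both drive selection; conditioning on being selected
# induces a spurious negative association between them.
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

talent = rng.normal(size=n)
luck = rng.normal(size=n)              # independent of talent by construction
selected = (talent + luck) > 1.5       # the collider: selection depends on both

corr_all = np.corrcoef(talent, luck)[0, 1]
corr_sel = np.corrcoef(talent[selected], luck[selected])[0, 1]

print(f"correlation in the full population:    {corr_all:+.2f}")  # about zero
print(f"correlation within the selected group: {corr_sel:+.2f}")  # clearly negative
```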

I want to thank Yotam Sofer, Carolina Pradier and Natsumi S. Shokida for their valuable comments and suggestions. This work was possible thanks to the support of the SSHRC.

The author has no competing interests.

This project was funded by the Social Science and Humanities Research Council of Canada Pan-Canadian Knowledge Access Initiative Grant (Grant 1007-2023-0001), and the Fonds de recherche du Québec-Société et Culture through the Programme d’appui aux Chaires UNESCO (Grant 338828).

1. While causal analysis is often presented as a more rigorous approach to the discussion, the misuse of terminology (using male and female instead of men and women to refer to gender) is also telling of the authors’ lack of conceptual rigor when discussing gender inequality.

Bourdieu, P. (1975). The specificity of the scientific field and the social conditions of the progress of reason. Social Science Information, 14(6), 19–47.
Cruz-Castro, L., & Sanz-Menéndez, L. (2021). Blogpost 5: Gender disparities in research funding, bias and equality policies: The need for new evidence. Grant Allocation Disparities, June 29. https://www.granted-project.eu/blogpost-5-gender-disparities-in-research-funding-bias-and-equality-policies-the-need-for-new-evidence/
Cruz-Castro, L., & Sanz-Menéndez, L. (2023). Gender bias in funding evaluation: A randomized experiment. Quantitative Science Studies, 4(3), 594–621.
Deaton, A. (2024). Rethinking economics or rethinking my economics. International Monetary Fund. https://www.imf.org/en/Publications/fandd/issues/2024/03/Symposium-Rethinking-Economics-Angus-Deaton
Larregue, J., & Bourihane, H. (2024). The gendered structure of science does not transpire in an experimental vacuum. Quantitative Science Studies, 5(1), 261–263.
Merton, R. K. (1968). The Matthew effect in science: The reward and communication systems of science are considered. Science, 159(3810), 56–63.
Moffitt, R. (2005). Remarks on the analysis of causal relationships in population research. Demography, 42(1), 91–108.
Pearl, J. (2009). Causality. Cambridge: Cambridge University Press.
Pearl, J., & Mackenzie, D. (2018). The book of why: The new science of cause and effect. New York, NY: Basic Books.
Traag, V. A. (2021). Inferring the causal effect of journals on citations. Quantitative Science Studies, 2(2), 496–504.
Traag, V. A., & Waltman, L. (2022). Causal foundations of bias, disparity and fairness. arXiv.
van den Besselaar, P., Mom, C., Cruz-Castro, L., & Sanz-Menéndez, L. (2020). Identifying gender bias and its causes and effects. GRANteD Project. https://www.granted-project.eu/wp-content/uploads/2020/12/D-2.1_publication.pdf
Woodward, J. (2007). Causal models in the social sciences. In S. P. Turner & M. W. Risjord (Eds.), Philosophy of anthropology and sociology (pp. 157–210). North Holland.

Author notes

Handling Editor: Gemma Derrick

This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. For a full description of the license, please visit https://creativecommons.org/licenses/by/4.0/legalcode.