Abstract
This article assesses the balance of research concerning women and men over the past quarter century using the crude heuristic of counting Scopus-indexed journal articles relating to women or men, as suggested by their titles or abstracts. A manual checking procedure together with a word-based heuristic was used to identify whether an article related to women or men. The heuristic includes explicit mentions of women and men, implicit mentions, and a set of gender-focused health issues and medical terminology. Based on the results, more published articles now relate to women than to men. Moreover, more than twice as many articles relate exclusively to women as exclusively to men, with the ratio increasing from 2.16 to 1 in 1996 to 2.25 to 1 in 2020. Monogender articles mostly addressed primarily female health issues (maternity, breast cancer, cervical cancer) with fewer about primarily male health issues (testicular cancer, pancreatic cancer, health needs of men who have sex with men). Some articles also explicitly addressed gender inequality, such as empowering female entrepreneurs. The findings suggest that the androcentrism of early science has eroded in terms of research topics. This apparent progress should be encouraging for women researchers and society.
1. INTRODUCTION
The modern scientific method was born in sexism during the 17th century, created by men partly with the belief that a masculine approach was necessary for the mastery of nature (Keller, 1995). Women were rendered invisible (Keller, 1982) with a focus on the importance of “great men” (e.g., Auchmuty & Rackley, 2020; Hill, 2019), despite a few prominent early female scientists (Blum, 2005), and women were considered intellectually inferior in society (Laqueur, 1990). Although the research itself was seen as objective, it embedded male biases (androcentrism) in the choice of research topics and the seriousness with which different issues were treated. This was perhaps clearest in the field of medicine. For example, a century ago a woman could be diagnosed with hysteria for failing to conform to the social role that men expected her to play, with her uterus sometimes being blamed (Bueter, 2017; Edwards, 2009; see also Kadar, 2019). Similarly, cancer was originally regarded as a primarily female disease that was sometimes attributed to promiscuity (Löwy, 2013). The androcentrism of science was widely, forcefully, and (partly) successfully challenged during the second wave of feminism (Whelehan, 1995), especially in the late 1970s and early 1980s. The extent to which androcentrism has been driven out of science is unclear, however, especially in research topic choices.
1.1. Feminist Critiques of Androcentrism in Science
Androcentrism is a key feminist concept (Whelehan & Pilcher, 2004), with androcentrism in science being called out in print from a feminist perspective by 1970 (Heide, 1970). The extensive feminist critiques of science during the second wave of feminism merged feminist thinking with the social constructivist critiques, from science studies, of science’s monolithic claims to objectivity (e.g., Kuhn, 1962). This work tackled the invisibility of women in science (Bleier, 1988) and variously addressed at least five different types of androcentrism (Harding, 1986; Keller, 1982; see also Grady, 1981): the unfair low proportion of female scientists; the choice of science topics being male dominated, including problem definitions; methods and interpretations in some areas of science being restricted by sexism or male perspectives, particularly in biology and the social sciences; science being harnessed to justify sexist social projects; and modern science having (partly consciously) developed a fundamentally masculine core that skewed how scientists worked. Feminist studies documented how androcentrism was present in science and society and how it was learned (e.g., Kelly, 1985).
Because of the many and deep problems found in science, there were disagreements within feminism about whether science itself should be discarded or reformed, and how. For example, there was not a consensus about whether scientific objectivity was tainted by the masculine ethos of science or whether (masculine) scientific objectivity itself was irredeemable (Harding, 1986) or replaceable (Haraway, 1988). The problems identified were often connected to ways in which science affects other oppressed groups differently or more (Crenshaw, 1989; Harding, 2008; Schiebinger, 2004), mirroring many other feminist arguments (Davis, 2008). They also occurred in parallel with ongoing sexism in society (Schwartz, McDermott, & Martino-Harms, 2016), including in education (e.g., Lawson, 2020).
Strong evidence has been found for all five types of androcentrism mentioned above and, since the second wave of feminism, there has been mixed evidence of progress. First, there have been large but uneven increases in the proportion of female scientists in many countries (Keller, 2004; UNESCO, 2021), supported by reducing gender inequalities in education (Fox, 2001) and initiatives to support women in science (Barr & Birke, 1998), but with continued obstacles to women (Kim & Moser, 2021). There has also been increasing recognition of female scientists (Schiebinger, 1999, 2000), but falling far short of parity (Meho, 2021).
Second, a general shift in scholarly research topics to include women seems likely because of the changing nature of academia, not only with reducing gender imbalances in researchers but also with the professionalization and adoption by universities of people-focused social and health services (Blättel-Mink & Kuhlmann, 2003; Hunt, Adamson et al., 1998). Most people-based topics have an above-average proportion of female researchers (e.g., Thelwall, Bailey et al., 2019), suggesting that their increasing availability (e.g., social sciences, health sciences) also reflects the changing gendered nature of academia. There is evidence of substantial progress in the health sciences (Harding, 1998; Moscucci, 2016; Rolin, 2004; Schiebinger, 1999, 2000), although dangerous sexist medical decision-making seems to have continued into the 21st century (e.g., Krieger, Löwy et al., 2005). Moreover, there are now also female-oriented specialties, such as women’s studies (Berger & Radeloff, 2014). There is no systematic evidence of gendered topic shifts in academia, however.
Third, there are now many examples of research progress due to bypassing masculine interpretations (Keller, 2004; Schiebinger, 1999, 2000), with guides and guidelines to avoid sexist assumptions (Stark-Adamec & Kimball, 1984) and, fourth, there seems to be much less use of science to justify sexist social practices (e.g., examples cited by Walby [2001]) and more use of science to argue for gender equality (e.g., 11,782 documents in Scopus mentioned “gender equality” in their title, abstract, or keywords in November 2021, with the first being from 1982: Patterson [1982]). Fifth, there is no evidence about changes in the difficult-to-quantify masculine core of science, with persisting perceptions of science as not feminine (Banchefsky, Westfall et al., 2016), but declining sexist language use in science (Hegarty & Buechel, 2006).
The causes of the partial successes so far are impossible to determine, but they may be primarily the outcome of the societal changes triggered by the political demands of second wave feminism and pressure from women’s groups. For example, grassroots activism such as the Women’s Health Movement in the United States from the 1960s seems likely to have impacted health-related research (Nichols, 2000). Feminist critiques of science, although influential, may not have gained the science-wide audience necessary to be the likely cause of the main changes (Keller, 1995, 2004).
1.2. The Need for Female-Focused Research
There are many reasons why there is an ongoing need for research relating to women in a range of different contexts. The most obvious topics calling for female-specific research are arguably pregnancy and childbirth. This is not only for the health of the pregnant woman but also for the health of the baby. In addition, all medicines need separate safety checks for pregnant women to ensure that they do not have side effects for the birth process (Little & Wickremsinhe, 2017). Other female-specific or clearly gender-differentiated medical topics include reproductive health (Boyce & Neiterman, 2021), breast cancer, and cervical cancer. Research on women is also needed for health-related topics that have important gender-specific dimensions, including coronary heart disease (Bueter, 2017), drug and alcohol dependency (Meyer, Isaacs et al., 2019), domestic violence (Leung, Miedema et al., 2019), and sexual behavior (Kowalewska, Gola et al., 2020). More generally, gender medicine, in the sense of investigating gender differences in the physical and social effects of treatment, is now recognized as important (Baggio, Corsini et al., 2013).
In addition to health issues, there is a need for more women-focused research about employment to better encapsulate the type of work that women disproportionately do and the additional legal, social, and other challenges that they face (Flores, Settles et al., 2021; Traylor, Ng et al., 2020). Many papers have investigated why women are underrepresented in some jobs, including in science, technology, engineering, and mathematics (STEM) (Poggesi, Mari et al., 2020), policing (Chu, 2018), computing (Trauth, 2020), public relations (Place & Vardeman-Winter, 2018), and as entrepreneurs (Banihani, 2020; Cho, Li, & Chaudhuri, 2020). Women in senior roles in various professions have also been investigated, exploring why they are underrepresented or why there is a glass ceiling (Jalalzai, 2018; Leitch, Welter, & Henry, 2018; Moyo & Perumal, 2020). These studies all suggest that there is a greater need for research relating to women in many work contexts, if equality of opportunity is accepted as a societal good.
More generally, research may need to focus on women in a given role if their experience is substantially different from that of men. The many examples of this include farming in developing nations (Ball, 2020), friendship (Khan, 2020; Martinussen, Wetherell, & Braun, 2020), and imprisonment (Covington, 1998).
1.3. Research Questions
As the above brief and limited summary suggests, although women have been historically dismissed in academia, there are many reasons why research focused on women is essential. Moreover, while androcentrism in research topic choice seems likely to have diminished since the most influential feminist critiques of science were published, it is not clear whether it is still diminishing or substantial. Although many scholars have made clear cases for the need for more research relating to women in specific contexts, this article takes a broad approach, surveying all academic fields (primarily) over the last quarter century for evidence of the gender balance of academic research in terms of the proportion relating to men or women. The following questions drive this study.
How have the proportions of published academic research (journal articles only) relating to men and women changed over time?
How have the proportions of published academic research (journal articles only) exclusively relating to men and exclusively relating to women changed over time?
What is the current (2020) gender balance between men and women as topics of academic research?
2. METHODS
The research design was to construct a heuristic to identify journal articles relating to men, women, or people, and then compare their prevalence over the past quarter century. People are included for context in the initial set of terms. For this article, men and women are people described as such in research articles, whether this refers to their biological sex or socially constructed gender, ignoring the difference between the two as a practical necessity. When age ranges are reported for men or women, they must include 18 and older (i.e., adults). Although there is a difference between the biological sex and socially constructed gender, the two align for most people. For example, only 0.6% of adults identify as transgender in the United States (Flores, Herman et al., 2018), so differentiating between cisgender and transgender from article abstracts, even if possible, would not change the results. In the accuracy checks (see below) there was one case mentioning a transgender person and of course they were included with their identified gender.
As strong evidence of trends in research before 1996 could not be identified for the reasons given below, a simple title word counting heuristic was used to give an initial estimate of the proportion of research related to women, men, nonbinary people, and all people as an initial broad check. The remaining subsections relate to the primary data sets and tests.
Nonbinary people (Fausto-Sterling, 2012) were not examined in the rest of this article because there is too little reported information about them to support a comparable analysis, and the first nonbinary research in Scopus seems to be halfway through the period examined (Corwin, 2009). By 2020, there were about 213 journal articles about nonbinary people in Scopus (as estimated by the query: SRCTYPE(j) AND DOCTYPE(ar) AND TITLE-ABS(gender AND (nonbinary OR “non-binary” OR “non binary”))), which is 2% of the 10,343 for women (as estimated by the query: SRCTYPE(j) AND DOCTYPE(ar) AND TITLE-ABS(gender AND woman)).
2.1. Scopus Journal Articles 1996–2020 with 500+ Character Abstracts
The analysis was conducted on Scopus journal articles (documents of Scopus type “article” in publications of type “journal”) from 1996 to 2020. The year 1996 was chosen as the starting point, as 1996 was a recognized journal coverage threshold for Scopus, although an initiative has been conducted by Elsevier to address this (Beatty, 2015). Thus, earlier years could have substantially different journal compositions, making any trends difficult to interpret. All articles between these years were downloaded from Scopus (Supplement A: Table S5).
Some articles have no abstract in Scopus, with the proportion decreasing substantially between 1996 and 2020 (Supplement A: Table S5), which would affect the method used here. Articles were therefore required to have a nontrivial abstract to be included (this article was initially prepared without this step and the results were substantially different and misleading). After exploring the data, a minimum length of 500 characters was chosen as a round number that seemed to be sufficient to exclude trivial abstracts, such as those that were only copyright statements. Webometric Analyst (https://lexiurl.wlv.ac.uk) was used to extract articles with abstracts having at least 500 characters for the main analysis. Keywords were not analyzed because they can contain generic terms added by indexers, including outdated sexist terms, which would skew the results.
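The abstract-length filter can be illustrated with a minimal Python sketch. It is not the Webometric Analyst procedure itself: the record layout (a dictionary with an `abstract` key), the function names, and the decision to strip surrounding whitespace before counting are all assumptions made for illustration.

```python
MIN_ABSTRACT_CHARS = 500  # threshold stated in the text


def has_nontrivial_abstract(record, min_chars=MIN_ABSTRACT_CHARS):
    """Return True if the record's abstract has at least min_chars characters.

    Assumption: records are dicts with an optional "abstract" string, and
    leading/trailing whitespace is not counted.
    """
    abstract = record.get("abstract") or ""
    return len(abstract.strip()) >= min_chars


def filter_records(records, min_chars=MIN_ABSTRACT_CHARS):
    """Keep only the records that pass the abstract-length check."""
    return [r for r in records if has_nontrivial_abstract(r, min_chars)]
```

Under these assumptions, an article whose abstract field holds only a short copyright statement, or no text at all, would be dropped before any word matching takes place.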
2.2. Articles Mentioning Combinations of Men, Women, and People
A program was written to identify articles directly mentioning women, men, or people in their titles or abstracts, as a first step to identify articles relating to women or men. These terms are justified below. The program identified the words “women,” “woman,” “men,” “man,” “people,” and “person,” counting and listing the articles matching one of the following six conditions. These conditions generate the main sets investigated and their opposites (see the next subsection for the purpose of the opposites). The people-related set is included to give context to the main results.
1. Mentions “women” or “woman”
2. Mentions “men” or “man”
3. Mentions “people” or “person”
4. Does not mention “women” or “woman” (opposite of 1)
5. Does not mention “men” or “man” (opposite of 2)
6. Does not mention “people” or “person” (opposite of 3)
The following sets were also generated to investigate exclusive mentions of women or men. People were not investigated for this set due to the difficulty in operationalizing research exclusively mentioning people.
7. Mentions (“women” or “woman”) but does not mention (“men” or “man”)
8. Mentions (“men” or “man”) but does not mention (“women” or “woman”)
9. Does not mention (“women” or “woman”) or mentions (“men” or “man”) (opposite of 7)
10. Does not mention (“men” or “man”) or mentions (“women” or “woman”) (opposite of 8)
The terms above were identified at the word level, so “men” would not match “women” and “woman” would not match “womanly,” for example.
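The ten conditions and the word-level matching rule can be sketched as follows. This is an illustrative Python sketch, not the program used in the study: the tokenization rule (lowercase runs of the letters a to z, so that possessives such as “women’s” yield the word “women”) and the function names are assumptions.

```python
import re

WOMEN_TERMS = {"women", "woman"}
MEN_TERMS = {"men", "man"}
PEOPLE_TERMS = {"people", "person"}


def words(text):
    """Return the set of lowercase words in text.

    Assumption: a word is a run of the letters a-z, so apostrophes split
    words and "women's" contributes "women" and "s".
    """
    return set(re.findall(r"[a-z]+", text.lower()))


def classify_mentions(title, abstract):
    """Map each condition number (1-10) to True/False for one article."""
    tokens = words(title + " " + abstract)
    w = bool(tokens & WOMEN_TERMS)   # mentions "women" or "woman"
    m = bool(tokens & MEN_TERMS)     # mentions "men" or "man"
    p = bool(tokens & PEOPLE_TERMS)  # mentions "people" or "person"
    return {
        1: w, 2: m, 3: p,
        4: not w, 5: not m, 6: not p,
        7: w and not m,
        8: m and not w,
        9: (not w) or m,    # logical opposite of 7
        10: (not m) or w,   # logical opposite of 8
    }
```

Because whole words are compared rather than substrings, “womanly” and “policemen” do not count as mentions, which mirrors the examples given above.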
2.3. Articles Relating to Women or Men: Random Sample Generation
An article can mention one of the six keywords searched for due to a typographic error (e.g., “man” instead of “mean”) or a different meaning of the word (e.g., “man” meaning “human,” as in, “You can’t ever reach a man if you don’t speak his language.”) It is therefore useful to assess the extent to which these words are used in the context of men or women being subjects of the article concerned, at least as described by the title or abstract. Thus, checks were performed on the main sets analyzed (1, 2, 7, 8: men, women, exclusively men, exclusively women) to assess whether the articles related to the relevant group in addition to mentioning them. The concept of “related to” is clarified below.
It is also possible for an article to relate to men, women, or people without mentioning them. For example, women might be described as ladies or referred to by name. A study might also address issues of primary concern to one gender without explicitly mentioning them. To illustrate this, it seems reasonable to argue that an analysis of prostate cancer cells would relate to men but not women in this sense. Thus, for each of the main sets, articles not mentioning the terms were examined to assess whether they nevertheless related to women or men (4, 5, 9, 10: no men, no women, not exclusively men, not exclusively women).
As there are too many articles to investigate individually, random samples were taken of each set from the first and last years (1996 and 2020) to assess the proportion that were about the relevant group. A preliminary analysis suggested that samples of 10,000 articles would be needed to guarantee confidence intervals of width less than 1% for the opposite sets, and that samples of 1,000 would be large enough to identify trends within the main sets. These sample sizes were generated with a random number generator in Webometric Analyst (Text|Copy files|Randomly select n lines).
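The preliminary sample size reasoning is not spelled out here, but it can be illustrated with a rough check. The Python sketch below assumes a 95% normal-approximation confidence interval for a proportion, which may differ from the interval formula actually used; the function name is hypothetical.

```python
import math


def normal_ci_width(p, n, z=1.96):
    """Full width of a normal-approximation confidence interval for a
    proportion p estimated from n items (z = 1.96 gives about 95%)."""
    return 2 * z * math.sqrt(p * (1 - p) / n)


# Illustrative proportions only: low values are typical of the opposite sets.
width_opposite = normal_ci_width(0.05, 10_000)   # narrower than 0.01
width_worst_case = normal_ci_width(0.5, 10_000)  # wider than 0.01
```

Under this approximation, a sample of 10,000 gives an interval narrower than one percentage point when the underlying proportion is small, as would be expected for sets that do not mention the relevant group, but not for proportions near 50%.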
2.4. Articles Relating to Women or Men: Automatic Annotation of Random Samples
The eight random samples (88,000 article titles and abstracts, half for 1996 and half for 2020) were loaded into spreadsheets (a separate spreadsheet for each group and year) and sorted into random orders (using Excel’s random number generator) for manual classification. The first 1,000 records from each set were initially read to identify characteristics that would allow judgments about whether they related to women or men (aged 18+). This was used to generate a set of rules to identify terms in titles and abstracts that would help with decision-making, in addition to the main keywords themselves (e.g., woman, men). Excel commands were then used to extract snippets of text containing these terms into a separate column, to help identify the main contexts. Here are some examples.
The terms “female” and “male” were indicative of women and men, respectively, so extracting phrases containing these would help make a judgment about whether an article related to women or men, respectively. For example, one of the snippets extracted from an abstract was, “in a 27-year-old female patient, with long-term,” indicating that the article related to women.
Mentions of men or males often pointed to the implicit inclusion of women, as in the phrase “110 patients, 57 of which were men.”
Mentions of gender-specific conditions were accepted as being about the relevant gender, so phrases including identified gender-related terms (e.g., “breast [cancer]”, and “prostate”) were singled out.
Mentions of animals suggested that articles were not about humans, despite the inclusion of “male” or “female,” so words such as “porcine,” “bovine,” “rats,” and “mice” were extracted to help identify articles relating to animal genders.
The terms (including term stems) identified in this stage were initially as follows: sheep, guinea, drosophila, rats, locust, mice, mouse, bovine, porcine, children, pediatric, paediatric, adolescent, chemotherapy, women, men, mother, father, matern, gender, male, female, pregnan, breast, menopaus, abortion, prostate, and testicular. The italic terms are exclusion words, to flag that a gendered word might be animal related (also flagging that “mother” is in “chemotherapy”). This initial list of relevant terms was subsequently expanded for greater accuracy by adding a large set of systematically identified extra terms, as described in the next subsection.
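The snippet extraction step was carried out with Excel commands; the following Python sketch shows an equivalent idea under stated assumptions. Stems are matched as case-insensitive substrings (consistent with “mother” being found inside “chemotherapy”), and the context window of 30 characters and the function name are illustrative choices rather than details from the study.

```python
import re


def extract_snippets(text, stems, window=30):
    """Return (stem, snippet) pairs for every occurrence of each stem.

    Stems are matched as case-insensitive substrings, so "male" is also
    found inside "female". Each snippet includes up to `window` characters
    of context on either side of the match.
    """
    snippets = []
    for stem in stems:
        for match in re.finditer(re.escape(stem), text, flags=re.IGNORECASE):
            start = max(0, match.start() - window)
            end = min(len(text), match.end() + window)
            snippets.append((stem, text[start:end]))
    return snippets


example = "in a 27-year-old female patient, with long-term"
found = extract_snippets(example, ["female", "pregnan", "prostate"])
```

Substring matching is deliberately permissive: it surfaces candidate contexts for a human coder to read, and false hits such as “chemotherapy” are then discounted manually or flagged with an exclusion word.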
2.5. Words Indicating That an Article Is Likely to Relate to Women or Men
This subsection describes the process used to systematically identify large sets of terms relevant to research about men and women from the data to help the manual coding process identify articles relating to women and men even if they were not explicitly mentioned. There does not seem to be a list of the main topics of research that relate to women or men, so lists of words associated with women and men were constructed from the 1996 and 2020 samples using the word association detection procedure previously used to identify gendered topics (Thelwall et al., 2019) using the free software Mozdeh (mozdeh.wlv.ac.uk). A complete explanation of the procedure is available elsewhere (Thelwall, 2021) and this section summarizes the method without the statistical justifications.
First, the articles from 1996 were split into three sets: those including the word “women,” those including the word “men,” and the rest. The terms “man” and “woman” were not used because “man” had many false matches for meanings related to “humans,” contaminating the results. A chi-squared test with familywise error correction (Benjamini & Hochberg, 1995) was then used to identify words that were statistically significantly more common in one of the two sets (p = 0.001). This was repeated for 2020, after taking a random sample of the 2020 articles of the same size as 1996 (to avoid making the 2020 analysis statistically more powerful). The corresponding gender sets from the 2 years were then combined to give an overall women-associated words list and an overall men-associated words list.
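The word association test can be sketched in Python as follows. This is a simplified illustration rather than the Mozdeh implementation: it assumes a 2×2 Pearson chi-squared test without continuity correction for each word, followed by the Benjamini-Hochberg step-up procedure (usually described as controlling the false discovery rate) at the 0.001 level. The table layout and function names are assumptions.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-squared statistic for the table [[a, b], [c, d]].

    Assumed layout: a, b = articles containing the word in set 1 and set 2;
    c, d = articles not containing the word in set 1 and set 2.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def chi2_p_value_1df(stat):
    """Upper-tail p-value of a chi-squared statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2))


def benjamini_hochberg(p_values, alpha=0.001):
    """Return a list of booleans marking the p-values rejected by the
    Benjamini-Hochberg step-up procedure at level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * alpha / m:
            cutoff_rank = rank  # largest rank meeting its threshold
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        rejected[i] = rank <= cutoff_rank
    return rejected
```

In use, one p-value would be computed per candidate word, and only the words marked as rejected would be carried forward to the gender-associated lists.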
The lists of terms statistically significantly associated with women were then split into three sets based on an examination of their use within article titles and abstracts: probably indicates female, possibly indicates female, and does not indicate female. The same split was made for the set of words for men (Supplement A: Table S6). The three-way split was useful because the terms had varying associations with gender and varying degrees of usefulness. For example, childbirth was probably female, breast was possibly female (the phrase “breast height” was used for both genders), and delivering was not useful for indicating female. The term delivering was strongly associated with women for its use related to childbirth, but was also used in many other contexts and, when used for women, seemed to be accompanied by other terms in the list, so was redundant. Many of the male-associated terms were for diseases that disproportionately impact men, such as coronary heart disease and smoking, but these are also major problems for women so could not be classed as male diseases. Other male-associated terms that were common and not indicative of gender, such as adult, sex, and left, were also excluded. For these reasons, more of the male-associated terms were ignored.
In a few cases of clear imbalance, terms were added to the lists even though not present in the original results: he, him, his, and lesbian. In many cases, the meaning of the words had to be looked up online (e.g., oocyte is an immature egg cell, hence indicating a female, but Xenopus is a frog species, so a Xenopus oocyte wouldn’t be related to women). In two cases, words were common but in one context strongly gendered, so were replaced by phrases: section replaced by “C section,” and vitro replaced by “vitro fertilisation” and “vitro fertilization.”
The hormone-related terms in the “probably” lists were not useful in practice, despite strong gender associations. They seemed to be rarely used by the dominant gender (e.g., testosterone for men) without mentioning gender and therefore did not identify many new gender classifications, but were sometimes used for the other gender investigated (e.g., “Pelvic ultrasound and hormonal studies were performed in 29 adolescent patients, aged 12 to 20 years, to evaluate menstrual irregularities. […] Serum levels of LH, LH:FSH ratio, testosterone, and androstenedione were significantly higher (p < .05) in group III”). Many of the other terms in this list were “probably” for some meanings of the words but not for others (e.g., msm = “men who have sex with men” implies men, but msm = “mainstream media” doesn’t). Almost all the probable terms could also be used for nonhumans (in contrast to woman, women, men, man), so there is no “Definitely” category and they could not be added to the list used to separate the initial sets in Scopus (“women,” “men,” “woman,” “man”).
2.6. Articles Relating to Women or Men: Manual Analysis of Random Samples
Extracts based on the terms in Supplement A, Table S6 were added to the classification spreadsheets, which were then manually classified by the first author. Because this procedure relies on words derived from a word association analysis, the concepts of “relating to women” and “relating to men” were therefore effectively operationalized as follows.
An article relates to women (female humans aged 18+) if it mentions woman, women, male(s), female(s), another gendered term (e.g., mother) or any of the probably or possibly terms (for men or women) in Supplement A Table S6 and the subject of the study includes women. The article author and other researchers do not count, unless the researcher’s name is in the article title (e.g., “Magi Sque’s contribution to nursing practice”). This includes implicit mentions, such as “30% of subjects were men” or “smoking is more common amongst men.” A human fetus was not equated with the woman whose body it formed part of, but the placenta was. These were judgment decisions that could reasonably have been made differently.
An article relates to men (male humans aged 18+) if it mentions man, men, male(s), female(s), another gendered term (e.g., uncle), or any of the probably or possibly terms (for men or women) in Supplement A Table S6 and the subject of the study includes men. The author and other researchers do not count, unless the researcher’s name is in the article title (e.g., “Tribute to Bob”). This includes implicit mentions, such as “12 of the adults were female” or “the treatment is more effective for women.”
Mentions of humans, adults or “gender” were not regarded as relating to women or men because the focus is on research that relates directly to each gender. Mentions of almost exclusively male or female conditions (e.g., breast cancer) or human body parts (e.g., testicles) were regarded as mentioning the relevant gender. During the classification, terms were looked up online if their context was unclear. For example, the phrase, “The SMAD3 (mothers against decapentaplegic homolog 3) phosphorylation (pSMAD3) was significantly enhanced, and pSMAD3 staining was colocalized with αSMA in vein walls,” was judged unrelated to women because Wikipedia revealed SMAD3 to be an ironically named protein unrelated to mothers.
The first author’s classification of the four main (1, 2, 7, 8: men, women, exclusively men, exclusively women) and four opposite (4, 5, 9, 10: no men, no women, not exclusively men, not exclusively women) sets of articles were cross-checked by the second author blind coding a random sample of 100 articles from each of them and each year (i.e., 1,600 articles in total). The second author was also not told the overall results of the first coding in advance, in addition to not knowing the codes for individual articles. For each set, a random sample of 100 articles was selected using Excel’s random number generator. Separate sets of 100 were selected for 1996 and 2020 in case the results varied over time. For the greatest testing power, the sets were balanced if possible (50% for each code based on the first author’s results); otherwise the sets were made as balanced as possible. Cohen’s kappa (Cohen, 1960) intercoder consistency scores for each of these sets were substantial or higher in all cases, at least 0.74, validating the first author’s results. The one exception, for which no kappa could be calculated, had 100% agreement (Table 1).
Set | 1996 | 2020 |
---|---|---|
Women | [100%] | 0.80 |
No women | 0.92 | 0.76 |
Men | 0.84 | 0.88 |
No men | 0.86 | 0.80 |
Women but no men | 0.76 | 0.74 |
Men or no women | 0.78 | 0.74 |
Men but no women | 0.80 | 0.90 |
Women or no men | 0.88 | 0.74 |
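For reference, the intercoder statistic reported above can be computed as in the following Python sketch. It uses the standard definition of Cohen's kappa; the function name and the choice to return `None` in the undefined case are illustrative assumptions.

```python
def cohens_kappa(codes_a, codes_b):
    """Cohen's kappa for two coders' labels of the same items.

    Returns None when chance agreement equals 1 (both coders used a single
    identical category for every item), because kappa is then undefined.
    """
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("both coders must label the same non-empty item list")
    n = len(codes_a)
    observed = sum(x == y for x, y in zip(codes_a, codes_b)) / n
    categories = set(codes_a) | set(codes_b)
    expected = sum(
        (codes_a.count(c) / n) * (codes_b.count(c) / n) for c in categories
    )
    if expected == 1.0:
        return None  # 0/0: report raw agreement instead
    return (observed - expected) / (1 - expected)
```

The `None` branch corresponds to the bracketed 100% entry: when every sampled article receives the same code from both coders, chance agreement is also 100%, so only the raw agreement can be reported.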
2.7. Statistical Corrections
Set | Sample size | 1996 | 2020 |
---|---|---|---|
Women | 1,000 | 1.000 | 0.998 |
No women | 10,000 | 0.0428 | 0.0535 |
Men | 1,000 | 0.868 | 0.950 |
No men | 10,000 | 0.0326 | 0.044 |
Women but no men | 1,000 | 0.920 | 0.870 |
Men or no women | 10,000 | 0.0227 | 0.0207 |
Men but no women | 1,000 | 0.667 | 0.653 |
Women or no men | 10,000 | 0.0111 | 0.0124 |
The other results were calculated similarly. This method works for the 2 years with manually classified random samples (1996 and 2020), and the percentages of correct and false matches for other years were estimated by linearly interpolating the proportions correct or false from these 2 years.
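The interpolation step can be illustrated with a short Python sketch using the proportions tabulated above for the “Women” and “No women” sets. The interpolation follows the description in the text; the way the two proportions are combined into a corrected count is an assumption made for illustration, and the article counts in the example are hypothetical.

```python
def interpolate_proportion(year, p_1996, p_2020, start=1996, end=2020):
    """Linearly interpolate a proportion between the two checked years."""
    fraction = (year - start) / (end - start)
    return p_1996 + fraction * (p_2020 - p_1996)


def estimate_relating(n_mentioning, n_not_mentioning, p_correct, p_missed):
    """Assumed correction: articles that mention the group and relate to it,
    plus articles that relate to the group without mentioning it."""
    return n_mentioning * p_correct + n_not_mentioning * p_missed


# Proportions from the sampled sets: "Women" and "No women".
p_correct_2008 = interpolate_proportion(2008, 1.000, 0.998)
p_missed_2008 = interpolate_proportion(2008, 0.0428, 0.0535)

# Hypothetical counts for one year, for illustration only.
estimate = estimate_relating(50_000, 1_000_000, p_correct_2008, p_missed_2008)
```

Because the two checked years anchor the line, any year-to-year fluctuation in accuracy between 1996 and 2020 is not captured by this approach.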
2.8. Content Analysis
A content analysis (Neuendorf, 2015) was applied to 500 (1%) articles mentioning women but not men and 500 (2%) articles mentioning men but not women to identify some common gendered topics for background to the main results. A sample size of 500 was chosen to identify the main themes, although it would not reveal rare topics. The 500 articles in each case were chosen at random from 2020 to give up-to-date information. The content analysis was applied inductively. The text was read first and then categories were added when a common factor was noticed in the titles and abstracts that would allow multiple related articles to be grouped together. Different schemes were used for the two sets of articles because the article types tended to be different, and a combined set of categories would be unhelpful. This analysis was conducted for an earlier version of the paper, before the short or missing abstract problem was discovered and so includes a small number with short or no abstracts (2.4% and 2.8% of the two sets). The codebooks used are in Supplement B.
The first and second authors independently coded the results with the same scheme to test for the accuracy of the coding. A Cohen’s kappa intercoder consistency check was used (Cohen, 1960), giving a score of 0.782 for women and 0.835 for men. These scores are high enough to validate the results. For example, according to one popular set of guidelines, they could be characterized as “substantial agreement” and “almost perfect agreement,” respectively (Landis & Koch, 1977).
3. RESULTS
Using the simplistic heuristic of counting words in Scopus-indexed journal article titles without correcting for false matches or changes in the journal coverage of Scopus, published academic research seems to have switched from focusing on men to focusing on women in 1986 (Figure 1), but this conclusion is likely to be untrue for multiple reasons. The data includes irrelevant matches (e.g., “man” meaning chess piece or operate), sexist language (e.g., men meaning people), research about these groups using other terminology or mentioning them outside titles, and substantial changes in the journals indexed by Scopus. Nevertheless, this graph provides at least weak support for the contention that science has changed from its early focus on men’s interests.
The remaining results relate only to Scopus-indexed journal articles with abstracts containing at least 500 characters, published 1996–2020. To avoid repetition, the phrase “mentioning women” will be used to refer to the occurrence of either “women” or “woman” (or possessives) in an article title or abstract, and similarly for “mentioning men.” The phrases “relating to women” and “relating to men” will be used as defined in Section 2.6.
3.1. Research About or Mentioning Men or Women in Titles or Abstracts
The terms “women” or “woman” increased in use from 1996 to 2003, then decreased to 2018, before increasing again, giving an overall small increase (Figure 2). These terms are more commonly used in article titles and abstracts than are the terms “men” or “man,” which overall decreased in use over the same quarter century. As a result of this, the ratio of the female to the male terms increased unevenly from 1.37 to 1 to 1.76 to 1 over this quarter century. In contrast, the terms “people” or “person” have substantially increased in use. The increases in 2020 may be due to an increase in men, women, and people research in response to the COVID-19 pandemic.
After accuracy checks on random samples of 1,000 or 10,000 articles from 1996 and 2020 and interpolating accuracy linearly between the 2 years, the percentage of articles relating to men or women was estimated within articles mentioning or not mentioning men or women (Figure 3). Overall, more articles relate to women than mention women in titles or abstracts and similarly for men. For articles mentioning women, there are statistically significant differences between years (nonoverlapping confidence intervals), but not an overall simple trend. Annual changes may be due to the inclusion or exclusion of individual people-related journals in Scopus. The same is true for men. There is also a statistically significantly higher percentage of articles relating to women but not mentioning them in 2020 compared to 1996 (the line is almost straight because it is primarily based on accuracy interpolation between 1996 and 2020). The same is true for men. The “men” or “man” line does not decrease as much as in Figure 2 because of a decrease in false matches, partly caused by a reduction in the use of sexist language (e.g., “men” for “people”).
The Figure 3 lines for each gender were added to estimate the overall proportion of Scopus-indexed articles relating to women and to men. This gives the main result for this subsection and addresses the first research question. There were statistically significant increases in both cases, with the gap widening slightly (Figure 4). The ratio of the female to the male terms decreased slightly from 1.41 to 1 to 1.39 to 1 over this quarter century, however.
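One plausible reading of this addition step is sketched below: the share of all articles relating to a gender is the sum of the share that mentions the gender and relates to it and the share that does not mention it but still relates to it. The function name and all input values are hypothetical, and the decomposition is an assumption about how the two lines combine rather than a restatement of the exact procedure.

```python
def overall_share(share_mentioning, p_related_if_mentioned, p_related_if_not_mentioned):
    """Estimated share of all articles relating to a gender.

    share_mentioning: fraction of all articles mentioning the gender terms.
    p_related_if_mentioned: fraction of mentioning articles judged to relate.
    p_related_if_not_mentioned: fraction of non-mentioning articles judged to relate.
    """
    related_and_mentioned = share_mentioning * p_related_if_mentioned
    related_not_mentioned = (1 - share_mentioning) * p_related_if_not_mentioned
    return related_and_mentioned + related_not_mentioned


# Hypothetical inputs (illustrative only, not the paper's estimates).
women = overall_share(0.05, 0.90, 0.02)
men = overall_share(0.03, 0.70, 0.025)

print(round(women, 5), round(men, 5), round(women / men, 2))
```

Because the non-mentioning set is so much larger than the mentioning set, even a small rate of relevant articles within it contributes substantially to the total, which is why the accuracy of the checks on non-mentioning samples matters.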
3.2. Research Exclusively About or Mentioning Men or Women in Titles or Abstracts
Considering exclusive uses only (i.e., women but not men, men but not women), the terms “women” or “woman” were used about twice as often in article titles and abstracts as the terms “men” or “man” from 1996 to 2020, with the gap widening (Figure 5). The ratio of the female to the male terms increased from 1.65 to 1 to 2.41 to 1 over this period.
After accuracy checks on random samples of 1,000 or 10,000 articles from 1996 and 2020 and interpolating linearly between the two, the percentage of articles relating to men or women was estimated within articles exclusively mentioning or not mentioning men or women (Figure 5). For articles exclusively mentioning women, there are statistically significant differences between years (nonoverlapping confidence intervals), and, while there is an increase from 1996 to 2002 and then a slight decline, there is no overall simple trend (Figure 6). Annual changes may be due to the inclusion or exclusion of individual people-related journals in Scopus. There is a clear declining trend for men, however.
The difference between the percentages of articles exclusively mentioning women and exclusively relating to women in the opposite set (i.e., mentioning men or man but not women or woman) in 1996 and 2020 is not statistically significant (substantially overlapping confidence intervals), so despite the clear linear trend in the corresponding line (Figure 6), there may have been no change. The same is true for men.
The Figure 6 lines for each gender can be added to estimate the overall proportion of Scopus-indexed articles exclusively relating to women or exclusively relating to men. This gives the main result for this subsection and addresses the second research question. There were no statistically significant differences over the quarter century in either case (Figure 7). More importantly, more than twice as much academic research relates exclusively to women as relates exclusively to men, with the ratio increasing slightly from 2.16 to 1 in 1996 to 2.25 to 1 in 2020.
3.3. Context of Studies Relating to Women or Men
The content analysis of 500 year-2020 article titles and abstracts mentioning women but not men found that men were also studied, despite not being mentioned, in 14.8% of cases and a further 1.2% were false matches (Table 3). For example, an abstract might report the proportion of women in a study, or factors that only applied to the women investigated. In the remaining articles, health-related factors (almost) exclusive to women accounted for well over half (61.4%) of the articles: Maternity (34.3%), Female-specific health issues (18.6%), and Breast cancer (8.6%). Some 16.2% of the articles were case reports of a woman medical patient presenting with an unusual set of symptoms. Women’s equality, parity, or empowerment was discussed in 5.0% and violence against women in 1.7%. Obesity or weight management was the focus of 1.0% of the articles. A wide range of medical and other topics accounted for the remaining articles, but no extra classes could be found that included many papers.
Context | Articles | Percentage | Valid percentage |
---|---|---|---|
Maternity/pregnancy/childbirth | 144 | 28.8 | 34.3 |
Female-specific health issues (e.g., gynaecology) | 78 | 15.6 | 18.6 |
Patient case | 68 | 13.6 | 16.2 |
Breast cancer | 36 | 7.2 | 8.6 |
Women’s equality | 21 | 4.2 | 5.0 |
Violence against women | 7 | 1.4 | 1.7 |
Weight management | 4 | 0.8 | 1.0 |
Other | 62 | 12.4 | 14.8 |
Men also studied | 74 | 14.8 | |
False match | 6 | 1.2 | |
Total | 500 | 100.0% | 100.0% |
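The valid percentages in Table 3 can be reproduced with a short calculation: each category count is divided by the number of articles remaining after the “Men also studied” and “False match” rows are excluded. The counts below are copied from Table 3; the variable names are illustrative.

```python
# Category counts from Table 3 (articles mentioning women but not men, 2020).
valid_counts = {
    "Maternity/pregnancy/childbirth": 144,
    "Female-specific health issues": 78,
    "Patient case": 68,
    "Breast cancer": 36,
    "Women's equality": 21,
    "Violence against women": 7,
    "Weight management": 4,
    "Other": 62,
}
# Rows excluded from the valid-percentage base.
excluded_counts = {"Men also studied": 74, "False match": 6}

valid_total = sum(valid_counts.values())
sample_total = valid_total + sum(excluded_counts.values())

# Percentage of the whole sample and of the valid subset, to one decimal place.
percentage = {k: round(100 * v / sample_total, 1) for k, v in valid_counts.items()}
valid_percentage = {k: round(100 * v / valid_total, 1) for k, v in valid_counts.items()}

for category in valid_counts:
    print(category, percentage[category], valid_percentage[category])
```

The same calculation applies to the men-only sample, with the sexist language row also excluded from the base.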
A corresponding content analysis of article titles and abstracts mentioning men but not women found that women were also studied, despite not being mentioned, in 20.6% of cases and a further 1.8% were false matches (Table 4). In addition, 10.6% of the ostensible mentions of men were additional false matches from sexist or noninclusive language that referred to all genders collectively as men or man, such as “since men walked the earth,” “man-made,” “man-machine interaction,” and “man-in-the-middle cyberattack.” Of the remaining articles, men’s health issues accounted for only 13.8% (including testicular and prostate cancer), whereas 44.8% were medical case reports of individual men presenting with unusual symptoms.
Context | Articles | Percentage | Valid percentage |
---|---|---|---|
Patient case | 150 | 30.0 | 44.8 |
Men who have sex with men | 30 | 6.0 | 9.0 |
Male-specific health issues (other) | 24 | 4.8 | 7.2 |
Testicular and prostate cancer | 22 | 4.4 | 6.6 |
Sport and exercise | 20 | 4.0 | 6.0 |
Masculinity, fatherhood | 12 | 2.4 | 3.6 |
Other | 77 | 15.4 | 23.0 |
Women also studied | 103 | 20.6 | |
Sexist language | 53 | 10.6 | |
False match | 9 | 1.8 | |
Total | 500 | 100.0% | 100.0% |
There were relatively many articles about men who have sex with men, usually employing this terminology but sometimes describing the similar demographics “gay men” or “sexual minority men.” The relatively high coverage of these overlapping demographics seemed to be due to the extra challenges with HIV and the need to promote safe sex practices. There were no articles about female sexual minorities in the women-only set (i.e., Table 3). Nontrivial numbers of articles discussed male sports or exercise investigations involving cohorts of men (there were only two female sport/exercise articles in the women-only set of Table 3). Presumably, the larger amount of professional male sport or a masculinity-related greater tendency for males to consider performance in exercise accounts for the apparent male association of this topic. Masculinity was also investigated in 3.6%, including a few fatherhood-related articles. There was no corresponding female-only investigation of femininity, although gender norms for weight relate to some ideals of femininity (Nagata, Domingue et al., 2020), and violence against women is an outcome of some types of masculinity, so the topics relate.
4. DISCUSSION
A limitation of this study is that the inclusion of gender information in an abstract is sometimes stylistic. This particularly applies when a sample of people is studied and the researcher includes gender information in the sample even though gender was not a variable in the research (e.g., “we recruited 25 patients, including 12 males and 10 over 60”). The results are also limited by the focus on journal articles. It seems likely, for example, that much important gender research is published in edited volumes and books. The results are also limited by the focus on Scopus. Moreover, the analysis of explicit mentions of men or women may hide other types of topic androcentrism in the sense of choosing topics from a male perspective. They may also partly reflect researchers framing studies to foreground their implications for women or women’s equality to increase their perceived importance.
The increased percentage of research mentioning the words “person” or “people” in Scopus might reflect an increase in university courses and lecturers focusing on service sector vocations. For example, the rise of profession-focused degree courses, such as tourism, hospitality, and event management, in addition to nursing training moving to universities in some countries, has presumably generated a need for academic research to support them (e.g., if lecturers need PhDs to teach Master’s courses). The extent to which this change influences women:men ratios for research objects is unclear, however. While extensive people-focused research might naturally lead to more gender-focused research as part of a tendency to increase knowledge by specialization, research topics without people also tend to have few female researchers (Thelwall et al., 2019), and may therefore need research about their lack of female professionals or researchers.
As a triangulation check of the high women:men ratios found, Scopus was searched for journals with articles indexed in 2020 and a journal name containing woman/women or man/men, for example, SRCTITLE(women OR woman) AND SRCTYPE(j) AND DOCTYPE(ar) AND PUBYEAR IS 2020. This retrieved 2,562 journal articles in 45 women-titled journals (e.g., Women and Criminal Justice) compared to 353 articles in seven men-titled journals (e.g., Journal of Men’s Health), excluding a journal with a sexist title (IEEE Transactions on Systems, Man, and Cybernetics: Systems). The women:men ratios for both journals (about 6.4:1) and articles (about 7.3:1) here greatly exceed the 2.25:1 found for articles exclusively mentioning women or men in their titles in 2020. The greater number of women-titled journals presumably reflects a reaction to the androcentrism identified by feminist critiques of science.
The increasing proportion of research focusing on women compared to men addresses one of the five types of androcentrism identified by the feminist critique of science. As there is also evidence of improvements in three of the others (see Section 1.1), this suggests that broad progress has occurred in addressing the concerns, although parity clearly has not been reached in the gender of scientists at least (UNESCO, 2021).
The relative scarcity of research focusing on men is potentially problematic given the long-term societal problems due to men that are far from being resolved, such as violence against women, men’s sexism (Schwartz et al., 2016), and the wider problems of toxic masculinity or masculinist extremism (de Boise, 2019).
5. CONCLUSIONS
The results show that journal article titles/abstracts relating to women (adults, 18+) over the past quarter century have been more common than those relating to men. This result is dependent on the operationalization of the phrase “relating to,” as described in Section 2. This may extend a trend of apparently decreasing male domination from the 1970s (Figure 1), although the pre-1996 evidence is not strong. In terms of research exclusively relating to one of the two genders analyzed here (women but not men; men but not women), research exclusively relating to women is now (2020) more than twice as prevalent as research exclusively relating to men. This suggests that academia is redressing its original androcentrism and prejudice against women in this regard, at least. It does not suggest that such prejudices have been eradicated because research areas may still have androcentric assumptions that do not manifest in abstract or title words (e.g., by selecting male-centric research objects, such as predominantly male types of work). In addition, as the feminist critique of science showed (Harding, 1986; Keller, 1982), other types of androcentrism exist that have not been investigated here.
There is not a simple way to estimate the optimal gender ratio for current research, if there is one, because of unknown overall gender differences in the need for research. These differences are demonstrated by the topics of the content analysis. Thus, this study cannot conclude that there is “enough” research relating to women compared to research relating to men and this is an issue that should be (and has sometimes been) tackled by subject specialists within their domains (e.g., Flores et al., 2021; Traylor et al., 2020). Nevertheless, it seems very likely that there has been a major shift since the early 1900s from an academic culture of treating women as invisible to a 2020s culture in which past mistakes are recognized and academia is taking health conditions that affect women seriously and also publishing a small but nontrivial amount of research into gender equality. Moreover, the current female-favoring balance seems appropriate and necessary, not only because of the complexity of maternity but also because of the importance of eradicating sexism in society (even though this accounted for only 5% of the woman-focused articles in 2020). It is therefore impossible to decide whether the current ratio of 2.25:1 favoring women for journal articles relating to one of these two genders is “about right,” too little, or too much. Nevertheless, this ratio and the general trends should be encouraging for women considering a career in academia and for society as a whole.
AUTHOR CONTRIBUTIONS
Mike Thelwall: Methodology, Writing–original draft, Writing–review & editing. Abrizah Abdullah: Methodology, Writing–review & editing. Ruth Fairclough: Writing–review & editing.
COMPETING INTERESTS
The authors have no competing interests.
FUNDING INFORMATION
This research was not funded.
DATA AVAILABILITY
The processed data used to produce the tables and graphs are available in the supplementary data files (https://doi.org/10.6084/m9.figshare.14720922 and https://doi.org/10.6084/m9.figshare.16680535). A subscription to Scopus and API access permission is required to replicate the research, with the methods described above.
REFERENCES
Author notes
Handling Editor: Ludo Waltman