Young male and female scientists: A quantitative exploratory study of the changing demographics of the global scientific workforce

Abstract
In this study, the global scientific workforce is explored through large-scale, generational, cross-sectional, and longitudinal approaches. We examine 4.3 million nonoccasional scientists from 38 OECD countries publishing in 1990–2021. Our interest is in the changing distribution of young male and female scientists over time across 16 science, technology, engineering, mathematics, and medicine (STEMM) disciplines. We unpack the details of the changing scientific workforce using age groups. Some disciplines are already numerically dominated by women, and the change is fast in some and slow in others. In one-third of disciplines, women already outnumber men among the youngest scientists. Across all disciplines combined, the majority of women are young women. More than half of female scientists (55.02%) are located in medicine. The usefulness of global bibliometric data sources in analyzing the scientific workforce along gender, age, discipline, and time is tested. Traditional aggregated data about scientists in general hide a nuanced picture of the changing gender dynamics within and across disciplines and age groups. The limitations of bibliometric data sets are explored, and global studies are compared with national-level studies. The methodological choices and their implications are shown, and new opportunities for how to study scientists globally are discussed.


Introduction
We explore the changing demographics of the global scientific workforce from the combined perspectives of age, gender, academic discipline, and time. Our approach is large scale, generational, and both cross-sectional and longitudinal. With this approach, we examine 4.3 million nonoccasional scientists (defined as scientists with an output of at least three Scopus-indexed articles) publishing
In a longitudinal study, gender differences in publishing career lengths and dropout rates were studied in Huang et al. (2020); they used a career length matching design to study the relationship between career length and total productivity (412,778 female authors were matched with 412,778 male authors). A large proportion of the observed gender gaps were rooted in gender-specific dropout rates and subsequent gender gaps in publishing career length and total productivity (Huang et al., 2020, p. 4615).
The authors reconstructed the complete publication histories of all gender-identified authors from Web of Science whose publishing careers ended between 1955 and 2010. Their focus was on careerwise gender differences in productivity and impact. The gender gap was found to be increasing over time and persistent. Each year, women scientists had a 19.5% higher risk of leaving academia than male scientists, which amounts to a major cumulative advantage for male authors over time (Huang et al., 2020, p. 4613).
In their longitudinal research design, Boekhout et al. (2021) examined publication productivity for men and women who started their careers as publishing researchers in 2000, 2005, and 2010, using both full counting and fractional counting approaches. They showed an increasing trend in the percentage of women starting their careers as publishing researchers, from 33% in 2000 to 40% in 2015. Instead of considering entire publication careers (as in Huang et al., 2020), the authors compared the productivity of male and female scientists in specific years in their careers, showing that male scientists have a consistently higher publication productivity than female scientists, regardless of the year in which they started their career and the period in their career, with differences in the range of 20-35% (full counting) and 25-40% (fractional counting) in favor of male productivity (Boekhout et al., 2021, p. 9; see Kwiek, 2016 and Kwiek, 2018 on research top performers).
Finally, the gender gap in self-citations across time and disciplines was examined in King et al. (2017), with men in the past few decades self-citing 70% more than women. Women were also more than 10 percentage points more likely not to cite their previous work at all. The authors linked self-citations to larger themes of inequality in science and cumulative advantage in science careers (because self-citations increase citations). They reported the gender self-citation gap to be stable over the past 50 years. Compared with men, women have been overrepresented in the zero self-citation category and under-represented in terms of citing their papers (King et al., 2017, p. 8; for a general overview of gender differences in science, see Halevi, 2019 and Sugimoto & Larivière, 2023).
Other examples of recent influential large-scale studies of academic careers and global publishing, collaboration, and impact patterns include Robinson-Garcia et al. (2020), who studied gender differences in archetype career tasks; Larivière et al. (2013), who examined global gender disparities in science; Nielsen and Andersen (2021), who studied the global citation elite; and Ioannidis et al. (2014), who focused on the continuously publishing core in global science. Robinson-Garcia et al. (2020) examined 71,000 publications from PLoS journals with 350,000 distinct authors to profile scientists across three task specializations and the changes in their career stages. They used four career stages (junior, early career, mid-career, and late career, based on the years passed since first publication) and studied three archetype tasks: leader, specialized, and supporting. Scientists were reported to be unevenly distributed by gender in each archetype, with men being more likely to be leaders and women more likely to represent the specialized archetype in early career stages, which is the most important for later academic promotions (Robinson-Garcia et al., 2020, p. 12).
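The difference between the two counting approaches mentioned above can be illustrated with a minimal sketch. It is not the procedure of Boekhout et al. (2021); the input format (one author list per paper) and the function name are assumptions made for illustration. Under full counting each author receives one credit per paper; under fractional counting each paper's single credit is divided equally among its authors.

```python
from collections import defaultdict

def count_publications(papers):
    """Return (full, fractional) publication counts per author.

    `papers` is a list of author lists, one per paper (hypothetical format).
    """
    full = defaultdict(int)
    fractional = defaultdict(float)
    for authors in papers:
        unique = set(authors)          # guard against duplicated names
        for author in unique:
            full[author] += 1                      # one credit per paper
            fractional[author] += 1 / len(unique)  # equal share of one credit
    return dict(full), dict(fractional)

papers = [["A", "B"], ["A"], ["A", "B", "C", "D"]]
full, fractional = count_publications(papers)
```

In this toy input, author A has three papers under full counting but 1.75 under fractional counting, which is why productivity gaps can differ between the two approaches when team sizes differ.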
The authors constructed publication histories and grouped publications by career stages, using the minimum threshold of five publications, academic age based on the first publication, and the 90% accuracy threshold in assigning gender to individual scientists.
Global gender disparities in science were also studied in Larivière et al. (2013). The authors used 5.5 million papers and 27.3 million authorships to show that, globally, women account for fewer than 30% of fractionalized authorships and are similarly under-represented regarding first authorships. Female collaborations tend to be more domestically oriented than collaborations of males from the same country, and when a woman is in a prominent author position (sole authorship, first authorship, or last authorship), a paper attracts fewer citations than when a man is in one of these roles (Larivière et al., 2013, p. 213; see also Kwiek, 2020). Based on a dataset of 4 million authors and 26 million papers, Nielsen and Andersen (2021) studied the rise in global citation inequality, with a small stratum of elite scientists accruing increasing citation shares. They examined the temporal trends in the concentration of citations at the author level, focusing on differences in the degree of concentration across fields, countries, and institutions. They found that the top 1% most cited scientists ("the citation elite") increased their cumulative citation share from 14% to 21% between 2000 and 2015 without increasing their general productivity level (in fractional counts) or their impact per paper. The authors in the citation elite increasingly reside in Western Europe and Australasia, with a decreasing share of top-cited scientists in the United States (Nielsen & Andersen, 2021, p. 4). Nielsen and Andersen (2021) and Ioannidis et al. (2014), in contrast to Larivière et al. (2013), did not disaggregate their results by gender or by academic age. However, Nielsen and Andersen (2021) noted that citation-elite membership is strongly correlated with age and suggested future research within and across age cohorts.
Finally, in their study of the "continuously publishing core" of the global scientific workforce, which was based on 15.2 million publishing scientists from 1996 to 2011, Ioannidis et al. (2014) showed that less than 1% of scientists (about 150,000) published their research each year in the studied 16-year period, accounting for as much as 87.1% of papers with more than 1000 citations. The authors examined what they termed "uninterrupted continuous presence" (UCP) in the Scopus-indexed literature, analyzing who maintains their presence each and every year for many years, which is another dimension of the "elite" or "core" status in science. The proportion of scientists with a UCP is very limited, but they account for the lion's share of researchers with a high citation impact.
As in the case of our present research, the authors used Scopus author identifiers rather than attempting to disambiguate authors on their own. The UCP-birth and UCP-death years of an author were the calendar years that start and end their chain of uninterrupted, continuous, and annual publications (Ioannidis et al., 2014, p. 2). The 1% of scientists was found to be a very influential core of science, with much higher citation metrics than other researchers. Although the global scientific workforce is enormous, its continuously publishing core is very limited, with many departments or institutions having none or very few researchers who belong to this group (Ioannidis et al., 2014, p. 9). The analysis did not consider variables such as gender or academic age and did not disaggregate the data by country, gender, or career stage, the assumption being that a UCP by definition refers to older age cohorts and higher seniority levels.
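The idea of a chain of uninterrupted annual publications can be sketched in a few lines. This is an illustration rather than the code of Ioannidis et al. (2014): the function names are hypothetical, and choosing the longest chain when an author has several is an assumption made here.

```python
def longest_ucp(years):
    """Return (birth_year, death_year) of the longest run of consecutive
    publication years, or None if there are no publications."""
    ys = sorted(set(years))
    if not ys:
        return None
    best = (ys[0], ys[0])
    start = prev = ys[0]
    for y in ys[1:]:
        if y != prev + 1:      # a gap year breaks the chain
            start = y
        prev = y
        if prev - start > best[1] - best[0]:
            best = (start, prev)
    return best

def has_full_presence(years, first=1996, last=2011):
    """True if the author published in every year of the studied period."""
    return set(range(first, last + 1)) <= set(years)

chain = longest_ucp([1996, 1997, 1999, 2000, 2001, 2002, 2005])
```

Here the author's longest chain starts in 1999 and ends in 2002; the same author would not belong to the continuously publishing core for 1996-2011 because of the gap years.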

Academic Careers: Examples from the USA
Large-scale, national-level studies of academic careers in the USA have also been increasingly precise in terms of gender, discipline, and age determination. For instance, Way et al. (2017) examined the traditional "rapid rise, gradual decline" narrative about productivity patterns, showing that this pattern holds for only 20% of individual faculty (while for the remaining 80%, there is a rich diversity of patterns). Using a DBLP dataset of 200,000 publications and career trajectories of 2,453 tenure-track faculty from computer science departments and their CV data, the authors showed how much diversity is hidden behind average academic career trajectories, creating inaccurate pictures of productivity patterns. The authors examined the productivity trajectories of individual researchers in an entire field of research and showed that 60 years of research on aggregate trends needs a revision in view of the conclusions derived from studies based on much larger and more comprehensive datasets.
Although academic experience was heavily used, gender differences were not studied. (Similarly, using the Academic Analytics commercial database, Savage and Olejniczak (2021) showed that the career publication activity of US scientists does not follow the traditional "peak-and-decline" pattern described in earlier studies.) Finally, using a combination of data sources such as Academic Analytics, Web of Science, and the NSF Survey of Graduate Students and Postdoctorates in Science and Engineering, Zhang et al. (2022) showed that the disproportionate productivity of scientists in US elite institutions can be largely explained by their substantial labor advantage: their better access to externally funded graduate and postdoctoral labor.
They used a matched pair design in which one midcareer researcher in the pair moved to a working environment with more available labor, while the other moved to an environment with less available labor (n=778 faculty), with detailed productivity data for 78,000 faculty across 25 scientific disciplines. The association of institutional prestige with greater productivity was explained by greater available funded labor, which drove larger group sizes, thereby increasing group productivity (Zhang et al., 2022, p. 6).
The productivity dominance of researchers at elite institutions was found not to result from inherent characteristics (such as differences in talent); rather, it can be explained by the greater labor resources provided to them in more prestigious environments. The authors showed the pivotal role of funded labor and external research funds in explaining the dominance of elite institutions but did not distinguish between the academic careers of men and women scientists.

Academic Careers: Cross-National Survey-Based Studies
Additionally, recent changes in academic careers have been widely documented in a separate line of research: the literature generated by cross-national comparative survey designs. Large-scale comparative studies have included books on the United States (Cummings & Finkelstein, 2012) and Japan (Arimoto et al., 2015), as well as Europe (Kwiek, 2019), with a focus on academic work (Fumasoli et al., 2015); recruiting and managing the academic profession (Teichler & Cummings, 2015); internationalization of teaching and research (Huang et al., 2014); the relevance or impact of research (Cummings & Teichler, 2015); and the various faces of internationalization of the academic profession (Calikoglu et al., 2023).
Cross-national comparative studies from this line of research (summarized in Carvalho, 2017) have provided excellent complementary sources to studies of bibliometric datasets: they are relatively small scale, with national datasets usually in the range of 1,000 to 4,000 observations, and they focus on issues not obtainable through bibliometric data (such as personal opinions, perceptions, and feelings; family life and motherhood in academia; university governance and management; and job satisfaction).
Although young men and women are often examined in survey-focused literature under the label of juniors (contrasted with seniors), the number of cases is usually too limited to analyze the differences by disciplines, and research designs only allow for cross-sectional analyses.

Academic Careers: Statistical Reports
There have also been several reports on women in science over the past few years, with different geographical focus (see, e.g., NSF, 2023 on the USA; EC, 2021 on the European Union; and globally, Elsevier, 2020, Elsevier, 2017, and Elsevier, 2015). Specifically, the Elsevier reports on gender differences in research provide statistics and analyses on similar topics to ours.
There are, however, important differences between our approach and those in the reports in the study design, research focus, methodology, and results.
Most importantly, our focus is on specifically defined (via age groups based on academic age or academic experience) young male and female scientists and their changing participation across STEMM (traditional STEM disciplines plus medicine) and over time in 38 OECD countries; we have used a combination of horizontal (men compared with women in the same age group and across time) and vertical (men and women separately disaggregated into age groups and compared across time) approaches; and our unit of analysis is the individual scientist with specific characteristics derived from large-scale bibliometric datasets, especially nonoccasional status in science, which requires meeting the threshold of having at least three research articles published.
The first report is a cross-national comparative study of European countries. She Figures 2021 (EC, 2021), which covered 44 countries, used the data extracted from Eurostat statistics on education, research and development, professional earnings, and human resources in science and technology, and the Scopus database. The report discussed the labor market participation of researchers, working conditions of researchers, career advancement and participation in decision-making, and research and innovation output, all of which are outside the scope of our paper. Interestingly, the report provided examples of actions taken to promote gender balance in science across different countries (e.g., EC, 2021, pp. 183-185). It provided the data on women among doctoral graduates across broad fields of study, including the STEM fields, with a general conclusion that women remained underrepresented in most STEM fields, with little or no progress since 2015 (EC, 2021, p. 39). The report discussed the changes in the Glass Ceiling Index (GCI), a relative index comparing the proportion of women in academia to the proportion of women in top academic positions for 2015-2018, with the GCI decreasing in most countries studied (EC, 2021, pp. 192-194).
The report provided an analysis of the gender gap among active authors, who were defined as those who produced 10 or more papers over the past 20 years (2000-2019) and at least one paper in the past five years or those who produced four or more papers in the past five years. The report used three seniority levels estimated via the time elapsed since an author's first publication in Scopus (early-stage, middle-stage, and senior authors) and the ratios of women to men among active authors by broad fields and countries.
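The two-branch definition of an active author described above can be written as a simple predicate. This is a minimal sketch, not the report's own code: it assumes that the 20-year window is 2000-2019, that "the past five years" means 2015-2019, and that the input is a list with one publication year per paper; the function name is hypothetical.

```python
def is_active_author(pub_years, end_year=2019):
    """Apply the two-branch 'active author' rule to a list of publication
    years (one entry per paper). Window boundaries are assumptions."""
    last20 = [y for y in pub_years if end_year - 19 <= y <= end_year]
    last5 = [y for y in pub_years if end_year - 4 <= y <= end_year]
    long_record = len(last20) >= 10 and len(last5) >= 1   # first branch
    recent_record = len(last5) >= 4                       # second branch
    return long_record or recent_record

established = is_active_author([2001] * 10 + [2016])   # long record, still publishing
dormant = is_active_author([2001] * 10)                # long record, nothing recent
newcomer = is_active_author([2016, 2017, 2018, 2019])  # short but recent record
```

The second branch is what allows early-stage authors with short publication histories to count as active.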
The major takeaway from the report is that, among early-stage authors, the gender gap was smaller, but as seniority level increased, the gender gap widened to twice as many men as women authors.
Women were the least represented in the natural sciences and engineering and technology and most represented in medical and health sciences and agricultural and veterinary sciences (EC, 2021, p. 218).
The second report is a single-nation study of the USA. The NSF report (NSF, 2023) analyzed women, minorities, and persons with disabilities in the STEM workforce in the USA, specifically using gender, race and ethnicity, and degree levels. The report made no reference to academic publications, career stage, or academic age combined with gender. It used the notion of the STEM workforce as defined in labor force statistics, which included workers in science and engineering (S&E) and S&E-related and middle-skill occupations. In 2021, nearly a quarter (24%) of individuals in the US workforce were employed in STEM occupations (NSF, 2023, p. 8).
The three reports from Elsevier came the closest to our study in terms of their methodologies because they used the Scopus dataset and individual Scopus identifiers to define individual scientists. The single-nation report on Germany (Elsevier, 2015) paved the way for the report on 12 geographies (Elsevier, 2017) and a more comprehensive report on gender in research in 16 geographies (Elsevier, 2020). The first report linked Germany's relatively low share of female researchers among European countries to its research focus on physical sciences and mathematics, which are traditionally male-dominated fields. Female researchers in Germany were reported to be concentrated in medicine (and social sciences, which are not discussed in our paper).
Consistent with the findings from other studies, the share of women was lower among senior researchers than among junior researchers, with a "leaky" pipeline in science careers: a higher proportion of women than men moved out of the world of science while moving up the academic career ladder (Elsevier, 2015, p. 9). In the report, the research productivity and citation impact of men and women per year by seniority level were compared. Across the three seniority categories, male scientists had higher productivity compared with female scientists; however, gender gaps in citation impact were visible mainly for junior and middle-senior levels and almost disappeared for senior levels (Elsevier, 2015, pp. 12-16).
Another Elsevier report (Elsevier, 2017) provided detailed methodological and data sources appendices, with major procedures explained and definitions provided. Specifically, name and gender disambiguation for researchers was described, as were the concepts of "active researchers," "authors," "inflow," "outflow," "migratory," "transitory," and "nonmigratory" researchers (Elsevier, 2017, pp. 84-87). In the report, the proportion of men and women among researchers in 12 comparator countries and regions in the two time periods (1996-2000 and 2011-2015) was analyzed by subject areas for each gender and comparator.
The key findings were that the proportion of women among researchers has increased in all comparators and that women tend to specialize in biomedical fields and men in physical sciences (Elsevier, 2017, p. 19). The report did not combine career stages with gender and, in particular, did not analyze the participation of young female scientists in science.
Finally, the most detailed report on men and women in science today was again by Elsevier (2020). In its comprehensive approach, it provided the gender-disaggregated data on science participation, publishing career and mobility, and collaboration networks across 15 countries and the EU-28 and used the Scopus database. The analyses used four broad subject clusters (physical sciences, health sciences, life sciences, and social sciences) and 27 major subject areas (e.g., mathematics, medicine, biochemistry). The major procedures were described in detail in the appendices: author definition and disambiguation (Scopus Author Profiles), active authors, author country and subject area assignation, country selection, author gender inference, author publication history, author mobility, and author collaboration network analysis (Elsevier, 2020, pp. 119-133). "Active authors" were defined as those who authored at least two publications in the study periods (1999-2003 and 2014-2018).
The report showed that men are more highly represented among authors with a long publication history and women among those with a short publication history (Elsevier, 2020, p. 37). In terms of publication output, on average, women published less than men in a five-year period in every country assessed, regardless of authorship position (Elsevier, 2020, pp. 37-38), and in terms of citation practice, the average Field-Weighted Citation Impact of men was higher than that of women (Elsevier, 2020, p. 41).
In the report, the concept of "academic age" was not used, and researchers' academic stages were not defined accordingly. However, four groups based on the length of publication history were used. The report did not focus on the participation of young women (and men) in science across disciplines and over time, but it provided excellent methods to study academic careers using bibliometric data sources.
Our paper examines what we can know, based on available global bibliometric data sources, about the changing demographics of the scientific workforce globally and over time. We wanted to explore how useful the potential global data sources can be in analyzing the scientific workforce along the combined four dimensions of gender, age, discipline, and time. We tested how demographic transformations of the global science profession can be measured using new data sources, hence going beyond the traditional approach in which national statistics from national statistical offices are aggregated, as in the OECD, UNESCO, and the European Union scientific workforce datasets.
In the present research, we contribute to the discussion of the advantages and disadvantages of using global publication and citation databases (or "structured" Big Data; Holmes, 2017; Salganik, 2018; Selwyn, 2019) in global academic profession studies in which the data on gender, age, and disciplines have traditionally been available almost exclusively cross-sectionally (single points in time), mostly on a small national scale (through case studies) and increasingly on a small international comparative scale through cross-national survey research of the academic profession. We unpack the details of the changing scientific workforce using ten 5-year age groups within each discipline from a longitudinal perspective.

Women in STEM: Theoretical Background
The global picture of young men and women in science is a general overview of their representation across disciplines around the world. This global picture shows patterns and trends over time and across disciplines. The representation varies widely at the national level because of social, economic, political, and cultural factors. There are countries with stronger policies and initiatives in place to encourage women to pursue STEM education, with a larger pool of women graduates entering doctoral programs and the academic profession; and there are countries where cultural and societal attitudes may discourage women from pursuing careers in science.
As a result, while variations by country can be huge, our interest is in global cross-disciplinary differences changing over time. Targeted interventions and policies to address the underrepresentation of women in some disciplines, which results from both low entering shares and high exiting shares for young and older women alike, need to be developed at a national level.
By examining the national picture, we can obtain a more nuanced understanding of the representation of women in science, leading to more effective strategies at the level of disciplines. In the present research, we do not consider career breaks, which may be more common among women because of caregiving responsibilities; and we do not consider the broader context of gender and work-family balance.
Young scientists, young female scientists in particular, face unique challenges and barriers to entering, continuing, and advancing in science careers. Apart from the underrepresentation of women in science, there are implicit biases (stereotyping and discrimination against women in STEM); unwelcoming or hostile workplace cultures, especially in male-dominated disciplines; and challenges related to work-life balance and motherhood responsibilities, possibly leading to career interruptions and slower career progression. As the Elsevier (2020) report showed, women continue to face significant challenges at every stage of their careers: they are under-represented in senior positions, less likely to collaborate internationally, more likely to experience career breaks, less likely than men to publish articles in high-impact journals, and have articles that are cited less frequently, on average (see Sugimoto & Larivière, 2023; Tang & Horta, 2023; Dusdal & Powell, 2021; Kwiek & Roszka, 2021b; Kwiek & Roszka, 2022a).
Although both men and women leave science in some proportions, the attrition for women in STEM is higher. Major theories about women leaving science are the "leaky pipeline" theory, the "chilly climate" hypothesis, and the "self-selection" hypothesis. Leaky pipeline theory suggests that there is a significant loss of talent at every stage of the academic career pipeline, from female graduates to female postdocs to female assistant professors and to female tenured professors, because of systemic barriers such as bias and discrimination (see, e.g., Sexton et al., 2012; Shaw & Stanton, 2012; Sheltzer & Smith, 2014; Wolfinger et al., 2008). Chilly climate theory suggests that a hostile or unwelcoming work environment in STEM disciplines can discourage women from pursuing careers (see, e.g., Cornelius et al., 1988; Hall & Sandler, 1982; Maranto & Griffin, 2011; Morris & Daniel, 2008). Self-selection theory suggests that women are under-represented in STEM disciplines because they are less interested in these disciplines as a result of societal and cultural factors that discourage them (see, e.g., Britton, 2017; Hyde et al., 1990; Whitt et al., 1999).
Finally, the glass ceiling metaphor is used to describe gender inequality in science from a different angle: an invisible barrier that prevents women from advancing to higher levels of leadership and power within organizations, including universities. Systemic barriers leave women unable to reach the opportunities and rewards above them. This invisible barrier limits professional recognition, with few women becoming full professors (see, e.g., Morrison et al., 1987; Tang, 1997).

Research Questions
We focus on individual scientists (with their unique identity) as the unit of analysis, rather than publications. Although a bibliometric data source is used (Scopus raw data provided to us by Elsevier's International Center for the Studies of Research (ICSR) Lab through a multiyear collaboration agreement), our focus is on scientists and their attributes rather than publications and their properties. Our micro-data show gender, academic age or academic experience, discipline, country, and publications and their types (lifetime); we turn bibliometric data sources on publications into data sources on individuals.
Our three research questions regarding publishing and nonoccasional STEMM scientists are as follows: (1) What is the global disciplinary distribution of young male and female scientists? (2) How do the global gender and age distributions of scientists across disciplines change over time, especially for young male and female scientists? (3) How is the participation in science of female scientists changing over time and across disciplines, and what are the disciplinary gender participation trends?

Data
The major characteristics of the longitudinal study population for 1990-2021 (4,314,666 scientists, including 1,645,860, or 38.15%, female) are presented in Table 1. The major characteristics of the cross-sectional study subpopulation for 2021 (1,502,792 scientists, including 579,399, or 38.55%, female) are presented in Table 2. Our population was constructed as follows (we refer to the population rather than the sample because we have all scientists, with their attributes, as units of analysis): First, to determine the number of scientists, unique authors of publications (type: journal article, conference paper in a book, or a journal) who published their works in 1990-2021 were selected. For this selected group of authors, the years of their research activities were determined. The resulting set of scientists was then narrowed down according to a package of five restrictions: (1) an OECD country, (2) a STEMM discipline, (3) gender (binary approach: man or woman), (4) a nonoccasional status in science: a minimum scientific output defined as three publications throughout the scientist's career (lifetime), and (5) academic age, or the time passed since the first publication, here in the 1-50 years range.
The minimum output in lifetime publication history allowed us to limit our population to nonoccasional scientists, that is, scientists functioning in the scientific community more than accidentally. Additionally, scientists with one or two publications in the Scopus database are more likely to result from mistakes made by author name disambiguation algorithms (see Boekhout et al., 2021, p. 3). Generally, in terms of author name disambiguation, Scopus is more accurate than Web of Science (Sugimoto & Larivière, 2018, p. 36). Then, for each scientist, academic experience in full years, beginning in the year of the first publication of any type, was determined. For each year of a scientist's research activities, the length of their academic experience and membership in the corresponding academic age group were determined. We used a population for 1990-2021 for longitudinal analyses, a subpopulation for 2021 for a cross-sectional analysis, and the two subpopulations for 2000 and 2021 for analyses comparing two points in time. Figure 1 summarizes the population's design.
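The five restrictions can be read as a single filter applied to each author record. The sketch below is illustrative only and is not the ICSR Lab pipeline: the record layout, the country and discipline labels, and the tiny example sets are hypothetical, and it assumes that academic age equals 1 in the year of the first publication.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative subsets only; the study uses 38 OECD countries
# and 16 STEMM disciplines.
OECD_SUBSET = {"USA", "POL", "JPN"}
STEMM_SUBSET = {"MED", "PHYS", "MATH"}  # hypothetical discipline labels

@dataclass
class Author:
    author_id: str
    country: str
    discipline: Optional[str]
    gender: Optional[str]   # "man", "woman", or None if undetermined
    n_publications: int     # lifetime articles and conference papers
    first_pub_year: int

def in_population(a, year, countries=OECD_SUBSET, disciplines=STEMM_SUBSET):
    # Assumption: the year of the first publication is academic age 1.
    academic_age = year - a.first_pub_year + 1
    return (
        a.country in countries                 # (1) OECD country
        and a.discipline in disciplines        # (2) STEMM discipline
        and a.gender in ("man", "woman")       # (3) binary gender assigned
        and a.n_publications >= 3              # (4) nonoccasional status
        and 1 <= academic_age <= 50            # (5) academic age range
    )

authors = [
    Author("a1", "POL", "MED", "woman", 12, 2010),
    Author("a2", "USA", "PHYS", "man", 2, 2015),    # fails restriction (4)
    Author("a3", "JPN", "MATH", None, 40, 1995),    # fails restriction (3)
    Author("a4", "USA", "MATH", "man", 90, 1965),   # fails (5) in 2021
]
population_2021 = [a.author_id for a in authors if in_population(a, 2021)]
```

Because restriction (5) depends on the reference year, the same author can be inside the population in one year and outside it in another, as with the last record in the example.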
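As a rough illustration of the per-year assignment described above, the following sketch maps each year of activity to an academic age and to one of ten 5-year age groups covering ages 1-50. The function names and group labels are hypothetical, and counting the year of the first publication as academic age 1 is an assumption.

```python
def academic_age(year, first_pub_year):
    # Assumption: the year of the first publication counts as age 1.
    return year - first_pub_year + 1

def age_group(age, width=5, max_age=50):
    """Return a label such as '1-5' or '46-50', or None if out of range."""
    if not 1 <= age <= max_age:
        return None
    low = ((age - 1) // width) * width + 1
    return f"{low}-{low + width - 1}"

def yearly_groups(first_pub_year, active_years):
    """Map each year of research activity to its academic age group."""
    return {
        year: age_group(academic_age(year, first_pub_year))
        for year in active_years
    }

trajectory = yearly_groups(2010, [2010, 2014, 2015, 2021])
```

A scientist therefore moves between age groups over time, which is what allows the same individual to be counted among the youngest scientists in 2000 and in an older group in 2021.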

Methods
In this section, we present the five basic procedures to unambiguously define the attributes of the scientists in our population. We initially used raw data for 2020 and before, here based on the Scopus database version dated August 18, 2021. The raw data were made available to us by Elsevier under an agreement with the ICSR Lab. Finally, the Scopus database version for 2021 and before, dated October 21, 2022, was used.
To obtain the results at the aggregate level, the operation in the ICSR Lab relied on the use of the Databricks environment, which allowed for managing and executing cloud computing with Amazon EC2 services. The scripts to generate the results were written using the PySparkSQL library. The work on obtaining the results proceeded in two steps. The first step was to work on 1% of the Scopus database data with the snapshot dated August 18, 2021 (from ICSR Lab: 1% of the data volume based on a set of 20,000 publications between 2010 and 2018 and including all publications cited by and citing these publications). The cluster ran in standard mode with Databricks Runtime version 11.2 (Apache Spark 3.3.0, Scala 2.12); the worker type was an i3.2xlarge instance with 61 GB memory and eight cores (one to four workers), and the driver type was an i3.xlarge instance with 30.5 GB memory and four cores. Test runs of the scripts covered 1% of the data, with the goal of optimizing the time and cost of the performed calculations.

Gender determination
To obtain the gender of the scientists in the population, the gender data established by the ICSR Lab platform were first used (N author = 34,596,581). Then, only scientists who had a defined gender (man/woman) with a gender probability score greater than or equal to 0.85 were included (N author = 21,508,029). To assign gender to an author, the ICSR Lab used Elsevier's solution, which used the Namsor tool. Gender determination was based on three characteristics: the author's first name, the author's last name, and the author's first country. The author's first country was the author's dominant country in their first publication year, based on output in the Scopus database. For authors who had more than one dominant country, no value was assigned. The Namsor tool returned the gender and a gender probability score (Elsevier, 2020, pp. 122-123).
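The inclusion rule described above can be sketched in a few lines of plain Python. This is only an illustration of the threshold logic, not the PySparkSQL scripts used in the ICSR Lab; the record layout (`AuthorGender`) and the sample rows are hypothetical.

```python
from typing import Iterable, List, NamedTuple, Optional


class AuthorGender(NamedTuple):
    """Hypothetical record layout; field names are assumptions for this sketch."""
    author_id: str
    gender: Optional[str]   # "man", "woman", or None when no value was assigned
    probability: float      # gender probability score in [0, 1]


MIN_PROBABILITY = 0.85  # threshold stated in the text


def keep_confident(authors: Iterable[AuthorGender]) -> List[AuthorGender]:
    """Keep authors with a defined gender and a score of at least 0.85."""
    return [
        a for a in authors
        if a.gender in ("man", "woman") and a.probability >= MIN_PROBABILITY
    ]


# Made-up example rows (not data from the study).
sample = [
    AuthorGender("a1", "woman", 0.97),
    AuthorGender("a2", "man", 0.85),    # exactly at the threshold: kept
    AuthorGender("a3", "woman", 0.60),  # below the threshold: dropped
    AuthorGender("a4", None, 0.00),     # no gender assigned: dropped
]
print([a.author_id for a in keep_confident(sample)])
```

The comparison is inclusive (greater than or equal to 0.85), matching the wording of the inclusion criterion.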

Discipline determination
To obtain the dominant discipline of scientists in the population, a set of publications from the Scopus database was used (N pub = 85,585,123; N author = 43,632,099). Publications were from 2021 and before and were restricted by source and type of publication: (1) journal article and (2) conference paper in a book or journal (N pub = 60,987,987; N author = 36,379,221). From the table of publications, the columns with publications' identifiers, authors' identifiers, and cited references were selected. Each cited reference (N citedreference = 1,434,621,669) was accompanied by its discipline, as assigned by the discipline of the journal in which it appeared. The disciplines assigned to a cited reference were based on the four-digit ASJC code used by the Scopus database. To switch to a two-digit classification, unique disciplines were selected based on the first two digits of the four-digit value. Then, for each author, the number of cited references was counted for all disciplines referenced by the author, excluding the "multidisciplinary" discipline. For each author, the discipline with the highest number of cited references (modal value) was selected. A table containing the author's identifier and their dominant discipline was obtained. In this summary, an author could have several dominant disciplines or no discipline at all. Authors who had more than one dominant discipline or no discipline were removed from the table (included N author = 26,706,031; removed N author = 9,673,190). Authors were removed, among other reasons, because the cited references from their articles may have referred to journals outside the Scopus database or because there was an equal number of cited references to different disciplines. Subsequently, the table was restricted to only authors with an assigned discipline from the STEMM group, resulting in the final number (N author = 24,425,447).
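The modal-discipline rule can be illustrated with a short sketch. It assumes that each cited reference arrives as a four-digit ASJC code string and that the two-digit prefix "10" denotes the multidisciplinary category; both are assumptions of this example, and the function name is hypothetical. It is not the PySparkSQL implementation used in the study.

```python
from collections import Counter
from typing import Iterable, Optional

# Assumption: two-digit prefix of the ASJC "multidisciplinary" category.
MULTIDISCIPLINARY = "10"


def dominant_discipline(cited_asjc_codes: Iterable[str]) -> Optional[str]:
    """Return the modal two-digit discipline of an author's cited references.

    Returns None when the author has no usable cited references or when two
    or more disciplines tie for the highest count (such authors were removed).
    """
    counts = Counter(
        code[:2] for code in cited_asjc_codes if code[:2] != MULTIDISCIPLINARY
    )
    if not counts:
        return None
    ranked = counts.most_common(2)
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None  # tie between the two most frequent disciplines
    return ranked[0][0]


# Illustrative code strings only; they are not taken from the study's data.
print(dominant_discipline(["2700", "2705", "1300", "1000"]))  # "27"
print(dominant_discipline(["2700", "1300"]))                  # None (tie)
```

Returning `None` for both the tie and the empty case mirrors the text, where authors with several dominant disciplines or none were dropped together.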

Determining the country of affiliation
Publications were from 2021 or earlier and were restricted by source and type: (1) journal article and (2) conference paper in a book or journal. From the table of publications, columns with publications' identifiers, authors' identifiers, and countries for each author of the publication were selected. Then, for each author, the number of times each country was indicated across all their publications was counted. For each author, the country with the highest number of occurrences (modal value) was selected. In this summary, an author could have several modal countries or no country at all. Authors who had more than one such country or no country were removed from the table (included N author = 31,332,750; removed N author = 5,046,471). The table was then filtered to include scientists from 38 OECD countries. The final number was N author = 19,296,388.

Determining scientists' nonoccasional status
Under the proposed definition, a nonoccasional scientist has at least three research articles (as defined above) in their output. The publications were from 2021 or before and were limited by the same source and type of publication as above. Columns containing publications' identifiers and authors' identifiers were selected from the table of publications. For each author, the number of publications was counted. The table was then filtered to include scientists who had a minimum of three publications (N author = 12,057,755).
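As a minimal sketch of this counting step, assuming the publication table has been reduced to (publication identifier, author identifier) pairs, the filter could look as follows. The function name and the sample pairs are hypothetical.

```python
from collections import Counter
from typing import Iterable, Set, Tuple

MIN_PUBLICATIONS = 3  # threshold from the definition of a nonoccasional scientist


def nonoccasional_authors(authorships: Iterable[Tuple[str, str]]) -> Set[str]:
    """Return authors with at least three distinct publications.

    authorships holds (publication_id, author_id) pairs; duplicate pairs are
    collapsed so that a publication is counted once per author.
    """
    counts = Counter(author for _, author in set(authorships))
    return {author for author, n in counts.items() if n >= MIN_PUBLICATIONS}


# Made-up pairs (not data from the study).
pairs = [
    ("p1", "a1"), ("p2", "a1"), ("p3", "a1"),   # a1: three publications
    ("p1", "a2"), ("p2", "a2"),                 # a2: two publications
    ("p4", "a3"), ("p4", "a3"), ("p5", "a3"),   # a3: duplicate row, two publications
]
print(nonoccasional_authors(pairs))  # {'a1'}
```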

Determining academic age
Finally, to obtain the academic age of the scientists in the population, the same set of publications from the Scopus database was used, and the publications were from 2021 or before. Author identifiers and year of publication were selected from the table. For each author, the year of the first and last publication (of any type) was determined. Then, the number of years of authors' research activities (distance from the first to last publication in years) was calculated according to the following formula: year of the last publication - year of the first publication + 1. Authors who had more than 50 years of research activities were removed from the table (included N author = 43,568,252; removed N author = 63,847). Then, for the authors included in the study (N author = 4,314,666; i.e., the final population), for whom the years of academic activity had been defined from publications, the academic age in a given publication year was determined according to the following formula: publication year - year of first publication + 1. Based on the value of academic age, an author was assigned to one of 10 age groups: 5 and less, 6-10, 11-15, 16-20, 21-25, 26-30, 31-35, 36-40, 41-45, and 46-50.
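The two formulas and the assignment to age groups can be written out as a short sketch. The function names are hypothetical; the formulas and the 10 group labels follow the description above.

```python
def career_length(first_year: int, last_year: int) -> int:
    """Years of research activity: last publication year - first + 1."""
    return last_year - first_year + 1


def academic_age(publication_year: int, first_year: int) -> int:
    """Academic age in a given year: publication year - first year + 1."""
    return publication_year - first_year + 1


def age_group(age: int) -> str:
    """Map an academic age (1-50) to one of the ten 5-year age groups."""
    if not 1 <= age <= 50:
        raise ValueError("academic age must be between 1 and 50")
    if age <= 5:
        return "5 and less"
    lower = ((age - 1) // 5) * 5 + 1  # 6, 11, 16, ..., 46
    return f"{lower}-{lower + 4}"


# A scientist who first published in 2012 has academic age 10 in 2021.
print(academic_age(2021, 2012), age_group(academic_age(2021, 2012)))  # 10 6-10
```

Because of the "+ 1" in both formulas, a scientist is already in year 1 of their academic career in the year of their first publication.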

List of STEMM disciplines
We focused on all 16 STEMM disciplines (science, technology, engineering, mathematics, and medicine), as defined by the journal classification system used in the Scopus database (

Results
To study the gender distribution of the scientific workforce by age group, we used two complementary approaches we termed "horizontal" and "vertical."
(1) A horizontal approach: Analyzing the gender distribution of scientists horizontally within the same age groups. For each discipline, for each of the ten 5-year age groups, the percentages of male and female scientists totaled 100%.
(2) A vertical approach: Analyzing the gender distribution of scientists vertically, separately for male and for female scientists, across all age groups. For each discipline, male scientists (100%) and female scientists (100%) were each distributed differently across the 10 age groups.
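The difference between the two approaches is only the group within which percentages are normalized. The sketch below makes this explicit for one hypothetical discipline; the helper name and the counts are invented for illustration and are not data from the study.

```python
from collections import defaultdict
from typing import Dict, Tuple


def shares_within(counts: Dict[Tuple[str, str], int], axis: int) -> Dict[Tuple[str, str], float]:
    """Return percentages that total 100 within each value of key[axis].

    counts maps (age_group, gender) to a number of scientists.
    axis=0 normalizes within each age group (horizontal approach);
    axis=1 normalizes within each gender (vertical approach).
    """
    totals = defaultdict(int)
    for key, n in counts.items():
        totals[key[axis]] += n
    return {key: 100.0 * n / totals[key[axis]] for key, n in counts.items()}


# Made-up counts for one hypothetical discipline with two age groups.
toy_counts = {
    ("5 and less", "female"): 30, ("5 and less", "male"): 20,
    ("6-10", "female"): 10, ("6-10", "male"): 40,
}
horizontal = shares_within(toy_counts, axis=0)
vertical = shares_within(toy_counts, axis=1)
print(horizontal[("5 and less", "female")])  # 60.0: women among the youngest group
print(vertical[("5 and less", "female")])    # 75.0: youngest among all women
```

The same count (30 women in the youngest group) yields 60% horizontally and 75% vertically, which is why the two views answer different questions.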
Parts of our study have been based on a longitudinal research design in a broader sense, which requires a short methodological explanation. In longitudinal studies in a narrow sense, data are collected at multiple points in time from the same group of participants; we used this narrow approach in a recent study of 2,326 Polish full professors, tracing their promotions, publications, and productivity classes over a period of 40 years (Kwiek & Roszka, 2023b). In classical definitions, longitudinal research concerns data collection and analysis over time, and it is a broad term that describes a family of methods: specifically, longitudinal research includes repeated cross-sectional studies, prospective studies, and retrospective studies (Menard, 2002, pp. 2-3). As a minimum, any longitudinal design would permit the measurement of differences or a change in a variable from one period to another. In this broader sense, longitudinal research is research in which (a) data are collected for each item or variable for two or more distinct time periods; (b) the subjects or cases analyzed are the same (or at least comparable) from one period to the next; and (c) the analysis involves some comparison of data between or among periods (Menard, 2002, p. 2).
Our study represents both a cross-sectional design (in its analyses of a single point in time, 2021) and a longitudinal design in a broader sense, in its repeated cross-sectional design variation (analyzing two points in time, 2000 and 2021, and the trend in 1990-2021, following the idea that cross-sectional data are repeated over time with a high level of consistency between questions; Ruspini, 1999). Our sets of cases, scientists with their micro-data, for each period are not entirely different: to some extent, they overlap (for scientists active for a longer period of time). Our micro-data are at the individual level of scientists, which means that their individual-level records contain the same variables measured at several different time points. For this reason, they were pooled to form a single data file: this increased the sample size and also introduced a temporal dimension, as suggested in the literature (Ruspini, 1999, p. 222).
This section is divided into subsections on general results (3.1), results from horizontal (3.2) and vertical (3.3) perspectives, which include both cross-sectional and longitudinal data, and the results of the trend analysis for the 1990-2021 period based on a longitudinal dataset (3.4).

General Results
The analysis of the changing numbers of male and female scientists over time may be distorted by the inability to distinguish between an expansion in the number of scientists and an expansion in the number of journals indexed in large bibliometric datasets; the changing relative presence of female scientists, in contrast, is traceable. Although the increasing number of publishing scientists over time correlated with the increasing coverage in Scopus, the percentages of publishing male and female scientists were independent of the journal coverage. Consequently, although the changing number of publishing scientists was not a reliable measure of women's changing participation in global science, the percentages of male and female scientists adequately reflected the changes in the global academic workforce.
In 2021, 45.98% of the global scientific workforce in STEMM (as defined in this research, especially in the 38 OECD countries and with a nonoccasional publishing status in Scopus) was involved in medical research, with 690,958 scientists in medicine, followed by biochemistry, genetics, and molecular biology (213,039). In 2021, there were 1.5 million scientists, as defined in our population, with 923,000 men and 579,000 women (38.55%). The majority of female scientists (63.09%) were concentrated in six countries: the USA, Italy, the UK, Germany, France, and Spain. Over 70% of female scientists were in medicine and biochemistry, genetics, and molecular biology. Immunology and microbiology had the highest share of female scientists (50.03%), followed by several fields with over 40% female representation (e.g., AGRI, ENVIR, BIO, NEURO, PHARM, and MED). In contrast, engineering, physics and astronomy, computer science, and mathematics had 20% (or less) female representation.

A Cross-Sectional View (2021): All age groups horizontally
Disciplines at a single point in time (2021) were populated by scientists of different age groups and genders. Figure 3 shows the percentage of female scientists across disciplines by age group. We generally observed the results of a huge inflow of female scientists (who were present in 2021) to most disciplines in the past years and decades: for younger generations working in 2021, the percentages of female scientists were substantially higher than for older generations.
Generally expecting ever more female scientists across all STEMM disciplines moving down the age groups, we assessed ongoing changes based on a snapshot (2021), especially examining the youngest age groups. MED and BIO showed a structure in which, for every successively younger age group in 2021, a higher share of female scientists was observed. In contrast, PHYS, COMP, ENG, and MATH, termed the Big Four in this paper, are traditionally male-dominated disciplines comprising about 262,000 scientists in our population (15.09%; including merely 30,649 women); they showed a stable structure in which, for every successively younger age group in 2021, a similar (or only slightly higher) share of female scientists was observed. These two contrasting demographic patterns reflect different inflows of young female scientists to disciplines in the past: huge and increasing versus small and stable. The contrast is visible when MATH and BIO are compared for the most recent year available: for MATH, the share of women among very young, young, and middle-aged scientists is almost the same; for BIO, the share of women increases continually with every successively younger age group.
The current global disciplinary distribution of young women in science is consequential for gender parity in science in the future, despite the high attrition rate among young scientists generally and young female scientists in particular (1 in 10; see Boothby et al., 2022). The current young cohorts will be middle-aged cohorts within a decade, and the current oldest cohorts will disappear from the publishing enterprise, exiting from academic work, with new challenges for disciplines that remain heavily male dominated. Traditional gender-aggregated and age-aggregated data about scientists in general across disciplines, countries, and institutions hide a much more nuanced picture of the changing gender dynamics within and across disciplines and age groups. In this research, we examined the subpopulation of "young" scientists (academic age of 10 years or less; Figure 4).

A comparative horizontal view (2000 vs. 2021)
Comparing women's participation in STEMM disciplines from the perspective of two snapshots, 2000 and 2021 (Figure 5), shows that the share of female scientists increased in all disciplines, albeit to different degrees. The white lines show the shares of female scientists for the year 2000, while the dark blue bars on the right show the shares for 2021. For the youngest age group, for all disciplines combined, the share of female scientists increased from one-third to half (from 34.93% to 50.16%), meaning that the share of male scientists decreased from two-thirds to half (from 65.07% to 49.84%). In the older age group of 31-35 years, the share of female scientists roughly tripled, from 8.12% to 23.98%. From the perspective of two decades, the changes are noticeable across all disciplines, although in most cases they can be described as small scale.

The decreasing isolation of female scientists in the Big Four math-intensive disciplines
We compared the share of young and old female and male scientists across disciplines to show gender differences. Among the young scientists, the share of female scientists in several disciplines was about half (e.g., BIO and MED), while among old scientists, the share was much lower (Table 4). In some disciplines, the share of old female scientists was about 10% or lower, with gender differences being at least 10-fold (e.g., ENG and PHYS: 6.31% and 9.21%, respectively).
In many institutions, old female scientists were not merely minorities: they were tokens (or single, exemplary scientists representing all female scientists; see Kanter, 1977; on the role of micro-level departmental climates, see Fox & Nikivincze, 2021). However, in COMP, ENG, MATH, and PHYS, the share of women among young scientists was at least twice the share among old scientists, so that young female scientists were less isolated and more visible. Younger age groups had more female scientists, in both numbers and percentages, across all disciplines, including male-dominated ones (ENG, MATH, PHYS) and those closer to gender parity (MED, AGRI, BIO). For all disciplines except six (AGRI, BIO, IMMU, MED, NEURO, and PHARM), there were more of the youngest male scientists than youngest female scientists, and many more old male scientists than old female scientists. Table 4 shows a general increase in the percentage of female scientists among the younger age groups (10 years or less of experience) compared with the older age groups (31-50 years of experience) across all disciplines. This suggests a growing trend in women's participation in science. In particular, among scientists from different age groups working at the same time (2021) in the Big Four, the isolation of young female scientists was much lower than the isolation of old female scientists. In 2021, for these four disciplines, the percentage of female scientists in the younger generations was at least twice that in older generations: for instance, in engineering, young female scientists made up 16.47% of the total, compared with only 6.31% for the older cohorts (Table 5). In engineering, for instance, in the 36-40 years age group, there were only 84 (nonoccasional, publishing, etc.) female scientists compared with 1,486 male scientists. This shows a stark contrast in representation, with men outnumbering women by more than 17 times. However, in the youngest (5 years and less) age group, the gap has narrowed considerably, with 2,316 female engineers and 10,739 male engineers. In this case, the number of male scientists is only about 4.6 times higher than that of female scientists. This example illustrates that the isolation of female scientists in engineering has decreased significantly in younger generations. In physics and astronomy (PHYS), in the 46-50 years age group, there were only 79 female compared with 1,489 male scientists (a nearly 19-fold difference). In the youngest (5 years and less) age group, however, there were 3,817 female physicists and 13,998 male physicists (only about a 3.7-fold difference). The academic worlds of young scientists in the Big Four today and 20-30 years ago are strikingly different: the old scientists of today were the young scientists of decades ago, and they survived in heavily male-dominated environments.
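The male-to-female ratios quoted in this paragraph follow directly from the head counts given in the text. The short check below reproduces them; the function and dictionary names are hypothetical, and only counts stated in the paragraph are used.

```python
def male_to_female_ratio(men: int, women: int) -> float:
    """How many male scientists there are per female scientist."""
    return men / women


# (men, women) head counts as stated in the text for 2021.
groups = {
    "ENG, 36-40 years": (1486, 84),
    "ENG, 5 years and less": (10739, 2316),
    "PHYS, 46-50 years": (1489, 79),
    "PHYS, 5 years and less": (13998, 3817),
}

for label, (men, women) in groups.items():
    ratio = male_to_female_ratio(men, women)
    share = 100 * women / (men + women)
    print(f"{label}: {ratio:.1f} men per woman, {share:.1f}% women")
```

Ratios (men per woman) and shares (women as a percentage of all scientists in the group) describe the same counts on different scales, so a 17.7-fold ratio corresponds to a female share of about 5%.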

A cross-sectional view (2021): All age groups vertically
Examining the gender composition within disciplines, we found that, in the majority of disciplines (nine), most female scientists were in the two young age groups, that is, with no more than 10 years of academic experience (Figure 6). Young female scientists were the majority (> 50%) of all female scientists in disciplines such as CHEM, ENG, and MED. Thus, the inflow of (publishing nonoccasional) female scientists in the past decade or so in these disciplines has been massive. The lowest share of young female scientists among all female scientists, or the weakest inflow (< 40%), was for COMP and MATH. In all disciplines combined (Total), the share of young female scientists among all female scientists reached 51.54%, whereas the share of young male scientists among all male scientists was considerably lower, at 39.82%. The emergent picture supports narratives of an increasing number of young women in science: of all the women currently present in global science, more than half had no more than 10 years of publishing experience (see the details in Figure 7).
In this section, we discuss the change in age pyramids (distributions) over two decades by comparing the age pyramids in 2021 and 2000. Longitudinal research measures the differences or changes in a variable between distinct periods. The 2021 age pyramids (light blue) are superimposed over the 2000 pyramids (dark blue). An age pyramid is made up of a pair of bar graphs, one for men and one for women, turned on their sides and joined, where the vertical axis corresponds to age. For each of the 10 age groups in our population, the bar coming off the axis to the right represents the share of women in that group, and the bar to the left represents the share of men (see Wachter, 2014, pp. 218-221). The two age pyramids cover different populations (there are incoming and outgoing scientists in each case); however, some of the cohorts of scientists were found to be common. The included scientists were publishing between 1970 and 2021 (for 2021 data) and 1940 and 1990.
In Figure 8, we present the percentages of male and female scientists among nonoccasional publishing authors at two points in time, disregarding the number of authors. Using the same sampling principles, this approach allows us to compare demographics at two points in time and focus on young (and old) scientists. Figure 8 displays the snapshots of 2021 and 2000 by age groups and gender, showing the distribution of male and female scientists by age group in each discipline and illustrating the dynamics of change. Although Section 3.4 uses trend analysis to demonstrate the change in female scientist percentages by discipline, this section adds age to the analysis. In general terms, each discipline exhibits a pyramid-like demographic structure, where biological age is replaced with academic or professional age. For each discipline, the age pyramid narrows at the top and expands at the bottom, to varying degrees. The bottom represents the percentage of young scientists among all scientists, while the top signifies the percentage of older scientists. A wider bottom indicates a higher percentage of young scientists.
A common pattern emerged across all disciplines in 2021: the age pyramid's bottom (first age group, 5 years and less) was narrower than two decades earlier for both male and female scientists. The share of young female scientists among all female scientists decreased significantly, compared with smaller decreases for young male scientists (see Figure 9). This decrease could also indicate that young female scientists who entered academia two decades ago remained in the system in 2021, increasing their shares in older age groups. The shrinking bottom for female scientists in 2021 compared with 2000 is also visible for all disciplines combined (Total). In terms of age structures in demographics (Rowland, 2014, pp. 98-107), the 2000 age structures can be classified as "very young" and the 2021 structures as "young" or "mature." In contrast, when comparing the shares of older male and female scientists in 2000 and 2021 within disciplines (Figure 10), the pattern is clear: the shares of both genders in the four older age groups were much higher in 2021 than in 2000. There was a higher percentage of older scientists in 2021 than in 2000 in each older category for each discipline, without exceptions, highlighting the graying of the scientific workforce.

Results: Trends 1990-2021, Female Scientists by Disciplines
In this section, we analyzed the changing participation of women in science over time to test the claim that the inflow of female scientists into science over the past three decades was powerfully differentiated by discipline.
The number of individual scientists used here to examine the trend over time was 4.3 million (61.85% male and 38.15% female, Table 1). We studied the trend of the percentage of female scientists present in global science in 1990-2021. Our analysis used a linear trend in the form of y = at + b, where y is the percentage of female scientists and t is the period (year) index. In the equation, a denotes the slope of the line and b is where the line intersects the y axis. The slope describes how steep a line is and can be positive or negative: a indicates the average change from year to year, and b is the intercept indicating the level of the phenomenon in the zero period (preceding the first year of analysis).
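A linear trend of this form can be estimated by ordinary least squares. The sketch below is an illustration under stated assumptions, not the estimation code used in the study: the period index is assumed to run from t = 1 (1990) to t = 32 (2021), so that b refers to the zero period, and the series is synthetic rather than taken from our data.

```python
from typing import Sequence, Tuple


def fit_linear_trend(t: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of y = a*t + b; returns (a, b)."""
    n = len(t)
    mean_t = sum(t) / n
    mean_y = sum(y) / n
    sxx = sum((ti - mean_t) ** 2 for ti in t)
    sxy = sum((ti - mean_t) * (yi - mean_y) for ti, yi in zip(t, y))
    a = sxy / sxx
    b = mean_y - a * mean_t
    return a, b


# Synthetic series (not study data): a share that starts from 22.0 in the
# zero period and grows by exactly 0.5 percentage points per year.
periods = list(range(1, 33))                  # t = 1 is 1990, t = 32 is 2021
shares = [22.0 + 0.5 * t for t in periods]

a, b = fit_linear_trend(periods, shares)
print(round(a, 6), round(b, 6))               # slope and intercept
print(round(1 / a, 2))                        # years per one percentage point
```

The reciprocal of the slope gives the number of years needed for a one percentage point increase, which is how a slope of about 0.33 corresponds to roughly 3 years per percentage point.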
In some disciplines, women's participation in science was high with strong growth (MED and PHARM) or high with weak growth (BIO); in others, participation was low with strong growth (AGRI, CHEMENG). The Big Four, the cluster of four math-intensive disciplines (COMP, ENG, MATH, and PHYS), had low participation and weak growth. For all disciplines combined (Total), the increase was substantial, from 22.16% to 38.55%. The percentage of female scientists has been rising yearly in all disciplines, though at varying rates. MATH, COMP, PHYS, and ENG had the lowest increase, with slopes equal to or smaller than 0.33 (Table 6). All slopes were significantly positive, indicating an upward trend in female scientists' percentages across disciplines. The confidence intervals of slopes revealed specific groups' average growth rates per year. The time needed for a one percentage point increase in the percentage of female scientists differed by discipline. The fastest growth occurred for ENVIR (1.24 years), AGRI (1.37), and MED (1.41). Nine disciplines took slightly longer (1.64-2.39 years), while the Big Four of MATH, COMP, PHYS, and ENG took the longest, with 3.03 to 3.69 years (Table 7). Hypothetically, under stable conditions of professional access to disciplines and current trends in women's participation in science by discipline, here based on the past three decades, none of which can be guaranteed in the future, the male-female parity within a discipline, that is, 50% female scientists and 50% male scientists, in the four disciplines can be expected to be reached about a century from today: after 90.5 years for MATH (year 2112), 112.9 years for COMP (year 2134), 118.5 years for PHYS (year 2140), and 133.5 years for ENG (year 2155); across all other disciplines, the parity can be reached between 2027 and 2028 (PHARM and CHEMENG) and 2081 (ENER). The only discipline in which gender parity has already been achieved is IMMU (see Table 7 for details). To calculate the date for gender parity for any discipline, we took the percentage points missing to reach the 50% parity level from Table 3 and multiplied this number of percentage points by the time needed for a 1 p.p. change.
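The parity calculation is simple arithmetic and can be sketched as follows. The function names are hypothetical. The base year of 2021 and the rounding up to the next calendar year are assumptions of this sketch that happen to reproduce the years reported above; the 20.0% share in the last example is purely illustrative and is not a Table 3 value.

```python
import math


def years_to_target(current_share: float, years_per_pp: float,
                    target_share: float = 50.0) -> float:
    """Missing percentage points multiplied by the years needed per point."""
    gap_pp = max(0.0, target_share - current_share)
    return gap_pp * years_per_pp


def year_reached(years_needed: float, base_year: int = 2021) -> int:
    """Calendar year in which the target is reached (rounded up; assumption)."""
    return math.ceil(base_year + years_needed)


# Durations reported in the text for the Big Four.
reported_years = {"MATH": 90.5, "COMP": 112.9, "PHYS": 118.5, "ENG": 133.5}
for discipline, years in reported_years.items():
    print(discipline, year_reached(years))   # 2112, 2134, 2140, 2155

# Illustrative input only: a discipline at 20.0% women, 3.03 years per p.p.
print(round(years_to_target(20.0, 3.03), 1))  # 90.9
```

Passing `target_share=40.0` instead of the default gives the corresponding horizon for the less restrictive gender balance threshold.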
Instead of gender parity (50%/50%) for all, we can focus on gender parity for the youngest generations of scientists only. We can then recalculate the results for this age group of interest, with parity already achieved in six disciplines (e.g., AGRI, BIO, and MED; see Figure 4) and almost achieved for all disciplines combined. Instead of gender parity, we can also take an alternative approach: gender balance, which refers to a presence of men and women in science that ranges between 40% and 60% of the total population (EC, 2021, p. 20). We recalculated the results for gender balance for all, with much shorter periods in which it can be achieved and with seven disciplines in which gender balance was already achieved (Table 7 and Figure 12). However, predictive analytics was outside of our scope.

Summary, Discussion, and Conclusions
We have examined the changing demographics of the global scientific workforce over the past three decades (as defined in this research: STEMM disciplines, 38 OECD countries, nonoccasional status in science, articles indexed in the Scopus database), with special emphasis on the changing participation in science of young male and female scientists. Our research was large scale (4.3 million scientists); generational (scientists were allocated to 10 academic age groups, with a major distinction between the young cohort, with academic experience of 10 years or less, and the old cohort, with 31-50 years); and both cross-sectional (2021) and longitudinal (in a broader sense, the 1990 to 2021 period and 2000 vs. 2021).
We combined two approaches to comprehensively examine the four dimensions (gender, age, discipline, and time): in what we termed a horizontal approach, we focused on the gender distribution of scientists within the same age groups across disciplines; and in what we termed the vertical approach, we focused on the concentration of male and female scientists separately across age groups and within disciplines.
Our underlying methodological choice was to use individual scientists (with their attributes) rather than individual publications (with their characteristics) as a unit of analysis. We used raw micro-data at an individual level from the Scopus dataset because our research heavily relied on author identifiers and because Scopus provided bibliometric data with a precision of 98.1% and recall of 94.4% (Baas et al., 2020). Our study was quantitative and exploratory in nature: appropriately measured large-scale exploratory data can set a broad baseline understanding of complex issues and serve as the foundation for more specific research questions. Therefore, the present research can be complemented with further small-scale quantitative studies (based on global and national survey data) and qualitative studies based on interview and focus group methodologies (as Fox (2020) suggests in studying gender and rank). We are not aware of a similar research exercise mapping young men and women STEMM scientists across disciplines in the context of older age groups (in terms of academic or professional experience). Although statistical reports (as described in the Introduction) providing the data on men and women in science are extremely useful, they do not seem to enter the global scholarly conversation on women in science.
Our research does not test the various hypotheses about gender disparities in science because we have carried out an exploratory exercise; however, our findings support, in general terms, selected findings from the theoretical background discussed in the Introduction section. Attrition levels for women scientists are high ("leaky pipeline" theory), and there are clearly disciplines which, for some reason, are not welcoming to women ("chilly climate" hypothesis) and in which the generational structure of the scientific workforce is not changing ("self-selection" hypothesis; see the Big Four disciplines and their stable age and gender distribution over time).
The scientific workforce has been changing in terms of its gender and age composition, with different intensities in different disciplines. These changes have been ongoing and global in nature. Among the 16 STEMM disciplines, most were numerically dominated by men in 2021, but some were already dominated by women, and the change processes seemed to be fast in some and slow in other disciplines. A surprising finding, even in the context of the COVID-19 pandemic, was the pivotal role of medical research for the global scientific workforce, especially for women scientists: almost half of all scientists (45.98%) were defined in our methodology as doing medical research (a dominating discipline, based on all cited references from lifetime publications for each scientist). The concentration of female scientists was steep across disciplines: more than half (55.02%) were located in MED and about 1 out of 6 (15.91%) in BIO. Consequently, about 70% (70.93%) of all female scientists globally, across all STEMM science sectors, were concentrated in just these two disciplines.
The traditional narratives about some STEMM disciplines being much more heavily male dominated than others have been confirmed: women's participation in COMP, ENG, MATH, and PHYS was very low (and smaller than 20% in 2021). In most disciplines in 2021, the share of female scientists in each successive younger cohort was higher (and it was usually the highest for the youngest cohort: scientists with five or fewer years of academic experience); for COMP, ENG, MATH, and PHYS, however, the principle did not hold, with very small intercohort differences (Figure 3).
Our trend analysis of the 1990-2021 period showed that the participation of women scientists in global science increased across all disciplines, albeit with different starting points in 1990 and different intensities, following an array of past research on "women in science."For the least increasing trends, the increase in the percentage of female scientists by one percentage point took 3.03 years for MATH, 3.55 for COMP and PHYS, and 3.69 years for ENG.Hypothetically, the male-female parity within a discipline (50% female scientists, 50% male scientists) in the four disciplines can be expected to be reached about a century from today: for MATH in the year 2112, for COMP in the year 2134, for PHYS in the year 2140, and for ENG in the year 2155; across all other disciplines, the parity can be reached between 2027 and 2028 (PHARM and CHEMENG) and 2081 (ENER).In a less restricted approach, gender balance (40% female scientists, 60% male scientists) has already been achieved in seven disciplines, see Figure 12 for details.
However, from an age-disaggregated perspective, in 6 out of 16 disciplines, there were already more youngest female than male scientists (IMMU, PHARM, NEURO, MED, AGRI, BIO), and the discipline most open to female scientists has been IMMU (59.04%). Interestingly, more than 8 out of 10 STEMM female scientists globally worked in these six disciplines (82.90%). Across all STEMM disciplines combined, the majority of women currently involved in publishing articles were young women (with 10 years of academic experience or less).
Most interestingly, there was a higher concentration of young women than young men across all STEMM disciplines, and there was a higher concentration of old men than old women across all disciplines. For every discipline, the share of young female scientists among all female scientists within a discipline was higher than the share of young male scientists among all male scientists. For every discipline, the share of old male scientists among all male scientists within a discipline was substantially higher than the share of old female scientists among all female scientists. The patterns are clear: for all STEMM disciplines, female scientists were generally younger and male scientists generally older.
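The distinction between the two kinds of percentages used here (row percentages in the horizontal approach, column percentages in the vertical approach) can be illustrated with a short sketch. The counts below are hypothetical and serve only to show that women can be a minority within the young cohort while young scientists still make up a larger share of all women than of all men.

```python
# Hypothetical counts for one discipline (illustration only, not study data).
counts = {
    ("young", "F"): 300, ("young", "M"): 500,
    ("old", "F"): 50, ("old", "M"): 250,
}


def row_percentage(counts, age_group, gender):
    """Horizontal approach: share of one gender within a given age group."""
    total = sum(n for (a, _), n in counts.items() if a == age_group)
    return 100.0 * counts[(age_group, gender)] / total


def column_percentage(counts, age_group, gender):
    """Vertical approach: share of an age group among all scientists of one gender."""
    total = sum(n for (_, g), n in counts.items() if g == gender)
    return 100.0 * counts[(age_group, gender)] / total


# Women among the young (row percentage).
print(row_percentage(counts, "young", "F"))
# The young among all women vs. the young among all men (column percentages).
print(column_percentage(counts, "young", "F"))
print(column_percentage(counts, "young", "M"))
```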
Moving from standard data (of the OECD, UNESCO and Eurostat type) to gender-disaggregated data for particular age groups, we begin to understand what the global isolation of female scientists in such disciplines as MATH, PHYS, and ENG means. In these disciplines, in 2021, the share of old female scientists was about 10% or less (the difference in numbers by gender was about 10-fold or higher, e.g., ENG, MATH, and PHYS: 6.31%, 11.09%, and 9.21%, respectively). In older generations, female scientists were isolated individuals among their similar-age male colleagues. The numbers show more than percentages (Table 5): for instance, in the 36-40 academic age group, there were 84 female scientists globally working alongside 1,466 male scientists in ENG and 396 female scientists working alongside 3,726 male scientists in PHYS.
However, the context of changing times is important: for the same three disciplines of ENG, MATH, and PHYS, the isolation of young female scientists decreased markedly, from a 10-times difference for older cohorts to a 5-times difference for young cohorts (i.e., to 16.47% for ENG, 22.04% for MATH, and 20.23% for PHYS). In these three male-dominated disciplines in 2021, female scientists in young cohorts were at least twice as present as female scientists in older cohorts (on the role of gender team composition in science, see Fox & Mohapatra, 2007).
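The link between counts, percentage shares, and "times difference" figures can be made explicit with a small sketch. The helper names are hypothetical; the inputs are the Table 5 counts for the 36-40 academic age group and the old and young cohort shares cited above.

```python
def female_share(female, male):
    """Percentage of female scientists in a group, from head counts."""
    return 100.0 * female / (female + male)


def men_per_woman(share_pct):
    """Number of male scientists per female scientist implied by a female share (in %)."""
    return (100.0 - share_pct) / share_pct


# Counts cited in the text for the 36-40 academic age group, 2021 (Table 5).
print(round(female_share(84, 1466), 2))   # ENG
print(round(female_share(396, 3726), 2))  # PHYS

# Shares of female scientists cited in the text, 2021: (old cohort, young cohort).
shares = {"ENG": (6.31, 16.47), "MATH": (11.09, 22.04), "PHYS": (9.21, 20.23)}
for discipline, (old, young) in shares.items():
    print(discipline, round(men_per_woman(old), 1), round(men_per_woman(young), 1))
```

Because the ratio is a nonlinear function of the share, modest gains in percentage terms at low shares translate into large reductions in the number of men per woman.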
The change in gender participation in science has been gradual and the pattern unambiguous: across all STEMM disciplines, both those heavily male dominated and those closest to gender parity, the younger generations have generally had more female scientists, in both numbers and percentages, than older generations. Female scientists were more present in numbers and more present in percentages going down the 10 age groups and when moving from the cohort of old scientists to that of young scientists. From a longitudinal perspective, in a broader sense, for all disciplines, the share of scientists in the youngest age group in 2000 was higher than in 2021 for both male and female scientists. There was a shrinking base of young scientists, both men and women, and there was an expanding base of old scientists, both men and women.
Most limitations of bibliometric datasets have been widely discussed (English language and STEMM focus, Anglo-Saxon bias, articles only, etc.; see Sugimoto & Larivière, 2018, pp. 38-44 on "cultural biases of data sources"). However, our use of a bibliometric dataset to define the individual attributes of the global scientific workforce requires a brief discussion of new limitations, as follows:
(1) Gender determination: A binary approach was used with different coverage for different countries, as algorithms used by Scopus (and other gender-determining tools such as, e.g., Genderize.io or Gender Guesser, see Halevi, 2019, p. 566; Mihaljević & Santamaría, 2020, pp. 1477-1478) work much better for some countries than for others; all gender-unknown cases were removed from our analysis.
(2) Discipline determination: A commercial academic journal classification was used as a proxy for the richness of nationally defined academic disciplines and lifetime Scopus-indexed publication history, with lifetime cited references being used to determine a single attribute of discipline (a single dominant value, possibly suppressing the changes between disciplines over time).
(3) Determining the country of affiliation: A single dominant value, possibly suppressing individual lifetime migration histories.
(4) Determining scientists' nonoccasional status: The threshold of three articles as an entry condition for inclusion in the population was arbitrary, underplaying the role of scientists in very early stages of academic careers; a higher threshold would decrease the population, while a lower one would increase it.
(5) Determining academic age: Although the correlation between biological age and academic age in the STEMM disciplines was high (and possibly higher than 0.9, as we have shown for a sample of 20,000 Polish scientists with doctorates; Kwiek & Roszka, 2022b), the first publications in individual lifetime publication histories may appear at different moments of academic lives in different disciplines; additionally, publishing patterns clearly change over time; that is, scientists tend to start publishing earlier in their careers today than before.
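How such individual attributes can be derived from a publication history is sketched below in simplified form. This is not the production code run on the Scopus database: the function names, the input format, and the exact academic age convention (years elapsed since the first publication, with no offset) are assumptions made for illustration. The three-article threshold and the cohort boundaries (young: academic age of 10 years or less; old: 31-50 years) follow the text.

```python
from collections import Counter

MIN_ARTICLES = 3  # threshold for nonoccasional status, as stated in the text


def is_nonoccasional(publication_years):
    """True if the scientist has at least MIN_ARTICLES indexed articles."""
    return len(publication_years) >= MIN_ARTICLES


def academic_age(publication_years, reference_year=2021):
    """Assumed convention: years elapsed since the first publication."""
    return reference_year - min(publication_years)


def dominant_discipline(cited_reference_disciplines):
    """Single dominant value: the most frequent discipline among cited references."""
    return Counter(cited_reference_disciplines).most_common(1)[0][0]


def cohort(age):
    """Young: academic age of 10 years or less; old: 31-50 years; otherwise 'other'."""
    if age <= 10:
        return "young"
    if 31 <= age <= 50:
        return "old"
    return "other"


# Hypothetical scientist (illustration only).
publication_years = [2012, 2015, 2019]
cited = ["MED", "BIO", "MED", "MED", "NEURO"]

if is_nonoccasional(publication_years):
    age = academic_age(publication_years)
    print(age, cohort(age), dominant_discipline(cited))
```

The sketch makes the limitations concrete: changing MIN_ARTICLES changes the size of the population, and reducing a career to a single dominant discipline discards any movement between disciplines over time.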
Another takeaway is that there were clear differences between national-level studies, especially when bibliometric data were merged with administrative and biographical data, and a global study of the academic workforce and careers. In short, national studies can use commercial and noncommercial datasets available for a few countries only (e.g., the USA, Norway, Poland, and Italy: see Abramo et al., 2016; Abramo et al., 2021; Savage & Olejniczak, 2021), which may include biographical information that is not directly available globally, such as gender, date of birth, dates of PhD and other degrees and ranks, national discipline classifications, and full employment history. In our two recent longitudinal (in a narrow sense) studies of changing productivity classes of 2,326 full professors over 20-40 years of their careers (Kwiek & Roszka, 2023b) and of the impact of early and late, as well as fast and slow promotions on productivity on a sample of 16,000 STEMM university professors (Kwiek & Roszka, 2023a), our dataset of about a million Polish Scopus-indexed publications from the past 50 years was enriched with full biographical and administrative data from a registry of 100,000 Polish scientists.
In global studies, as opposed to national studies, biological age needs to be examined through a proxy of academic or professional age, gender needs to be inferred with probability thresholds, academic ranks should be used through a proxy of career length from the first publication, and national prestige ranks should be used through a proxy of global rankings. All scientists registered nationally must be replaced in global studies with publishing-only scientists, with Scopus- (or WoS-) indexed publications. Real scientists with national identification numbers available in national databases need to be replaced with Scopus Author IDs, and near-perfect administrative and biographical data need to be replaced with either inferred data or proxies. However, global exploratory research, provisionally mapping the terrain and testing the best tools and methodologies, is interesting in its generality before more sophisticated analyses arise. The world of Scopus authors (and their Scopus-indexed publications) is not the real world of science, but it may be a useful proxy of it.
The scholarly and policy implications of the present research are manifold. In scholarly terms, we make the first attempt to define the scientific community globally through attributes so far understudied on a large scale. The mapping of changing gender and age distribution of scientists globally over time, as well as a glimpse of the global scientific workforce today, opens science (and academic) profession studies to more detailed questions. The scientific workforce is often discussed in two policy contexts: the aging and accompanying problems for higher education and innovation systems and access to the science profession of young scientists. Our methodological approach and findings can be useful in examining the complex policy issue of entering and leaving the science profession, with the accompanying questions about changing productivity over scientists' life cycles, aging and changing publishing and collaboration patterns, and so forth (especially in the academic sector).
Our research can be useful for policymakers, administrators, and large grant-making organizations in showing where the scientific workforce has been focusing its research efforts, how large the segments of academics involved in studies in particular disciplines are, and where male and female scientists are located by discipline. Our mapping of substantial gender differences between the various STEMM disciplines (and especially between ENG, COMP, MATH, and PHYS versus all others) may provide new empirical grounds that are useful in discussing women's participation in science and its discipline-based social, institutional, and political impediments.
seminar hosted by Simon Marginson, University of Oxford (April 4, 2023). We gratefully acknowledge the assistance of the International Center for the Studies of Research (ICSR) Lab and Kristy James, Senior Data Scientist. We also want to thank Dr. Wojciech Roszka from the CPPS Poznan Team for many fruitful discussions. We are also very grateful to the three anonymous reviewers for their penetrating comments.

Figure 1. Flowchart: stages in constructing the population and two subpopulations.

Figure 2. The number of publishing nonoccasional STEMM scientists in 38 OECD countries by discipline and gender (left top) and by country (20 biggest systems only) and gender (right top). The share by discipline and gender (left bottom) and by country (20 biggest OECD systems only) and gender (right bottom) (in %), 2021 (N = 1,502,792)

Figure 4. Zooming in on young scientists only. More young men than young women in all STEMM disciplines except six (e.g., MED). Horizontal approach: young scientists only (academic age 10 years and less). Distribution of young publishing nonoccasional STEMM scientists by discipline, age group, and gender (row percentages: 100% horizontally), 2021 (N = 666,355)

Figure 6. Young women in STEMM: in most disciplines, the majority of women belong to the two youngest age groups. Vertical approach: distribution of publishing nonoccasional STEMM scientists by discipline, age group, and gender (column percentages: 100% vertically, for all age groups combined), 2021 (N = 1,502,792)

Figure 8. Shrinking percentages of the youngest male and female scientists among all male and female scientists over time, across all disciplines. Overview of change directions in percentages, 2000 vs. 2021: vertical approach. Distribution of nonoccasional publishing STEMM scientists by discipline, age group, and gender (column percentages: 100% vertically for all age groups combined, dark blue 2000, light blue 2021) (N 2021 = 1,502,792, N 2000 = 716,796)

Figure 9. Shrinking base of young scientists, both men and women, over time. Overview of percentage change directions, 2000 vs. 2021: vertical approach. Zooming in on young scientists only (academic age 10 years or less). Distribution of young publishing nonoccasional STEMM scientists by discipline, age group, and gender, 2000 (dark blue) and 2021 (light blue) (based on column percentages) (N 2021 = 666,355, N 2000 = 437,113)

Figure 10. Expanding base of old scientists, both men and women, over time. Overview of change directions, 2000 vs. 2021: vertical approach. Zooming in on old scientists only: academic age of 31-50 years. Distribution of old publishing nonoccasional STEMM scientists by discipline, age group, and gender, 2000 (dark blue) and 2021 (light blue) (based on column percentages) (N 2021 = 146,090, N 2000 = 17,463)

Figure 11. Different starting points and growth in participation of women in science over time. The trend in the percentage of female scientists by discipline, 1990-2021 (N = 4,314,666)

Table 1. The population for 1990-2021: major characteristics.
After reviewing the correctness of the scripts, the final run was performed. The operation was carried out on a 100% Scopus database with a snapshot date of October 21, 2022, using a cluster in standard mode with Databricks Runtime version 11.2 ML with Apache Spark technology version 3.3.0, Scala 2.12, and an instance i3.2xlarge with 61 GB memory, eight cores, one to six workers for the worker type, and an instance c4.2xlarge with 15 GB memory and four cores for the driver type. The execution of the entire script took 1.13 hours; this operation was launched on November 22, 2022.

Table 3. The subpopulation for 2021 by discipline and gender, as sorted by the number of male scientists (in descending order)
Figure 5. The increasing participation of young female scientists for all disciplines over time. Overview of percentage change directions, 2000 vs. 2021: horizontal approach. Zooming in on young scientists only (academic age 10 years or less). Distribution of young publishing nonoccasional STEMM scientists by discipline, age group, and gender; dark blue percentage female scientists 2021, white lines percentage female scientists 2000 (row percentages: 100% horizontally) (N 2021 = 666,355, N 2000 = 437,113)

Table 4. The frequencies and percentages of female scientists among publishing nonoccasional STEMM scientists by discipline in the two age cohorts (the young and the old), 2021.
This trend of higher female representation in younger cohorts is stronger across disciplines closer to gender parity. For example, in medicine in 2021, young female scientists made up 52.39% of the total (compared with 25.99% for old scientists; and in biochemistry, 49.87% and 27.47%, respectively). With our individual-level micro-data, we can explore further what the isolation of female scientists in STEMM disciplines means in practice. As shown in Table 5, female scientists are more present in numbers and percentages when moving from older to younger generations across the 10 age groups in the same year 2021. This indicates a positive trend toward decreasing isolation of female scientists with every next younger age group. Detailed examples from specific age groups can further emphasize the contrast between the presence of female scientists in younger and older generations in 2021.

Table 5. Zooming in on numbers of the young vs. the old: gender- and age-disaggregated data, distribution of nonoccasional publishing STEMM scientists by selected academic age groups and gender, 2021

Table 6. Regression model statistics: Trends in the percentage of female scientists by discipline, 1990-2021.

Table 7. Trends in the percentage of female scientists by discipline (slope, intercept, and speed of change), 1990-2021.

Time needed to achieve gender parity (women 50%) in years, and the date Time needed to achieve gender balance (women 40%) in years, and the date
Gender parity (50/50) vs. gender balance (40/60), time needed to achieve, in years, by discipline.