Gender issues in fundamental physics: A bibliometric analysis

I analyze bibliometric data about fundamental physics worldwide from 1970 to now, extracting quantitative data about gender issues. I do not find significant gender differences in hiring rates, hiring timing, career gaps and slowdowns, abandonment rates, citation, and self-citation patterns. Furthermore, various bibliometric indicators (number of fractionally counted papers, citations, etc.) exhibit a productivity gap at hiring moments, at career level, and without integrating over careers. The gap persists after accounting for confounding factors and manifests as an increasing fraction of male authors going from average to top authors in terms of bibliometric indices, with a quantitative shape that can be fitted by higher male variability.


INTRODUCTION
This paper originates from an observational opportunity: For the first time sociological issues in fundamental physics can be studied using the public InSpire database, which has accumulated bibliometric data about fundamental physics worldwide from around 1970 to now (InSpire, 2010). Fundamental physics is a subdiscipline of physics that deals with the fundamental aspects of the field and that presently focuses mostly on particle physics, cosmology, and astrophysics, from an experimental and theoretical point of view.
Such bibliometric data are being used to study various aspects of the field. Like other science, technology, engineering, and mathematics (STEM) fields, physics exhibits persistent gender differences, which I try to characterize and understand in the present paper. The bibliometric approach relies on large amounts of objective quantitative data about papers, authors, citations, and hires. Having a large amount of new data I will follow a data-driven approach. Enough statistics is sometimes needed to reveal effects, and to go beyond simple counting by devising dedicated analyses that target specific questions. I can do this, as I have the full database, not just access to some predefined metrics.
While a vast literature has studied gender differences in STEM, no previous studies have specifically focused on fundamental physics: The present study will fill this gap. 1 A main theme is understanding why women remain underrepresented in STEM fields, a worldwide phenomenon that has persisted for decades, despite interventions on its alleged social causes (Stoet & Geary, 2018). A limitation of bibliometric analyses is that authors start being scientifically active roughly at PhD level: In physics (as in other STEM fields) low female representation is already present at this entry level of bibliometric data. Earlier phases need to be explored with other tools.
1 Various studies focused on discrimination as a possible source of gender differences. Small samples of female physics students were interviewed by Barthelemy, McCormick, and Henderson (2016) and Aycock, Hazari et al. (2019). The NASEM Report (2018) focused on the University of Texas System, finding that 17% (13%) of female (male) students in science reported "sexist hostility." However, the NASEM Report (2018) also found that a higher 45% rate of "sexist hostility" is reported by female students in medicine, a field with a negligible gender gap in participation. Only a few percent of male and female STEM students reported more serious problems (NASEM Report 2018, Figure 3.3). As "evidence of direct discrimination is limited," alternative interpretations of the gender representation difference in STEM have been considered and "many scholars now emphasize the role of gender differences in preferences, self-concept and attitudes" (Breda & Napp, 2019).
An important clue is that a similar gender difference already appears in surveys of occupational plans and first choices of high-school students (Ceci, Ginther et al., 2014; Xie & Shauman, 2003). This is possibly mainly due to gender differences in interests (Ceci et al., 2014; Hyde, 2014; Lippa, 2010; Stoet & Geary, 2018; Su & Rounds, 2015; Su, Rounds, & Armstrong, 2009; Thelwall, Bailey et al., 2019). Gender differences in relative attitudes (girls with high mathematical ability tend to also have high verbal ability) also contribute to student choices (Ceci et al., 2014; Stoet & Geary, 2018; Wang, Eccles, & Kenney, 2013): Most of the gender gap in student intentions to study math disappears after taking into account their mathematics versus reading difference in PISA scores, while absolute results (boys outperform girls in mathematics, girls outperform boys in reading) are much less able to explain the gender gap (Breda & Napp, 2019).
Coming to the later phase that can be studied by experiments and by bibliometrics, initial small-scale experiments and anecdotal reports suggested biases against hypothetical female applicants (see, for example, Moss-Racusin, Dovidio et al. [2012] and Wennerås and Wold [1997]); see also Eaton, Saunders et al. [2019]. These findings have not been supported by more recent larger-scale experiments (see the review in Ceci and Williams [2011] and also Ceci and Williams [2015] and Williams and Ceci [2017]). Milkman, Akinola, and Chugh (2015) sent letters by fictional students seeking research opportunities to professors and measured their response rate. The result of this social experiment performed in the United States was that, looking at gender in isolation (rather than at "women and minorities"), female students received slightly more responses from public schools (the majority of the sample) with respect to men in the same racial group.
In Section 2 I describe how I identify the gender of authors; how I obtain lists of hires; how I combine citations to define bibliometric indicators that can be used as reliable proxies for scientific merit, being significantly correlated to human evaluations such as scientific prizes; and how I deal with the confounding factor due to the gender evolution of the field.
In Section 3 I present findings that exhibit interesting gender differences. New authors today appear with roughly 4:1 male:female proportion, with order one variations in different countries. I find that this entry difference in representation is negligibly affected by hiring, consistent with Ceci et al. (2014). As I use citations, in Section 3.1 I first verify that male (M) and female (F) authors cite in the same way. I achieve this by defining a gender asymmetry in citations sensitive only to a differential gender bias, not to gender differences in number or productivity of authors. In Section 3.2 I then compare reliable bibliometric indices based on citations, finding that F authors are hired with indices that are, on average, not higher than those of M authors. Section 3.3 finds a productivity gap consistent with previous studies. This new difference is quantitatively studied in Section 3.4, finding that the M fraction progressively grows going from average to top authors, consistent with Abramo et al. (2009b) and Bordons et al. (2003). In Section 3.5 I consider self-references, finding no new gender differences. Data are made available as discussed in the "Data availability" section. As bibliometric data are influenced by a complicated background of social and historical accidents, in the supplementary Section S1 I show that the results above persist after taking confounding variables into account. Statistical details are presented in the supplementary Section S2. Interpretations of the data are discussed in Section 4.

METHODS
The public InSpire database (InSpire, 2010) maintained by CERN and other institutions offers a picture of fundamental physics worldwide from around 1970 to now. InSpire gives data on about 1.3 million scientific papers, 30 million references, 71,104 identified authors in over 7,000 institutes and over 6,000 collaborations. InSpire individually identified all authors (except occasional authors), solving the problem of name disambiguation. I expect that the database is negligibly affected by direct or indirect gender bias. Indeed, the database provides essentially full coverage of the scientific literature within "fundamental physics." For decades this has been a self-contained, highly specialized subject (Sinatra, Deville et al., 2005), so that the "boundaries" of the database play a minor role. Papers in InSpire include all those published in some categories of the preprint bulletin arXiv and in some journals with topics considered relevant for fundamental physics. Human intervention is minor, such as adding extra papers considered relevant. Only a minor fraction of authors occasionally work in other fields or arrive from other fields. A possible gender difference in multidisciplinary attitudes is thereby expected to have negligible impact on the subsequent discussion. On the other hand, various authors work on multiple topics within the field, which cannot be sharply subdivided.
InSpire does not provide gender information: In Section 2.1 I describe the procedure to infer gender from full names and nationality of the authors. In Section 2.2 I describe how I obtain lists of hires in fundamental physics worldwide. Section 2.3 motivates the bibliometric index that I will use to indicate scientific merit.

Name-Gender Association
I need to infer gender from names in an accurate and complete way. 2 Three main problems are encountered. First, the InSpire database provides only name initials for about 13% of the authors. These are mostly authors with little impact, as defined by any index. Second, some names such as Nicola are "ambiguous": They correspond to different genders in different countries. Third, some authors have unusual names. The Mathematica machine learning function Classify (Wolfram Mathematica, n.d.) uses information about the first name only and leaves about 40% of authors with unclassified gender. I tested two approaches to determine their strengths and to choose the best combination:
1. First, I run the online Ethnea (Torvik & Agarwal, 2016) tool, which uses the full name (first and family name) to infer both gender and ethnicity. Ethnea leaves 26% of the authors with unclassified gender.
2. Second, for each author I extract a "guessed" nationality from the earlier affiliations in his or her papers and use it to disambiguate "ambiguous" names. The obtained list of first names and nationalities is matched to a database of names and countries from the Worldwide Gender-Name Dictionary (WGND) (Raffo, 2016). This database contains 175,917 names with their associated countries. About 70% of authors have "unambiguous" names that are present in the WGND. Authors with "ambiguous" names present in the WGND are matched using the nationality inferred from their earliest affiliations. The size of this subset of authors is approximately 3% of the total, and the uncertainty induced by this procedure is below the per cent level. About 0.1% of the authors have "ambiguous" names and no nationality information: I match them to the most common gender corresponding to their name, defined as the one used in the largest number of countries. Twenty-three percent of the authors remain unclassified.
The results discussed in the following depend only in a minor way on whether the Ethnea or the WGND classification is used. By comparing them it can be seen that the Ethnea classification is less complete, leaving unclassified more authors with unusual names. On the other hand, the WGND classification leads to some authors with misidentified gender, typically arising due to a misidentification of their nationality. The two classifications assign different genders to 1.8% of all identified authors; the percentage rises to 5% among Chinese, 3 Indian and Korean authors, and falls to 1% among European authors.
As a best choice, I adopt the Ethnea classification whenever available, and the WGND classification otherwise. Furthermore, I selected 1,000 of the top-cited authors in different time periods and systematically verified and correctly assigned their gender with no errors, using information available on the internet.
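The combination rule described above (Ethnea first, WGND as a fallback, with nationality used only for "ambiguous" names) can be sketched as follows. This is a minimal illustration and not the code used for the analysis: the dictionaries `ethnea` and `wgnd`, the function names, and the handling of ties or of a nationality missing from the dictionary are all assumptions introduced here.

```python
from collections import Counter
from typing import Dict, Optional, Tuple

Gender = str  # "F" or "M"


def wgnd_gender(first_name: str, country: Optional[str],
                wgnd: Dict[Tuple[str, str], Gender]) -> Optional[Gender]:
    """Look up a first name in a (name, country) -> gender dictionary.

    `wgnd` is a hypothetical stand-in for the Worldwide Gender-Name Dictionary.
    """
    by_country = {c: g for (n, c), g in wgnd.items() if n == first_name}
    if not by_country:
        return None  # name absent from the dictionary: unclassified
    if len(set(by_country.values())) == 1:
        return next(iter(by_country.values()))  # "unambiguous" name
    if country in by_country:
        return by_country[country]  # "ambiguous" name, nationality known
    # No usable nationality: gender used in the largest number of countries.
    ranked = Counter(by_country.values()).most_common(2)
    return ranked[0][0] if ranked[0][1] > ranked[1][1] else None  # tie: unclassified (assumption)


def classify(full_name: str, first_name: str, country: Optional[str],
             ethnea: Dict[str, Gender],
             wgnd: Dict[Tuple[str, str], Gender]) -> Optional[Gender]:
    """Use the Ethnea result whenever available, the WGND result otherwise."""
    gender = ethnea.get(full_name)
    return gender if gender is not None else wgnd_gender(first_name, country, wgnd)
```

With toy dictionaries in which "nicola" is listed as M in one country and F in two others, an author with that first name and no nationality information would be assigned F, while the same name with the first country as guessed nationality would be assigned M.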

Hiring
InSpire is integrated with HepNames, a database with biographical information about the various authors, including papers, affiliation history, experiments they participated in, PhD advisor, and graduate students. As an example, the internet page http://inspirehep.net/author/profile/A.Strumia.1 shows the profile of the present author. A user interface allows researchers to create and update HepNames records for themselves and for other authors, providing precise career information on a voluntary basis (to be validated by the InSpire team). Furthermore, large collaborations systematically provide complete author information upon submission of documents through a dedicated format.
2 Some U.S. astronomers "discourage" adopting this "quantitative methodology," seen as "epistemically violent" and "discriminatory" (Rasmussen, Maier et al., 2019).
3 Gender can be reliably extracted from Chinese names only when they are written in Chinese characters: This information is not always provided by InSpire.
From HepNames I obtained a database of about 10,000 first hires in fundamental physics worldwide, including dates and disambiguated institutions. Such InSpire hires might be biased if those F and M authors who need to self-report their data tend to do this differently. While InSpire is a widely used tool in the community, integrated with a job announcement system, funded by multiple official institutions and endorsed in Reviews of Particle Physics, occasional authors might not use it.
I therefore complement InSpire hires by computing unbiased "pseudohires," defined as follows. For each paper, there is a list of disambiguated affiliations of each author. I consider an author as p yr-hired when he or she starts writing papers with the same affiliation for at least p years. Using this definition I obtain a database of about 40,000/19,000 first 5 yr/10 yr hires from 1960 to 2013/2008 (64,000/23,000, including multiple hires for the same author). However, in this way it is not possible to obtain a precise hiring date for the subset of authors hired by the same institution to which they were previously affiliated. Therefore, InSpire hires will be used when a precise hiring date is more important than increased statistics, and pseudohires in the opposite situation, when full coverage is more important than precise timing. In each case, the other sample will be used as a control sample.
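The pseudohire definition can be illustrated with a short sketch. It assumes one particular reading of the definition: an author counts as p yr-hired at an affiliation if the first and last paper signed with that affiliation are at least p years apart, with no requirement of continuity in between. The function name and the input format are hypothetical.

```python
from typing import Iterable, Optional, Tuple


def first_pseudohire(papers: Iterable[Tuple[int, str]],
                     p: int) -> Optional[Tuple[int, str]]:
    """Return (start_year, affiliation) of the earliest p-year pseudohire.

    `papers` lists (publication_year, affiliation) pairs for one author.
    Assumption: only the span between the first and last paper with a given
    affiliation is checked.
    """
    first, last = {}, {}
    for year, aff in papers:
        first[aff] = min(year, first.get(aff, year))
        last[aff] = max(year, last.get(aff, year))
    hires = [(first[a], a) for a in first if last[a] - first[a] >= p]
    return min(hires) if hires else None  # earliest qualifying start year
```

Because a p-year window must be observed after the candidate hiring year, such a procedure can only identify 5 yr (10 yr) pseudohires up to five (ten) years before the end of the data, in line with the 2013/2008 cutoffs quoted above.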

Bibliometrics
I here motivate the use of appropriate bibliometric indicators as a proxy for what is commonly considered as scientific merit.
Various authors have studied what citation counts do measure.
At the theoretical level, two main models have been proposed. According to the normative interpretation, scientists primarily cite to give credit. A bibliometric index then provides a valid proxy of scientific merit, especially when highly correlated with scientific prizes or other human evaluations of scientific merit. According to the social-constructivist interpretation, citations are instead primarily a social persuasion tool; the concept of scientific merit itself is questioned: As reviewed in Bornmann and Daniel (2008), "scientific knowledge is socially constructed through the manipulation of political and financial resources and the use of rhetorical devices." According to this point of view, citation counts could be correlated to prizes simply because both reflect social status. Some prizes in physics require established scientific results; others are awarded following rules that leave more space for sociological distortions.
Observational works supported the normative interpretation at a high aggregation level (Bornmann & Daniel, 2008; Tahamtan & Bornmann, 2019); personal factors important at the individual level average out when considering many authors. My data provide extra evidence in this direction: For example, authors of top-cited papers tend to be younger, rather than powerful senior scientists.
Citation counts are surely influenced by some confounding factors (Bornmann & Daniel, 2008): the citation intensity depends on time and field (my indicator will compensate for this), on language (essentially all physics literature is in English), on accessibility (physics literature has been freely available on the preprint bulletin arXiv since 1995), and on collaboration size.
Collaboration size is a big issue in fundamental physics, due to the presence of very large (up to 3,000 authors) and very productive (up to 6,000 papers) collaborations, mostly in high-energy experimental physics. Mainly for this reason, traditional metrics (such as citation counts, h-index, paper counts) now fail to provide reasonable proxies for scientific merit in fundamental physics. Signing more papers than one can read stretches the concept of authorship (Birnholtz, 2006). At a quantitative level, the problem is that the contribution of one big collaboration overwhelms the database (1.3 million papers) if 6,000 papers are counted as 3,000 × 6,000 = 18 million.
This situation can be corrected by "fractional counting" (Hooydonk, 1997; Leydesdorff & Park, 2016; Perianes-Rodriguez, Waltman, & van Eck, 2016): A fraction $1/N_{\rm aut}$ of each paper (rather than the full paper) is equally attributed to its $N_{\rm aut}$ authors, as appropriate for an intensive quantity. All authors, including first and last authors, are treated on equal footing because authors are usually sorted alphabetically in fundamental physics, unlike in other fields. 4 Thereby there is no way of telling who contributed what to multiauthored papers. When huge collaborations are involved there is no guarantee that each author contributed to each paper. Despite this, the data show that the total fractionally counted bibliometric output of collaborations scales, on average, as their number of authors (Rossi, Strumia, & Torre, 2019), suggesting that large collaborations form when scientifically needed and that gift authorship does not play a large role.
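As a minimal illustration of fractional counting, the sketch below attributes a share 1/N_aut of each paper to each of its authors. The function name and the input format (one list of author identifiers per paper) are assumptions made for this example.

```python
from collections import defaultdict
from typing import Dict, List


def fractional_paper_counts(papers: List[List[str]]) -> Dict[str, float]:
    """Give each author a share 1/N_aut of every paper he or she signed."""
    counts: Dict[str, float] = defaultdict(float)
    for authors in papers:
        share = 1.0 / len(authors)  # equal attribution, no first-author bonus
        for author in authors:
            counts[author] += share
    return dict(counts)
```

By construction the shares add up to the number of papers, so a collaboration cannot contribute more to the total than the papers it actually produced, however many authors sign them.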
Fractional counting of citations already provides one simple acceptable indicator. I improve on it by using the closely related number of "individual citations" (summed over all citing papers, as precisely defined in Strumia and Torre (2019)), which gives reduced weight to citations coming from papers with a larger number $N_{\rm ref}$ of references. This refinement addresses the issue of normalization between different fields and times (Zitt and Small (2008); see Waltman (2016) for a review and extra references): Papers in sectors with a higher rate of publication (such as phenomenology in fundamental physics) tend to receive more citations; for the same reason these papers also tend to have more references. Thereby, dividing by the number of references tends to give a common normalization to different fields, without needing a field classification system. Indeed, the average number of individual citations received by papers in any field disconnected from other fields is 1. As a test that this concept works in practice in the InSpire database, I computed the average number of citations, of references, and of individual citations of papers within the main theoretical fields defined by arXiv (hep-th, hep-ph, gr-qc, nucl-th, astro-ph, hep-lat 5), finding that the dispersion in $N_{\rm icit}$ among different fields is reduced to 8%, more than twice as small as the dispersion in $N_{\rm cit}$ or $N_{\rm ref}$. Similar results are found considering different times: The field grows with time, such that newer papers receive more citations and have more references, in roughly proportional amounts.
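The normalization argument can be checked on a toy example. The sketch below assumes that a citation from a paper with N_ref references carries weight 1/N_ref, and that the weight received by a paper is shared equally among its N_aut authors; the precise definition is the one in Strumia and Torre (2019), and the function names and input format here are hypothetical.

```python
from typing import Dict, List


def individual_citations(refs: Dict[str, List[str]]) -> Dict[str, float]:
    """Weight received by each paper: each citing paper distributes a total
    weight of 1 equally among its references.

    `refs` maps every paper of a closed set to the papers it cites
    (all cited papers must themselves be keys of `refs`).
    """
    icit = {paper: 0.0 for paper in refs}
    for cited_list in refs.values():
        for cited in cited_list:
            icit[cited] += 1.0 / len(cited_list)
    return icit


def author_individual_citations(icit: Dict[str, float],
                                authors: Dict[str, List[str]]) -> Dict[str, float]:
    """Share the weight of each paper equally among its authors."""
    totals: Dict[str, float] = {}
    for paper, weight in icit.items():
        for author in authors[paper]:
            totals[author] = totals.get(author, 0.0) + weight / len(authors[paper])
    return totals
```

In a closed set the total weight equals the number of papers that have at least one reference, so the average per paper approaches 1 when almost all papers cite within the set, independently of how long the reference lists are.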
Individual citations have the following meaning: An author who wrote $N_{\rm icit}$ fractionally counted papers of average impact in his or her field received $N_{\rm icit}$ individual citations. Table 7 of Strumia and Torre (2019) lists the 50 physicists who received the most individual citations, together with their scientific prizes. Physicists can read their names and consider whether $N_{\rm icit}$ is dominantly influenced by scientific achievements or by social constructivism. For practical purposes, an indicator provides an acceptable proxy of scientific merit if scientific merit positively affects the index more than confounding variables. A full or large correlation with scientific merit improves the sensitivity of the analysis, but some effects can be large enough that fine sensitivity is not needed to reveal them.
4 The first author is not alphabetically sorted in 6% of the multiauthored papers in the hep-th arXiv bulletin, 13% in hep-ph and hep-ex, 18% in hep-lat, 25% in gr-qc, and 44% in astro-ph. In more papers the author highlighted as first might accidentally be also alphabetically first.
5 Experimental papers form a separate category, as they tend to have many coauthors and to receive more citations.

Quantitative Science Studies

The use of bibliometric indices based on citation counts as a proxy for scientific merit comes with limitations and dangers. Some citations are given for negative reasons. On short timescales citations are more influenced by visibility, and some authors engage in boosting their citation counts in various ways: large collaborations, many references, self-references, citation networks, salami slicing into minimum publishable units, etc. Individual citations are not boosted by the first two strategies. As this paper is concerned with gender differences, it is reassuring that Section 3.5 will find no extra significant gender differences in self-referencing.
As "when a measure becomes a target, it ceases to be a good measure," in the supplementary Section S1 I consider a metric based on citations more different from common targets, which is not enhanced by the latter three strategies. The CitationCoin is defined as the difference between the number of received and given individual citations (up to a correction factor that prevents systematically negative contributions from recent papers), such that it is not affected by self-citations or by networks of circular citations. Authors who write too many poorly cited papers can even have a negative CitationCoin score.
Bibliometric indicators measure the average opinion of the community: While all opinions can be wrong, a better possibility could be to rely on the opinion of top authors. This is done by metrics based on the PageRank algorithm (such as those discussed in Chen, Xie et al. (2007), Ma, Guan, and Zhao (2008), Pinski and Narin (1976), Radicchi, Fortunato et al. (2009), Strumia and Torre (2019), and West, Jensen et al. (2014)). This is studied in the supplementary Section S1, where for completeness I also consider the widely used but naive bibliometric indicators based on paper counting and on the average number of citations per paper.
In practice, the differences in bibliometric indices among authors are so large that log-scale plots will be appropriate and refined metrics only make minor differences. Individual citations $N_{\rm icit}$ are used because this metric is simpler and closer to the commonly used number of citations $N_{\rm cit}$, while allowing us to meaningfully deal with experimentalists, theorists, and astrophysicists by compensating for the vastly different typical number of co-authors $N_{\rm aut}$ of papers produced by these communities.

The Age Confounder
The fraction of female authors in fundamental physics has significantly increased with time, producing demographic gender differences (female authors are on average younger than male authors) that act as a confounding factor to my later analyses. Apparent gender differences can just be age differences. Career-integrated indices tend to favor senior authors, and indices based on single papers tend to favor younger authors. As age is a significant confounder, I will compensate for the different time evolution $N^{\rm start}_{F,M}(t)$ (number of $F$ and $M$ authors that produced their first paper during year $t$) by assigning to each author $A$ a weight proportional to

$$w_A \propto \frac{N^{\rm start}_F(t_A) + N^{\rm start}_M(t_A)}{N^{\rm start}_G(t_A)}, \qquad (1)$$

where $t_A$ is the date of his or her first paper and $G$ is his or her gender. This is equivalent to selecting every year a random subset of new authors (respecting the time evolution of the total number of authors) such that $M$ and $F$ authors are numerically equal, and averaging over the possible choices.
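A weight with the properties described above can be sketched as follows. The sketch assumes that the weight of an author equals the total number of new authors in his or her starting year divided by twice the number of new authors of the same gender in that year, which is one choice compatible with the equal-numbers description; the function name and input format are hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def age_weights(authors: List[Tuple[int, str]]) -> List[float]:
    """Weights that equalize F and M authors within each starting year.

    `authors` lists (year_of_first_paper, gender) pairs, gender in {"F", "M"}.
    Assumed form: N_total(t) / (2 * N_G(t)), so that each gender carries half
    of the yearly weight and the yearly total equals the number of new
    authors (when both genders are present in that year).
    """
    n_start = Counter(authors)                     # (year, gender) -> count
    n_year = Counter(year for year, _ in authors)  # year -> total count
    return [n_year[y] / (2 * n_start[(y, g)]) for y, g in authors]
```

For a year with four new M authors and one new F author, each M author gets weight 0.625 and the F author gets weight 2.5, so both genders carry a total weight of 2.5 and the yearly total remains 5.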

RESULTS
Among the 71,104 authors in fundamental physics listed in the InSpire database, 49,860 male and 9,205 female authors were identified. 16% of authors with identified gender are classified as female, and wrote 10% of the fractionally counted papers receiving 7% of the individual citations. These raw numbers, meant only to give a first rough idea of the field, are affected by a variety of historical accidents.
As documented in Strumia and Torre (2019), the field has expanded significantly: Due to increased publication intensity, about half of citations have been given after 2000, so that metrics based on citations favor recent authors (the metric $N_{\rm icit}$ automatically compensates for publication intensity). Furthermore, the F percentage grew with time, as shown by the raw data in the left panel of Figure 1.
The right panel shows that, within the countries that most contributed to fundamental physics, the female fractions range between 7% and 23%. It is interesting to explore if the female fraction is correlated with the Global Gender Gap Index (GGGI) of the countries (World Economic Forum, 2016), which measures the gap between women and men in education, politics, health, and economy, as this is a possible cause of the low female representation. The GGGI ranges between 0 and 1, with 1 indicating parity or a gap in favor of women (as the GGGI ignores imbalances to the advantage of women). The right panel of Figure 1 shows that the female fraction is not positively correlated with the GGGI, as similarly observed among students in STEM (Stoet & Geary, 2018). Figure 2 shows that the female percentage is a factor of 2 higher in subfields dominated by large experimental collaborations than in theoretical fields.
Clearly, the field and its gender composition have evolved in the past 50 years. While describing such changes from a bibliometric point of view is an interesting subject, I try to focus on the general features that emerge from the complicated background of social factors. This will need to take into account possible confounding variables, by studying subperiods and subtopics or by trying to compensate for the above variations.

Citations
I want to investigate if citations are influenced by the gender of the cited authors, searching for a possible different tendency of the two genders to cite a given gender more often.
In principle, complete information could be extracted by comparing "how citations are" with "how citations would be" in the absence of gender discrimination. In practice, this strategy needs a theoretical model of citations, but such models are affected by questionable systematic issues. One can try controlling for the main factors (such as different numbers of M and F authors, different average seniorities, regional differences), but reality can contain more complicated effects, such as different scientific qualities. For example, Caplar, Tacchella, and Birrer (2017) claim (consistent with my later findings) that papers in astronomy written by F authors are less cited than papers written by M authors, even after trying to correct for some social factors. After considering attributing the remaining difference to gender bias, Caplar et al. (2017) conclude "of course we cannot claim that we have actually measured gender bias." I will follow a different strategy, which is often more useful in the presence of backgrounds that cannot be reliably modeled: I construct an asymmetry such that it is not affected by the backgrounds. The extracted information encoded in the asymmetry is reliable but partial, as I give up on the attempt to model the full citation process.
To start, I restrict my inquiry to the subsample of single-author papers with identified gender $G$, as these would likely be more strongly affected by a possible gender bias. I count $N^{\rm cit}_{G\to G'}$, the number of single-author papers with gender $G$ citing single-author papers with gender $G'$:

$$N^{\rm cit} = \begin{pmatrix} N^{\rm cit}_{M\to M} & N^{\rm cit}_{M\to F} \\ N^{\rm cit}_{F\to M} & N^{\rm cit}_{F\to F} \end{pmatrix}. \qquad (2)$$

From this I define the gender asymmetry as

$$A = \frac{N^{\rm cit}_{M\to M}}{N^{\rm cit}_{M\to M} + N^{\rm cit}_{M\to F}} - \frac{N^{\rm cit}_{F\to M}}{N^{\rm cit}_{F\to M} + N^{\rm cit}_{F\to F}} = \frac{N^{\rm cit}_{F\to F}}{N^{\rm cit}_{F\to M} + N^{\rm cit}_{F\to F}} - \frac{N^{\rm cit}_{M\to F}}{N^{\rm cit}_{M\to M} + N^{\rm cit}_{M\to F}} = \frac{N^{\rm cit}_{M\to M} N^{\rm cit}_{F\to F} - N^{\rm cit}_{M\to F} N^{\rm cit}_{F\to M}}{(N^{\rm cit}_{M\to M} + N^{\rm cit}_{M\to F})(N^{\rm cit}_{F\to M} + N^{\rm cit}_{F\to F})}. \qquad (3)$$

The first formula means that $A$ is the proportion in which solo men cite solo male research more than solo women cite solo male research. The second formula means that $A$ is also the proportion in which solo women cite solo female research more than solo men cite solo female research. So the gender asymmetry ranges between $-1 \le A \le 1$. The final formula shows that $A$ is symmetric under $M \leftrightarrow F$ permutations, with a property that makes it useful: $A$ vanishes whenever citations are given without considering gender. $A > 0$ ($A < 0$) signals same-gender (opposite-gender) preference, although more complicated patterns are possible: Only one gender might have a particular preference for citing a given gender, or both might have a preference for opposite genders, or both might have a preference for the same gender, in different amounts. On the other hand, $A$ is insensitive to a difference in the total number and in the average scientific quality of $M$ and $F$ authors (as quantified by the chosen indicator), as well as to a possible collective equal bias of both genders towards one gender, which corresponds to multiplying one column of the matrix above by a fixed constant. 6

To better understand what the asymmetry measures, it is useful to compute its predicted value in a toy model of citations where $N^{\rm aut}_G$ authors of gender $G$ cite with gender-dependent rates $p_{G\to G'}$ (for simplicity I ignore that some authors are more active than others, so that an effective number would be directly relevant). In this model $N^{\rm cit}_{G\to G'} \propto N^{\rm aut}_G N^{\rm aut}_{G'} p_{G\to G'}$ and the asymmetry equals

$$A \simeq \frac{N^{\rm aut}_M N^{\rm aut}_F}{(N^{\rm aut}_M + N^{\rm aut}_F)^2}\, \frac{p_{M\to M} + p_{F\to F} - p_{M\to F} - p_{F\to M}}{p} \qquad (4)$$

in the limit where all $p_{G\to G'}$ are close to a common value $p$ (otherwise a slightly more cumbersome expression applies).

[Figure caption, beginning not recovered: … Figure 1, showing after 1995 the result within the main arXiv categories, plotted as colored curves: Experimental categories include hep-ex (high-energy experiments) and nucl-ex (nuclear experiments). Theoretical categories include hep-ph (high-energy phenomenology), hep-th (high-energy theory), hep-lat (lattice), and nucl-th (nuclear theory); gr-qc (general relativity and quantum cosmology) is mostly theoretical, although it includes some experiments. Finally, astro-ph contains astrophysics and cosmology.]
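The properties of the asymmetry can be checked numerically. The sketch below computes A as the share of male-authored papers among the citations given by men minus the same share among the citations given by women, which is the first reading stated in the text. The uncertainty assumes independent fluctuations of size √N on each of the four counts; the function name and argument names are introduced here for illustration.

```python
import math
from typing import Tuple


def gender_asymmetry(n_mm: float, n_mf: float,
                     n_fm: float, n_ff: float) -> Tuple[float, float]:
    """Return (A, sigma_A) from the four citation counts.

    n_xy is the number of citations from papers by gender x to papers by
    gender y. Both citing genders must have given at least one citation.
    """
    a = n_mm / (n_mm + n_mf) - n_fm / (n_fm + n_ff)
    # Error propagation assuming independent sqrt(N) fluctuations per count.
    var = (n_mm * n_mf / (n_mm + n_mf) ** 3
           + n_fm * n_ff / (n_fm + n_ff) ** 3)
    return a, math.sqrt(var)


# Gender-blind citing: both genders send 80% of their citations to M papers.
print(gender_asymmetry(800, 200, 80, 20)[0])   # 0.0
# Rescaling the "cited F" column by a common factor keeps A at zero.
print(gender_asymmetry(800, 600, 80, 60)[0])   # 0.0
```

A same-gender preference, for example counts (900, 100, 70, 30), gives a positive A, and exchanging the roles of M and F in the four counts leaves its value unchanged.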
I extract $A$ from the data by removing self-citations, which introduce a background of same-gender preference not due to an actual gender preference. This removal is done exactly, as I have a list of all references where all authors are identified with a unique code. The removal of self-citations reduces same-gender citations, introducing a small spurious asymmetry of order $1/N^{\rm aut}_G$: When considering many authors this bias is much smaller than the statistical uncertainty of $A$, which scales as $1/\sqrt{N^{\rm aut}_G}$. 7 Figure 3 shows the time evolution of the gender asymmetry, found to be compatible with zero at all times. 8 Restricting to papers after 2010 gives the results shown in Table 1. 9 The uncertainty is shown as one standard deviation after the ± symbol. A hint of an asymmetry, $A_{\rm other} = (4.8 \pm 1.2)\%$, is observed among about $10^4$ other papers (mostly unpublished) not included in the eight major arXiv categories relevant for fundamental physics. As a result, combining all single-author papers citing single-author papers gives an asymmetry $A_{\rm published} = (1.0 \pm 0.5)\%$ when restricting to published papers, or $A_{\rm all} = (1.9 \pm 0.4)\%$ when including all papers.

Figure 3. Time evolution of the gender asymmetry defined in Eq. 3; $A > 0$ ($A < 0$) signals same-gender (opposite-gender) preference. Left panel: As a function of the publication year of the cited single-author papers. Right panel: As a function of the publication year of the citing single-author papers. After 1995 I also show the asymmetry in different sectors of fundamental physics, based on their arXiv categories: theory (hep-ph, hep-th, hep-lat, nucl-th, and gr-qc), experiment (hep-ex and nucl-ex), and astrophysics (astro-ph). The bin 2018-20 only uses data available up to mid-2018.

6 The subsample of "ambiguous" authors (whose name is associated with different genders in different countries) does not show anomalous features that would support the hypothesis of a collective gender bias.
7 More precisely, the uncertainty on $A$ equals $\left[ N^{\rm cit}_{M\to M} N^{\rm cit}_{M\to F} / (N^{\rm cit}_{M\to M} + N^{\rm cit}_{M\to F})^3 + N^{\rm cit}_{F\to M} N^{\rm cit}_{F\to F} / (N^{\rm cit}_{F\to M} + N^{\rm cit}_{F\to F})^3 \right]^{1/2}$, using the usual propagation of statistical fluctuations on each count, $\sqrt{N^{\rm cit}_{G\to G'}}$.
8 An analysis performed along the same lines but replacing genders with countries shows an order one preference for citing authors of the same country, especially in some countries. This can be a manifestation of the stronger contacts between nearby authors.
9 Our results have been reproduced by Hossenfelder (2018), who also tried to go beyond the asymmetry by assuming a model similar to my Eq. 4 (but with $N^{\rm aut}_G$ replaced by $N^{\rm pap}_G$). As such models introduce questionable systematic uncertainties, I restrict my attention to the model-independent gender asymmetry.
The definition of the gender asymmetry could be extended to multiauthored papers knowing how a hypothetical gender bias would depend on the relative amount of F and M authors. One simple possibility is just to generalize the definition of $N^{\rm cit}_{G\to G'}$ by weighting each citation with the product $f_G f_{G'}$, where $f_G$ ($f_{G'}$) is the fraction of authors with gender $G$ ($G'$) in each citing (cited) paper. All self-citations, now defined as whenever the cited and citing paper have at least one author in common, are now dropped. With the new $N^{\rm cit}_{G\to G'}$ I find the results in Table 2. Uncertainties (not shown) are about five times smaller there than in the single-author sample, if propagation of errors is naively applied to fractional counts.
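The fractional extension to multiauthored papers can be illustrated with a minimal Python sketch. It assumes that each citation contributes the product of the gender fractions of the citing and cited papers, and that a citation is dropped when the two papers share at least one author. The paper identifiers, author codes, and function names are hypothetical and serve only as illustration; this is not the actual analysis code.

```python
from collections import defaultdict

GENDERS = ("M", "F")

def gender_fractions(genders):
    """Fraction of authors of each gender in one paper."""
    n = len(genders)
    return {g: genders.count(g) / n for g in GENDERS}

def fractional_citation_counts(papers, citations):
    """Weighted citation counts between genders.

    papers:    {paper_id: [(author_id, gender), ...]}  (hypothetical layout)
    citations: iterable of (citing_id, cited_id)
    Returns {(G, G'): count}, where G is the citing and G' the cited gender.
    """
    counts = defaultdict(float)
    for citing, cited in citations:
        citing_authors = {a for a, _ in papers[citing]}
        cited_authors = {a for a, _ in papers[cited]}
        if citing_authors & cited_authors:
            continue  # self-citation: at least one author in common
        f_citing = gender_fractions([g for _, g in papers[citing]])
        f_cited = gender_fractions([g for _, g in papers[cited]])
        for g in GENDERS:
            for g2 in GENDERS:
                counts[(g, g2)] += f_citing[g] * f_cited[g2]
    return dict(counts)

# Toy data: p1 has one M and one F author, p2 one F author, p3 one M author.
papers = {
    "p1": [("a1", "M"), ("a2", "F")],
    "p2": [("a3", "F")],
    "p3": [("a1", "M")],
}
citations = [("p1", "p2"), ("p1", "p3"), ("p2", "p3")]
counts = fractional_citation_counts(papers, citations)
```

In this toy input the citation from p1 to p3 is discarded because author a1 appears in both papers, so only two citations contribute to the weighted counts.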
Taking into account the definition of the asymmetry A and the relative number of F and M authors in my data, I conclude that A is so close to zero that a nonzero gender asymmetry in citations within its measured range would not significantly distort the bibliometric indices based on citations discussed in the following.

Hiring
The lack of a gender asymmetry in citations means that there is no fracture along gender lines in the community about which research in fundamental physics is more relevant/used/visible. In Section 2.3 it was shown that appropriate bibliometric indices based on citations are useful proxies for scientific merit. I here use such indices to search for a possible gender difference in hiring. For each hired or pseudohired author I compute his or her bibliometric indices at the hiring moment, defined as in Section 2.2. From this I extract the mean bibliometric indices of hired F and M authors.
The left and right panels of Figure 4 show the mean number of fractionally counted papers and of individual citations $N_{\rm icit}$, respectively, of authors at their hiring date as reported by InSpire. For the sake of clarity I use traditional color codes: blue for male and pink for female authors.
It can be seen that hired F authors do not have, on average, bibliometric indicators above those of hired M authors. Rather, a tendency in the opposite direction seems present at all times, across the main subfields$^{10}$ and most countries (statistical uncertainties become significant when restricting to some countries with too few authors). This result persists after taking into account the possible confounding variables considered in the supplementary Section S1.1. I next provide extra information.

Table 1. Gender asymmetry A defined in Eq. 3 computed restricting to single-author papers after 2010, in the arXiv categories defined in the caption of Figure 3. The counts are the number of single-author papers in a given arXiv category cited by any single-author papers, not necessarily in the same category.

Quantitative Science Studies 235
Figure 5 shows the cumulative distribution of hired physicists as a function of their scientific age at hiring. It exhibits no significant gender difference. A gender difference could have been produced in various ways:
1. Some hiring committees might take into account career gaps due to maternity (about which no information is available): This would tend to increase the average scientific age of female hired scientists.
2. A gender discrimination in hiring would tend to reduce the average scientific age at hiring of scientists with the favored gender.
3. A gender difference in abandonment rates would tend to reduce the average scientific age at hiring of scientists with the higher abandonment rate.$^{11}$

A warning is necessary about the next two plots, which extend the analysis to authors who have not been hired. My analysis is restricted to InSpire authors listed in HepNames (described in Section 2.2), which misses many authors who leave the field after writing a few papers. This generates an extra systematic issue, which presumably tends to be gender neutral, such that gender ratios presumably are more reliable than absolute rates. Indeed, information for M and F authors presumably is similarly incomplete, as InSpire does not collect data about gender, especially of unknown authors.

Figure 6 shows the fraction of hired authors among those who started writing papers in given time periods. Significant gender differences are not seen. I used 10-year hiring because coverage is more important here than timing. Therefore the plot stops 10 years ago, and absolute numbers would be different using incomplete InSpire hiring. Furthermore, as warned above, extra unhired authors not in InSpire would lower the hired fraction.

Figure 7 shows the abandonment rate per year as a function of scientific age. I considered departures during 2000-2015, counting as having left those authors who wrote no further papers up to 2018.
Older authors started when the M fraction was higher and the abandonment rate was lower (as hinted by Figure 6): This confounder generates an apparently lower abandonment rate among M authors. I therefore compensate for gender history as described in Eq. 2. I find that the abandonment rate is maximal among older authors who retire, minimal among senior authors, and intermediate among junior authors (as warned above, the abandonment rate of very young authors who leave the field after writing just a few papers is underestimated). Abandonment rates show no significant gender difference, in agreement with the null result by Perley (2019) and in disagreement with Flaherty (2018) (these authors only considered astrophysics).

[…] (2018), finding that F authors are hired on average 1.1 ± 0.6 years earlier than M authors (considering the time after receiving their PhD; astronomers are hired on average five years later). I find a difference of 0.95 ± 0.5 yr restricting to astro/cosmo authors (considering the time after the first paper; authors are hired on average nine years later). According to Flaherty (2018), the hiring time distribution is better fitted assuming a three to four times higher F abandonment rate, rather than assuming a 10:1 bias in favor of F astronomers. However, this claim is based only on a very simplified model of hiring that neglects important effects (some authors are better than others; quotas would not be overfilled, etc.). I do not attempt to model hiring, as I do not see how models can be made realistic. Rather, I have extra data about papers and citations that support neither a 10:1 bias (see Figure 4) nor a 4:1 difference in abandonment rates (see Figure 7). Flaherty's (2018) results have been "firmly ruled out" by Perley (2019), who ruled out gender differences larger than 40% in hiring and abandonment rates.
In conclusion, the gender gap in representation at the entrance level of research is negligibly affected by "leaky pipeline" effects, consistent with Ceci et al. (2014), who find large gender differences at PhD level in STEM and mild differences in the subsequent progress; see also Allen-Hermanson (2017) and Miller and Wai (2015).

Productivity
In this section I study scientific productivity as quantified through bibliometric indices. Of course, such indices say nothing about other activities of researchers that do not result in publications, such as teaching, mentoring, and outreach. Figures 1, 2, 8, and 11 show a possible gender gap in the fractionally counted number of papers: Male authors write, on average, 10% more papers. The gap is consistent with earlier findings in the literature (see, for example, Abramo et al., 2009a). A slightly larger gap is found in the number of received individual citations. Is such a gap due to the different average scientific age of M and F authors? To check this possibility, the left panel of Figure 8 shows the mean number of fractionally counted papers written by M and F authors as a function of their scientific age (time since their earliest paper). The gap persists. The right panel of Figure 8 similarly shows the mean number of received individual citations. In both cases it can be seen that junior M and F authors have similar productivity, and that a gap develops with their scientific age. A higher scientific age means going backwards in time to authors who started earlier, when the field was different and when the F percentage was smaller.

Figure 5. Among all authors first hired after 2000, the cumulative fraction of hired authors as a function of their scientific age is shown, for male (blue) and female (pink) authors in experiment (left), theory (middle), astro/cosmo (right). InSpire hires are used and gender history is compensated for as described in Eq. 2.
The averages in Figure 8 are shown separately for hired and not-hired authors, using 10-yr-hires in order to have more complete coverage. It can be seen that hiring is not the reason for the gap.
Furthermore, in Figure 8 only scientifically active authors are considered (those who wrote at least one paper after 2013), such that these results would not be affected by a gap in abandonment rates. Figure 8 does not compensate for possible career gaps, as such gaps do not exhibit significant gender differences. This is shown in Figure 9, where for each author I computed the longest time gap between consecutive papers, using arXiv dates to have precise information about publication dates. The distribution of longest gaps among M and F authors shown in Figure 9 does not exhibit significant gender differences. A similar null result is found when restricting to hired authors. Stopping writing papers might, however, be the extreme case of a tendency towards reduced productivity (possibly due to maternity issues). I therefore searched for consecutive years of reduced publication intensity: Some authors are more regular, others experience periods of relatively lower productivity, but again the distributions show no significant gender differences (see Figure 10). No significant gender differences are found looking at periods of relatively higher productivity.

A gender difference in abandonment rates or career gaps or periods of lower productivity would reduce the cumulative number of papers and of received citations of authors at career level. It is therefore interesting to test whether a gap persists in noncumulative productivity indices that avoid summing over author careers. The procedure is as follows: For each year the subset of scientifically active authors that produced papers is selected, and Figure 11 shows their average productivity, separately for M and F authors. I find that active F authors produce on average roughly 30% fewer papers than active M authors, and receive roughly half the number of citations.$^{12}$

Furthermore, Figure 12 (left panel) analyzes the gap at the level of papers, finding a smaller F percentage among authors of top-cited papers, even when restricting to single-author papers (see also Jagsi et al., 2006; West et al., 2013). The right panel of Figure 12 shows that F authors tend to work in larger collaborations.
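The two timing measures behind Figures 9 and 10 (longest break between consecutive papers, and the least productive three-year period relative to the author's average rate) can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, publication dates are given as decimal years or calendar years, the average rate is taken over the calendar years between the first and last paper, and the ratio is clipped to 1. The actual normalization used for Figure 10 may differ.

```python
def longest_gap(dates):
    """Longest time (in years) between consecutive papers.

    dates: publication dates as decimal years, in any order.
    """
    ordered = sorted(dates)
    return max((b - a for a, b in zip(ordered, ordered[1:])), default=0.0)

def regularity(years, window=3):
    """Ratio r between the least productive `window`-year period and the
    number of papers expected in such a period at the author's average rate.

    years: calendar year of each paper (one entry per paper).
    Assumption: careers shorter than `window` years are treated as regular.
    """
    first, last = min(years), max(years)
    span = last - first + 1
    if span < window:
        return 1.0
    per_year = [years.count(y) for y in range(first, last + 1)]
    window_counts = [sum(per_year[i:i + window])
                     for i in range(span - window + 1)]
    expected = window * len(years) / span
    # Clipping to 1 is an assumption that keeps r within [0, 1].
    return min(1.0, min(window_counts) / expected)

steady = regularity([2000, 2001, 2002, 2003])   # one paper every year
broken = regularity([2000, 2004])               # three empty years in between
gap = longest_gap([2000.5, 2001.0, 2004.25])
```

An author who publishes one paper every year gets r = 1, while an author with three consecutive empty years gets r = 0.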
The supplementary Section S1.2 discusses other possible confounding variables, without finding anything that can remove the gender gap in productivity.
I now discuss some possible causes of such a gap.
In various countries F authors have earlier retirement ages. But gender differences show up before retirement in Figure 8. Furthermore, many physicists tend to remain scientifically active after retirement (although the productivity of most physicists tends to decline before retirement).
A possible reason for the gender gap observed in various fields is children and maternity. See Ceci et al. (2014) for a recent summary of the literature, which is not univocal. Some studies find no or small effect (Cole & Zuckerman, 1987; Sax, Hagedorn et al., 2002; Stack, 2004; Xie & Shauman, 2003). Other studies find a negative impact on women (Fox, 1995; Ginther & Kahn, 2009), and on men and women equally (Hargens, McCann, & Reskin, 1978), while some studies found a positive impact on men (Ceci et al., 2014), possibly due to selection effects.

$^{12}$ Fractional counting is used. Using full counting, hyperauthored publications would lead to a recent boom, roughly equal for M and F authors, due to the appearance of experimental collaborations with thousands of authors. Alternatively, the productivity gap can be seen using full counting and restricting to theorists.
Results vary depending on field (with physical sciences sometimes being an outlier, possibly a fluctuation) and are mostly focused on the situation in the United States and on the number of papers produced or hours worked. Ceci et al. (2014) conclude: "the presence of children cannot explain the overall gender productivity gaps." While maternity would deserve a dedicated study, the InSpire data do not provide any personal information, so it is only possible to proceed indirectly. As has already been described, timing of publications does not show gender differences in periods of null or reduced productivity. Figure 8 indicates that the productivity gap opens at an age roughly consistent with maternity (but also consistent with the transition to scientific independence), and that it does not close at older ages. A similar situation is found when analyzing the salaries of physicists in the United States: no gender gap just after graduation; a 10% gap after 10-15 years according to Porter and Ivie (2019), who report large differences in personal life choices, in particular that women are four times more likely to have a career break.
As maternity laws are different in different countries, an alternative possible strategy is looking for national differences in the M/F gap, which seems stronger in Germany, the United Kingdom, and Italy; weaker in the United States and France; and null in Japan. However, single-country statistics are poor and many other national differences can act as confounding factors.

Distribution of Individual Citations
In the previous section a productivity gap was found. I here characterize its statistical properties. Figure 13 shows the distributions in the number of individual citations $N_{\rm icit}$ received by female and male authors in fundamental physics, considering the whole InSpire database. The bell-shaped distributions spread through a few orders of magnitude in $N_{\rm icit}$. The dotted curves show that each bell is well approximated, at least on its upper side, by a log-normal as a function of $N_{\rm icit}$ (namely, by a Gaussian as a function of $\log N_{\rm icit}$, the variable used on the horizontal axis of Figure 13; log is the logarithm base 10). A log-normal, already observed in bibliometrics (Thelwall & Wilson, 2014), arises when many positive independent random variables contribute multiplicatively.$^{13}$

The difference between the F and M distributions in Figure 13 is statistically significant. This is better shown by the black curve in Figure 13, which shows the ratio $N_F/N_M$ of female versus male authors (left axis) as a function of the number of received individual citations (horizontal axis).$^{14}$ The $N_F/N_M$ gender ratio is not constant: The M fraction progressively grows when going from average to top authors in terms of individual citations.

Figure 9. Fraction of authors active between 2000 and now (divided by their main topic) as a function of the longest time break among their papers. Gender history is compensated for as described in Eq. 2.

Figure 10. For each author the minimal number of papers he or she produced in a consecutive three-year period is computed. This number is divided by the author's average publication rate, obtaining a number 0 ≤ r ≤ 1 normalized such that r = 1 indicates an author who published in a regular way, while r = 0 indicates an author with a three-year period of null productivity. As a function of r the fraction of authors active between 2000 and now is plotted, divided by their main topic. Gender history is compensated for as described in Eq. 2.
Again it is necessary to study whether such a difference can be a byproduct of confounding factors that affect my composite sample of data. This issue is discussed in supplementary Section S1.3: I do not find any confounder that washes away the trend. As discussed in Section 2.4 I compensate for one significant confounder: Male authors are presently on average more senior than female authors. This is relevant here because senior authors had more time to receive citations (and because younger authors contribute more to top-cited papers). This confounder does not remove the gender difference in $N_{\rm icit}$, given that it is observed within subsamples of authors with the same scientific age (see supplementary Figure S11). Correcting for the age confounder is, however, needed to precisely quantify the difference. I proceed as described in Eq. 2. The left panel of Figure 13 shows age-corrected results, while the right panel shows raw data. Both averages and variances differ in raw distributions. The difference in variances persists among the upper sides of the age-corrected distributions, while no significant gender differences are seen on the lower sides of age-corrected distributions, more affected by social phenomena (lower sides would be mostly removed by restricting to hired authors).

Different ages can be accounted for by using a complementary strategy: I consider a new bibliometric index $N_{\rm icit}/\Delta t_A^{p}$ that approximately compensates for the scientific age $\Delta t_A = t_{\rm now} - t_A$ of each author. In order to achieve the compensation I choose $p \approx 1.8$, because $N_{\rm icit}$ averaged over authors approximately scales as $\Delta t_A^{1.8}$ within all main subfields. The result of such a correction is shown in the left panel of Figure 14: Both M and F distributions become narrower, having removed one source of their variability. I again find a higher male variance among the upper sides. The right panel of Figure 14 shows that applying both age corrections has negligible extra effect. The difference in variances on the upper side is seen in any case.

The M fraction is largest among top authors, as is clear from the dots below the bells in the right panel of Figure 13, which show the individual authors. It is useful to focus on the subset of top authors. A physicist might read their names and conclude that no sociological confounder can wash away most of them. It is therefore interesting to show that the gender difference is statistically significant, even when restricting to top authors. This is done by considering the hypothesis of no gender difference: While being sociologically implausible it has precisely computable consequences that can be compared to data about top authors. Starting from raw data, the F author with most individual citations is in position $F_1 = 69$. Under the hypothesis of no gender difference one can mathematically compute that the probability of observing $F_1 \ge 69$ is $\wp_1 = m^{F_1-1} \approx 3 \times 10^{-6}$, where $m = 1 - f \approx 0.83$ is the male fraction of the large sample (or $m \approx 0.87$ restricting to theorists). Under the same assumption, the kth F author should be on average at position $\langle F_k \rangle = k/f$, with probability distribution $f^k\, m^{F_k-k} \binom{F_k-1}{k-1}$ (Knapp, 2010).

Figure 11. Left panel: Mean number of fractionally counted papers produced each year by M and F authors active that year. Right panel: Mean number of received individual citations divided by mean number of fractionally counted papers. The shading reminds us that citation counts are incomplete for recent papers. The continuous curves show the result compensated for gender history as described in Eq. 2, and the negligibly different dashed curves show raw data.

[…] The F fraction of each paper is determined assuming that each author contributed equally to collaboration papers and compensating for gender history. The same result is also computed restricting to solo papers and to papers with fewer than 10 authors. Right panel: Fractional contribution of all F authors to papers with $N_{\rm aut}$ authors.

$^{13}$ The following observations suggest that the dominant random variables are unlikely to be of social type (e.g., the possibility that some authors get more visibility and funds that boost their citation counts; Ruocco, Daraio et al., 2017): I find that the number of citations received by authors in physics tends to grow linearly or quadratically rather than exponentially with their scientific age. Furthermore, when looking at single papers, rather than at author careers, a significant excess of young authors (scientific age below approximately 15 years) is observed among authors of top-cited papers.

$^{14}$ In precise mathematical notation this is $dN_F/dN_M$, and the bells are $N_G^{-1}\, dN_G/d\log N_{\rm cit}$.
The observed positions are $F_2 = 147$ (the probability of being in this or a lower position is $\wp_2 \approx 3 \times 10^{-11}$), $F_3 = 191$ ($\wp_3 \approx 2 \times 10^{-13}$), etc., roughly fitted by $F_k \approx 69k$. Such low probabilities mean that gender differences are needed to account for data about top authors. Of course, it is already known that the time evolution of $f = N_F/(N_M + N_F)$ is a significant confounder. I again find that such a confounder is not enough to remove the difference, as the female fraction $f$ is larger than 1/69 at all relevant times. More precisely, after performing the age correction as in Eq. 2, the kth top female authors shift to positions $F_k = \{36, 82, 114, 126, \ldots\} \approx 33k$. Considering the age-corrected metric $N_{\rm icit}/\Delta t_A^{1.8}$ the positions are $F_k = \{20, 57, 109, 180, \ldots\} \approx 38k$. After correcting for the age confounder, $\wp$-values remain small because the female fraction 1/33 or 1/38 found among top authors remains smaller than the fraction $f$ of female authors in the full sample. An excess of male top authors is found even after correcting for the age confounder.$^{15}$

The fact that gender differences are maximal among those top authors who receive up to 3,000 more individual citations than average authors indicates that differences do not predominantly result from sociological factors that could give factors of 2 differences at individual level (such as harder working vs. career gaps, more research vs. more teaching, specialization on physics vs. wider interests, etc.).

[…] Figure 13, adopting a measure that, on average, does not depend on the scientific age of authors. The M distribution still has a longer upper tail.

$^{15}$ A varying gender fraction that culminates in a small group of extremely productive, mostly male, "star authors" has been observed in Abramo et al. (2009b) and Bordons et al. (2003) (see also Kwiek, 2016). Ioannidis, Baas et al. (2019) used Scopus data about 6.9 million scientists in all disciplines to compute a "composite" bibliometric indicator, producing a list of top authors. When restricted to fundamental physics, their list (despite minor problematic aspects) is significantly correlated to my list. In their all-fields list the female authors are found in positions $F_k = \{133, 146, 160, \ldots\}$. While this naively seems to extend my findings, I cannot correct their list for the age confounder nor for other confounders.
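The rank probabilities under the hypothesis of no gender difference can be checked with a short Python sketch. It assumes that each position in the ranking is independently occupied by a female author with probability f, so that the kth female author is at position F_k or later exactly when fewer than k female authors appear among the first F_k − 1 positions. The function names are hypothetical, and f = 0.17 is the rounded value implied by m ≈ 0.83, so the outputs should only be expected to agree with the quoted probabilities in order of magnitude.

```python
from math import comb

def prob_rank_at_least(k, rank, f):
    """P(F_k >= rank) when each position is female with probability f.

    Equivalent to: fewer than k female authors among the first rank - 1.
    """
    m = 1.0 - f
    n = rank - 1
    return sum(comb(n, j) * f**j * m**(n - j) for j in range(k))

def rank_pmf(k, rank, f):
    """P(F_k = rank): negative binomial, f^k m^(rank-k) C(rank-1, k-1)."""
    m = 1.0 - f
    return comb(rank - 1, k - 1) * f**k * m**(rank - k)

f = 0.17  # assumed rounded female fraction (m = 0.83)
p1 = prob_rank_at_least(1, 69, f)    # first female author at position >= 69
p2 = prob_rank_at_least(2, 147, f)   # second female author at position >= 147
p3 = prob_rank_at_least(3, 191, f)   # third female author at position >= 191

print(f"p1 = {p1:.1e}, p2 = {p2:.1e}, p3 = {p3:.1e}")
```

For k = 1 the sum reduces to the single term m^(F_1 − 1), which is the expression used for the first female author.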
As the variation of $N_F/N_M$ survives the confounding variables, I try to better understand this effect by investigating its quantitative shape.
A look at Figures 13 or 14 suggests that the M bell has a longer tail of top authors (the raw data also show a difference in averages, mostly due to confounders, that makes the difference in variances less easily visible). The supplementary Section S2 shows that the difference in upper variances is statistically significant. I here provide a simpler, more intuitive argument through analytical approximations based on the observation that the distributions of individual citations received by M and F authors separately are approximately log-normal (Gaussian as a function of $\ell = \ln N_{\rm icit}$) on their upper side. Log-normal distributions with a common standard deviation $\sigma$ and different averages $\mu_M \neq \mu_F$ for M and F authors would produce an $N_F/N_M$ of the form

$$\frac{N_F}{N_M} \propto \exp\left[\frac{(\mu_F-\mu_M)\,\ell}{\sigma^2}\right]. \qquad (5)$$

Different standard deviations $\sigma_M \neq \sigma_F$ would produce

$$\frac{N_F}{N_M} \propto \exp\left[-\frac{(\ell-\mu_F)^2}{2\sigma_F^2}+\frac{(\ell-\mu_M)^2}{2\sigma_M^2}\right]. \qquad (6)$$

Therefore a dominant difference in averages (standard deviations) would produce a line (a parabola) when $N_F/N_M$ is plotted as a function of $N_{\rm icit}$ on a log-log scale. Such a plot is shown in Figure 15, again performing the usual corrections for the age confounder. The important point is that, in all cases, $N_F/N_M$ exhibits a parabolic shape along the upper sides of the bells, where the log-normal approximation is accurate enough.$^{16}$ In conclusion, the data exhibit a gender difference in the upper variances.

[…] The blue points are raw data, the red points are corrected compensating for the different time evolution of the overall number of M and F authors. In the right panel a different bibliometric index is considered, which approximately compensates also for the scientific age of each author. Data are not well fitted by a linear function in a log-log scale (which corresponds to p = 1 in the supplementary Eq. S2) and can be fitted by a quadratic function (which corresponds to p = 2). This can be interpreted as different male and female variances, as in Eq. 6, rather than as different averages, as in Eq. 5.

$^{16}$ Adding higher-order terms in the exponent would not change the above conclusion, because fits to the observed distributions find small higher-order terms.
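The line-versus-parabola argument can be verified numerically with a small Python sketch: the logarithm of the ratio of two Gaussian densities in ℓ has vanishing second differences when the standard deviations coincide, and a constant nonzero second difference when they differ. The numerical values of the averages and standard deviations below are hypothetical and chosen only for illustration, and the overall normalization by the total number of F and M authors is omitted because it only adds a constant.

```python
import math

def log_gauss(x, mu, sigma):
    """Natural logarithm of a Gaussian density with mean mu and width sigma."""
    return (-0.5 * ((x - mu) / sigma) ** 2
            - math.log(sigma * math.sqrt(2.0 * math.pi)))

def log_ratio(ell, mu_f, sigma_f, mu_m, sigma_m):
    """ln of the F/M density ratio at ell = ln(N_icit)."""
    return log_gauss(ell, mu_f, sigma_f) - log_gauss(ell, mu_m, sigma_m)

def second_differences(values):
    """Discrete curvature of equally spaced samples."""
    return [values[i - 1] - 2.0 * values[i] + values[i + 1]
            for i in range(1, len(values) - 1)]

ells = [0.5 * i for i in range(12)]

# Same width, different averages: ln(N_F/N_M) is linear in ell.
same_sigma = [log_ratio(l, 2.0, 1.0, 2.3, 1.0) for l in ells]

# Same average, narrower F distribution: ln(N_F/N_M) is a downward parabola.
diff_sigma = [log_ratio(l, 2.0, 0.9, 2.0, 1.0) for l in ells]

curv_same = second_differences(same_sigma)
curv_diff = second_differences(diff_sigma)
```

A narrower F distribution gives negative curvature, that is, a female fraction that keeps falling faster than a power law toward the upper tail.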
Is this statistically strong preference an artefact of the complexity of the full data sample? To answer, the analysis is repeated within the independent subsamples of supplementary Figure S11: Plotted on a log-log scale they independently tend to show a parabolic (rather than linear) trend in $N_F/N_M$. The dotted curves in Figure S11 show how well each subsample can be fitted in terms of R ≈ 2 and p ≈ 2. Furthermore, the probabilities $\wp_i$ that test the hypothesis of no gender difference restricting to top authors are small within the subsamples of Figure S11 (where the top authors are plotted as points), consistent with Abramo et al. (2009b) and Bordons et al. (2003).
A similar difference in upper variances is also found when looking at different bibliometric indicators; see the supplementary Section S1.3.

Self-References
Gender differences in self-references are an interesting topic in their own right, as well as a possible confounding factor for the previous analyses. I verified that my previous results persist when dropping self-references, and I next justify why there is no need to drop self-references.
Bibliometric studies have found that men cite their own papers more than women: Cameron, White, and Gray (2016) focused on six ecology journals; Ghiasi, Larivière, and Sugimoto (2016) on the Web of Science database; King, Bergstrom et al. (2017) on the JSTOR database; and Hossenfelder (2018) on single-authored papers in arXiv. In agreement with such studies, restricting to solo papers, I find a 20-30% higher fraction of self-references among the papers written by M authors (7.3% versus 5.9%).
However, King et al. (2017) mention the possibility that this gender difference in self-references is just a reflection of the fact that male authors tend to write more solo papers, and thereby have more scientific reasons for self-references. King et al. (2017) could not check if this is a significant confounding factor, because authors are not disambiguated in their database. As authors are disambiguated in the InSpire database, I can perform this check, finding that this confounding factor removes the gender difference in self-references: A similar fraction of self-references is found when comparing male and female authors who wrote the same number of papers $N_{\rm pap}$. In other terms, the fraction of self-references can be a misleading indicator because the average number of self-references grows with the number of solo papers following a scaling law $N^{\rm self}_{\rm cit} \propto N_{\rm pap}^{p}$ with power $p > 1$. My data suggest $p \approx 1.3$.
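A power-law exponent of this kind can be estimated with an ordinary least-squares fit in log-log space, as in the following Python sketch. The data are synthetic, generated from an exact power law with an arbitrary prefactor of 0.4, and the function name is hypothetical; the fit used for the actual data is not specified here and may differ.

```python
import math

def fit_power_law(xs, ys):
    """Least-squares fit of y = c * x**p, done as a line in log-log space.

    Returns (p, c). All xs and ys must be positive.
    """
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((a - mean_x) ** 2 for a in lx)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    p = sxy / sxx
    c = math.exp(mean_y - p * mean_x)
    return p, c

# Synthetic example: number of self-references versus number of solo papers.
n_pap = [1, 2, 5, 10, 20, 50]
n_self = [0.4 * n ** 1.3 for n in n_pap]

p, c = fit_power_law(n_pap, n_self)

# With p > 1 the self-references per paper grow as N_pap**(p - 1).
per_paper = [s / n for s, n in zip(n_self, n_pap)]
```

With p > 1, self-references per paper increase with the number of papers, so authors with longer publication lists show a larger self-reference fraction even if their citing behavior is identical.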
As solo papers are a relatively small subsample that might be not representative of the full database, I extend the analysis to multiauthored papers. In order to do so, it is necessary to distinguish self-references from self-citations. I count a citation as self-citation whenever the citing paper has at least one author in common with the cited paper. I count it as a self-reference only for those authors who wrote both the cited and the citing paper. I clarify this with an example. One paper by authors A and B cites a paper by authors B and C: B gives a self-reference; both B and C receive a self-citation (B directly and C indirectly).
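The distinction between self-references and self-citations in the example above can be written as a small Python function. This is a minimal sketch with hypothetical names; it only classifies a single citation and does not implement the fractional counting used in the analysis.

```python
def classify_citation(citing_authors, cited_authors):
    """Classify one citation between two papers.

    A citation is a self-citation when the papers share at least one author.
    Only the shared authors give a self-reference; every author of the cited
    paper receives a self-citation (directly if shared, indirectly otherwise).
    """
    common = set(citing_authors) & set(cited_authors)
    return {
        "is_self_citation": bool(common),
        "self_reference_givers": common,
        "self_citation_receivers": set(cited_authors) if common else set(),
    }

# A paper by authors A and B cites a paper by authors B and C.
result = classify_citation(["A", "B"], ["B", "C"])

# A paper by A cites a paper by C alone: no author in common.
unrelated = classify_citation(["A"], ["C"])
```

Here `result` reports B as the only author giving a self-reference, and both B and C as receiving a self-citation.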
As usual, fractional counting is used. For each author I compute the total number of received individual citations $N_{\rm icit}$, of received individual self-citations $N^{\rm received}_{\rm icit}$, of given individual self-references $N^{\rm given}_{\rm icit}$, and of fractionally counted papers $N_{\rm pap}$ (equal to the number of given individual references).
I consider the mean fraction of given individual self-references $N^{\rm given}_{\rm icit}/N_{\rm icit}$ (left panels of Figure 16) and the mean fraction of received individual self-citations $N^{\rm received}_{\rm icit}/N_{\rm icit}$ (right panels of Figure 16). The upper row shows some gender difference: Male authors tend to give themselves a higher fraction of self-references (13.2% instead of 12.0%); female authors tend to receive a higher fraction of indirect self-citations (23.5% instead of 19.8%). Such differences again disappear in the lower row of Figure 16, where the self fractions are computed as a function of the number $N_{\rm pap}$ of fractionally counted papers written by each author. Similarly to what I found in the solo sample, male authors tend to cite themselves and their collaborators more just because male authors tend to have more past papers. Similar results are found restricting within the main subfields.

CONCLUSIONS
I performed a bibliometric analysis of gender issues in fundamental physics worldwide from approximately 1970 to now. Bibliometrics gives quantitative data on activities of researchers that result in publications and tells nothing about other possible activities, such as teaching, mentoring, and outreach. Nevertheless, research is an interesting area in which individual talent can be expressed, as confirmed by the large differences between authors found in data. Concerning gender I find the following results:
1. First, the well-known initial gender difference in representation is seen: There are roughly four males for each female among new authors who appear at PhD level. The initial female fraction is not positively correlated with the Global Gender Gap Index of the countries,$^{17}$ and negligibly evolves in the subsequent career stages (Figure 7).
2. When citing works by others, authors exhibit no or small gender difference: Male and female authors have the same average opinion about which research in fundamental physics deserves to be cited (Figure 3).$^{18}$ Furthermore, M and F authors give to their own papers a similar fraction of self-references (Figure 16), taking into account that M authors tend to write more papers, especially solo papers (Figure 12).
3. Female authors do not have, at hiring moments, higher average bibliometric indicators based on individual citations or fractionally counted papers than male authors (Figures 4 and 5).
4. Among authors identified by InSpire HepNames profiles (which misses authors who write very few papers), I do not find a gender difference in hired percentages (Figure 6), in abandonment rates (Figure 7), in longest breaks between papers (Figure 9), or in periods of reduced activity relative to their average (Figure 10).
The above results are in line with the literature, as summarized in Ceci et al. (2014): "the overall picture is one of gender neutrality"; "no evidence of women having harder time getting tenure"; postbachelor gender differences in attrition rates are significantly smaller in STEM than in life science, psychology, and social science. The literature finds a second gender difference, in productivity: "women on average publish fewer papers than men"; "there are no sex differences in citations per article" (Ceci et al., 2014). I find:
5. A productivity gap both in the fractionally counted number of publications and in their citational impact (Figure 8), which does not appear to be predominantly concentrated in specific countries, topics, periods, bibliometric indicators, journals (supplementary Figure S1), etc. The gap is also found without integrating over careers (see Figures 11 and 12).
6. A gradually increasing male fraction when going from average to top authors in terms of individual citations (or other indices, such as fractionally counted publications). The quantitative shape of this trend appears predominantly due to a higher variance on the upper side of the M distributions (see Figure 15 and Eqs. 5 and 6), rather than due to a difference in averages.
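The qualitative trend in item 6 can be illustrated with a toy calculation. The sketch below is not the fit of Eqs. 5 and 6: it assumes two zero-mean Gaussian distributions, a base ratio of four males per female (as in item 1), and a hypothetical male standard deviation 10% larger than the female one. The function names and parameter values are illustrative assumptions only.

```python
import math


def upper_tail(z, sigma=1.0):
    """P(X > z) for a zero-mean Gaussian with standard deviation sigma."""
    return 0.5 * math.erfc(z / (sigma * math.sqrt(2.0)))


def male_fraction_above(z, base_male_fraction=0.8, sigma_ratio=1.1):
    """Male fraction among authors above threshold z.

    z is measured in units of the female standard deviation.
    base_male_fraction: overall male share (0.8 = four males per female).
    sigma_ratio: male over female standard deviation (assumed value).
    """
    males = base_male_fraction * upper_tail(z, sigma_ratio)
    females = (1.0 - base_male_fraction) * upper_tail(z, 1.0)
    return males / (males + females)


# Scan thresholds from the average (z = 0) toward the upper tail.
for z in (0.0, 1.0, 2.0, 3.0, 4.0):
    print(f"threshold {z:.0f} sigma: male fraction {male_fraction_above(z):.3f}")
```

With equal means, the male fraction at the average equals the base fraction and grows with the threshold only when `sigma_ratio` exceeds 1. Setting `sigma_ratio=1.0` keeps it flat, which is the sense in which a variance difference, rather than a difference in averages, shapes the trend in this toy model.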
I verified that my results still hold when ignoring hyperauthored papers (possibly affected by guest authorship) or restricting to single-authored papers.
While many social phenomena could produce different averages, producing different variances would need something that specifically disadvantages research by top female authors. Just to take one example of a social nature, a gender gap in research productivity could arise if better female authors receive more honours and leadership positions that drive them away from research. However, data also show an excess of young authors among those who produced top-cited papers: The excess is observed among both M and F authors. This suggests extending my considerations from possible sociological issues to possible biological issues.
18 Some authors discuss the possibility that physicists are collectively affected by an unconscious gender bias, a concept that has received recent attention in the United States following the development of Implicit Association Tests (IAT) that claimed to reveal such biases. Even if such tests were scientifically valid (see, however, Oswald, Mitchell et al. (2013) for a metareview), reading a scientific paper involves different mental processes than those probed by IAT. Some of my results are based on citations: The same conclusions are reached when removing from my analysis citations to single-authored research, which would be more affected by a hypothetical collective gender bias. Furthermore, fields that study bias might have their own biases: Stewart-Williams, Thomas et al. (2019) and Winegard, Clark et al. (2018) found that scientific results exhibiting male-favoring differences are perceived as less credible and more offensive. Handley, Brown et al. (2015) found that men (especially among STEM faculty) evaluate gender bias research less favorably than women.
It is interesting to point out that the gender differences in representation and productivity observed in bibliometric data can be explained at face value (one does not need to assume that confounders make things different from what they seem), relying on the combination of two effects documented in the scientific literature: differences in interests (Diekman, Johnston, & Clark, 2010;Lippa, 2010;Su, Rounds, & Armstrong, 2009;Su & Rounds, 2015;Thelwall, Bailey et al., 2019) and in variability (Deary, Irwing et al., 2007;Halpern, Benbow et al., 2007;Hyde, 2014;Stevens & Haidt, 2017;Wang et al., 2013).
Greater male interest in things and greater female interest in people is observed consistently across cultures and time and is large (d ≈ 1, i.e., distributions differ by about one standard deviation): Such a difference in interests predominantly accounts for the initial difference in representation (footnote 19). Difference in variability accounts for the difference in productivity. This is consistent with O'Dea, Lagisz et al. (2018), which confirms the difference in variabilities looking at grades, and observes that this difference alone cannot reproduce the representation gap.
The amount of higher male variability suggested by bibliometric data in fundamental physics is at the 10% level, roughly consistent with independent observations of presumably relevant traits. While such psychometric observations predominantly probe the central, most populated part of the distributions, I reasonably expect that physicists probe the upper tail (footnote 20) and that top-cited physicists reach the far-end upper tail. I estimate that they reach about five standard deviations above the mean, given that this is the maximal deviation statistically expected from a pool of $10^9$ persons in a Gaussian approximation.
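The order of magnitude of this estimate can be checked with a short calculation. The sketch below assumes a standard normal distribution and reads "maximal deviation statistically expected" as the threshold that, on average, exactly one person in the pool exceeds. This reading and the function names are my assumptions, not necessarily the definition used above.

```python
import math


def upper_tail(z):
    """P(Z > z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def threshold_for_pool(pool_size, lo=0.0, hi=10.0, iterations=200):
    """Solve pool_size * P(Z > z) = 1 for z by bisection on [lo, hi]."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if pool_size * upper_tail(mid) > 1.0:
            lo = mid  # still more than one expected exceedance: move up
        else:
            hi = mid
    return 0.5 * (lo + hi)


for pool in (1e7, 1e8, 1e9):
    print(f"pool {pool:.0e}: threshold {threshold_for_pool(pool):.2f} sigma")
```

Under this reading the threshold lies between roughly five and six standard deviations for pools of $10^7$ to $10^9$ persons, and it depends only logarithmically on the pool size.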
When dealing with complex systems, any simple interpretation can easily be incomplete, including a hypothetical gender discrimination. In any case, it is interesting that data can be explained without invoking such a hypothesis.
I conclude by addressing ethical and social values, given that a gender difference in variances is seen by some as offensive, like other ideas originally proposed by Darwin (Hill, 2017) (modestly keeping things in proportion in this comparison). The interpretation in terms of different variabilities implies that one should keep giving gender-neutral equal opportunities to everybody by considering each person based on his or her individual qualities, not as a member of a demographic group (gender, nationality, or whatever). The refusal to consider population-level differences in distributions when trying to understand gaps in representation can lead to discrimination allegedly aimed at establishing equal outcomes (see, for example, Strumia (2019) for a more extensive discussion of such issues).
19 Large gender differences along the people/things dimension are observed in occupational choices and in academic fields: Such differences are reproduced within subfields (Thelwall et al., 2019). In particular, female participation is lower in subfields closer to physics, even within fields with their own cultures, such as "physical and theoretical chemistry" within chemistry (Thelwall et al., 2019). This suggests that the people/things dimension plays a more relevant role than the different cultures of different fields.
Furthermore, psychology finds that females value careers with positive societal benefits more than do males: Some authors propose that women tend more to opt out of STEM because "women tend to endorse communal goals more than men" (Diekman et al., 2010; Evans & Diekman, 2009). Indeed Gibney (2007) finds that women in UK academia report dedicating 10% less time than men to research and 4% more time to teaching and outreach, and Guarino and Borden (2017) find that women in U.S. non-STEM fields do more academic service than men. Concerning fundamental physics, old discoveries gave huge societal benefits, but no practical applications are resulting from contemporary explorations of very small and very large scales (such as production at colliders of particles that decay in $10^{-25}$ seconds, or cosmological observations of objects billions of light years away).
20 Physics attracted students with high average grades; see, for example, Figure 9 of Ceci et al. (2014) and Figure 1 of Ginther and Kahn (2015). Looking at career-integrated citations, physics (and especially fundamental physics) shows the largest ratio between high (90th) and low (25th) percentile (Tables 1 and S3 of Ioannidis et al. (2019)).