Heavy-Tailed Distribution of the Number of Papers within Scientific Journals

Scholarly publications offer at least two benefits for the study of the scientific community as a social group. First, they attest to some form of relation between scientists (collaborations, mentoring, heritage, ...), useful to determine and analyze social subgroups. Second, most of them are recorded in large data bases, easily accessible and including a lot of pertinent information, easing the quantitative and qualitative study of the scientific community. Understanding the underlying dynamics driving the creation of knowledge in general, and of scientific publication in particular, can contribute to maintaining a high level of research, by identifying good and bad practices in science. In this article, we aim at advancing this understanding by a statistical analysis of publication within peer-reviewed journals. Namely, we show that the distribution of the number of papers published by an author in a given journal is heavy-tailed, but has a lighter tail than a power law. Interestingly, we demonstrate (both analytically and numerically) that such distributions match the result of a modified preferential attachment process, where, on top of a Barabási-Albert process, we take the finite career span of scientists into account.


I. INTRODUCTION
One of the core mechanisms in the practice of science is the self-examination of a field of research. The validation of a scientific result is always collective, in the sense that it has been scrutinized, criticized, and (hopefully) validated by a sufficient number of peers. Furthermore, any scientific result is permanently subject to new evaluation and might be replaced by a more accurate work. At the level of a community, scientists are thus used to criticizing the work of colleagues and to having their work criticized by them. It is then not surprising that some scientists started to study (and thus somehow critically assess) the scientific community itself [de Solla Price (1963)].
The quantitative study of the scientific community, sometimes referred to as Science of Science [de Solla Price (1976); Narin (1976); Fortunato et al. (2018); van Raan (2019)], is a key step to unravel the underlying behaviors of its composing agents (authors, journals, institutions, etc.). Pioneered by the early works of Lotka (1926), the science of science gained a lot of momentum in the second half of the 20th century, with the creation of the first data bases of scientific publications [Garfield (1955); de Solla Price (1965); Merton (1968)]. More recently, scientometric investigations have been significantly eased by the emergence of large online data bases of scientific publications (Web of Science, PubMed, arXiv, ...) and the ever-increasing computational power of modern computers. These improvements made it possible to analyze scientometric indicators on a larger scale [Wang and Waltman (2016); Frandsen and Nicolaisen (2017)] and with finer resolution in terms of publication units (considering single articles instead of whole journals [e.g., ]) and time [Egghe and Rousseau (2000); Newman (2001)]. For a clear historical overview of scientometrics, we refer to van Raan (2019).
The science of science has the potential to help maintain the quality of research, and thus a good use of public funding. There is nowadays an increasing number of scientific papers [de Solla Price (1965); Bornmann and Mutz (2015)], combined with the ubiquitous presence of predatory journals, which publish the papers they receive and charge publication fees, but without performing the fundamental editorial work that guarantees the papers' quality (e.g., quality and pertinence check, referee process) [Bohannon (2013); Sorokowski et al. (2017)]. In such a context, distinguishing bad practices from honest work in scientific publishing becomes more and more challenging. Understanding the underlying dynamics of scientific publication will be instrumental in this endeavor.
The fight against predatory publishing has benefited from the effort of many dedicated citizens, whose initiatives have shown their efficiency [Butler (2013); Grudniewicz et al. (2019)], as well as their limits [Beall (2017)]. Given the proliferation of predatory journals, the task of identifying all of them unequivocally is overwhelming. In such a context, the ability to perform a preliminary data-based sanity check of a given journal would make it possible to focus resources on the more problematic venues. However, such an approach requires an accurate understanding of the quantitative and qualitative characteristics of scientific journals, which is still scarce.
FIG. 1. [...] Rev. D (PRD) among the authors who published in these journals. For each value of n, the height of the bar gives the proportion of authors who published n articles in the corresponding journal. Best distribution fits (see Sec. II A) are displayed for an exponential distribution (gray dotted), a power law (dashed black), a power law with cutoff (dash-dotted black), and a Yule-Simon distribution (dotted black). The arrows indicate significant peaks in the number of authors corresponding to the ATLAS and CMS experiments at CERN. Right: Two-dimensional, color-coded histogram of the number of authors with respect to the number of papers published in PRL (horizontal axis) and PRD (vertical axis).
The quality of a scientist's work is commonly quantified by two different, but related, measures, namely their number of papers and the number of citations thereof (summarized in the h-index [Hirsch (2005); Siudem et al. (2020)]). A vast majority of investigations about the scientific publication process is focused on the citation side. These analyses mostly aim at describing how the citation network impacts the number of citations a given paper is (and therefore its authors are) likely to receive. In particular, evidence suggests that citations follow a cumulative advantage or preferential attachment process, where the more citations a scientist has, the more likely they are to get new citations [de Solla Price (1976)]. This process leads to a power law distribution of citations [Eom and Fortunato (2011); Waltman, van Eck, and van Raan (2012)] or other heavy-tailed distributions [Thelwall (2016)]. Indeed, preferential attachment has been proven to lead to heavy-tailed distributions [Krapivsky, Redner, and Leyvraz (2000)], with some refinements to account for the lifetime of a paper [Parolo et al. (2015)].
As early as 1926, Lotka showed that, in the field of chemistry, the number of scientists having published $N$ papers is proportional to $N^{-2}$ [Lotka (1926)]. In other words, he showed that the distribution of the number of papers published by scientists follows a power law. Later on, the same analysis was extended to other fields of science [e.g., Gupta and Karisiddappa (1996); Wagner-Döbler and Berg (1999); Huber and Wagner-Döbler (2001b,a); Sutter and Kocher (2001); Newby, Greenberg, and Jones (2003); Barrios et al. (2008); Pal (2015)] and refined to more elaborate distributions, such as the power law with cutoff [Saam and Reiter (1999); Kretschmer and Rousseau (2001); Smolinsky (2017)] or the stretched exponential distribution [Laherrère and Sornette (1998)]. Despite this early start, the number of papers published by a scientist has been less investigated than the number of citations that a paper or a scientist gets.
With the objective of refining these past analyses, in this article, we focus on the distribution of the number of papers published by scientists within a given peer-reviewed journal. The distribution of the number of papers is both easily accessible (through any scientific publication data base) and informative. Indeed, various characteristics of the publication dynamics within a journal can be extracted from the aforementioned distribution. We illustrate this claim with the striking examples of Physical Review Letters and Physical Review D, shown in Fig. 1, where the analysis of the distribution emphasizes: (i) an underlying preferential attachment dynamics; (ii) the finiteness of scientific careers; and (iii) the presence of (very) large groups of scientists in the related fields of physics (see caption of Fig. 1 for a detailed discussion).
As interestingly pointed out by Sekara et al. (2018), publishing in a peer-reviewed journal (especially in high-impact ones) is more likely if one author of the manuscript already published in the same journal. Such a process can be interpreted as preferential attachment, and an expected outcome of such an observation is a high representation of a few authors in a given journal [Krapivsky, Redner, and Leyvraz (2000)]. Furthermore, a scientist whose field of research is well-aligned with a journal topic is likely to publish a large proportion of their work in this journal, leading again to a high representation of a few specialized authors in a given journal.
The heavy-tailedness of the distribution of the number of papers is striking in the histograms (see Figs. 1 and 2). Indeed, the tail of the histogram is heavier than the best exponential fit to the data (gray dotted line). However, as we show below, the famous power law is not a good fit to the data either, and the actual distribution lies somewhere between an exponential and a power law. In addition to our analysis of the distribution, we propose an adaptation of the preferential attachment law that models the evolution of the number of papers of a set of authors, within a journal.
FIG. 2. [...] (see Table I for legends). As in Fig. 1, for each value of n, the height of the bar gives the proportion of authors who published n articles in the corresponding journal. The gray dotted line is an exponential fit of the data, emphasizing that the distribution is heavy-tailed. We also show the best fit (MLE), discussed in Sec. II A, for a power law distribution (dashed black), power law with cutoff (dash-dotted black), and Yule-Simon distribution (dotted black). The vertical dashed line indicates the theoretical maximal number of papers if the distribution was the fitted power law (see Sec. IV). The same plots for the other journals are available in the Supplementary Figure 9.
TABLE I. Labels, names, and number of authors in the journals considered. In parentheses are given the reduction year (discussed in Sec. IV) and the number of authors up to this year. One (resp. two) asterisk(s) indicate the journals where authors with one (resp. two) paper(s) are discarded.

II. EMPIRICAL AND FITTED DISTRIBUTIONS
We consider an arbitrary selection of 14 peer-reviewed journals (Table I), whose data are available on the Web of Science data base (WoS, www.webofscience.com). The selected journals vary in age (from a few decades to more than a century) but are not too young, in order to have sufficiently many papers available, and all of them are still publishing nowadays. Although the choice of journals is arbitrary and limited, we tried to cover a diversity of disciplines of the natural sciences and various time spans. The limited sample of journals does not allow us to claim any universality in our results, but we argue that it demonstrates the pertinence of our approach in the quantitative analysis of the scientific publication process.
We denote by $\mathcal{J} = \{\mathrm{NAT}, \mathrm{PNA}, ..., \mathrm{PRL}\}$ the set of journals considered (see Table I for the list of labels). Within each journal $J \in \mathcal{J}$, we index authors by an integer $i = 1, ..., A^{\mathrm{tot}}_J$, $A^{\mathrm{tot}}_J$ being the number of authors who published in journal $J$. Then for each author $i = 1, ..., A^{\mathrm{tot}}_J$, we count the number $n^J_i$ of papers published by author $i$ in journal $J$ up to year 2017 in the whole WoS data base (meaning from year 1900 or the year of the journal's creation, whichever is later). This process yields the set of data $D_J = \{n^J_i : i = 1, ..., A^{\mathrm{tot}}_J\}$, which is a set of $A^{\mathrm{tot}}_J$ integer numbers. We restrict our investigation to papers labeled as "Article" in the WoS data base, to focus on peer-reviewed papers.
From the data set $D_J$ we can compute the number, $\#\{i : n^J_i = n\}$, and the proportion of authors who published $n$ papers,
$$a_J(n) = \frac{\#\{i : n^J_i = n\}}{A^{\mathrm{tot}}_J}, \tag{1}$$
and by definition, $\sum_n a_J(n) = 1$. The proportion $a_J$ is represented in logarithmic scales in Figs. 1, 2, and 9, each panel corresponding to a different journal.
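As an illustration of the definition above, the proportion of authors with n papers can be obtained from the list of per-author paper counts in a few lines of Python. This is only a minimal sketch: the function name and the toy data set are hypothetical and are not taken from the WoS data.

```python
from collections import Counter


def author_proportions(papers_per_author):
    """Return {n: a(n)}, the proportion of authors with exactly n papers."""
    total = len(papers_per_author)
    counts = Counter(papers_per_author)
    return {n: c / total for n, c in sorted(counts.items())}


# Hypothetical data set D_J: one entry per author of the journal.
d_j = [1, 1, 1, 1, 2, 2, 3, 7]
a_j = author_proportions(d_j)
```

By construction, the values of `a_j` sum to one, which is the normalization stated above.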
Remark. Note that we did not take into account the fact that different papers are co-signed by different numbers of authors. Consequently, different papers have different "weights" in the data set. This article is mostly interested in the number of papers from the point of view of the authors. It is then adequate to count, for each author, the number of papers they signed, independently of the number of co-authors. Refining the analysis and taking into account the number of co-authors on each paper would be the purpose of future work.
Note also that we do not take into account papers published anonymously, which represent a large number of papers in medical journals in particular.
Finally, for some journals, the number of authors is too large to be downloaded from the WoS data base. As a consequence, the authors having published only one or two papers in these journals have to be removed from the data (e.g., NAT, PNA, or SCI, indicated by asterisks in Table I).

A. Distribution fitting
Given the apparent heavy-tailedness of the distribution, it is tempting to fit a power law. However, as pointed out by Clauset, Shalizi, and Newman (2009), such fitting should be done with care in order to avoid spurious conclusions [Broido and Clauset (2019)]. We therefore fit three heavy-tailed distributions and assess the goodness-of-fit following Clauset, Shalizi, and Newman (2009), which is encoded in a p-value. Numerical results are summarized in Table II.
For each empirical distribution of the number of papers published by an author $i$ in journal $J$, we fit an exponential distribution (gray dotted lines in Figs. 1 and 2) to emphasize its heavy-tailed behavior. The three heavy-tailed distributions that we fit are:
• A power law distribution (black dashed lines in the figures),
$$P(n; \alpha) = C_\alpha\, n^{-\alpha}, \tag{2}$$
with $\alpha > 1$ and $C_\alpha \in \mathbb{R}$ normalizing the distribution;
• A power law with cutoff (black dash-dotted lines in the figures),
$$P(n; \beta, \gamma) = C_{\beta,\gamma}\, n^{-\beta} e^{-\gamma n}, \tag{3}$$
with $\beta > 1$, $\gamma > 0$, and normalizing constant $C_{\beta,\gamma} \in \mathbb{R}$;
• A Yule-Simon distribution (black dotted lines in the figures),
$$P(n; \rho) = C_\rho\, B(n, \rho + 1), \tag{4}$$
with $\rho > 0$, where $C_\rho \in \mathbb{R}$ is the normalizing constant and $B(x, y)$ is the Euler beta function.
We perform the distribution fitting by optimizing the parameters α, β, γ, and ρ with a Maximum Likelihood Estimator [Clauset, Shalizi, and Newman (2009)]. The curves of the fitted distributions are plotted in Figs. 1, 2, and in the supplementary figure 9, and the fitted parameters are given in Table II. Other distributions (such as log-normal, Lévy, Weibull) were tested and discarded because they were far from matching the data.
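To make the fitting step concrete, the following Python sketch estimates the exponent α of a discrete power law by maximizing the log-likelihood, $\ell(\alpha) = -\alpha \sum_i \ln n_i - A \ln \zeta(\alpha, n_{\min})$, with $A$ the number of data points. It is a minimal illustration and not the implementation used for Table II: the function names, the golden-section optimizer, the search interval, and the synthetic data are all assumptions of ours, and the normalization is approximated by a truncated sum with an Euler-Maclaurin tail correction.

```python
import math


def hurwitz_zeta(alpha, n_min=1, terms=1000):
    """Approximate sum_{n >= n_min} n**(-alpha) for alpha > 1."""
    m = n_min + terms
    head = sum(n ** -alpha for n in range(n_min, m))
    # Euler-Maclaurin estimate of the remaining tail sum_{n >= m}.
    tail = (m ** (1 - alpha) / (alpha - 1) + 0.5 * m ** -alpha
            + alpha * m ** (-alpha - 1) / 12)
    return head + tail


def fit_power_law(data, n_min=1, lo=1.01, hi=6.0, tol=1e-6):
    """Maximum likelihood estimate of alpha by golden-section search."""
    sum_log, size = sum(math.log(n) for n in data), len(data)

    def loglik(alpha):
        return -alpha * sum_log - size * math.log(hurwitz_zeta(alpha, n_min))

    g = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c, d = b - g * (b - a), a + g * (b - a)
        if loglik(c) > loglik(d):
            b = d  # the maximum lies in [a, d]
        else:
            a = c  # the maximum lies in [c, b]
    return (a + b) / 2


# Hypothetical data: author counts proportional to n**(-2.5) for n = 1..200.
data = [n for n in range(1, 201) for _ in range(round(100_000 * n ** -2.5))]
alpha_hat = fit_power_law(data)
```

The golden-section search relies on the log-likelihood having a single maximum in α, which holds here because the power law is a one-parameter exponential family. The other two distributions would require a two-parameter optimizer or a different normalization, which this sketch does not cover.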

B. Goodness-of-fit
To evaluate the goodness of our fits, we again follow Clauset, Shalizi, and Newman (2009), to which we refer for an in-depth discussion of heavy-tailed distribution fitting. The whole goodness-of-fit estimation is summarized in Fig. 3. No set of data is well-fitted by a power law distribution. However, the power law with cutoff seems to be a good fit for three journals (SCI, PLC, CHA), and the Yule-Simon distribution seems to correctly fit the distribution of NEM and SIA. For the other journals, none of the distributions seem to fit the data appropriately.
Let us denote by $\theta_J$ the parameters of the distribution $P(X; \theta)$ (e.g., $\theta_J = \alpha$ for the power law distribution), fitted to the data set $D_J$. We generate 5000 sets of synthetic data $\tilde{D}_i$, $i = 1, ..., 5000$, each of them composed of $A^{\mathrm{tot}}_J = |D_J|$ integer numbers, drawn randomly from the probability distribution $P_J = P(X; \theta_J)$. For each of these synthetic data sets $\tilde{D}_i$, we perform again an MLE to fit the same distribution $P(X; \theta)$, yielding parameters $\tilde{\theta}_i$ and the distribution $P_i = P(X; \tilde{\theta}_i)$.
The goodness-of-fit then relies on how well $F^e$, the empirical cumulative distribution function (ECDF) for a given set of data, matches $F^t$, the theoretical cumulative distribution function (TCDF) of its fitted distribution. We define
$$F^e_i(n) = \frac{\#\{m \in \tilde{D}_i : m \leq n\}}{|\tilde{D}_i|}, \qquad F^t_i(n) = \sum_{m \leq n} P_i(m), \tag{5}$$
and $F^e_J$ and $F^t_J$ are defined similarly with the data set $D_J$. The p-value of the goodness-of-fit is then given by
$$p = \frac{\#\{i : d_{\mathrm{KS}}(F^e_i, F^t_i) > d_{\mathrm{KS}}(F^e_J, F^t_J)\}}{5000}, \tag{6}$$
where the Kolmogorov-Smirnov distance between two cumulative distribution functions $F_1$ and $F_2$ is defined as the maximum difference between them, i.e.,
$$d_{\mathrm{KS}}(F_1, F_2) = \max_n |F_1(n) - F_2(n)|. \tag{7}$$
Namely, $p$ is the proportion of synthetic data sets that are further from the theoretical distribution (in the Kolmogorov-Smirnov sense) than the analyzed data set. The fit is rejected if $p < 5\%$, and considered as good otherwise [see Clauset, Shalizi, and Newman (2009) for more details]. This goodness-of-fit estimation is performed for each journal $J \in \mathcal{J}$ and each distribution listed above (power law, power law with cutoff, and Yule-Simon). The results are presented in Table II and the resulting distributions together with the data are shown in Figs. 1, 2, and in the Supplementary Figure 9.
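The procedure described above (fit, generate synthetic data sets, refit, compare Kolmogorov-Smirnov distances) can be sketched in Python as follows. This is a hedged illustration rather than the code behind Table II. The function `gof_p_value` accepts any triple of fitting, CDF, and sampling routines; to keep the example short and self-contained, we plug in a geometric distribution, chosen only because its MLE, CDF, and sampler have closed forms. All names and the toy data are hypothetical.

```python
import math
import random
from collections import Counter


def ks_distance(data, cdf):
    """Maximum |ECDF - CDF| over the integers 1..max(data)."""
    counts, size = Counter(data), len(data)
    cum, dist = 0, 0.0
    for n in range(1, max(data) + 1):
        cum += counts.get(n, 0)
        dist = max(dist, abs(cum / size - cdf(n)))
    return dist


def gof_p_value(data, fit, make_cdf, sample, n_synth=5000, rng=random):
    """Proportion of synthetic data sets further from their own fit than the data."""
    theta = fit(data)
    d_obs = ks_distance(data, make_cdf(theta))
    further = 0
    for _ in range(n_synth):
        synth = [sample(theta, rng) for _ in data]  # same size as the data
        theta_i = fit(synth)                        # refit each synthetic set
        if ks_distance(synth, make_cdf(theta_i)) > d_obs:
            further += 1
    return further / n_synth


# Stand-in distribution (assumption): geometric, P(n) = (1 - q)**(n - 1) * q.
def fit_geom(data):
    return len(data) / sum(data)  # closed-form MLE of q


def geom_cdf(q):
    return lambda n: 1.0 - (1.0 - q) ** n


def sample_geom(q, rng):
    if q >= 1.0:
        return 1
    u = 1.0 - rng.random()  # uniform in (0, 1]
    return max(1, math.ceil(math.log(u) / math.log(1.0 - q)))


rng = random.Random(0)
bad = [1] * 450 + [40] * 50  # hypothetical, clearly non-geometric data
p_bad = gof_p_value(bad, fit_geom, geom_cdf, sample_geom, n_synth=200, rng=rng)
```

For integer-valued data, both cumulative distribution functions are piecewise constant, so evaluating their difference at the integers up to the largest observation is enough to obtain the maximum. Refitting each synthetic data set, rather than reusing the original parameters, follows the description above.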
As can be seen in Figs. 1, 2, and Supplementary Figure 9, the power law distribution is a poor fit for all data, its p-value being zero for all journals. Indeed, for most of the journals, the tail of the data set is lighter than the tail of its power law fit (black dashed lines). For three journals (namely SCI, PLC, CHA), the p-value of the power law with cutoff is larger than 5% and it seems to be a rather good fit, and for two others (NEM and SIA), the Yule-Simon distribution cannot be excluded.

III. GENERAL DYNAMICS
We argue that the heavy-tailedness observed in the previous section is likely to be a consequence of a preferential attachment or cumulative advantage process. Many social processes are ruled by the so-called preferential attachment [Jeong, Néda, and Barabási (2003)], also called cumulative advantage. Scientific co-authorship [Barabási et al. (2002)], citations [de Solla Price (1976); Eom and Fortunato (2011)], and performance of scientific institutions [van Raan (2007)] are apparently no exception to the rule. For instance, according to Eom and Fortunato (2011), the probability that a paper will get a new citation at time t is proportional to the number of citations this paper already has at time t.
FIG. 3. Scheme of the goodness-of-fit computation. For a given journal $J$, the data set $D_J$ is fitted with a distribution whose parameters are $\theta_J$, and we compute the Kolmogorov-Smirnov (KS) distance between its empirical and theoretical cumulative distribution functions. Then, based on the parameters $\theta_J$, we generate 5000 synthetic data sets $\tilde{D}_i$ for $i = 1, ..., 5000$, on which we repeat the same process. Finally, the p-value is the proportion of synthetic data sets whose empirical and theoretical cumulative distribution functions are further from each other (in the KS sense) than for the original data set $D_J$.
Such processes naturally lead to power laws in the relations between characteristics of the systems of interest. For instance, Katz (1999) showed that the number of citations a scientific community gets is a power law of the number of publications in this community, with positive exponent (≈ 1.27). More recently, Bettencourt et al. (2010) illustrated that the Gross Metropolitan Product of a city is a power law of its population, with positive exponent (≈ 1.126). In a similar spirit, Barabási and Albert (1999) showed that the empirical probability that a web page is targeted by k other pages follows a power law with negative exponent (≈ −2.1).
It is reasonable to expect that the evolution of the number of papers published by an author in a given journal is described by a similar preferential attachment process. We support the hypothesis of a preferential attachment or cumulative advantage process by two distinct but similar analyses of publication data.
Remark. Notice that even though we refer to the two analyses below as preferential attachment and cumulative advantage respectively, these two denominations fundamentally refer to the same general process [Perc (2014)]. The main reason for us to use these two denominations is to distinguish the two analyses. Furthermore, the line of reasoning underlying each of our analyses is inspired by the definition of the corresponding notion ("preferential attachment" or "cumulative advantage").

A. Preferential attachment
Heuristically, our first argument is that if an author published a lot of papers in a journal, it means (i) that they write a lot of papers, and (ii) that their research topic is well-aligned with the scope of the journal (for specialized journals), or that the scientific impact of this author's research matches the standards of the journal (for interdisciplinary journals). Assumptions (i) and (ii) together imply that this author is likely to publish again in this journal. We refer to this process as preferential attachment.
The above heuristic can be made more rigorous. For a given journal and for $k, t \in \mathbb{Z}_{\geq 0}$, we define:
• $S(k, t)$: the set of all authors who have published $k$ papers on December 31st of year $t - 1$;
• $A_k(t) = \#S(k, t)$: the number of authors in the set $S(k, t)$;
• $N_k(t)$: the number of papers published during year $t$ by all the authors in the set $S(k, t)$;
• $\rho_k(t) = N_k(t)/A_k(t) \in \mathbb{R}$: the average number of papers published during year $t$, by the authors in the set $S(k, t)$.
In Fig. 4, we plot the values of $\rho_k(t)$ with respect to the number of papers $k$ for years $t \in \{1999, ..., 2008\}$ for SCI, LAN, and PRL (each point corresponds to one year $t$ and one number of papers $k$). For each of the three journals, these values have a linear correlation coefficient larger than 0.7, supporting a fairly good linear dependence,
$$\rho_k(t) \propto k. \tag{8}$$
Note that, for each year considered, we do not take into account authors who did not publish, because the majority of those are not active anymore. The empirical probability that a new paper is signed by an author with $k$ papers is then close to being proportional to $k$. Krapivsky, Redner, and Leyvraz (2000) rigorously proved that, if the relation in Eq. (8) was exactly proportional, then after a long enough time, the distribution of the number of papers over the set of authors would be a power law with exponent $-\alpha \leq -2$. The fact that the relation (8) is not exactly proportional, but close to it, probably explains that the observed distributions have tails that are heavy, but lighter than the power law, as suggested in Figs. 1 and 2.
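The quantities $S(k, t)$, $A_k(t)$, $N_k(t)$, and $\rho_k(t)$ introduced above can be computed directly from a list of authorship records. The Python sketch below assumes a hypothetical input format, one (author, publication year) pair per signed paper, and a flag `active_only` that mimics the restriction to authors who published during the year considered. Neither the format nor the names come from the WoS export.

```python
from collections import Counter, defaultdict


def rho_by_k(records, year, active_only=True):
    """Return {k: rho_k(year)} from (author, publication_year) pairs."""
    before = Counter(a for a, y in records if y < year)   # papers up to year - 1
    during = Counter(a for a, y in records if y == year)  # papers during `year`
    pool = set(during) if active_only else set(before) | set(during)
    n_authors, n_papers = defaultdict(int), defaultdict(int)
    for author in pool:
        k = before[author]              # Counter returns 0 for unseen authors
        n_authors[k] += 1               # contributes to A_k(t)
        n_papers[k] += during[author]   # contributes to N_k(t)
    return {k: n_papers[k] / n_authors[k] for k in sorted(n_authors)}


# Hypothetical records: one (author, year) pair per signed paper.
records = [("a", 2000), ("a", 2001), ("a", 2002), ("a", 2002),
           ("b", 2001), ("b", 2002), ("c", 2002), ("d", 2000)]
rho_active = rho_by_k(records, 2002)
rho_all = rho_by_k(records, 2002, active_only=False)
```

In this toy example, author "d" published before 2002 but not during 2002, so it lowers $\rho_1$ only when inactive authors are kept. This illustrates why the restriction to active authors matters when testing the proportionality in $k$.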

B. Cumulative advantage
The concept of cumulative advantage, which is directly related to preferential attachment, has been derived from the seminal work of Merton [Merton (1968, 1988)] and Price [de Solla Price (1976)], and the follow-up by Katz [Katz (1999)]. Cumulative advantage emphasizes that an initial advantage leads to a disproportionate advantage in the future. For instance, it has been shown that, if author i has twice as many publications as author j, then they are likely to get more than twice as many citations [Katz (1999)].
In the context of interest for this article, cumulative advantage translates as follows. Assume that author $i$ and author $j$ have respectively $n_i(t_0)$ and $n_j(t_0)$ papers in a journal at time $t_0$, with a ratio $\eta_{ij}(t_0) = n_i(t_0)/n_j(t_0) > 1$. Then cumulative advantage means that, at a later time $t_1 > t_0$, the ratio $\eta_{ij}(t_1) \geq \eta_{ij}(t_0)$, implying that author $i$ gains a disproportionate advantage over time. Mathematically speaking, cumulative advantage implies the following equivalences,
$$\eta_{ij}(t_1) \geq \eta_{ij}(t_0) \iff \frac{n_i(t_1)}{n_j(t_1)} \geq \frac{n_i(t_0)}{n_j(t_0)} \iff \frac{n_i(t_1)}{n_i(t_0)} \geq \frac{n_j(t_1)}{n_j(t_0)} \iff \xi_i(t_0, t_1) \geq \xi_j(t_0, t_1), \tag{9}$$
where we defined $\xi_i(t, s) = n_i(s)/n_i(t)$, and where equalities hold if the relation in Eq. (8) is exact. In order to support the presence of a cumulative advantage in the publication within the journals SCI, LAN, and PRL, we computed $\xi_i(1999, 2008)$ for each author who published between 1999 and 2008. The statistics of $\xi_i$ are shown in Fig. 5 as a function of the initial number of papers $n_i(1999)$. Even though the data are not perfectly conclusive, we clearly observe an increasing trend of $\xi_i$ as a function of $n_i$, suggesting that the relation of Eq. (9) may be satisfied. This observation supports (at least partly) a cumulative advantage process, and hence the presence of a power law.
The increasing trends in Fig. 5 even suggest a superlinear cumulative advantage [Zhou et al. (2007); Krapivsky and Krioukov (2008)]. Indeed, as mentioned above, if the relation Eq. (8) was exact, $\xi_i(t_0, t_1)$ would be constant with respect to $n_i(t_0)$. In the superlinear case, the heavy-tailed distribution observed in Figs. 1, 2, and 9 would be the transient state of the distribution discussed by Krapivsky and Krioukov (2008). A more in-depth analysis of the possibility of a superlinear cumulative advantage could be done, following the calibration approach proposed by Zadorozhnyi and Yudin (2015), but goes beyond the purpose of this article and will be treated in future work.
TABLE III. Fitted parameters and p-value of the goodness-of-fit for power law (PL), power law with cutoff (PLwC), and Yule-Simon (Y-S) distributions, for the 9 journals with reduced time span. We see that the only data that are well-approximated by the power law are for NAT when reduced to the first 3374 entries of WoS. The power law with cutoff, however, seems to be a good fit for the reduced data of six journals (NAT, PNA, SCI, LAN, TAC, and ENE). ENE is particularly well-fitted by the power law with cutoff. Finally, the Yule-Simon distribution seems to correctly fit the distribution of PAN, PLC, and ACS. For the other journals, none of the distributions seem to fit the data appropriately. Note that the reduced data of NAT and PNA are correctly fitted by two distributions, indicating that the amount of data is probably not sufficient for a good fit.

IV. KEY PLAYERS
The general shape of the distribution of the number of papers per author is quite clear in our analysis: it seems to lie somewhere between an exponential distribution and a power law. Since the power law has the heaviest tail of the three distributions considered (power law, power law with cutoff, and Yule-Simon), we use it to estimate an upper bound on the number of papers published by an author for each journal. Assuming that the data are well-described by the power law distribution in Eq. (2), one can compute the number of authors with $n$ papers in journal $J$, $A_n \approx A^{\mathrm{tot}}_J C_\alpha n^{-\alpha}$. Setting this number to $A_n = 1$, the maximal number of papers is given by $n_{\max} \approx (A^{\mathrm{tot}}_J C_\alpha)^{1/\alpha}$, determining a theoretical upper bound on the number of papers published by an author for each journal, shown as the vertical dashed lines in Figs. 1, 2, and 9.
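As a worked example of this bound, the Python sketch below solves $A^{\mathrm{tot}}_J C_\alpha n^{-\alpha} = 1$ for $n$. It assumes, for illustration only, that the power law is normalized over all $n \geq 1$, so that $C_\alpha = 1/\zeta(\alpha)$. For journals where authors with one or two papers are discarded, the normalization would start at a larger $n$ instead. The function names and the numerical values are hypothetical and do not correspond to any journal in Table I.

```python
import math


def zeta(alpha, terms=1000):
    """Approximate the Riemann zeta function, sum_{n >= 1} n**(-alpha), alpha > 1."""
    head = sum(n ** -alpha for n in range(1, terms))
    m = terms
    # Euler-Maclaurin estimate of the tail sum_{n >= terms}.
    tail = (m ** (1 - alpha) / (alpha - 1) + 0.5 * m ** -alpha
            + alpha * m ** (-alpha - 1) / 12)
    return head + tail


def max_papers(a_tot, alpha):
    """Solve a_tot * C_alpha * n**(-alpha) = 1 for n."""
    c_alpha = 1.0 / zeta(alpha)  # assumption: support is n >= 1
    return (a_tot * c_alpha) ** (1.0 / alpha)


# Hypothetical journal with 100,000 authors and fitted exponent alpha = 2.
n_max = max_papers(a_tot=100_000, alpha=2.0)
```

With these hypothetical values, $C_\alpha = 6/\pi^2$ and the bound evaluates to roughly 247 papers. An author far above such a value would appear to the right of the vertical dashed line in the histograms.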
In some journals (see e.g., PNA, CHA, SIA, and AMA in Fig. 2, and NEM and ACS in the Supplementary Figure 9), it appears that some authors, whom we refer to as key players, publish significantly more papers in a journal than what the power law would predict. Note that we checked that these key players are not artifacts due to multiple authors having the same name, which would count as the same person.
In order to make the data of different journals more comparable, we restricted our investigation to the early years between 1900 (earliest possible in WoS) and the year in parentheses in the second column of Table I for our first nine journals in the table. This yields a number of authors comparable to the three following journals in Table I (CHA, SIA, and AMA). The reduced number of authors is given in parentheses in the third column of Table I. The resulting distributions are depicted in Fig. 6 and in the Supplementary Figure 10, and the fitted parameters are detailed in Table III. It appears from Fig. 6 and the Supplementary Figure 10 that for such a reduced number of authors, the overshoot of some authors is more systematic, suggesting that in the early years of a scientific journal, there are usually a few very prolific authors publishing in it at a rather high rate.
FIG. 6. Histograms of the number of papers n published in the six journals indicated in the insets, among the authors who published in these journals (see Table I for legends). Data are restricted to the years between 1900 (earliest possible in WoS) and the years indicated in the insets. The number of authors covered is given in parentheses in the third column of Table I. As in Figs. 1 and 2, for each value of n, the height of the bar gives the proportion of authors who published n articles in the corresponding journal. We show the best fit for a power law distribution (dashed black), power law with cutoff (dash-dotted black), and Yule-Simon distribution (dotted black). The vertical dashed line indicates the theoretical maximal number of published papers if the distribution was the fitted power law (see Sec. IV). We observe that some authors almost systematically exceed this number of papers. The same plot for other journals is available in the Supplementary Figure 10.
Considering the results of the fitting in Table III, we observe better agreement than for the full data sets. This probably indicates that the sample size is not large enough to accurately fit heavy-tailed distributions, which need large samples. The fact that NAT and PNA are well-fitted by two distributions also indicates that the reduced data sets are not large enough to be conclusive.

V. MODELING
We observe in Figs. 1, 2, and in the Supplementary Figure 9 that for old journals where a lot of papers are published, the tail of the histogram has a rather fast decay after a heavy-tailed regime (this is particularly striking in PRL and PRD, Fig. 1). We explain this observation by the fact that the number of publications of a given author depends on two parameters, namely their publication rate and the length of their career. Both these quantities are bounded in practice and even if it is possible to publish a very large number of papers in a given journal, there is a practical limit to this number. We hypothesize that the decay in the histograms of long-living journals comes from the finiteness of publication rates and career lengths.
To support our hypothesis, we propose a model to generate data sets that mimic the distributions observed above. As discussed, this model is built on two main dynamics. Fundamentally, it is a preferential attachment process, where the likelihood that a researcher is in the author list of a new paper is proportional to the number of papers this researcher already has in this journal. In addition, it is refined with a limited career span, requiring that after some time, the likelihood that a researcher publishes a new paper decreases to reach zero when they retire.
The model is based on five parameters:
• $N_y \in \mathbb{Z}_{\geq 0}$: The number of years, i.e., the number of iterations, over which the model is run;
• $N_p \in \mathbb{Z}_{\geq 0}$: The number of papers that are published every year in the synthetic journal;
• $\rho_0 \in [0, 1]$: The proportion of papers that are authored by new researchers who have not yet published in the synthetic journal;
• $T_{\min}, T_{\max} \in \mathbb{Z}_{\geq 0}$: The likelihood that an author publishes a new paper decreases linearly after their $T_{\min}$-th year of activity, until reaching zero at their $T_{\max}$-th year of activity. We illustrate this likelihood in Fig. 7.
The model is arbitrarily initialized with some number of authors, each with a few papers in the synthetic journal, gathered in the data set $D(0) = \{n_1(0), n_2(0), ..., n_{A(0)}(0)\}$. Then for each year $t \in \{1, ..., N_y\}$ where the model is run, $N_p$ papers are attributed randomly either to new authors (i.e., who have not yet published) with probability $\rho_0$, or to an existing author with probability $1 - \rho_0$. If a paper is attributed to an existing author, the probability that it is attributed to author $i$ is:
• proportional to $n_i(t)$, the number of papers published by $i$ at year $t$;
• linearly decreasing for $T_i(t) \in [T_{\min}, T_{\max}]$, where $T_i(t)$ is the "academic age" of $i$, which is the number of iterations between $t$ and the first publication year of $i$.
Mathematically, knowing that the new paper is attributed to an existing author, the probability that it is attributed to author $i$ at year $t$ is given by
$$P(i, t) = \frac{n_i(t)}{Z(t)} \times \begin{cases} 1, & T_i(t) \leq T_{\min}, \\ \dfrac{T_{\max} - T_i(t)}{T_{\max} - T_{\min}}, & T_{\min} < T_i(t) < T_{\max}, \\ 0, & T_i(t) \geq T_{\max}, \end{cases}$$
where $Z(t)$ is the appropriate normalizing factor. The actual implementation of this model is available online [Delabays (2022)].
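For readers who wish to experiment with the model, the following Python sketch reproduces its two ingredients, preferential attachment and a linearly decaying career weight. It is a simplified illustration under our own assumptions and not the implementation of Delabays (2022): every initial author starts with one paper at year 0, the weights are frozen at the start of each year, each new-author paper creates a distinct author, and the parameter values in the usage line are hypothetical.

```python
import random


def career_weight(age, t_min, t_max):
    """Factor 1 up to t_min, linear decay to 0 at t_max, 0 afterwards."""
    if age <= t_min:
        return 1.0
    if age >= t_max:
        return 0.0
    return (t_max - age) / (t_max - t_min)


def simulate(n_years, n_papers, rho0, t_min, t_max, n_init=10, seed=0):
    """Return the list of paper counts per author after n_years iterations."""
    rng = random.Random(seed)
    papers = [1] * n_init        # n_i: papers per author (assumed initial state)
    first_year = [0] * n_init    # year of first publication of each author
    for year in range(1, n_years + 1):
        # Weights n_i(t) * career factor, frozen at the start of the year.
        weights = [n * career_weight(year - f, t_min, t_max)
                   for n, f in zip(papers, first_year)]
        n_new = sum(rng.random() < rho0 for _ in range(n_papers))
        if sum(weights) <= 0:
            n_new = n_papers     # no active author left: only new authors
        n_old = n_papers - n_new
        if n_old:
            chosen = rng.choices(range(len(papers)), weights=weights, k=n_old)
            for i in chosen:
                papers[i] += 1
        papers.extend([1] * n_new)          # each new author gets one paper
        first_year.extend([year] * n_new)
    return papers


# Hypothetical parameters: 50 years, 100 papers per year, 20% new authors.
counts = simulate(n_years=50, n_papers=100, rho0=0.2, t_min=20, t_max=30)
```

The returned list plays the role of the data set $D(N_y)$ and can be turned into a histogram of the proportion of authors per number of papers. Freezing the weights within a year is a design choice that keeps each yearly draw a single weighted sampling step; updating the weights after every paper would be a possible variant.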
Histograms of the outcome of this model are shown in Fig. 8, and the fitted parameters are given in Table IV. We observe a clear similarity between the histograms for synthetic and real data. Namely, for a short lifetime (N_y = 50), some authors beat the power law and exceed the number of papers that would be expected, as is observed in Fig. 2 for CHA, SIA, and AMA. For a longer lifetime (N_y = 150), the tail of the distribution decays and loses its heaviness, similarly to PRL and PRD in Fig. 1.
These observations support the hypothesis that the two main ingredients in the description of the evolution of authorship within journals are preferential attachment and the finiteness of careers.

VI. DISCUSSION
The main observation of our article is the heavy-tailed shape of the distribution of papers, which we explain by a preferential attachment or cumulative advantage process. Heavy-tailedness in distributions related to scientific publications, especially in citation or collaboration networks, has been widely documented [de Solla Price (1976); Eom and Fortunato (2011)]. We showed that heavy-tailedness is preserved when restricting the analysis to a single journal.
Interestingly, our analysis suggests that the distribution does not follow a power law, but has a slightly lighter tail. Although we have not been able to unequivocally identify a canonical distribution, we showed that a power law with cutoff and a Yule-Simon distribution both seem to fit the data better than a pure power law.
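As an illustration of how such candidate distributions can be compared, the sketch below evaluates the log-likelihood of a sample of paper counts under a discrete power law, a power law with exponential cutoff, and a Yule-Simon distribution, and maximizes each over a coarse parameter grid. This is not the fitting procedure used in the article. The truncation `N_MAX`, the parameter grids, and the toy sample `papers` are all assumptions made for the example.

```python
import math

N_MAX = 10_000  # assumption: support truncated at N_MAX to normalize numerically


def loglik_power_law(data, alpha, lam=0.0):
    """Log-likelihood of p(n) ∝ n^(-alpha) * exp(-lam * n) on n = 1, ..., N_MAX.

    lam = 0 gives the pure power law, lam > 0 the power law with cutoff.
    """
    z = sum(k ** -alpha * math.exp(-lam * k) for k in range(1, N_MAX + 1))
    return (sum(-alpha * math.log(n) - lam * n for n in data)
            - len(data) * math.log(z))


def loglik_yule_simon(data, rho):
    """Log-likelihood of the Yule-Simon law p(n) = rho * B(n, rho + 1), n >= 1."""
    def log_beta(a, b):
        return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return sum(math.log(rho) + log_beta(n, rho + 1.0) for n in data)


def grid_max(loglik, data, grid):
    """Return (best log-likelihood, best parameters) over a list of tuples."""
    return max((loglik(data, *params), params) for params in grid)


# Hypothetical toy sample of papers per author (not data from the article).
papers = [1] * 60 + [2] * 18 + [3] * 9 + [4] * 5 + [6] * 3 + [9] * 2 + [15, 22, 40]

alphas = [1.5 + 0.1 * j for j in range(16)]
lams = [0.0, 0.01, 0.02, 0.05, 0.1]
rhos = [0.5 + 0.1 * j for j in range(26)]

fits = {
    "power law": grid_max(loglik_power_law, papers, [(a,) for a in alphas]),
    "power law with cutoff": grid_max(
        loglik_power_law, papers, [(a, l) for a in alphas for l in lams]),
    "Yule-Simon": grid_max(loglik_yule_simon, papers, [(r,) for r in rhos]),
}
for name, (ll, params) in fits.items():
    print(f"{name}: log-likelihood = {ll:.2f} at parameters {params}")
```

Because the power law is the special case lam = 0 of the power law with cutoff, the latter can never have a lower maximized log-likelihood on the same grid. A fair comparison therefore has to account for the extra parameter, for instance with a likelihood-ratio test or an information criterion.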
We argue that the observed heavy-tailedness of the distribution follows from a preferential attachment process, based on three pieces of evidence. First, we showed that the probability that an author gets a new paper in a given journal at time t is approximately proportional to the number of papers they already have in the very same journal. According to Krapivsky, Redner, and Leyvraz (2000), exact proportionality would lead to a power law. Therefore, it is likely that an approximate proportionality leads to a heavy-tailed distribution.
Second, we emphasized an approximate cumulative advantage process, which also leads to power-law behavior. Although what we refer to as preferential attachment and cumulative advantage are closely related, they represent two distinct underlying mechanisms explaining the heavy-tailedness of the distributions.
Finally, we provided a mathematical model for generating synthetic data of number of papers in a given journal, where preferential attachment plays a crucial role. The similarity between the obtained distribution and the observed distributions also supports the claim of the heavy tails being driven by preferential attachment.
Even though there seems to be a pattern in the data analyzed in this article, standard distributions (e.g., power law with cutoff, Yule-Simon) do not perfectly fit the data. More advanced fitting techniques could identify a common distribution for all journals, provided that one exists. A more refined explanation of the approximate preferential attachment taking place in scientific publishing could reveal with more certainty the source of the distributions observed in this article. Even though preferential attachment has been emphasized in the past, the underlying reasons for this bias are intricate. Disentangling the impact of scientific factors (quality and novelty of the research) and more social ones (rank and reputation of the authors) in the publication process will be a key step towards a fair and square evaluation of scientists and their work.
Figure. [...] Table I for legends). As in Figs. 1 and 2, for each value of n, the height of the bar gives the proportion of authors who published n articles in the corresponding journal. The gray dotted line is the exponential fit of the data, emphasizing that the distribution is heavy-tailed. We show the best fit for a power law distribution (dashed black), power law with cutoff (dash-dotted black), and Yule-Simon distribution (dotted black). The vertical dashed line indicates the theoretical maximal number of published papers if the distribution were the fitted power law.
Figure. Histograms of the number of papers n published in the six journals indicated in the insets, among the authors who published in these journals (see Table I for legends). Data are restricted to the years between 1900 (earliest possible in WoS) and the years indicated in the insets. The number of authors covered is given in parentheses in the third column of Table I. As in Figs. 1 and 2, for each value of n, the height of the bar gives the proportion of authors who published n articles in the corresponding journal. We show the best fit for a power law distribution (dashed black), power law with cutoff (dash-dotted black), and Yule-Simon distribution (dotted black). The vertical dashed line indicates the theoretical maximal number of published papers if the distribution were the fitted power law. We observe that, almost systematically, some authors exceed this theoretical maximum.