Consistency pays off in science

Abstract The exponentially growing number of scientific papers stimulates a discussion on the interplay between quantity and quality in science. In particular, one may wonder which publication strategy may offer more chances of success: publishing lots of papers, producing a few hit papers, or something in between. Here we tackle this question by studying the scientific portfolios of Nobel Prize laureates. A comparative analysis of different citation-based indicators of individual impact suggests that the best path to success may rely on consistently producing high-quality work. Such a pattern is especially rewarded by a new metric, the E-index, which identifies excellence better than state-of-the-art measures.


I. INTRODUCTION
The number of scientific papers has been growing exponentially for over a century [1,2]. The number of papers per author has been relatively stable for a long time, but it has been increasing over the past decades [1], favored by the growing tendency of scientists to work in teams [3].
Such increased productivity is incentivized by career evaluation criteria that typically reward large outputs, making scientists less risk averse when choosing research directions [4]. This, however, may come at the expense of the quality of research outcomes [5,6]. Indeed, it has been shown that the exponential growth of the number of publications corresponds to a much slower increase in the number of new or disruptive ideas [7,8]. However, while scholars should focus on quality, it is unclear whether it is more rewarding to pursue rare hit papers, to maintain a consistent track record of valuable outputs, or to fall somewhere between these scenarios. Analyzing the careers of arguably the most successful class of scientists, Nobel Prize laureates, may help address this issue. In particular, we would like to check whether there is a dominant path to success in the careers of such illustrious scholars.
To that end, we consider a broad range of evaluation metrics, from those that reward one-hit wonders to those that favor a consistent production of high-quality research, and investigate their effectiveness in identifying Nobelists within a more extensive set of similarly productive scientists. We find that the best-performing metrics are indeed the ones that prioritize a consistent stream of high-quality research.
The rest of this article is organized as follows. We first describe the data collection and curation in Sec. II. Then, we briefly review some popularly adopted impact metrics and introduce two new ones. In Sec. III, we describe and discuss the two sets of experiments we used to check which of the two competing scenarios is more common. Finally, we give our conclusions in Sec. IV.

II. METHODS

A. Data
We consider three fields in which the Nobel Prize is awarded: Physics, Chemistry, and Physiology or Medicine (abbreviated henceforth as Medicine).
The publication records of scientists are obtained from two sources. For Nobelists, we use the hand-curated dataset with explicit annotations for prize-winning papers [9]. As a baseline, we consider scientists with verified Google Scholar (GS) profiles tagged with either Physics, Chemistry, Physiology, or Medicine as of May 2021.
We use the 2017 version of the Web of Science (WoS) database to compile the citation statistics of the articles. We purposely gather data from different sources, as WoS and GS complement each other well. GS offers the possibility of obtaining accurate publication records of individual scientists without the need to perform name disambiguation [10]. WoS lets us reconstruct the citation history of individual papers. Both ingredients are necessary for the type of analysis that we perform in this paper.
We adopt a methodology similar to that of Sinatra et al. [11] to match papers across databases. Given a paper p written by author a in GS, we list the papers P_a in WoS authored by people with the same last name as a. From P_a, we select the paper p* with the highest normalized Levenshtein similarity [12] between its title and the title of p. We consider it a successful match only if the similarity exceeds 90%. Otherwise, we discard p from further analysis. Following this procedure, we could match 78.1% of the papers by Nobelists and 49.6% of the papers by baseline scientists. For our analysis, we only consider scientists who published their first paper after 1960 and have a portfolio with at least ten papers. Detailed statistics are provided in Table I.
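For concreteness, the title-matching step can be sketched in a few lines of Python. This is a minimal illustration of normalized Levenshtein similarity with a 90% threshold, not the actual matching pipeline; all function names are ours.

```python
def levenshtein(a, b):
    # Classic dynamic-programming edit distance between two strings.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def normalized_similarity(a, b):
    # 1 - distance / max length, so identical titles score 1.0.
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))

def best_match(gs_title, wos_titles, threshold=0.9):
    # Return the candidate WoS title most similar to the GS title,
    # or None if even the best similarity does not exceed the threshold.
    if not wos_titles:
        return None
    best = max(wos_titles, key=lambda t: normalized_similarity(gs_title, t))
    return best if normalized_similarity(gs_title, best) > threshold else None
```

Titles that differ only by punctuation or minor typos survive the 90% cut, while unrelated titles are discarded, which is the behavior the matching procedure relies on.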
B. Metrics

We consider the following citation-based indicators of a portfolio of N papers with citation counts c_i:
N: the total number of papers.
C_tot: the total number of citations.
C_avg: the average number of citations per paper, C_tot/N.
C_max: the number of citations of the most cited paper.
H: H-index, i.e., the largest number H of the top-cited papers with at least H citations each [13].
G: G-index, i.e., the largest number G of the top-cited papers with at least G^2 combined citations [14].
Q_10: the Q-index [11], given (up to a constant factor) by

Q_10 = exp[ Σ_i Θ(ĉ_10,i) log ĉ_10,i / Σ_i Θ(ĉ_10,i) ],

where Θ is the Heaviside function, i.e., Θ(x) = 1 if x > 0 and 0 otherwise, and ĉ_10,i is the number of citations gained by paper i within 10 years of publication, normalized by dividing it by the average c_10 of all papers published in the same discipline and year as paper i [11].
Q: a variant of the unnormalized Q-index, where we use the total number of citations c_i instead of c_10,i.
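The list above can be made concrete with a short sketch. The following is our own minimal Python illustration, not code from the study: H, G, and the variant Q, with Q read (up to a constant factor) as the geometric mean of the citation counts of the cited papers, the Θ factor simply skipping uncited ones.

```python
import math

def h_index(citations):
    # Largest H such that at least H papers have >= H citations each.
    ranked = sorted(citations, reverse=True)
    return sum(1 for rank, c in enumerate(ranked, 1) if c >= rank)

def g_index(citations):
    # Largest G such that the top G papers have >= G^2 citations combined.
    ranked = sorted(citations, reverse=True)
    cum, g = 0, 0
    for rank, c in enumerate(ranked, 1):
        cum += c
        if cum >= rank * rank:
            g = rank
    return g

def q_variant(citations):
    # Geometric mean of the citation counts of the cited papers only.
    cited = [c for c in citations if c > 0]
    if not cited:
        return 0.0
    return math.exp(sum(math.log(c) for c in cited) / len(cited))
```

For the portfolio [10, 8, 5, 4, 3], for instance, these give H = 4 and G = 5.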
We observe that these measures have their unique preferences for ranking portfolios. Some, like C_max, appear to reward one-hit wonders, while others, like H, reward consistency. One of the goals of this work is to identify and differentiate Nobelists from baseline scientists. Therefore, we argue that we need a new, simple, yet interpretable metric covering the whole portfolio spectrum.

C. Citation Moment and E-index
Given a publication portfolio P, one may consider the following extreme scenarios:
• Citations are equally distributed among the papers, with each paper having C_tot/N citations.
• A single paper accounts for all citations.
In the first case, there is a sustained production of work of similar quality, while the second represents a one-hit-wonder situation.

Citation Moment. We propose the citation moment M_α, a new parametric measure that can reward both scenarios, as well as the ones in between, depending on the value of the parameter α. It is defined as

M_α(P) = [ (1/N) Σ_i c_i^α ]^(1/α),

where α is a real positive number. We remark that M_α is essentially an average of the citation scores of the papers, where the weight of each score is modulated by the exponent α. We can make the following observations on the behavior of our metric for different values of α:
α → 0: M_α behaves like Q, as c^α ≈ 1 + α log c, but unlike Q, it accounts for uncited papers.
0 < α < 1: M_α is higher for balanced portfolios, i.e., ones with a more uniform distribution of citations.
α = 1: M_α coincides with the average number of citations C_avg.
α > 1: M_α is higher for portfolios whose citations are concentrated in a few papers, approaching C_max as α grows.
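Reading M_α as the power mean of order α of the citation counts, a form consistent with the limiting behaviors described above (α = 1 recovering C_avg, small α favoring balance), a minimal sketch of our own is:

```python
def citation_moment(citations, alpha):
    # M_alpha(P) = ((1/N) * sum_i c_i**alpha) ** (1/alpha):
    # the generalized mean of order alpha of the citation counts.
    n = len(citations)
    return (sum(c ** alpha for c in citations) / n) ** (1.0 / alpha)

# alpha = 1 is just the average number of citations C_avg.
assert citation_moment([2, 4, 6], 1) == 4.0
# alpha < 1 favors the balanced portfolio over the one-hit one...
assert citation_moment([5, 5], 0.5) > citation_moment([10, 0], 0.5)
# ...while alpha > 1 favors the concentrated one.
assert citation_moment([10, 0], 2) > citation_moment([5, 5], 2)
```

The two portfolios in the assertions have the same C_tot and N, so the sign of the comparison is driven entirely by α, which is exactly the tunable behavior the metric is designed to provide.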

E-index.
We also propose an additional parameter-free measure that, like M_α, is sensitive to the distribution of citations. We call this metric E-index, defined as

E(P) = -C_avg Σ_i (c_i / C_tot) log (c_i / C_tot),

which reaches its maximum C_avg log N when citations are distributed equally among papers, favoring authors with large average numbers of citations. In fact, E(P) is just the product of the average number of citations C_avg and of the Shannon entropy of the citation distribution.
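A minimal sketch of the E-index as just described, i.e., C_avg times the Shannon entropy of the citation shares (our own illustration, with the usual convention 0 log 0 = 0):

```python
import math

def e_index(citations):
    # E(P) = C_avg * Shannon entropy of the citation shares c_i / C_tot.
    c_tot = sum(citations)
    if c_tot == 0:
        return 0.0
    c_avg = c_tot / len(citations)
    entropy = -sum((c / c_tot) * math.log(c / c_tot)
                   for c in citations if c > 0)  # 0 log 0 taken as 0
    return c_avg * entropy
```

Equally cited papers attain the maximum C_avg log N, while a one-hit portfolio has zero entropy and hence E = 0, which is how the metric penalizes extreme concentration.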

D. Behavior of metrics on stylized portfolios
To better understand the behavior of the different metrics in our analysis, we consider a portfolio with n cited papers with C_tot/n citations each and N − n uncited papers. In Table II, we show the values that several key metrics take in this case.
We see that the citation moment M_α (for α ≠ 0, 1), the E-index, and the G-index depend on n, N, and C_tot. The H-index and the Q depend only on the cited papers. So, for example, two portfolios with identical values of C_tot and n would have the same H-index, regardless of the number of uncited papers. Furthermore, even though the G-index depends on all three parameters, it depends on them in a somewhat undesirable way. For example, a portfolio with a single cited paper holding all C_tot citations has G = min(N, ⌊√C_tot⌋), so adding uncited papers can actually increase the G-index.
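The dependence (or lack thereof) on uncited papers can be checked numerically on the stylized portfolios. In this self-contained sketch (our own implementations, following the definitions given earlier), the H-index of a stylized portfolio is unchanged when uncited papers are added, while the E-index is diluted:

```python
import math

def h_index(citations):
    ranked = sorted(citations, reverse=True)
    return sum(1 for rank, c in enumerate(ranked, 1) if c >= rank)

def e_index(citations):
    c_tot = sum(citations)
    if c_tot == 0:
        return 0.0
    c_avg = c_tot / len(citations)
    return c_avg * -sum((c / c_tot) * math.log(c / c_tot)
                        for c in citations if c > 0)

def stylized(n, big_n, c_tot):
    # Stylized portfolio: n papers with c_tot/n citations each,
    # plus big_n - n uncited papers.
    return [c_tot // n] * n + [0] * (big_n - n)

small = stylized(5, 10, 100)   # 5 cited papers among 10
large = stylized(5, 50, 100)   # same 5 cited papers among 50

# The H-index ignores the uncited papers entirely...
assert h_index(small) == h_index(large) == 5
# ...while the E-index is diluted by them.
assert e_index(small) > e_index(large)
```

This is the behavior summarized in Table II: metrics blind to N cannot distinguish a lean portfolio from one padded with uncited papers.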

III. RESULTS

In Fig. 1, we plot Nobelists and baseline scientists according to their number of papers and their total number of citations. As expected, most Nobelists lie in the top-right region, indicating high levels of both productivity and impact. However, a few Nobelists appear in the top-left region, indicating that they produced only a handful of high-impact papers. To further illustrate this difference, we consider two Nobelists in Physics, David J. Gross (2004) and John M. Kosterlitz (2016), and plot their publication timelines in Fig. 2. Gross has a consistent production of high-impact works, while Kosterlitz stands out for having a single big paper.
We now focus on two tasks: portfolio classification and future Nobelist identification.

A. Portfolio classification
We test the performance of the metrics in distinguishing the portfolios of Nobelists from those of the baseline scientists. We consider two subtasks, which we describe below. In each task, we use the area under the precision-recall curve (AUC-PR) as the performance metric. This curve shows the trade-off between precision and recall at different thresholds. Bounded between 0 and 1, higher AUC-PR values indicate better classification performance. For random predictions, the AUC-PR equals the fraction of positive samples. AUC-PR is better suited for imbalanced datasets than the area under the receiver operating characteristic curve (ROC-AUC) [15]. Results for the ROC-AUC are reported in Appendix B and are consistent with the analysis done using AUC-PR.

Full. We use the entire portfolio of the scientists described in Sec. II A.

Pre-award.
We construct the pre-award portfolios of Nobelists, i.e., the sets of papers published up to the year of the prize-winning paper, discarding portfolios with fewer than ten papers. We find that 15 (27%), 28 (55%), and 22 (39%) of the Nobelists in Physics, Chemistry, and Medicine, respectively, satisfy this criterion.
Specifically, for a Nobelist who published their first paper in year y_0 and wrote their prize-winning article in year y_p, we consider the papers published and the citations accrued between years y_0 and y_p − 1. We then pair the Nobelist with 20 baseline scientists who published their first papers around year y_0 and wrote at least ten papers in the first y_p − y_0 years of their careers.

Optimal α selection. Recall that, unlike the other measures, M_α has a tunable parameter α. Therefore, for each task, we record the performance of M_α across a range of α values and plot the results in Fig. 3. We observe a slight dependence of the optimal value α* on the task and the field. We use the corresponding α* values when comparing the performance of M_α with the other metrics. In each case, however, we find α* < 1, which indicates that portfolios are most separable when the metric prioritizes consistent impact.
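The AUC-PR of a metric can be computed as the average precision of the ranking it induces. The following is our own pure-Python sketch of that quantity (libraries such as scikit-learn expose the same idea as average_precision_score):

```python
def average_precision(scores, labels):
    # Area under the precision-recall curve of the ranking induced by
    # `scores`: the mean of the precision measured at each positive.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    n_pos = sum(labels)
    if n_pos == 0:
        return 0.0
    tp, ap = 0, 0.0
    for rank, (_, label) in enumerate(ranked, 1):
        if label:
            tp += 1
            ap += (tp / rank) / n_pos
    return ap

# A metric that ranks every Nobelist (label 1) above every baseline
# scientist (label 0) attains the maximum value of 1.0.
assert average_precision([3.0, 2.0, 1.0], [1, 1, 0]) == 1.0
```

With one positive among two samples ranked last, the value drops to 0.5, the random-prediction baseline mentioned above for a 50% positive fraction.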
We record the metrics' performance in Table III. In Appendix A, we report the classification results on the American Physical Society (APS) bibliographic dataset.
Metrics agnostic to the distribution of citations appear to perform worse than their counterparts in either task. This includes the total number of papers N, the total number of citations C_tot, and the maximum number of citations C_max. We highlight the performance of three metrics: N, C_avg, and C_max. N is consistently the worst performer because it accounts only for volume, not impact. C_avg is among the top performers when the whole portfolio is considered. We believe this is partly due to the nature of the distributions observed in Fig. 1, where Nobelists are likely to accumulate higher-than-average citation counts over their careers. However, its performance on the pre-award portfolios is somewhat worse, probably because there we only consider the pre-award period of their careers. Winning the Prize has been shown to provide a tangible boost to the overall visibility of a scientist, resulting in more citations [16]. The number of citations of the most cited paper, C_max, is among the worst performers, which suggests that the one-big-hit portfolio is not typical among Nobelists. This finding supports the view that scientists win the Nobel Prize after years of consistent, high-quality work.
We now shift our focus to the other category of indicators, i.e., ones sensitive to the citation distributions.
We find that H records mediocre performance despite rewarding consistency. Its dependence on productivity likely fails to account for the Nobelists with a few highly cited papers. The Q-index performs poorly. However, its variant, Q, fares considerably better, which is consistent with the fact that it is similar to M_α for small α.
M_α and E consistently rank in the top two positions. This further supports the hypothesis that Nobelists set themselves apart by producing a steady stream of high-impact work.

B. Identifying future Nobelists
As a test of the predictive power of the metrics, we check whether we can identify scholars who received the Nobel Prize from 2018 to 2022, i.e., the period not covered by our WoS dataset. First, we note that our set of baseline scientists may be missing some of these new Nobelists, in which case we add them manually, provided they have a GS profile.
Then, for each metric, we construct a top 20 list of baseline scientists by ranking them in descending order and highlighting the Nobelists. We report the table for the E-index in the main text (Table IV), while the remaining lists can be found in Appendix C.
In Table V, we show how many Nobelists appeared in the top 20 lists for each metric. The E-index outperforms all other indicators, proving particularly effective for Medicine.
To further corroborate this conclusion, we matched each Nobelist with a baseline scientist with (nearly) identical values of N and C_tot. In Fig. 4, we plot the E-index of each Nobelist and matched baseline pair. We find that the E-index of Nobelists usually exceeds that of their matches. Some exceptions correspond to Nobelists with a low number of highly cited papers. Other outliers might be prominent scholars who have not yet received the award but might receive it in the future.

IV. CONCLUSION
In this work, we searched for productivity patterns in excellent scientific careers. Specifically, we aimed to assess whether the output of high-profile scientists is more likely to be characterized by a low number of hit papers or by a consistent production of high-quality work. To address this question, we have examined the scientific portfolios of Nobel Prize winners in Physics, Chemistry, and Physiology or Medicine and checked which citation-based metrics are most suitable to recognize them among a much larger number of baseline scholars. In addition, we introduced two new metrics, the E-index and M_α, that reward both consistency and high average impact.
We found that the best-performing metrics are the ones that peak when citations are distributed among a considerable number of works rather than being concentrated on a few hit papers. The E-index, in particular, proves especially effective in identifying future Nobelists. A portal for the calculation of the E-index and other scores of individual performance can be found at e-index.net.
While there are Nobelists whose success relied on isolated hit papers, the most successful scientists usually stayed on top of their game for most of their careers.

FIG. 1 .
FIG. 1. Total number of citations vs. total number of papers for Nobelists (purple dots) and baseline scientists (grey dots).

FIG. 3 .
FIG. 3. Classification performance of M_α for varying α. Different symbols denote different fields. The dashed line α = 1 separates the two regimes. We use the optimal values α* in our analyses.

TABLE I .
Number of scientists in each category and field.

TABLE II .
Values of the metrics for portfolios with N papers and C_tot citations, of which n are equally cited and N − n are uncited.

TABLE III .
AUC-PR values for the Full and Pre-award (PA) portfolio classification tasks. The best-performing metrics for each field are marked in boldface. M_α and E are the standout performers. Note that values across columns are not comparable, as the baseline values are determined by the respective class imbalance ratios.

TABLE IV .
Top 20 baseline scholars with the largest E-index in each discipline. The ones marked in boldface received the Nobel Prize between 2018 and 2022. Some authors are assigned multiple labels, so they may appear in multiple lists.

TABLE V .
Count of Nobelists awarded in the period 2018–2022 identified in the top 20 lists of the various metrics. The numbers in parentheses indicate how many such Nobelists have a GS profile.