The most commonly used publication metrics for individual researchers are the total number of publications, the total number of citations, and Hirsch’s h-index. All of these are cumulative, and hence increase throughout a researcher’s career, making them less suitable for evaluating junior researchers or assessing recent impact. Most other author-level measures in the literature share this cumulative property. By contrast, we aim to study noncumulative measures that answer the question “In terms of citation impact, what have you done lately?” We single out six measures from the rather sparse literature, including Hirsch’s m-index, a time-scaled version of the h-index. We introduce new measures based on the idea of “citation acceleration.” After presenting several axioms for noncumulative measures, we conclude that one of our new measures has much better theoretical justification than the others. We present a small-scale study of its performance on real data and conclude that it shows substantial promise for future use.
Despite strong opinion to the contrary among researchers, it is deemed necessary by bureaucrats worldwide to use simple measures of researcher impact. Measures based on research publications (mostly research monographs and peer-reviewed articles) are heavily used, the most common being the cumulative number of citations N(t), the cumulative number of papers P(t), and the h-index h(t) (Hirsch, 2005), defined as the greatest integer h such that the author has at least h papers each of which has at least h citations. All three quantities are biased toward senior scholars, being cumulative and therefore automatically increasing over time, even after the end of the researcher’s career. They also provide information on overall career citation impact, but no answer to “What have you done lately?” For many purposes it is not particularly useful to know the h-index of Isaac Newton or the total number of citations of Albert Einstein. Comparing researchers near the start of their careers, comparing them with more senior researchers, or trying to predict a researcher’s future productivity and impact clearly requires different metrics.
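The h-index definition above can be sketched in a few lines of Python; the per-paper citation counts used here are hypothetical illustration data.

```python
# Sketch: computing the h-index from a list of per-paper citation counts.

def h_index(citations):
    """Greatest h such that at least h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, c in enumerate(ranked, start=1):
        if c >= rank:
            h = rank
        else:
            break
    return h

papers = [25, 8, 5, 3, 3, 1, 0]   # hypothetical citation counts
print(h_index(papers))            # -> 3 (three papers with at least 3 citations each)
```

Hirsch’s m-index is then obtained by dividing this value by the number of years since the author’s first publication.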
Citation metrics that are not automatically increasing have received much less discussion in the literature. For example, the survey by Wildgaard, Schneider, and Larsen (2014) of 108 author-level metrics contains at most 15 that are not automatically increasing and that attempt to measure time-varying performance. The earlier survey by Bornmann, Mutz, et al. (2011) of variants of the h-index included only six variants, out of 37 indicators, that attempted to adjust for career age. Of course, nonincreasing measures intended to account for career age have received some attention. Indeed, Hirsch in his original paper (Hirsch, 2005) devoted substantial analysis to the rate of growth of h with the number of years t since the author’s first publication, and defined the m-index by m(t) = h(t)/t. He calculated m for a selection of physicists (using a single fixed year, presumably 2005) and concluded that values of m around 1, 2, and 3 corresponded, in his judgment, to a “successful scientist,” an “outstanding scientist,” and a “truly unique” individual, respectively. The m-index was immediately studied by others (Burrell, 2007; Liang, 2006) but has been relatively little explored since. Another measure based on the h-index and attempting to measure recent performance is the contemporary h-index (Sidiropoulos, Katsaros, & Manolopoulos, 2007). Other measures based on the h-index and attempting to adjust for career age include the AR-index (Jin, Liang, et al., 2007), defined as the square root of the sum over papers of citations per year, restricted to publications in the h-core (the minimal set needed for computation of the h-index). The literature contains fewer nonincreasing measures not involving the h-index. One could of course also consider the analog of the AR-index in which all publications are included.
We stop here, conscious that we must draw a line somewhere—given the axiomatic approach to be taken below, it already seems clear that most of the above measures will fail many of the axioms.
1.1. Our Contribution
We argue that the “instantaneous rate of accumulation of citations owing to recent work” is the relevant measure of recent citation productivity. We claim that this is precisely the “citation acceleration,” the second time-derivative of the number of citations accumulated by an author. As this quantity is not directly observable, we explore measures aimed at approximating it.
We single out six existing measures from the literature that are noncumulative and intended to adjust for career age. These are
- Hirsch’s (Hirsch, 2005) m-index, defined by m(t) = h(t)/t.
- Mannella and Rossi’s (2013) measure α1.
- The contemporary h-index hc(t) (Sidiropoulos et al., 2007), defined below.
- The trend h-index ht(t) (Sidiropoulos et al., 2007), defined below.
- The age-weighted citation rate A(t), defined below.
- The average number of citations per year μ(t), defined below.
In Section 3 we evaluate all nine citation measures against axiomatic criteria. The difference between “theoretical” and “empirical” work in bibliometrics has been well described by Waltman and van Eck (2012). Our approach here is grounded in theoretical analysis. The axiomatic approach to bibliometric indicators has been applied to the h-index by several authors following the initial paper (Woeginger, 2008); we single out Quesada (2011) and Bouyssou and Marchant (2013). To our knowledge, axiomatics have been applied to few other indicators; we single out Waltman and van Eck (2009) and Bouyssou and Marchant (2014), which give axiomatic characterizations of several well-known indicators such as N and P.
To check that we have not strayed too far from reality, we evaluate the performance of the theoretically best measures W and W5 on several small data sets of mathematical researchers. The results, as shown in Section 4, are promising with respect to predictive value and correlation with expert judgment.
1.2. Definition of Remaining Measures
The age-weighted citation rate A(t) is obtained by summing over all publications the average number of citations per year, evaluated at time t. The measure μ(t) is simply the average number of citations per year, evaluated at time t.
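As a sketch, these two measures can be computed from per-paper data as follows. The paper list and the handling of paper ages are illustrative assumptions, not the authors’ exact Eq. 1.

```python
# Sketch of the mean citation rate mu(t) and the age-weighted citation
# rate A(t). The per-paper data below are hypothetical, and the handling
# of papers published in the current year is an assumption.

def mu(total_citations, t):
    """Average number of citations per year over a career of length t."""
    return total_citations / t

def age_weighted_rate(papers, t):
    """Sum over papers of (citations so far) / (age of the paper at time t)."""
    return sum(c / (t - year_published)
               for year_published, c in papers
               if t > year_published)

papers = [(0, 40), (2, 12), (5, 3)]      # (year published, citations at t); hypothetical
t = 10
print(age_weighted_rate(papers, t))      # 40/10 + 12/8 + 3/5 = 6.1
print(mu(sum(c for _, c in papers), t))  # 55/10 = 5.5
```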
2. BACKGROUND AND DEFINITIONS
Consider a researcher emitting research publications starting from time t = 0. These publications accumulate citations at a certain rate, dependent on such factors as the number of publications, the size of the research field, the citation practices of the field, and the attractiveness of the papers to other researchers. As mentioned above, the most commonly used metrics are
P(t), the number of publications up to time t;
N(t), the number of citations up to time t; and
h(t), the (Hirsch) h-index at time t.
Note that these each increase in t, even after the end of the researcher’s career.
2.1. The Simple Citation Model
Our main idea is that the quantity pc (the publication rate p times the per-paper citation rate c of the simple model) measures the instantaneous accumulation of citations from new work. This “acceleration” is a key measure of recent productivity and citation impact. For a given research field, larger values indicate researchers with greater recent impact.
We also consider the measure W5, obtained by fitting quadratics to N through successive windows of five data points (separated by 1-year intervals) and approximating the second derivative on each. This is an example of a Savitzky-Golay filter, widely used to smooth discrete data of this type. In fact W5 is the simplest such filter, and we could define W2k+1 analogously for k > 2 by using larger numbers of points.
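A minimal sketch of such a filter, assuming unit (1-year) spacing and ignoring endpoint handling: fitting a least-squares quadratic to five consecutive yearly values of N and taking its (constant) second derivative is equivalent to applying the classical coefficients [2, −1, −2, −1, 2]/7.

```python
import numpy as np

# Five-point Savitzky-Golay second-derivative filter with unit spacing.
# Fitting a quadratic to each window of five yearly values of N and taking
# its second derivative is equivalent to applying [2, -1, -2, -1, 2] / 7.

def w5(N):
    """Second-derivative estimate on each 5-point window of yearly N values."""
    N = np.asarray(N, dtype=float)
    coeffs = np.array([2.0, -1.0, -2.0, -1.0, 2.0])
    return np.array([coeffs @ N[i:i + 5] / 7.0 for i in range(len(N) - 4)])

# Sanity check: the filter is exact for quadratics. For N(t) = 3t^2 the
# true second derivative is 6, and every window recovers it.
t = np.arange(10)
print(w5(3.0 * t**2))          # every entry is 6.0

# Cross-check one window against an explicit least-squares quadratic fit.
a = np.polyfit(np.arange(5), 3.0 * t[:5]**2, 2)[0]   # x^2 coefficient
print(2 * a)                   # also close to 6
```

The same weights are produced by SciPy’s `scipy.signal.savgol_coeffs(5, 2, deriv=2)` (the filter is symmetric, so ordering conventions do not matter).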
The measure W5 is likewise constant in the simple model, because the filter used is exact for quadratics and hence reproduces the (constant) second derivative.
Thus in the simple model several measures are constant. However, not all constants are equally valid. The units of acceleration should be citations/(year)², and indeed w, W, and W5 have these units. However, the units of m and α1 are inconsistent with this requirement.
The value of μ(t) is clearly pct/2. The age-weighted citation measure A(t) equals pct in this model, because in Eq. 1 we divide the integrand by s. The contemporary h-index and trend h-index are harder to compute, but are also nonconstant unless δ = 1 (we use the method shown above for the h-index, and omit the details here).
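These constants can be checked numerically, assuming the simple model’s cumulative citation curve N(t) = pct²/2 (which is implied by the values just quoted); the rates p and c below are hypothetical.

```python
# Numeric sanity check of the simple-model values quoted above, assuming
# N(t) = p*c*t^2/2 (p papers/year, each gaining c citations/year).
# Then mu(t) = N(t)/t = p*c*t/2, and the citation acceleration N''(t) = p*c
# is constant. The values of p and c are hypothetical.

p, c = 3.0, 4.0                      # hypothetical publication and citation rates
N = lambda t: p * c * t**2 / 2

t = 10.0
mu = N(t) / t
print(mu)                            # p*c*t/2 = 60.0

# Second derivative via a central difference with 1-year spacing:
accel = N(t + 1) - 2 * N(t) + N(t - 1)
print(accel)                         # p*c = 12.0
```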
2.2. A Model Incorporating Retirement
Direct computation shows that the measure W takes the value zero provided t ≥ T + 2, so that sufficient time has elapsed for measurements to be taken. However, the other measures above take on nonzero values when t > T. We see that w has the value pcT(2t − T)/t². For example, if t = 2T, so that the total time elapsed after retirement is as long as the researcher’s entire career, w has fallen by only 25%, to 3pc/4, from its previous constant value pc during the career.
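The 25% figure can be verified numerically, assuming w(t) = 2N(t)/t² (which reproduces the constant career value pc) and the retirement-model curve N(t) = pcT(2t − T)/2 for t ≥ T, both inferred from the values quoted above; p, c, and T are hypothetical.

```python
# Check of the post-retirement value of w quoted above, assuming
# w(t) = 2*N(t)/t^2 and the retirement model N(t) = p*c*T*(2t - T)/2
# for t >= T (papers stop at time T; existing papers keep gaining
# citations). The values of p, c, T are hypothetical.

p, c, T = 2.0, 5.0, 20.0

def N(t):
    return p * c * t**2 / 2 if t <= T else p * c * T * (2 * t - T) / 2

def w(t):
    return 2 * N(t) / t**2

print(w(T))        # p*c = 10.0, the constant value during the career
print(w(2 * T))    # 3*p*c/4 = 7.5: only a 25% drop, long after retirement
```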
Hirsch’s original argument shows that the h-index has the same value pct/(p + c) in the second model until time t = T(1 + p/c) as it had in the simple model, but that after this time it has value pT/t. Interestingly, this is independent of c. The measure α1 takes the value when t = 2T. We do not compute the details of the contemporary and trend h-index here, because we can see enough to rule them out axiomatically below.
It is of course possible to explore more complicated models, but the focus of this article is the introduction of a new measure with axiomatic justification, so we now proceed to that.
As mentioned above, the number of possible measures is enormous, but without axiomatic foundations, it seems pointless to study them in detail. We now present several axioms for measures intended to describe noncumulative citation impact. All except the last seem to us to be uncontroversial.
Computability: The measure should be easily computable from citation counts, paper counts, and academic age.
Units: The units of the measure should be (citation)/(time)².
Locality: If no citations are gained during a time interval, the measure is zero during that interval.
Constancy: In the first model, the measure is constant.
End of career: In the second model, the measure is zero for t > T.
Packaging-independence: The measure should not depend on P: It is computable only from citation counts and academic age.
The Packaging-independence axiom requires more explanation. We argue that the impact of a researcher with 10 papers each attracting 100 citations is the same as if all 10 papers had been combined into a book that receives 1,000 citations. The packaging into publications may in practice affect the number of citations in a more complicated way, but if publications have no overlap and citations reflect intellectual influence only, this should not occur. Of course, this is also a strong argument against using the h-index.
The Locality axiom may need to be interpreted slightly differently when time is discrete. For example, the measure W satisfies this axiom provided the point chosen is 2 years past the left endpoint of the interval in question.
Table 1 shows the performance of the abovementioned measures against these axiomatic criteria. Clearly, W and W5 perform much better than the others, and we consider only these measures in the next section. Note that hc requires specification of two free parameters before we can even compute it, and we know of no principled way to do that, hence the failure of the Computability axiom. The other axioms were evaluated for arbitrary γ and δ, and give the same answer for all choices of these constants (with an exception for the constancy axiom, as noted above).
| Axiom | w | W | W5 | m | α1 | hc | ht | A | μ |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| End of career | ✗ | ✓ | ✓ | ✗ | ✗ | ✗ | ✗ | ✗ | ✗ |
Based on the discussion above, we move on to consider only W(t) and W5(t), which seem the most promising measures.
4. CASE STUDY OF MATHEMATICAL RESEARCHERS
We now discuss the performance of the measures on real data.
The experiments described below were carried out in February 2020. All raw data and computed data are publicly available (Wilson, 2020).
We concentrated for this article on the research area most familiar to us, namely mathematics. We compiled several data sets based on specific sets of researchers. The lack of availability of such data in an open format, or even in a proprietary one that allows for comprehensive analysis, is a major factor hampering studies of this type. We resorted to making many time-intensive manual web-based queries. To test our hypotheses, for each researcher we require the total number of citations to his or her works in each calendar year. We used the Web of Science Core Collection, which we chose because of its availability and reasonably wide coverage.
We extracted four sets of mathematicians and categorized them. The first set consisted of all Abel Prize winners still living at May 1, 2018 (with the exception of two for whom author name disambiguation was too difficult) and included 17 authors. The second consisted of 10 mathematicians from a single department (University of Massachusetts Amherst, Mathematics & Statistics; UMass) with interests in a common subfield (algebraic geometry). The third data set consisted of 10 authors generated “randomly” from MathSciNet (we chose authors of the most recent papers in algebraic geometry according to MathSciNet). The fourth consisted of all living winners of the Fields Medal from 2006 to 2018 inclusive, and consisted of 13 authors. This gave 50 mathematical researchers in total. For each researcher we took year 1 to be the first year t for which N(t) ≥ 10.
Larger data sets would give more confidence in the results below, but those results are clear enough to show that the axiomatically well-founded measures W and W5 capture something about a researcher that enables us to distinguish between randomly chosen, successful, and outstanding researchers.
4.2.1. Variation of Measures
In Figure 1 we graph, for three Abel and three Fields prizewinners, W(t) and W5(t) for t starting 5 years after career start (defined, as above, as the first year for which N(t) ≥ 10). As can be seen, there is substantial variation over time for each author. In Table 2 we give the mean and standard deviation of the values of the measures over the same time period.
| | mean(W) | sd(W) | mean(W5) | sd(W5) |
| --- | --- | --- | --- | --- |
4.2.2. Predictive Value of Measures
As explained by Penner, Pan, et al. (2013), cumulative increasing measures such as the h-index contain intrinsic autocorrelation, which vastly overstates their predictive power. They find that the actual ability of the h-index to predict future citations from future publications is rather low. In our case, we are dealing with noncumulative measures, whose predictive power is not so clear. The results of Section 4.2.1 show considerable variation in the year-to-year values of W(t) and W5(t), so the idea of the simple model, that these measures are constant and hence precisely determine something intrinsic to the researcher, is not plausible. We do not expect to be able to predict the value of W(t + 5), for example, from W(t) only. However, the results in Section 4.2.3 show that gross distinctions between researchers at different levels of impact can be made (when dealing with researchers in the same fairly narrowly defined field), and these seem to mean something.
To obtain a better idea of predictive power, for the union of our data sets we computed the mean of W over years 3–5 of career, and used this to attempt to predict the mean of the same measure over years 6–8. The ordinary least squares linear regression results are displayed in Figure 2 (note that the extreme outlier Terence Tao was removed from the data set, as was the very young Peter Scholze, leaving 48 researchers). The value R² = 0.748 shows a high level of predictive power. Note that the definition of W means that we are trying to predict a linear combination of N(8), N(7), N(5), and N(4) from a linear combination (with the same coefficients) of N(5), N(4), N(2), and N(1), and there is no a priori reason to expect this to have such a high coefficient of determination.
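The appearance of exactly these four values of N is a telescoping effect. As a sketch, assume for illustration the backward second-difference form W(t) = N(t) − 2N(t − 1) + N(t − 2), which is consistent with the properties used above (exact for quadratics, and zero two years after citations stop); then the three-year mean collapses as follows.

```python
import random

# Telescoping check, assuming (for illustration) the backward
# second-difference form W(t) = N(t) - 2*N(t-1) + N(t-2). Summing W over
# t = 6, 7, 8 telescopes, leaving only N(8), N(7), N(5), N(4):
#   W(6) + W(7) + W(8) = (N(8) - N(7)) - (N(5) - N(4)).

N = [0] + sorted(random.randint(0, 500) for _ in range(9))  # N[0..9], nondecreasing

def W(t):
    return N[t] - 2 * N[t - 1] + N[t - 2]

mean_W_6_8 = (W(6) + W(7) + W(8)) / 3
telescoped = ((N[8] - N[7]) - (N[5] - N[4])) / 3
print(abs(mean_W_6_8 - telescoped) < 1e-9)   # True for any citation history
```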
4.2.3. Relation to Expert Judgment
We expect that the citation acceleration for randomly chosen authors should be lower overall than that of the UMass researchers. Also, we expect the Fields data set and Abel data set to have higher values of the measures than the UMass data set. Given the age of the members of the Abel data set, we expect a small advantage to the Fields data set. All our expectations are borne out by the results shown in Figure 3.
Before summarizing our main points, we acknowledge that the entire subject of bibliometrics is suspect in the eyes of many researchers, because of its perceived misapplication by unimaginative bureaucrats having influence over researcher careers. We share these concerns, and the present article was motivated by the idea that if we researchers are forced to be evaluated by simple metrics, we can at least have some agency in their design. The popularity of the h-index, for example, is very mysterious to us, given its weak theoretical foundations (Waltman & van Eck, 2012). Any one measure can be strategically manipulated, but we hope that by use of sufficiently many metrics with low theoretical correlation, incentives for researchers to act in ways that are not helpful to science overall will be reduced.
All of the citation measures in the literature are susceptible to many problems (including missing data, author name disambiguation, negative citations, contributions of multiple authors, citation inflation owing to growth in number of researchers). Also, there is the problem of normalization across different fields (for example, one citation in mathematics corresponds roughly to 19 in physics and 78 in biomedicine [Podlubny, 2005]). This question of “field” is a difficult one—it is obvious that certain areas of mathematics have communities of different sizes, leading to substantial variation in the number of citations across areas. Thus, as usual, there is no completely automated substitute for human judgment.
5.2. Positive Outcomes
The index W(t) introduced in this article seems to measure something specific to a researcher that is related to his or her recent productivity and impact, and seems promising as a way to make coarse distinctions between researchers in the same field who may be at different career stages. It behaves well with respect to natural axioms. It seems fairly well correlated with subjective measures of research impact or quality. It is less sensitive to the way in which ideas are packaged into individual publications, and considerably easier to compute, than the m-index (under the assumption that splitting a paper splits the citations in the obvious way, the m-index discourages extreme “salami-slicing,” whereas W is indifferent to it and P encourages it).
Insofar as citation metrics are to be increasingly used for evaluation of researchers and especially for allocation of resources to them, the W-index provides another useful (perhaps the single most useful found so far) measure of recent publication activity leading to citation impact, and one that has decent predictive value.
5.3. Future Work
We are grateful to an anonymous referee for informing us about two papers that deal with the issue of discrete data in bibliometrics (Liu & Rousseau, 2012, 2014). The idea used in those papers, namely approximating the citation record (in their case, the citations to a single research article) by a continuous cubic spline, could also be applied in the situation of the present paper. For example, we could approximate N(t) by a cubic spline and then estimate the second derivative by taking the second derivative of the spline. The measure so derived would satisfy all our axioms. We do not expect substantially different results from using interpolation rather than filtering as we have done, but it may be worth further investigation.
Bouyssou and Marchant (2014) state that their paper explicitly does not deal with any indicators intended to adjust for career age, and the last part of the paper suggests further work in such a direction. We offer the present work as an initial contribution, and intend to follow up. A stream of research initiated by Woeginger (2008) deals with axiomatic characterizations of the h-index—that is, a set of axioms that, taken together, uniquely determine the h-index. Our experience with characterization theorems (and also impossibility theorems where “too many” axioms are chosen, a prominent approach in social choice theory, for example) is that very often the axiom systems consist of a few innocuous assumptions and one that is much less intuitive and essentially encodes the desired result. Nevertheless, it would be interesting to obtain an axiomatic characterization of our measure W, for example.
The relation of Eq. 4 derived from the simple model has wider validity than might be expected at first sight. Mannella and Rossi (2013) find via a study of 1,400 Italian physicists that this quadratic relationship holds well on real data, and empirically find the best fit value of β = 0.53 in Eq. 4, agreeing with the rough calculations of Hirsch based on a smaller data set of physicists. Yong (2014) showed analytically, based on the theory of random partitions, that a very good estimate should be β = √6 ln 2/π ≈ 0.54. He also demonstrated the accuracy of this approximation on a small data set of prominent mathematicians.
To concentrate on the main concept of citation acceleration, we have omitted more subtle issues, such as weights for coauthored papers and normalization by the size of the research field. These of course could be explored.
Mark Wilson: Conceptualization, Formal analysis, Investigation, Methodology, Project administration, Supervision, Writing—original draft, Writing—review & editing. Zhou Tang: Formal analysis, Software, Visualization.
The authors have no competing interests.
This project received no funding.
Data are available from Harvard Dataverse.
We thank Hooman Alavizadeh for preliminary discussions and Thierry Marchant for feedback on a draft of this article. We thank the editor and referees of this journal for their constructive comments on the initial submission.
Handling Editor: Ludo Waltman