A recent analysis of scientific publication and patent citation networks by Park et al. (Nature, 2023) suggests that publications and patents are becoming less disruptive over time. Here we show that the reported decrease in disruptiveness is an artifact of systematic shifts in the structure of citation networks unrelated to innovation system capacity. Instead, the decline is attributable to “citation inflation,” an unavoidable characteristic of real citation networks that manifests as a systematic time-dependent bias and renders cross-temporal analysis challenging. One driver of citation inflation is the ever-increasing lengths of reference lists over time, which in turn increases the density of links in citation networks, and causes the disruption index to converge to 0. The impact of this systematic bias further stymies efforts to correlate disruption to other measures that are also time dependent, such as team size and citation counts. In order to demonstrate this fundamental measurement problem, we present three complementary lines of critique (deductive, empirical and computational modeling), and also make available an ensemble of synthetic citation networks that can be used to test alternative citation-based indices for systematic bias.

A measure of disruption was recently developed and applied to empirical citation networks (Funk & Owen-Smith, 2017; Park, Leahey, & Funk, 2023; Wu, Wang, & Evans, 2019). This bibliometric measure, denoted by CD, quantifies the degree to which an intellectual contribution p (e.g., a research publication or invention patent) supersedes the sources cited in its reference list, denoted by the set {r}p. As defined, CDp is measured according to the local structure of the subgraph Gp = {r}p ∪ p ∪ {c}p comprising the focal node p, the nodes belonging to its reference list {r}p, and the set of nodes citing either p or any member of {r}p, denoted by {c}p. If future intellectual contributions cite p but do not cite members of {r}p, then it is argued that p plays a disruptive role in the citation network. In what follows we highlight a critical issue underlying the definition of CD which calls into question the utility of this metric and the conclusions that have been drawn following its application to empirical citation networks (Park et al., 2023; Wu et al., 2019). Namely, as the length rp = ∣{r}p∣ of the reference list increases, so does the likelihood that one of the referenced papers is highly cited. Consequently, CDp is a biased measure because reference lists have lengthened dramatically over time, and so too has the number of citations that highly cited papers accrue (Pan, Petersen et al., 2018). Both phenomena are by-products of citation inflation (Petersen, Pan et al., 2018b), a manifestation of secular growth that naturally extends to patent citation networks (Huang, Chen, & Zhang, 2020; Macher, Rutzer, & Weder, 2024).

Citation inflation (CI) refers to the systematic increase in the number of links introduced to the scientific (or patent) citation network each year. CI is analogous to monetary inflation (Orphanides, 2003; Orphanides & Solow, 1990), whereby as a government prints more money the sticker price of items tends to go up, rendering the impression that the real cost of goods is increasing (to what degree this relationship is valid depends on wage growth and a number of other economic factors). By analogy, it might also be tempting to attribute the increased volume of scientific production to techno-social productivity increases, yet this explanation neglects the persistent growth rate of the inputs (e.g., researchers and research investment) that are fundamental to the downstream production of outputs (e.g., research articles, patents).

Indeed, secular growth underlies various quantities relevant to the study of the scientific endeavor, from national expenditures in R&D to the population size of researchers (Petersen et al., 2018b) and the characteristic number of authors per research publication (Pavlidis, Petersen, & Semendeferi, 2014; Wuchty, Jones, & Uzzi, 2007), all quantities that have persistently grown over the last century. Nevertheless, the degree to which such growth affects the quantitative evaluation of research outcomes is underappreciated and can manifest in inconsistent measurement designs and metrics. Indeed, the number of citations an article receives is not solely attributable to the novelty or prominence of the research but also depends on the population size and citing norms of a discipline, and quite fundamentally, the nominal production rate of links in the citation network, among other considerations (Bornmann & Daniel, 2008). Hence, there is a real need to distinguish nominal counts from real values in scientific evaluation, which in the analysis of citation networks requires accounting for when each citation was produced, and in further extensions, how the credit is shared (Huang et al., 2020; Pavlidis et al., 2014; Petersen et al., 2018b; Petersen, Wang, & Stanley, 2010; Shen & Barabàsi, 2014).

So what are the main sources of CI in scientific citation networks and what are the real-world magnitudes of their effects? Figure 1(a) illustrates how CI arises through the combination of longer reference lists, denoted by r(t), compounded by an increasing production volume, denoted by n(t). By way of real-world example, a prior calculation of the growth rate of the total number of citations produced per year, based upon the entire Clarivate Analytics Web of Science citation network, estimated that the total volume C(t) ≈ n(t)r(t) of citations generated by the scientific literature grows exponentially, with annual rate gC = gn + gr = 0.033 + 0.018 = 0.051 (Pan et al., 2018). Hence, with the number of links in the citation network growing by roughly 5% annually, the total number of links in the citation network doubles every ln(2)/gC = 13.6 years!
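The doubling time quoted above follows from the standard relation for exponential growth, ln(2)/g. The short sketch below reproduces the arithmetic from the two growth rates reported by Pan et al. (2018); the variable names are illustrative only.

```python
import math

g_n = 0.033  # annual growth rate of publication volume n(t)
g_r = 0.018  # annual growth rate of reference list length r(t)

# C(t) ≈ n(t) r(t), so the exponential growth rates add.
g_C = g_n + g_r

# Time for an exponentially growing quantity to double.
doubling_time = math.log(2) / g_C

print(f"g_C = {g_C:.3f}, doubling time = {doubling_time:.1f} years")
```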

Figure 1.

‘Citation inflation’ attributable to the increasing number and length of reference lists. (a) Schematic illustrating the inflation of the reference supply owing to the fact that the annual publication rate n(t) (comprised of increasing diversity of article lengths), along with the number of references per publication r(t), have grown exponentially over time t, which implies a nonstationary cross-generational flow of attribution in real citation networks. Such citation inflation cannot be controlled by way of fixed citation windows (Petersen et al., 2018b). (b) The probability density function Py(rp) of the number of references per article rp calculated for articles included in the MAG citation network grouped by the decade of publication y. Vertical dashed lines indicate the average value; vertical solid lines indicate the 90th percentile, such that only the 10% largest rp values are in excess of this value. (c) Conditional relationship between two quantities that systematically grow over time (the number of coauthors per article, kp, and rp). Note the increasing levels and slope of the relationship over the 50-year period.

While the dominant contributor to CI is the growth of n(t) (deriving from increased researchers and investment in science coupled with technological advances increasing the rate of manuscript production, the shift away from print towards online-only journals, and the advent of multidisciplinary megajournals; Petersen, 2019), the contribution to CI from growing reference lists alone is nevertheless substantial and varies by discipline (Abt & Garfield, 2002; Dai, Chen et al., 2021; Nicolaisen & Frandsen, 2021; Sánchez-Gil, Gorraiz, & Melero-Fuentes, 2018). By way of example, consider descriptive statistics based upon analysis of millions of research publications comprising the Microsoft Academic Graph (MAG) citation network (Sinha, Shen et al., 2015): in the 1960s, the average (± standard deviation) number of references per article was rp¯ = 9 (±17); by the 2000s, rp¯ increased to 23 (±27), a 2.6-fold increase over the 50-year period; see Figure 1(b). This persistent growth in reference list lengths has even occurred in journals with tight space limitations that traditionally publish shorter research articles and letters, such as Nature (Petersen, Arroyave, & Pammolli, 2025).

Meanwhile, as research team sizes (denoted by kp, and used as a proxy for the production effort associated with a research output) increase in order to address research problems featuring greater topical and methodological breadth, there emerges a nonlinear relationship between rp¯ and kp showing that the modern research article is fundamentally different from those produced even a decade ago; see Figure 1(c). Thus, not only does the nominal value of a citation vary widely by era, but the implications of secular growth for the topology of the citation network, and thus for citation-based research evaluation, are profound (Pan et al., 2018). A standard solution to taming variables that are susceptible to inflation is to use a deflator index, which amounts to normalizing the cross-temporal variation by way of a standardized reference point (Petersen et al., 2010, 2018b). A more nefarious problem is the accurate measurement of the quantitative relationship between variables that are independently growing over time, which is susceptible to omitted variable bias if the role of time is neglected in the panel of covariate controls.
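As a rough illustration of the deflator idea, the sketch below rescales a nominal citation count produced in one year into the units of a chosen reference year. It assumes, purely for illustration, that the total citation supply C(t) grows exponentially at the 5.1% annual rate quoted earlier; the function name and the exponential form of the deflator are assumptions of this sketch, not the specific procedure of Petersen et al. (2010, 2018b).

```python
import math

G_C = 0.051  # assumed annual growth rate of the total citation supply C(t)

def deflate(nominal_citations, year, ref_year, g=G_C):
    """Rescale citations produced in `year` into `ref_year` units.

    Assumes C(t) ∝ exp(g t), so the deflator C(ref_year)/C(year)
    reduces to exp(-g (year - ref_year)).
    """
    return nominal_citations * math.exp(-g * (year - ref_year))

# Ten citations produced in 2005, expressed in 1965 citation units.
real_value = deflate(10, year=2005, ref_year=1965)
print(f"{real_value:.2f}")
```

Under this assumption, a citation produced in a later year counts for less than one produced in the reference year, because it is drawn from a larger total supply.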

In what follows, we demonstrate how CI renders CD unsuitable for cross-temporal analysis using three different approaches. As such, we contribute to a growing number of critiques regarding the validity of CD and the interpretation of trends and correlations (Bentley, Valverde et al., 2023; Bornmann, Devarakonda et al., 2020; Holst, Algaba et al., 2024; Leibel & Bornmann, 2024; Macher et al., 2024; Petersen et al., 2025; Ruan, Lyu et al., 2021; Wu & Wu, 2019), which together call into question the results reported in two Nature cover articles (Park et al., 2023; Wu et al., 2019). To establish how the disruption index suffers from citation inflation and is confounded by shifts in scholarly citation practice, we follow three distinct lines of reasoning: deductive analysis based upon the definition of CDp, empirical analysis of the Microsoft Academic Graph (MAG) citation network, and computational modeling of synthetic citation networks. In the latter approach, we are able to fully control the sources of the systematic bias underlying CD (namely CI), thereby demonstrating that CD follows a stable frequency distribution in the absence of CI. We conclude with research evaluation and publishing policy implications.

The disruption index is a higher order network metric that incorporates information extending beyond the first-order links connecting to p—those nodes that cite p and are prospective (forward looking or diachronous), and those nodes that are referenced by p, and thus retrospective (backward looking or synchronous) (Glänzel, 2004; Nakamoto, 1988). The original definition of CD was formulated as a conditional sum across the adjacency matrix (Funk & Owen-Smith, 2017), and was subsequently reformulated as a ratio (Wu et al., 2019). According to the latter conceptualization, calculating CDp involves first identifying three non-overlapping subsets of citing nodes, {c}p = {c}i ∪ {c}j ∪ {c}k, of sizes Ni, Nj, and Nk, respectively—see Figure 2(a) for a schematic illustration.

Figure 2.

Empirical analysis of the disruption index. (a) Schematic of the disruption index calculation for two papers pa and pb that have the same reference list length, and differ only in the information contributing to Nk. In short, the disruption index CDp can be calculated by identifying three nonoverlapping subsets of {c}p = {c}i ∪ {c}j ∪ {c}k, of sizes Ni, Nj, and Nk, respectively. The subset i refers to members of {c}p that cite the focal p but do not cite any elements of {r}p, and thus measures the degree to which p disrupts the flow of attribution to foundational members of {r}p. The subset j refers to members of {c}p that cite both p and {r}p, measuring the degree of consolidation that manifests as triadic closure in the subnetwork (i.e., network triangles formed between p, {r}p, {c}j). The subset k refers to members of {c}p that cite {r}p but do not cite p. Here, the publication pa represents a paper from the 1980s, when half of all papers were only cited twice within their first 5 years postpublication; by contrast, pb represents a newer paper from the 2000s, when the median paper was cited nine times over the same period. The effect of this sole difference on the overall CDp is significant, and would be even greater if we were instead using examples for {c}k involving highly cited papers, which in the present day can readily reach hundreds of citations within 5 years of publication, causing CDp to converge to 0 rather easily; for the nominal citation counts corresponding to the median and highly cited (e.g., 99th percentile) papers over time, see Figure 4 in Pan et al. (2018). (b) Average disruption index, CD5(t), calculated using a 5-year citation window based upon 29.5 × 10^6 articles from the MAG data set from 1945–2012. (c) Average number of references per paper per year, r(t), which increased by a factor of 4 over the 6-year period shown. (d) Average extraneous citation rate, Rk(t) ≫ 1, that is central to the critique of CD, and derives from the increasing citation count of highly cited papers belonging to the reference list {r}p, which systematically inflates the size of the extraneous set {c}k. (e) Rk(t) grows roughly proportional to r(t). (f) Marginal effects calculated with all other covariates held at their mean values, showing that CD5 is negatively correlated with the log of the number of references, ln rp. (g) CD5 is positively correlated with the log of the number of coauthors, ln kp.

The subset i refers to members of {c}p that cite the focal p but do not cite any elements of {r}p, and thus measures the degree to which p disrupts the flow of attribution to foundational members of {r}p. The subset j refers to members of {c}p that cite both p and {r}p, measuring the degree of consolidation that manifests as triadic closure in the subnetwork (i.e., network triangles formed between p, {r}p, {c}j). The subset k refers to members of {c}p that cite {r}p but do not cite p. As such, the CD index is given by the ratio
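$$CD_p = \frac{N_i - N_j}{N_i + N_j + N_k}, \tag{1}$$
which can be rearranged as follows:
$$CD_p = \frac{N_i - N_j}{N_i + N_j} \cdot \frac{1}{1 + R_k} = \frac{CD_p^{nok}}{1 + R_k}. \tag{2}$$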
The ratio Rk = Nk/(Ni + Nj) ∈ [0, ∞) is an extensive quantity that measures the rate of extraneous citation, whereas CDpnok = (Ni − Nj)/(Ni + Nj) ∈ [−1, 1] is an intensive quantity. The polarization measure CDpnok is an alternative definition of disruption that simply neglects Nk in the denominator (Bornmann et al., 2020); for this reason, characteristic values of CDpnok(t) are larger and decay more slowly over time than the respective CDp(t) values; see Park et al. (2023). Following initial criticism regarding the definition of CDp (Bornmann et al., 2020; Wu & Wu, 2019), other variations on the theme of CD have since been analyzed (Wu et al., 2019) and critiqued according to their advantages and disadvantages (Leydesdorff, Tekles, & Bornmann, 2021).
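To make the three subsets concrete, the following minimal sketch computes Ni, Nj, and Nk from two sets of citing-node identifiers and evaluates CD in the ratio form of Wu et al. (2019), CD = (Ni − Nj)/(Ni + Nj + Nk), alongside Rk and CDnok. The function names and the toy node labels are hypothetical, and the sketch assumes the caller has already applied the citation window and removed the focal paper p from the set of nodes citing its references.

```python
def disruption_terms(citers_of_p, citers_of_refs):
    """Return (N_i, N_j, N_k) from two sets of citing-node ids.

    citers_of_p:    nodes citing the focal paper p
    citers_of_refs: nodes (other than p) citing any member of {r}_p
    """
    n_i = len(citers_of_p - citers_of_refs)  # cite p only
    n_j = len(citers_of_p & citers_of_refs)  # cite p and its references
    n_k = len(citers_of_refs - citers_of_p)  # cite the references only
    return n_i, n_j, n_k

def cd_index(n_i, n_j, n_k):
    return (n_i - n_j) / (n_i + n_j + n_k)

def cd_nok(n_i, n_j):
    return (n_i - n_j) / (n_i + n_j)

def extraneous_rate(n_i, n_j, n_k):
    return n_k / (n_i + n_j)

# Toy subgraph: four papers cite p, one of which also cites a reference;
# eight further papers cite the references but not p.
citers_of_p = {"c1", "c2", "c3", "c4"}
citers_of_refs = {"c4"} | {f"k{m}" for m in range(8)}

n_i, n_j, n_k = disruption_terms(citers_of_p, citers_of_refs)
cd = cd_index(n_i, n_j, n_k)

# Holding N_i and N_j fixed, a larger extraneous set pushes CD toward 0.
for extraneous in (8, 80, 800):
    print(extraneous, round(cd_index(n_i, n_j, extraneous), 4))
```

In this toy case the numerator stays fixed at Ni − Nj while only Nk varies, which isolates the dependence of CD on the extraneous citation rate.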

To summarize, we argue that a simple deductive explanation trumps the alternative socio-technical explanations offered (Kozlov, 2023; Park et al., 2023) for the decline in CD calculated for publications and patents. In short, we argue that the disruption index CDp systematically declines for the simple reason that CD features a numerator that is bounded and a denominator that is unbounded. As elaborated in the next section, the term Rk in the denominator is susceptible to CI, and continues to inflate according to two mechanisms: (a) It grows proportional to the reference list length; and (b) it is highly sensitive to highly cited papers, which in the present day can readily achieve hundreds of citations within 5 years, thereby causing Rk to explode and CD to converge to 0 over time.

In this section we show empirically that CDp declines over time due to the runaway growth of Rk(t) and, implicitly, r(t). While our results are based upon a single representation of the scientific citation network made openly available by the MAG project (Sinha et al., 2015), the implications are generalizable to other citation networks featuring CI, such as patent citation networks (Huang et al., 2020). The citation network we analyzed is formed from the roughly 29.5 × 10^6 research articles in the MAG data set that have a digital object identifier (DOI), and were published between 1945 and 2012. To be consistent with Park et al. (2023) and Wu et al. (2019), we calculate CDp,CW(t) using a CW = 5-year citation window (CW), meaning that only articles published within 5 years of p are included in the subgraph {c}p = {c}i ∪ {c}j ∪ {c}k.

Figure 2(a) illustrates the subgraph used to calculate CDp for two hypothetical publications, a and b, which differ just in terms of {c}k. Comparing the two subgraphs for a and b, the former schematic represents a paper from the 1980s, when the median paper was cited just twice in its first 5 years from publication, whereas the latter schematic represents a paper published in the 2000s and accruing nine citations in the same amount of time. This difference alone corresponds to a 35% reduction in CDb from CDa. This reduction is characteristic of the average paper in MAG, as indicated by Figure 2(b), where the average CD5(t) features a decline that is consistent with the overall trends reported in Figure 2 of Park et al. (2023). We note that the CD5(t) curves reported in Figure 2 and Figures 6 and 9 of Park et al. (2023) are calculated according to broad disciplines, and show that disciplines with higher publication volumes (namely, life sciences and biomedicine, and physical sciences) naturally produce more references, and also tend to have smaller CD5(t) values in any given year relative to the social sciences (e.g., JSTOR). A more definitive validation of this critique is provided by Macher et al. (2024), who found that many patent citations were omitted in the data set analyzed by Park et al. (2023). After accounting for the artificially reduced reference lists by incorporating the missing citations, Macher et al. (2024) find that CD5(t) decreases in the early years to the extent that the negative trend reported by Park et al. (2023) disappears entirely. Moreover, upon correcting for this systematic measurement bias attributable to the omitted patent citations, they instead find that the number of highly disruptive IT technology patents has actually increased since 1980. In a similar vein, Bentley et al. (2023) develop a weighted version of CD5(t) to account for CI; after applying this modification to the scientific citation network, they report a significant positive trend in CD5(t) starting in the 1990s and extending until the end of their data sample in the early 2010s. Moreover, Holst et al. (2024) recently showed that the substantial subset of publications with zero references tends to be biased towards earlier years of scientific publishing and patenting; however, by construction according to the definition of CD, these items are assigned the maximum disruption value of CD = 1. Similarly, uncited papers also generate the modal disruption value of CD = 0, yet these papers and patents also tend to be biased towards earlier years (Kozlowski, Andersen, & Larivière, 2024). After correcting for these data quality artifacts and the systematic bias this incurs by excluding the publication and patent subsamples with CD = 1, Holst et al. (2024) show that temporal trends in CD become negligible for both patents and publications.

We also note that while the implementation of a CW may control for right-censoring bias, it does not control CI in any precise way. By way of example, consider the impact of the CW on Nk, the number of extraneous articles that do not cite p but do cite elements of {r}p. A CW will reduce the number of papers contributing to CD5(t) via Nk, but it will also reduce Ni + Nj in similar proportions, leaving the ratio Rk(t) unchanged, on average. Consider a more quantitative explanation that starts by positing that Nk increases proportional to n(t)r(t), as the nodes belonging to {c}k are unconstrained by the first-order citation network {c}i ∪ {c}j ∪ {r}p. As such, the term Nk is susceptible to CI according to two distinct channels, both r(t) and n(t). Following the same logic, Ni + Nj grows proportional to n(t). In both cases, even if the proportionality constant depends weakly on CW, the ratio Rk(t) will thus grow proportional to r(t).
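The scaling argument above can be written compactly as follows. The proportionality constants A_k and A_{ij} are hypothetical symbols introduced only for this sketch (each assumed to depend at most weakly on CW), and the exponential form of r(t) assumes the growth rate g_r reported by Pan et al. (2018).

```latex
\begin{align*}
N_k(t) &\approx A_k \, n(t)\, r(t), &
N_i(t) + N_j(t) &\approx A_{ij}\, n(t), \\
R_k(t) &= \frac{N_k(t)}{N_i(t) + N_j(t)}
        \approx \frac{A_k}{A_{ij}}\, r(t), &
r(t) &= r(0)\, e^{g_r t}.
\end{align*}
```

The publication volume n(t) cancels in the ratio, which is why a fixed citation window that trims both Nk and Ni + Nj leaves the growth of Rk(t) with r(t) intact.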

There is likely to be considerable variance in the publication-level relationship between Rk,p and rp, even among publications with fixed rp, because if any member of {r}p is highly cited then Nk becomes skewed towards the heavy right tail of the citation distribution. This particular sensitivity of CDp is illustrated in Figure 2(a), which makes the case using median citation values, as representing the case for highly cited papers would be challenging to visualize accurately. This is because the base number of citations associated with extreme values in the citation distribution has increased dramatically over the last half century as a result of CI, such that the number of citations C(Q∣t) corresponding to the Q = 99th percentile of the citation distribution has grown at an annual rate of roughly 2%, increasing from roughly 55 citations in 1965 to roughly 125 citations in 2005 (see Figure 4 in Pan et al., 2018). In this work we focus on the impact of CI manifesting through the growth of r(t), which has increased historically at roughly the same rate as C(99∣t), from roughly nine to 23 references per paper over the same period; see Figure 2(c). Consequently, Rk(t) ≫ 1 for nearly the entire period of analysis, where the growth of Rk(t) is largely explained by the growth of r(t); see Figures 2(d) and 2(e). For this reason, it is more accurate to describe CD as converging to 0 as opposed to decreasing over time.
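The two growth rates compared above can be checked with a short calculation. The sketch assumes continuous exponential growth between the two endpoint values and, for the reference counts, takes the 40-year span 1965–2005 as the period in question; both are simplifying assumptions of this example.

```python
import math

def annual_growth_rate(start_value, end_value, years):
    """Continuous annual growth rate implied by two endpoint values."""
    return math.log(end_value / start_value) / years

years = 2005 - 1965

# 99th-percentile citation count: roughly 55 (1965) to roughly 125 (2005).
g_c99 = annual_growth_rate(55, 125, years)

# Average reference list length: roughly 9 to roughly 23 references.
g_r = annual_growth_rate(9, 23, years)

print(f"g_C99 = {g_c99:.4f}, g_r = {g_r:.4f}")
```

Both values come out near 2% per year, in line with the statement that the two quantities have grown at roughly the same rate.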

In order to confirm these aggregate-level relationships at the publication level, we applied a linear regression model whereby the unit of analysis is an individual publication. The linear model specification is given by
$$CD_{p,5} = b_0 + b_k \ln k_p + b_r \ln r_p + b_c \ln c_p + D_t + \epsilon_p, \tag{3}$$
which controls for secular growth by way of yearly fixed effects, denoted by Dt. The results of the ordinary least squares (OLS) estimation using the STATA 13.0 package xtreg with publication-year fixed effects are shown in Table 1, and are based upon 3 million publications with 1 ≤ kp ≤ 10 coauthors, 5 ≤ rp ≤ 50 references, and 10 ≤ cp ≤ 1000 citations that were published in the two-decade period 1990–2009. The independent variables are modeled using a logarithmic transform because they are each right-skewed. Note that ln cp = ln(ci + cj), the number of citations received by p in the 5-year window.
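To illustrate what the yearly fixed effects accomplish, the sketch below implements a within-year (demeaned) OLS slope for a single covariate on a small, fully synthetic data set. It is not the authors' Stata procedure and uses no MAG data: the true slope, the year effects, and the drift in reference list length are all invented for the illustration.

```python
import math
from collections import defaultdict

def within_slope(rows):
    """OLS slope of y on x after subtracting group (year) means.

    rows: iterable of (year, x, y) tuples.
    """
    by_year = defaultdict(list)
    for year, x, y in rows:
        by_year[year].append((x, y))
    sxy = sxx = 0.0
    for obs in by_year.values():
        mean_x = sum(x for x, _ in obs) / len(obs)
        mean_y = sum(y for _, y in obs) / len(obs)
        for x, y in obs:
            sxy += (x - mean_x) * (y - mean_y)
            sxx += (x - mean_x) ** 2
    return sxy / sxx

TRUE_SLOPE = -0.025  # synthetic effect of ln r_p on CD (illustrative)

rows = []
for year in range(1990, 2010):
    year_effect = -0.002 * (year - 1990)  # synthetic secular decline
    for base_refs in (5, 10, 20, 35, 50):
        # Reference lists lengthen with the year, mimicking CI.
        x = math.log(base_refs + (year - 1990))
        rows.append((year, x, year_effect + TRUE_SLOPE * x))

fe_slope = within_slope(rows)
# Pooled estimate: treat all rows as a single group (no year effects).
pooled_slope = within_slope([(0, x, y) for _, x, y in rows])

print(f"fixed effects: {fe_slope:.4f}, pooled: {pooled_slope:.4f}")
```

Because both the covariate and the outcome drift with the publication year in this synthetic setup, the pooled slope absorbs part of the year trend, whereas the within-year estimate recovers the slope used to generate the data.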
Table 1.

Multivariate regression with yearly fixed effects. Results of linear regression model implemented in STATA 13 for dependent variable CDp,5, controlling for rp and secular growth by way of yearly fixed effects. Publication years are within the 20-year range 1990–2009. Covariates are included following a logarithmic transform. Shown below each coefficient estimate is the standard error (in parentheses). The first three columns show partial models, and the fourth shows the full multivariate model

| Dependent variable | (1) CDp,5 | (2) CDp,5 | (3) CDp,5 | (Full model) CDp,5 |
| --- | --- | --- | --- | --- |
| Team size, ln kp | 0.00188* (0.000670) | | | 0.00394*** (0.000796) |
| Reference list length, ln rp | | −0.0248*** (0.00106) | | −0.0253*** (0.00111) |
| Citation impact, ln cp | | | −0.00227* (0.00102) | 0.000667 (0.00119) |
| Constant | −0.0717*** (0.000849) | 0.00806* (0.00331) | −0.0624*** (0.00310) | 0.00259 (0.00245) |
| Year FE | Y | Y | Y | Y |
| N | 3,008,422 | 3,008,422 | 3,008,422 | 3,008,422 |
| adj. R2 | 0.000 | 0.007 | 0.000 | 0.007 |

Standard errors in parentheses. * p < 0.05; ** p < 0.01; *** p < 0.001.

Figure 2(f) shows the marginal relationship between CDp,5 and ln rp, holding all covariates at their mean values. Results indicate a negative relationship between CDp,5 and the number of references, consistent with our deductive argument. In particular, we measure a net shift of roughly −0.06 units in CD as rp increases by a factor of 10 from five to 50 total references. As cross-validation, Leahey, Lee, and Funk (2023) also report a negative correlation between rp and CDp,5 resulting from a multivariate regression model for CDp,5 including a broader panel of covariates (see Table 3 of that study, which focuses on disruption trends in sociology research).

Similarly, Figure 2(g) shows the positive correlation between CDp,5 and ln kp, corresponding to a shift of roughly +0.01 units as kp increases by a factor of 10 from one to 10 coauthors. Despite this correlation representing a relatively small effect magnitude, it is nevertheless in stark contrast to the negative relationship reported by Wu et al. (2019). Interestingly, Leahey et al. (2023) also obtain a positive relationship between the number of coauthors and the disruption index calculated for both 5- and 10-year windows (see again Table 3 of that study). Considering the consistency between our result based upon large-scale data and their result based upon a narrow subfield, this suggests that the descriptive results reported in Wu et al. (2019) are confounded by omitted variable bias. Moreover, the estimated coefficient for publication year in Leahey et al. (2023) is not statistically significant (i.e., it is indistinguishable from 0), which is inconsistent with the main result reported by the same authors in Park et al. (2023). Hence, as developed in the next section, these inconsistencies merit investigation by way of a mechanistic citation network model where confounding sources of variation can be fully controlled.
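The two marginal shifts quoted above follow directly from the full-model coefficients in Table 1: because the covariates enter in logarithms, a tenfold increase in a covariate shifts CD by the coefficient times ln(10). A minimal check (the helper name is illustrative):

```python
import math

B_REFS = -0.0253   # full-model coefficient on ln r_p (Table 1)
B_TEAM = 0.00394   # full-model coefficient on ln k_p (Table 1)

def shift(coef, x_from, x_to):
    """Change in the dependent variable when a log covariate moves."""
    return coef * (math.log(x_to) - math.log(x_from))

shift_refs = shift(B_REFS, 5, 50)   # five to 50 references
shift_team = shift(B_TEAM, 1, 10)   # one to 10 coauthors

print(f"{shift_refs:+.3f}, {shift_team:+.3f}")
```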

4.1. Generative Network Model Featuring Citation Inflation and Redirection

We employ computational modeling to explicitly control several fundamental sources of variation, and to also explore complementary mechanisms contributing to shifts in CD over time—namely, shifts in scholarly citation practice. Our identification strategy is to grow synthetic citation networks that are identical in growth trajectory and size, but differ just in the specification of (a) r(t) and/or (b) the rate of triadic closure denoted by β that controls the consolidation-disruption difference defining the numerator of CD.

We model the growth of a citation network using a model originally developed in Pan et al. (2018) that applies Monte Carlo (MC) simulation to operationalize stochastic link dynamics by way of a random number generator. This model belongs to the class of growth and redirection models (Barabàsi, 2016; Krapivsky & Redner, 2005), and reproduces a number of statistical regularities established for real citation networks—both structural (e.g., a log-normal citation distribution; Radicchi, Fortunato, & Castellano, 2008) and dynamical (e.g., increasing reference age with time (Pan et al., 2018); exponential citation life-cycle decay (Petersen, Fortunato et al., 2014)). The synthetic networks constructed and analyzed in what follows are available at the Dryad data repository and can be used to test CD and other citation-network based bibliometric measures for sensitivity to CI and other aspects of secular growth.

We construct each synthetic citation network by sequentially adding new layers of nodes of prescribed volume n(t) in each MC period t ≥ 0 representing a publication year. Each new node, denoted by the index a, represents a publication that could in principle cite any of the other existing nodes in the network. As such, the resulting synthetic networks are representative of a single scientific community, and also lack latent node-level variables identifying disciplines, authors, journals, topical breadth or depth, etc.

We seed the network with n(t = 0) ≡ 30 “primordial” nodes that are disconnected (i.e., they have reference lists of size ra ≡ 0). By ensuring that the initial conditions are the same for all networks generated, in combination with the relatively large system size being evolved, we are confident that the long-term evolution of the system is sufficiently independent of the initial conditions and finite-size effects (i.e., the networks being simulated are uniformly sampled from the representative space).

To minimize variation in intracohort connectivity, all nodes added within a specific cohort t have reference lists of a common prescribed size, denoted by r(t). To model the exponential growth of scientific production, we prescribe the number of new “publications” according to the exponential trend n(t) = n(0) exp[gn t]. We use gn ≡ 0.033 as the publication growth rate empirically derived in prior work (Pan et al., 2018). Similarly, we prescribe the number r(t) of synchronous (outgoing) links per new publication according to a second exponential trend r(t) = r(0) exp[gr t]. For both n(t) and r(t) we use their integer part, and plot their growth in Figure 3(a). We set the initial condition r(0) ≡ 25 in scenarios featuring no reference list growth (characterized by gr = 0), such that each new publication cites 25 prior articles independent of t. Alternatively, in scenarios that do feature reference list CI, we use the empirical growth rate value gr ≡ 0.018 and r(0) ≡ 5. We then sequentially add cohorts of n(t) publications to the network over t = 1, …, T ≡ 150 periods according to the following link-attachment (citation) rules, which capture the salient features of scholarly citation practice:

  1. System growth: In each period t, we introduce n(t) new publications, each citing r(t) other publications by way of a directed link. Hence, the total number of synchronous (backwards) citations produced in period t is C(t) = n(t)r(t), which grows exponentially at the rate gC = gn + gr.

  2. Link dynamics: Illustrated in the schematic Figure 3(b). For each new publication a ∈ n(t):

    • (i) Direct citation a → b: Each new publication a starts by referencing one publication b from period tb ≤ ta (where ta = t by definition). The publication b is selected with probability proportional to its attractiveness, prescribed by the weight Pb,t ≡ (c× + cb,t)[n(tb)]^α. The factor cb,t is the total number of citations received by b through the end of period t − 1, thereby representing preferential attachment (PA) link dynamics (Barabási, Jeong et al., 2002; Jeong, Neda, & Barabási, 2003; Peterson, Presse, & Dill, 2010; Redner, 2005; Simon, 1955). The factor n(tb) is the number of new publications introduced in cohort tb, and represents crowding out of old literature by new literature, net of the citation network. The parameter c× ≡ 6 is a citation offset controlling for the citation threshold, above which preferential attachment “turns on” (Petersen et al., 2014) such that a node becomes incrementally more attractive once cb ≥ c×.

    • (ii) Redirection a → {r}b: Immediately after step (i), the new publication a then cites a random number x of the publications cited in the reference list {r}b (of size rb) of publication b. By definition, β represents the fraction of citations following from this redirection mechanism, which is responsible for the rate of nonspurious triadic closure in the network. Hence, by construction β = λ/(λ + 1) ∈ [0, 1], where λ represents the average number of citations to elements of {r}b by publication a (such that the expected value of x is λ). Consequently, λ = β/(1 − β) is the frequency ratio for citations following the “redirection” mechanism (ii) to citations following the “direct” mechanism (i).

      We operationalize the stochastic probability of selecting x references according to the binomial distribution,

      $$P(x) = \binom{r_b}{x}\, q^{x} (1 - q)^{r_b - x}, \qquad x = 0, 1, \ldots, r_b, \tag{4}$$

      with success rate q = λ/rb to ensure that 〈x〉 = λ. Put another way, on average, the number of citations per new publication that follow from the redirection citation mechanism (ii) is r(ii)(t) = βr(t).

      Once x ∼ Binomial(rb, q) is determined by way of a random number generator, we then select x distinct members from the set {r}b (i.e., without replacement). Each publication belonging to {r}b is selected according to the same weights Pp,t used in step (i). As such, this second-stage PA also prioritizes more recent elements of {r}b (i.e., those items with larger tp), in addition to more highly cited elements of {r}b. Note that we do not allow a to cite any given element of {r}b more than once within its reference list.

    • (iii) Stop citing after reaching r(t): The referencing process alternates between mechanisms (i) and (ii) until publication a has cited exactly r(t) publications.

  3. Repeat step 2 (Link dynamics) for each new publication entering in period t.

  4. Update the PA weights, Pp,t, for all existing nodes at the end of each t.

  5. Perform steps (1–4) for t = 1, …, T periods and then exit the network growth algorithm.
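The growth rules (1–5) above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation, and it rests on several assumptions: the exponent `alpha` is a placeholder because its value is not stated in this section; candidates for direct citation are restricted to nodes from earlier periods (a simplification of tb ≤ ta); the second-stage selection uses sequential weighted draws; and the names `grow_network` and `weighted_sample` are hypothetical.

```python
import itertools
import math
import random


def weighted_sample(rng, items, weights, k):
    """Draw up to k distinct items, each draw proportional to the remaining weights."""
    items, weights = list(items), list(weights)
    chosen = []
    for _ in range(min(k, len(items))):
        i = rng.choices(range(len(items)), weights=weights)[0]
        chosen.append(items.pop(i))
        weights.pop(i)
    return chosen


def grow_network(T=10, n0=30, g_n=0.033, r0=5, g_r=0.018,
                 beta_of_t=lambda t: t / 400, c_x=6, alpha=1.0, seed=0):
    """Return (cohort, refs, cites) for a toy network grown over T periods.

    alpha=1.0 is a placeholder value, not a parameter taken from the paper.
    """
    rng = random.Random(seed)
    cohort = {i: 0 for i in range(n0)}   # node -> entry period t_b
    refs = {i: [] for i in range(n0)}    # node -> reference list {r}_b
    cites = {i: 0 for i in range(n0)}    # node -> citations received c_b
    size = {0: n0}                       # period -> cohort size n(t)
    for t in range(1, T + 1):
        n_t = int(n0 * math.exp(g_n * t))        # step 1: system growth
        r_t = int(r0 * math.exp(g_r * t))
        beta = beta_of_t(t)
        lam = beta / (1.0 - beta)                # lambda = beta / (1 - beta)
        old = list(cohort)                       # assumption: cite earlier cohorts only
        # PA weights stay frozen within the period (step 4 updates them at the end).
        w = {b: (c_x + cites[b]) * size[cohort[b]] ** alpha for b in old}
        cum = list(itertools.accumulate(w[b] for b in old))
        first_id = len(cohort)
        new_refs = {}
        for a in range(first_id, first_id + n_t):
            cited = []
            while len(cited) < r_t:              # step (iii): stop at r(t)
                # (i) direct citation a -> b; redraw if b was already cited
                b = rng.choices(old, cum_weights=cum)[0]
                if b in cited:
                    continue
                cited.append(b)
                # (ii) redirection a -> {r}_b with x ~ Binomial(r_b, q)
                r_b = len(refs[b])
                if r_b == 0 or lam == 0:
                    continue
                q = min(1.0, lam / r_b)
                x = sum(rng.random() < q for _ in range(r_b))
                pool = [p for p in refs[b] if p not in cited]
                x = min(x, r_t - len(cited))
                cited += weighted_sample(rng, pool, [w[p] for p in pool], x)
            new_refs[a] = cited
        size[t] = n_t
        for a, cited in new_refs.items():        # commit cohort, then update counts
            cohort[a], refs[a], cites[a] = t, cited, 0
            for b in cited:
                cites[b] += 1
    return cohort, refs, cites
```

Calling `grow_network(T=150)` follows the full schedule described in the text but is slow in pure Python; small values of `T` are enough to inspect the link dynamics.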

Figure 3.

Numerical simulation of growing citation networks elucidates roles of citation inflation and strategic citation practice. (a) Model system evolved over T = 150 periods (representing years), using growth parameters estimated for the entire Clarivate Analytics Web of Science citation network (Pan et al., 2018). (b) Schematic of the citation model comprised of two citing mechanisms: (a) direct citations, and (b) redirected citations made via the reference list {r}b of an intermediate item b. Type (b) references give rise to triadic closure corresponding to the Nj factor in CDp. (c) The rate of Type (b) references is controlled by the parameter β(t), which quantifies the fraction of links in the citation network directly following this “consolidation” mechanism (Funk & Owen-Smith, 2017; Park et al., 2023), which yields more negative CDp values. To disentangle the roles of citation inflation (owing to gr > 0) from shifts in scholarly citation practice (owing to ∂tβ(t) > 0), we compare four scenarios: Scenarios (1, 2) (gray and black curves) feature no citation inflation (gr = 0); (2, 3) compare β(t) = 0 and β(t) = t/400; and (3, 4) (cyan and blue curves) compare the effects of different citation windows (CW). (d) Each curve is the average CDCW(t) calculated for a single synthetic network. (e) Average CD5nok(t). (f) Rk(t) is the average rate of extraneous citations, which increases as either r(t) or CW increase. (Inset) High linear correlation between r(t) and Rk(t) shows that the decreasing trend in CD(t) is largely attributable to citation inflation. (g) The average value of Nij(t) = Ni + Nj (which defines the denominator of CDnok) also systematically increases, and so neglecting the term Nk does not solve the fundamental issue of CI.

4.2. Computational Simulation Results

In this section we present the results of a generative citation network model (Pan et al., 2018) that incorporates latent features of secular growth and two complementary citation mechanisms illustrated in Figure 3(b), namely, (a) direct citation from a new publication a to publication b; and (b) redirected citations from a to a random number of publications from the reference list of b. The redirection mechanism (b) gives rise to triadic closure in the network, thereby capturing shifts in correlated citation practice—such as the increased ease with which scholars can follow a citation trail with the advent of web-based hyperlinks, as well as self-citation. This redirection is the dominant contributor to “consolidation,” which is measured by Nj in CDp. We explicitly control the rate of (b) with a tunable parameter β ∈ [0, 1] that determines the fraction of links in the citation network resulting from mechanism (b). And to simulate the net effect of β, we construct some networks featuring a constant β(t) = 0 and other networks featuring an increasing β(t) ≡ t/400 such that β(t = 150) = 0.375 corresponding to roughly 1/3 of links arising from mechanism (b) by the end of the simulation—see Figure 3(c).
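As a quick arithmetic check of the redirection parametrization, the sketch below converts β into λ = β/(1 − β) and into the binomial success rate q = λ/rb. The function name `redirection_params` and the illustrative value `r_b = 25` are assumptions introduced here. At the end of the simulation, β(150) = 0.375 gives λ = 0.375/0.625 = 0.6, that is, on average 0.6 redirected citations for every direct citation.

```python
def redirection_params(beta, r_b):
    """Return (lam, q): mean redirected citations per direct citation, and binomial success rate."""
    if not 0.0 <= beta < 1.0:
        raise ValueError("beta must lie in [0, 1)")
    lam = beta / (1.0 - beta)   # frequency ratio of redirected to direct citations
    q = lam / r_b               # chosen so that the binomial mean r_b * q equals lam
    return lam, q


beta_end = 150 / 400            # beta(t) = t/400 evaluated at t = T = 150
lam, q = redirection_params(beta_end, r_b=25)   # r_b = 25 is an illustrative list length
print(beta_end, lam, q)
```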

We construct ensembles of synthetic networks according to six growth scenarios that incrementally add or terminate either of two citation mechanisms: gr = 0 corresponds to no CI; and β = 0 corresponds to no triadic closure (i.e., no “consolidation”). More specifically, the parameters distinguishing the six scenarios analyzed in what follows are

  1. No CI (gr = 0 with r(t) = 25); and no explicit redirection mechanism that controls triadic closure (β = 0);

  2. No CI (gr = 0 with r(t) = 25); and an increasing redirection rate, β(t) = t/400 such that β(150) = 0.375;

  3. CI implemented using the empirical value (gr = 0.018) with r(0) = 5; and increasing redirection rate, β(t) = t/400;

  4. Same as (3) but calculated using a larger citation window;

  5. Same as (3) but reference list capped at r(t) = 25 for t ≥ T* ≡ 92; and

  6. Same as (4) but reference list capped at r(t) = 25 for t ≥ T*.

For each scenario we constructed four distinct synthetic citation networks, each evolving over t ∈ [1, T = 150] periods (i.e., years) from a common initial condition at t = 0. For scenarios (1–3) we calculate CDp using a citation window of CW = 5 periods, whereas in (4) we use CW = 10 periods. Scenarios (3) and (4) are shown in order to demonstrate the nonlinear sensitivity of CDCW to the CW parameter (Bornmann & Tekles, 2019), and in particular to underscore that fixed CWs do not address CI (Petersen et al., 2018b).
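For readers who want to regenerate or extend the ensemble, the six scenarios can be written down as a small configuration table. The sketch below only transcribes the parameters stated in the text; the class name `Scenario` and its field names are hypothetical, and the citation windows for scenarios (5) and (6) follow the values stated for the intervention scenarios (CW = 5 and CW = 10, respectively).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Scenario:
    label: int
    g_r: float                        # reference-list growth rate (0 means no CI)
    r0: int                           # initial reference-list length r(0)
    beta_slope: float                 # beta(t) = beta_slope * t
    cw: int                           # citation window used when computing CD
    cap_period: Optional[int] = None  # T*: r(t) is fixed at 25 from this period on

    def beta(self, t):
        """Redirection rate in period t."""
        return self.beta_slope * t


SCENARIOS = [
    Scenario(1, g_r=0.0,   r0=25, beta_slope=0.0,     cw=5),
    Scenario(2, g_r=0.0,   r0=25, beta_slope=1 / 400, cw=5),
    Scenario(3, g_r=0.018, r0=5,  beta_slope=1 / 400, cw=5),
    Scenario(4, g_r=0.018, r0=5,  beta_slope=1 / 400, cw=10),
    Scenario(5, g_r=0.018, r0=5,  beta_slope=1 / 400, cw=5,  cap_period=92),
    Scenario(6, g_r=0.018, r0=5,  beta_slope=1 / 400, cw=10, cap_period=92),
]
```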

Figure 3(d) shows 16 average CD(t) curves calculated for each synthetic network. Because the sources of network variation are strictly limited to the stochastic link dynamics, there is relatively little variance across each ensemble of networks constructed using the same scenario parameters, so in what follows we show all realizations simultaneously. As there are no latent institution, author or other innovation covariates, the difference between network ensembles is attributable to either CI or the redirection mechanism.

We start by considering scenarios (1, 2) for which gr = 0, which show that CD5(t) systematically increases in the absence of reference list CI. While scenario (1) does capture CI attributable to increased publication volume (gn > 0), this does not appear to be sufficient to induce a negative trend in CD5(t). Scenario (2) features an increasing β(t), which results in larger CD values because redirected citations tend to fall outside a shorter CW and thus are not incorporated into the CD subgraph. In summary, comparison of (1) and (2) indicates that the redirection mechanism capturing shifting patterns of scholarly citation behavior is the weaker of the two mechanisms we analyzed.

The comparison of scenarios (2, 3) illustrates the role of CI. Notably, scenario (3) reproduces both the magnitude and rate of the decreasing trend in CD(t) observed for real citation networks (Park et al., 2023). Figure 3(e) shows an alternative metric CD5nok proposed by Bornmann et al. (2020), which also matches the empirical trends reported by Park et al. (2023). These results demonstrate the acute effect of reference list CI on CD, as the only difference between scenarios (2) and (3) pertains to gr.

Figure 3(f) reproduces the linear relationship between r(t) and Rk(t) and confirms the empirical relationship shown in Figure 2(e), thereby solving the mystery regarding the origins of the decreasing disruptiveness over time (Kozlov, 2023): as the size of the reference list {r}p increases, so does the likelihood that {r}p contains a highly cited paper, which increases Nk to such a degree that Rk,p ≫ 1 and so CDp → 0, independent of the relative differences between disruption and consolidation captured by Ni − Nj. Figure 3(g) shows that even CD5nok suffers from systematic bias affecting its denominator, so neglecting the term Nk does not solve the fundamental issue of CI.
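To make the role of Nk concrete, the following sketch counts Ni, Nj, and Nk for a focal node and evaluates CD = (Ni − Nj)/(Ni + Nj + Nk) alongside CDnok = (Ni − Nj)/(Ni + Nj). Two points are assumptions rather than statements from this section: the citation window is taken to include citing nodes published up to CW periods after the focal node, and Rk is computed as Nk/(Ni + Nj), which is consistent with CD = CDnok/(1 + Rk) and with the limit CDp → 0 for Rk ≫ 1. The function name `cd_components` and the toy network are hypothetical.

```python
def cd_components(p, refs, year, cw=5):
    """Count N_i, N_j, N_k for focal node p and derive CD, CD_nok and R_k.

    refs: dict mapping each node to the nodes it cites.
    year: dict mapping each node to its publication period.
    """
    r_p = set(refs[p])
    n_i = n_j = n_k = 0
    for c, cited in refs.items():
        # Only nodes published within the citation window after p can count.
        if c == p or not (year[p] < year[c] <= year[p] + cw):
            continue
        cited = set(cited)
        cites_p = p in cited
        cites_refs = bool(cited & r_p)
        if cites_p and cites_refs:
            n_j += 1          # consolidating: cites p and its references
        elif cites_p:
            n_i += 1          # disruptive: cites p only
        elif cites_refs:
            n_k += 1          # extraneous: cites the references but not p
    n_ij = n_i + n_j
    return {
        "N_i": n_i, "N_j": n_j, "N_k": n_k,
        "CD": (n_i - n_j) / (n_ij + n_k) if n_ij + n_k else None,
        "CD_nok": (n_i - n_j) / n_ij if n_ij else None,
        "R_k": n_k / n_ij if n_ij else None,   # assumed definition of R_k
    }


# Toy network: p cites r1 and r2; c* cite p; k* cite only p's references.
toy_refs = {
    "p": ["r1", "r2"], "r1": [], "r2": [],
    "c1": ["p"], "c2": ["p"], "c3": ["p"], "c4": ["p", "r1"],
    "k1": ["r1"], "k2": ["r2"], "k3": ["r1", "r2"], "k4": ["r2"],
}
toy_year = {"p": 0, "r1": -1, "r2": -2,
            "c1": 1, "c2": 2, "c3": 3, "c4": 1,
            "k1": 1, "k2": 2, "k3": 4, "k4": 5}
result = cd_components("p", toy_refs, toy_year, cw=5)
print(result)
```

In this toy case Ni = 3, Nj = 1, and Nk = 4, so CDnok = 0.5 while CD = 0.25; adding further nodes that cite only r1 or r2 pushes CD toward 0 without changing Ni − Nj.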

Scenarios (3, 4) reveal the effect of CW, which controls the size of the set {c}p and thus the magnitude and growth rate of Rk(t). Notably, the number of items included in {c}p depends on both CW and t because the reference age between the cited and citing article increases with time (Pan et al., 2018). Regardless, the average CDCW(t) → 0 as r(t) increases, independent of CW.

Figure 4 further explores the implications of CI on CD by modeling a hypothetical scenario in which CI is suddenly “turned off” after a particular intervention time period T*. In this way, scenarios (5, 6) explore the implications of a restrictive publishing policy whereby all journals suddenly agree to impose caps on reference list lengths. Scenarios (5, 6) enforce this hypothetical policy at t ≥ T* ≡ 92 by way of a piecewise smooth r(t) curve such that r(t) = r(0) exp[gr t] for t < T* and r(t) = r(T*) = 25 for t ≥ T* (see Figure 4(a)). This hypothetical intervention exhibits the potential for the scientific community to temper the effects of CI by way of strategic publishing policy. For completeness, scenarios (3) and (5) use CW = 5 and scenarios (4) and (6) use CW = 10.
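The intervention amounts to a piecewise reference-list schedule. The sketch below is one way to encode it, using the growth parameters quoted in the text and taking integer parts as described for n(t) and r(t); the function names `n_of_t`, `r_of_t`, and `total_citations` are hypothetical. After the cap, the total number of citations produced per period, C(t) = n(t)r(t), grows only through n(t).

```python
import math


def n_of_t(t, n0=30, g_n=0.033):
    """Number of new publications in period t (integer part of the exponential trend)."""
    return int(n0 * math.exp(g_n * t))


def r_of_t(t, r0=5, g_r=0.018, t_star=None, r_cap=25):
    """Reference-list length in period t, optionally capped at r_cap for t >= t_star."""
    if t_star is not None and t >= t_star:
        return r_cap
    return int(r0 * math.exp(g_r * t))


def total_citations(t, t_star=None):
    """C(t) = n(t) * r(t): citations produced by the cohort entering in period t."""
    return n_of_t(t) * r_of_t(t, t_star=t_star)


for t in (0, 91, 92, 150):
    print(t, r_of_t(t), r_of_t(t, t_star=92), total_citations(t, t_star=92))
```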

Figure 4.

Hypothetical publishing policy intervention reveals effect of capped reference list lengths on CD. (a) Evolution of network size in scenarios (5, 6) where the number of references per paper is capped at r(t ≥ T*) = 25 after T* = 92, such that the growth in the total citations produced per year depends solely on the growth of n(t). (b) Average CDCW(t) for scenarios (3)–(6). Immediately after T* = 92 the CD(t) trends for intervention scenarios (5, 6) reverse from decreasing to increasing. (c) The divergence in CD(t) trends is attributable to the taming of CI which stabilizes Rk(t). (d) The frequency distribution Pt(CD5) aggregated over 10-period intervals indicated by the color gradient; vertical dashed lines indicate distribution mean. (e) The stability of the Pt(CD5) distribution after T* suggests that quantitative properties of the Extreme Value (Fisher-Tippett) distribution could be used to develop time-invariant disruption measures; orange curves represent the best-fit Fisher-Tippett distribution model.

Figures 4(b, c) verify that the average CD(t) and Rk(t) trajectories for each pair of scenarios are indistinguishable prior to T*. Yet immediately after T*, scenarios (5) and (6) diverge from (3) and (4), respectively. Notably, the average CD5(t) in scenarios (5) and (6) reverses to the point of slowly increasing, thereby matching the trends observed for scenario (2). In the spirit of completeness, Figure 4(c) confirms that this trend-reversal is due to the relationship between r(t) and Rk(t). The shifts in the average CD5(t) are indeed representative of the entire distribution of CDp,5 values (see Figures 4(d, e)). Interestingly, the distribution Pt(CD5) converges to a stable Extreme-Value (Fisher-Tippett) distribution in the absence of reference list growth, which exposes candidate avenues for developing time-invariant measures of disruption by rescaling values according to the location and scale parameters. The feasibility of this approach was previously demonstrated in an effort to develop field-normalized (Radicchi et al., 2008) and time-invariant (z-score) citation metrics (Petersen, Ahmed, & Pavlidis, 2021; Petersen, Majeti et al., 2018a).
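A rough sketch of what such a location-scale rescaling could look like is given below. It is not the fitting procedure used for Figure 4: it assumes the Gumbel (maximum) form of the extreme-value distribution and swaps in a simple method-of-moments estimate (scale = s√6/π, location = mean − γ·scale, with γ the Euler-Mascheroni constant) in place of a best-fit procedure. The function names and the two synthetic cohorts, including their location and scale values, are hypothetical.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329


def fit_gumbel_moments(values):
    """Method-of-moments estimates (loc, scale) for a Gumbel (maximum) distribution."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    scale = sd * math.sqrt(6) / math.pi
    loc = mean - EULER_GAMMA * scale
    return loc, scale


def rescale(values):
    """Map values to z = (value - loc) / scale using the cohort's own estimates."""
    loc, scale = fit_gumbel_moments(values)
    return [(v - loc) / scale for v in values]


def gumbel_quantiles(loc, scale, n=200):
    """Deterministic Gumbel sample obtained by inverting the CDF on a regular grid."""
    return [loc - scale * math.log(-math.log((i + 0.5) / n)) for i in range(n)]


# Two hypothetical cohorts that differ only in location and scale.
cohort_a = gumbel_quantiles(loc=-0.020, scale=0.010)
cohort_b = gumbel_quantiles(loc=-0.005, scale=0.004)
z_a, z_b = rescale(cohort_a), rescale(cohort_b)
print(max(abs(a - b) for a, b in zip(z_a, z_b)))   # close to zero: the cohorts collapse
```

Because the moment estimates transform in step with any shift or stretch of the data, two cohorts that differ only in location and scale map onto the same rescaled values, which is the property a time-invariant measure would need.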

In bibliometric analysis, and in econometric analysis more generally, the benchmark is establishing that the measurement framework is not biased by confounders or other forms of systematic bias, so that results reflect phenomenological trends and not methodological bias. In practice, this is traditionally achieved by demonstrating that the metrics being used are stationary over time, such that the low-order moments of the metric distribution (mean and standard deviation) are time-independent. In multivariate regression methodologies, this can be approximated by including temporal dummy variables (fixed effects) that control for time variation in the first moment of the dependent variable. To promote a better understanding of the implications of citation inflation in temporal network analysis, we have provided the full ensemble of synthetic networks analyzed here, which can be used to test alternative citation-based indices for temporal systematic biases emerging from the growth dynamics of the network (see the Data availability statement for the weblink).
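The effect of temporal fixed effects can be illustrated with a toy calculation. Subtracting each year's mean from both variables before fitting a slope yields the same slope coefficient as a regression that includes year dummy variables (the within estimator). The numbers below are made up purely to show the mechanics and do not come from any dataset analyzed here; the function names are hypothetical.

```python
from collections import defaultdict


def ols_slope(x, y):
    """Slope of the least-squares line of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def demean_by_group(values, groups):
    """Subtract each group's mean; for year groups this mimics year fixed effects."""
    totals = defaultdict(lambda: [0.0, 0])
    for v, g in zip(values, groups):
        totals[g][0] += v
        totals[g][1] += 1
    return [v - totals[g][0] / totals[g][1] for v, g in zip(values, groups)]


# Made-up data: y drifts downward between years while rising with x within each year.
years = [1, 1, 1, 2, 2, 2]
x = [1, 2, 3, 4, 5, 6]
y = [10, 11, 12, 1, 2, 3]

pooled = ols_slope(x, y)
within = ols_slope(demean_by_group(x, years), demean_by_group(y, years))
print(pooled, within)
```

Here the pooled slope is negative only because both variables drift across years, whereas the within-year slope is positive. Fixed effects of this kind remove shifts in the yearly mean but not changes in the shape of the distribution.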

In summary, despite the reasonable logic behind the definition of CD, we show that its numerator (which captures the difference between disruptive and consolidating links, Ni − Nj) is systematically susceptible to becoming overwhelmed by the extensive quantity Rk, which grows with r(t) and appears in the denominator. We further contextualized this deductive critique by showing that the CD index artificially decreases over time due to citation inflation deriving from ever-increasing r(t), rendering CD systematically biased and unsuitable for cross-temporal analysis. This conclusion is further supported by two recent methodological critiques of Park et al. (2023), one focusing on patents (Macher et al., 2024) and the other on scientific publications (Bentley et al., 2023). The former corrects for omitted patent citations, whereas the latter defines a weighted variant of CD5(t) that more appropriately accounts for CI. Following their proposed modifications, both studies report an increasing rate of disruption in recent decades, likely owing to the techno-informatic revolution that has transformed the scientific endeavor, a transformation that is also manifest in bibliometric studies using quantities and model designs that are resistant to CI (Petersen, 2022; Petersen et al., 2018a, 2021; Yang, Pavlidis, & Petersen, 2023).

For the same reasons that central banks must design monetary policy to avoid the ill effects of printing excess money (Orphanides, 2003; Orphanides & Solow, 1990), researchers analyzing scientific trends should be wary of citation-network bibliometrics that are not stable with respect to time. Scenarios where achievement metrics are nonstationary and thus systematically biased by nominal inflation are common, including researcher evaluation (Petersen et al., 2010), journal impact factors (Althouse, West et al., 2009), and even achievement metrics in professional sports (Petersen & Penner, 2020; Petersen, Penner, & Stanley, 2011). Yet our results do not point to a straightforward way to address this methodological issue with the existing formulation of the disruption index. First, as illustrated in Figures 1(a) and 2(a), the impact of exponentially growing r(t) and n(t) is nonlinearly incorporated into the definition of CD, so there is no easy way to address this bias at the metric level. Second, while Figure 4 suggests that there is a stable distribution of CD, these results are conditioned on publication cohorts with fixed reference list lengths. Yet in general, publications from even the same year feature a broad frequency distribution P(rp,t), which explains why the unconditional CD distribution is asymmetric and highly skewed, and far from normal—see Wu et al. (2019). Hence, distribution-level normalization (also referred to as detrending and standardization) is not likely to be straightforward, because the Extreme Value (Fisher-Tippett) distribution that emerges in Figure 4 belongs to the double-exponential class of distribution functions.

In addition to the measurement error induced by CI, the disruption index also does not account for confounding shifts in scholarly citation practice. The counterbalance to disruption, captured by the term Ni in Eq. 1, is consolidation (Nj), which is fundamentally a measure of triadic closure in the subgraph Gp. While triangles may spuriously occur in a random network, their frequency in real networks is well in excess of random base rates due to the correlated phenomena underlying scholarly practice, in particular the strategic (personal and social) character of scholarly citing behavior that is increasingly preconditioned by implicit and explicit bias embedded within the digital platforms that mediate the search and recommendation of scientific literature (Helbing & Abeillon, 2023).

The sources and implications of citation inflation are not inherently undesirable, and if anything point to a thriving industry emerging from the scientific endeavor. The advent of online-only journals is a main reason for the steady increase in r(t), as they are not limited by print volume capacity, unlike more traditional print journals. In the present era of megajournals (Petersen, 2019), there may be a tendency to cite more liberally than in the past. Another mechanism connecting CI and citation behavior derives from the academic profession becoming increasingly dominated by quantitative evaluation, which thereby promotes the inclusion of strategic references dispersed among the core set of references directly supporting the research background and findings (Abramo, D’Angelo, & Grilli, 2021). Notably, scholars have identified various classes of self-citation (Ioannidis, 2015), which generally emerge in order to benefit either the authors (Fowler & Aksnes, 2007; Ioannidis, Baas et al., 2019; Pinheiro, Durning, & Campbell, 2022), institutional collectives (Qiu, Steinwender, & Azoulay, 2024; Tang, Shapira, & Youtie, 2015), the handling editor (Petersen, 2019), and/or the journal (Ioannidis & Thombs, 2019; Martin, 2016), but are otherwise difficult to differentiate from “normal” citations. Regardless of their intent, these self-citations are more likely to contribute to triadic closure because if article b cites c as a result of self-citation, then for the same reason a new article a that cites b (or c) is that much more likely to complete the triangle on principle alone.

These two issues—citation inflation and shifting scholarly behavior—introduce systematic bias in citation-based research evaluation that extends over significant periods. Indeed, time is a fundamental confounder, so to address this statistical challenge various methods introducing time-invariant citation metrics have been developed (Petersen et al., 2014, 2018a, 2018b, 2021; Radicchi et al., 2008). A broader issue occurs when different variables simultaneously shift over time, such as the number of coauthors, topical breadth, and depth of individual articles, which makes establishing causal channels between any two variables ever more challenging. By way of example, we analyzed the relationship between CDp and kp, using a regression model with fixed effects for publication year to superficially control for secular growth, and observe a positive relationship between these two quantities, in stark contrast to the negative relationship reported by Wu et al. (2019).

We conclude with a policy insight emerging from our analysis regarding interventional approaches to addressing citation inflation. Namely, journals might consider capping reference lists commensurate with the different types of articles they publish (e.g., letters, articles, reviews). An alternative that is more flexible would be to impose a soft cap based upon the average number of references per article page (Abt & Garfield, 2002). The results of our computational simulations indicate that such a policy could readily temper the effects of citation inflation in research evaluation, and might simultaneously address other shortcomings associated with self-citations by effectively increasing their cost.

We thank the anonymous reviewers for their timely and astute comments that helped us to improve the report.

Alexander M. Petersen: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Software, Visualization, Writing—original draft, Writing—review & editing. Felber Arroyave: Conceptualization, Writing—review & editing. Fabio Pammolli: Conceptualization, Methodology, Supervision, Writing—review & editing.

The authors have no competing interests.

AMP acknowledges financial support from a Hellman Fellow award that was critical to completing this project.

All synthetic citation networks analyzed are openly available at the Dryad data repository: https://doi.org/10.6071/M3G674. The pseudocode for the citation network growth is sufficient to generate additional citation networks with different parameters.

Abramo
,
G.
,
D’Angelo
,
C. A.
, &
Grilli
,
L.
(
2021
).
The effects of citation-based research evaluation schemes on self-citation behavior
.
Journal of Informetrics
,
15
(
4
),
101204
.
Abt
,
H. A.
, &
Garfield
,
E.
(
2002
).
Is the relationship between numbers of references and paper lengths the same for all sciences?
Journal of the American Society for Information Science and Technology
,
53
(
13
),
1106
1112
.
Althouse
,
B. M.
,
West
,
J. D.
,
Bergstrom
,
C. T.
, &
Bergstrom
,
T.
(
2009
).
Differences in impact factor across fields and over time
.
JASIST
,
60
(
1
),
27
34
.
Barabàsi
,
A.
(
2016
).
Network science
.
Cambridge
:
Cambridge University Press
.
Barabàsi
,
A.-L.
,
Jeong
,
H.
,
Neda
,
Z.
,
Ravasz
,
E.
,
Schubert
,
A.
, &
Vicsek
,
T.
(
2002
).
Evolution of the social network of scientific collaborations
.
Physica A
,
311
(
3–4
),
590
614
.
Bentley
,
R. A.
,
Valverde
,
S.
,
Borycz
,
J.
,
Vidiella
,
B.
,
Horne
,
B. D.
, …
O’Brien
,
M. J.
(
2023
).
Is disruption decreasing, or is it accelerating?
Advances in Complex Systems
,
26
(
2
),
23500066
.
Bornmann
,
L.
, &
Daniel
,
H.-D.
(
2008
).
What do citation counts measure? A review of studies on citing behavior
.
Journal of Documentation
,
64
(
1
),
45
80
.
Bornmann
,
L.
,
Devarakonda
,
S.
,
Tekles
,
A.
, &
Chacko
,
G.
(
2020
).
Are disruption index indicators convergently valid? The comparison of several indicator variants with assessments by peers
.
Quantitative Science Studies
,
1
(
3
),
1242
1259
.
Bornmann
,
L.
, &
Tekles
,
A.
(
2019
).
Disruption index depends on length of citation window
.
El Profesional de la Información (EPI)
,
28
(
2
),
e280207
.
Dai
,
C.
,
Chen
,
Q.
,
Wan
,
T.
,
Liu
,
F.
,
Gong
,
Y.
, &
Wang
,
Q.
(
2021
).
Literary runaway: Increasingly more references cited per academic research article from 1980 to 2019
.
PLOS ONE
,
16
(
8
),
e0255849
. ,
[PubMed]
Fowler
,
J.
, &
Aksnes
,
D.
(
2007
).
Does self-citation pay?
Scientometrics
,
72
,
427
437
.
Funk
,
R. J.
, &
Owen-Smith
,
J.
(
2017
).
A dynamic network measure of technological change
.
Management Science
,
63
(
3
),
791
817
.
Glänzel
,
W.
(
2004
).
Towards a model for diachronous and synchronous citation analyses
.
Scientometrics
,
60
,
511
522
.
Helbing
,
D.
, &
Abeillon
,
F.
(
2023
).
Manipulated attention: Do digital platforms promote bias in science?
ResearchGate
. https://www.researchgate.net/publication/369800457_Manipulated_Attention_Do_Digital_Platforms_Promote_Bias_in_Science
Holst
,
V.
,
Algaba
,
A.
,
Tori
,
F.
,
Wenmackers
,
S.
, &
Ginis
,
V.
(
2024
).
Dataset artefacts are the hidden drivers of the declining disruptiveness in science
.
arXiv
.
Huang
,
Y.
,
Chen
,
L.
, &
Zhang
,
L.
(
2020
).
Patent citation inflation: The phenomenon, its measurement, and relative indicators to temper its effects
.
Journal of Informetrics
,
14
(
2
),
101015
.
Ioannidis
,
J. P.
(
2015
).
A generalized view of self-citation: Direct, co-author, collaborative, and coercive induced self-citation
.
Journal of Psychosomatic Research
,
78
(
1
),
7
11
. ,
[PubMed]
Ioannidis
,
J. P.
,
Baas
,
J.
,
Klavans
,
R.
, &
Boyack
,
K. W.
(
2019
).
A standardized citation metrics author database annotated for scientific field
.
PLOS Biology
,
17
(
8
),
e3000384
. ,
[PubMed]
Ioannidis
,
J. P.
, &
Thombs
,
B. D.
(
2019
).
A user’s guide to inflated and manipulated impact factors
.
European Journal of Clinical Investigation
,
49
(
9
),
e13151
. ,
[PubMed]
Jeong
,
H.
,
Neda
,
Z.
, &
Barabàsi
,
A. L.
(
2003
).
Measuring preferential attachment in evolving networks
.
Europhysics Letters
,
61
(
4
),
567
.
Kozlov
,
M.
(
2023
).
‘Disruptive’ science has declined—and no one knows why
.
Nature
,
613
(
7943
),
225
. ,
[PubMed]
Kozlowski
,
D.
,
Andersen
,
J. P.
, &
Larivière
,
V.
(
2024
).
The decrease in uncited articles and its effect on the concentration of citations
.
Journal of the Association for Information Science and Technology
,
75
(
2
),
188
197
.
Krapivsky
,
P. L.
, &
Redner
,
S.
(
2005
).
Network growth by copying
.
Physical Review E
,
71
(
3
),
036118
. ,
[PubMed]
Leahey
,
E.
,
Lee
,
J.
, &
Funk
,
R. J.
(
2023
).
What types of novelty are most disruptive?
American Sociological Review
,
88
,
562
597
.
Leibel
,
C.
, &
Bornmann
,
L.
(
2024
).
What do we know about the disruption indicator in scientometrics? An overview of the literature
.
Scientometrics
,
129
,
601
639
.
Leydesdorff
,
L.
,
Tekles
,
A.
, &
Bornmann
,
L.
(
2021
).
A proposal to revise the disruption index
.
El Profesional de la Información (EPI)
,
30
(
1
),
e300121
.
Macher
,
J. T.
,
Rutzer
,
C.
, &
Weder
,
R.
(
2024
).
Is there a secular decline in disruptive patents? Correcting for measurement bias
.
Research Policy
,
53
(
5
),
104992
.
Martin
,
B. R.
(
2016
).
Editors’ JIF-boosting stratagems—Which are appropriate and which not?
Research Policy
,
45
(
1
),
1
7
.
Nakamoto
,
H.
(
1988
).
Synchronous and diachronous citation distributions
. In
L.
Egghe
&
R.
Rousseau
(Eds.),
Informetrics 87/88: Select Proceedings of the 1st International Conference on Bibliometrics and Theoretical Aspects of Information Retrieval
(pp.
157
163
).
New York
:
Elsevier
.
Nicolaisen
,
J.
, &
Frandsen
,
T. F.
(
2021
).
Number of references: A large-scale study of interval ratios
.
Scientometrics
,
126
,
259
285
.
Orphanides
,
A.
(
2003
).
The quest for prosperity without inflation
.
Journal of Monetary Economics
,
50
(
3
),
633
663
.
Orphanides, A., & Solow, R. M. (1990). Money, inflation and growth. In B. M. Friedman & F. H. Hahn (Eds.), Handbook of monetary economics (Vol. 1, pp. 223–261). Elsevier.
Pan, R. K., Petersen, A. M., Pammolli, F., & Fortunato, S. (2018). The memory of science: Inflation, myopia, and the knowledge network. Journal of Informetrics, 12(3), 656–678.
Park, M., Leahey, E., & Funk, R. J. (2023). Papers and patents are becoming less disruptive over time. Nature, 613, 138–144.
Pavlidis, I., Petersen, A. M., & Semendeferi, I. (2014). Together we stand. Nature Physics, 10, 700–702.
Petersen, A. M. (2019). Megajournal mismanagement: Manuscript decision bias and anomalous editor activity at PLOS ONE. Journal of Informetrics, 13(4), 100974.
Petersen, A. M. (2022). Evolution of biomedical innovation quantified via billions of distinct article-level MeSH keyword combinations. Advances in Complex Systems, 25, 2150016.
Petersen, A. M., Ahmed, M. E., & Pavlidis, I. (2021). Grand challenges and emergent modes of convergence science. Humanities and Social Sciences Communications, 8, 194.
Petersen, A. M., Arroyave, F., & Pammolli, F. (2025). The disruption index suffers from citation inflation: Re-analysis of temporal CD trend and relationship with team size reveal discrepancies. Journal of Informetrics, 19(1), 101605.
Petersen, A. M., Fortunato, S., Pan, R. K., Kaski, K., Penner, O., … Pammolli, F. (2014). Reputation and impact in academic careers. Proceedings of the National Academy of Sciences of the USA, 111(43), 15316–15321.
Petersen, A. M., Majeti, D., Kwon, K., Ahmed, M. E., & Pavlidis, I. (2018a). Cross-disciplinary evolution of the genomics revolution. Science Advances, 4(8), eaat4211.
Petersen, A. M., Pan, R. K., Pammolli, F., & Fortunato, S. (2018b). Methods to account for citation inflation in research evaluation. Research Policy, 48(7), 1855–1865.
Petersen, A. M., & Penner, O. (2020). Renormalizing individual performance metrics for cultural heritage management of sports records. Chaos, Solitons & Fractals, 136, 109821.
Petersen, A. M., Penner, O., & Stanley, H. E. (2011). Methods for detrending success metrics to account for inflationary and deflationary factors. European Physical Journal B, 79, 67–78.
Petersen, A. M., Wang, F., & Stanley, H. E. (2010). Methods for measuring the citations and productivity of scientists across time and discipline. Physical Review E, 81(3), 036114.
Peterson, G. J., Presse, S., & Dill, K. A. (2010). Nonuniversal power law scaling in the probability distribution of scientific citations. Proceedings of the National Academy of Sciences of the USA, 107(37), 16023–16027.
Pinheiro, H., Durning, M., & Campbell, D. (2022). Do women undertake interdisciplinary research more than men, and do self-citations bias observed differences? Quantitative Science Studies, 3(2), 363–392.
Qiu, S., Steinwender, C., & Azoulay, P. (2024). Paper tiger? Chinese science and home bias in citations. Working Paper Series 32468. National Bureau of Economic Research.
Radicchi, F., Fortunato, S., & Castellano, C. (2008). Universality of citation distributions: Toward an objective measure of scientific impact. Proceedings of the National Academy of Sciences of the USA, 105(45), 17268–17272.
Redner, S. (2005). Citation statistics from 110 years of Physical Review. Physics Today, 58(6), 49–54.
Ruan, X., Lyu, D., Gong, K., Cheng, Y., & Li, J. (2021). Rethinking the disruption index as a measure of scientific and technological advances. Technological Forecasting and Social Change, 172, 121071.
Sánchez-Gil, S., Gorraiz, J., & Melero-Fuentes, D. (2018). Reference density trends in the major disciplines. Journal of Informetrics, 12(1), 42–58.
Shen, H.-W., & Barabàsi, A.-L. (2014). Collective credit allocation in science. Proceedings of the National Academy of Sciences of the USA, 111(34), 12325–12330.
Simon, H. A. (1955). On a class of skew distribution functions. Biometrika, 42(4), 425–440.
Sinha, A., Shen, Z., Song, Y., Ma, H., Eide, D., … Wang, K. (2015). An overview of Microsoft Academic Service (MAS) and applications. In Proceedings of the 24th International Conference on World Wide Web (pp. 243–246).
Tang, L., Shapira, P., & Youtie, J. (2015). Is there a clubbing effect underlying Chinese research citation increases? Journal of the Association for Information Science and Technology, 66(9), 1923–1932.
Wu, L., Wang, D., & Evans, J. A. (2019). Large teams develop and small teams disrupt science and technology. Nature, 566(7744), 378–382.
Wu, S., & Wu, Q. (2019). A confusing definition of disruption. SocArXiv.
Wuchty, S., Jones, B. F., & Uzzi, B. (2007). The increasing dominance of teams in production of knowledge. Science, 316(5827), 1036–1039.
Yang, D., Pavlidis, I., & Petersen, A. M. (2023). Biomedical convergence facilitated by the emergence of technological and informatic capabilities. Advances in Complex Systems, 26, 2350003.

Author notes

Handling Editor: Vincent Larivière

This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. For a full description of the license, please visit https://creativecommons.org/licenses/by/4.0/legalcode.