This article aims to improve our understanding of scientometric data in a Benfordian context. Recently, Benford’s law has been used to detect scientific fraud. However, we need to better understand its application to scientometric data. Through the implementation of Benford’s law and the generalized Benford’s law, we propose a categorization of science products and metrics. To this end, we have performed chi-square, MAD, and Max tests on data sets from WoS and Scopus as well as on historical data. This enables us to better understand the behavior and characteristics of these objects in a Benfordian context, and invites us to discuss the nature of bibliometric indicators in this particular context.

Benford’s law was first observed by the astronomer Simon Newcomb (1881), who noticed the nonuniform wear of the pages of logarithmic tables. However, it was the engineer and physicist Frank Benford (1938) who made it famous in an article published 57 years later. The law states that, for certain collections of numbers, the first significant digit (FSD) does not follow the uniform distribution on 1, 2, …, 9. More precisely, for a set of numbers, the probability P(d) that the first digit of a number is equal to d is:
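$$P(d) = \log_{10}\left(1 + \frac{1}{d}\right), \quad d = 1, \ldots, 9 \tag{1}$$
or expressed with Naperian logarithms:
$$P(d) = \frac{\ln\left(1 + \frac{1}{d}\right)}{\ln 10} \tag{2}$$
The values expressed as a percentage are:

| d | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|
| P(d) (%) | 30.1 | 17.6 | 12.5 | 9.7 | 7.9 | 6.7 | 5.8 | 5.1 | 4.6 |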

This law is surprising, as one would expect a uniform distribution of the different digits. Numerous publications have given different explanations for this unexpected regularity: Gauvrit and Delahaye (2008) recall the well-known fact (cf. for example, Diaconis, 1977) that the uniformity of the logarithmic mantissa log X of a strictly positive real random variable X is equivalent to Benford’s law on X. They propose a sufficient condition for a random distribution to generate numbers that satisfy this law: If a statistical distribution concentrates a significant number of low and high values and has a certain regularity, it roughly conforms to this law.
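The equivalence between a uniform logarithmic mantissa and Benford’s law can be illustrated numerically. The following is a minimal sketch, not part of the original study: it draws X = 10^U with U uniform on [0, 1) and compares the simulated first-digit frequencies with Eq. 1. The function names, the sample size, and the seed are illustrative assumptions.

```python
import math
import random


def benford_probabilities():
    """Return P(d) = log10(1 + 1/d) for d = 1..9 (Eq. 1)."""
    return [math.log10(1 + 1 / d) for d in range(1, 10)]


def simulate_first_digits(n, seed=0):
    """Draw X = 10**U, U uniform on [0, 1), and return first-digit frequencies."""
    rng = random.Random(seed)
    counts = [0] * 9
    for _ in range(n):
        x = 10 ** rng.random()      # X lies in [1, 10): uniform log mantissa
        digit = min(int(x), 9)      # guard against rounding up to 10.0
        counts[digit - 1] += 1
    return [c / n for c in counts]


expected = benford_probabilities()
observed = simulate_first_digits(100_000)
for d, (p, f) in enumerate(zip(expected, observed), start=1):
    print(f"d={d}: Benford {100 * p:5.1f}%   simulated {100 * f:5.1f}%")
```

With a sample of this size the simulated frequencies should sit close to the theoretical percentages, which is what the equivalence predicts.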

The work of Delahaye (2012) and Kafri (2023) shows similarities between Zipf’s law and Benford’s law. Kafri shows that, under certain conditions, Eq. 2 is obtained by applying a Riemann sum to Zipf’s law (Kafri, 2023).

1.1. Evolution of Benford’s Law

In 1976, Raimi wrote an article on Benford’s law called The First Digit Problem, in which he presented a first state of the art (Raimi, 1976, p. 521). He lists 37 papers in chronological order, “deliberately omitting only those references to the problem which make no attempt to add to its understanding … among authors of which are mathematicians, statisticians, economists, engineers, physicists and amateurs.” He also proposed 15 additional bibliographical references which “[do] not refer explicitly to the problem.”

In his book, Nigrini returned to the works identified around Benford’s law (Nigrini, 2012, p. 293). In 2000, he identified 200 articles on Benford’s law. Eleven years later, he had identified 750 papers on the subject. He continued the reasoning of Raimi (1976) by suggesting a difference between statistics, which deals with real data, and mathematics. He pointed out that Benford’s law concerns both aspects and motivates studies in both fields. He proposed a classification of the works in eight points with three main categories: papers that prove Benford’s law; those that have an approach to mathematical phenomena; and those that consider Benford’s law as a test, particularly in terms of fraud.

Benford’s law has a remarkably wide range of application. It is verified in many collections of numbers enumerating objects of various origins: the number of inhabitants of cities, the distances between stars, the lengths of rivers, the prices in supermarkets, and the citations of journals in a database. Not only does Benford’s law cover many areas, but it is regularly used and cited in different fields. A quick study of the 50 most cited articles dealing with Benford’s law in the Web of Science (WoS) shows that the fields concerned are, of course, fraud, whether in the financial or electoral field (Mebane, 2011; Tam Cho & Gaines, 2007), but also electrical networks (Wei, Sundararajan et al., 2017); scientific data (Diekmann, 2007; Geyer & Williamson, 2004; Judge & Schechter, 2009); astronomy (Alexopoulos & Leontsinis, 2014); atomic physics (Pain, 2008) and quantum physics (Rane, Mishra et al., 2014); hydrology (Nigrini & Miller, 2007); image processing (Fu, Shi, & Su, 2007; Perez-Gonzalez, Heileman, & Abdallah, 2007); natural sciences (Sambridge, Tkalčić, & Jackson, 2010); and, more recently, epidemiology with COVID-19 (Lee, Han, & Jeong, 2020).

1.2. Integrity of Academic Research

Scientific integrity and trust are the cornerstones of scientific research, because of their relationship with constantly evolving knowledge. In recent years, Benford’s law has become a tool for detecting scientific fraud (Eckhartt & Ruxton, 2023). Reproducibility has long been a persistent problem in academic research, even though it is one of the fundamental principles of science. The growing number of false claims found in academic manuscripts is worrying: It goes against the very nature of science and calls the reproducibility of academic research into question. The work of Lazebnik and Gorlitsky (2023) determined the rate of manipulation in academic research. Furthermore, Schumm, Crawford et al. (2023) offer an approach designed to detect issues in social science surveys when dealing with small sample sizes. An application of this law to medical data has been proposed by Hein, Schuepfer, and Konrad (2011). More recently, Gupta, Singh, and Banshal (2023) proposed that Benford’s law can be used to define a framework for assessing the quality of altmetric data.

1.3. Benford Applied to Scientometrics

There are fewer studies in the field of scientometrics, but the theoretical developments and experiments are just as relevant. Over the last decade, several studies have been carried out on data from WoS and Scopus. One focuses on the number of articles, citations, and impact factor of WoS from 1998 to 2007 (Campanario & Coslado, 2011). A second study aims to compare this law on data from WoS and Scopus (Alves, Yanasse, & Soma, 2014).

Benford’s law can be generalized to the other digits. The work of Alves, Yanasse, and Soma (2016) focuses on Benford’s law for the second digit.

The work of Hürlimann (2015b) discussed parametric extensions of Benford’s law, using for the experiment the mean absolute deviation (MAD) test developed by Nigrini (2012). All of these studies tend to show that the data in these databases, which are at the source naturally produced by researchers, verify this law.

1.4. The Problem Addressed Is the Understanding of Bibliometric Data

While Benford’s law is well known and studied, there remains a question mark over the nature of the data studied in the field of scientometrics. Before experimenting with Benford’s law on bibliometric data for the purposes of fraud detection or the regulation of science, it is necessary to understand the nature of these indicators in a Benfordian context. The research question concerns the nature of scientometric data.

Indeed, we have found that previous studies of Benford’s law on scientometric data have not explained its behavior in relation to the nature of the scientometric information used. On page 431 of Campanario’s article, the authors tested the distribution on articles, citations and impact factor (Campanario & Coslado, 2011). Their conclusion is as follows: “We have no explanation for these differences.”

Egghe’s work based on Campanario’s data stated in the conclusion that “We consider this to be an interesting discovery, but we have no informetric explanation for it” (Egghe & Guns, 2012, pp. 1663, 1665). Alves’ work, published in 2014 following Campanario and Egghe, did not offer a discussion of the nature of the objects studied, but wondered about anomalies in the data sets and the ability to detect them (Alves et al., 2014). Despite these previous works, there is still a lack of understanding when it comes to reading Benfordian distributions.

Therefore, we proposed a typology of the scientometric objects studied, based on the different approaches to a Benfordian distribution. Consequently, in this article we have constructed a new data set extracted from WoS from 1997 to 2019 and from Scopus from 1999 to 2019. We have also taken into account data from the work of Campanario and Coslado (2011) and Alves et al. (2014). This led us to a corpus of 181 distributions from different scientometric objects. The data set we used consists of unaggregated data. We have made available all the distributions analyzed. They are both included in the Supplementary material and available for download via Zenodo (Bertin & Lafouge, 2024).

The article is structured as follows: After recalling Benford’s law and its generalized form in Section 2, Section 3 describes the construction of the scientometrics data set from WoS, Scopus, and historical data.

Next, we present the tests used in this work. Two well-known tests were employed: the MAD test, developed specifically for this law, and the classic chi-square test. Furthermore, a third test, called Max, is proposed, which represents a compromise between the macro and micro tests.

Section 4 presents the results obtained and a discussion that proposes an analysis at the micro and macro levels. The results lead us to categorize the scientometric data in our study and to propose a typology of distributions based on the results. The distinction between macro and micro data and the tests used allow us to hypothesize about the classes that define the nature of the scientometric information used. It allows us to provide some answers about the nature of the scientometric information used and to better understand the Benfordian phenomena in a scientometric perspective.

Section 5 concludes the study with a summary of this work and of the main results obtained.

2.1. Zipfian Form

A probability law g written in the rank-frequency form given below is called a Zipfian form in this article:
$$g(r) = \frac{B}{r^{\beta}}, \quad r = 1, \ldots, T \tag{3}$$
where r is the rank, g(r) the corresponding frequency of a source of rank r, B a constant, β a positive exponent, and T the number of sources. In the following we use Equation 3 in its continuous form: g(r) is then a density function and r varies continuously on the interval [1, 10]. The source and item nomenclatures are used to describe the information production process of informetric systems (Egghe, 2005). For instance, we have reviews (sources) producing citations (items) and words (sources) producing occurrences of words (items) (Zipf’s law).

In this formalism Benford’s law can be stated as follows: A digit d = 1, …, 9 (sources) produces the numbers (items) whose first digit is d. This scheme is the same in Zipf’s law. In this case the number of sources is finite. This scheme will allow us to modify how the law is written and, consequently, to offer a proposal for generalization.

2.2. Benford’s Law in Zipfian Form

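By definition, $P(d) = \log_{10}(d + 1) - \log_{10}(d)$. If we assume that d is continuous, then Benford’s law can also be written as
$$P(d) = \int_{d}^{d+1} \frac{B}{r}\,dr \tag{4}$$
with the constant B satisfying
$$\int_{1}^{10} \frac{B}{r}\,dr = 1, \quad \text{that is,} \quad B = \frac{1}{\ln 10} \tag{5}$$
The Zipfian form of Benford’s law proposed by Egghe (Egghe, 2011) is
$$g(r) = \frac{1}{\ln 10} \cdot \frac{1}{r}, \quad r \in [1, 10] \tag{6}$$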

This form led Egghe to generalize this law.

2.3. Statement of the Generalized Benford’s Law

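By introducing the exponent β of Zipf’s law, assumed to be different from 1, we naturally generalize Equation 6. It is necessary to calculate a normalization coefficient A:
$$\int_{1}^{10} \frac{A}{r^{\beta}}\,dr = 1 \tag{7}$$
which gives
$$A = \frac{1 - \beta}{10^{1-\beta} - 1}$$
Equation 4 allows us to define the generalized Benford’s law (Egghe & Guns, 2012), denoted Pg:
$$P_g(d) = \int_{d}^{d+1} \frac{A}{r^{\beta}}\,dr \tag{8}$$
Evaluating the integral in Equation 8 gives:
$$P_g(d) = \frac{(d+1)^{1-\beta} - d^{1-\beta}}{10^{1-\beta} - 1} \tag{9}$$
Equation 9 is not defined if β = 1. If we pose 1 − β = x,
$$P_g(d) = \frac{(d+1)^{x} - d^{x}}{10^{x} - 1}$$
We have $a^{x} = 1 + x \ln a + O(x^{2})$, so the limit is
$$\lim_{x \to 0} P_g(d) = \frac{\ln(d+1) - \ln d}{\ln 10} = \log_{10}\left(1 + \frac{1}{d}\right)$$

The limit of the generalized law when β → 1 is thus equal to the original Benford’s law (Eq. 1). This result, which is not proven in Egghe’s article, reinforces this generalization in our view.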
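The behavior of the generalized law near β = 1 can be checked numerically. The sketch below assumes the closed form Pg(d) = ((d+1)^(1−β) − d^(1−β)) / (10^(1−β) − 1) attributed here to Egghe and Guns (2012); the function names and the tolerance used to switch to the original law are illustrative choices, not taken from the article.

```python
import math


def benford(d):
    """Original Benford's law, P(d) = log10(1 + 1/d)."""
    return math.log10(1 + 1 / d)


def generalized_benford(d, beta):
    """Generalized law Pg(d); falls back to the original law when beta is 1."""
    x = 1.0 - beta
    if abs(x) < 1e-12:              # the closed form is 0/0 at beta = 1
        return benford(d)
    return ((d + 1) ** x - d ** x) / (10 ** x - 1)


for beta in (0.6, 0.999, 1.0, 1.001, 2.2):
    probs = [generalized_benford(d, beta) for d in range(1, 10)]
    shown = ", ".join(f"{100 * p:.1f}" for p in probs)
    print(f"beta={beta}: [{shown}]  sum={sum(probs):.6f}")
```

The nine probabilities sum to 1 for every β because the numerators telescope, and the rows for β = 0.999 and β = 1.001 should be nearly indistinguishable from the row for β = 1.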

Figure 1 shows two examples of the law for β = 0.6 and β = 2.2. When β tends to 1 the values are close to the original law. We can demonstrate the inequality:
$$P_g(1) < \frac{\ln 2}{\ln 10} \quad \text{for } \beta < 1 \tag{10}$$
Figure 1. Variation of Benford’s law as a function of the coefficient β.

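We pose, for x ≠ 0, 1 − β = x:
$$g(x) = P_g(1) = \frac{2^{x} - 1}{10^{x} - 1} \tag{11}$$
So
$$g'(x) = \frac{2^{x} \ln 2\,(10^{x} - 1) - 10^{x} \ln 10\,(2^{x} - 1)}{(10^{x} - 1)^{2}}$$
Since g′(x) < 0 for x > 0, g is decreasing, and we can conclude because $\lim_{x \to 0} P_g(1) = \frac{\ln 2}{\ln 10}$.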
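The monotonicity argument can be checked numerically. This is a small sketch under the assumption that Pg(1) = (2^x − 1) / (10^x − 1) with x = 1 − β, which follows from the closed form of the generalized law given by Egghe and Guns (2012); the grid of x values is an arbitrary illustrative choice.

```python
import math


def g(x):
    """Pg(1) as a function of x = 1 - beta, for x != 0."""
    return (2 ** x - 1) / (10 ** x - 1)


limit = math.log(2) / math.log(10)          # value of Pg(1) when beta -> 1

positive_x = [0.1 * k for k in range(1, 11)]    # beta below 1
negative_x = [-0.1 * k for k in range(1, 11)]   # beta above 1

for x in positive_x:
    print(f"x={x:+.1f}  Pg(1)={g(x):.4f}  below limit: {g(x) < limit}")
for x in negative_x:
    print(f"x={x:+.1f}  Pg(1)={g(x):.4f}  above limit: {g(x) > limit}")
```

On this grid the values decrease as x grows, staying below ln 2 / ln 10 for β < 1 and above it for β > 1.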

This generalized law can be perceived through the scale invariance of power laws (Raimi, 1976, p. 529; Pietronero, Tosatti et al., 2001).

The construction of the data set is based both on historical data, for the purposes of validation and reproducibility of this experiment, and also on WoS and Scopus data in order to have a set of scientometric objects such as the impact factor, h-index, number of journals, references, citations, and articles over different periods. In order to respond to the problem posed, namely the understanding of bibliometric objects in a Benfordian context, we will observe the behavior of these scientometric objects during the application of the tests. The latter invites us to reflect on the nature of scientometric objects through discussion.

3.1. Construction of the Scientometrics Data Set

For the construction of the data set that will be used for our experimentation, we retained the data set used by Alves et al. (2014). This is a compilation of the Campanario and Coslado (2011) and Alves et al. (2014) data sets used by Hürlimann (2015a) and Egghe and Guns (2012). The resulting data set contains the historical data as well as the new data set we produced from WoS and Scopus.

To create the new data set to test Benford’s law, we collected for each WoS journal between 1997 and 2019 the impact factor and the total number of citations. For each Scopus journal between 1999 and 2019, we collected the h-index, the number of cumulative citations over three years, the number of references over the year, and the number of articles over the year. We also collected the ratio of the number of references to the number of articles. The ratio is a bibliometric indicator, as are the h-index and the impact factor.

For these 181 distributions from WoS, Scopus, and the historical data, we calculate the percentage of each first digit. The results are shown in Tables S1–S10 in the Supplementary material. It was necessary to change the scale to process the 23 distributions of impact factor: We multiply all these values by 10,000. For example, the journal Biologia, whose impact factor was 0.159 in 2000, now has a value of 1,590. Benford’s law has been tested on data extracted from the WoS and Scopus databases.
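The first-digit percentages described above can be computed as follows. This is a minimal sketch rather than the authors’ actual pipeline: the function names are assumptions, and apart from the Biologia value (0.159) quoted in the text, the impact factors in the sample list are hypothetical. Multiplying by a power of ten leaves the first significant digit unchanged, so the rescaled values give the same digit counts.

```python
from collections import Counter


def first_significant_digit(value):
    """Return the first nonzero digit of a strictly positive number."""
    if value <= 0:
        raise ValueError("value must be strictly positive")
    # Scientific notation starts with the first significant digit, e.g. '1.59...e-01'.
    return int(f"{value:.15e}"[0])


def fsd_percentages(values):
    """Percentage of each first digit 1..9 among the positive values."""
    digits = [first_significant_digit(v) for v in values if v > 0]
    counts = Counter(digits)
    return {d: 100 * counts.get(d, 0) / len(digits) for d in range(1, 10)}


# 0.159 is the Biologia example from the text; the other values are hypothetical.
impact_factors = [0.159, 2.345, 1.020, 0.871, 3.512, 1.764, 0.233, 12.400]
scaled = [v * 10_000 for v in impact_factors]

print(fsd_percentages(impact_factors))
print(fsd_percentages(scaled))
```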

3.2. Chi-Square, MAD and Max Tests

Our adjustment takes place in two stages. First, we compare the observed distribution of the first significant digit, denoted Po(d), with that of Benford’s law, denoted P(d) (see Eq. 1). Second, we search for an optimal β for the generalized Benford’s law. Because this law depends on the single parameter β, we test all possible cases by varying β over the interval [0.5, 1.5] in steps of 0.01. At each step we calculate the corresponding χ² metric Dβ.

The optimal β is the value for which Dβ is minimal, to within 0.01.
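The search for the optimal β can be sketched as a simple grid search. The code below is an illustration under stated assumptions: it takes the generalized law in the closed form of Egghe and Guns (2012), uses the χ² distance N · Σ (Po(d) − Pg(d))² / Pg(d) as Dβ, and checks itself on synthetic proportions generated with β = 1.08. The function names and the synthetic input are not from the article.

```python
import math


def generalized_benford(beta):
    """Return [Pg(1), ..., Pg(9)]; the original law is used when beta is 1."""
    x = 1.0 - beta
    if abs(x) < 1e-9:
        return [math.log10(1 + 1 / d) for d in range(1, 10)]
    return [((d + 1) ** x - d ** x) / (10 ** x - 1) for d in range(1, 10)]


def chi_square(observed, theoretical, n):
    """Chi-square distance between observed and theoretical proportions."""
    return n * sum((o - t) ** 2 / t for o, t in zip(observed, theoretical))


def best_beta(observed, n, lo=0.5, hi=1.5, step=0.01):
    """Scan beta over [lo, hi] and return (beta, D_beta) with minimal D_beta."""
    steps = round((hi - lo) / step)
    candidates = [round(lo + k * step, 2) for k in range(steps + 1)]
    scored = [(b, chi_square(observed, generalized_benford(b), n)) for b in candidates]
    return min(scored, key=lambda pair: pair[1])


# Synthetic check: proportions generated with beta = 1.08 should be recovered.
observed = generalized_benford(1.08)
beta, d_beta = best_beta(observed, n=10_000)
print(beta, d_beta)
```

A grid of 101 values is small enough that no optimizer is needed, and it matches the stated resolution of 0.01.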

The values are expressed in percentages in all the tables. To measure the adequacy of the two theoretical distributions with the 181 observed distributions we use three tests.

The first test is the classical χ² distance (Eq. 12) between observed and theoretical counts. For the calculation of the χ² distance we use the number of journals. If N designates the number of journals involved in this study, Po(d) · N equals the observed count of digit d and the χ² is
$$\chi^{2} = N \sum_{d=1}^{9} \frac{\left(P_o(d) - P(d)\right)^{2}}{P(d)} \tag{12}$$
Consider two distributions, Po1(d) and Po2(d), one concerning N1 journals and the other N2 journals, where the proportions of digits are identical (Po1(d) = Po2(d), d = 1, …, 9). By definition the theoretical distributions are also identical. If we denote their χ² values by χ1² and χ2² respectively, from Eq. 12 we have
$$\frac{\chi_{1}^{2}}{\chi_{2}^{2}} = \frac{N_{1}}{N_{2}} \tag{13}$$

Equation 13 is only valid for Benford’s law. It shows the dependence on the number of journals: If two observed distributions are identical, one for 10,000 journals and the other for 20,000 journals, the two χ² values will necessarily be in a ratio of 2.

This dependence of χ² on the number of items (here journals) has led researchers to develop other tests.
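The size dependence can be made concrete with a short computation. This sketch assumes the χ² distance takes the form N · Σ (Po(d) − P(d))² / P(d); the observed proportions are hypothetical values chosen only for illustration.

```python
import math

BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]


def chi_square(observed_props, n, theoretical=BENFORD):
    """Chi-square distance computed from proportions and the sample size n."""
    return n * sum((o - t) ** 2 / t for o, t in zip(observed_props, theoretical))


# Hypothetical first-digit proportions (they sum to 1).
observed = [0.31, 0.17, 0.125, 0.10, 0.08, 0.065, 0.055, 0.05, 0.045]

chi_10k = chi_square(observed, 10_000)
chi_20k = chi_square(observed, 20_000)
print(f"N=10,000: {chi_10k:.2f}   N=20,000: {chi_20k:.2f}   ratio: {chi_20k / chi_10k:.1f}")
```

The same proportions can therefore pass the test for a small corpus and fail it for a larger one, which is the motivation for size-independent indicators.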

We then carry out the classical χ² goodness-of-fit test at the 5% significance level. The rejection threshold read from the table is 15.51 for eight degrees of freedom (Benford’s law) and 14.07 for seven degrees of freedom (generalized Benford’s law).

The second test is the mean absolute deviation (MAD; see Eq. 14), a test developed by Nigrini and used by Hürlimann (2015a, p. 354).
$$\text{MAD} = \frac{1}{9} \sum_{d=1}^{9} \left| P_o(d) - P(d) \right| \tag{14}$$

The MAD is not a classical test as it does not depend on the size of the sample. It assumes that we know the FSD perfectly. Here, it is considered as a conformity indicator. It is also possible to use the mean χ² (weighted least squares (WLS): chi-square divided by sample size). This measure is more of an empirical rule for choosing the best adjustment (Hürlimann, 2015b, p. 355). We should also mention the recent application (Cerqueti & Lupi, 2022) of another type of test for large samples: the severity principle.

Table 1 was provided by Nigrini to clarify conformity with the MAD (Nigrini, 2012, p. 160). We adopt these notations in all tables.

Table 1.

MAD critical values and conformity to first significant digit (FSD)

| MAD critical values | Conformity to Benfordian distribution | Abbreviation |
|---|---|---|
| MAD ≤ 6 × 10⁻³ | Close Conformity | |
| 6 × 10⁻³ < MAD ≤ 12 × 10⁻³ | Acceptable Conformity | AC |
| 12 × 10⁻³ < MAD ≤ 15 × 10⁻³ | Marginal Conformity | MC |
| MAD > 15 × 10⁻³ | Non Conformity | NC |
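The MAD and the conformity bands of Table 1 can be sketched as follows. The code assumes the usual definition of Nigrini’s MAD for the first digit, the mean over the nine digits of |Po(d) − P(d)| with proportions rather than percentages; the function names are illustrative.

```python
import math

BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]


def mad(observed_props):
    """Mean absolute deviation between observed proportions and Benford's law."""
    return sum(abs(o - p) for o, p in zip(observed_props, BENFORD)) / 9


def conformity(mad_value):
    """Map a MAD value to the conformity classes of Table 1."""
    if mad_value <= 0.006:
        return "Close Conformity"
    if mad_value <= 0.012:
        return "Acceptable Conformity"
    if mad_value <= 0.015:
        return "Marginal Conformity"
    return "Non Conformity"


uniform = [1 / 9] * 9                     # digits equally likely
print(conformity(mad(BENFORD)))           # a perfectly Benfordian distribution
print(conformity(mad(uniform)))           # a uniform distribution
```

Because the sample size does not appear anywhere in the computation, two corpora with the same proportions always fall in the same class.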

It is relevant at this step of the discussion to add a complementary test denoted Max for a better interpretation of the data.

This third test is called Max (Eq. 15) and is the maximum of the absolute deviations of the proportions, with Po(d) and P(d) expressed here as percentages.
$$\text{Max} = \max_{d = 1, \ldots, 9} \left| P_o(d) - P(d) \right| \tag{15}$$

This test constructed here is of the same nature as the one using the Z-statistic (Alves et al., 2016, p. 1492).

To decide whether the value of Max (Eq. 15) is small enough to validate the laws, we use the method of “the confidence interval of a proportion.” Given an infinite population where a proportion p of individuals have a certain characteristic, the method consists in finding a confidence interval for p from a proportion f observed in a sample of size n. According to Saporta (2006, p. 111), we have
$$f - t_{\alpha} \sqrt{\frac{f(1-f)}{n}} < p < f + t_{\alpha} \sqrt{\frac{f(1-f)}{n}} \tag{16}$$

We note $\Delta p = t_{\alpha} \sqrt{\frac{f(1-f)}{N}}$.

In the case of a confidence interval at 95% we have tα = 1.96. If we bound Δp from above by taking the worst case, f = 1/2, we obtain $\Delta p \approx \frac{98}{\sqrt{N}}\,\%$. In this case the law will be validated with Max if:
$$\text{Max} < F_{\max} = 2\,\Delta p \approx \frac{196}{\sqrt{N}}\,\% \tag{17}$$
where N is the number of journals.

We then carry out a goodness-of-fit test on Max at the 5% significance level. The calculation of Fmax is done in Tables 3–7.
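The Max test can be sketched in a few lines. One assumption should be stated clearly: the threshold Fmax = 196/√N (in percent), that is, twice the worst-case Δp at 95%, is inferred here from the tabulated Fmax values (for example, N = 6,634 gives 2.41, as in the 1997 citations row of Table 3) rather than read from a stated formula. The function names are illustrative.

```python
import math

BENFORD_PERCENT = [100 * math.log10(1 + 1 / d) for d in range(1, 10)]


def max_deviation(observed_percent, theoretical_percent=BENFORD_PERCENT):
    """Largest absolute deviation between two distributions, in percent."""
    return max(abs(o - t) for o, t in zip(observed_percent, theoretical_percent))


def delta_p_percent(n, t_alpha=1.96, f=0.5):
    """Half-width of the confidence interval of a proportion, in percent."""
    return 100 * t_alpha * math.sqrt(f * (1 - f) / n)


def f_max_percent(n):
    """Assumed rejection threshold: twice the worst-case half-width."""
    return 2 * delta_p_percent(n)


def max_test(observed_percent, n):
    """True when Max is below the threshold, i.e. the law is validated."""
    return max_deviation(observed_percent) < f_max_percent(n)


print(round(delta_p_percent(10_000), 2))   # worst-case half-width for N = 10,000
print(round(f_max_percent(6_634), 2))      # threshold for N = 6,634
```

Since the threshold shrinks like 1/√N, the Max test still depends on the corpus size, but less sharply than χ², which grows linearly with N.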

In summary, these three tests are quite different and seem necessary to validate the adjustments. The MAD test, specifically designed for Benford’s law, is unavoidable due to its widespread use in many studies. However, it is not a classical test and of course it is inoperative for the generalized Benford’s law. The chi-square test is the statistical test that seems to us the most relevant for this type of discrete law. However, as this law does not depend on any parameter, we have seen that it is very sensitive to the size of the population. Therefore, it seemed necessary to construct a third test. This does of course depend on the size of the population, but its rejection threshold is linked to the root of the population and is therefore less sensitive.

This section is in three parts. The first part, microanalysis, focuses on distributions over time (i.e., the calculations are made year by year).

Our data are presented in Tables 3–6. Additionally, the results from historical data are presented in Table S7.

The second part proposes a macroanalysis, aggregating distribution data over several years, leading us to construct Table 2. This part also takes into account historical data. The final section presents a categorical organization of the scientometric data correlated with the tests used in this study. The results obtained, which will be discussed, are shown in Table 2.

Table 2.

Products and metrics of science in a Benfordian context


In this study, we therefore examined 181 distributions. The distribution of the data is as follows. WoS provides 23 distributions for citations and 23 distributions for impact factor. The historical data set provides 10 distributions for citations, 10 distributions for articles and 10 for the impact factor. Finally, Scopus provides 21 distributions for citations, 21 for the h-index, 21 for the number of articles, 21 for the number of bibliographic references and 21 for the ratio. The 181 distributions analyzed can be found in Supplementary material: Tables S1–S10.

The three previous tests are implemented. In all cases, as expected, we observe a hierarchy of validity between the three tests: If the χ2 test validates a fit, then the Max test also validates it, and when it makes sense, the MAD test is also validated, with “Close Conformity” or “Acceptable Conformity” as the critical value. We test the validity of the generalized Benford’s law in all cases. Remember that the studies cited concerning historical data only use the χ2 test (Campanario & Coslado, 2011; Egghe & Guns, 2012).

4.1. Microanalysis

4.1.1. WoS and Scopus data set

The results are in Tables 3–6. For each year, we tested the validity of Benford’s law and the generalized Benford’s law. For these tables the results are read as follows:

  • Column 1: year

  • Column 2: number of journals analyzed (the number of journals analyzed may be different for two indicators in the same year)

  • Column 3: MAD critical value (see Table 1)

  • Columns 4, 5, and 6: calculation of the three indicators Fmax (Eq. 17), Max (Eq. 15) and distance of χ2 (Eq. 12).

Table 3.

WoS corpora from 1997 to 2019 with number of citations and impact factor

Number of citations

| Year | Journals | Benford: MAD | Benford: Fmax | Benford: Max | Benford: χ² | Benford G: β | Benford G: Max | Benford G: χ² |
|---|---|---|---|---|---|---|---|---|
| 1997 | 6,634 | | 2.41 | 0.84 | 8.45 | 1.01 | 0.8 | 8.33 |
| 1998 | 7,146 | | 2.32 | 1.04 | 16.74 | — | | |
| 1999 | 7,249 | | 2.30 | 0.65 | 5.58 | 0.98 | 0.48 | 4.35 |
| 2000 | 7,383 | | 2.28 | 0.40 | 2.78 | 0.99 | 0.29 | 2.10 |
| 2001 | 7,434 | | 2.27 | 0.72 | 8.54 | — | | |
| 2002 | 7,585 | | 2.25 | 0.46 | 3.13 | 1.01 | 0.25 | 2.84 |
| 2003 | 7,621 | | 2.25 | 0.31 | 3.18 | 1.01 | 0.17 | 3.00 |
| 2004 | 7,681 | | 2.24 | 0.33 | 2.91 | — | | |
| 2005 | 7,835 | | 2.21 | 0.55 | 7.80 | 1.01 | 0.51 | 7.66 |
| 2006 | 7,934 | | 2.20 | 0.63 | 12.08 | 1.02 | 0.53 | 10.68 |
| 2007 | 8,292 | | 2.15 | 0.59 | 7.25 | 1.01 | 0.54 | 7.06 |
| 2008 | 8,605 | | 2.11 | 0.89 | 18.35 | — | | |
| 2009 | 9,644 | | 2.00 | 0.69 | 22.60 | 0.99 | 0.74 | 22.27 |
| 2010 | 10,804 | | 1.89 | 0.72 | 16.15 | 1.01 | 0.67 | 15.88 |
| 2011 | 11,302 | | 1.84 | 0.43 | 7.97 | 0.99 | 0.43 | 7.66 |
| 2012 | 11,518 | | 1.83 | 0.29 | 5.88 | 1.01 | 0.29 | 5.69 |
| 2013 | 11,569 | | 1.82 | 0.48 | 6.27 | — | | |
| 2014 | 11,813 | | 1.80 | 0.78 | 12.70 | 0.99 | 0.76 | 13.38 |
| 2015 | 12,026 | | 1.79 | 0.73 | 5.54 | 0.98 | 0.25 | 3.46 |
| 2016 | 12,120 | | 1.78 | 0.55 | 9.31 | 0.98 | 0.46 | 7.88 |
| 2017 | 12,236 | | 1.77 | 0.62 | 12.72 | 0.97 | 0.43 | 6.55 |
| 2018 | 11,822 | | 1.80 | 0.69 | 11.04 | 0.97 | 0.30 | 4.30 |
| 2019 | 12,872 | | 1.73 | 0.80 | 7.10 | 0.99 | 0.75 | 6.38 |

Impact factor

| Year | Journals | Benford: MAD | Benford: Fmax | Benford: Max | Benford: χ² | Benford G: β | Benford G: Max | Benford G: χ² |
|---|---|---|---|---|---|---|---|---|
| 1997 | 6,537 | | 2.42 | 0.58 | 4.31 | 0.97 | 0.44 | 2.57 |
| 1998 | 7,037 | | 2.34 | 0.44 | 5.90 | 0.98 | 0.36 | 3.89 |
| 1999 | 7,142 | | 2.32 | 0.74 | 14.78 | 0.97 | 0.76 | 12.78 |
| 2000 | 7,289 | | 2.30 | 1.17 | 24.78 | 0.96 | 0.98 | 19.50 |
| 2001 | 7,332 | | 2.29 | 1.66 | 35.14 | 0.95 | 1.42 | 27.8 |
| 2002 | 7,481 | AC | 2.27 | 1.13 | 40.37 | 0.96 | 1.17 | 27.31 |
| 2003 | 7,548 | | 2.26 | 0.84 | 24.05 | 0.96 | 1.08 | 19.93 |
| 2004 | 7,621 | | 2.25 | 1.39 | 19.96 | 1.01 | 1.20 | 19.44 |
| 2005 | 7,770 | | 2.22 | 0.71 | 20.04 | 0.99 | 0.90 | 19.46 |
| 2006 | 7,893 | | 2.21 | 2.25 | 41.27 | 1.06 | 1.57 | 30.14 |
| 2007 | 8,219 | AC | 2.16 | 3.20 | 48.81 | 1.09 | 1.05 | 21.66 |
| 2008 | 8,541 | AC | 2.12 | 3.29 | 64.26 | 1.11 | 0.88 | 18.83 |
| 2009 | 9,567 | AC | 2.00 | 3.22 | 84.14 | 1.11 | 1.34 | 40.60 |
| 2010 | 10,712 | AC | 1.89 | 2.79 | 62.80 | 1.10 | 0.68 | 14.98 |
| 2011 | 11,215 | AC | 1.85 | 2.26 | 91.30 | 1.09 | 1.00 | 27.33 |
| 2012 | 11,455 | AC | 1.83 | 2.83 | 91.22 | 1.11 | 0.94 | 35.20 |
| 2013 | 11,538 | AC | 1.82 | 3.09 | 109.23 | 1.14 | 0.96 | 34.70 |
| 2014 | 11,745 | AC | 1.81 | 2.88 | 109.54 | 1.12 | 1.23 | 53.29 |
| 2015 | 11,984 | MC | 1.79 | 3.97 | 256.80 | 1.16 | 1.28 | 55.72 |
| 2016 | 12,060 | MC | 1.78 | 3.56 | 371.07 | 1.18 | 2.36 | 100.12 |
| 2017 | 12,294 | NC | 1.77 | 4.52 | 371.07 | 1.23 | 2.64 | 104.79 |
| 2018 | 12,525 | NC | 1.75 | 4.28 | 435.14 | 1.24 | 3.43 | 146.76 |
| 2019 | 12,485 | NC | 1.75 | 4.45 | 459.07 | 1.25 | 3.51 | 139.32 |

Note. — indicates that there is no β that improves the result for the generalized Benford’s law.

Table 4.

Scopus corpora from 1999 to 2019 with number of cumulative citations over three years

Number of citations

| Year | Journals | Benford: MAD | Benford: Fmax | Benford: Max | Benford: χ² | Benford G: β | Benford G: Max | Benford G: χ² |
|---|---|---|---|---|---|---|---|---|
| 1999 | 14,351 | | 1.70 | 1.57 | 49.70 | 1.08 | 0.65 | 7.15 |
| 2000 | 14,457 | | 1.63 | 1.26 | 30.22 | 1.06 | 0.67 | 9.16 |
| 2001 | 14,958 | | 1.60 | 1.02 | 34.65 | 1.06 | 0.76 | 11.48 |
| 2002 | 15,666 | | 1.57 | 0.90 | 39.51 | 1.06 | 0.70 | 14.56 |
| 2003 | 14,649 | | 1.62 | 1.57 | 49.08 | 1.07 | 0.79 | 10.97 |
| 2004 | 17,420 | | 1.54 | 0.80 | 31.54 | 1.05 | 0.68 | 16.15 |
| 2005 | 18,274 | | 1.45 | 0.57 | 24.61 | 1.03 | 0.51 | 16.06 |
| 2006 | 19,738 | | 1.40 | 0.50 | 16.95 | 1.04 | 0.87 | 13.15 |
| 2007 | 21,109 | | 1.35 | 1.00 | 27.24 | 1.05 | 0.31 | 6.30 |
| 2008 | 22,659 | | 1.30 | 1.14 | 22.83 | 1.04 | 0.27 | 4.78 |
| 2009 | 24,262 | | 1.26 | 0.52 | 16.26 | 1.03 | 0.49 | 8.78 |
| 2010 | 26,104 | | 1.21 | 0.93 | 23.63 | 1.04 | 0.21 | 3.66 |
| 2011 | 27,582 | | 1.18 | 0.97 | 17.91 | 1.03 | 0.23 | 3.63 |
| 2012 | 28,865 | | 1.15 | 0.65 | 18.64 | 1.02 | 0.27 | 12.37 |
| 2013 | 29,593 | | 1.14 | 0.70 | 21.56 | 1.04 | 0.29 | 6.16 |
| 2014 | 30,014 | | 1.13 | 0.79 | 22.28 | 1.04 | 0.19 | 4.18 |
| 2015 | 30,526 | | 1.12 | 0.62 | 33.83 | 1.04 | 0.35 | 15.05 |
| 2016 | 31,099 | | 1.11 | 0.43 | 12.27 | 1.02 | 0.25 | 5.23 |
| 2017 | 31,580 | | 1.10 | 0.58 | 23.52 | 1.03 | 0.34 | 9.91 |
| 2018 | 22,659 | | 1.13 | 0.62 | 22.83 | 1.04 | 0.27 | 4.78 |
| 2019 | 23,627 | | 1.17 | 1.02 | 54.08 | — | | |

Note. — indicates that there is no β that improves the result for the generalized Benford’s law.

Table 5.

Scopus corpora from 1999 to 2019 with indicators: ratio & h-index

Number of references / Number of articles

| Year | Journals | Benford: MAD | Benford: Fmax | Benford: Max | Benford: χ² | Benford G: β | Benford G: Max | Benford G: χ² |
|---|---|---|---|---|---|---|---|---|
| 1999 | 13,270 | NC | 1.70 | 9.93 | 1,848.3 | 1.35 | 8.78 | 1,260.5 |
| 2000 | 13,993 | NC | 1.66 | 9.91 | 1,769.4 | 1.29 | 8.89 | 1,302.0 |
| 2001 | 14,512 | NC | 1.65 | 9.54 | 1,933.1 | 1.31 | 8.47 | 1,454.5 |
| 2002 | 15,362 | NC | 1.58 | 9.93 | 2,208.2 | 1.28 | 8.90 | 1,787.3 |
| 2003 | 15,430 | NC | 1.58 | 9.97 | 2,329.4 | 1.27 | 8.98 | 1,922.0 |
| 2004 | 16,481 | NC | 1.53 | 9.39 | 2,229.0 | 1.23 | 8.54 | 1,925.9 |
| 2005 | 17,515 | NC | 1.48 | 9.17 | 2,306.1 | 1.25 | 8.25 | 1,910.6 |
| 2006 | 18,784 | NC | 1.43 | 7.97 | 2,327.4 | 1.23 | 7.72 | 1,984.9 |
| 2007 | 19,506 | NC | 1.40 | 8.30 | 2,582.2 | 1.23 | 7.94 | 2,222.3 |
| 2008 | 20,705 | NC | 1.36 | 8.10 | 2,814.4 | 1.23 | 7.85 | 2,427.1 |
| 2009 | 22,327 | NC | 1.31 | 8.01 | 3,042.9 | 1.20 | 7.84 | 2,707.2 |
| 2010 | 23,010 | NC | 1.29 | 7.86 | 3,212.9 | 1.20 | 8.48 | 2,944.6 |
| 2011 | 24,187 | NC | 1.26 | 7.67 | 3,648.3 | 1.13 | 8.37 | 3,502.3 |
| 2012 | 24,716 | NC | 1.25 | 7.55 | 3,880.6 | 1.08 | 8.38 | 3,818.7 |
| 2013 | 24,980 | NC | 1.24 | 8.08 | 4,287.1 | 1.07 | 8.72 | 4,252.4 |
| 2014 | 25,987 | NC | 1.21 | 8.25 | 4,579.6 | 1.01 | 8.73 | 4,579.4 |
| 2015 | 26,048 | NC | 1.21 | 9.83 | 5,159.2 | 0.95 | 8.82 | 5,133.9 |
| 2016 | 26,597 | NC | 1.20 | 10.35 | 5,580 | 0.91 | 8.4 | 5,509.9 |
| 2017 | 25,093 | NC | 1.24 | 13.8 | 7,184.9 | 0.82 | 10.1 | 6,896.2 |
| 2018 | 24,248 | NC | 1.36 | 16.9 | 8,570.2 | 0.9 | 14.53 | 8,213.9 |
| 2019 | 23,627 | NC | 1.27 | 17.3 | 8,439.9 | 0.7 | 10.6 | 7,645.4 |

h-index

| Year | Journals | Benford: MAD | Benford: Fmax | Benford: Max | Benford: χ² | Benford G: β | Benford G: Max | Benford G: χ² |
|---|---|---|---|---|---|---|---|---|
| 1999 | 17,113 | | 1.50 | 1.71 | 56.34 | 0.95 | 1.48 | 38.55 |
| 2000 | 17,547 | | 1.48 | 1.73 | 65.35 | 0.94 | 1.45 | 40.13 |
| 2001 | 18,105 | | 1.46 | 1.69 | 71.21 | 0.93 | 1.36 | 36.76 |
| 2002 | 19,165 | AC | 1.42 | 1.95 | 96.78 | 0.92 | 1.57 | 45.26 |
| 2003 | 19,760 | | 1.39 | 2.04 | 105.52 | 0.92 | 1.70 | 53.42 |
| 2004 | 20,557 | AC | 1.37 | 1.96 | 91.60 | 0.93 | 1.63 | 48.92 |
| 2005 | 22,004 | | 1.32 | 1.95 | 84.20 | 0.94 | 1.67 | 52.41 |
| 2006 | 23,638 | | 1.27 | 1.94 | 93.28 | 0.94 | 1.60 | 59.95 |
| 2007 | 25,405 | | 1.23 | 1.94 | 101.86 | 0.95 | 1.70 | 72.93 |
| 2008 | 27,501 | AC | 1.18 | 2.04 | 118.17 | 0.95 | 1.80 | 91.70 |
| 2009 | 29,350 | AC | 1.14 | 2.26 | 167.02 | 0.95 | 2.02 | 133.60 |
| 2010 | 31,048 | AC | 1.11 | 2.32 | 194.59 | — | | |
| 2011 | 32,584 | AC | 1.09 | 2.57 | 239.39 | 0.93 | 2.19 | 165.97 |
| 2012 | 33,433 | AC | 1.07 | 2.44 | 241.21 | 0.92 | 2.06 | 168.63 |
| 2013 | 33,965 | AC | 1.06 | 2.39 | 268.81 | 0.91 | 1.09 | 149.60 |
| 2014 | 34,701 | AC | 1.05 | 2.26 | 250.11 | 0.91 | 1.83 | 123.08 |
| 2015 | 35,115 | AC | 1.05 | 2.18 | 246.08 | 0.91 | 1.75 | 125.88 |
| 2016 | 35,505 | AC | 1.04 | 2.18 | 296.89 | 0.90 | 1.65 | 134.43 |
| 2017 | 34,464 | AC | 1.06 | 1.86 | 248.66 | 0.90 | 1.38 | 111.47 |
| 2018 | 32,447 | | 1.09 | 1.41 | 128.18 | 0.94 | 1.12 | 72.49 |
| 2019 | 30,142 | | 1.13 | 1.11 | 55.48 | 0.97 | 0.97 | 45.54 |

Note. — indicates that there is no β that improves the result for the generalized Benford’s law.

Table 6.

Scopus corpora from 1999 to 2019 with references and articles

Years | Number of references | Number of articles
Within each group: Journals | Benford: MAD, Fmax, Max, χ2 | Benford G: β, Max, χ2
1999 13,488 AC 1.69 1.74 79.5 0.93 1.65 53.9 14,734 AC 1.61 2.53 185.0 1.07 2.80 162.8 
2000 13,994 AC 1.66 2.00 99.0 0.93 1.70 68.0 15,065 AC 1.60 3.03 183.9 1.06 2.80 163.3 
2001 14,152 AC 1.63 2.38 104.4 0.93 2.05 77.8 15,844 AC 1.56 2.82 164.2 1.08 2.68 132.8 
2002 16,812 AC 1.51 2.85 161.4 1.08 2.58 127.8 15,363 AC 1.58 2.46 160.8 0.9 2.08 99.0 
2003 15,783 AC 1.56 2.12 132.3 0.92 2.09 93.9 17,068 AC 1.50 2.49 125.7 1.05 2.27 103.4 
2004 16,481 AC 1.53 2.40 176.6 0.94 2.33 146.9 17,567 AC 1.48 2.65 184.2 1.08 2.34 138.8 
2005 17,891 AC 1.46 2.59 175.7 0.95 2.36 153.3 18,858 AC 1.42 2.67 189.9 1.06 2.41 160.9 
2006 18,784 AC 1.43 2.48 224.5 0.95 4.17 341.1 19,622 AC 1.40 2.73 267.9 1.07 3.05 235.1 
2007 19,847 AC 1.39 2.56 213.1 0.95 2.55 191.6 20,598 AC 1.36 2.50 220.6 1.07 2.50 192.3 
2008 21,008 AC 1.35 2.40 240.4 0.96 2.70 225.8 21,697 AC 1.33 2.97 291.3 1.05 2.99 272.0 
2009 23,329 AC 1.28 2.67 250.6 0.95 2.44 230.9 23,407 AC 1.29 3.20 332.3 1.05 2.98 307.1 
2010 23,308 AC 1.28 2.15 255.7 0.98 2.60 251.4 24,100 MC 1.26 3.29 380.2 1.05 3.13 365.6 
2011 24,188 AC 1.26 2.26 270.1 0.96 2.57 207.9 25,089 AC 1.24 3.38 371.8 1.06 3.22 340.9 
2012 24,717 AC 1.25 1.95 253.8 0.97 2.36 241.2 25,536 AC 1.22 3.29 365.6 1.06 3.29 339.9 
2013 25,170 AC 1.24 2.07 249.7 0.97 2.38 237.1 25,975 AC 1.22 3.34 326.7 1.07 2.94 279.2 
2014 26,001 AC 1.21 2.17 283.4 0.99 2.41 233.2 26,789 AC 1.20 2.97 321.8 1.06 2.71 279.6 
2015 26,208 AC 1.22 2.66 295.5 — 26,968 AC 1.19 3.03 312.9 1.07 2.74 261.9 
2016 26,599 AC 1.20 2.66 290.5 — 27,330 AC 1.18 3.14 318.4 1.06 2.88 281.8 
2017 25,095 AC 1.24 2.62 237.6 0.91 2.13 236.8 25,805 AC 1.22 3.22 328.6 1.07 2.93 285.5 
2018 24,352 AC 1.26 2.53 195.8 1.02 2.05 193.4 24,820 AC 1.24 3.07 331.6 1.05 2.86 306.2 
2019 23,627 1.27 1.02 54.7 0.99 1.26 54.4 24,188 1.26 1.6 65.7 1.04 1.48 54.8 

Note. — indicates that there is no β that improves the result for the generalized Benford’s law.

The green shaded boxes are those where Benford’s law is validated by χ2 for a 95% confidence level. The red shaded and bold boxes are those where Benford’s law is validated by the Max test for a 95% confidence level. We then test the generalized Benford’s law.

  • Column 7: value of β. When the column is empty, the generalized Benford’s law does not improve on the original; in this case, the optimal β is 1.

  • Columns 8 and 9: calculation of Max and χ2 as before

The purple shaded and underlined boxes are those where the generalized Benford’s law is validated by χ2 for a 95% confidence level. The red shaded and bold boxes are those where the generalized Benford’s law is validated by the Max test for a 95% confidence level.
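The three checks reported in the tables can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' code. The function names are hypothetical. The MAD thresholds (0.006, 0.012, 0.015) are those commonly attributed to Nigrini (2012) for first digits and are assumed to correspond to the C/AC/MC/NC labels used here. The value 15.507 is the standard 95% quantile of the χ2 distribution with 8 degrees of freedom. The Max statistic is only computed, in percentage points; its critical value (the Fmax column) is not reproduced.

```python
import math
from collections import Counter

# Benford probabilities P(d) = log10(1 + 1/d) for d = 1..9.
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

# 95% quantile of the chi-square distribution with 8 degrees of freedom.
CHI2_CRIT_8DF_95 = 15.507


def first_digit(x):
    """Return the first significant digit of a nonzero number."""
    # Scientific notation puts the first significant digit at position 0.
    return int(f"{abs(x):.15e}"[0])


def mad_class(mad):
    """Map a MAD value to a conformity label (assumed Nigrini thresholds)."""
    if mad <= 0.006:
        return "C"   # close conformity
    if mad <= 0.012:
        return "AC"  # acceptable conformity
    if mad <= 0.015:
        return "MC"  # marginally acceptable conformity
    return "NC"      # nonconformity


def benford_tests(values):
    """Compute chi-square, MAD, and maximum deviation against Benford's law."""
    digits = [first_digit(v) for v in values if v != 0]
    n = len(digits)
    counts = Counter(digits)
    observed = {d: counts.get(d, 0) / n for d in BENFORD}
    deviations = {d: abs(observed[d] - BENFORD[d]) for d in BENFORD}
    chi2 = sum(n * deviations[d] ** 2 / BENFORD[d] for d in BENFORD)
    mad = sum(deviations.values()) / 9
    return {
        "n": n,
        "chi2": chi2,
        "chi2_ok": chi2 < CHI2_CRIT_8DF_95,
        "mad": mad,
        "mad_class": mad_class(mad),
        "max_dev_percent": 100 * max(deviations.values()),
    }


if __name__ == "__main__":
    # Powers of 2 are a classic Benford-conforming sequence.
    print(benford_tests([2 ** k for k in range(1, 201)]))
```

Applied to one year of journal-level values (for example, citation counts), the returned dictionary gives the quantities needed to fill one row of a table of this kind, under the assumptions stated above.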

The results can be summarized as follows. For the 46 WoS distributions, we have

  1. The total number of journal citations (Table 3): The critical value of the MAD is always “Close Conformity”; the Max test agrees in all cases. The χ2 test validates 82% of cases.

    For the generalized Benford’s law, the result is the same.

  2. The impact factor (Table 3): The critical value of the MAD is “Close Conformity” for only nine cases and “Non Conformity” for three cases; 39% of cases are validated by the Max test. Only 13% of cases are validated by the χ2 test.

    For the generalized Benford’s law, the result for the χ2 test is the same; for the Max test the result is better: it validates 80% of cases. β varies between 0.97 and 1.25.

For the 105 Scopus distributions, we have

  1. The number of cumulative citations over three years (Table 4): The critical value of the MAD is always “Close Conformity.” The Max test agrees in all cases. Only one case is validated by the χ2 test.

    For the generalized Benford’s law, the χ2 test validates the law in 80% of cases and in 100% of cases for the Max test. β varies between 1.02 and 1.08.

  2. h-index (Table 5): The critical value of the MAD is “Close Conformity” for only nine cases and “Acceptable Conformity” for the other cases. The Max test validates only one case. The χ2 test validates none.

    For the generalized Benford’s law, the Max test validates four cases and the χ2 test validates none. β varies between 0.90 and 0.97.

  3. Number of bibliographic references/Number of articles (ratio) (Table 5): The MAD critical value is “Non Conformity.” The Max and χ2 tests are never validated. The generalized Benford’s law introduces no modification.

  4. Number of bibliographic references (Table 6): The MAD critical value is “Close Conformity” for 1 case and “Acceptable Conformity” for 20 cases. The Max and χ2 tests are never validated. The generalized Benford’s law introduces no modifications.

  5. Number of articles published (Table 6): The MAD critical value is “Acceptable Conformity”. The Max and χ2 tests are never validated. The generalized Benford’s law is validated in only one case by the Max test.

4.1.2. Historical data set

We apply our analysis tools to the scientometric data of Campanario and Coslado (2011) on the number of articles, the number of citations, and the impact factor of WoS journals from 1998 to 2007. The results are gathered in Table 7 and follow the same presentation as our results in Table 3. The test used by the authors is χ2. As expected, we obtain the same result as the authors on this test with their data. The values differ from those computed on our own collection because we do not analyze the same number of journals. However, we note that when the value is different, the ranking is very close. We then proceed with the MAD test, the Max test, and an adjustment with the generalized Benford’s law. The results for the citations and impact factor data sets between 1998 and 2007 are compared with the results obtained from our own collection of WoS data. We were not able to collect the data set corresponding to the number of articles.

  1. Citations (Table 7): The result is expected (see Table 3). The MAD critical value is “Close Conformity” and the Max test is valid in all cases. The χ2 test is valid in all cases.

  2. Number of articles (Table 7): The MAD critical value is “Close Conformity” in one case and “Acceptable Conformity” in the other cases. The χ2 test is never significant. The Max test is significant in eight out of 10 cases. For the generalized law, the Max test is always validated and the χ2 test is significant in six cases. β varies between 0.87 and 0.92.

  3. Impact factor (Table 7): If we compare the results with those of Table 3, the MAD and Max tests are almost identical. The χ2 test validates three cases, one more than for our data. The Max test validates all cases, as is the case with our data set. For the generalized law, the χ2 test is significant in five cases. β ranges from 0.97 to 1.10.

Table 7.

Historical scientometric data set from Alves et al. (2014) and Campanario and Coslado (2011) 

Data set | Years | Journals | Benford: MAD, Fmax, Max, χ2 | Benford G: β, Max, χ2
Articles 1998 5,188 AC 2.72 1.4 27.8 0.91 1.02 14.72 
1999 5,283 AC 2.70 1.58 27.4 0.92 1.20 17.07 
2000 5,412 2.66 1.37 16.2 0.92 0.71 6.57 
2001 5,477 AC 2.65 2.15 38.1 0.89 1.03 10.68 
2002 5,607 AC 2.61 2.01 57.9 0.87 1.16 29.77 
2003 5,660 AC 2.60 2.47 43.5 0.88 0.96 9.63 
2004 5,722 AC 2.59 1.91 31.3 0.91 0.80 10.7 
2005 5,887 AC 2.55 1.97 41.5 0.90 1.11 16.77 
2006 5,981 AC 2.53 2.54 27.8 0.91 0.62 5.41 
2007 6,266 AC 2.48 2.50 31.3 0.90 0.79 5.02 
Citations 1998 5,467 2.65 1.12 15.1 — 
1999 5,550 2.63 1.00 7.1 0.96 0.50 4.11 
2000 5,696 2.60 0.54 4.5 0.98 0.38 3.34 
2001 5,752 2.58 0.48 5.2 0.99 0.51 4.75 
2002 5,876 2.55 0.35 3.1 — 
2003 5,907 2.55 0.49 3.5 0.98 0.34 2.28 
2004 5,968 2.54 0.34 3.0 0.98 0.40 1.94 
2005 6,088 2.51 0.73 11.2 0.99 0.72 11.97 
2006 6,166 2.49 0.53 9.7 1.01 0.49 9.25 
2007 6,417 2.44 0.61 8.4 1.01 0.67 8.26 
Impact factor 1998 5,378 2.67 0.61 6.6 — 
1999 5,467 2.65 0.64 11.3 0.98 0.91 10.5 
2000 5,607 AC 2.61 1.14 22.2 0.99 1.10 21.7 
2001 5,670 2.60 1.15 20.2 0.97 1.34 8.6 
2002 5,791 2.57 1.32 24.9 0.98 1.34 23.4 
2003 5,845 2.56 0.76 12.5 0.99 0.75 12.1 
2004 5,918 2.54 1.76 16.7 1.04 0.90 12.80 
2005 6,033 2.52 0.99 16.3 1.01 0.75 15.85 
2006 6,122 AC 2.50 0.82 39.3 1.06 1.52 27.84 
2007 6,359 AC 2.46 2.74 40.4 1.10 1.06 14.04 

Note. — indicates that there is no β that improves the result for the generalized Benford’s law.

In summary, the results are almost identical for both data sets. The differences are due to the different number of journals analyzed. For example, for the impact factor in 2003, 7,548 journals are collected in one case and 5,845 in the other.

4.2. Macroanalysis

After an initial study of the years, we propose a macroanalysis of the 181 distributions by aggregating the temporal data. A digit-by-digit analysis (i.e., a column-by-column analysis of the 90 distributions in the appended tables) is carried out. We calculate the mean, median, and standard deviation of the nine distributions of digit values over the total number of years (23 for WoS, 21 for Scopus, 10 for historical data). We first define the Benfordian distribution of averages.

4.2.1. Distribution of averages

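If T denotes the number of years studied in a data set, then we have T Benford distributions:
$$P_t = \bigl(P_t(1), P_t(2), \ldots, P_t(9)\bigr), \quad t = 1, \ldots, T.$$
The average distribution $\bar{P}$ is defined by
$$\bar{P}(d) = \frac{1}{T} \sum_{t=1}^{T} P_t(d), \quad d = 1, \ldots, 9. \tag{18}$$
Let us confirm that this is a probability distribution:
$$\sum_{d=1}^{9} \bar{P}(d) = \frac{1}{T} \sum_{t=1}^{T} \sum_{d=1}^{9} P_t(d) = \frac{1}{T} \cdot T = 1.$$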

This distribution is all the more relevant if the mean is representative of the value of each digit for all periods. A quick examination of the tables in the appendices, where the mean, median, and standard deviation are calculated, shows that the distribution of values has a Gaussian appearance (median and mean close together) with a low standard deviation. We have represented the histogram of the distribution of the 23 values of digit 1 of the WoS citations, which have an average of 29.93 (see Figure 2). This leads us to calculate the average distributions for each data set.
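The averaging step can be illustrated with a short Python sketch. The three yearly distributions below are toy values chosen for the example, not data from this study, and the function names are hypothetical. The population standard deviation (`pstdev`) is used here as an assumption, since the text does not say which estimator was applied.

```python
from statistics import mean, median, pstdev


def average_distribution(yearly):
    """Average T first-digit distributions digit by digit.

    `yearly` is a list of T lists, each holding the nine frequencies
    for digits 1..9 of one year (each list sums to 1).
    """
    t = len(yearly)
    return [sum(dist[i] for dist in yearly) / t for i in range(9)]


def digit_summary(yearly):
    """Return {digit: (mean, median, population std)} across the T years."""
    summary = {}
    for i in range(9):
        column = [dist[i] for dist in yearly]
        summary[i + 1] = (mean(column), median(column), pstdev(column))
    return summary


# Toy yearly distributions (illustrative values only).
toy_years = [
    [0.30, 0.18, 0.12, 0.10, 0.08, 0.07, 0.06, 0.05, 0.04],
    [0.31, 0.17, 0.13, 0.09, 0.08, 0.07, 0.06, 0.05, 0.04],
    [0.29, 0.18, 0.13, 0.10, 0.08, 0.06, 0.06, 0.05, 0.05],
]

avg = average_distribution(toy_years)
summary = digit_summary(toy_years)

if __name__ == "__main__":
    print("average distribution:", [round(p, 4) for p in avg])
    print("sum of averages:", round(sum(avg), 12))
    print("digit 1 (mean, median, std):", summary[1])
```

Because each yearly distribution sums to 1, the digit-by-digit average does too, which is the property checked in the text.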

Figure 2.

Variation in the value of digit 1 of the WoS citation data set (see Table S7 in the Supplementary material).


We then use these 90 distributions to construct the 10 average distributions reported in Table S7 of the Supplementary material, on which the following analysis is based. Each digit average is compared with the corresponding theoretical value. Except for the data set corresponding to the ratio Number of references/Number of articles (see Table S7), good conformity with the theoretical value is observed.

Only the MAD test is possible, as the number of journals varies from year to year. Except for the ratio data set, the other distributions are C or AC. The three citation distributions are of type C.

4.3. Discussion

This work has made it possible to apply Benford’s law and the various related tests to several bibliometric objects. Through categorization, the macro- and microanalyses have laid the foundations for a reflection on the nature of the scientometric objects used. The macroanalysis shows that the nine average distributions have MAD values of C or AC (see Table 2). Note that the distribution of ratios has not been taken into account.

In the microanalysis, Benford’s law was first tested with the three tests MAD, χ2, and Max. From a quantitative and global point of view, the results obtained for the 160 distributions show

  • 154 distributions (96%) are of type C or AC;

  • 58 distributions (36%) validate the χ2 test for the generalized Benford’s law; and

  • 98 distributions (61%) validate the Max test for the generalized Benford’s law.

However, microanalysis allows us to refine the results and to observe differences between the different types of distributions. We then grouped the homogeneous distributions into three classes (see Table 2):

  • A first class: the 54 distributions of citations produced by scientific activity;

  • A second class: the 52 distributions relating to the number of references and the number of articles produced; and

  • Finally, a last class: the 54 distributions concerning the scientometric indicators h-index and impact factor, which are metrics of science.

They clearly show that we can classify the three categories according to the success of the statistical tests (χ2 and Max) (see Table 2).

First, we get citations (100% Max, 56% χ2), second, indicators (35% Max, 11% χ2), and third, references and articles (17% Max, 0% χ2).

Second, we tested the generalized Benford’s law. The ranking remains unchanged. In all three categories, the generalized law improves the results. The range of variation of β is only 0.11. The most significant improvement is for items in the historical data set (Table 7).

How can we explain these differences in applicability? We have observed Benford’s law on a wide range of scientometric data. Some scientometric distributions do not verify Benford’s law, and these cases invite us to reflect on their nature. We can divide the scientometric objects into two main categories, namely Product of Science and Metrics of Science. The Product of Science class contains citations, references, and articles. The Metrics of Science class contains impact factor, h-index, and ratio. To understand this phenomenon, we discuss in turn citations, references and articles, which belong to the Product of Science class, and bibliometric indicators, which belong to the Metrics of Science class.

4.3.1. Citations

Benford’s law applies particularly well to bibliographic citations.

Whatever the data set considered, the citation distributions follow Benford’s law (Gauvrit & Delahaye, 2008). Indeed, citations are data that reflect the actual use of scientific articles by other researchers. Scientometrics has shown that citation distributions are very often uneven: Only a few articles receive a large number of citations, while the majority receive far fewer.

4.3.2. References and articles

The number of references in an article, unlike the number of citations, is limited by the researcher’s practice: There are rarely extremes. For the number of articles in a journal, the constraint is imposed by the publishing process itself. Thus, for this class of scientometric elements, ranked third by the tests, Benford’s law applies poorly. This may seem paradoxical at first glance, because the term citation is easily replaced by reference, but it is important to understand that citations are generated naturally.

References and the number of articles, on the other hand, are limited by editorial practices. The length of an article is artificially limited, which in turn constrains the number of bibliographic references. These constraints also affect the distribution of significant digits, which partly explains the poor results obtained.
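The contrast between unconstrained and constrained quantities can be illustrated with a small simulation. The sketch below is a toy experiment under stated assumptions, not an analysis of the data in this study: a log-normal sample stands in for a heavy-tailed, citation-like quantity, and a rounded Gaussian sample centered on 40 stands in for a bounded, reference-count-like quantity. The parameters, the seed, and the function names are arbitrary choices made for the example.

```python
import math
import random

# Benford probabilities for digits 1..9.
BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]


def first_digit(x):
    """Return the first significant digit of a nonzero number."""
    return int(f"{abs(x):.15e}"[0])


def mad_vs_benford(values):
    """Mean absolute deviation between observed first digits and Benford."""
    digits = [first_digit(v) for v in values if v != 0]
    n = len(digits)
    freqs = [digits.count(d) / n for d in range(1, 10)]
    return sum(abs(f - b) for f, b in zip(freqs, BENFORD)) / 9


rng = random.Random(0)  # fixed seed so the run is repeatable

# Heavy-tailed sample spanning several orders of magnitude (toy parameters).
heavy_tailed = [rng.lognormvariate(3.0, 2.0) for _ in range(20000)]

# Constrained sample: integer counts clustered around 40 (toy parameters).
constrained = [max(1, round(rng.gauss(40, 10))) for _ in range(20000)]

mad_heavy = mad_vs_benford(heavy_tailed)
mad_constrained = mad_vs_benford(constrained)

if __name__ == "__main__":
    print("MAD, heavy-tailed sample:", round(mad_heavy, 4))
    print("MAD, constrained sample: ", round(mad_constrained, 4))
```

In this toy setting the constrained sample concentrates its first digits on 3 and 4, so its deviation from Benford's law is far larger than that of the heavy-tailed sample, which is the qualitative behavior described in the text.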

4.3.3. Bibliometric indicators

The applicability of Benford’s law to bibliometric indicators depends on the specific characteristics of the bibliometric data in question.

The impact factor is a specific indicator that measures the frequency with which a journal’s articles are cited in other articles over a given period. It is calculated as the average number of citations received by a journal’s articles over a given period (e.g., 2 years). The h-index reflects both the number of publications and the number of citations of a researcher. The ratio does not follow Benford’s law. One possible interpretation is that references and the number of articles are limited by editorial practice. As the length of articles is limited, so is the use of references. However, we have not categorized the journals, whether paper, digital, or megajournal, and this work remains to be done.
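For readers unfamiliar with the two indicators, the sketch below shows how each can be computed from raw counts. It uses the usual textbook definitions as an assumption: the impact factor as citations received divided by items published over the chosen window, and the h-index as the largest h such that h items have at least h citations each. The function names and the sample numbers are hypothetical and do not come from the data sets analyzed here.

```python
def impact_factor(citations_in_window, items_in_window):
    """Average number of citations per item over the chosen window."""
    if items_in_window <= 0:
        raise ValueError("items_in_window must be positive")
    return citations_in_window / items_in_window


def h_index(citation_counts):
    """Largest h such that h items have at least h citations each."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


if __name__ == "__main__":
    # Hypothetical journal: 300 citations to 120 items in the window.
    print(impact_factor(300, 120))
    # Hypothetical citation counts for five items.
    print(h_index([10, 8, 5, 4, 3]))
```

Both indicators are derived from the underlying counts rather than observed directly, which is why they are treated here as Metrics of Science rather than Products of Science.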

4.3.4. Generalized Benford’s law

The generalized Benford’s law naturally gives better results when fitting a distribution. Indeed, Section 2 shows that its generalization consists in adding a parameter, as for Zipf’s law, which in its primitive form has no parameter. This improvement can be seen when we compare the indicator values of χ2 and Max for the two laws (see Tables 3–6). We also observe that the generalized Benford’s law is all the more relevant as the scientometric objects that constitute the three classes respect Benford’s law (see Table 2).
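To make the role of the extra parameter concrete, here is a hedged Python sketch of a one-parameter fit. The definition given in Section 2 is not reproduced in this part of the text, so the sketch assumes the power-law form found in Pietronero et al. (2001), P_β(d) = ((d+1)^(1−β) − d^(1−β)) / (10^(1−β) − 1), which reduces to Benford's law as β tends to 1. The grid search, the bounds, and the function names are illustrative choices, not the fitting procedure used by the authors.

```python
import math


def generalized_benford(beta):
    """First-digit probabilities under the assumed one-parameter form."""
    if abs(beta - 1.0) < 1e-9:
        # Limit case: the original Benford's law.
        return [math.log10(1 + 1 / d) for d in range(1, 10)]
    e = 1.0 - beta
    norm = 10 ** e - 1
    return [((d + 1) ** e - d ** e) / norm for d in range(1, 10)]


def chi2(observed_counts, probs):
    """Pearson chi-square statistic of counts against probabilities."""
    n = sum(observed_counts)
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip(observed_counts, probs))


def fit_beta(observed_counts, lo=0.5, hi=1.5, step=0.01):
    """Grid search for the beta that minimizes chi-square."""
    best_beta, best_stat = None, float("inf")
    steps = int(round((hi - lo) / step))
    for k in range(steps + 1):
        beta = round(lo + k * step, 10)
        stat = chi2(observed_counts, generalized_benford(beta))
        if stat < best_stat:
            best_beta, best_stat = beta, stat
    return best_beta, best_stat


if __name__ == "__main__":
    # Synthetic counts generated from beta = 1.1 (illustrative only).
    counts = [10000 * p for p in generalized_benford(1.1)]
    print(fit_beta(counts))
```

Since β = 1 is always among the candidates, the fitted χ2 can never exceed the χ2 obtained for the original law on the same grid, which is consistent with the remark that the generalized law naturally fits better.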

In our experiment, the generalized Benford’s law does not provide any results for distributions that are not Benfordian. It is important to point out that Benford’s law is not universal and does not apply to all types of scientometric data; see for example the distribution of ratio. Indeed, distributions that are not constrained by the system or by humans better verify Benford’s law. This partly explains the open questions in the literature (Alves et al., 2014; Campanario & Coslado, 2011; Egghe & Guns, 2012).

This paper focuses on scientometric objects and their behavior. It is important to consider the nature of scientometric observables. To this end, we have built a corpus that allows us to experiment with different tests. All the data produced are freely available for reproducibility purposes. We have also confirmed the results of our colleagues. We used MAD, Max, and χ2, applying Benford’s law and the generalized Benford’s law. The latter confirms its generalization for certain scientometric objects, such as citations.

We have shown that Benford’s law applies particularly well to citations and, to a lesser extent, to bibliometric indicators. Benford’s law is not easily applicable to bibliographic references and articles. After proposing a categorization, we put forward an explanation of this phenomenon, together with the constraints that apply to these objects.

The next step in this work is to extend the analysis to other scientometric objects. Indeed, we can consider altmetrics, as suggested by the work of Gupta et al. (2023). This work follows an abductive approach based on observations of data sets, so it would be relevant to mathematize the concept of constraint around scientometric objects.

The authors would like to express their gratitude to the reviewers for their valuable feedback, which has been incorporated into this version of the manuscript.

Marc Bertin: Conceptualization, Data curation, Funding acquisition, Investigation, Project administration, Resources, Supervision, Validation, Visualization, Writing—original draft, Writing—review & editing. Thierry Lafouge: Conceptualization, Formal analysis, Investigation, Methodology, Software, Supervision, Validation, Visualization, Writing—original draft, Writing—review & editing.

The authors have no competing interests.

The data used in this study are published in https://doi.org/10.5281/zenodo.12698510.

Alexopoulos, T., & Leontsinis, S. (2014). Benford’s law in astronomy. Journal of Astrophysics and Astronomy, 35(4), 639–648.

Alves, A. D., Yanasse, H. H., & Soma, N. Y. (2014). Benford’s law and articles of scientific journals: Comparison of JCR® and Scopus data. Scientometrics, 98, 173–184.

Alves, A. D., Yanasse, H. H., & Soma, N. Y. (2016). An analysis of bibliometric indicators to JCR according to Benford’s law. Scientometrics, 107(3), 1489–1499.

Benford, F. (1938). The law of anomalous numbers. Proceedings of the American Philosophical Society, 78(4), 551–572.

Bertin, M., & Lafouge, T. (2024). Scientometric dataset for Benford’s law (Version 3). Zenodo.

Campanario, J. M., & Coslado, M. A. (2011). Benford’s law and citations, articles and impact factors of scientific journals. Scientometrics, 88(2), 421–432.

Cerqueti, R., & Lupi, C. (2022). Severe testing of Benford’s law. arXiv.

Delahaye, J.-P. (2012). Les entiers ne naissent pas égaux. Pour la Science, 421, 80–85.

Diaconis, P. (1977). The distribution of leading digits and uniform distribution mod 1. Annals of Probability, 5(1), 72–81.

Diekmann, A. (2007). Not the first digit! Using Benford’s law to detect fraudulent scientific data. Journal of Applied Statistics, 34(3), 321–329.

Eckhartt, G. M., & Ruxton, G. D. (2023). Investigating and preventing scientific misconduct using Benford’s law. Research Integrity and Peer Review, 8, 1.

Egghe, L. (2005). Lotkaian informetrics: An introduction. In L. Egghe (Ed.), Power laws in the information production process: Lotkaian informetrics (pp. 7–99). Emerald Group Publishing Limited.

Egghe, L. (2011). Benford’s law is a simple consequence of Zipf’s law. ISSI Newsletter, 7(3), 55–56.

Egghe, L., & Guns, R. (2012). Applications of the generalized law of Benford to informetric data. Journal of the American Society for Information Science and Technology, 63(8), 1662–1665.

Fu, D., Shi, Y. Q., & Su, W. (2007). A generalized Benford’s law for JPEG coefficients and its applications in image forensics. In Proceedings of SPIE, Volume 6505, Security, Steganography, and Watermarking of Multimedia Contents IX (pp. 65051L1–65051L11).

Gauvrit, N., & Delahaye, J.-P. (2008). Pourquoi la loi de Benford n’est pas mysterieuse. Mathématiques et Sciences Humaines, 182, 7–15.

Geyer, C. L., & Williamson, P. P. (2004). Detecting fraud in data sets using Benford’s law. Communications in Statistics – Simulation and Computation, 33(1), 229–246.

Gupta, S., Singh, V. K., & Banshal, S. K. (2023). On the quality of altmetric data: An exploratory analysis using Benford’s law. In Proceedings of ISSI 2023: The 19th International Conference of the International Society for Scientometrics and Informetrics (Vol. 1, pp. 139–154). Bloomington, IN.

Hein, J., Schuepfer, G. K., & Konrad, C. (2011). Is fraud detection in scientific anaesthesia papers possible by using a method of financial auditors? European Journal of Anaesthesiology, 28, 212.

Hürlimann, W. (2015a). A first digit theorem for powerful integer powers. SpringerPlus, 4, 576.

Hürlimann, W. (2015b). On the uniform random upper bound family of first significant digit distributions. Journal of Informetrics, 9(2), 349–358.

Judge, G., & Schechter, L. (2009). Detecting problems in survey data using Benford’s law. Journal of Human Resources, 44(1), 1–24.

Kafri, O. (2023). Zipf’s law, Benford’s law, and Pareto rule. Advances in Pure Mathematics, 13, 174–180.

Lazebnik, T., & Gorlitsky, D. (2023). Can we mathematically spot possible manipulation of results in research manuscripts using Benford’s law? arXiv.

Lee, K.-B., Han, S., & Jeong, Y. (2020). COVID-19, flattening the curve, and Benford’s law. Physica A: Statistical Mechanics and its Applications, 559, 125090.

Mebane, W. R., Jr. (2011). Comment on “Benford’s law and the detection of election fraud”. Political Analysis, 19(3), 269–272.

Newcomb, S. (1881). Note on the frequency of use of the different digits in natural numbers. American Journal of Mathematics, 4(1), 39–40.

Nigrini, M. J. (2012). Benford’s law: Applications for forensic accounting, auditing, and fraud detection. Chichester: John Wiley & Sons.

Nigrini, M. J., & Miller, S. J. (2007). Benford’s law applied to hydrology data—Results and relevance to other geophysical data. Mathematical Geology, 39(5), 469–490.

Pain, J.-C. (2008). Benford’s law and complex atomic spectra. Physical Review E, 77, 012102.

Perez-Gonzalez, F., Heileman, G. L., & Abdallah, C. T. (2007). Benford’s law in image processing. In 2007 IEEE International Conference on Image Processing (pp. I-405–I-408).

Pietronero, L., Tosatti, E., Tosatti, V., & Vespignani, A. (2001). Explaining the uneven distribution of numbers in nature: The laws of Benford and Zipf. Physica A: Statistical Mechanics and its Applications, 293(1–2), 297–304.

Raimi, R. A. (1976). The first digit problem. American Mathematical Monthly, 83(7), 521–538.

Rane, A. D., Mishra, U., Biswas, A., Sen De, A., & Sen, U. (2014). Benford’s law gives better scaling exponents in phase transitions of quantum XY models. Physical Review E, 90(2), 022144.

Sambridge, M., Tkalčić, H., & Jackson, A. (2010). Benford’s law in the natural sciences. Geophysical Research Letters, 37, L22301.

Saporta, G. (2006). Probabilité analyse des données et statistiques. Technip.

Schumm, W. R., Crawford, D. W., Lockett, L., Ateeq, A., & AlRashed, A. (2023). Can retracted social science articles be distinguished from non-retracted articles by some of the same authors, using Benford’s law or other statistical methods? Publications, 11(1), 14.

Tam Cho, W. K., & Gaines, B. J. (2007). Breaking the (Benford) law: Statistical fraud detection in campaign finance. American Statistician, 61(3), 218–223.

Wei, L., Sundararajan, A., Sarwat, A. I., Biswas, S., & Ibrahim, E. (2017). A distributed intelligent framework for electricity theft detection using Benford’s law and Stackelberg game. In 2017 Resilience Week (RWS) (pp. 5–11).

Author notes

Handling Editor: Vincent Larivière

This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. For a full description of the license, please visit https://creativecommons.org/licenses/by/4.0/legalcode.

Supplementary data