Anatomy of the top 1% most highly cited publications: An empirical comparison of two approaches

Abstract
Scientific excellence is an ongoing object of quantitative analysis of science and technology literature. The most commonly adopted of the various criteria for such quantification is to define highly cited papers as the ones lying in the first percentile (top 1%) of citation counts. Wagner and colleagues have recently proposed a new method in which citation counts are determined as a whole, irrespective of discipline. This study analyzes the practical implications of the new approach compared to the traditional procedure in which papers are ranked bearing in mind the scientific field involved, with particular attention to the consequences for rankings by country and discipline. The results show that the new methodology favors countries with a predominance of polytechnical scientific production, technological development, and innovation and lowers the rank of countries where the social sciences, humanities, and basic research account for the lion’s share of output. An analysis of worldwide production confirms the growth of scientific output in technical and technological disciplines.


INTRODUCTION
Scientific excellence, understood as production that stands out for the number of citations received, has been recurrently addressed by experts in the quantitative analysis of science and technology literature, and some of the most relevant studies are referenced in this section. The most commonly adopted of the various criteria for such quantification is to define highly cited papers as those lying in the first percentile (top 1%) of citation counts, in descending order. Related issues include how to establish the list of highly cited papers as ranked in descending (or ascending, depending on the method) order and how to convert that ordinal distribution into percentiles apt for comparing publications in different subject areas (McAllister, Narin, & Corrigan, 1983).
A number of methods have been deployed to square that circle, all of which have technical drawbacks that stifle consensus on how to find the ideal ranking. One of the most prominent difficulties is that establishing percentiles requires access to a database well suited to bibliometric analysis; that is, with a full listing of papers and their citation counts (Ahlgren, Persson, & Rousseau, 2014). Such a database should also ensure a balance of scientific literature for the most accurate cross-discipline comparison possible, taking into consideration that this balanced coverage is not sufficient to avoid field-related biases (Waltman & Schreiber, 2013).
The use of percentiles is a nonparametric alternative that eludes the technical and conceptual drawbacks of some raw or normalized citation indicators, which fail to take into account the bias and nonnormality inherent in such distributions (Bornmann, Leydesdorff, & Wang, 2013; Bornmann, Moya-Anegón, & Leydesdorff, 2012; Leydesdorff, Wagner, & Bornmann, 2014).
The lively discussion maintained around the ambiguities of quantile-based analysis has prompted many proposals to narrow uncertainties and correct flaws when dealing with tied scores or establishing quantile thresholds (Schreiber, 2013; Waltman & Schreiber, 2013).
Although no method has yet been universally deemed the most suitable, a consensus has been reached on the utility of such approaches for within- and between-field variability analysis and comparison (Rodriguez-Navarro & Brito, 2021). The recent inclusion in Web of Science (WoS) of the subject category percentile for journals listed in Journal Citation Reports (JCR) attests to such consensus. The general opinion is, then, that irrespective of how the ranking is calculated, the publication and citation practices of each scientific specialty must be borne in mind to reduce the effects of the so-called skewness of science (Seglen, 1992).
A new approach has been proposed in this regard (Wagner, Zhang, & Leydesdorff, 2022). In that study the focus is on how the National Science Board (NSB) normalizes and compares country results based on the top 1% of papers in terms of citation count (NSB, 2020). The authors list the technical problems posed by the latter methodology and put forward an alternative intended to reach a more accurate assessment. Theirs is among the suite of nonparametric alternatives for analyzing nonnormal distributions. Although the first step in their procedure, determining the ratio between the observed and expected number of top 1% publications, has been used previously (Perianes-Rodriguez & Ruiz-Castillo, 2016), the second is new. Each paper in the top 1% in the analysis of citation counts by country is selected with no reference to scientific specialty. This new proposal to create a ranked list of citations irrespective of the discipline involved therefore deviates from the approach adopted to date. Citation counts have traditionally been established bearing in mind not only publication type and year of publication, but also journal specialty (Bornmann et al., 2012; Waltman, Calero-Medina et al., 2012), acknowledging the well-known fact that output and citation differ widely among disciplines (Albarran & Ruiz-Castillo, 2011; Albarran, Crespo et al., 2011). The new proposal also does not discuss the issues around establishing interpercentile thresholds or handling tied scores. Neither does it mention that the traditional method, selecting highly cited papers by scientific field, differs entirely from the field normalization approach with which the authors compare theirs. In addition, the new proposal ignores the sixth principle of the Leiden Manifesto about considering variations by field in publication and citation practices (Hicks, Wouters et al., 2015). This study analyzes the practical implications of the new approach compared to the traditional procedure in which papers are ranked bearing in mind the scientific field involved, with particular attention to the consequences for rankings by country and discipline.
This comparative empirical analysis is intended to answer the following research questions. How closely matched are the two data sets? Does the output by country ranking differ depending on which is deployed? If so, which countries are most heavily impacted? Does the output by discipline ranking differ depending on which method is used? Which disciplines are most heavily impacted? And perhaps the most relevant question: Are the arguments defending the greater accuracy of the results delivered by the new methodology for calculating the top 1% of publications justified?

METHODS
OpenAlex, snapshot November 2022 (Priem, Piwowar, & Orr, 2022), was the database used to answer the research questions posed. Although, as its creators acknowledge, this source of academic metadata is still in its initial stages, OpenAlex can potentially enhance the transparency of research assessment, navigation, representation, and discovery. In addition, this 100% open access resource contains vast volumes of data.
Its lack of maturity translates into a number of issues relating to author and institutional disambiguation and country assignment. It also still lacks information on funding and corresponding authors. Other similar resources are beset by some of these same issues (Guerrero-Bote, Chinchilla-Rodriguez et al., 2021; van den Besselaar & Sandstrom, 2016; Visser, van Eck, & Waltman, 2021). Nonetheless, further to the present analysis, the country name was missing in only 0.01% of the highly cited papers. In addition, the time and effort devoted by the OurResearch team to developing OpenAlex favor its likelihood of becoming a workable alternative to today's subscription-based databases that curtail freedom of analysis in quantitative studies of scientific output. Recent studies have analyzed this data source in detail (Mongeon, Bowman, & Costas, 2022; Mugabushaka, 2022; Scheidsteger & Haunschild, 2022; Scheidsteger & Bornmann, 2023; van Eck & Waltman, 2022).
The version of OpenAlex employed covers 75,987,546 journal articles published between 2000 and 2021. For the analysis by discipline, the Scopus Source List of October 2021, with its All Science Journal Classification (ASJC, 2-digit and 4-digit codes), was linked to the OpenAlex-listed publications through the journal ISSNs included in both sources. The analysis developed in Section 3 is based on the 2-digit code classification. That reduced the total number of classified articles to 49,526,533 (Figure S1, Supplementary material). The papers analyzed here therefore bore the OpenAlex publication citation data as well as the Scopus classification. Inasmuch as the journals were the same in both sources, this analysis is foreseeably reproducible for anyone with access to the full version of the Elsevier database.
The top 1% was found for the classic model using the InCites methodology (hereafter InCites Method or simply InCites) (Clarivate, 2018), a method similar to that used in the Leiden Ranking and the SCImago Journal Rank. For the new model, the method used for calculation was as described by Wagner et al. (2022), hereafter the Wagner method or simply Wagner. As noted, the main difference between the two methods is that the latter disregards the discipline to which publications pertain and bases the selection exclusively on the total number of citations in a given year.
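The practical difference between the two selection regimes can be illustrated with a short sketch (the field names, pool sizes, and citation counts below are invented for illustration; neither the actual InCites nor the Wagner implementation is reproduced here):

```python
import math

# Toy corpus: (field, citation count) pairs -- a large field with high
# citation counts and a small field with low counts, both invented.
papers = [("Medicine", c) for c in range(1000, 0, -1)] + \
         [("Humanities", c) for c in range(100, 0, -1)]

def top_share(pool, share=0.01):
    """Return the `share` most highly cited papers of a pool (at least one)."""
    ranked = sorted(pool, key=lambda p: p[1], reverse=True)
    k = max(1, math.floor(len(ranked) * share))
    return ranked[:k]

# Field-normalized selection (InCites-style): rank within each field, merge.
fields = sorted({f for f, _ in papers})
by_field = [p for f in fields for p in top_share([q for q in papers if q[0] == f])]

# Global selection (Wagner-style): one single ranking, field ignored.
global_top = top_share(papers)

# by_field contains 10 Medicine papers plus 1 Humanities paper;
# global_top contains 11 Medicine papers -- the low-citation field vanishes.
```

The sketch reproduces, in miniature, the effect analyzed in this paper: under a single global ranking, a field with lower citation density can drop out of the top 1% entirely.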
A more pragmatic than theoretical calculation was performed to define the first-percentile papers, in which that 1% was matched to the actual number of papers in the two methods. The formula to calculate the relative rank for each row was: (rank − 1)/(total partition rows − 1). For the Wagner method, that pragmatism affected just 0.5% of the publications on average (values in the last column of Table S1 in the Supplementary material).
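The relative-rank formula can be verified with a minimal sketch (the partition size used here is hypothetical):

```python
def relative_rank(rank, n):
    """Relative rank in [0, 1] of the row at 1-based position `rank`
    within a partition of `n` rows: (rank - 1) / (n - 1)."""
    return (rank - 1) / (n - 1)

# In a hypothetical partition of 200,001 papers sorted by descending
# citation count, position 2,001 falls exactly on the 1% threshold:
assert relative_rank(1, 200_001) == 0.0        # most cited paper
assert relative_rank(2_001, 200_001) == 0.01   # top 1% boundary
assert relative_rank(200_001, 200_001) == 1.0  # least cited paper
```

Rows with a relative rank at or below 0.01 then constitute the top 1% of the partition.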
The following indicators were used:
1. P and PP: total number and proportion of publications per country or discipline.
2. P(top 1%) and PP(top 1%): total number and proportion of highly cited publications per country or discipline.
3. Growth rate: the rate at which scientific production in a country or discipline increased in a given period. More specifically, the difference between scientific production at the end, P(tn), and at the beginning, P(t0), of the period, divided by scientific output at the beginning of the period and expressed as a percentage.
4. TC: total number of citations received by a publication, used to calculate percentiles.
The multiplicative counting method was used. In this approach, coauthored or multicategorized publications are fully assigned to each country or field (Perianes-Rodriguez & Ruiz-Castillo, 2015; Waltman & van Eck, 2015).
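Both the growth-rate indicator and multiplicative counting can be sketched in a few lines (the output figures and country codes are invented for illustration only):

```python
def growth_rate(p_start, p_end):
    """Growth rate in percent: (P(tn) - P(t0)) / P(t0) * 100."""
    return (p_end - p_start) / p_start * 100.0

# Invented start/end output counts yielding 270% growth over the period:
assert growth_rate(50_000, 185_000) == 270.0

# Multiplicative counting: a coauthored paper is fully assigned to every
# country involved, so country totals may exceed the number of papers.
papers = [{"US", "CN"}, {"US"}, {"CN", "DE", "US"}]
counts = {}
for countries in papers:
    for c in countries:
        counts[c] = counts.get(c, 0) + 1
assert counts == {"US": 3, "CN": 2, "DE": 1}  # 6 assignments from 3 papers
```

The same tallying applies to disciplines: a journal classified under two ASJC codes contributes its papers fully to both.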

Total Number of Publications
Figure 1 shows that although InCites delivered 509,187 individual (and 832,930 multicategorized) top 1% or most highly cited publications, the values found with Wagner were 495,256 individual papers. The difference was attributable to considering the field as a selection criterion. A total of 393,309 individual publications concurred in the two data sets.
Figure 2 depicts the proportion of highly cited papers found in common in both data sets in the years studied. The mean difference observed was 7.9%, with no significant year-by-year differences. The effects of that difference on output by country and discipline are analyzed in the subsections that follow.

Publications by Country
The distribution of the top 1% publications by country found with InCites is given in Figure 3(a) and with Wagner in Figure 3(b). Both methods revealed that the steep decline in US relative output and the slight decline in European Union (EU27) production were offset by a clear rise in China's numbers, in the last 5 years in particular.
The essential difference between the two methods is that according to InCites, China matched the EU27 in 2021 and, according to Wagner, the EU27 outperformed the United States in 2013 and China in 2021.
The total number of highly cited publications by country and year is graphed in Figures 4(a) (InCites) and 4(b) (Wagner). As in the two preceding figures, the EU27 is almost caught up by China in 2021 according to InCites, and both the EU27 and China surpassed the United States in Wagner. The graphs denote the impact on country distribution of using one or the other method for counting citations. The differences are further illustrated in Figure 5, which graphs the InCites data on the horizontal axis and the Wagner values on the vertical axis. The countries in orange lettering accounted for a smaller share, and those in blue for a larger share, of the top 1% publications with the Wagner method than with the InCites method. The figure on the right is a detail of the tail of the graph on the left.
The Wagner method attributed a larger share of the top 1% publications to China, Japan, Korea, and Singapore in Asia, Switzerland in Europe, and the United States. The differences found between the two methods were particularly significant for China (71%), the United States (46%), and Japan (44%). In contrast, Canada and the European countries accounted for a lower share with Wagner, with the widest differences found for Canada (−61%), the EU27 (−35%), and the United Kingdom (−29%).
That analysis revealed the clear impact of the method used to calculate the top 1% on the results. The Wagner method favors Asian countries and the United States to the detriment of Australia, Canada, and the European Union. The question then posed is whether the distribution by discipline also differs and whether such differences might help explain the differences in performance by country.

Publications by Discipline
The total number of highly cited publications and their distribution by discipline found with the two methods are given in Table 1. The disciplines attributed larger shares by Wagner were Biochemistry (4.5), Multidisciplinary (3.6), Chemistry (2.0), Chemical Engineering (1.8), and Immunology and Microbiology (1.0). Those found to have smaller proportions with that method were Medicine (−4.7), Social Sciences (−3.6), Arts and Humanities (−3.1), Mathematics (−1.3), and Agriculture (−1.2). Regarding the difference in the values observed with the two methods (values in parentheses above), with Wagner the technological and applied sciences exhibit a higher share of the total. Conversely, medical disciplines, basic research fields, and nontechnological specialties have a lower proportion. Figure 7(a) plots the values for the specialties with the highest growth rate in the period; for two of them, Chemical Engineering and Chemistry, the values are consistent with the data in Table 1. The highest growth according to Wagner was observed for Energy (270%), Materials Science (182%), Chemistry (140%), Chemical Engineering (136%), and Engineering (111%).
Humanities was the sole discipline among the low-growth fields graphed in Figure 7(b) that is also shown in Table 1 as accounting for a smaller share of the total. The most prominent decline in share was observed for the specialty Multidisciplinary (−65%), but the lowest growth rates were in Psychology (−70%), Humanities (−68%), Economy (−64%), and Neurosciences (−55%).
As noted, the disciplines graphed in Figures 7(a) and 7(b) were those with the highest and lowest growth rates, respectively. That is not the same as the greater or lesser share of each discipline in Wagner as compared to InCites; the two indicators refer to different items. Under the latter criterion, the specialties with the largest shares were Multidisciplinary (4.5%), Chemical Engineering (1.8%), and Energy and Biochemistry (1.7%) (see Data availability). Conversely, the fields with the smallest shares in the world total (as per InCites) were Veterinary (0.1%), Humanities (0.2%), Dentistry (0.3%), and Social Sciences (0.5%).
The proportion of highly cited publications in high-growth disciplines is given in Figure 8(a) and the proportion in low-growth disciplines in Figure 8(b), both according to Wagner.

Quantitative Science Studies
Selection of the disciplines depicted in both figures is based on the higher or lower growth rate in the period as well.
Those data contrast with the values in Figure 8(b), which graphs the same parameter for the disciplines with the lowest growth in the period. The steepest declines were observed in the Humanities (−81%), followed by Psychology (−73%), Neuroscience (−64%), Economy (−61%), and Earth Sciences (−56%). Similarly, Figures 9(a) and 9(b) graph the proportion of highly cited publications in high and low growth rate disciplines according to InCites. Again, selection of the disciplines depicted in both figures is based on the higher or lower growth rate in the period.

DISCUSSION AND CONCLUSIONS
As noted earlier, the Wagner method for calculating percentiles has been proposed by its authors as a tool to obtain more accurate assessments thanks to methodological improvements relative to parametric solutions such as the one used by the NSB. On the one hand, the method used by the NSB is similar to others traditionally used in the Leiden Ranking, Scopus, Web of Science, or SJR, according to the information provided by Roberge, Bédard-Vallée, and Rivest (2021, p. 33). In this case, the normalization procedure described has no effect on the percentile distribution compared to the results obtained with the raw citation counts. On the other hand, the Wagner proposal furnishes no statistical or mathematical justification for improvements over the nonparametric method used to date, one of whose criteria is to generate lists of highly cited papers bearing in mind the respective discipline. The new methodology likewise fails to integrate the technical solutions discussed in the literature cited in the introduction to this paper. Its impact on the results must not therefore be underestimated.
Although it is always risky to pass categorical judgement in light of the many factors involved in scientific output and citation, the Wagner method exhibits a "country effect" that favors China, Japan, South Korea, and Singapore, as well as Saudi Arabia and Switzerland. In contrast, it attributes a smaller share than InCites to other regions and countries, such as Australia, Canada, the European Union, the United Kingdom, Brazil, or South Africa (Figure 10). Figure 10 presents the proportion of the relative complements of both methods by country. Relative complements are the publications not included in the intersection of the two data sets (see the nonconcurrent sets in Figure 1). In essence, if the yellow bar is longer than the blue bar, the country is better represented by the Wagner method. The analysis by discipline helps to explain those rises and falls in share of the total. On the one hand, countries in which polytechnical scientific output, technological development, and innovation prevail are favored by Wagner. That is consistent with the results reported by Veugelers (2017) respecting the rise in Chinese scientific production, which she attributed to the weight of science and technology in that country's research. On the other hand, countries with significant output in basic research, the Social Sciences, and the Humanities are underrepresented compared to the InCites method. The findings of the analysis of the share of highly cited output by specialty confirm such a "discipline effect," whereby Wagner favors polytechnical fields. That predominance is visible both in the comparison to worldwide totals as per InCites and in terms of the disciplines with the highest growth: Multidisciplinary, Chemical Engineering, Environmental Sciences, Computer Sciences, Energy, and Materials Science.
In contrast, with Wagner, fields such as the Humanities, Social Sciences, Veterinary, Health Professions, and Mathematics are underrepresented or exhibit negative growth. Figures 11(a) and 11(b) illustrate the weight of the publications in the relative complements (exclusive to one or the other method) as the cause of the imbalance and differences across disciplines between the InCites and Wagner methods. Another instance of the imbalance prompted by the Wagner method can be found in the Multidisciplinary field. Whereas InCites output in that field grew by 90%, according to Wagner it declined by 42%. Other sizeable differences in growth rate were observed in Physics (15% in Wagner but −24% in InCites) and Chemistry (111% in Wagner and −1% in InCites).
In a similar vein, worldwide output revealed growth in the polytechnical disciplines, some of which concurred with the Wagner analysis: Chemical Engineering, Energy, Computer Science, and Environmental Sciences. Unfortunately, however, the data analyzed here provide no explanation for such growth, although the change in the type of publication used in these specialties to disseminate scientific findings, from conference proceedings to journal articles, might be an initial assumption. That effect would be compounded by the rise in research to improve the technologies that encourage digitization of the worldwide economy and the pursuit of sustainable alternatives to fossil energy to reverse climate change, two of today's most pressing societal challenges.
In conclusion, the present findings confirm that the choice of method for calculating percentiles affects the share of total output found for each country and, especially, leads to the overrepresentation of certain disciplines and the underrepresentation of others. Consequently, the failure of the Wagner method to classify papers when defining highly cited publications distorts the results. The traditional approach to calculating percentiles is not affected by field normalization. Using the disciplinary affiliation of publications and citations as a selection criterion ensures sensitivity to disciplinary idiosyncrasies, making it the fairest and most reliable method for calculating percentiles.
Most of the results shown may be considered intuitive and obvious. However, intuitions must be confirmed with numbers and facts. In addition, the findings are in line with a recent publication confirming, once again, that some fields are fundamentally different from others, even within disciplines, and warning about bibliometric techniques applied without controlling for field (Andersen, 2023). Likewise, the outcome suggests that use of the Wagner method exacerbates the well-studied effects of differences in publication and citation practices across disciplines, revealing important side effects to consider before applying the newly proposed method.

LIMITATIONS AND FURTHER RESEARCH
Quantitative studies are always welcome and necessary to understand the ecosystem to which the object of analysis pertains. They are nonetheless nearly always insufficient for decision-making unless backed by qualitative analyses that control for the undesirable bias inherent in data sources or processing or, as here, in the measuring procedure.
Secondly, the very early stage of development of the data source used, OpenAlex, which is still being enlarged and improved, may affect the reliability of the findings. The verifications run on the set of highly cited publications studied nonetheless denoted the very limited scope of some of those defects in this analysis. By way of example, although citation data (needed for percentile calculations), affiliation data (required for counts by country), and subject matter information (necessary for the publication count by discipline) were missing in some records, the proportion involved was imperceptible in the top 1% of publications. However, that does not excuse the need to assess and verify reliability for other types of analysis, such as research on the effects of using the methods deployed here on university rankings or on international scientific collaboration. One substantial benefit of the appearance of 100% open sources such as OpenAlex is that they broaden the scope for democratizing quantitative analysis, guaranteeing transparency and reproducibility (Hendricks, Kramer et al., 2021; van Eck & Waltman, 2022). Future expert efforts should focus on harmonizing and consolidating procedures and indicators. That would enhance the capacity to engage in increasingly complete and sophisticated analyses. Based on reliable data and standardized identifiers (ORCID, DOI, ROR, ISO, ISSN, NUTS, Funder ID, etc.), such endeavors would favor interconnection with other data sources to heighten the impact of quantitative studies and citation analyses. That manner of analysis could then be combined with information on project funding, doctoral thesis output, socioeconomic indicators, geographic analyses, and technology transfer. The specialty should also address the investment and the human, technological, and economic support required by open data infrastructures, which should be provided by all the actors benefitting from their use. Such support would be needed for updating as well as for data maintenance, disambiguation, and cleansing to ensure the quality, transparency, and reproducibility of the results. Guaranteeing open bibliographic metadata infrastructure and access is a responsibility that must be shared by all participating institutions and researchers.

Figure 1. Multicategorized publications in the InCites and Wagner methods.

Figure 2. Proportion of publications appearing in the top 1% in common in both data sets (InCites and Wagner), 2000–2021.
Figures 7(a) and 7(b) compare the proportion of the top 1% publications identified by Wagner relative to the worldwide total by discipline as per InCites. Selection of the disciplines depicted in both figures is based on the higher or lower growth rate in the period.

Figure 9. (a) Trends of disciplines with the highest growth rates, top 1% publications; (b) trends of disciplines with the lowest growth rates, top 1% publications (InCites).

Figure 10. Proportion of publications in the relative complement by country (InCites and Wagner).

Figures 11(a) and 11(b) use a similar comparison to that in Figure 10. Again, blue bars represent the proportion of publications in the intersection of both methods (publications considered top 1% in both InCites and Wagner). Yellow bars represent the proportion of publications included only in the set of highly cited publications in InCites (Figure 11(a)) or in Wagner (Figure 11(b)). For example, 19.9% of the highly cited publications in Pharmacology in InCites are not included in Wagner. Conversely, all the Pharmacology publications in Wagner are included in InCites. This means that Pharmacology publications are underrepresented in Wagner.

Figure 11. (a) Proportion of publications by discipline: intersection and relative complement (InCites); (b) proportion of publications by discipline: intersection and relative complement (Wagner).

Table 1. Total number and proportion of publications in the top 1% by discipline (InCites and Wagner).