Abstract
The list of Highly Cited Researchers (HCR) published each year by Clarivate occupies a special place in the academic landscape, due to its use in the Shanghai rankings. This article looks at the evolution of this list, based on material communicated between 2001 and 2023 by its various producers (the Institute for Scientific Information, Thomson Reuters, and Clarivate) on their respective websites. Three main phases in its trajectory are identified. The first is characterized by the creation of a database (2001–2011), the second by the affirmation of an indicator (2012–2018), and the third by the weakening of a strategy (2019–2023). An analysis of this trajectory provides a better understanding of the importance of this list and the challenges it faces today, in a context where some of the key issues of research evaluation and scientific integrity are being called into question.
1. INTRODUCTION
In a context of international competition for scientific excellence, the American-British company Clarivate has been promoting three “recognition programs” since November 2022, with the aim of “Applauding the elite group of people behind innovative contributions to global research.”1 Each year, the Eugene Garfield Award is given to young researchers in the field of scientometrics, the Citation Laureates list names around 20 highly cited researchers “whose influence is comparable to that of past and future Nobel Prize recipients,”2 and the annual list of Highly Cited Researchers (HCR) aims to identify “individuals who have demonstrated significant and broad influence in their field(s) of research.”3 However, these three programs have a much longer history. The first Citation Laureates were nominated in 1989, the first HCRs were identified in 2001, and the first recipients of the Eugene Garfield Award were honored in 2017. Over the past 23 years, the annual list of HCRs has become an integral part of the international academic landscape, and a key issue for many institutions and researchers, because of its use as an indicator in the Shanghai rankings (Docampo & Cram, 2019). Despite the importance of this list, its history and development have been little studied. As we shall see in the second section of this article, the scientific publications that have looked at this list since the 2000s have essentially focused on two approaches: the HCRs as an indicator (what the number of HCRs says about an institution, a region, or a country) and the HCRs as a population (the characteristics of the researcher profiles selected in terms of age, gender, and publication practices). Some major methodological changes have of course been mentioned in the context of these publications, but the overall trajectory of this list, in terms of both conception and positioning, has not been analyzed.
The aim of this article is therefore to retrace the trajectory of this list, from its first publication in 2001 to its most recent edition in 2023.4 To do this, we chose to rely on the materials communicated during this period by the various producers of the list on their respective websites. As a reminder, the Institute for Scientific Information (ISI), founded in 1960 by Eugene Garfield, is behind the HCR list project. The Institute was first acquired by JPT Publishing in 1988, then by Thomson Corporation in 1992, before becoming part of the Intellectual Property & Science division of Thomson Reuters around 2008, until it was acquired by Clarivate Analytics in 2016 and the Institute itself was re-established in 2018.5 The corpus of 124 documents analyzed in this study was compiled using the Wayback Machine, a tool provided by the Internet Archive that can be used to obtain snapshots of websites at a given time. The composition of this corpus is detailed in Section 3. It should be pointed out, however, that the choice made in this article of relying solely on documents produced and made public by the producers reflects a desire to understand the list as it has been presented over time by those producers. Interviews were conducted with current and former producers of this list as part of our research, but this material is not used here. It will be the subject of future work.
The analysis of this corpus allows us to identify three main phases in the trajectory of this list, detailed in Section 4. The first is the creation of a database (2001–2011). This phase corresponds to the evolution of a research project initiated by Eugene Garfield in the 1980s towards the development by the ISI of a regularly updated database providing the scientific community with both biographical and bibliographical data on the identified researchers. However, this database was gradually abandoned by Thomson Reuters and was archived in 2011. The second phase identified is the affirmation of an indicator (2012–2018). During this period, the list was relaunched by Thomson Reuters with two major changes, which radically modified its purpose: the disappearance of the biographical and bibliographical dimension and the publication of a list that is entirely recalculated each year. In this phase, new actors became involved, such as Shanghai Jiao Tong University, which has produced the ARWU Shanghai ranking since 2003 and uses this list of highly cited researchers to rank universities according to their number of HCRs. We will see how the importance of this ranking gradually transformed the list into an indicator of excellence. Finally, the third phase corresponds to the weakening of a strategy (2019–2023). This phase is characterized by the constant increase in profiles suspected of scientific misconduct, which challenges the ability of this list to identify truly influential researchers. It compels Clarivate to revise its methodology and positioning once again to ensure the continued existence of this recognition program.
2. THE HCR LIST IN THE SCIENTIFIC LITERATURE
Since 2006, HCRs have been the subject of several dozen scientific articles. In these publications, two main approaches emerge: One focuses on HCR as an indicator and the other on HCR as a population.
2.1. HCR as an Indicator
Regarding HCR as an indicator, it is the sheer number of researchers identified as highly cited that raises interest, because this is seen as a way of measuring the impact, or even the excellence, of research. It allows comparisons between laboratories, institutions, or countries.
Although, in this context, some attention is paid to the methodology used by Thomson—and then by Clarivate—to build this list and to its major changes (Docampo & Cram, 2019), it is above all the relevance of this indicator that is analyzed in these publications. It is understood as a way of measuring the impact, quality, importance, recognition, influence, value, performance, or excellence of research and, by extension, of the individuals, institutions, or even countries that carry out this research. It is difficult to say whether these terms are used as synonyms throughout the texts studied for this literature review, or whether they truly reflect different concepts, a fortiori because they may themselves have several meanings. Aksnes and Aagaard (2021) note that it is difficult to define excellence in a standardized and consistent way, as its definition may vary according to discipline or research policy. In this context, citation indicators are widely used to represent the impact of a publication and the reputation of a researcher (Martinez & Sá, 2020), although the link made between citations and excellence (or quality) is nuanced. The decision to cite an article can be influenced by many factors (Aksnes & Aagaard, 2021). There is also debate about how citation counts correlate with other forms of recognition (Basu, 2006). The articles singled out by citations are not necessarily those that peers, or even the authors themselves, consider to be the best (Aksnes & Aagaard, 2021; Borchardt & Hartings, 2018).
Secondly, the way in which this indicator is used is examined in these publications. The number of HCRs can, for example, be a way of obtaining information about the major contributors to research, and the dynamics under way, on a global scale. It can thus be used to identify trends in global leadership in science and technology (Basu, Foland et al., 2018). The number of HCRs can also be used to identify leading institutions and fields across a country (Basu, 2006; Bornmann & Bauer, 2015; Butler, Xu, & Musial, 2018; Wei & He, 2021). This indicator is also used, as mentioned, in the Shanghai rankings (Cram & Docampo, 2014), with collateral effects: Many Saudi universities have made spectacular progress in the university rankings thanks to the affiliation of HCRs to their institutions through specific contracts (Alhuthali & Sayed, 2022). Finally, HCRs are identified as a strategic issue for universities, beyond university rankings (Docampo & Cram, 2019). The concept of HCR is then considered a social fact with relevant implications for research policy, practice, and academic careers (Martinez & Sá, 2020), by guiding funding decisions (Li, 2018) or questioning the relevance of local evaluation systems (Diko, 2015).
2.2. HCR as a Population
In this second approach, the characteristics of researchers and their research practices are analyzed.
First, the point is to get a better picture of the profile of HCRs. The gender of HCRs is a quite widely studied characteristic, at least from a quantitative point of view (Bornmann, Bauer, & Haunschild, 2015; Bornmann, Bauer, & Schlagberger, 2017; Confraria, Blanckenberg, & Swart, 2018; Meho, 2022; Sinay, Carter, & de Sinay, 2020; Shamsi, 2021; Shamsi, Lund, & Mansourzadeh, 2022; Wei & He, 2021). Another characteristic studied is the age of HCRs and the length of their careers (Must, 2020; Wei & He, 2021). The mobility experiences of HCRs are also the subject of analyses (Confraria et al., 2018; Martinez & Sá, 2020; Must, 2020; Yang, Liang, & Xue, 2018). Finally, access to resources, whether human, financial, or material, is analyzed by Sinay et al. (2020), as is the importance of language in the publication and collaboration processes, but only based on limited samples.
The other axis of analysis of the HCR as a population is research practices. To begin with, the production and productivity of HCRs is studied (Aksnes & Aagaard, 2021; Alhuthali & Sayed, 2022; Confraria et al., 2018; Must, 2020; Wei & He, 2021), including document type (Aksnes & Aagaard, 2021) and journal impact factor (Aksnes & Aagaard, 2021; Sinay et al., 2020). The diversity and duration of their collaborations are then the focus of attention (Aksnes & Aagaard, 2021; Confraria et al., 2018; Martinez & Sá, 2020; Must, 2020). Finally, the ability of HCRs to disseminate and promote the results of their research is examined, particularly their use of social networks (Mas-Bleda, Thelwall et al., 2014; Niu & Huang, 2021; Ye & Hong, 2019).
It should be noted that when analyzing the profile of HCRs and their research practices, the question of the scientific integrity of these highly cited researchers is regularly raised. This angle is addressed through the identification of possible overproduction (Alhuthali & Sayed, 2022), of self-citation rates considered unusual (Van Noorden, 2020; Van Noorden & Singh Chawla, 2019), or of the participation of HCRs in retracted articles (Kamali, Rahimi, & Abadi, 2022).
3. APPROACH AND METHODOLOGY
To add a new perspective to the already rich literature on this list, in January 2024 we collected a corpus of 124 documents that enables us to trace the development of this object.
Once we had identified the various URLs concerned (isihighlycited.com, highlycited.com, hcr.stateofinnovation.thomsonreuters.com, clarivate.com/hcr/, hcr.clarivate.com, etc.), we used the Wayback Machine to review the pages captured by Internet users between 2001 and 2023 and to collect the various contributions made by the producers of this list. In most cases, we chose to collect the last capture of the year (with the exception of 2001, when the first capture was also used). We also followed all working hyperlinks on these captures to retrieve as many pages as possible. Each page was saved in text format for easier use.
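For readers who wish to carry out a similar collection, the snippet below gives a minimal sketch of how captures can be enumerated programmatically through the Wayback Machine CDX API. The selection rule mirrors the one described above (last capture of each year, first capture for 2001); the specific parameters and the code itself are purely illustrative, as our own corpus was assembled by browsing the captures manually.

```python
import requests

# Enumerate Wayback Machine captures of one of the list's successive homes
# via the CDX API and keep the last capture of each year (the first capture
# for 2001), mirroring the collection rule described above.
CDX_API = "http://web.archive.org/cdx/search/cdx"
params = {
    "url": "isihighlycited.com",
    "from": "2001",
    "to": "2011",
    "output": "json",
    "filter": "statuscode:200",
}
rows = requests.get(CDX_API, params=params, timeout=30).json()
header, captures = rows[0], rows[1:]
ts_index = header.index("timestamp")

selected = {}
for capture in captures:           # CDX results are ordered by timestamp
    timestamp = capture[ts_index]  # e.g. "20011205124500"
    year = timestamp[:4]
    if year == "2001":
        selected.setdefault(year, timestamp)  # keep the first 2001 capture
    else:
        selected[year] = timestamp            # overwrite: keep the last one

for year in sorted(selected):
    print(f"https://web.archive.org/web/{selected[year]}/http://isihighlycited.com/")
```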
Table 1 shows the distribution by document type and producer.
Table 1. Corpus distribution by document type and producer

|                                 | ISI | Thomson Reuters | Clarivate | Total number of documents |
|---------------------------------|-----|-----------------|-----------|---------------------------|
| “About us” section              | 1   | 3               | 1         | 5                         |
| Public relations documentation  | 0   | 0               | 2         | 2                         |
| Press release                   | 1   | 0               | 9         | 10                        |
| FAQ                             | 6   | 1               | 8         | 15                        |
| Methodology                     | 7   | 6               | 8         | 21                        |
| News                            | 30  | 3               | 18        | 51                        |
| Home page                       | 1   | 6               | 7         | 14                        |
| Report                          | 0   | 2               | 4         | 6                         |
| Total number of documents       | 46  | 21              | 57        | 124                       |
This distribution has its limits: A “News” or “FAQ” page, for example, may contain methodological information. It does, however, give a more precise idea of the corpus used. We should also point out that the date appearing in a capture’s URL should not be used to date the document itself: It is the date on which the capture was made by the Wayback Machine, which may have occurred long after the web page was last edited.
Once the corpus had been established, we analyzed each document to identify the chronology, the actors involved, and the vocabulary used. In all these texts, we looked for elements relating to the methodology used to compile this list, the analyses provided by the producers (China’s position in relation to the United States, ranking of the best institutions, etc.) and communication (portraits of HCRs, promotion of the system, etc.).
The method used here is inspired by Glaser and Strauss’s Constant Comparative Method (Glaser & Strauss, 1967). The aim is to closely link coding and analysis. To do this, we went through all the collected documents in chronological order, carrying out two operations: First, we identified occurrences and categories in each text; then we wrote a short summary for each year highlighting the salient elements and interesting quotations. The chronological reading was key to identifying “significant episodes” and putting them in series (as well as in relation with one another) (Dodier, 2003). The short summaries, or memos (Glaser & Strauss, 1967), enabled us to associate an idea with its illustration.
This inductive, iterative approach gradually gave coherence to certain elements and proved well suited to identifying breaks in the story. It consequently supported an analytical breakdown of the list’s trajectory into three main phases.
4. TRAJECTORY OF THE LIST
4.1. Creation of a Database (2001–2011)
The first identified phase began in 2001 and ran until the end of 2011, although 2008 already marked a turning point. While following in the footsteps of the work initiated by Eugene Garfield in the 1980s, the HCR list was distinguished by the desire to collect a large amount of information on the authors, so as to offer the scientific community more than just a list of names.
4.1.1. A project in Garfield’s footsteps
Between October 1981 and July 1982, Garfield (1981) published a series of six articles in his “Essays of an Information Scientist” devoted to “the 1,000 contemporary scientists most-cited 1965–1978.”6 These 1,000 authors were identified by the ISI teams and included in a list providing the first and last names of the researchers, their discipline, their dates of birth (and, where applicable, death), as well as the number of citations received during the corresponding period.
When the ISIHighlyCited.com gateway was launched 20 years later, it was presented as a direct continuation of the work initiated by Garfield. Marie McVeigh, producer of this gateway, spoke of “the more recent of two major projects at the ISI that used citation analysis to identify influential scientists throughout the world,”7 with direct reference to the 1981 work. Thanks to its indexing of the most influential journals since 1945, the ISI presented itself in 2001 as the only actor capable of continuing this work.
4.1.2. A cumulative approach
To compile the first list of HCRs in 2001, the ISI began by identifying articles indexed in the Web of Science (WoS) between 1981 and 1999. Conference proceedings and books were not taken into account because the ISI considered their bibliographies to be incomplete (unlike those of articles, they generally mentioned only first authors, which, according to the Institute, undermined the contribution of the other researchers).8 The 19 million articles listed for the given period were assigned to one of 22 Essential Science Indicators (ESI) categories according to the journal in which they were published. In the case of so-called multidisciplinary journals (Science, Nature, etc.), an analysis was carried out at article level.9 For each article, the citations obtained were counted and attributed in full to each of the authors. The authors were then ranked according to their total number of citations. The Institute was seeking to identify 250 researchers for each of the ESI categories, giving around 5,000 researchers. The 250 authors with the most citations in each category were therefore selected for inclusion in the list. Once this initial identification had been made, the ISI team carried out research to determine the full name, affiliation, and address of each researcher. They were then contacted and asked to provide the ISI with a copy of their CV and a list of their publications. This stage eliminated any remaining homonym problems.
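As a rough sketch of this selection logic (our own illustration under assumed data structures, not the ISI’s procedure), the citations of each article are credited in full to every author, aggregated within each ESI category, and the most cited authors per category are retained:

```python
from collections import defaultdict

# Hypothetical article records: ESI category (assigned from the journal),
# author list, and citations received over the 1981-1999 window.
articles = [
    {"esi_field": "Chemistry", "authors": ["a1", "a2"], "citations": 310},
    {"esi_field": "Chemistry", "authors": ["a2"],       "citations": 95},
    {"esi_field": "Physics",   "authors": ["a3", "a4"], "citations": 540},
]
TOP_N = 250  # target number of researchers per ESI category

# Full counting: every author of an article is credited with all of its
# citations; authors are then ranked within each category.
totals = defaultdict(lambda: defaultdict(int))
for article in articles:
    for author in article["authors"]:
        totals[article["esi_field"]][author] += article["citations"]

highly_cited = {
    field: sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:TOP_N]
    for field, counts in totals.items()
}
print(highly_cited)
```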
At the time of publication of the ISIHighlyCited.com platform, the work had only been carried out for four ESI categories and the data were incomplete, as only 100 researchers had been identified. The list was then gradually completed, with new categories being added little by little and the list of researchers growing until it reached the target of 250 names. It was not until 2004 that all 21 categories were available and complete. That year also marked the start of an update based on 1983–2002 data, without any changes being made to the methodology.10 The ISI reported that the new corpus had resulted in the emergence of 1,000 new names, which were added to those obtained from previous data sets, because “no researchers will be removed from the site.”11
This cumulative approach, combined with extensive data collection by the ISI teams, created a database that complemented Thomson’s range of services and products.
4.1.3. A complement to the Web of Science
For the ISI, the idea of this gateway or platform was not so much to publish a list as to offer “a uniquely enhanced bibliography of published work of a select group of influential scientists.”12
The information provided on the selected researchers was both biographical (studies carried out, jobs held, membership of learned societies, research interests, distinctions, and awards) and bibliographical (list of journal articles, books or book chapters, scientific communications, websites or other Internet resources, etc.). The aim was to give a context to this work of identifying the most influential researchers, to provide “a comprehensive view of pivotal influences on a scientific field.”13
In 2002, the Institute decided to fully integrate ISIHighlyCited.com with the bibliographic and citation data on the Web of Science (WoS). However, unlike WoS, which requires a subscription to be consulted, the ISIHighlyCited.com gateway remained accessible free of charge, and the HCR bibliography was “a critical point of interest.”14
The Institute believed that researchers should be involved in the development of this database. They could contribute directly to their profile after authenticating themselves with a username and password. The process was to be “an on-going cooperation with the researcher,”15 while addressing the whole scientific community.
4.1.4. A wide range of uses
When the gateway was published in 2001, the Institute identified five potential uses for it. Three of them provided access to bibliographical and biographical information on HCR: “view a researcher’s publication records and achievements,”16 “link directly to the ISI Web of Science and access the full bibliographic information […]”17 and “discover relationships between different aspects of a researcher’s work […].”18 A fourth use consisted of identifying possible collaborations (“locate potential collaborators, experts or colleagues”).19 Finally, a fifth use was emerging, for monitoring purposes (“keep up-to-date on research authorities in a variety of fields”).20 The following year, the Institute added two more potential uses to its list: access to the patent index (“Derwent Innovations Index,” accessible by subscription only) and a new form of monitoring, which consisted of “stay current on scientific community news and trends.”21
The scientific community seems to have been the primary target audience for this resource. On 17 February 2001, the ISI decided to preview the gateway at the annual meeting of the American Association for the Advancement of Science (AAAS) in San Francisco, before it went online in May. Similarly, to celebrate the launch of the gateway, the ISI organized a reception on 27 August 2001 in Boston, at the annual meeting of the American Chemical Society (ACS). At this event, the ISI spoke directly to chemists to justify the use of citations and the quality of the lists. Through these events, the Institute addressed itself first and foremost to the scientific community, to whom it promoted its resources.
However, as early as 2001, the ISI identified three other potential users of its tool, although they were only the subject of a few lines repeated each year. These were “legal professionals,”22 who could “find expert witnesses”23 thanks to this gateway, as well as corporations and government agencies, which could “locate centers of excellence to help make policy decisions based on the information they find.”24 This last potential use is not insignificant. By highlighting its ability to identify “key,”25 “most influential,”26 “preeminent”27 researchers who are “authorities”28 or “leaders in their field,”29 individuals whose contribution to the “advancement of science”30 was deemed “fundamental,”31 and whose influence on their field of research was “undeniable,”32 the ISI was gradually giving this list another purpose: to serve as an indicator of scientific excellence.
By developing this unprecedented resource, the ISI seems to have built a tool that, like WoS, was destined to become an important component of the academic landscape. However, it was abandoned only a few years after its launch.
4.1.5. An abandoned project
The ISIHighlyCited.com gateway appears to have been wound down in 2008: The interface was no longer evolving and updates seemed to have stopped. No new content was added in 2009, and no site capture for 2010 is available on the Wayback Machine. It is interesting to note that no information was provided to explain the reasons for this abandonment. The subprime crisis could be one of the reasons, as the global economic crisis did not leave the world of scientific publishing untouched, as Chérifa Boukacem-Zeghmouri (2015) explains. This assumption is reinforced by statements made by the producers during their interviews, who put forward economic reasons for ceasing to compile personal information on the researchers.
In 2011, the gateway underwent what appeared to be a final update, but also a swan song. The URL was changed to HighlyCited.com, the site was given the emblematic Reuters color (orange), and the home page was renamed “Highly Cited Research” (rather than “Researchers”), thereby removing both the historical producer of the list, the ISI, and the actor at the heart of the process, the researcher.
December 31, 2011 marked the end of this resource as such, and of the first phase we have identified in the trajectory of the HCR lists. The Institute announced that the gateway “will no longer be maintained or updated as a stand-alone resource.”33 The database then became an archive. The second of the ISI’s “major projects” therefore seems to have come to an end.
4.2. The Affirmation of an Indicator (2012–2018)
4.2.1. Two major changes
While the ISIHighlyCited.com resource was archived at the end of 2011, Thomson Reuters returned with a new list proposal, making the methodology and results public in December 2012. This new list was characterized by two major changes, which profoundly altered its purpose.
The first major change was the disappearance of the biographical and bibliographical dimension from the lists. Whereas ISIHighlyCited.com offered a range of information about authors and their scientific output, the new list offered just five items: the researcher’s surname, first name, ESI category, primary affiliation, and secondary affiliation. The bibliographic information accessible by a hypertext link was only that referenced by the WoS, unlike what was offered on ISIHighlyCited.com. Similarly, biographical information was limited to the information available in the researcher’s affiliations. The list thus lost much of the richness it had in the first phase, returning to what the ISI had described at the time as “really just the skeletal element.”34
The second major change was the frequency with which the list was published. Unlike the ISIHighlyCited.com gateway, which was updated on an ad hoc basis, the list was now published annually. This new rhythm made the publication of the list an event, with an announcement and even a press release from 2016. Thomson Reuters considered that these annual updates did not replace existing information but extended it, as an HCR retained this status. However, the decision to switch to this frequency of publication was not fully explained. Thomson Reuters simply stated that “elite status is continually changing as researchers rise and fall in their influence throughout the years. Monitoring such changes, even annually, has an interest all its own.”35
4.2.2. Many methodological adjustments
When presenting the new list in 2012, Thomson Reuters declared that “to ensure Highly Cited remains an authoritative, respected tool that supports the research community [they] will continue to collaborate with [their] constituents to ensure [their] methodology and data analysis is transparent and relevant to the current state of global research.”36 Adjustments would then be made to various aspects of the methodology.
4.2.2.1. A new time window.
The 20-year window was initially reduced to 10 years for publications and 11 years for citations, then to 11 years for both items. In addition, this window ended 2 years before the year in which the list was published: For the 2014 list, the period used for both publications and citations was 2002–2012. This new time frame was the consequence of a second methodological change: the use of ESI’s Highly Cited Papers (HCPs) rather than raw citation counts. HCPs are articles identified, for each ESI category and each year of publication, as belonging to the top 1% by citations in the WoS over an 11-year window.
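The HCP definition can be pictured with a short sketch (our own illustration, with assumed column names and a tabular representation): within each (ESI category, publication year) group, papers at or above the 99th citation percentile are flagged.

```python
import pandas as pd

# Hypothetical paper records: ESI category, publication year, and citations
# accumulated over the 11-year ESI window. Column names are assumptions.
papers = pd.DataFrame({
    "paper_id":  ["p1", "p2", "p3", "p4", "p5", "p6"],
    "esi_field": ["Chemistry"] * 3 + ["Physics"] * 3,
    "pub_year":  [2005, 2005, 2005, 2007, 2007, 2007],
    "citations": [950, 40, 25, 1200, 60, 35],
})

# A paper counts as a Highly Cited Paper (HCP) when its citation count
# reaches the top 1% of its (ESI category, publication year) group.
threshold = (
    papers.groupby(["esi_field", "pub_year"])["citations"]
    .transform(lambda c: c.quantile(0.99))
)
papers["is_hcp"] = papers["citations"] >= threshold
print(papers[papers["is_hcp"]])
```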
4.2.2.2. HCP counting.
Until then, HCRs had been identified based on the cumulative number of citations obtained for their work. In 2012, Thomson Reuters tried out a new approach, aimed at normalizing citations according to specialty.37 However, this attempt to standardize citations was not conclusive. Thomson Reuters reported that they had been criticized by bibliometricians, researchers, and administrative staff. The standardization of citations led to an overrepresentation of publications with the highest number of citations, as well as of their authors, which resulted in the omission of many scientists and made it impossible to compile “a robust listing of leading researchers.”38 The company therefore decided to revise its list using the ESI’s HCP count, explaining that the HCP methodology was “well described in ESI, familiar to many, and therefore transparent, […] and because you, the scholarly community, consistently reference ESI in your responses to the new methodology and its results […].”39
Thomson Reuters considered that this percentile-based selection method eliminated the citation disadvantage of recently published articles compared with older ones. It reflected the company’s desire to identify researchers with a recent impact on research. Lastly, it should be noted that the citation count did not disappear; it still played a part in the selection of authors, a selection process that was itself subject to further methodological changes.
4.2.2.3. Number of authors and entry threshold.
For the now-archived list, the ISI had chosen to select around 250 researchers for each category. With the new methodology proposed in 2012, Thomson Reuters considered that, based on the annual number of publications, scientific communities could be classified according to three levels: small, medium, or large.40 These levels would now determine the number of HCRs to be included in the list: 100 for the small communities (five ESI categories), 150 for the medium-sized ones (eight categories), and 250 for the large ones (eight categories).
However, a new change occurred in 2014. Thomson Reuters kept the idea of adapting the number of researchers selected to the size of the scientific community but chose to rely on the number of authors rather than the number of publications, and to establish admission thresholds adapted to each ESI category. Then, for each category, the number of authors who had collaborated on at least one HCP was identified and the square root of this number was calculated.41 Researchers were then ranked in descending order of HCP, and those with the highest HCP were admitted until the threshold designated by the square root was reached. Clarivate provided a fictional example to explain this process (Figure 1).
Figure 1. Fictional example of selection process. Source: https://clarivate.com/highly-cited-researchers/evaluation-and-selection/.
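To make the square-root rule concrete, the sketch below gives one plausible reading of the selection step (our own illustration, not Clarivate’s implementation; tie handling at the threshold, in particular, is a simplification): count the authors credited with at least one HCP in a category, take the square root of that count as the admission threshold, and admit the authors with the most HCPs up to that threshold.

```python
import math
from collections import Counter

# Hypothetical input: for each ESI category, the number of Highly Cited
# Papers credited (full counting) to each author who co-authored at least
# one HCP. Names and counts are purely illustrative.
hcp_counts = {
    "Chemistry": Counter({"author_a": 22, "author_b": 17, "author_c": 9,
                          "author_d": 8, "author_e": 3, "author_f": 2,
                          "author_g": 1, "author_h": 1, "author_i": 1}),
}

def select_highly_cited(hcp_by_author):
    """Admit the authors with the most HCPs, up to the square root of the
    number of authors with at least one HCP in the category."""
    threshold = round(math.sqrt(len(hcp_by_author)))
    ranked = sorted(hcp_by_author.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:threshold]

for field, counts in hcp_counts.items():
    print(field, select_highly_cited(counts))
```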
4.2.2.4. Considered contributions.
In 2014, Thomson Reuters explained its “desire to include reports deriving from extensive multi-institute collaborations but not those from huge teams, as found in many high-energy physics experiments.”42 To this end, the company decided that year that, for the “Physics” ESI category, articles with more than 30 affiliations would be excluded from the calculations used to identify HCRs. This threshold of 30 affiliations was determined after examining HCPs in physics, as well as their content. In 2017, this exception was also applied by Clarivate to the “Space Science” category.
The question of contribution also arises in relation to the way publications and citations are counted. The ISI, and then Thomson Reuters and Clarivate, chose not to use fractional counting. Thomson Reuters considered that the solution for distributing credit was not simple, and that in the absence of information on each person’s contribution, split counting at article level seemed as arbitrary as full counting. They added that splitting up the citations could even introduce bias: “what would appear to be fairer in terms of assigning credit using fractional counting may in fact bias an analysis against papers resulting from teamwork, since some fields, such as mathematics, typically exhibit a very low average number of authors in comparison to, say, biomedical fields.”43
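A one-paper example (our own illustration) makes the contrast concrete: with full counting, each of the four coauthors of a 100-citation paper is credited with 100 citations; with fractional counting, each would receive 25.

```python
citations = 100
authors = ["a1", "a2", "a3", "a4"]

# Full counting, as used for the HCR list: each author is credited with
# all 100 citations of the paper.
full = {a: citations for a in authors}

# Fractional counting, rejected by Thomson Reuters: the citations are
# split equally among the authors.
fractional = {a: citations / len(authors) for a in authors}

print(full)
print(fractional)
```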
Finally, there was the question of researchers who published HCPs in several categories but not enough in any one of them to be selected. Many criticisms were addressed to the company on this subject, criticisms considered to be “valid and welcome.”44 Thomson Reuters considered that it was their duty to overcome this problem if it caused them to miss genuine HCRs. An approach involving the creation of seven major groupings of categories for which cross-dependence is recognized was envisaged in 2012.45 However, this solution was never implemented and, until 2018, Thomson Reuters (then Clarivate) considered that, given the number of possible combinations in terms of categories and the number of HCPs concerned, calculating a baseline to decide whether to include a researcher on the list would be “unmanageable.”46 In 2018, however, a new category, called “Cross-field,” was finally designed to remedy this situation. Here again, Clarivate offered a fictional example explaining its methodology (Figure 2).
Figure 2. Fictional example of cross-field selection process. Source: https://clarivate.com/highly-cited-researchers/evaluation-and-selection/.
Clarivate believed that “breaking through the artificial walls of conventional disciplinary categories will help keep [the] Highly Cited Researcher list contemporary and relevant.”47 Here again, we see a desire on the part of the list’s producers to adjust their methodology so as to remain useful to actors with whom their interactions seemed to be increasing.
4.2.3. New actors
During the first phase of the list, that of creating a database, the researcher was central to the process. Now that this biographical and bibliographical content was no longer offered, the company seemed to be interacting less directly with the researcher, and perhaps giving priority to other actors, particularly in drawing up the list.
While the new methodology proposed at the end of 2012 was openly subject to review by the scientific community, it had already been drawn up with the help of undisclosed “expert and informed university leaders in different regions.”48 Universities became important actors in this second phase. As proof, in 2015 (but only for that year) Thomson Reuters even accepted requests from universities to change HCRs’ main affiliations on their behalf,49 probably because affiliations are a major issue in the Shanghai rankings.
While this ranking has existed since 2003 and the HCR list is used in its production, Thomson Reuters first mentioned collaboration with Shanghai Jiao Tong University (SJTU) in 2013. The data from the list would later be presented by Clarivate as a “key component of the Academic Ranking of World Universities, one of the longest established and most influential annual surveys of top universities globally.”50 In 2014, we learned that Thomson Reuters not only sent the university in charge of the ranking the number of HCRs per institution, but also worked with this actor before the list was published to identify the researchers’ affiliations.51
New challenges and interests were emerging for the producers, which noted the importance taken on by these lists, particularly in the Shanghai ranking, which was now seen as an indicator of scientific excellence for many actors in higher education and research.
4.2.4. An indicator of excellence
One of the directions outlined in the first phase of the list’s trajectory became a constituent part of this second phase. The list was an indicator of excellence at two levels: individual (at the level of the researcher) and collective (at the level of an institution or a country).
4.2.4.1. At the researcher level.
In 2016, Clarivate stated that while it was difficult to quantify the value of each contribution among the more than two million publications that appear each year, “the research community, publishers, academic administrators and others seek such insight, beyond who has the highest salaries, biggest laboratories or office shelves with the most awards.”52 Clarivate positioned itself as being able to meet this need and reaffirmed its strategy of relying on HCP, given that “a paper that other authors have frequently cited has quantifiably proved itself to be significant.”53
The list’s presumed ability to identify the “scientific elite”54 was such that being an HCR became a criterion which, according to the producer, can influence a researcher’s career. Jessica Turner, global head of government and academia at Clarivate, stated that “[they] are proud that [their] list of Highly Cited Researchers has earned global respect among the academic and scientific community and has the potential to present new opportunities for career advancement, recruitment and institutional enrollment.”55
In 2018, Clarivate took a cautious approach by referring to the recommendations of the Leiden Manifesto on the use of indicators in research evaluation. The company also stated that the use of HCR status in the context of appointments, promotions or funding decisions “demands informed interpretation,”56 that “one should never rely on publication and citation data as a substitute for reading and assessing a researcher’s publications—that is, for human judgement.”57 But, at the same time, Clarivate was confident in the relevance of this indicator at an individual level. Thus, although the company considered that “evaluating the research performance of individuals is the most contentious application of publication and citation data,”58 it believed that when “a researcher’s record exhibits top-tier status quantitatively, demonstrated by the production of papers in the top 1%, top 0.1%, or even top 0.01% of a citation distribution, researchers can be more certain of having positive and reliable evidence that the individual under review has contributed something of utility and even significance.”59
It is interesting to note that from 2016, self-promotion tools for researchers were introduced. Researchers were invited to download a badge, a certificate, and an official letter of congratulations using a special form. This underlined the importance for a researcher of being labeled ‘highly cited’.
4.2.4.2. At the level of an institution or a country.
The idea of aggregating HCRs at the level of institutions or countries, both to outline major trends in research and to provide indications of the most prolific areas, appeared as early as 2003.
The gradual decline of U.S. hegemony and the rapid rise of China, in terms of the number of HCRs, was the subject of analyses every year, whether in blog posts or in reports published by Thomson Reuters from 2014 onwards. As of 2018, Clarivate went beyond these analytical elements and offered a ranking (“Top 10”) of the 10 countries with the most HCRs. The same applied to institutions: While the most prolific HCR institutions had been named by the ISI since 2003, a Top 10 was also produced from 2018 onwards.
This aggregation by institution had also been an indicator in the Shanghai rankings since 2003, as had the number of Nobel Prizes, a scientific distinction with which the list’s producers were seeking to make a comparison.
4.2.4.3. Parallels with the Nobel Prizes.
To support the performance of its lists in identifying what it calls the “scientific elite,” the ISI used a parallel with the Nobel Prizes. While the match between the HCR and Nobel lists was relatively anecdotal until 2018, it became almost systematic thereafter. Not only did Clarivate count the number of Nobel Prize-winning HCRs, but it also chose to highlight these profiles in its communications materials and to refer to Eugene Garfield, stating that he “took special interest in using citation data to forecast Nobel Prize winners by identifying a group of researchers he termed ‘of Nobel class’.”60 Clarivate used this parallel to demonstrate the efficiency of its approach.
It is interesting to note that the parallels between the Nobel Prizes and the HCR list can be drawn at another level: While these two approaches aim to celebrate individuals, they become an indicator once they are aggregated at the level of an institution or a country. Like the HCRs, the Nobel Prizes are an indicator in the Shanghai rankings and account for a significant proportion of an institution’s final score. However, unlike the HCR, being nominated for a prize such as the Nobel is based on qualitative judgements made directly by peers and not through citation counts. This peer selection process, while not exempt from all bias, may have the advantage of limiting the risk of manipulation that could threaten the integrity of this indicator and undermine an entire strategy, as Clarivate seems to have been experiencing in recent years.
4.3. The Weakening of a Strategy (2019–2023)
Given the importance the HCR list has acquired in the global higher education and research landscape over the years, the recent increase in the number of listed researchers suspected of scientific misconduct has raised questions that go beyond the usual criticisms and call into question the strategy deployed by the list’s producers to position themselves as a privileged actor in the identification and measurement of scientific excellence.
4.3.1. The prevalence of scientific misconduct
The issue of scientific misconduct was first mentioned by Clarivate in 2016. To the question “How do you handle cases in which a Highly Cited Paper is later retracted or an identified Highly Cited Researcher has been found to have committed scientific misconduct?,” the company declared that retracted items would not be counted, and that “researchers found to have committed scientific misconduct in formal proceedings conducted by a researcher’s institution, a government agency, a funder, or a publisher are excluded from [their] list of Highly Cited Researchers.”61
However, from 2019 the issue of scientific misconduct and its impact became more prominent. The ISI, re-established by Clarivate in 2018, thus referred to the fact that “factors such as retractions, misconduct, and extreme self-citation—all of which would detract from true community-wide research influence—may lead to an author being excluded or suppressed from the list.”62 In 2020, a small paragraph appeared on the last page of the report, claiming that “The Institute for Scientific Information (ISI)™ at Clarivate has pioneered the organization of the world’s research information for more than half a century. Today it remains committed to promoting integrity in research whilst improving the retrieval, interpretation and utility of scientific information.”63 In 2021, the Institute reiterated its commitment to detecting research behavior that “would detract from true community-wide research influence.”64 On the cover of the report published that same year, there was no longer any mention of “identifying top talent,” as in 2018 or 2019, or of describing highly cited researchers as “Pioneers in their fields. Recognized by their peers. Applauded by the world.” The title of this report is simply “Highly Cited Researchers 2021,” as if the ISI were scaling down its ambitions.
In 2022, it became clear that scientific misconduct was now a real issue for the ISI. The Institute noted that being an HCR can become a goal and lead to abuses:
The incentives to achieve Highly Cited Researcher status are in some nations and research systems quite high. Highly Cited Researcher status often results in rewards for a researcher such as higher remuneration, recruitment to other institutions (which benefit in the Academic Ranking of World Universities, the number of Highly Cited Researchers represents 20% of an institution’s score for ranking) and sometimes offers to become affiliated researchers at other institutions in exchange for large payments and a researcher’s agreement to preferentially list the contracting institution regularly on publications (this represents a shortcut to higher placement in the Academic Ranking of World Universities).65
4.3.2. A change of positioning
The criticism addressed to the ISI regarding the HCR lists was not new. In 2015, Thomson Reuters reported that it had been criticized, particularly by scientists themselves, for the negative effects that individual celebrations could have on the community. Similarly, the ISI pointed out in 2018 that the interpretation of citations has been the subject of debate for many years, and that “some assert that they convey importance or popularity; others say they function largely as rhetorical devices and collectively create a socially constructed reality.”69
In the face of such criticism, list producers have essentially adopted two approaches: anticipating criticism and appealing to arguments of authority. One way of anticipating criticism was to point out as early as 2014 that there is no ideal measure of performance and that “the only reasonable approach to interpreting a list of top researchers such as ours is to fully understand the method behind the data and results, and why the method was used. With that knowledge, in the end, the results may be judged by users as relevant or irrelevant to their needs or interests.”70 In 2015, Thomson Reuters also reminded us that counting the most cited articles is only one way of measuring a researcher’s impact, that no approach is ideal, and that any method will generate exclusions.
The producers used arguments of authority on several occasions. In response to criticism that a few individuals were being celebrated at the expense of the scientific community, Thomson Reuters began by pointing out that “human talent is unequally distributed in research just as in the arts or athletics,”71 then chose to rely on an article published in 2011 by two scientometricians and science policy analysts, Diana Hicks and J. Sylvan Katz, in which the authors stated that a “lack of recognition […] likely result in ‘the suppression of incentives for the very best scientists’.”72 The following year, in 2016, Clarivate quoted Lutz Bornmann, bibliometrician and sociologist at the Max Planck Society, to attest to the value of this list: “in quantitative research evaluation, there is hardly another freely accessible database which can bring to expression the high reputation of researchers in a similar way to the list of the Highly Cited Researchers.”73 In 2018, to justify its use of citations, the ISI relied on the “late Robert K. Merton,” described as “the 20th century’s leading sociologist of science.”74 Finally, that same year, the ISI assumed that Eugene Garfield, who had died a year earlier, “would be most gratified by those instances in which [their] designation of Highly Cited gave a deserving but underappreciated researcher the recognition and opportunity he or she deserved.”75
However, the scale of scientific misconduct seemed to be pushing the Institute into a corner. In 2021, the communication strategy became more defensive. The ISI stated that “the methodology and individuals selected for the Highly Cited Researchers list have been determined at [their] sole discretion.”76 At the end of the year, the ISI stated that the 2022 list was already being developed and that candidates had been identified and contacted. However, it pointed out that “inclusion in this preliminary list does not guarantee inclusion in the final launch list.”77 Although most of these statements disappeared in 2022, a nervousness was still evident in the ISI’s communication, particularly following the revelations in the Spanish daily El País about the manipulations surrounding HCR affiliations in April 2023.78 The Institute published a press release the same month in response to the questions raised in these press articles, to provide a “clarification of how [they] identify and publish primary researcher affiliations in the Highly Cited Researchers program.”79 To our knowledge, this was the first time the Institute had issued a press release relating to the HCR list, other than the one announcing the publication of the list in November each year. It is also interesting to note that when the 2023 list was published, a specific contact was for the first time proposed to journalists who “would like to speak to someone at Clarivate about this program.”80
It appears that after this initial reflex of withdrawal or defense, the ISI sought to regain control and reposition itself both as a proactive contributor and as a trusted partner, by implementing a so-called “qualitative” approach.
4.3.3. The “qualitative” approach
In 2021, the ISI for the first time used both quantitative and qualitative approaches to draw up this list. The new qualitative approach involved examining HCR files to identify any scientific misconduct that might lead to exclusion from the list. This analysis, which began in 2019, “now look[s] at a growing number of factors when evaluating papers […].”81 These factors, also known as “filters,” relate to both contributions and citations.
4.3.3.1. Contribution.
The question of which contributions to consider was not a new one: Thomson Reuters had already asked itself this question in 2014, as we saw earlier. Five years on, how to count contributions remained at the heart of the Institute’s concerns. The exclusion of publications with more than 30 institutional addresses was extended to all 21 ESI categories. The ISI noted a general increase in the number of articles coauthored by numerous authors. From then on, “award[ing] credit to a single author among hundreds listed on a paper strains reason.”82 These new practices were even leading the Institute to reconsider fractional counting, which it had rejected in 2014.83
In 2021, the ISI introduced a change in the way contributions were considered: Publications with more than 30 institutional addresses would no longer be eliminated, but rather those with more than 30 authors or a group of authors. The ISI considered this change to be an “improvement in reasonably crediting individual authors—the previous use of the institutional addresses was a heuristic.”84,85
The issue of highly prolific profiles, raised the following year, ultimately led to the exclusion from the preliminary list of HCRs of authors presenting an “extreme level of hyper-authorship,” in other words “outsized output, in which individuals publish two or three papers per week over long periods, by relying on international networks of co-authors.”86 This type of practice “strains [their] understanding of normative standards of authorship and credit.”87
4.3.3.2. Citations.
The problem of excessive self-citation appeared in 2019 in the ISI’s statements. Self-citation occurs when an author cites his or her own previous work in a publication. The Institute decided to exclude from the lists researchers whose HCPs revealed abnormally high levels of self-citation. To identify these cases, the Institute said it calculated a self-citation distribution for each ESI field, enabling extreme values to be identified. Additional methodological information was provided in 2020, in the form of a bibliographical reference to an article by Szomszor, Pendlebury, and Adams (2020), three authors affiliated with the ISI. In 2023, however, the ISI said it was paying particular attention to “very recent publications”88 with a very high number of self-citations. This element echoed a further methodological change made in 2019: the change in the time window used. When the decision was made in 2014 to use the 11-year ESI citation window, it ended two years before the list was published. In 2019, the ISI chose to reduce this gap to one year, stating that “this year [they] were able to access and analyze the data more quickly than in past years. [They] used the latest data to present a more up-to-date list of Highly Cited Researchers.”89 It is possible that these more recent publications, which require fewer citations to qualify as “highly cited,” will require greater vigilance on the part of the Institute.
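The kind of filter involved can be pictured with the sketch below (the actual procedure is the one described in Szomszor et al. (2020); the outlier rule, data, and column names here are purely illustrative assumptions): a self-citation rate is computed per researcher and flagged when it is extreme relative to the distribution of the researcher’s ESI field.

```python
import pandas as pd

# Hypothetical researcher-level data: citations to a researcher's Highly
# Cited Papers and the portion that are self-citations.
researchers = pd.DataFrame({
    "researcher":     ["r1", "r2", "r3", "r4", "r5"],
    "esi_field":      ["Chemistry"] * 5,
    "citations":      [4000, 3500, 5200, 2800, 3100],
    "self_citations": [ 120,  110, 2600,   90,  100],
})
researchers["self_cite_rate"] = (
    researchers["self_citations"] / researchers["citations"]
)

# Flag rates far above the field median (here, more than three median
# absolute deviations - a stand-in for whatever cut-off is actually used).
grouped = researchers.groupby("esi_field")["self_cite_rate"]
median = grouped.transform("median")
mad = (researchers["self_cite_rate"] - median).abs().groupby(
    researchers["esi_field"]).transform("median")
researchers["extreme_self_citation"] = (
    researchers["self_cite_rate"] > median + 3 * mad
)

print(researchers[["researcher", "self_cite_rate", "extreme_self_citation"]])
```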
Other manipulations, described by the ISI as “ingenious,” required closer scrutiny. These concerned the citations of coauthors. The Institute considered that “outsized output […] raise[s] the possibility that an individual’s high citation counts may result from co-authors alone when publishing without the individual in question.”90 Therefore, “if more than half of a researcher’s citations derive from coauthors, for example, [the ISI] consider[s] this narrow rather than community-wide influence and that is not the type of evidence [they] look for in naming Highly Cited Researchers.”91 It is these “networks of coauthors” that the ISI sought to identify.
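The coauthor filter quoted above amounts to a simple share test, sketched below under assumed inputs (obtaining the citation split from raw citation links is the substantial part of the real analysis, which is not shown here):

```python
# Hypothetical citation tallies for candidate researchers: total citations
# and citations coming from papers written by their coauthors without the
# candidate. The split is assumed to be available.
def narrow_influence(total_citations: int, citations_from_coauthors: int,
                     threshold: float = 0.5) -> bool:
    """Return True when more than `threshold` of a researcher's citations
    derive from coauthors, i.e. influence judged narrow rather than
    community-wide."""
    return citations_from_coauthors / total_citations > threshold

candidates = {
    "r1": (4200, 600),   # broad, community-wide citation base
    "r2": (3900, 2500),  # majority of citations from the coauthor network
}
for name, (total, from_coauthors) in candidates.items():
    print(name, "excluded" if narrow_influence(total, from_coauthors) else "kept")
```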
This qualitative approach led the ISI to exclude an entire ESI category from its list in 2023, namely mathematics. For David Pendlebury, Head of Research Analysis, the Institute had to adapt to “address the challenges of an increasingly complex and polluted scholarly record.”92 It considered that the exclusion of mathematics was the result of “difficult choices in [their] commitment to respond to threats to research integrity across many fields.”93 According to the ISI, this discipline was particularly “vulnerable to strategies to optimize status and rewards through publication and citation manipulation.”94 Indeed, “it is a highly fractionated research domain, with few individuals working on a number of specialty topics. The average rate of publication and citation in Mathematics is relatively low, so small increases in publication and citation tend to distort the representation and analysis of the overall field.”95 As a result, this manipulation “not only misrepresents influential papers and people; it also obscures the influential publications and researchers that would have qualified for recognition.”96 To remedy this situation, the ISI considered that this category should be analyzed more closely to identify those who have a significant influence in this area. This work was based on consultation with “leading bibliometricians and mathematicians to discuss [their] future approach to the analysis of this field.”97 While consulting experts and researchers was not new for the ISI, the Institute also involved new actors in its work to identify scientific misconduct.
4.3.4. Strategic partnerships
“This year Clarivate extended the qualitative analysis of the Highly Cited Researchers list, to address increasing concerns over potential misconduct (such as plagiarism, image manipulation, fake peer review). With the assistance of Retraction Watch and its unparalleled database of retractions, Clarivate analysts searched for evidence of misconduct in all publications of those on the preliminary list of Highly Cited Researchers.”98
In 2023, the ISI also declared that it would take account of
expressions of concern from identified representatives at research institutes, national research managers and our institutional customers, along with information shared with us by other collective community groups, e.g. For Better Science, Pubpeer. Some of these resources include anonymous or whistleblower sources. [They] also consider such evidence from trusted sources – where [they] can verify claims through direct observation.100
It seems that the ISI’s relationship with the scientific community was also changing. Researchers, even beyond the HCRs, were no longer only those “who carry out vital, important work every day to innovate and to improve our futures.”101 They could also be perpetrators of scientific misconduct. For this reason, in 2022 the ISI launched “an explicit call for the research community to police itself through more thorough peer review and other internationally recognized procedures to ensure integrity in research and its publication,”102 a call repeated in 2023. Similarly, Clarivate “welcomes and expects accuracy and clarity in researchers’ own claims of primary and secondary affiliations”103 and declares its support for “the actions of universities and research organizations to monitor and manage the activities and behaviors of their employees with respect to specifying correct home institutions which reflect their permanent, tenured positions.”104 While the Institute intended to “play [its] part to respond to a rise in threats to research integrity in many areas,”105 it did not intend to do so alone.
Finally, it is interesting to note that another actor appeared, albeit only between the lines, in this third phase of the trajectory of the HCR lists. Unlike the others, this is not a potential partner but a competitor. In 2023, we could read the following sentence: “The robust evaluation and curation of our data ensure that the Web of Science Core Collection™ remains the world’s most trusted publisher-independent global citation database.”106 It seems to us that this statement is an allusion to the ISI’s direct competitor Elsevier, which is both a publisher and the producer of the Scopus bibliographic database. This database, created in 2004, has also been used since 2013 by John Ioannidis, a researcher at Stanford University, to produce each year a database of “top-cited scientists.”107 Although these lists are far from achieving the reputation and influence of those established by the ISI in 2001, this declaration is perhaps a way for the Institute to reaffirm its position, which is currently under threat, in an academic landscape that is constantly evolving.
5. CONCLUSION
For the teams at the ISI, Thomson Reuters, and Clarivate, understanding the research landscape and identifying the scientific elite is a major challenge. It is for this reason that the Citation Laureates program and the annual list of highly cited researchers were designed. For the Institute, “recognition of Highly Cited Researchers not only validates research excellence but also enhances reputation, fosters collaboration, and informs resource allocation.”108 As a result, the ISI is working to position itself as a key actor for “many organizations involved in research evaluation and assessment—including universities, governments, research assessment and ranking organizations globally to provide accurate, verifiable and trustworthy data.”109
Through an analysis of 124 documents produced by the successive producers of these lists, it seems to us that three main phases can be identified in the trajectory of the HCR lists, which manifest themselves in changes of positioning, methodological adjustments, and the intervention or solicitation of new actors. First, between 2001 and 2011, the HCR list appeared to be a biographical and bibliographical database, developed using a stable methodology and made available to the scientific community to identify determining influences in different fields of research. Between 2012 and 2018, this tool was gradually transformed into an indicator of scientific excellence, at the level of an individual researcher, an institution, or a country, subject to numerous methodological adjustments and used by the scientific community. Finally, from 2019 to 2023, this strategy was undermined by the growing importance of scientific misconduct, requiring the introduction of an evaluation and selection methodology based on a qualitative approach, as well as the involvement of new actors likely to support the ISI and Clarivate in this process.
However, we may wonder whether the exclusion of mathematics from the list in 2023 does not in fact mark the start of a new phase. For the ISI, this discipline differs from the other categories in terms of the research and publication practices of its community. But what will happen if other categories prove to be vulnerable to manipulation? What if quantitative analysis does not allow the ISI to “maintain the purpose of [their] selection process and the integrity of [their] data: to identify researchers with broad community influence and not those whose citation profile is narrow and substantially self-generated”?110 In 2022, the ISI said it did not wish to list all the filters used “to identify and exclude researchers whose publication and citation activity is unusual and suspect,”111 with a view to “staying ahead of those attempting to game our identification of Highly Cited Researchers.”112 This is where the challenge lies for the ISI today: to stay in the race and, through the annual publication of the list of HCRs, to maintain its position as a protagonist in the academic landscape, “acting as a beacon for academic institutions and commercial organizations.”113
ACKNOWLEDGMENTS
The author would like to thank D. Egret, D. Docampo, and D. Pontille for discussions and encouragement during the process of preparing this manuscript, as well as the reviewers for their generous feedback.
COMPETING INTERESTS
The author has no competing interests.
FUNDING INFORMATION
No funding was received for conducting this study.
DATA AVAILABILITY
All web pages collected and used in this analysis can be accessed via the URLs indicated in the footnotes.
Notes
Ibid.
Ibid.
Most recent at the time of this work.
https://clarivate.com/the-institute-for-scientific-information/history-of-isi/. For a complete presentation, see also Docampo and Cram (2019).
The “multidisciplinary” category will be redistributed among the other disciplines as far as possible. ISI also points out that there is no Arts & Humanities category at this stage. NB: This category has not been created since.
Another update will be carried out during this first phase, based on 1984–2003 data.
Ibid.
Ibid.
Ibid.
Ibid.
Ibid.
Ibid.
Ibid.
Ibid.
Ibid.
Ibid.
Ibid.
It should be noted that from 2014 to 2018, this calculation was made before the disambiguation of authors’ names, which involves identifying potential homonyms.
Ibid.
Ibid.
The term “elite” was used by Thomson Reuters for the first time in 2013.
Ibid.
Ibid.
Ibid.
Ibid.
Ibid.
Ibid.
Ibid.
Ibid.
Ibid.
In two articles published in April 2023 (“One of the world’s most cited scientists, Rafael Luque, suspended without pay for 13 years” and “Saudi Arabia pays Spanish scientists to pump up global university rankings”), the journalist Manuel Ansede (2023a, 2023b) reported on the negotiations carried out by Saudi Arabian universities with Spanish researchers to have them counted as HCRs at their institutions and thus move up the rankings.
In this context, a heuristic is an approximation that offers an answer, however imperfect, to a problem.
Ibid.
Ibid.
Ibid.
Ibid.
Ibid.
Ibid.
Ibid.
REFERENCES
Author notes
Handling Editor: Rodrigo Costas