Citations are increasingly being used to evaluate institutional and individual performance, suggesting a need for rigorous research to understand what behaviors citations are reflecting and what these behaviors mean for the institution of science. To overcome challenges in accurately representing the citation generation process, we use postretraction citations to test competing theories under two different citation search processes, empirically testing predictions on the spread of retracted references. We find that retracted papers continue to be cited after retraction, and that these citations are more likely to come from audiences likely to be unfamiliar with the field of the retracted paper. In addition, we find this association to be much stronger among those citing high-status journals, consistent with scientists relying on a heuristic search process rather than an engaged one. While the current policy debate on misinformation in science emphasizes increasing the visibility of retraction labels to discourage the use of such publications, we argue that institutional-level interventions may be more effective, as such interventions are more consistent with the heuristic citation process. As such citation behavior may not be limited to the case of postretraction citations, we discuss the implications for current science studies as well as science policy.

Citations to scholarly publications are increasingly being used for research evaluations, due to their perceived quality of being an unobtrusive and objective measure of scientific impact (Biagioli, 2018; Hicks, Wouters et al., 2015). At the same time, there is continual concern about the widespread use and institutionalization of citation metrics as evaluation standards (Biagioli & Lippman, 2020; Hicks et al., 2015; Moravcsik & Murugesan, 1975; Woolgar, 1991). Such continuing criticisms call for rigorous scholarly studies to improve our understanding of what citations actually measure (Bornmann & Daniel, 2008; Cronin, 1984; Kaplan, 1965). In other words, what motivations and behaviors are citations reflecting, and, in particular, regardless of the motivations, what do these behaviors mean for the institutions of science, as well as for science policy, particularly science evaluation?

While there has been longstanding interest in understanding what citations actually measure (Bornmann & Daniel, 2008; Tahamtan & Bornmann, 2019), one of the challenges has been the difficulty of accurately illuminating citation practices with conventional research methods and data, as most existing studies have relied either on investigators’ close examinations of citation contexts or authors’ self-reported explanations about their citation motivations. Such methods partly assume the ability of authors to make and recollect well-informed and rational judgments about the work they cite. Thus, instead of relying on our own interpretation or on authors’ recollections, we argue that examining citations to retracted references could provide a unique opportunity to understand citation practices, because postretraction citation data allow us to identify a set of articles that should not have been cited but nevertheless were cited. In this regard, the use of postretraction citations as a case provides not only a unique opportunity to investigate abnormal citation behavior but also a strategic research site for theory-testing by comparing different predictions on the likelihood of such citations, built from competing theories. We first review two dominant theories of citation motivations from the sociology of science: the normative and constructivist theories (Bornmann & Daniel, 2008; Cozzens, 1989; Cronin, 1984). To simplify the distinction, the normative theory views the function of citation as conferring credit on, and acknowledging indebtedness to, the original author (Kaplan, 1965; Merton, 1957). Meanwhile, constructivists view citation as a means to bolster scientific claims to convince audiences (Gilbert, 1977; Latour, 1987).

However, these citation motivations alone cannot explain why retracted references were cited, which suggests that we need to consider the citation generation process independent of citation motivation. We use the term “citation search” to represent the citation generation process, which encompasses the process of accessing the published literature, and eventually marking the use of that literature in the author’s argument by use of a citation. We are agnostic about when in the process this search occurs. The search part of a “citation search” can occur early in one’s training as a researcher, at an early stage in the focal project’s conceptualization, at random moments when perusing published works, or in a targeted manner to find a specific piece of knowledge to support an argument or perhaps to respond to a comment from a reviewer. However, the citation appears in the focal paper, and we use its appearance in the paper as the marker of the search, and make this citation search the object of our analysis (i.e., the process by which a particular reference goes from being in the published literature to being cited in the focal paper). We argue that we can use citations to retracted papers as a strategic site for understanding this citation search. We first consider the citation process that follows a pure form of citation search, where authors would cite after thoroughly reading papers. We consider this as an engaged citation search. At the other end, we consider a citation process where authors would extensively rely on cues and signals that they think are useful in fulfilling their citation motivations. Inspired by a behavioral theory tradition (Cyert & March, 1963; Simon, 1997), we consider this citation process as a heuristic citation search. 
One important insight from the behavioral theory is that it can describe when authors are more likely to use heuristics, and more importantly, when such behavior can become overly mechanical, perhaps even to the point where they may be citing the materials without reading them. We then derive hypotheses that lead to competing predictions about conditions under which retracted articles are more likely to be cited. In doing so, we use field distance (between retracted articles and citing articles) and journal visibility (high vs. ordinary journal impact factor) as theory testing variables.

Our analysis is based on a set of retracted articles published from 1980 to 2016, obtained from Retraction Watch (2019), and corresponding metadata obtained from the Clarivate Analytics Web of Science. Based on 103,245 citing-cited article pairs from 2,123 retracted articles and 94,871 citing articles, we first show that, on average across time, from 38% to 44% of citations to retracted articles were made after retraction events. By operationalizing field distance with a natural language processing model, we show a strong association between postretraction citation and field distance. Furthermore, we find this association much stronger among those citing high-status journals, which supports a heuristic citation search model, regardless of the author’s citation motivation. As predicted by the heuristic citation search model, some authors seem to superficially use high-status journals as a heuristic when searching for distant (hence more likely unfamiliar) knowledge. These findings are consistent with a process of postretraction citations at least partly driven by a process where authors are citing the paper as a marker for some point in their argument, relying on more surface characteristics of the paper (it was published, it is related to point X, and it is in a reputable journal), perhaps without regard to the detailed contents of the paper, and, in particular, without regard to whether the publication has been nullified by a retraction.
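To make the notion of a postretraction citation concrete, the following is a minimal sketch of how a citing-cited pair might be flagged and how a postretraction share could be computed. The dates, the `pairs` list, and the function names are hypothetical illustrations, not the data or procedure used in this study; the sketch assumes a citation counts as postretraction when the citing article's publication date falls after the retraction date of the cited article.

```python
from datetime import date

# Hypothetical citing-cited pairs (illustrative only, not the study's data):
# (publication date of the citing article, retraction date of the cited article)
pairs = [
    (date(2005, 3, 1), date(2006, 1, 15)),   # cited before retraction
    (date(2007, 6, 1), date(2006, 1, 15)),   # cited after retraction
    (date(2010, 9, 1), date(2006, 1, 15)),   # cited after retraction
    (date(2001, 2, 1), date(2003, 5, 20)),   # cited before retraction
]


def is_postretraction(citing_date, retraction_date):
    """Assumed rule: the citing article appeared strictly after the retraction."""
    return citing_date > retraction_date


def postretraction_share(pairs):
    """Fraction of citing-cited pairs flagged as postretraction citations."""
    flags = [is_postretraction(citing, retracted) for citing, retracted in pairs]
    return sum(flags) / len(flags)


share = postretraction_share(pairs)
print(f"Postretraction share in the toy data: {share:.2f}")
```

The toy share carries no empirical meaning; it only shows the bookkeeping behind a statement such as "38% to 44% of citations were made after retraction events."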

This paper is organized as follows. In Section 2, we provide a review of the existing theories and empirical studies on citations. In Section 3, we discuss postretraction citation and construct hypotheses combined from the existing theories of citation motivations and two different citation search models. After presenting our data and method in Section 4, we report our findings in Section 5. In Section 6, we provide discussions of our findings, particularly on the implications for citation theory and policy intervention.

2.1. Two Dominant Theories on Citation

Contemporary scientific articles are characterized by the prominent use of citations to prior work, unlike, for example, newspaper opinion essays or literary works. Citation use can reveal important aspects of science as a social institution. First, some scholars view citations as a social device that establishes and maintains property rights and priority claims in science (Kaplan, 1965; Price, 1963; Zuckerman, 1987; Zuckerman & Merton, 1971). This view is described as the normative view due to the emphasis on the function of citations in maintaining the normative structure of science (Merton, 1957, 1973). According to the Mertonian norms of science, scientists are compelled to freely share their knowledge, and the social recognition of priority, in turn, serves as a primary means to compensate scientists for voluntarily sharing their findings with the public (Merton, 1957). Therefore, just like eponyms, prizes, and awards are used to maintain the collective memory of scientific discovery, citations can be used to maintain the norm of common ownership of scientific goods by rewarding original authors with social recognition (Kaplan, 1965). This social recognition can also lead to material rewards, such as jobs, promotions, and funding, providing an additional economic basis for this publishing-citation normative reward system. From this perspective, citations can reflect the operation of the Mertonian norms in science. What does this mean for the institution of science? To the extent that citations accurately reflect the conferring of one’s indebtedness for appreciating the scientific contribution, citation counts can be viewed as a valid measure of quality and impact.

Meanwhile, the constructivist schools in the sociology of science (Gilbert, 1977; Knorr-Cetina, 1981; Latour & Woolgar, 1979) question the normative interpretation of citation practices. Based on contemporary findings from citation context analyses (Chubin & Moitra, 1975; Moravcsik & Murugesan, 1975), which revealed multifaceted motivations for citations, Gilbert (1977) rejected the idea that recognition was the primary function of citations. He further argued that the presence of perfunctory and negative citations was not readily explained by the normative interpretation. Instead, citation practices were viewed as scientists’ attempts to persuade their peers by bolstering their scientific claims through embedding other people’s work into their texts (Gilbert, 1977; Latour, 1987), making the citation process a selective and strategic activity. For example, citations to highly cited and recognized works even when they have minor intellectual relevance would be better explained by the constructivist perspective. Because of this selective citation behavior, citations also reflect the process of a scientific claim transforming into a hard fact or black box (Gilbert, 1977; Latour, 1987; Latour & Woolgar, 1979). Such a scientific claim is less likely to be cited once it has become a “black box.” Thus, according to the constructivist interpretation, citations rather reflect strategies employed by scientists in constructing scientific knowledge and persuading their audience.

Therefore, some constructivist scholars show disdain for using simple citation counts to measure the quality or impact of a published article. For example, MacRoberts and MacRoberts (1987) criticized Cole and Cole’s (1972) use of citation counts as a measure of intellectual importance when investigating whether a few elite scientists disproportionately make scientific contributions. MacRoberts and MacRoberts (1987) reasoned that because scientists are more likely to cite works of high-status scientists to bolster their claims, eminent scientists end up garnering excessive citations. They also argued that the use of citations is prone to many errors due to the multifaceted context and content of citations. In response, Zuckerman (1987) argued that errors need to be systematically observed, as there can be undercitation of eminent scholars due to our tendency to drop citations of “hard” fact.¹ Moreover, she argued that citation motives and consequences are analytically distinct such that citations can have a variety of motives but the very fact that they were stamped on the text suggests that materials were read and had an influence on the authors. As seen from her quote below, she further questions whether the use of “argument from authority” can ever be devoid of any relevant cognitive materials.

What are the characteristics of those sources which can possibly be ‘persuasive’ citations in the clear sense of only providing ‘authority’ rather than relevant cognitive materials in support of the new work referring to it? Presumably, these authoritative sources have been assessed by the pertinent collectivity of peers as having made sound and consequential contributions. As Gilbert himself observes, it is the papers seen as “important and correct” which “are selected because the author hopes that the referenced papers will be regarded as authoritative by the intended audience.” (Zuckerman, 1987, p. 334).

The above statement assumes that “authority” can only come when both citing authors and their audience agree that the cited contents carry significant intellectual weight. This is plausible to the extent that the citing authors and audience have actually read the cited works. However, it is possible that such authority can also come from the status of a journal from which the cited article was published, in which case, both authors and audience may agree on its “authority” without having substantial knowledge of its contents. We will come back to this point in the discussion section.

2.2. Previous Empirical Findings

The empirical evidence for normative and constructivist interpretations is far from settled (Bornmann & Daniel, 2008; Tahamtan & Bornmann, 2019). This debate has been addressed broadly from three different methodological approaches. One approach attempts to understand the citation context by closely examining both cited and citing documents. Some early works used this method to illuminate the multifaceted uses of citations (Chubin & Moitra, 1975; Frost, 1979; MacRoberts & MacRoberts, 1986; Moravcsik & Murugesan, 1975). Some of these studies substantiate the normative interpretation by showing that the majority of citations reflected research impacts and few citations were “negational” (Chubin & Moitra, 1975; Moravcsik & Murugesan, 1975). Yet, the same studies also find nontrivial use of “perfunctory” citations (Gilbert, 1977; Latour, 1987), citations that were either misquoted, wrong, or meaningless, questioning the normative interpretation. Moreover, MacRoberts and MacRoberts (1986) also found that a large share of works that made significant contributions to the topic were never cited. They further argue that these “lost citations” are disproportionately lost by low-status authors (MacRoberts & MacRoberts, 1987), thereby casting doubt on using citation analysis to measure scientific contribution. One of the difficulties of citation context analysis is that it requires immensely painstaking efforts to read carefully as well as a high level of field expertise to accurately infer citation contexts from reading both cited and citing documents (Bornmann & Daniel, 2008). The former part is being partly addressed by the increasing diffusion of machine-readable documents and advances in natural language processing (Tahamtan & Bornmann, 2019), which have been increasingly used to scale citation context analysis (Berger, McDonough, & Seversky, 2017; Cohan, Ammar et al., 2019; Jurgens, Kumar et al., 2018; Teufel, Siddharthan, & Tidhar, 2006).

A second approach uses interviews or surveys to ask the original authors directly about their citation intents and functions (Brooks, 1985, 1986; Cano, 1989; Teplitskiy, Duede et al., 2022; Vinkler, 1987). This method has helped our understanding of citation functions by revealing the heterogeneous and sometimes chaotic nature of citation uses. Yet, one weakness of this method is its reliance on the author’s self-response to identify citation motivations. Thus, it is not too surprising that few previous studies based on this method identify behavior such as citing retracted papers, let alone provide useful answers about continued citations of nullified references. Last, a third approach uses statistical models to examine the extent to which citations can be better predicted by normative or constructivist variables. Empirical evidence is again mixed. For example, based on a citation network of potential citing-cited pairs of publications from astrophysics, Baldi (1998) showed that the likelihood of citation increases with the content relevance and perceived quality of the work, thereby supporting the normative interpretation.² At the same time, he finds no relationship between the status of the author and citation, but does find that women are less likely to be cited, thereby providing partial support for the constructivist interpretation. This latter finding is also supported by more recent bibliometric studies (Huang, Gates et al., 2020; Larivière, Ni et al., 2013), which show persistent gender inequality in citations even after controlling for relevant observable variables, suggesting the presence of particularistic standards governing the citation process (Fox, Whittington, & Linkova, 2017; Long & Fox, 1995). As such, while previous citation studies have illuminated the diverse usages of citations, the debate between the normative and constructivist interpretations of citation has not been settled.
Our paper provides new insights into this longstanding debate by empirically presenting and examining an overlooked citation practice: citations to retracted references.

Retracted articles are nullified papers. Literally, the publication has been “undone” from the journal (Van Noorden, 2011). In other words, while the paper’s content still exists, the paper no longer exists as a publication. In most cases, the paper is retracted because there is some problem with the paper that implies it should not have been published to begin with. Hence, one could argue that such nullified papers should not then be used as a proper base for knowledge production. However, retracted articles often continue to be cited as if they were legitimate scientific findings (Bar-Ilan & Halevi, 2017; Bordignon, 2020; Budd, Sievert et al., 1999; Hamilton, 2019; Kochan & Budd, 1992; Pfeifer & Snodgrass, 1990), which raises concerns about the integrity of science across scientific communities (Campanario, 2000; Unger & Couzin, 2006).

One might argue that the paper’s content continues to exist and therefore can provide a legitimate basis for a citation: for example, because it contains an inspiring research question, or a particular finding that was unrelated to the retraction, or even because the plagiarized content is still useful even if copied from elsewhere.³ Here we argue that even if such motives would lead to “legitimate” citations to retracted papers, such a citation should at minimum include a caveat stating that the paper is being cited in service of point X, despite the fact that the journal has nullified the publication. Furthermore, the paper should not be cited in its publication form (perhaps citing a preprint, for example), as the publication no longer officially exists. Still, to further explore this line of argument regarding citation practices, we will explore various subsets of citations to retracted papers to address their implications for different models of citation search.

In response to awareness of these citations to retracted papers and to address concerns about such citation practices, previous studies (Bar-Ilan & Halevi, 2017; Davis, 2012; Garfield & Welljams-Dorof, 1990; Pfeifer & Snodgrass, 1990; Wager, Barbour et al., 2009) suggested various ways to increase the visibility of retractions by either standardizing the retraction notice, clarifying the retraction reasons, coordinating with nonpublisher platforms (such as Web of Science or PubMed) or implementing the author alert system. However, drawing from the literature above on citation search, it is plausible that citing retracted articles is a reflection of the search heuristics, leading to superficial or perfunctory citation practices. In fact, numerous prior studies suggest that many citations may not have been deeply engaged with by citing authors (Harzing, 1995, 2002; Harzing & Kroonenberg, 2016; Katz, 2006; Leng, 2020; Leung, Macdonald et al., 2017; MacRoberts & MacRoberts, 1986; Simkin & Roychowdhury, 2005; Vinkler, 1987). One additional reason to cite nullified papers is a negative citation (that highlights that the paper is retracted). However, prior work finds that such negative citations are rare among postretraction citations (Bar-Ilan & Halevi, 2017; Bordignon, 2020; Hsiao & Schneider, 2021; Schneider, Ye et al., 2020). Hence, it is highly plausible that a substantial share of postretraction citations may have resulted from authors citing articles without attending to the retracted status of the paper. If lack of awareness is the primary reason behind citing retracted articles, we would expect to observe more postretraction citations from distant fields (Dinh, Sarol et al., 2019). The presence of postretraction citations suggests that we need to separately address the citation process from citation motivation. In the next section, we consider the normative and constructivist theories as two ideal-type citation motivations. 
In addition, inspired by the behavioral theory tradition, we consider that the citation process may lie between engaged and heuristic citation search processes. We then use this framework to derive hypotheses about the relationship between field distance and postretraction citation.

3.1. Citation Motivation and Citation Search Process

While normative and constructivist theories provide plausible reasons for why scientists cite what they cite, these theories fall short of explaining postretraction citations. From the normative perspective, authors would not cite a retracted article, as the priority norm would compel them to properly confer credits to the rightful producers. Therefore, it would be absurd to give credit to the authors of works that have been nullified in the eyes of the scientific community. For constructivists, citing a retracted article would be like convincing peers with arguments based on flawed evidence. Thus, instead of relying solely on citation motivations, we examine the citation search process to understand why retracted articles are continually being cited. We consider that a citation search process may lie between two ends: engaged and heuristic citation searches. Ideally, authors would thoroughly read the paper and then integrate the paper’s content into their own argument and presentation of findings via the citation, whether their citation motivations are to confer credits to original authors (normative) or bolster their claims by associating their works with the papers they cite (constructivist). We categorize such a citation process as an engaged citation search. While particular citations may not adhere to such a strict citation process in practice, we argue that the engaged citation process represents an ideal type and norm for how academics should end up citing the work of others. However, it is not clear how common such “pure” engaged citations may be. One survey-based study suggests that about 75% of the references were cited after authors had thoroughly read them (Vinkler, 1987), suggesting about one-fourth of citations did not involve engaged citation search.
Furthermore, numerous other studies provide plausible evidence against a presumption that authors strictly adhere to the engaged citation process (Harzing, 1995, 2002; Harzing & Kroonenberg, 2016; Katz, 2006; Leng, 2020; Leung et al., 2017; MacRoberts & MacRoberts, 1986; Simkin & Roychowdhury, 2005).

Such an imperfect citation search is consistent with the insights from the behavioral theory tradition (Cyert & March, 1963; Simon, 1997; Simon & March, 1958). With the increasing citation search space from the exponential increase in the number of publications and the increasing rate at which scientists produce articles, combined with a demand by reviewers to thoroughly incorporate the existing literature into the paper’s argument, it may be difficult to expect scientists to read thoroughly all the papers they cite. Thus, to the extent that citation search is more costly (for example, cognitively distant search), we expect authors to rely more heavily on cues and signals that they think might provide useful information related to their citation motivations. For example, the authors may rely on the visibility of journals, number of citations, status, and affiliated institutions of authors to guide their search. This information may further be used as part of the process of selecting which of the papers found will be cited by the searching author. We consider this type of citation search process as a heuristic citation search process. Recall that our concept of citation search incorporates the processes of both finding articles and incorporating those found into one’s paper. Such heuristics may guide both steps, which has implications for the probability that a retracted paper continues to get cited after the retraction event. Figure 1 illustrates how citation motivations can be classified as normative or constructivist, and how the citation search process can be driven by engaged or heuristic searches. Using this framework, we construct hypotheses for predicting citations to retracted references.

Figure 1.

Citation motivations and citation search processes. Predictions about the first-order effect of field-distance on postretraction citations are shown in part (a). Part (b) shows the competing predictions about the visibility-moderated effects of field-distance on postretraction citations.


3.2. Field Distance and Postretraction Citations

3.2.1. Engaged citation search

We first derive hypotheses from each citation motivation when authors are conducting an engaged citation search. While citing retracted references, by definition, is antithetical to the engaged search, we can derive conditions under which retracted articles are more likely to be cited. According to normative theory, the institutional norms of science compel scientists to protect the priority of their peers (Kaplan, 1965; Merton, 1957). Articles that falsely cite prior works may not make it past a series of field gatekeepers (Kaplan, 1965; Zuckerman & Merton, 1971), such as editors and reviewers, as they likely consider such an act a violation of the social norms of recognition. Under this normative pressure, authors are more likely to be cautious when citing sources from their own fields because a field is a unit in which the norms of science would operate with greater pressure (as more people are going to find out that you have infringed the norm to acknowledge the help of others (Kaplan, 1965)). Therefore, the likelihood of citing retracted articles would increase with the distance between cited and citing authors’ fields. The same prediction could be derived from the constructivist theory. In particular, those that view citation as a rhetorical device to convince peers would predict that citations to retracted articles are more likely to come from distant fields. This is because falsely citing references from their own discipline increases the chance that their “opponents” detect their mistakes, which would subsequently undermine their scientific claims (Gilbert, 1977; Latour, 1987).

Therefore, based on both normative and constructivist theories of citation under the engaged citation search model, the following first hypothesis can be derived:

H1: Postretraction citations are more likely to come from distant fields than from proximate fields, such that $E[\text{post\_cite}_{\text{distant\_field}}] - E[\text{post\_cite}_{\text{proximate\_field}}] = \beta_1 > 0$

3.2.2. Heuristic citation search

The heuristic citation search considers a process of finding relevant citations as part of the information search process (Cyert & March, 1963; Simon, 1997; Simon & Newell, 1971) from a vast space of potentially citable articles. A citation search would be more costly to the extent that an author is unfamiliar with a topic, which would require her to spend extra time and effort to identify and decide whether a particular source would be relevant to her work. Thus, failure to properly evaluate the cited materials is more likely to occur when searching for knowledge from distant fields, regardless of whether the motivation is to give proper credit (normative) or to bolster one’s claims (constructivist). The prediction from the heuristic search process for both normative and constructivist citation motivations is consistent with the argument from engaged search derived from either motive, such that, again, the rate of citations to retracted articles may increase with field distance. Therefore, hypothesis H1 should hold for either engaged or heuristic search processes.

3.3. Visibility, Field-Distance, and Postretraction Citations

3.3.1. Engaged citation search

Not all retracted articles have equal visibility. Indeed, the most high-profile retracted articles are those published in the most visible and prestigious journals, such as Science and Nature (Oransky & Marcus, 2021), as well as those involving severe research misconduct (Reich, 2009). Empirical evidence shows that highly visible articles have a sharper decline in citations following retractions (Azoulay, Furman et al., 2015; Furman, Jensen, & Murray, 2012). Exploiting the visibility of retracted articles may allow us to construct two competing hypotheses from normative and constructivist theories. First, the constructivist approach views citations as a means to bolster scientific claims (Gilbert, 1977; Latour, 1987). In this sense, falsely citing a highly visible article, even if that article is coming from a distant field, may pose a high risk of undermining the authors’ scientific claims because the “opponents” are simply more likely to be aware of the article given its high visibility. Note that the key point here is that constructivists view citations as a rhetorical device deployed in the “war of words.” While deploying many references to their claim may be equivalent to bringing in many “allies” that their opponents must defeat, any mistakes in making references can also be used against them (Latour, 1987). Thus, for retracted articles that are highly visible, such as those published in journals with a high journal impact factor (JIF), if the author is motivated in the way the constructivists argue, the relation of field distance to postretraction citations would be weaker for high-profile journals, because both authors and referees are more likely to be aware of the retraction.

H2A: [constructivist motive and visibility × distance interaction]: The field distance effect on postretraction citations will be weaker when citing retracted articles published in high JIF journals, such that $\beta_{1,\text{high\_JIF}} - \beta_{1,\text{low\_JIF}} = \beta_2 < 0$

On the other hand, a primary mechanism by which falsely cited references are screened under the normative interpretation is through defending the intellectual property of scholars evoked by shared institutional norms of science (Kaplan, 1965; Merton, 1957), rather than by actively searching for flaws and errors in claims substantiated by the cited references. From the normative interpretation, the soundness of citation rests on whether proper credit has been conferred to the rightful owners. Note how this emphasis on protecting the priority of scientists contrasts with constructivists’ emphasis on the relationship between the cited materials and the claims made by the authors. To the extent that the violation of property rights of authors from distant fields does not elicit moral indignation, highly visible retracted articles could be just as “foreign” as less visible retracted articles, as long as they both come from equally distant fields. Although the high visibility of retracted articles, particularly of those published in high JIF journals, alone would discourage postretraction citations, we posit that this effect may not correlate with the field distance between retracted and citing articles. Therefore, the following hypothesis is derived.

H2B: [normative motive and visibility × distance interaction]: Postretraction citations are more likely to come from distant fields than from proximate fields regardless of the JIF of the journals in which the retracted articles are published, such that $\beta_{1,\text{high\_JIF}} - \beta_{1,\text{low\_JIF}} = \beta_2 \approx 0$.

3.3.2. Heuristic citation search

Searching for relevant citations from distant fields may be more costly due to the limited expertise, experience, and time available to authors. One insight from the behavioral theory tradition is that we rely on heuristics to guide our search, particularly in contexts of high uncertainty and time pressure (Tversky & Kahneman, 1974). In the context of searching for relevant literature, we may rely on the perceived quality and status of journals, such as JIF, as a cue. For example, JIF, which was originally created to serve as a heuristic for librarians (Garfield, 2006), could be deployed as a search heuristic for authors searching for relevant literature (Osterloh & Frey, 2020; Wooding, 2020), whether our motivation is to give credit (normative) or bolster our claims (constructivist). Such reliance on JIF would greatly reduce the mental effort expended searching for unfamiliar knowledge. Yet, reliance on such a heuristic can become overly mechanical, such that, in an extreme case, authors may not have even read the paper they cite. In fact, previous studies have documented possible evidence of authors making references without reading them (Ball, 2002; Harzing, 1995, 2002; Hoerman & Nowicke, 1995; Simkin & Roychowdhury, 2005). For example, by tracking misprints in citations, Simkin and Roychowdhury (2005) constructed a mis-citation propagation model, which estimated that around 70–90% of citations are copied from the reference lists of other papers. More subtle evidence shows researchers mechanically responding to a false (unknown to them) recommendation algorithm by citing recommended works that have substantially lower cognitive relevance than uncited works that were not recommended (Kolympiris, Drivas et al., 2020). We argue that such citation behavior may be more common among citations to articles from distant fields yet published in high JIF journals. That is, given that authors must bear significant costs in searching for distant knowledge, they are more likely to blindly “trust” the works that are published in high JIF journals due to their perceived higher status, just as hiring committees often rely on superficial uses of the JIF of the articles authored by job candidates (Biagioli & Lippman, 2020). Combining this argument with the first hypothesis, we would expect the association between field distance and postretraction citation to be stronger for those citing retracted articles from high JIF journals, regardless of the citation motivation (normative or constructivist).

H2C: [heuristic search and visibility × distance interaction]: The field distance effect on postretraction citations will be stronger when citing retracted articles published in high JIF journals, such that $\beta_{1,\text{high\_JIF}} - \beta_{1,\text{low\_JIF}} = \beta_2 > 0$.

As seen from Figure 1(a), predictions about the first-order effect of field distance on postretraction citations (H1) are identical across citation motivations and citation search processes. Meanwhile, once we examine the visibility-moderated effects of field distance (Figure 1(b)), competing hypotheses (H2A–H2C) can be derived. Hence, to the extent that the logic of the arguments is sound, the first hypothesis provides confirmatory evidence that our measures are reflecting the processes we are describing, and the second set of hypotheses provides a critical test for distinguishing the predictions from combining citation motivation and citation search process (Stinchcombe, 1968). Of course, because the prediction from normative theory under engaged citation search involves a null result for the interaction effect, if we find support for H2B, we will not be able to clearly distinguish support for this model from a simple null finding (e.g., that the observed parameter estimate is due to chance).

This section provides a detailed description of the construction of the data sets and methods used to test our hypotheses. We construct a citation network data set from retracted articles and their citing articles. The population of retracted articles was obtained from Retraction Watch (2019), a nonprofit organization that monitors and collects data on retractions. To our knowledge, the Retraction Watch database provides the most comprehensive coverage of retracted articles and detailed information about them, including but not limited to titles, authors, retraction dates, and curated retraction reasons. The data set we obtained from Retraction Watch contains 18,525 articles retracted between 1980 and 2018. The database also provides unique identifiers such as DOIs and PMIDs for most of the articles. These identifiers were used to retrieve detailed bibliographic information from the Web of Science Core Collection. Of the 18,525 retracted articles, we identified 8,037 in the Web of Science database in this way.

From the 8,037 retracted articles, we retrieved bibliographic information on 198,674 citing articles (citations up to 2020) from the Web of Science. We removed articles (both retracted and citing) that were missing title, abstract, or cited references fields, as these fields were used to calculate field distance. We also removed both citing and cited articles whose journals did not have JIF information, and we removed articles retracted after 2016 to ensure a postretraction citation window of at least 3 years. We further limited our sample to original research articles that cited retracted articles. That is, we removed citing articles that are not categorized as “article,” “review,” or “proceedings papers” document types in the Web of Science database. We also removed self-citations to rule out alternative behavioral motivations for citing retracted articles. Finally, given that our identification relies on within-retracted-article variation, we restricted our sample to retracted articles that were cited at least 10 times by 2019. The resulting data set contains 103,245 citing-cited article pairs, which is the unit of our analysis. This data set contains 2,123 retracted articles published from 1980 to 2016, and 94,871 citing articles published from 1980 to 2019.

4.1. Dependent Variable: Postretraction Citation

While previous empirical studies focused on the effect of retraction on various dimensions of scientific activities, such as its effects on the subsequent reputation of the focal papers, fields, and authors (Azoulay, Bonatti, & Krieger, 2017; Azoulay et al., 2015; Furman et al., 2012; Jin, Jones et al., 2019), we use citations to retracted articles to examine the role of field distance and JIF in generating continuous citations to retracted articles. Thus, our dependent variable is a binary variable that takes the value of 1 if a citation was made after the retraction year or 0 if a citation was made before or during the retraction year. In other words, in each case we are comparing citations within a retracted paper to estimate the likelihood that the observed citation happened before or after the retraction event.
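The coding rule for the dependent variable can be sketched as follows. This is only an illustration of the rule stated above; the function and argument names are hypothetical and are not taken from the original analysis.

```python
def post_cite(citing_year: int, retraction_year: int) -> int:
    """Return 1 for a citation made after the retraction year, else 0.

    Citations made before or during the retraction year are coded 0,
    following the definition in the text. Names are hypothetical.
    """
    return 1 if citing_year > retraction_year else 0


# Example: an article retracted in 2010 and cited in 2009, 2010, and 2012.
labels = [post_cite(year, 2010) for year in (2009, 2010, 2012)]
print(labels)  # [0, 0, 1]
```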

4.2. Field Distance

Operationalizing field distance is a central concern to test our hypotheses. Following a conventional method commonly used in scientometrics and innovation studies, we transform text information embedded in publication documents into a vector space. To the extent that the citing document and cited document share similar concepts as measured by represented vector similarity, we argue that the articles are cognitively similar. In our arguments, we are assuming that the more cognitively similar the two papers are, the more likely it is that the citing authors were familiar with the contents of the cited documents. While we are aware that this is not a perfect measure of field distance, we believe it is sufficient to argue that, for example, a paucity of shared concepts between sociology and materials science papers can be well captured by a large distance between the textual representation of documents from these two fields.

To measure the textual similarity between retracted and citing articles, we first transform texts embedded in scientific documents into vector space. Several methods are widely used for this task, including “one-hot” representation of texts into bag-of-words vectors or distributed representation of texts (Le & Mikolov, 2014; Mikolov, Sutskever et al., 2013) based on pretrained word-embedding vectors. The conventional word-embedding method is “context-free” in the sense that the representation of words is invariant with respect to surrounding words. Meanwhile, recently developed contextual representation models, such as BERT, provide a more accurate representation of scientific texts by considering the ambiguous usages of words inferred from their association with neighboring words (Beltagy, Lo, & Cohan, 2019; Lee, Yoon et al., 2020). In this paper, we use SPECTER embeddings (Cohan, Feldman et al., 2020) for the semantic representation of our corpus. SPECTER is one of the latest language models optimized for the semantic representation of scientific documents. It is based on a contextual representation model (SciBERT) yet optimized for scientific documents by considering citation linkage during the training process. The authors of the SPECTER model provide a public API through which we access their pretrained model. By concatenating the title and abstract fields of our documents and encoding them with the pretrained SPECTER model, we obtained dense vectors with 768 dimensions for 100,580 articles (both retracted articles and citing articles). As robustness checks, we also replicated our results using a bag-of-words model and a vector representation from cited Web of Science Subject Categories. The main results are qualitatively similar across different measures of field distance (results available from the authors).

We calculate field distance using the cosine distance between a retracted article and a citing article as shown by the following equation.
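$$\text{FieldDistance} = 1 - \frac{\mathbf{a} \cdot \mathbf{b}}{\lVert \mathbf{a} \rVert \, \lVert \mathbf{b} \rVert} = 1 - \frac{\sum_{i=1}^{n} a_i \times b_i}{\sqrt{\sum_{i=1}^{n} a_i^2} \times \sqrt{\sum_{i=1}^{n} b_i^2}} \tag{1}$$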
where a and b are the n-dimensional vector representation of a retracted article and citing article, respectively. Values closer to 1 represent more dissimilar article pairs.
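As a minimal sketch of Equation 1, the cosine distance can be computed directly from two vectors. The function name `field_distance` and the toy three-dimensional vectors are our own illustrative assumptions; the actual analysis uses 768-dimensional SPECTER embeddings.

```python
import math
from typing import Sequence


def field_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine distance (1 - cosine similarity) between two vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


# Toy vectors standing in for document embeddings (hypothetical values).
same_direction = field_distance([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = field_distance([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
```

Vectors pointing in the same direction give a distance of approximately 0, and orthogonal vectors give a distance of 1, which matches the reading that values closer to 1 indicate more dissimilar pairs.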

4.3. Visibility

We use the JIF of the journals in which the retracted articles were published, obtained from the Clarivate Analytics Journal Citation Report (downloaded in 2018), as a proxy for the visibility of their retraction. High-JIF journals are highly visible. At the same time, high-JIF journals also carry significant status signals, such that many evaluators (wrongly) consider the JIF of an article as an indicator of future success (Biagioli & Lippman, 2020; Osterloh & Frey, 2020). We define a high JIF as being above the 75th percentile in the JIF distribution, which corresponds to a JIF above 18.43 for the journals of retracted articles and above 5.03 for the journals of citing articles. Note that we are not suggesting that authors specifically check the JIF of an article before referencing it in their work. Rather, we use JIF as an indirect indicator of the visibility and prestige of the journal in which a retracted article was published.

4.4. Controls

Our estimating models include several control variables. First, we address the possibility that citing authors from countries different from those of the retracted article's authors may be more likely to cite the retracted article, as they are less likely to be informed about the retraction. Previous studies have shown that geographical distance continues to act as a barrier to knowledge flow despite advances in communication technologies (Abramo, D’Angelo, & Di Costa, 2020; Matthiessen, Schwarz, & Find, 2002; Pan, Kaski, & Fortunato, 2012). While this localization can partly be explained by the geographical concentration of research activities (Wuestman, Hoekman, & Frenken, 2019), we posit that information flow may be hampered by national boundaries net of cognitive distance. We thus calculate the country distance between retracted and citing articles based on their sets of affiliated countries using the Jaccard index. Precisely, the country distance is calculated as one minus the ratio of the intersection to the union of affiliated countries. We also include several citing article-level control variables. It is plausible that common norms around publication practices may be different in countries that are considered “peripheral” regions (Honig & Bedi, 2012; Lewellyn, Judge, & Smith, 2017; Walsh, Lee, & Tang, 2019). Thus, we include a binary variable that takes a value of 1 if a citing article contains any authors from the United States or Western Europe and 0 otherwise. The difference between core and periphery may also be observed via institutional hierarchy and status. We thus include a dummy variable that takes a value of 1 if affiliated organizations of a citing article are among the top 50 universities based on the 2021 Times Higher Education Ranking. For a similar reason, we include the JIF of citing articles as a control. Lastly, we include the number of authors, affiliations, and countries of citing articles as controls, which is a standard practice to control for unobserved heterogeneity across different dimensions of team size (Liu, Jones et al., 2023).
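The country distance described above (one minus the Jaccard index of the two sets of affiliated countries) can be illustrated with a short sketch. The function name and the country codes are hypothetical examples, not values from the data set.

```python
from typing import Iterable


def country_distance(retracted: Iterable[str], citing: Iterable[str]) -> float:
    """Jaccard distance between two sets of affiliated countries."""
    r, c = set(retracted), set(citing)
    union = r | c
    if not union:
        raise ValueError("at least one article needs an affiliated country")
    return 1.0 - len(r & c) / len(union)


# Hypothetical affiliation sets.
d_same = country_distance({"US"}, {"US"})                  # identical sets
d_partial = country_distance({"US", "JP"}, {"US", "DE"})   # one of three shared
d_disjoint = country_distance({"KR"}, {"FR"})              # nothing shared
```

Identical country sets give a distance of 0, disjoint sets give 1, and partial overlap falls in between (here 1 − 1/3).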

4.5. Empirical Specifications

Our first hypothesis examines the role of distance in driving continuous citations of retracted articles. Specifically, our hypothesis predicts that greater field distance generates citations to nullified references, which can be operationalized by comparing the probability of citing retracted articles when citing authors are from distant fields as opposed to proximate fields. This can be expressed by the following inequality: $E[\text{post\_cite} \mid \text{distant\_field}] > E[\text{post\_cite} \mid \text{proximate\_field}]$. Given that we are operationalizing the probability of citing a retracted article by measuring the proportion of citations that are made after retraction, a naïve comparison between field distances of preretraction and postretraction citations may lead to a biased estimation, because the citation generation process, such as the likelihood of getting cited by articles from distant fields, may be highly influenced by subfield and article-level characteristics, such as journal type and status. Thus, our main estimation model employs fixed effects around the retracted article. Meanwhile, our dependent variable, postretraction citation, is mechanically correlated with citation age (years elapsed since the publication of retracted articles). Yet the citation age can positively affect the field distance, as it generally takes some time for a published idea to diffuse to other fields. Therefore, without controlling for citation age, the positive correlation between field distance and citation age (diffusion effect) can lead to an overestimation of the positive association between field distance and postretraction citations. We address the citation age problem semiparametrically by employing a set of citation age indicator variables in our estimation model. The inclusion of citation age is particularly useful, as our data set has substantial variations in the years it took for retracted articles to be retracted (see Figure 2(a)), which allows us to exploit cross-sectional variation for each citation age group to estimate field distance effects (see Figure 2(b)).

Figure 2.

(a) Distribution of years after which articles were retracted (2,123 retracted articles). The x-axis represents the number of years it took for articles to be retracted. The y-axis shows the frequency of the corresponding retracted articles. (b) Average annual citations received by retracted articles for articles that were retracted after 2 years (red), 4 years (blue), and 6 years (grey), respectively.

Equation 2 below shows our main estimation model for hypothesis 1 (see Figure 1(a)).
$$\text{post\_cite}_{i,j} = \beta_0 + \beta_1\,\text{distance}_{i,j} + X_{i,j} + \sum_{y \in \text{AGE}} \text{age}_{j,y} + \delta_t + \alpha_j + \varepsilon_{i,j} \tag{2}$$
The estimating equation relates the characteristics of citing article $i$ that cites retracted article $j$ to the probability of postretraction citation. The variable $\text{post\_cite}_{i,j}$ denotes our binary dependent variable, which takes the value 1 if retracted article $j$ is cited by citing article $i$ after retraction and takes the value 0 for preretraction citations (including citations made in the retraction year). The variable $\text{distance}_{i,j}$ denotes the field distance between citing article $i$ and retracted article $j$. $\beta_1$ captures the association between postretraction citation probability and field distance. Thus, a positive estimated coefficient of $\beta_1$ would support Hypothesis 1 (H1). $X$ is a vector of control variables, including country distance as well as citing-article level characteristics, such as JIF, regional and organizational status (whether authors are from the Western countries and from the top 50 THE ranking institutions), the number of authors, affiliated organizations, and affiliated countries. The variable $\text{age}_{j,y}$ is an indicator variable that takes a value of 1 if a citing article that cited retracted article $j$ was published $y$ years after article $j$ was published. We also include a full set of citation year indicator variables $\delta_t$ and retracted article-specific effects $\alpha_j$. Lastly, $\varepsilon_{i,j}$ denotes assumed idiosyncratic errors left in the model.
Meanwhile, our second set of hypotheses tests three distinct predictions (see Figure 1(b)). Suppose that the effect of field distance on postretraction citation is positive after controlling for a vector of observable confounders $Z$, such that $E[E[\text{post\_cite} \mid \text{distant\_field}, Z] - E[\text{post\_cite} \mid \text{proximate\_field}, Z]] = \gamma > 0$. The constructivist theory under engaged citation search then predicts that the distance effect from citing high-JIF retracted articles, $\gamma_{\text{high\_JIF}}$, is smaller than the distance effect from citing ordinary-JIF retracted articles, $\gamma_{\text{ordinary\_JIF}}$. Meanwhile, the normative theory under engaged citation search predicts no difference between $\gamma_{\text{high\_JIF}}$ and $\gamma_{\text{ordinary\_JIF}}$. Last, the heuristic citation search process predicts, for both citation motivations, that $\gamma_{\text{high\_JIF}}$ is greater than $\gamma_{\text{ordinary\_JIF}}$. We use the following specification to test the competing predictions.
$$\text{post\_cite}_{i,j} = \beta_0 + \beta_1\,\text{distance}_{i,j} + \beta_2\,\text{distance}_{i,j} \times \text{jif\_retracted}_j + X_{i,j} + \sum_{y \in \text{AGE}} \text{age}_{j,y} + \delta_t + \alpha_j + \varepsilon_{i,j} \tag{3}$$
Equation 3 tests the three versions of Hypothesis 2 by including the interaction effect between $\text{distance}_{i,j}$ (field distance) and $\text{jif\_retracted}_j$ (dummy variable for high-JIF retracted article). The variable $\text{jif\_retracted}_j$ does not vary within a retracted article but may vary across retracted articles. Thus, in Equation 3, $\beta_1$ captures the association between postretraction citation and field distance for citations made to retracted articles published in ordinary-JIF journals ($\gamma_{\text{ordinary\_JIF}}$). Meanwhile, $\beta_2$ captures the difference in the distance effect between the high-JIF and ordinary-JIF retracted articles ($\gamma_{\text{high\_JIF}} - \gamma_{\text{ordinary\_JIF}}$). A negative estimated coefficient of $\beta_2$ would support the constructivist theory under engaged citation search (H2A), while a null finding would be consistent with the normative theory under engaged citation search (H2B). A positive estimated coefficient of $\beta_2$ would support the heuristic citation search prediction for both citation motivations (H2C). The specification from Equation 3 also includes the same set of control variables used in Equation 2. Our main estimations use an OLS fixed effect around the retracted article to eliminate the article-specific effect $\alpha_j$, such that with the full set of citation-age indicator variables, we are accounting for both retracted-article specific and citation-age specific unobserved confounds.
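To illustrate how a fixed effect around the retracted article removes the article-specific effect, the following is a simplified sketch of a within (demeaning) estimator for the distance and distance × high-JIF terms. It is not the authors' estimation code: the function name `within_ols` and the toy data are hypothetical, the control variables, citation-age and citation-year indicators, and clustered standard errors are all omitted, and the toy outcome is a noiseless continuous variable (rather than the binary outcome used in the paper) so that the recovered coefficients are easy to check.

```python
from collections import defaultdict


def within_ols(rows):
    """Estimate (beta1, beta2) by OLS after demeaning within retracted article.

    rows: iterable of (article_id, distance, high_jif, y) tuples.
    Demeaning by article removes the article-specific effect alpha_j.
    """
    groups = defaultdict(list)
    for article_id, distance, high_jif, y in rows:
        # Regressors: distance and distance x high-JIF dummy.
        groups[article_id].append((distance, distance * high_jif, y))

    s11 = s12 = s22 = s1y = s2y = 0.0
    for obs in groups.values():
        n = len(obs)
        m1 = sum(o[0] for o in obs) / n
        m2 = sum(o[1] for o in obs) / n
        my = sum(o[2] for o in obs) / n
        for x1, x2, y in obs:
            d1, d2, dy = x1 - m1, x2 - m2, y - my
            s11 += d1 * d1
            s12 += d1 * d2
            s22 += d2 * d2
            s1y += d1 * dy
            s2y += d2 * dy

    # Solve the 2x2 normal equations for the demeaned regressors.
    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-12:
        raise ValueError("regressors are collinear after demeaning")
    beta1 = (s22 * s1y - s12 * s2y) / det
    beta2 = (s11 * s2y - s12 * s1y) / det
    return beta1, beta2


# Toy data: four hypothetical retracted articles, two of them high-JIF,
# each with its own intercept (alpha) that the within transformation removes.
true_b1, true_b2 = 0.01, 0.02
toy = []
for article_id, high_jif, alpha in [("A", 0, 0.2), ("B", 1, 0.5),
                                    ("C", 0, 0.3), ("D", 1, 0.1)]:
    for d in (-1.0, 0.0, 1.0, 2.0):
        y = alpha + true_b1 * d + true_b2 * d * high_jif
        toy.append((article_id, d, high_jif, y))

b1, b2 = within_ols(toy)
```

Because the high-JIF dummy is constant within a retracted article, its main effect is absorbed by the article fixed effect; only its interaction with distance is identified, which is why the sketch carries two regressors.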

5.1. Descriptive Statistics

Table 1 reports the descriptive statistics and descriptions of all variables used in the analysis. The mean of the post_cite variable is 0.384, meaning around 38.4% of citations to retracted articles were made after retraction. Note that this value is a pooled average over all citing-cited pairs, which does not account for the positively skewed distribution of citations received across retracted articles in our data set (see Figure 3(a)). When we instead compute the postretraction citation rate for each of the 2,123 retracted articles and average these rates, the figure rises to 44.5%. Thus, given the positively skewed distribution of citations (see Figure 3), the postretraction citation rate is lower among retracted articles that received a large number of citations.
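The gap between the pooled rate and the average of per-article rates can be illustrated with a small sketch. The data below are invented for illustration only; the function name is hypothetical.

```python
from collections import defaultdict


def pooled_and_article_level_rates(pairs):
    """pairs: iterable of (article_id, post_cite) with post_cite in {0, 1}.

    Returns (pooled rate over all pairs, mean of per-article rates).
    """
    by_article = defaultdict(list)
    for article_id, post in pairs:
        by_article[article_id].append(post)
    all_labels = [p for labels in by_article.values() for p in labels]
    pooled = sum(all_labels) / len(all_labels)
    per_article = [sum(v) / len(v) for v in by_article.values()]
    article_level = sum(per_article) / len(per_article)
    return pooled, article_level


# Toy data: a heavily cited article with a low postretraction share and
# a lightly cited article with a high share (hypothetical values).
toy_pairs = ([("big", 1)] * 2 + [("big", 0)] * 8
             + [("small", 1)] * 3 + [("small", 0)] * 1)
pooled, article_level = pooled_and_article_level_rates(toy_pairs)
```

In this toy case the pooled rate is 5/14 while the average of per-article rates is 0.475; the per-article average is higher because the heavily cited article, which dominates the pooled figure, has the lower postretraction share.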

Table 1.

Descriptive statistics

| Variable | Description | Variable level | Obs | Mean | Std. Dev. | Min. | Max. |
|---|---|---|---|---|---|---|---|
| post_cite | Postretraction citations | relational | 103,245 | 0.384 | 0.486 | | |
| dist_embedding | Field distance (SPECTER embedding) | relational | 103,245 | 0.294 | 0.114 | 0.038 | 0.997 |
| severe | Retraction due to severe misconduct | retracted article | 103,245 | 0.612 | 0.487 | | |
| jif_retracted | Impact factor of retracted article | retracted article | 103,245 | 13.974 | 13.874 | 0.429 | 55.873 |
| jif_citing | Impact factor of citing article | citing-article | 103,245 | 4.763 | 5.347 | 0.000 | 115.840 |
| country_distance | Degree of country overlap | relational | 103,245 | 0.782 | 0.370 | | |
| top_50_org | Top 50 ranking institutions | citing-article | 103,245 | 0.202 | 0.402 | | |
| team_size | Number of authors | citing-article | 103,245 | 5.412 | 3.729 | | 194 |
| org_size | Number of affiliations | citing-article | 103,245 | 2.774 | 2.201 | | 147 |
| multi_country | Number of countries | citing-article | 103,245 | 1.291 | 0.712 | | 29 |
| has_west | Has affiliation from Western countries | citing-article | 103,245 | 0.679 | 0.467 | | |
| year_citing | Citation year | citing-article | 103,245 | 2009.941 | 5.778 | 1980 | 2019 |
| age | Age of retracted article | relational | 103,245 | 5.340 | 4.131 | | 39 |

Figure 3.

(a) Distribution of the number of citations received by 2,123 retracted articles. (b) Distribution of the number of citations received by 2,123 retracted articles after retraction events. Both x-axes represent the number of citations (with a bin size equal to 10 citations), and both y-axes are the frequency of the corresponding retracted articles.

In Figure 4(a), we plot the average of postretraction citation rates across retracted articles against the year in which they were retracted. The solid line corresponds to the postretraction citation rate where postretraction refers to citations to retracted articles received 1 year after retraction. To provide a more conservative estimate, we also include a dotted line that represents postretraction citations made 2 years after retraction. Figure 4(a) reveals a declining trend in the postretraction citation rate over time. However, this trend may be attributed to the longer citation windows of older articles compared to more recently retracted ones. Overall, our data indicate that retracted articles, on average across time, received about 38–44% of their citations after retraction events. Meanwhile, a model-free comparison of pre- and postretraction citations reveals that citations made after retraction events tend to come from more distant fields (see Figure 4(b)).

Figure 4.

(a) The average fraction of postretraction citations for 1,979 articles that were retracted between 2000 and 2016. The x-axis represents the retraction year of the articles, while the y-axis shows the average fraction of postretraction citations for each retraction year. The solid line defines the postretraction citation as citations received 1 year after the retraction event. The dotted line defines it as citations received 2 years after the retraction event. (b) Field distance between retracted and citing articles before and after retraction events. The x-axis represents the distance between retracted and citing articles as measured by cosine distance using two embedding vectors. The y-axis represents CCDF for the two distributions.

For ease of interpretability, we transform the field distance variable into standardized units in the analysis. We also transform the jif_retracted, jif_citing, team_size, and org_size variables into binary variables with a cut at 75th percentile values. Meanwhile, because most citing publications had affiliated organizations from a single country, we use a binary variable that assigns a value of 1 if a citing article has more than one country for the multi_country variable.
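The transformations described above can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, and two details are our assumptions because the text does not specify them, namely the use of the sample standard deviation for standardization and the "inclusive" quantile method for the 75th percentile.

```python
import statistics


def standardize(values):
    """Convert values to z-scores (assumes the sample standard deviation)."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def above_p75(values):
    """Return 1 where a value exceeds the 75th percentile, else 0.

    The percentile definition (method="inclusive") is an assumption.
    """
    p75 = statistics.quantiles(values, n=4, method="inclusive")[2]
    return [1 if v > p75 else 0 for v in values]


def multi_country_flag(n_countries: int) -> int:
    """Return 1 if a citing article lists more than one country."""
    return 1 if n_countries > 1 else 0


# Hypothetical toy values.
z = standardize([1.0, 2.0, 3.0])
high_jif = above_p75([1.0, 2.0, 3.0, 4.0, 50.0])
flags = [multi_country_flag(n) for n in (1, 2, 5)]
```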

Table 2 reports the correlation matrix of all variables used in our analysis before transforming the aforementioned variables into binary or standardized variables. First, the correlation table shows a positive correlation between our dependent variable, post_cite, and field distance (dist_embedding). Meanwhile, the high level of positive correlations between age and post_cite, and age and dist_embedding suggests that citation age must be incorporated into our model to avoid overestimating the association between the postretraction citation and field distance variable.

Table 2.

Correlation matrix

| | Variables | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | post_cite | 1.000 | | | | | | | | | | | | |
| 2 | dist_embedding | 0.091 | 1.000 | | | | | | | | | | | |
| 3 | severe | −0.132 | −0.012 | 1.000 | | | | | | | | | | |
| 4 | jif_retracted | −0.016 | 0.126 | 0.138 | 1.000 | | | | | | | | | |
| 5 | jif_citing | −0.142 | 0.013 | 0.060 | 0.115 | 1.000 | | | | | | | | |
| 6 | country_distance | 0.080 | 0.014 | −0.025 | −0.042 | −0.086 | 1.000 | | | | | | | |
| 7 | top_50_org | −0.081 | 0.025 | 0.027 | 0.066 | 0.161 | −0.122 | 1.000 | | | | | | |
| 8 | team_size | 0.046 | −0.074 | −0.006 | −0.045 | 0.046 | 0.055 | 0.069 | 1.000 | | | | | |
| 9 | org_size | 0.062 | 0.009 | −0.016 | −0.030 | 0.063 | 0.034 | 0.182 | 0.641 | 1.000 | | | | |
| 10 | multi_country | 0.010 | 0.030 | −0.009 | −0.011 | 0.066 | 0.076 | 0.172 | 0.351 | 0.550 | 1.000 | | | |
| 11 | has_west | −0.206 | 0.075 | 0.072 | 0.099 | 0.204 | −0.199 | 0.260 | −0.052 | 0.079 | 0.224 | 1.000 | | |
| 12 | year_citing | 0.359 | −0.012 | −0.085 | −0.243 | −0.148 | 0.119 | −0.088 | 0.138 | 0.163 | 0.079 | −0.273 | 1.000 | |
| 13 | age | 0.481 | 0.143 | 0.023 | −0.039 | −0.130 | 0.108 | −0.068 | 0.067 | 0.075 | 0.033 | −0.144 | 0.383 | 1.000 |

5.2. Regression Results

5.2.1. Are citations from distant fields (as opposed to proximate fields) more likely to generate postretraction citations?

We report our OLS fixed effect estimations of Equation 2 in Table 3 columns (1) and (2). All models also include retracted article fixed effects, as well as citation age and citing year indicator variables, which are not reported in the table. Column (1) in Table 3 reports the model results estimated without control variables, and in Column (2), we report our findings with a full set of control variables. These regression results suggest that postretraction citations are more likely to be made by articles from distant fields. In Column (1) from Table 3, the estimated coefficient of dist_embedding is 0.0106 (p < 0.01), which suggests that one standard deviation increase in the field distance is associated with around a 1 percentage point increase in the probability of citing a retracted article. Considering that the mean value of the postretraction citation is around 38.4%, a one standardized unit increase in field distance increases the postretraction citation rate by around 2.76% (1.06/38.4). Once we include the full set of control variables in Column (2), the dist_embedding coefficient slightly increases to 0.0110 (p < 0.01). Therefore, these results support our first hypothesis (H1), which predicted that postretraction citations are more likely to come from distant fields.

Table 3.

Regressions of postretraction citations on field distance by JIF

                                (1)         (2)         (3)         (4)
dist_embedding                  0.0106***   0.0110***   0.0071***   0.0077***
                                (0.002)     (0.002)     (0.002)     (0.002)
jif_retracted × dist_embedding                          0.0114***   0.0111***
                                                        (0.004)     (0.004)
country_distance                            0.0050***               0.0050***
                                            (0.001)                 (0.001)
jif_citing                                  −0.0125***              −0.0123***
                                            (0.002)                 (0.002)
top_50_org                                  −0.0071***              −0.0070***
                                            (0.003)                 (0.003)
team_size                                   −0.0065***              −0.0065***
                                            (0.002)                 (0.002)
org_size                                    0.0052**                0.0053**
                                            (0.003)                 (0.002)
multi_country                               0.0044*                 0.0044*
                                            (0.003)                 (0.003)
has_west                                    −0.0209***              −0.0208***
                                            (0.003)                 (0.003)
constant                        −3.5577**   −3.7473**   −3.5231**   −3.7136**
                                (1.798)     (1.848)     (1.777)     (1.828)
R²                              0.5178      0.5189      0.5180      0.5190
Controls                        No          Yes         No          Yes
Retracted Articles              2,123       2,123       2,123       2,123
Observations                    103,245     103,245     103,245     103,245

Standard errors clustered by retracted article are shown in parentheses. All models include retracted-article, citation-age, and citing-year fixed effects.

* p < 0.1. ** p < 0.05. *** p < 0.01.
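To illustrate what retracted-article fixed effects do in Table 3, the following is a minimal sketch of the within (demeaning) estimator for a single regressor. It is not the authors' estimation code: it omits the control variables, the citation-age and citing-year indicators, and the clustered standard errors, and the function name and toy data are made up for illustration.

```python
from collections import defaultdict

def within_slope(group_ids, x, y):
    """Slope of y on x after subtracting group means from both variables.

    With one regressor, this equals the OLS coefficient from a model that
    includes a separate intercept (fixed effect) for every group.
    """
    # Accumulate per-group sums of x, sums of y, and counts.
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for g, xi, yi in zip(group_ids, x, y):
        s = sums[g]
        s[0] += xi
        s[1] += yi
        s[2] += 1

    num = 0.0
    den = 0.0
    for g, xi, yi in zip(group_ids, x, y):
        sx, sy, n = sums[g]
        xd = xi - sx / n  # deviation of x from its group mean
        yd = yi - sy / n  # deviation of y from its group mean
        num += xd * yd
        den += xd * xd
    if den == 0.0:
        raise ValueError("x has no within-group variation")
    return num / den

# Toy data (made up): two retracted articles with different baseline levels.
# The outcome is continuous here for simplicity; in the paper it is binary.
groups = ["A", "A", "A", "B", "B", "B"]
distance = [0.0, 1.0, 2.0, 0.0, 1.0, 2.0]
outcome = [0.10, 0.12, 0.14, 0.50, 0.52, 0.54]

slope = within_slope(groups, distance, outcome)
print(f"within-group slope: {slope:.4f}")
```

Because each retracted article's mean is removed, the large difference in baseline levels between the two toy articles does not affect the estimated slope; only variation among citations to the same retracted article does.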

Before we move on to test our second hypothesis, we describe some notable findings from our control variables. First, postretraction citations are more likely to come from articles whose authors’ affiliated countries differ from those of the retracted articles. Postretraction citations are also less likely to come from articles published in high-JIF journals. For example, articles published in journals above the 75th percentile of the JIF distribution among citing articles (JIF above 5.03) were associated with a decrease of around 1.25 percentage points (see Column (2) of Table 3) in the probability of citing retracted articles. This is equivalent to a reduction of around 3.26% (1.25/38.4) in the postretraction citation rate. Retracted articles are also less likely to be cited by articles whose authors are from high-ranking institutions (top_50_org) or scientifically “core” countries (has_west). In fact, whether a citing article had any authors from these “core” countries is one of the most predictive variables, with an estimated coefficient of −2.09 percentage points (Column (2) of Table 3).

5.2.2. Differential effects of field distance by JIF of retracted articles

We now test the competing hypotheses (H2A, H2B, and H2C) by examining whether the association between field distance and the postretraction citation rate varies between high-JIF and ordinary-JIF retracted articles. To do so, we introduce an interaction between dist_embedding (field distance) and jif_retracted (an indicator for whether the JIF for the retracted article is above the 75th percentile, 18.43). A negative interaction effect would support the prediction from constructivist theory under engaged citation search (H2A). A null result would be consistent with the normative theory under engaged citation search (H2B). Finally, a positive interaction effect would support the heuristic citation search process for both citation motivations (H2C). Columns (3) and (4) in Table 3 report regression results that include the interaction effect, without and with the control variables, respectively. All models are estimated by OLS with retracted-article fixed effects and include a series of citation-age and citing-year indicator variables, which we do not report in the table for reasons of space. In both Columns (3) and (4), the estimated coefficient for field distance is positive and statistically significant, which suggests a positive association between field distance and the postretraction citation rate for retracted articles from ordinary-JIF journals. We also find evidence of a positive interaction effect (p < 0.01) between field distance and the JIF of retracted articles (see Column (4) in Table 3). 
The estimated coefficient of the interaction effect is 0.0111. This suggests that, for citations to retracted articles published in high-JIF journals, a one standard deviation increase in field distance (dist_embedding) is associated with an additional increase of around 1.11 percentage points in the postretraction citation rate (roughly 1.11/38.4 = 2.89%), on top of the 0.77 percentage point increase estimated for citations to retracted articles published in ordinary journals.
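The way the main effect and the interaction combine can be made explicit with a short calculation. This is a sketch using only the Column (4) point estimates quoted above; the constant and function names are hypothetical, and the calculation ignores estimation uncertainty.

```python
# Point estimates quoted from Table 3, Column (4); names are illustrative.
B_DIST = 0.0077         # dist_embedding (slope for ordinary-JIF retracted articles)
B_INTERACTION = 0.0111  # jif_retracted x dist_embedding
BASE_RATE = 0.384       # mean postretraction citation rate reported in the text

def distance_slope(high_jif: bool) -> float:
    """Implied change in citation probability per one SD of field distance."""
    return B_DIST + (B_INTERACTION if high_jif else 0.0)

for label, flag in (("ordinary JIF", False), ("high JIF", True)):
    slope = distance_slope(flag)
    print(
        f"{label}: {100 * slope:.2f} percentage points per SD "
        f"({100 * slope / BASE_RATE:.2f}% of the base rate)"
    )
```

Under these estimates, the implied slope for high-JIF retracted articles is the sum of the two coefficients, which is why a positive interaction is read as a stronger distance association for high-status journals.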

We also report predicted probabilities of postretraction citation with respect to field distance for high-JIF and ordinary-JIF retracted articles. The predicted probabilities plotted in Figure 5 come from the same specification used to estimate Column (4) of Table 3, but estimated with a random-effects model for retracted articles. Figure 5 shows that the association between field distance and the postretraction citation rate differs between high-JIF and ordinary-JIF retracted articles: the association is much stronger among citations to retracted articles from high-JIF journals. This evidence, coupled with the regression outputs from Table 3, provides strong support for hypothesis H2C, suggesting citation search consistent with the heuristic search model.

Figure 5.

Predicted probabilities of postretraction citation between articles that were published in high versus ordinary JIF journals. Plotted predicted probabilities are estimated from a random effect model with a specification equivalent to Column (4) from Table 3.


5.3. Robustness Tests

To test the robustness of our results, we reran the models in Table 3 using alternative measures of field distance, both a bag-of-words vector and a distance measure based on Web of Science Categories from referenced journals (results available from the authors). For the main effect of distance, the magnitudes of the estimated coefficients from these other two field distance measures are slightly smaller, but they are positive and statistically significant, supporting H1. For the test of H2C, we find positive interaction effects for these alternative measures, with the effect statistically significant for the Web of Science Categories-based measure, although not for the bag-of-words measure. Hence, our results are qualitatively robust to alternative measures of field distance.

We also consider various scenarios for how postretraction citations are generated and how they would affect our results and interpretations. Our analysis assumes that authors who cite retracted articles were not aware of the retraction (regardless of whether they read the paper or not). Here, we discuss three possible scenarios in which postretraction citations may not follow this assumption. All of these scenarios might explain why postretraction citations are perpetuated. However, while they may increase the base rate of postretraction citations, they cannot explain why postretraction citation is predicted by field distance or, furthermore, by the interaction of field distance and journal status.

5.3.1. Deliberate citations of retracted articles

Firstly, citing authors may be aware of the retraction but cite the article anyway because they believe that some findings are still valid (Bar-Ilan & Halevi, 2017). We address this concern by exploiting the severity of the retraction reasons provided by the Retraction Watch data set. We manually classified 95 curated reasons for retraction into three categories (minor/major/severe) based on the severity of research misconduct (see Table S3 in the Supplementary material). For example, the minor misconduct category includes reasons such as “salami-slicing” or “plagiarism,”7 while the major misconduct category includes “concerns/issues about data or results.” Last, the severe misconduct category includes reasons such as the “fabrication of data” or “fabrication of results.” If a retracted article in our data set had at least one “severe” retraction reason, we classified it as a “severe” retracted article. The idea is that deliberate citations of retracted articles are more likely to be observed if the cited articles were retracted for nonsevere reasons (Bar-Ilan & Halevi, 2017). In contrast, citing a paper even after retraction for a “severe” infraction suggests that a heuristic (rather than engaged) search may have produced the citing behavior. In Table S1 of the Supplementary material, we report our regressions separately for “nonsevere” retracted articles (Column (1) in Table S1) and “severe” retracted articles (Column (2) in Table S1). For the “nonsevere” sample, which may be contaminated by deliberate postretraction citations, the interaction effect is not statistically significant (p > 0.1). Meanwhile, regression outputs from the “severe” sample show that the interaction effect is positive and statistically significant (p < 0.01). Moreover, the magnitude of this effect is greater than that estimated from the full sample, which further substantiates the heuristic citation search prediction (H2C). 
These findings are consistent if we use our alternative measures of field distance (results available from the authors).
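The article-level rule described above (an article is “severe” if at least one of its retraction reasons is severe) can be sketched as follows. The mapping contains only the five example reasons named in the text; the full 95-reason classification is in Table S3 and is not reproduced here. The exact reason strings in the Retraction Watch data may differ, and treating unlisted reasons as nonsevere is an assumption of this sketch.

```python
# Partial, illustrative mapping built from the examples named in the text.
# The actual classification covers 95 reasons (Table S3), not shown here.
SEVERITY = {
    "salami-slicing": "minor",
    "plagiarism": "minor",
    "concerns/issues about data or results": "major",
    "fabrication of data": "severe",
    "fabrication of results": "severe",
}

def is_severe(reasons):
    """Return True if at least one retraction reason is classified as severe.

    Reasons missing from SEVERITY are treated as nonsevere (an assumption).
    """
    return any(SEVERITY.get(r.strip().lower()) == "severe" for r in reasons)

# Hypothetical retracted articles with their lists of reasons.
article_a = ["Plagiarism", "Fabrication of Data"]
article_b = ["Salami-slicing", "Concerns/Issues about Data or Results"]

print(is_severe(article_a))
print(is_severe(article_b))
```

Splitting the sample with a flag like this yields the “severe” and “nonsevere” subsamples on which the regressions are run separately.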

5.3.2. Citation context analysis

In addition, some authors may have cited the retracted articles negatively while acknowledging the retraction. In this case, a postretraction citation does not indicate a “false” reference. The question is whether such instances are correlated with our main independent variable, field distance. We would suggest that such negative citations are more likely to occur when citing authors have a substantial understanding of the topic. Therefore, to the extent that there is such a negative relationship between field distance and the tendency to cite “negatively,” and assuming that “negative” citations are more likely to occur among postretraction citations, we would underestimate the field distance effect (meaning that correcting for this would produce even stronger evidence for our hypotheses). However, previous studies suggest that the incidence of negative citations among postretraction citations is rather low, making this source of bias unlikely (Bar-Ilan & Halevi, 2017; Bordignon, 2020; Schneider et al., 2020).

We conducted an additional analysis using a subset of the full-text data set to check whether citing articles were aware of the retraction. We retrieved full-text data of 13,441 articles citing 2,560 retracted articles (21,355 citing-cited article pairs) from the Microsoft Academic Graph database8. This data set gives us a general sense of how retracted references are cited. Here, we present 20 randomly selected citation contexts that refer to retracted articles. These are selected from the subset of postretraction citations that were made at least 2 years after retraction. We specifically focus on articles that were retracted for severe reasons (see Table S3 in the Supplementary material for the classification). As seen from this sample of citation contexts in Table S4 of the Supplementary material, none of them cited retracted articles negatively, nor did they explicitly mention the retracted status. Furthermore, these citations seem primarily to cite the retracted paper as a foundational piece of knowledge on which the citing author is building her argument, even though the paper had been retracted 2+ years earlier for severe reasons such as data falsification or fabrication. This suggests that the citing author is not incorporating the citation for one of the “legitimate” reasons noted above.

Continuing in this vein, we then systematically analyzed the texts around citations to examine if citing authors have explicitly mentioned the retracted status of retracted articles. Out of 4,777 postretraction citation contexts made to 1,156 retracted papers, only 83 citation contexts (1.74%) had explicitly mentioned the words retraction, retracted, or retract when they cited retracted papers. We also analyzed whether retracted papers are more likely to be coreferenced with other references in citing manuscripts. Our rationale is that if a retracted paper is cited in a deliberate manner, or as a “negative” citation, it is more likely to be a “standalone” citation. On the other hand, if the citation is made in a nonengaged manner, it is more likely to be coreferenced with other papers (appearing in the same location in the manuscript in a list of citations). We examined distributions of the number of coreferences from 13,441 citing articles that made post- and preretraction citations. For each citing article, citation contexts are assumed to be coreferenced if they have the exact same tokens. This allows us to identify 20,999 unique coreferences from which we can compare the pooled distributions from articles that cite before and after retractions. We find that around 59.3% of preretraction citations are standalone citations, while this share is only around 52.1% for postretraction citations. Furthermore, as Figure S1 in the Supplementary material indicates, we find that coreferences are more common among postretraction citations than in preretraction citations. Our full-text analysis suggests that the majority of retracted articles are used as if they are legitimate knowledge, and there is no evidence that this becomes less likely after retraction compared to before the retraction.
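The two text checks described above can be sketched in a few lines. This is an assumption-laden illustration rather than the authors' pipeline: the regular expression matches only the three word forms named in the text (plural forms are not matched), tokenization is a plain whitespace split, and the example contexts are made up.

```python
import re
from collections import Counter

# Matches the words "retract", "retraction", or "retracted" (case-insensitive).
RETRACT_PATTERN = re.compile(r"\bretract(?:ion|ed)?\b", re.IGNORECASE)

def mentions_retraction(context: str) -> bool:
    """True if the citation context contains one of the three target words."""
    return bool(RETRACT_PATTERN.search(context))

def coreference_counts(contexts_by_ref):
    """For one citing article, count how many other references share each
    reference's citation context (identical token sequences)."""
    keys = {ref: tuple(text.split()) for ref, text in contexts_by_ref.items()}
    group_sizes = Counter(keys.values())
    return {ref: group_sizes[key] - 1 for ref, key in keys.items()}

# Made-up contexts from a single hypothetical citing article.
contexts = {
    "ref_1": "Prior work reports this effect [1, 2].",
    "ref_2": "Prior work reports this effect [1, 2].",
    "ref_3": "This finding was later retracted [3].",
}

flags = {ref: mentions_retraction(text) for ref, text in contexts.items()}
counts = coreference_counts(contexts)
share = 100 * 83 / 4777  # counts reported in the text

print(flags)
print(counts)
print(f"{share:.2f}% of contexts mention the retraction")
```

A reference with a count of zero corresponds to a “standalone” citation in the sense used above, while a positive count indicates that it appears in a list with other references.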

5.3.3. Publication delay

We are also concerned that some authors may have read the paper they cite but were unaware of the retraction due to publication lag. For example, it is plausible that a cited article had not yet been retracted when the citing authors incorporated it into their article and was retracted only during the review process of the citing article. We reran the analyses using the same specifications as in Table 3 but excluding citations made in the first year after retraction. The findings are consistent in direction and statistical significance with the main regression outputs shown in Table 3 (results available from the authors).

6.1. Postretraction Citation as Window

In our paper, we consider that the citation generation process can be abstracted to lie between engaged and heuristic search processes. Assuming that authors with either normative or constructivist citation motivations have relied on engaged citation search, postretraction citations may simply be a result of the author’s unfamiliarity with the paper’s retraction. Thus, under the engaged citation search model, we could interpret postretraction citations as the result of a notification failure, which can partly be addressed by increasing the visibility of retraction notices. However, we also describe a heuristic citation search process, where authors may rely on heuristics to guide their citation search (Osterloh & Frey, 2020; Wooding, 2020), particularly when searching for knowledge in unfamiliar terrain. Researchers may rely on highly visible journals, such as journals with high JIF, when they need to cite unfamiliar knowledge. In a supplemental analysis, we show that, among a random sample of Web of Science publications, there is a strong positive correlation between JIF and citing-cited field distance, such that more distant citations are more likely to be to high-JIF journals (results available from the authors). But we further argue that such use of heuristics can become overly mechanical. That is, to the extent that exploring knowledge is costly due to unfamiliarity, some researchers may bet on the perceived status of journals by citing the article without deep engagement with its contents (Ball, 2002; Hoerman & Nowicke, 1995; Simkin & Roychowdhury, 2005), perhaps, for example, because they are drawing on others’ citations to the paper. In the previous section, we showed that our findings are robust even after considering retraction lags or how citing authors may have deliberately or negatively cited retracted articles. 
Thus, our evidence suggests that postretraction citation is not all “honest mistakes” or simple laziness stemming from researchers’ inability to check retraction notices. Instead, part of it also appears to be a direct consequence of systematic citation search behavior that may not require careful reading of the papers they cite and, especially, their use of highly visible journals as guideposts when encountering unfamiliar knowledge (Osterloh & Frey, 2020). More importantly, we argue that our postretraction citation analysis may illuminate a citation practice more generally, as the citation search process (relying on heuristics rather than engaged search) is likely to generalize to citations more broadly. This seems more plausible than a theory that suggests authors use one process for drawing on retracted literature and a different one for drawing on nonretracted literature.

In addressing the long-standing question of what it means to cite a paper in science (Bornmann & Daniel, 2008), our findings provide evidence consistent with a citation search process that is present in addition to, or instead of, the conventional understandings of citation practices. We are still left with a few important questions. Why would researchers use highly visible journals as a heuristic, and more importantly, why would someone ever use them in a perfunctory manner? Although answering these questions is beyond the scope of our paper, we attempt to provide a few potential explanations. Firstly, citing highly visible journals when searching unfamiliar literature may be an acceptable way for boundedly rational individuals to cope with enormous search costs. With the increasing rate of publications, researchers simply cannot identify and evaluate all possible sets of relevant articles (Simon, 1997). The increasing reliance on article recommendation systems is meant to partly address this issue. In this sense, just as JIF was originally developed to support librarians sorting through a flood of information (Garfield, 2006), researchers could rely on indicators of journal visibility, such as JIF, but in this case, to identify which articles have the potential to be more important. Part of this behavior may also stem from how we tend to equate status with JIF. Papers published in high-JIF journals are perceived to be more legitimate by citing authors, reviewers, and future readers, which incentivizes researchers to cite articles from high-JIF journals. The use of JIF in this manner is also present for purposes other than citation practices. For example, Biagioli and Lippman (2020) compare the use of JIF by academic hiring committees with how futures contracts work in the financial markets. 
That is, some evaluators may judge published articles not based on their contents, but on a crude measure of how many citations they would be expected to generate in the future, despite the fact that the skewed nature of citation distributions would render such prediction ineffective (Larivière, Kiermer et al., 2016). The important point that we show in our paper is that such reliance may become overly mechanical, as the example from Biagioli and Lippman (2020) illustrates.

This brings us back to an interesting debate between the normative and constructivist theories of citations. As discussed in Section 2.1, one heated debate was around whether citation based on “argument from authority,” such as citing eminent authors or articles, constitutes a violation of the normative view (Leydesdorff, 1987; MacRoberts & MacRoberts, 1987; Zuckerman, 1987). However, if the authority comes from citing high-status journals, as our findings suggest, a citation can be generated irrespective of whether the citing authors have read or been influenced by the contents of the cited works (normative view) or the position of cited works and authors within the stratification structure of science (constructivist view). Thus, to the extent that our identified citation behavior can be generalized to citation practices at large, there exists a Matthew Effect of JIF (Larivière & Gingras, 2010), an independent channel by which high-JIF articles garner additional citations. Furthermore, to the extent that citations produce both symbolic and material rewards (jobs, promotions, etc.), such heuristics can lead to a misallocation of resources in science.

If such mechanical citation practices constitute a nontrivial share of actual citation counts, why haven’t existing theories been able to explain them? First, we argue that it is through analyzing postretraction citations that we can uncover such citation behaviors. Second, and more importantly, we argue that the existing citation theories may rest on an idealized notion or a narrow definition of a scientist, drawing evidence and accounts based on selected scientific documents and field studies from what may be considered now as the “core” part of science, which we refer to as an engaged search process. However, during the last century, just as there has been enormous growth in publication activities (Milojević, 2015; Price, 1963), we have seen an increasing number of authors contributing to the publication activities accompanied by diversification in terms of their roles in the production of science (Hackett, 1990; Hagstrom, 1964; Larivière et al., 2013; Milojević, Radicchi, & Walsh, 2018; Walsh & Lee, 2015), nationalities (Maisonobe, Grossetti et al., 2017; Zhou & Leydesdorff, 2006), and organizations (Hicks, 1995; Li, Youtie, & Shapira, 2015). The citation behaviors of this broader population of authors do not have to follow the existing descriptions, which may have been based on a narrow definition of a once homogenous population of scientists. In fact, prior studies have shown that the increasing adoption of various performance evaluation measures by institutions, mostly in developing countries, may have created perverse incentives to publish (Franzoni, Scellato, & Stephan, 2011), which may have led to increasing publications accompanied by many instances of research misconduct (Biagioli & Lippman, 2020; Biagioli, Kenney et al., 2019; Walsh et al., 2019). When publishing becomes merely a means to an end (Price, 1963; Shibayama & Baba, 2015), it is not too difficult to expect that citations can become a ceremonial practice. 
While diversification of author demography may partly be blamed, given that our findings suggest that authors from non-Western countries and nonelite institutions were more likely to cite retracted articles, it is important to note that our main effects are consistent even after we control for these variables. Therefore, what our findings may suggest is that when researchers are provided with conditions that incentivize scholarly communication activities as an end in itself, and researchers are faced with increasing demands for productivity combined with a rapidly expanding knowledge base to be accounted for, the conditions for severe bounded rationality and an increasing reliance on heuristics are created. In such conditions, we are more likely to observe citation practices that do not reflect traditional notions of engaged citation.

6.2. Policy Implications

Our findings provide a few important policy implications. First, our findings have implications for addressing the continuous spread of false references in science, an issue that has become an important policy concern due to increasing misinformation and misuse of scientific knowledge (West & Bergstrom, 2021). To address this issue, we can examine different phases of the signaling pathway of false references. One solution is to interfere at the reader level to avoid readers directly citing retracted articles by flagging retraction notices. To our knowledge, increasing the retraction visibility has been the most widely suggested policy recommendation (Bar-Ilan & Halevi, 2017; Campanario, 2000; Cox, Craig, & Tourish, 2018; Da Silva & Bornemann-Cimenti, 2017; Schneider et al., 2020; Unger & Couzin, 2006). Indeed, while constructing our data set, we found that many journals and indexing databases, such as PubMed and Web of Science, fail to display retraction notices, suggesting there is room for policy interventions here. However, we argue that this intervention rests on a strong, and likely fallacious, assumption: that the postretraction citations are “honest mistakes” such that all authors who cited retracted articles have read the paper yet are unaware of its retraction (engaged citation search). Given that postretraction citations may also be driven by authors who use a heuristic search that may generate shortcuts in the process of incorporating the citation into their paper, any intervention at the reader level may not eradicate the spread of false references. In this regard, we argue that intervention at the journal level may be much more effective. Without putting an additional tax on already burdened reviewers and editors (West & Bergstrom, 2021), journals and publishers could implement an automated retraction detection system (Bar-Ilan & Halevi, 2017; Bornemann-Cimenti, Szilagyi, & Sandner-Kiesling, 2016). 
The idea is not for journals or publishers to forbid authors from citing retracted articles, but to flag them with a warning. This system could alert authors at different stages, as references could easily change during the paper’s path to publication. Alternatively, this could be implemented at the final copyediting/proofing stages, when other information about the references (such as missing page numbers) is routinely checked already.
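A minimal sketch of the flagging step in such an automated system is shown below. It assumes that the journal has a list of DOIs for the manuscript's references and a set of known retracted DOIs (for example, derived from a retraction database); the function names and the DOIs are hypothetical, and a production system would also need to handle references without DOIs.

```python
def normalize_doi(doi: str) -> str:
    """Lowercase a DOI and strip common URL or 'doi:' prefixes."""
    doi = doi.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi

def flag_retracted(reference_dois, retracted_dois):
    """Return the manuscript references whose DOI appears in the retracted set.

    The result is meant to trigger a warning to authors, not to block citation.
    """
    retracted = {normalize_doi(d) for d in retracted_dois}
    return [d for d in reference_dois if normalize_doi(d) in retracted]

# Hypothetical DOIs for illustration only.
manuscript_refs = [
    "https://doi.org/10.1234/example.001",
    "10.1234/example.002",
    "doi:10.1234/EXAMPLE.003",
]
known_retracted = {"10.1234/example.003"}

warnings = flag_retracted(manuscript_refs, known_retracted)
print(warnings)
```

Because the check is a set lookup over the reference list, it could be rerun at submission, revision, and proofing without adding work for reviewers or editors.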

Postretraction citation is one form of misinformation in science (West & Bergstrom, 2021), though it may have no intent to deceive others. Yet, it is important to note that such behaviors are not entirely due to honest mistakes or laziness; rather, this behavior partly reflects systematic, and perhaps mechanical, uses of high-JIF journals when citing distant knowledge. In this sense, the spread of false references in scholarly communication is systematic, and if we can extrapolate our findings to general citation practices, it may also involve the spread of irrelevant references, or in extreme cases, of references that do not exist (Harzing & Kroonenberg, 2016; Katz, 2006; Leng, 2020). In fact, the tendency of AI models such as ChatGPT to generate fictitious references (Walters & Wilder, 2023), when combined with a tendency to copy references embedded in others’ publications (secondary referencing), means that there is a risk of a proliferation of incorrect citations in the literature moving forward, from any of several processes. Therefore, the solutions mentioned above would partly reduce the circulation of retracted references, but they may do little to stop the spread of irrelevant but nonretracted references. Stopping that spread would involve substantial changes in publication practices (publication as an end in itself) and, in particular, in how we use journal ranking as academic currency at both institutional (Biagioli & Lippman, 2020) and individual (Larivière et al., 2016; Osterloh & Frey, 2020) levels. While engaged search may be the ideal-type practice, heuristic search may be common, and perhaps increasingly so as the knowledge burden increases (Jones, 2009).

Last, this paper contributes to the continuing debate on how cumulative advantage, or the Matthew Effect (Merton, 1968), operates through the attributions given to high-JIF journals. Prior studies have suggested numerous reasons for the cumulative advantage enjoyed by high-JIF journals, including increased visibility and status-seeking citation motivation (Drivas & Kremmydas, 2020; Larivière & Gingras, 2010; Traag, 2021). Assuming that our proposed citation behavior model can be extrapolated to general citation practice, we would expect some researchers to superficially cite references from high-JIF journals without thorough examination, especially when they are unfamiliar with the topic, as compared with when they are drawing knowledge from familiar fields. While this does not suggest that distant citations are necessarily perfunctory9, policies that aim to reward “broad” impacts should still be implemented with caution, especially when implementing a scope-based impact measure of a publication using the interdisciplinarity of its citations.

The authors would like to thank two reviewers for their helpful feedback, which has significantly enhanced our paper. We extend our gratitude to the participants of the 2022 Workshop on the Organisation, Economics, and Policy of Scientific Research (WOEPSR) at KU Leuven for their invaluable comments. We also thank Paula Stephen, Philip Shapira, Stasa Milojevic, Mary Fox, Juan Rogers, Cassidy Sugimoto, and Seokbeom Kwon for their critical and constructive comments and insights on our paper. We are grateful to Retraction Watch for generously supplying retraction data and to the Georgia Institute of Technology for granting access to the Clarivate Web of Science.

Seokkyun Woo: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Visualization, Writing—original draft, Writing—review & editing. John P. Walsh: Conceptualization, Methodology, Writing—original draft, Writing—review & editing.

The authors have no competing interests.

No external funding was received for this study.

Our main bibliometric data is proprietary to Clarivate Analytics and, therefore, cannot be disclosed. However, we have made available data sets and code for replicating our figures and regressions, which you can download from the following link (https://doi.org/10.5281/zenodo.10692463) (Woo, 2024).

1

Merton called this “obliteration by inclusion,” which bears a resemblance to Latour’s “black box” in this context.

2

One problem with this finding is the measures of content relevance and perceived quality, which were measured by the number of figures and tables per article, and the number of citations, respectively.

3

We thank the reviewers for these examples.

4

We replicated our analysis using various threshold values, using 2, 5, 20, 50, and 100 minimum citations. The results using these thresholds are consistent with our main analysis.

6

While the logistic regression model is generally preferred for a binary outcome variable, we use OLS for the following reasons. First, while the logistic regression model generally fits better than the linear model, the relationship between probability and log odds (which are a linear function of our covariates) is quasi-linear between probabilities of 0.2 and 0.8 (Long, 1997; Von Hippel, 2015), which covers much of the range of our postretraction citation probabilities. Secondly, for the interaction models that test our second set of hypotheses (H2A–H2C), OLS coefficients are more straightforward to interpret, while logistic regression models require calculating and reporting the range of marginal effects for the interaction effect in the data. Finally, our estimating model, which includes more than one set of many indicator variables, makes computation of logistic regression (both conditional and unconditional fixed-effects models) intractable due to the quasi-complete separation problem (Allison, 2008). We addressed this problem by combining the citation-age indicator variables into fewer categories. After dropping retracted articles without variation in the dependent variable (a condition for estimating logistic regression), we ran both logistic and OLS regressions on this new data set, and our results were consistent with our main findings. We also ran a regression model that explicitly predicts citation distance using postretraction citation. As shown in Table S2 (Supplementary material), our results are consistent with this alternative regression specification. Specifically, we find an increase in citation distance after the retraction event, and this effect is greater among those citing retracted articles published in high impact factor journals.

7

We are not arguing that plagiarism is a minor problem. Rather, we are coding this as minor with reference to its likely impact on the validity of the cited finding (as false attribution of authorship does not affect the content of the findings).

9

Our main finding suggests P(false_cite | distant_cite, high_JIF) ≥ P(false_cite | proximate_cite, high_JIF). However, this does not imply P(false_cite | distant_cite, high_JIF) ≥ P(true_cite | distant_cite, high_JIF), because the probability of a false citation is generally far smaller than the probability of a true citation (and so is the joint probability of false citations, distant citations, and citing high-JIF articles).
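A small worked example may help separate the two comparisons. The counts below are hypothetical and chosen only for illustration; they are not taken from the study's data. They show how false citations can be relatively more likely among distant citers than among proximate citers while still being far less likely than true citations within the distant group.

```python
# Hypothetical counts of citations to high-JIF articles (illustrative only,
# not the study's data), split by citing audience and citation validity.
counts = {
    ("distant", "false"): 30,
    ("distant", "true"): 970,
    ("proximate", "false"): 10,
    ("proximate", "true"): 990,
}


def conditional(validity, audience):
    """P(validity | audience, high_JIF) computed from the hypothetical counts."""
    total = sum(n for (aud, _), n in counts.items() if aud == audience)
    return counts[(audience, validity)] / total


p_false_distant = conditional("false", "distant")
p_false_proximate = conditional("false", "proximate")
p_true_distant = conditional("true", "distant")

# Comparison across audiences: holds with these counts.
print(p_false_distant >= p_false_proximate)  # True
# Comparison of false versus true citations within the distant audience: does not hold.
print(p_false_distant >= p_true_distant)     # False
```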

Abramo, G., D’Angelo, C. A., & Di Costa, F. (2020). Does the geographic proximity effect on knowledge spillovers vary across research fields? Scientometrics, 123(2), 1021–1036.
Allison, P. D. (2008). Convergence failures in logistic regression. Paper presented at the SAS Global Forum.
Azoulay, P., Bonatti, A., & Krieger, J. L. (2017). The career effects of scandal: Evidence from scientific retractions. Research Policy, 46(9), 1552–1569.
Azoulay, P., Furman, J. L., Krieger, J. L., & Murray, F. (2015). Retractions. Review of Economics and Statistics, 97(5), 1118–1136.
Baldi, S. (1998). Normative versus social constructivist processes in the allocation of citations: A network-analytic model. American Sociological Review, 63(6), 829–846.
Ball, P. (2002). Paper trail reveals references go unread by citing authors. Nature, 420(6916), 594.
Bar-Ilan, J., & Halevi, G. (2017). Post retraction citations in context: A case study. Scientometrics, 113(1), 547–565.
Beltagy, I., Lo, K., & Cohan, A. (2019). SciBERT: A pretrained language model for scientific text. arXiv.
Berger, M., McDonough, K., & Seversky, L. M. (2017). cite2vec: Citation-driven document exploration via word embeddings. IEEE Transactions on Visualization and Computer Graphics, 23(1), 691–700.
Biagioli, M. (2018). Quality to impact, text to metadata: Publication and evaluation in the age of metrics. KNOW: A Journal on the Formation of Knowledge, 2(2), 249–275.
Biagioli, M., Kenney, M., Martin, B. R., & Walsh, J. P. (2019). Academic misconduct, misrepresentation and gaming: A reassessment. Research Policy, 48(2), 401–413.
Biagioli, M., & Lippman, A. (2020). Gaming the metrics: Misconduct and manipulation in academic research. Cambridge, MA: MIT Press.
Bordignon, F. (2020). Self-correction of science: A comparative study of negative citations and post-publication peer review. Scientometrics, 124, 1225–1239.
Bornemann-Cimenti, H., Szilagyi, I. S., & Sandner-Kiesling, A. (2016). Perpetuation of retracted publications using the example of the Scott S. Reuben case: Incidences, reasons and possible improvements. Science and Engineering Ethics, 22(4), 1063–1072.
Bornmann, L., & Daniel, H. D. (2008). What do citation counts measure? A review of studies on citing behavior. Journal of Documentation, 64(1), 45–80.
Brooks, T. A. (1985). Private acts and public objects: An investigation of citer motivations. Journal of the American Society for Information Science, 36(4), 223–229.
Brooks, T. A. (1986). Evidence of complex citer motivations. Journal of the American Society for Information Science, 37(1), 34–36.
Budd, J. M., Sievert, M., Schultz, T. R., & Scoville, C. (1999). Effects of article retraction on citation and practice in medicine. Bulletin of the Medical Library Association, 87(4), 437–443.
Campanario, J. M. (2000). Fraud: Retracted articles are still being cited. Nature, 408(6810), 288.
Cano, V. (1989). Citation behavior: Classification, utility, and location. Journal of the American Society for Information Science, 40(4), 284–290.
Chubin, D. E., & Moitra, S. D. (1975). Content analysis of references: Adjunct or alternative to citation counting? Social Studies of Science, 5(4), 423–441.
Cohan, A., Ammar, W., van Zuylen, M., & Cady, F. (2019). Structural scaffolds for citation intent classification in scientific publications. arXiv.
Cohan, A., Feldman, S., Beltagy, I., Downey, D., & Weld, D. S. (2020). SPECTER: Document-level representation learning using citation-informed transformers. arXiv.
Cole, J. R., & Cole, S. (1972). The Ortega hypothesis: Citation analysis suggests that only a few scientists contribute to scientific progress. Science, 178(4059), 368–375.
Cox, A., Craig, R., & Tourish, D. (2018). Retraction statements and research malpractice in economics. Research Policy, 47(5), 924–935.
Cozzens, S. E. (1989). What do citations count? The rhetoric-first model. Scientometrics, 15(5–6), 437–447.
Cronin, B. (1984). The citation process: The role and significance of citations in scientific communication. Taylor Graham.
Cyert, R. M., & March, J. G. (1963). A behavioral theory of the firm. University of Illinois at Urbana-Champaign’s Academy for Entrepreneurial Leadership Historical Research Reference in Entrepreneurship.
Da Silva, J. A. T., & Bornemann-Cimenti, H. (2017). Why do some retracted papers continue to be cited? Scientometrics, 110(1), 365–370.
Davis, P. M. (2012). The persistence of error: A study of retracted articles on the Internet and in personal libraries. Journal of the Medical Library Association, 100(3), 184–189.
Dinh, L., Sarol, J., Cheng, Y.-Y., Hsiao, T.-K., Parulian, N., & Schneider, J. (2019). Systematic examination of pre- and post-retraction citations. Proceedings of the Association for Information Science and Technology, 56(1), 390–394.
Drivas, K., & Kremmydas, D. (2020). The Matthew Effect of a journal’s ranking. Research Policy, 49(4), 103951.
Fox, M. F., Whittington, K. B., & Linkova, M. (2017). Gender, (in)equity, and the scientific workforce. In U. Felt, R. Fouché, C. A. Miller, & L. Smith-Doerr (Eds.), The handbook of science and technology studies. Cambridge, MA: MIT Press.
Franzoni, C., Scellato, G., & Stephan, P. (2011). Changing incentives to publish. Science, 333(6043), 702–703.
Frost, C. O. (1979). The use of citations in literary research: A preliminary classification of citation functions. The Library Quarterly, 49(4), 399–414.
Furman, J. L., Jensen, K., & Murray, F. (2012). Governing knowledge in the scientific community: Exploring the role of retractions in biomedicine. Research Policy, 41(2), 276–290.
Garfield, E. (2006). The history and meaning of the journal impact factor. JAMA, 295(1), 90–93.
Garfield, E., & Welljams-Dorof, A. (1990). The impact of fraudulent research on the scientific literature: The Stephen E. Breuning case. JAMA, 263(10), 1424–1426.
Gilbert, G. N. (1977). Referencing as persuasion. Social Studies of Science, 7(1), 113–122.
Hackett, E. J. (1990). Science as a vocation in the 1990s: The changing organizational culture of academic science. The Journal of Higher Education, 61(3), 241–279.
Hagstrom, W. O. (1964). Traditional and modern forms of scientific teamwork. Administrative Science Quarterly, 9(3), 241–263.
Hamilton, D. G. (2019). Continued citation of retracted radiation oncology literature: Do we have a problem? International Journal of Radiation Oncology*Biology*Physics, 103(5), 1036–1042.
Harzing, A.-W. K. (1995). The persistent myth of high expatriate failure rates. The International Journal of Human Resource Management, 6(2), 457–474.
Harzing, A.-W. K. (2002). Are our referencing errors undermining our scholarship and credibility? The case of expatriate failure rates. Journal of Organizational Behavior, 23(1), 127–148.
Harzing, A.-W. K., & Kroonenberg, P. (2016). The mystery of the phantom reference.
Hicks, D. (1995). Published papers, tacit competencies and corporate management of the public/private character of knowledge. Industrial and Corporate Change, 4(2), 401–424.
Hicks, D., Wouters, P., Waltman, L., De Rijcke, S., & Rafols, I. (2015). Bibliometrics: The Leiden Manifesto for research metrics. Nature, 520(7548), 429–431.
Hoerman, H. L., & Nowicke, C. E. (1995). Secondary and tertiary citing: A study of referencing behavior in the literature of citation analysis deriving from the Ortega Hypothesis of Cole and Cole. The Library Quarterly, 65(4), 415–434.
Honig, B., & Bedi, A. (2012). The fox in the hen house: A critical examination of plagiarism among members of the Academy of Management. Academy of Management Learning & Education, 11(1), 101–123.
Hsiao, T.-K., & Schneider, J. (2021). Continued use of retracted papers: Temporal trends in citations and (lack of) awareness of retractions shown in citation contexts in biomedicine. Quantitative Science Studies, 2(4), 1144–1169.
Huang, J., Gates, A. J., Sinatra, R., & Barabási, A.-L. (2020). Historical comparison of gender inequality in scientific careers across countries and disciplines. Proceedings of the National Academy of Sciences, 117(9), 4609–4616.
Jin, G. Z., Jones, B., Lu, S. F., & Uzzi, B. (2019). The reverse Matthew Effect: Consequences of retraction in scientific teams. Review of Economics and Statistics, 101(3), 492–506.
Jones, B. F. (2009). The burden of knowledge and the death of the renaissance man: Is innovation getting harder? Review of Economic Studies, 76(1), 283–317.
Jurgens, D., Kumar, S., Hoover, R., McFarland, D., & Jurafsky, D. (2018). Measuring the evolution of a scientific field through citation frames. Transactions of the Association for Computational Linguistics, 6, 391–406.
Kaplan, N. (1965). The norms of citation behavior: Prolegomena to the footnote. American Documentation, 16(3), 179–184.
Katz, T. J. (2006). Propagation of errors in review articles. Science, 313(5791), 1236.
Knorr-Cetina, K. D. (1981). The manufacture of knowledge. Oxford: Pergamon.
Kochan, C. A., & Budd, J. M. (1992). The persistence of fraud in the literature: The Darsee case. Journal of the American Society for Information Science, 43(7), 488–493.
Kolympiris, C., Drivas, K., Helsby, M., & Chalmers, A. (2020). How scientists search. Paper presented at the 14th Workshop on the Organisation, Economics and Policy of Scientific Research.
Larivière, V., & Gingras, Y. (2010). The impact factor’s Matthew Effect: A natural experiment in bibliometrics. Journal of the American Society for Information Science and Technology, 61(2), 424–427.
Larivière, V., Kiermer, V., MacCallum, C. J., McNutt, M., Patterson, M., … Curry, S. (2016). A simple proposal for the publication of journal citation distributions. bioRxiv.
Larivière, V., Ni, C., Gingras, Y., Cronin, B., & Sugimoto, C. R. (2013). Bibliometrics: Global gender disparities in science. Nature, 504(7479), 211–213.
Latour, B. (1987). Science in action: How to follow scientists and engineers through society. Cambridge, MA: Harvard University Press.
Latour, B., & Woolgar, S. (1979). Laboratory life: The construction of scientific facts. Princeton, NJ: Princeton University Press.
Le, Q., & Mikolov, T. (2014). Distributed representations of sentences and documents. In Proceedings of the 31st International Conference on Machine Learning (Vol. 32, pp. II-1188–II-1196).
Lee, J., Yoon, W., Kim, S., Kim, D., Kim, S., … Kang, J. (2020). BioBERT: A pre-trained biomedical language representation model for biomedical text mining. Bioinformatics, 36(4), 1234–1240.
Leng, R. I. (2020). The phantom reference and the propagation of error.
Leung, P. T. M., Macdonald, E. M., Stanbrook, M. B., Dhalla, I. A., & Juurlink, D. N. (2017). A 1980 letter on the risk of opioid addiction. New England Journal of Medicine, 376(22), 2194–2195.
Lewellyn, K. B., Judge, W. Q., & Smith, A. (2017). Exploring the questionable academic practice of conference paper double dipping. Academy of Management Learning & Education, 16(2), 217–236.
Leydesdorff, L. (1987). Towards a theory of citation? Scientometrics, 12(5–6), 305–309.
Li, Y., Youtie, J., & Shapira, P. (2015). Why do technology firms publish scientific papers? The strategic use of science by small and midsize enterprises in nanotechnology. The Journal of Technology Transfer, 40(6), 1016–1033.
Liu, L., Jones, B. F., Uzzi, B., & Wang, D. (2023). Data, measurement and empirical methods in the science of science. Nature Human Behaviour, 7(7), 1046–1058.
Long, J. S. (1997). Regression models for categorical and limited dependent variables (Vol. 7). Thousand Oaks, CA: Sage.
Long, J. S., & Fox, M. F. (1995). Scientific careers: Universalism and particularism. Annual Review of Sociology, 21, 45–71.
MacRoberts, M. H., & MacRoberts, B. R. (1986). Quantitative measures of communication in science: A study of the formal level. Social Studies of Science, 16(1), 151–172.
MacRoberts, M. H., & MacRoberts, B. R. (1987). Testing the Ortega hypothesis: Facts and artifacts. Scientometrics, 12(5–6), 293–295.
Maisonobe, M., Grossetti, M., Milard, B., Jégou, L., & Eckert, D. (2017). The global geography of scientific visibility: A deconcentration process (1999–2011). Scientometrics, 113(1), 479–493.
Matthiessen, C. W., Schwarz, A. W., & Find, S. (2002). The top-level global research system, 1997–99: Centres, networks and nodality. An analysis based on bibliometric indicators. Urban Studies, 39(5–6), 903–927.
Merton, R. K. (1957). Priorities in scientific discovery: A chapter in the sociology of science. American Sociological Review, 22(6), 635–659.
Merton, R. K. (1968). The Matthew Effect in science: The reward and communication systems of science are considered. Science, 159(3810), 56–63.
Merton, R. K. (1973). The normative structure of science. In N. W. Storer (Ed.), The sociology of science: Theoretical and empirical investigations (pp. 267–278). Chicago, IL: University of Chicago Press.
Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., & Dean, J. (2013). Distributed representations of words and phrases and their compositionality. In Proceedings of the 26th International Conference on Neural Information Processing Systems (Vol. 2, pp. 3111–3119).
Milojević, S. (2015). Quantifying the cognitive extent of science. Journal of Informetrics, 9(4), 962–973.
Milojević, S., Radicchi, F., & Walsh, J. P. (2018). Changing demographics of scientific careers: The rise of the temporary workforce. Proceedings of the National Academy of Sciences, 115(50), 12616–12623.
Moravcsik, M. J., & Murugesan, P. (1975). Some results on the function and quality of citations. Social Studies of Science, 5(1), 86–92.
Oransky, I., & Marcus, A. (2021). Top 10 most highly cited retracted papers.
Osterloh, M., & Frey, B. S. (2020). How to avoid borrowed plumes in academia. Research Policy, 49(1), 103831.
Pan, R. K., Kaski, K., & Fortunato, S. (2012). World citation and collaboration networks: Uncovering the role of geography in science. Scientific Reports, 2, 902.
Pfeifer, M. P., & Snodgrass, G. L. (1990). The continued use of retracted, invalid scientific literature. JAMA, 263(10), 1420–1423.
Price, D. J. de Solla. (1963). Little science, big science. Columbia University Press.
Reich, E. S. (2009). Plastic fantastic: How the biggest fraud in physics shook the scientific world. New York, NY: Palgrave Macmillan.
Retraction Watch. (2019). The Retraction Watch database.
Schneider, J., Ye, D., Hill, A. M., & Whitehorn, A. S. (2020). Continued post-retraction citation of a fraudulent clinical trial report, 11 years after it was retracted for falsifying data. Scientometrics, 125(3), 2877–2913.
Shibayama, S., & Baba, Y. (2015). Impact-oriented science policies and scientific publication practices: The case of life sciences in Japan. Research Policy, 44(4), 936–950.
Simkin, M. V., & Roychowdhury, V. P. (2005). Stochastic modeling of citation slips. Scientometrics, 62(3), 367–384.
Simon, H. A. (1997). Administrative behavior. New York, NY: Simon & Schuster.
Simon, H. A., & March, J. G. (1958). Organizations. Wiley.
Simon, H. A., & Newell, A. (1971). Human problem solving: The state of the theory in 1970. American Psychologist, 26(2), 145–159.
Stinchcombe, A. L. (1968). Constructing social theories. Chicago, IL: University of Chicago Press.
Tahamtan, I., & Bornmann, L. (2019). What do citation counts measure? An updated review of studies on citations in scientific documents published between 2006 and 2018. Scientometrics, 121(3), 1635–1684.
Teplitskiy, M., Duede, E., Menietti, M., & Lakhani, K. R. (2022). How status of research papers affects the way they are read and cited. Research Policy, 51(4), 104484.
Teufel, S., Siddharthan, A., & Tidhar, D. (2006). Automatic classification of citation function. In Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing (pp. 103–110).
Traag, V. A. (2021). Inferring the causal effect of journals on citations. Quantitative Science Studies, 2(2), 496–504.
Tversky, A., & Kahneman, D. (1974). Judgment under uncertainty: Heuristics and biases. Science, 185(4157), 1124–1131.
Unger, K., & Couzin, J. (2006). Even retracted papers endure. Science, 312(5770), 40–41.
Van Noorden, R. (2011). Science publishing: The trouble with retractions. Nature, 478(7367), 26–28.
Vinkler, P. (1987). A quasi-quantitative citation model. Scientometrics, 12(1–2), 47–72.
Von Hippel, P. (2015). Linear vs. logistic probability models: Which is better, and when? Statistical Horizons.
Wager, E., Barbour, V., Yentis, S., & Kleinert, S. (2009). Retractions: Guidance from the Committee on Publication Ethics (COPE). Maturitas, 64(4), 201–203.
Walsh, J. P., & Lee, Y.-N. (2015). The bureaucratization of science. Research Policy, 44(8), 1584–1600.
Walsh, J. P., Lee, Y.-N., & Tang, L. (2019). Pathogenic organization in science: Division of labor and retractions. Research Policy, 48(2), 444–461.
Walters, W. H., & Wilder, E. I. (2023). Fabrication and errors in the bibliographic citations generated by ChatGPT. Scientific Reports, 13, 14045.
West, J. D., & Bergstrom, C. T. (2021). Misinformation in and about science. Proceedings of the National Academy of Sciences, 118(15), e1912444117.
Woo, S. (2024). On the shoulders of fallen giants: What do references to retracted research tell us about citation behaviors? Zenodo. https://zenodo.org/records/10692464
Wooding, S. (2020). Heuristics, not plumage: A response to Osterloh and Frey’s discussion paper on ‘borrowed plumes’. Research Policy, 49(1), 103871.
Woolgar, S. (1991). Beyond the citation debate: Towards a sociology of measurement technologies and their use in science policy. Science and Public Policy, 18(5), 319–326.
Wuestman, M. L., Hoekman, J., & Frenken, K. (2019). The geography of scientific citations. Research Policy, 48(7), 1771–1780.
Zhou, P., & Leydesdorff, L. (2006). The emergence of China as a leading nation in science. Research Policy, 35(1), 83–104.
Zuckerman, H. (1987). Citation analysis and the complex problem of intellectual influence. Scientometrics, 12(5–6), 329–338.
Zuckerman, H., & Merton, R. K. (1971). Patterns of evaluation in science: Institutionalisation, structure and functions of the referee system. Minerva, 9(1), 66–100.

Author notes

Handling Editor: Vincent Larivière

This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. For a full description of the license, please visit https://creativecommons.org/licenses/by/4.0/legalcode.
