The spread of retracted research into policy literature

Abstract Retractions warn users against relying on problematic evidence. Until recently, it has not been possible to systematically examine the influence of retracted research on policy literature. Here, we use three databases to measure the extent of the phenomenon and explore what it might tell us about the users of such evidence. We identify policy-relevant documents that cite retracted research, we review and categorize the nature of citations, and we interview policy document authors. Overall, we find that 2.3% of retracted research is policy-cited. This seems higher than one might have expected, similar even to some notable benchmarks for “normal” nonretracted research that is policy-cited. The phenomenon is also multifaceted. First, certain types of retracted research (those with errors, types 1 and 4) are more likely to be policy-cited than other types (those without errors, types 2 and 3). Second, although some policy-relevant documents cite retracted research negatively, positive citations are twice as common and frequently occur after retraction. Third, certain types of policy organizations appear better at identifying problematic research and are perhaps more discerning when selecting and evaluating research.


INTRODUCTION
In 2020, amid the scramble for research to inform COVID-19 policy, one data set stood out. The Surgisphere database purported to offer real-time patient data from thousands of hospitals across five continents. It was the basis for a series of grand claims in an April 2020 preprint on ivermectin, a May 2020 Lancet publication on hydroxychloroquine, and a May 2020 New England Journal of Medicine publication on angiotensins. By June 2020, all three publications were retracted after it was found that the data were falsified and patients were nonexistent.
Within a day of the Lancet publication, the WHO halted its trials of hydroxychloroquine in response to the paper. A week later, but before the retraction, it reversed its decision, choosing to disregard the evidence in the paper. In contrast, the Peruvian government had included ivermectin in its treatment guidelines, citing the preprint heavily in its white paper. Ivermectin remained on its therapeutic guidelines until 2021, long after the retraction.
The contrasting examples above illustrate our central interests in this study. To what extent do retracted studies influence policy literature, before and after retraction? What might this say about the users of such evidence?
Over the past decade, retraction, an esoteric feature of the scholarly publishing system, has come increasingly into the spotlight. Retractions are intended to prevent the spread of problematic research and help to ensure the integrity of the scientific record (Pfeifer & Snodgrass, 1990). As the number of retracted studies has grown (Brainard, 2018), so too has interest in retractions as a mechanism of scientific self-correction (Steen, Casadevall, & Fang, 2013; Van Noorden, 2011). Guidelines for retraction were proposed (COPE Council, 2009), and an influential blog called Retraction Watch (RW) laid the groundwork for the first comprehensive database of retracted articles (Marcus & Oransky, 2014).
The use of retracted research by other researchers, in the form of scholarly citations, has been one of the most well-studied aspects of retraction. Retraction can take up to several years, during which problematic publications remain unchallenged and can influence other scholars' research direction, methodology, and results (Teixeira da Silva & Bornemann-Cimenti, 2017). Even after retraction, articles often continue to be cited as valid evidence, albeit at a reduced rate (Bar-Ilan & Halevi, 2017; Dinh, Sarol et al., 2019; Furman, Jensen, & Murray, 2012; Redman, Yarandi, & Merz, 2008).
This emphasis on scholarly citations of retracted articles, however, has neglected other important ways in which retracted research may exert an influence. The use of research in policy and practice is one such area, where retracted studies can produce similarly disruptive effects sometimes with life-threatening consequences (Marcus, 2018;Steen, 2011). One reason for the neglect is the relative paucity of databases recording instances where research is used, and cited, in policy literature.
The recent arrival of new databases has opened new possibilities for those seeking a broad estimate of the cognitive influence of retracted research on policy (Tattersall & Carroll, 2018). The emergence of altmetrics has prompted renewed study of research in the media, online communities, and, most notably for our purposes, policy-relevant documents (Haunschild & Bornmann, 2017). So far, there have been few attempts to apply these data sources to the study of retracted research.
We deploy a mixed method approach that makes it possible to analyze the spread of retracted research in policy-relevant documents. By combining data from RW, Overton, and Altmetric, alongside data gathered from interviews, we analyze how retracted articles make their way into policy-relevant documents before and after the retraction, how exactly they are cited, and why this might be happening. Our results suggest that retracted research does indeed creep into policy literature frequently, perhaps even as much as nonretracted research, but it does so unevenly.
Empirical approaches tend to focus on pre- and postretraction citations. Studies have examined the spillover effect of retracted research on adjacent subject fields (Azoulay, Furman et al., 2015), the career productivity of authors and associated authors (Azoulay, Bonatti, & Krieger, 2017; Jin, Jones et al., 2019; Mongeon & Larivière, 2016), or, in the case of retracted medical research, the effect on patients' wellbeing (Steen, 2011). These studies show that the impact of continued citation of retracted studies is considerable.
To understand why retracted research continues to spread even after retraction, one line of inquiry has examined postretraction citation context, and specifically whether a citation acknowledges the retraction (Decullier & Maisonneuve, 2018; Fulton, Coates et al., 2015; Gray, Al-Ghareeb, & McKenna, 2019; Moylan & Kowalczuk, 2016; Vuong, La et al., 2020). Early work by Budd, Sievert, and Schultz (1998) showed that only a fraction of citations acknowledged the retraction, whereas the majority implicitly or explicitly used retracted research as valid evidence. This finding has been confirmed across different fields, time periods, and data sources (Bar-Ilan & Halevi, 2017). Together, these studies suggest that limited dissemination of retraction information may help explain the persistent citation of retracted studies.
Nevertheless, the retraction signal is at least partially effective. Studies employing careful controls observed a 65% decline in citation rates after retraction (Furman et al., 2012). Other studies support this conclusion, and there seems to be some consensus that citations decrease after retraction (Dinh et al., 2019; Pfeifer & Snodgrass, 1990). The implication remains that if some citations continue to accumulate after retraction, more needs to be done to curb the spread. An analogous phenomenon could be lurking in the policy arena.

Spread of Retracted Research Outside of Academia
Concern about retracted research spreading beyond academic circles has been growing recently. A number of studies have been spurred by the availability of Altmetric data measuring online attention. One reported a positive association between Altmetric Score and shorter time to retraction, implying that problematic studies receiving more online attention tend to be scrutinized more and retracted faster (Shema, Hahn et al., 2019). Jan and Zainab (2018) examined postretraction online mentions and reported the number of citations in different Altmetric sources for a handful of retracted studies. Similarly, Bar-Ilan and Halevi (2018) analyzed how 995 retracted publications were cited in scholarly literature, Twitter, news, blogs, and Wikipedia, and the number of Mendeley readers these publications had. Another study used Altmetric Score to show the broader impacts of retracted research (Feng, Yuan, & Yang, 2020). These studies demonstrate the potential of looking beyond scholarly citations, but they remain small in scale.
The most recent and relevant study investigated Altmetric Attention Scores (AAS) for retracted articles from the RW database and found that retracted articles had higher AAS than unretracted control articles, although for popular articles, preretraction attention far exceeded postretraction attention (Serghiou, Marton, & Ioannidis, 2021). As these examples show, although there has been work combining retraction data with altmetric indicators, most of it used AAS, which mixes attention from very different online sources, from Twitter to YouTube. The heterogeneity of these sources and their different rates of data accumulation (Sugimoto & Larivière, 2018) may undermine the linkage between the indicator and the real-life phenomena it attempts to measure (Small, 1978). In this respect, a focus on a single altmetric source can be warranted. Policy-relevant documents represent one such source, described as among the most high-value altmetric indicators due to their relationship with practice and arguably even societal impacts (Bornmann, Haunschild, & Marx, 2016; Tahamtan & Bornmann, 2020). However, no work in this direction has been undertaken yet.

Measuring Research Use by Tracing Back from Policy Literature
Various methodologies have been proposed to measure research use in policy, including interviews and documentary analysis (Hanney, Gonzalez-Block et al., 2003). Studies based on interviews are common and often attempt to qualitatively assess perceptions of research use among policymakers, identify barriers and facilitators of research use, and understand research selection and appraisal procedures (Elliott & Popay, 2000;Hyde, Mackie et al., 2016;Innvaer, Vist et al., 2002). For example, interviews with health authorities and researchers were used in combination with the analysis of project documentation in a study of research utilization practices in the NHS (Elliott & Popay, 2000). In line with Weiss's interactive model (Weiss, 1979), the authors found that research was often used in indirect ways, rather than to provide answers (Elliott & Popay, 2000). The same conclusion was reached by another interview-driven study on health policies for children in foster care (Hyde et al., 2016). Notably, this study also found that policymakers were producing research evidence themselves in addition to research use in the more standard interactive model (Hyde et al., 2016).
Studies involving documentary analysis often aim to trace back from a starting point to find prior research inputs as evidence of research utilization (Grant, Cottrell et al., 2000;Innvaer, 2009;Kryl, Allen et al., 2012;Newson, Rychetnik et al., 2018;Zardo, Collie, & Livingstone, 2014). As one example, a backward tracing study of transport injury compensation policies in Australia used quantitative content analysis of 128 policy documents (Zardo et al., 2014). By analyzing references to research, the authors found that academic evidence was the least used, and most references drew on clinical evidence and internal policies (Zardo et al., 2014). Another example of documentary backward tracing comes from bibliometric analyses of research publications featured in NICE guidelines (Grant et al., 2000;Kryl et al., 2012). This type of analysis can be useful in identifying potentially interesting features of research used in policy documents. For instance, the studies pointed out that NICE guidelines featured U.K. and U.S. research more than other international studies (Grant et al., 2000;Kryl et al., 2012).
Our search identified only one study that used more detailed citation analysis to evaluate how exactly research is cited in policy documents (Newson et al., 2018). This study offers an important critique of attempts to use policy document analysis for the purpose of linking research to actual policies and especially impacts. The authors note that using policy citations of research, without examining the nature of the citation context, does not shed light on why individual publications are chosen (Newson et al., 2018). One way to address this is mixed method triangulation (Williamson & Johanson, 2018), combining interviews with documentary analysis (Hutchinson, 2011; Nabyonga-Orem, Ssengooba et al., 2014).4

Measuring Research Use by Tracing Forward from Research
A more fundamental shortcoming of backward tracing approaches is that, by selecting on the dependent variable (i.e., use in policy), the approach inherently limits the scope for systematic analysis of research and its characteristics as potential explanatory variables. Over the past decade, several tools have emerged that greatly facilitate forward tracing analysis of research utilization using big data sets of scientific publications, most notably Altmetric and Overton. This section addresses their coverage, along with their possibilities and limitations for the study of research use in policy.
Much of the research involving Altmetric has centered on its relationship with traditional scientometric indicators (Costas, Zahedi, & Wouters, 2015), its potential for research impact assessment (Bornmann, 2014; Tahamtan & Bornmann, 2020), and various analyses of scholarly use of social media (Sugimoto, Work et al., 2017). Little attention has been dedicated to the phenomenon of research use in policy.5 However, two studies that have investigated the coverage of research within Altmetric policy data offer useful parallels for our inquiry. These studies reported that less than 0.5% of all Web of Science publications (Haunschild & Bornmann, 2017) and 1.2% of Web of Science climate change publications (Bornmann et al., 2016) were cited in Altmetric-tracked policy-relevant documents.
Overton, another altmetrics company, was launched in 2018 by the founder of Altmetric and has since provided services to universities, research funding organizations, and NGOs that wish to understand their policy impact (Adie, 2020). Compared to Altmetric, Overton specializes exclusively in policy-relevant documents.6 Its overall coverage was found to be 3.9% (Fang, Dudek et al., n.d.), which was higher than Altmetric's in similar studies (Bornmann et al., 2016; Haunschild & Bornmann, 2017), although Altmetric's coverage may have grown since those studies took place.

4 An elaborate version of such a mixed methodology, called SAGE (Staff Assessment of Engagement with Evidence), was proposed to evaluate how policymakers search for, appraise, and use research to inform policies (Makkar, Brennan et al., 2016; Redman, Turner et al., 2015). What makes SAGE quite unique is that the interviews do not just address research utilization broadly, but focus on the development of a specific policy document that the interviewee was involved in (Redman et al., 2015). The SAGE framework was subsequently used in a large-scale study of research use assessment in Australian health policies (Williamson & Johanson, 2018).

5 Some notable exceptions include the analysis of policy uptake of about 1,500 research publications by authors from the University of Sheffield (Tattersall & Carroll, 2018) and a recent study by the same group on the policy impact of NIHR trials (Carroll & Tattersall, 2020). Most other studies used policy documents along with other altmetric sources and not as a standalone indicator.

6 Overton's coverage is also qualitatively different from Altmetric's and is said to include more policy documents related to economics (Adie, personal communication, October 30, 2020). An unpublished study by Leiden University conducted an extensive analysis of Overton's coverage of research citations and their distribution across various subject fields (Fang et al., n.d.).
The study reported that publications in social sciences and humanities had the highest coverage in Overton, with life and earth sciences being the second and biomedical and health sciences the third (Fang et al., n.d.). Another recent analysis used Overton to track the coevolution of COVID-19 research and policy (Yin, Gao et al., 2021). Overton has also been used in a study suggesting that cross-disciplinary research is more often utilized by policy documents (Pinheiro, Vignola-Gagné, & Campbell, 2021).
Together, these studies offer a helpful set of benchmarks, ranging between 0.5% and 3.9%, for the share of normal nonretracted research publications that are cited in policy-relevant documents. A priori, we would expect retracted research to be cited in policy-relevant documents much less than this, given what we know so far about the impact of retraction on citations. However, the phenomenon may be more multifaceted and varied for policy citations, because policy users and research users are likely to have different citation practices.

Data Collection
We began by collecting retracted publications drawn from the RW database in May 2020 along with their publication title, date, journal, author names and affiliations, unique identifiers (DOI and PubMed ID), and retraction-specific information, such as retraction date and reason. We also collected citation counts via the Crossref API for each retracted publication.
We then matched these retracted publications, using their DOIs, with two databases of policy-relevant documents: Altmetric and Overton. We retrieved policy-relevant documents that cited RW articles in November 2020 using researcher API access. Combining data from these two sources improved matching with RW publications, but also presented some challenges 7 .
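The DOI-based matching step can be sketched as follows. This is a minimal illustration, not the authors' code: the record layout and field names are hypothetical, and in practice duplicate documents across the two databases also required manual disambiguation.

```python
def match_citations(rw_dois, altmetric_records, overton_records):
    """Combine policy citations from two databases, keeping only those
    that cite a DOI present in the Retraction Watch (RW) set, and
    dropping exact duplicates of (policy_doc_id, cited_doi) pairs.
    Records are hypothetical (doc_id, doi) tuples."""
    rw = {d.lower() for d in rw_dois}          # DOIs are case-insensitive
    seen, merged = set(), []
    for source, records in (("altmetric", altmetric_records),
                            ("overton", overton_records)):
        for doc_id, doi in records:
            key = (doc_id, doi.lower())
            if doi.lower() in rw and key not in seen:
                seen.add(key)
                merged.append({"source": source, "doc": doc_id, "doi": doi.lower()})
    return merged

# Tiny illustration with made-up identifiers:
rw_dois = ["10.1000/abc", "10.1000/xyz"]
alt = [("polA", "10.1000/ABC"), ("polB", "10.2000/other")]
ovr = [("polA", "10.1000/abc"), ("polC", "10.1000/xyz")]
matches = match_citations(rw_dois, alt, ovr)
print(len(matches))  # polA's citation appears in both databases but counts once
```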
Next, we downloaded all policy-relevant documents in PDF format following URL links provided by the policy databases. When we could not download a document using the link, we attempted to retrieve it from alternative sources. If the link led to a web page with multiple PDF documents, all of them were checked to identify the correct one. When these procedures failed, the document was labeled as "not found." Last, we removed documents that were not in English or were clearly not policy documents.8

Data Analysis
We adopted the citation as the main unit of analysis, with supplementary analyses at the level of policy-relevant documents and retracted publications. This is because individual policy-relevant documents could contain references to multiple retracted publications, and conversely, individual retracted publications could be referenced in multiple policy-relevant documents.
Following research in citation context analysis (Bornmann & Daniel, 2008; Tahamtan & Bornmann, 2019), we assume that policy-relevant documents can also cite research either positively or negatively. We categorized policy citations of retracted articles as either positive/neutral or negative/exclusionary, and as either acknowledging the cited publication as retracted or not. To this end, each citing policy document was reviewed to locate the citation within the text and bibliography and to assign the codes. The complete categorization manual can be found in the Supplementary material.

7 Duplicate policy-relevant documents from different databases could not always be identified using automated approaches, as their titles, URLs, and even policy organizations could differ depending on the database of origin. Therefore, additional cleaning, document disambiguation, and sorting out of documents in other languages had to be done manually.

8 Such nonpolicy documents were also identified by other researchers during a recent qualitative analysis of Overton (Pinheiro et al., 2021). These documents can be considered data artifacts of policy-tracking tools and included conference programs, personal CVs, and other documents where the reference had little to do with the actual content of the document.
Coding of the full data set was done by one author, and a random 10% sample was coded by another author, which made it possible to calculate Cohen's kappa as an intercoder reliability score. Values of Cohen's kappa were interpreted according to the following scale: 0, poor agreement; 0.01 to 0.20, slight; 0.21 to 0.40, fair; 0.41 to 0.60, moderate; 0.61 to 0.80, substantial; 0.81 to 1.00, almost perfect (Landis & Koch, 1977).
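The reliability computation can be sketched in a few lines; the coder labels below are invented for illustration, not drawn from the study's data.

```python
from collections import Counter

def cohens_kappa(coder1, coder2):
    """Cohen's kappa for two coders' labels on the same items:
    (observed agreement - chance agreement) / (1 - chance agreement)."""
    assert len(coder1) == len(coder2)
    n = len(coder1)
    observed = sum(a == b for a, b in zip(coder1, coder2)) / n
    c1, c2 = Counter(coder1), Counter(coder2)
    expected = sum(c1[k] * c2[k] for k in set(c1) | set(c2)) / (n * n)
    return (observed - expected) / (1 - expected)

def landis_koch(kappa):
    """Interpretation scale from Landis & Koch (1977)."""
    for bound, label in [(0.0, "poor"), (0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= bound:
            return label
    return "almost perfect"

# Hypothetical labels for a 10-item sample, two coders:
a = ["pos", "pos", "neg", "pos", "neg", "pos", "pos", "neg", "pos", "pos"]
b = ["pos", "pos", "neg", "neg", "neg", "pos", "pos", "neg", "pos", "pos"]
k = cohens_kappa(a, b)
print(round(k, 2), landis_koch(k))  # prints: 0.78 substantial
```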
We calculated time from publication to retraction, from publication to policy, and from retraction to policy in full years for each pair of documents. We used time from retraction to policy to distinguish between pre- and postretraction citations. We analyzed the distribution of citation types and retraction acknowledgment before and after the retraction.9 Although some studies of retracted literature used citation time windows to account for publishing delays, no such measures were adopted in this study. The main reason is that, in contrast to scholarly literature, publication delays in policy literature are likely to exhibit wider variation. Some policy documents (e.g., policy briefs) can be expected to appear within days, but others (e.g., guidelines or committee reviews) can take years. Therefore, citations received during the year of retraction were treated as a "grey area," falling neither before nor after the retraction.
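The grey-area rule described above amounts to a three-way classification on the year difference; a minimal sketch (function name and labels are ours, not the authors'):

```python
def retraction_interval(retraction_year, policy_year):
    """Classify a policy citation relative to the retraction year.
    Citations in the retraction year itself fall into a 'grey area'
    because policy publishing delays vary too widely to resolve them."""
    delta = policy_year - retraction_year  # time from retraction to policy, in full years
    if delta < 0:
        return "preretraction"
    if delta == 0:
        return "grey area"
    return "postretraction"

print(retraction_interval(2020, 2018))  # preretraction
print(retraction_interval(2020, 2020))  # grey area
print(retraction_interval(2020, 2023))  # postretraction
```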
We analyzed retraction reasons using data from the RW database. However, because each publication in RW could be assigned several reasons from a list of more than 80, the analysis relied on the methodology of Feng et al. (2020), whose approach groups retracted publications into four retraction types (Table 1). A detailed explanation of the procedure is available in the Supplementary material.
We used organization data obtained from policy databases for further analyses. Policy organization names were extracted and disambiguated separately. We categorized each organization into one of four types: Government, IGO, NGO/Think Tank, or Aggregator. We adopted this classification from Overton, where it was already applied to some organizations. The extra coding was mainly aimed at classifying policy-relevant documents from Altmetric, where no such classification existed in the data.
We reported data as counts and percentages when categorical, and as medians and interquartile ranges when continuous. We conducted a Wilcoxon rank-sum test to compare the groups.

9 We note that data from policy databases were not always reliable with respect to the publication date of policy documents. Presumably, when parsing policy repositories, policy-tracking tools gather these metadata based on the date when the document was deposited in a policy repository. This date did not always correspond to the exact date on the document, which affected the subsequent calculation of such parameters as time from retraction to policy. Therefore, when the retrieved policy date differed from the date on the document, we made a correction in favor of the latter.
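The descriptive side of this reporting, medians and interquartile ranges, can be sketched with the standard library alone (in practice the rank-sum test itself would typically be run with a statistics package such as scipy's `ranksums`; the sample values below are invented):

```python
import statistics

def summarize(values):
    """Median and interquartile range (Q1, Q3) for a continuous variable,
    using the inclusive quartile method."""
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return med, (q1, q3)

# Hypothetical times from publication to retraction, in full years:
years_to_retraction = [1, 2, 2, 3, 4, 5, 7, 10]
med, (q1, q3) = summarize(years_to_retraction)
print(f"median {med}, IQR {q1}-{q3}")  # prints: median 3.5, IQR 2.0-5.5
```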

Interviews
To explore specific types of citing behavior that might lead to positive postretraction citations, we conducted 30-minute semistructured interviews with authors of policy reports who positively cited retracted publications after the retraction. We sent invitations to prospective participants, conditional on the availability of author names in corresponding policy reports and the availability of authors' email addresses in the public domain. This resulted in 61 email invitations and 10 interviews, with interviewees drawn from the United States, Canada, Australia, New Zealand, France, Switzerland, Germany, and the United Kingdom, and representing a range of policy organizations including IGOs, NGOs, and government agencies.
Our interviews aimed to address aspects of research use in policy in relation to specific documents, focusing on research engagement actions and types of research use (see Makkar et al., 2016). We added questions regarding the interviewee's familiarity with retractions and the context of the particular citation. We transcribed the interviews and coded them using NVivo software to identify recurrent topics.

RESULTS

Of the 21,424 unique publications in the RW database, we found 16,095 unique publications with DOIs. Data retrieval from Altmetric produced 167 publication matches, which amounted to 1% coverage of RW publications. These publications were cited 305 times in 266 documents. In turn, Overton provided a higher matching rate, with 437 matched publications (2.7% coverage) cited 852 times in 731 policy documents. Additional curation of the merged Altmetric and Overton data removed duplicates (n = 199), non-English documents (n = 219), documents that could not be downloaded or where correct citations were not found (n = 71), and cases that were not actual policy documents (n = 24).
The clean data set amounted to 367 (2.3% coverage) publications cited 644 times in 563 policy documents. The flowchart outlining the entire procedure is shown in Figure 1.
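The coverage percentages above follow directly from the reported counts and can be recomputed as a sanity check:

```python
# Counts reported in the text:
rw_with_doi = 16095   # RW publications with DOIs
altmetric_matches = 167
overton_matches = 437
clean_matches = 367   # after deduplication and curation

for label, n in [("Altmetric", altmetric_matches),
                 ("Overton", overton_matches),
                 ("clean data set", clean_matches)]:
    print(f"{label}: {round(100 * n / rw_with_doi, 1)}%")
# prints: Altmetric: 1.0%, Overton: 2.7%, clean data set: 2.3%
```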
We compared the 367 retracted publications cited in policy documents with the entire population of publications with DOIs in the RW database. We found significant (p < 0.001) differences between these two groups. Retracted publications cited in policy tended to have a longer time to retraction, more Crossref citations, and fewer type 3 retractions (no error and no misconduct). In fact, 98% of policy-cited publications had one or more scholarly citations, as opposed to 66% in the complete RW data. Summary statistics comparing complete RW data with the sample are presented in Table 2.
We categorized citations of retracted publications in policy documents as either positive/neutral or negative/exclusionary, and as either acknowledging the retraction or not. Cohen's kappa score was 0.88 for the first variable and 0.78 for the second, indicating at least substantial agreement.
The analysis of 644 citations in 563 documents is summarized in Table 3 as a comparison between intervals relative to retraction. We found significant (p < 0.001) differences between the groups with respect to time variables, Crossref citations, and retraction acknowledgment.
We also compared positive/neutral and negative/exclusionary citations, presented in Table 4. We found significant (p < 0.001) differences with respect to time variables, citation period, retraction citation types, and policy organization types. More citations were identified after the retraction than before, given that citations during the retraction year were counted separately. The distribution of citation types across time from retraction to policy document is visualized in Figure 3. The pattern indicates that positive citations declined after the retraction, whereas negative citations increased. The presence of negative citations in the years preceding the retraction is notable, as it indicates that some documents questioned or criticized problematic research before it was retracted. However, the persistence of positive citations after the retraction is equally notable and points to potential problems with the dissemination of retraction information. This observation is reinforced by the distribution of citations that acknowledged the retraction across the same time intervals (Figure 4). The data suggest that even when retracted publications were cited negatively, their publication status was correctly acknowledged in only 48% of cases. Notably, some policy documents began to acknowledge retractions during the retraction year. This supports the idea that although publishing delays exist in the policy arena, it is not uncommon for documents to be published quickly, in the same year as the actual writing and citing were done.

Are Different Types of Retracted Research Cited Differently in Policy Documents?

Figure 5 shows whether publications with different retraction types tended to be cited positively or negatively over time. The most severe Type 1 (error and misconduct) publications began to be cited negatively as early as 10 years before the retraction and continued to accumulate negative citations more than 10 years after. In contrast, Type 3 (no error and no misconduct) publications were barely cited negatively at all. Lastly, Type 4 (error and no misconduct) publications attracted the highest number of positive citations prior to retraction and continued to receive positive citations long after.

There were 146 policy organizations in the combined Altmetric-Overton data, including organizations with parent-daughter relationships and identical organizations listed under different names. After accounting for these differences, the count decreased to 98 unique organizations, which were classified into policy organization types. Summary statistics for the distribution of organization types between citation intervals and citation types can be found in Tables 3 and 4, respectively. The most frequent type was Government (46%), followed by NGO/Think Tank (24%), IGO (16%), and Aggregator (12%), which reflects the coverage of organization types in Overton data. We found no significant differences between organization types with respect to citation intervals from retraction. However, with respect to citation types, government organizations cited retracted research more negatively, and the opposite was true for the other types. Figure 6 provides an overview of how documents authored by different organization types cited retracted publications before and after the retraction. The visualization highlights the shift toward negative citations after the retraction for government organizations. Other types tended to cite retracted research positively both before and after the retraction, with IGOs and Aggregators showing the most pronounced pattern.

Can Selection and Appraisal by Authors of Policy Reports Explain This Citing Behavior?
From the interviews, we identified recurrent themes relating to research selection and appraisal in policy and spread of retracted research. Many of the interviewees (n = 8) identified as academics in either the present or the past. In several cases participants were reluctant to describe their work as policy documents, referring to them instead as policy-relevant research. When asked about research utilization in policy, all interviewees (n = 10) acknowledged looking for scientific research to inform their documents. However, most (n = 7) also mentioned that this process was intuitive and not steered by any guidelines for research selection and appraisal. Only two participants mentioned such guidelines in their organizations.
The interviewees also pointed out that they used Google Scholar as their primary search tool (n = 5), sometimes because they lacked access to subscription databases (n = 3). They also reported using bibliographic managers (n = 3) to keep track of the literature.
Another part of the interview addressed retracted research and its inadvertent spread in policy documents. Respondents acknowledged that retractions can be easy to overlook (n = 5) and attempted to identify reasons why positive citations might slip in. For example, some mentioned that checking references is too time consuming (n = 4) and that working with unfamiliar topics can make it harder to spot retracted articles (n = 3). Several reasons revolved around the inability to access up-to-date information about the retraction. Among such reasons were the habit of reusing articles from personal libraries, where the status of a publication could not be updated (n = 2), picking references from offline sources (n = 2), and publisher paywalls (n = 1).
In explaining the context of citations of retracted articles in their documents, some authors mentioned that the reference was central to the argument. Others, however, emphasized that it was included in passing and had no bearing on the actual conclusions of the document. One interviewee acknowledged becoming aware of the retraction during the proofreading stage, but chose not to amend the reference, because the retraction reason did not entail any implications for the document's message.
Finally, the interviewees were asked to think of possible solutions for preventing the spread of retracted research. The most frequent answers called for a user-friendly tool to check bibliographic references (n = 6) or an accessible and searchable database of retracted publications (n = 3). Other respondents emphasized the importance of proactive dissemination of retraction information on the part of publishers (n = 2), including through constant data sharing with various repositories that might host copies of retracted publications without any identification (n = 1). Overall, the proposed solutions revolved around technical means aimed at improving the availability and accessibility of retraction information (see, for example, Zotero (2019) and Scite (2020)), as well as measures to improve general citation customs among researchers and policymakers alike. Some recent initiatives addressing the spread of retracted research have singled out policy and practice as a separate area of concern (RISRS, 2020).^10

^10 The Reducing Inadvertent Spread of Retracted Science (RISRS, 2020) working group recommendations include (a) ensuring that retraction information is easy to find and use, (b) producing a retraction taxonomy and metadata standards that can be adopted by all stakeholders, (c) developing best practices for coordinating retractions, and (d) promoting research literacy on how to deal with retractions and postpublication stewardship of the scientific record (Schneider, Woods et al., 2021).

Figure 6. Citation types before and after the retraction for different organization types.
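In its simplest form, the user-friendly reference-checking tool that interviewees called for could look like the sketch below. This is purely illustrative: the DOIs and function names are invented placeholders, and a real tool would query a maintained retraction database rather than a hard-coded set.

```python
# Minimal illustrative sketch of a reference-checking tool: flag
# bibliography entries whose DOI appears in a list of retracted DOIs.
# All DOIs here are invented placeholders; a real tool would query a
# maintained retraction database instead of a local set.

RETRACTED_DOIS = {"10.1000/retracted-example"}

def flag_retracted(bibliography):
    """Return the subset of (title, doi) entries whose DOI is retracted."""
    return [(title, doi) for title, doi in bibliography
            if doi.lower() in RETRACTED_DOIS]

bib = [
    ("A sound study", "10.1000/fine-example"),
    ("A withdrawn study", "10.1000/retracted-example"),
]
for title, doi in flag_retracted(bib):
    print(f"WARNING: '{title}' ({doi}) has been retracted")
```

A production version of such a checker would also need to resolve DOIs case-insensitively across databases and re-check saved libraries periodically, since retraction status can change after a reference is first collected.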

Extent of the Spread of Retracted Research in Policy Literature
Our measure of the extent to which retracted research is cited in policy documents is based on matching publications in the RW database with policy databases. We estimate that 2.3% of retracted research publications are cited in policy-relevant documents. This seems higher than one might have expected for retracted research, similar even to the share of nonretracted research that is normally cited in policy literature, in the region of 0.5-3.9% (see Bornmann et al., 2016; Fang et al., n.d.; Haunschild & Bornmann, 2017). However, this observation does not account for the distinction between negative and positive citations or between pre- and postretraction citations. Nevertheless, it provides some information on how retracted publications compare to normal publications when it comes to overall use in policy documents.
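The matching step described above can be sketched as a simple intersection of DOI sets. This is a hypothetical reconstruction for illustration, not the actual pipeline used in the study; the function and sample DOIs are invented.

```python
# Illustrative sketch (not the study's actual pipeline) of the matching
# step: estimate the share of retracted publications that receive at
# least one policy citation by intersecting DOI sets. The sample DOIs
# below are invented placeholders.

def policy_cited_share(retracted_dois, policy_cited_dois):
    """Fraction of retracted DOIs that appear in policy documents."""
    retracted = {doi.lower() for doi in retracted_dois}
    policy = {doi.lower() for doi in policy_cited_dois}
    if not retracted:
        return 0.0
    return len(retracted & policy) / len(retracted)

retracted = ["10.1000/r1", "10.1000/r2", "10.1000/r3", "10.1000/r4"]
policy = ["10.1000/r2", "10.1000/other"]
print(f"{policy_cited_share(retracted, policy):.1%}")  # 25.0%
```

In practice, matching on DOIs alone undercounts, since policy databases record many references without DOIs; fuzzy matching on titles and authors would be needed for fuller coverage.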
We also find that retracted studies cited in policy documents seem to differ from those not cited in policy documents. Retracted research cited by policy documents is older, with publication dates no later than 2014. This is consistent with the assumption that policy citations accumulate over longer periods of time (Fang et al., n.d.), which makes policy citations more like conventional scholarly citations and unlike other altmetric indicators.
In addition, policy-cited articles have a longer time to retraction. This creates a longer window of opportunity for preretraction citations, allowing an article to more easily gather momentum ("citation inertia") that continues after the retraction (Hagberg, 2020). The same effect could partially explain the significantly higher Crossref citation counts for the policy-cited sample. An additional explanation could be that policy document authors tend to select research with higher citation counts, as was shown to be the case, for example, with NICE clinical guidelines (Kryl et al., 2012). Interviewees also emphasized that they relied on citation counts and journal-based metrics when evaluating the quality of research to be cited.
Retraction type is another parameter on which policy-cited publications differ from the overall RW population. The higher relative incidence of Type 4 (error and no misconduct) and lower incidence of Type 3 (no error and no misconduct) publications in the policy sample could be underpinned by several explanations. Type 3 publications are retracted faster than other types and accumulate few Crossref citations. It could be hypothesized that, because these publications are retracted for administrative reasons, they are identified for retraction more rapidly and have little time to attract interest among scholars and policymakers. The higher relative rate of Type 4 publications among policy-cited publications is notable, but not straightforward to interpret based on available data.

Citation Context
Citation context and temporal analysis help to understand how exactly retracted publications are cited once selected by authors of policy reports. A somewhat counterintuitive outcome of this analysis is that the number of policy citations increases slightly after retraction. One interpretation is that positive citations continue to accrue, but the retraction signal then draws in negative citations as well. Citations during the retraction year itself make interpretation more challenging. Nevertheless, as a general conclusion, the dynamic of pre- and postretraction citation accumulation in policy literature seems to differ from scholarly literature, where citations often decline after the retraction (Dinh et al., 2019).
A higher rate of postretraction citations does not necessarily in itself indicate that there is a problem. Citation context analysis shows that negative citations increase sharply after the retraction at the expense of positive citations. Perhaps more troubling is that positive citations remain higher relative to negative ones even after the retraction and can keep accumulating for more than 20 years. This points to potential issues with how retraction information is communicated by publishers. Authors of policy reports could also be more vulnerable to this weakness than professional researchers. Although researchers often have access to subscription databases and could be more knowledgeable about how specific journals and publishers display retraction information, this is not necessarily the case with authors of policy reports. Even when authors of policy reports identify as researchers, their work in policy context may push them into less familiar topics, which can make it challenging to identify retraction information.
It is worth mentioning that the distribution of citations among authors and retracted publications is unsurprisingly skewed. It has long been demonstrated that a few researchers are responsible for a disproportionately high number of publications (Lotka, 1926; Price, 1964). The same logic applies to citations, with only a handful of publications drawing most citations. It does not come as a surprise, therefore, that retracted research behaves in the same way.
With 56 policy citations, the Wakefield et al. (1998) study is an illustrative example. Most policy documents, however, cited this study to deliberately highlight its disruptive influence on public health and vaccination attitudes. This negative portrayal is consistent with how the study is cited in scholarly literature. In particular, Suelzer et al. (2019) conducted an exhaustive analysis of 1,153 articles citing the Wakefield study from its publication in 1998 to 2019 and discovered that the share of negative citations increased substantially after the partial retraction in 2004 and further after the full retraction in 2010. Overall, negative citations were found to account for 72.7% of all citations, and the authors concluded that because the case is so well known in the academic community, positive citations are extremely unlikely (Suelzer et al., 2019). The same seems to be the case with policy citations of this study.
A contrasting example is the study with the second-highest number of policy citations. This study also had the most scholarly citations of all publications in RW as of December 2020 (Retraction Watch, 2020). It was published in and subsequently retracted from The New England Journal of Medicine (Estruch et al., 2018b) because of a methodological flaw and its language implying a direct causal relationship between the Mediterranean diet and various health benefits, a claim that turned out to be unsubstantiated (McCook, 2018). Nevertheless, after accounting for methodological irregularities, the study was republished with softer language, but similar conclusions (Estruch et al., 2018a). Our analysis shows that policy documents cited it mostly positively. However, because the study was republished with similar conclusions, these positive citations are unlikely to represent serious risks. Both Wakefield's and Estruch's publications therefore illustrate the importance of citation skewness and citation context, because both positive and negative citations can have widely different meanings depending on retraction circumstances.
Interviews particularly helped to shed light on this contextual information. For example, one interviewee became aware of the partial retraction of a cited publication during the proofreading process but chose to go ahead with the publication as it was. After reading the retraction notice, the authors realized that the partial retraction actually helped support their statement. The interviewee summarized the reasoning as follows:

It's important actually to think about what has been retracted and what that means … if the whole article has been retracted because all of the findings were found to be faulty or nonverifiable, that is critical … If it is a study that has many different findings and one of them has been changed because the confidence interval was found to have been reported inaccurately … I think it is somewhat different. It is still important to pay attention to, but the implications of citing the original versus the revised are less significant. (Interview 5)

Several other interviewees also expressed the idea that they may still cite a retracted paper if it is retracted for administrative reasons or minor errors. However, they always acknowledged that it might be better to avoid citing retracted research altogether for reputational reasons. One interviewee emphasized these risks:

Since international organizations work in the political context we are sensitive to image and credibility, and even if there is a good part of a retracted publication that was not the focus of the retraction, we'll probably not cite it. (Interview 4)
This points to a larger problem with the current retraction process. Rather than retracting problematic publications entirely, some have proposed that publishers consider a wider amendment framework so that some cases can still be cited with reservations (Barbour, Bloom et al., 2017; Fanelli, Ioannidis, & Goodman, 2018). This could perhaps even accommodate some disagreement about retraction decisions.^11 Similarly, it remains an open question how policy organizations respond to discovering that their literature partly rests on retracted research, as our interviewees have indicated the need for a nuanced view of how retraction affects the policy document as a whole.

Retraction Awareness and Research Appraisal by Policy Authors
Authors of policy reports are often simply not aware of the retraction. We found a continuing flow of positive citations and, even where citations were negative, almost half of them did not mention the retraction. Poor acknowledgment of retractions in both scholarly and policy literature could reflect noncompliance with retraction guidelines.^12 Compliance with these recommendations has improved in recent years, but they are still not followed consistently (Decullier & Maisonneuve, 2018). Another possibility is that authors may feel it is not necessary to mention retractions if they have already described the study in highly critical terms ("fabricated," "discredited," etc.).
A further problem associated with the identification of retracted publications is that of data exchange between publishers and the other searchable databases and repositories that can host versions of a publication. It has been reported that Google Scholar, Scopus, Web of Science, and Embase, among others, do not consistently warn their users about retractions (Bakker & Riegelman, 2018; van der Vet & Nijveen, 2016; Wright & McDaid, 2011). The situation is worse with informal platforms and repositories. Publication copies deposited on nonpublisher platforms, such as Sci-Hub, almost never provide retraction information, while remaining easily accessible (Mine, 2019).

^11 Those who disagree with a retraction decision may continue to cite the study, perhaps even without mentioning the (unjustified) retraction.

^12 The COPE guidelines specify what information a retraction notice should include and how a retracted paper is to be identified (COPE Council, 2009). These recommendations include clearly stating the retraction reason in the notice, linking the notice to the retracted paper, and including retraction information on the web page of the retracted paper. It is also recommended that the publication PDF be identified with watermarks that make the retraction status obvious. Yet retraction notices often provide a vague and insufficient description of retraction reasons (Yan et al., 2016). Information presented in retraction notices is often arbitrary and not standardized (Azoulay et al., 2015; Nair et al., 2020). Additional inconsistencies exist with respect to how different publishers identify retracted research on their websites, either with a watermark, vignette, or plain text, with or without the date of retraction and a link to the retraction notice (Yan et al., 2016).
The accessibility of retraction information is further complicated by the use of personal libraries and reference managers. This software is extremely useful to researchers but can lead to situations in which retracted publications survive unamended in personal libraries and are then used by unsuspecting researchers (Teixeira da Silva & Dobránszki, 2017). Most of these factors were also mentioned by the interviewees as explanations for the incidence of positive postretraction citations in policy documents.
A reassuring sign is that at least some citing authors question problematic articles even before they become retracted. Most negative citations (both pre- and postretraction) originate from government organizations. These organizations may have "in-house" research departments and, as our interviews suggested, those citing the research often identify as researchers themselves with considerable expertise in the material that they cite. The bulk of negative postretraction citations from these organizations are of the exclusionary kind and usually appear in a special section of the bibliography along with other excluded studies and reasons for exclusion. These sections rarely refer to retraction as the reason for exclusion, even for postretraction citations. Rather, they often pinpoint actual problems with methodology or data. This is consistent with the notion that negative citations could be more likely to come from users who are familiar with the research field of the retracted study in question. A fruitful line of further inquiry, then, could be to explore the proximity of expertise between the citing policy organization and the cited evidence, to the extent that these are observable and measurable features of evidence-based policymaking.

Conclusions
Concern about retractions has been growing, with dozens of studies exploring the causes and consequences of the increasing retraction rate over the past decade. Persistent citation of retracted research in scholarly literature has dominated the agenda. However, alternative routes by which retracted research can exert influence have remained underexplored. Building on prior work in retraction studies and research use, this study has shown some of the potential for analyzing the spread of retracted research into policy documents.
Studies on the spread of retracted research emphasize a variety of measures that need to be put in place to mitigate the problem. It could be argued that most measures that could prevent the spread of retracted evidence in scholarly literature would also be effective for policy literature. However, this work suggests that due to the nature of research selection and appraisal in the policy context, authors of policy reports could be even more vulnerable to retracted research.