Abstract
This paper presents an analysis of the Overton policy document database, describing the makeup of materials indexed and the nature in which they cite academic literature. We report on various aspects of the data, including growth, geographic spread, language representation, the range of policy source types included, and the availability of citation links in documents. Longitudinal analysis over established journal category schemes is used to reveal the scale and disciplinary focus of citations and determine the feasibility of developing field-normalized citation indicators. To corroborate the data indexed, we also examine how well self-reported funding outcomes collected by UK funders correspond to data indexed in the Overton database. Finally, to test the data in an experimental setting, we assess whether peer-review assessment of impact as measured by the UK Research Excellence Framework (REF) 2014 correlates with derived policy citation metrics. Our findings show that for some research topics, such as health, economics, social care, and the environment, Overton contains a core set of policy documents with sufficient citation linkage to academic literature to support various citation analyses that may be informative in research evaluation, impact assessment, and policy review.
1. INTRODUCTION
The premise that academic research leads to wider social, cultural, economic, and environmental benefits has underpinned our investment in publicly funded research since the 1950s (Bush, 1945). It was broadly accepted that research leads to positive outcomes (Burke, Bergman, & Asimov, 1985), but this belief was further scrutinized as technical analyses were developed to unpick the exact nature and scale of these impacts (Evenson, Waggoner, & Ruttan, 1979). The types of evaluation became more varied and complex as the investigators focused on specific domains (Hanney, Packwood, & Buxton, 2000; van der Meulen & Rip, 2000), taking into account the myriad ways in which knowledge is generated, exchanged, assimilated, and utilized outside of academia. The general assumption holds that there is a return on investment in research through direct and indirect mechanisms (Salter & Martin, 2001) and the most recent literature reviews (Bornmann, 2013; Greenhalgh, Raftery et al., 2016; Penfield, Baker et al., 2014) provide detailed perspectives on how to identify and differentiate between outputs and outcomes across a range of settings.
Research evaluation also developed to support a greater need for accountability (Thomas, Nedeva et al., 2020): initially, by peer review (Gibbons & Georghiou, 1987), then strategic reorientation (Georghiou, 1995), and recently using more data-driven approaches that incorporate bibliometric components (Adams, Gurney, & Marshall, 2007; Hicks, 2010; Hicks & Melkers, 2013; Martin, 1996). Despite shortcomings in their suitability to judge research quality (Moed, Burger et al., 1985; Pendlebury, 2009), citation indicators became more popular (May, 1997) due to their growing availability, relatively low cost compared with conventional peer review, and ready application to national, regional, and institutional portfolios (BEIS, 2017). Current evaluation programs that consider citation data include: Australia (ARC, 2018), EU (Reinhardt & Milzow, 2012), Finland (Lahtinen, Koskinen-Ollonqvist et al., 2005), Italy (Abramo & D’Angelo, 2015), New Zealand (Buckle & Creedy, 2019), Norway (Sivertsen, 2018), Spain (Jiménez-Contreras, Anegón et al., 2003), United Kingdom (REF2020, 2020) and United States (NIH, 2008).
However, growing use of bibliometric indicators also altered researcher behaviors via corrupted incentives, leading to a variety of negative outcomes (Abramo, D’Angelo, & Grilli, 2021; Butler, 2003; Lopez Pineiro & Hicks, 2015; Yücel & Demir, 2018) and motivating various groups to call for more nuanced and equitable research assessment, such as in the San Francisco Declaration on Research Assessment (DORA) (Cagan, 2013), The Metric Tide report (Wilsdon, Allen et al., 2015), and the Leiden Manifesto (Hicks, Wouters et al., 2015). This has resulted in publishers, research organizations, and funders signing up to the aforementioned initiatives and developing their own policies to ensure metrics are deployed and used responsibly. A key aspect has been a push towards broad recognition of research contributions (Morton, 2015) and a more nuanced use of bibliometric indicators (Adams, Marie et al., 2019).
Throughout this growth and development in the use of metrics, it has become clear that standard citation indicators reflect only the strength of influence within academia and are unable to measure impact beyond this realm (Moed, 2005; Ravenscroft, Liakata et al., 2017). This has led to the exploration of adjacent data sources to provide signals of the wider impact of research, which have been collectively named altmetrics (Priem, Taraborelli et al., 2010). This term refers to a range of data sources that could potentially reveal educational impact (Kousha & Thelwall, 2008; Mas-Bleda & Thelwall, 2018), knowledge transfer (Kousha & Thelwall, 2017), commercial use (Orduna-Malea, Thelwall, & Kousha, 2017), public engagement (Shema, Bar-Ilan, & Thelwall, 2015), policy influence (Tattersall & Carroll, 2018), and more. With access to a broader range of indicators, it may be possible to address some contemporary research evaluation issues by increasing the scope of how research is measured and allowing the full range of research outcomes to be attributed to researchers.
In the area of policy influence, the research underpinning clinical guidelines, economic policy, environmental protocols, etc. is a significant topic of interest. Analysis of the REF2014 Impact Case Study data (Grant, 2015) showed that 20% of case studies were associated with the topic Informing government policy, and 17% were associated with Parliamentary scrutiny, most frequently in Panel C (social sciences). In many cases, evidence cited in case studies included citations to the research from national and international policy organizations. In Unit of Assessment 1 (clinical medicine), 41% of case studies were allocated to the topic Clinical guidance, indicating some use of the academic research in policy setting.
Since 2019, a large database of policy documents and their citations to academic literature has been developed by Overton (see overton.io). As of December 2021, it indexes publications from more than 30,000 national and international sources including governments, think tanks, intergovernmental organizations (IGOs), and charities. The focus of this paper is to evaluate Overton as a potential bibliometric data source using a series of analyses that investigate the makeup of documents indexed (e.g., by geography, language, and year of publication), the network of citations (e.g., volume, distribution, time-lag), and how well data correlate with other impact logging processes (e.g., as reported to funders). An example analysis is also provided to show how Overton data can be used to test whether peer-review scores correlate with derived citation metrics. In doing so, it is our hope to understand more about the potential uses of policy citation data by highlighting which disciplines are most frequently cited and if citation volumes are sufficient to support the development of citation indicators.
2. RELATED WORK
The traditional bibliometric databases, namely the Web of Science (Clarivate), Scopus (Elsevier), Dimensions (Digital Science), Microsoft Academic (Microsoft), and Google Scholar (Google), have been extensively evaluated (Aksnes & Sivertsen, 2019; Chadegani, Salehi et al., 2013; Falagas, Pitsouni et al., 2008; Harzing & Alakangas, 2016; Visser, van Eck, & Waltman, 2021), particularly in terms of cited references (Martín-Martín, Thelwall et al., 2021), subject coverage (Martín-Martín, Orduna-Malea et al., 2018), comparability of citation metrics (Thelwall, 2018), journal coverage (Mongeon & Paul-Hus, 2016; Singh, Singh et al., 2021), classification systems (Wang & Waltman, 2016), accuracy of reference linking (Alcaraz & Morais, 2012; Olensky, Schmidt, & van Eck, 2016), duplication (Valderrama-Zurián, Aguilar-Moya et al., 2015), suitability for application with national and institutional aggregations (Guerrero-Bote, Chinchilla-Rodríguez et al., 2021), language coverage (Vera-Baceta, Thelwall, & Kousha, 2019), regional bias (Rafols, Ciarli, & Chavarro, 2020; Tennant, 2020), and predatory publishing (Björk, Kanto-Karvonen, & Harviainen, 2020; Demir, 2020). The notion of a best data source is partly subjective (i.e., depending on personal preference), but also depends on the type of use (e.g., search and discovery versus bibliometric analysis), discipline, regional focus, and time period in question, and can be influenced by the availability of metadata and links to adjacent data sets (e.g., patents, grants, clinical trials, etc.).
Much like the preference for a bibliographic data source, the choice of citation impact indicator (Waltman, 2016) is highly debatable. It is generally accepted that citations should be normalized by year of publication, discipline, and document type, although whether the calculation should be based on the average of ratios (Opthof & Leydesdorff, 2010; Waltman, van Eck et al., 2011) or ratio of averages (Moed, 2010; Vinkler, 2012) is contentious (Larivière & Gingras, 2011), as is the selection of counting methodology (Potter, Szomszor, & Adams, 2020; Waltman & van Eck, 2015). A suitable sample size is key to producing robust outcomes (Rogers, Szomszor, & Adams, 2020), and choices made with respect to category scheme and indicator should inform the interpretation of results (Szomszor, Adams et al., 2021).
The potential for use of altmetric indicators was initially focused on the prediction of traditional citations (Thelwall, Haustein et al., 2013) and possible correlation with existing indicators (Costas, Zahedi, & Wouters, 2015; Zahedi, Costas, & Wouters, 2014). It was suggested that “little knowledge is gained from these studies” (Bornmann, 2014) and that the biggest potential for altmetrics was toward measurements of broader societal impact (Bornmann, 2015). At this point, the coverage of altmetrics was limited to social media attention (e.g., Twitter and Facebook mentions), usage metrics (e.g., website downloads, Mendeley readers), and online news citations (both traditional and blogs). Comparisons with peer-review assessment (Bornmann & Haunschild, 2018a) revealed that Mendeley readership was the most strongly associated of these with high-quality research, but still much less so than conventional citation indicators. More recent analyses (Bornmann, Haunschild, & Adams, 2019) have incorporated other altmetric indicators, showing Wikipedia and policy document citations to have the highest correlation with REF Impact Case study scores out of the available indicators. Bornmann, Haunschild, and Marx (2016) conclude that “Policy documents are one of the few altmetrics sources which can be used for the target-oriented impact measurement.” To date, Overton data has been utilized in a small number of studies, including an investigation of how cross-disciplinary research can increase the policy relevance of research outcomes (Pinheiro, Vignola-Gagné, & Campbell, 2021), and the interactions between science and policy making during the COVID-19 pandemic (Gao, Yin et al., 2020; Yin, Gao et al., 2021). Most recently, Bornmann, Haunschild et al. (2022) explore how climate change research is cited in climate change policy, uncovering the complexities of how research is translated into policy setting.
Prior work investigating the translation of research through citations in clinical guidelines (Grant, 2000; Kryl, Allen et al., 2012; Newson, Rychetnik et al., 2018) has utilized specific data sources (often requiring significant manual curation) to show their value in evaluating research outcomes. Databases of clinical practice guidelines have emerged (Eriksson, Billhult et al., 2020) to support this specific line of inquiry, and recent work (Guthrie, Cochrane et al., 2019; Pallari, Eriksson et al., 2021; Pallari & Lewison, 2020) utilizes this information to uncover national trends and highlight relative differences in the evidence base used.
Patent data citations are another important data source that have been utilized in studies relating to the wider impact of scientific research (van Raan, 2017), usually for tracking technology transfer (Alcácer & Gittelman, 2006; Meyer, 2000; Roach & Cohen, 2013) or industrial R&D links (Tijssen, Yegros-Yegros, & Winnink, 2016), and often in the context of national assessment (Carpenter, Cooper, & Narin, 1980; Chowdhury, Koya, & Philipson, 2016; Narin & Hamilton, 1996) and convergence research (Karvonen & Kässi, 2013). Notably, recent research casts doubt on the suitability of patent data citations for these purposes (Abrams, Akcigit, & Grennan, 2018; Kuhn, Younge, & Marco, 2020) due to changes in citation behavior and growth in the use of the patent system as a strategic instrument.
3. METHODOLOGY
The Overton database is the primary source of data for this study. It is created by web-crawling publicly accessible documents published by a curated list of over 30,000 organizations, including governments, intergovernmental organizations, think tanks, and charities. Each document is processed to extract bibliographic information (title, authors, publication date, etc.) along with a list of cited references, including those to academic literature as well as other policy documents. Technical details regarding the reference matching process can be found on the Overton website (Overton, 2022). A policy document itself may be composed of multiple items, referred to herein as PDFs because they are the majority format type, such as clinical guidelines (which contain separate documents with recommendations and evidence bases) or when language translations exist. The types of documents vary in nature and include reports, white papers, clinical guidelines, parliamentary transcripts, legal documents, and more, intended for a variety of audiences, including journalists, policy makers, government officials, and citizens. Generally speaking, Overton seeks to index materials written by a policy maker or primarily for a policy maker.
Overton classifies publication sources using a broad taxonomy that is further subdivided by type. Top-level source types are: government, intergovernmental organizations (igo), think tank, and other. Subtypes include bank, court, healthcare agency, research center, and legislative. Each publication source is assigned a geographic location, including country and region (e.g., state or devolved territory). Some sources are classified as IGO (i.e., global reach), or EU (European Union).
For this study, 4,504,896 policy documents (made up of 4,854,919 individual PDFs) citing 3,579,710 unique articles (DOIs) were used. To integrate these data with other sources, all records were converted into the Resource Description Framework (RDF) (Bizer, Vidal, & Weiss, 2018), a semantic web metadata model, and loaded into the GraphDB™ graph database. The following additional data sources were used:
- Crossref: Metadata for all DOIs were extracted from Crossref records, providing titles, source names (i.e., journal), collection identifiers (ISSNs and ISBNs), and publication dates.
- Scopus journal categories: Each journal (n = 19,555), linked via ISSNs to Crossref records, is associated with up to 13 All Science Journal Classification (ASJC) categories, organized in a hierarchy under areas and disciplines. Source: scopus.com.
- REF2014 Case Studies: All publicly available case studies submitted to REF2014 and the associated DOIs mentioned in their references sections. A total of 6,637 case studies were included, linking to 24,945 unique DOIs. Source: impact.ref.ac.uk.
- REF2014 Results: The final distribution of scores awarded in REF2014. For each institution and UoA, scores for Outputs and Case Studies were loaded, expressed as the percentage of outputs in categories 4* (world-leading), 3* (internationally excellent), 2* (internationally recognized), and 1* (nationally recognized). Source: results.ref.ac.uk.
- Gateway to Research (GTR): All funded projects from UKRI Research Councils (n = 123,399), their associated publications (n = 1,015,664), and outcomes categorized as policy outcome (n = 39,406). Source: gtr.ukri.org.
This combination of information allows us to investigate a range of questions that will inform the potential viability of Overton as a bibliometric data source:
What is the makeup of the database in terms of sources indexed by geography, language, type, and year of publication? This analysis will determine, by year of publication, the count of policy documents and PDFs indexed according to source type, region, country, and language. This will reveal potential biases in coverage that would inform suitability for certain types of analysis. Overton does contain locally relevant policy sources, such as regional government publications, but not for all geographies.
How many scholarly references are extracted and over what time period? This will measure the total number of references to DOIs extracted according to policy publication year and source type, and show the count of citations received to DOIs by their publication year according to broad research area. It is important to know how many citations to research articles are tracked because the volume will inform their suitability for citation-based indicator development.
How long does it take research articles to accumulate policy citations and how does this vary across disciplines? This will provide details on how long DOIs take to accumulate citations, both in absolute volume per year, and cumulatively. Research areas and disciplines will be analyzed separately to illustrate any differences and to highlight domains in which citation analysis may be fruitful.
What is the time lag between the publication of scholarly works and their citation within policy literature and how does this vary between disciplines? This will show the distribution from the perspective of citing policy document (i.e., how old are cited references?), and from cited DOI (i.e., when are citations to research articles received?). A sample of policy sources for healthcare agencies and governmental banks is also benchmarked to illustrate feasible comparisons. The range and timeliness of evidence used is an important consideration in policy evaluation and may be possible using the Overton database.
What statistical distribution best models policy citation counts to research articles? This will test the fit of various distributions (e.g., power law, lognormal, exponential) to empirical data using conventional probability distribution plots. Analysis by research discipline and subject will be used to inform potential field-based normalization techniques (i.e., appropriate level of granularity).
How feasible is field-based citation normalization? This will determine if a minimum sample size can be created for each subject category and year for DOIs published between 2000 and 2020. This analysis will highlight subjects that may be suitable for citation metrics and those where insufficient data are available to make robust benchmarks.
Do the citations tracked in the policy literature correlate with policy influence outcomes attributed to funded grants? This will test the correlation between policy influence outcomes reported against funded grants (submitted via the ResearchFish platform to UKRI), and the number of Overton policy citations from DOIs specified as outputs of these projects. Correlations will also be calculated for each subject according to the GTR classification.
Does the amount of policy citation correlate with peer-review assessment scores as reported in the UK REF2014 impact case study data? This will test size-independent correlation (Traag & Waltman, 2019) between normalized policy citation metrics (percentiles) and peer-review assessment (according to 4* rating). Percentiles are calculated based on year of publication and Scopus ASJC subject categories.
To analyze data by research subjects, disciplines, and areas, we utilize the Scopus ASJC journal subject mapping. This is the preferred categorical system for this analysis because it is matched to the highest number of journals in the data set (compared to Web of Science journal categories or the ScienceMetrix journal classification), and offers three levels of aggregation (areas → disciplines → subjects).
4. RESULTS
4.1. What Is the Makeup of the Database in Terms of Sources Indexed (by Geography, Language, Type and Year of Publication)?
The growth of documents indexed in Overton is depicted in Figure 1. Four plots are included: 1(a) the number of documents according to publication source type (government, think tank, igo, and other); 1(b) the number of documents indexed according to publication source region; 1(c) by publication source country (top 20); and 1(d) by publication language (top 20). As mentioned earlier, a policy document may contain multiple PDFs, typically language translations or different parts of a larger report or set of guidelines. The total number of PDFs indexed is shown with a dotted line in Figure 1(a), which also corresponds to the total in Figure 1(d) because PDFs are associated with languages rather than the policy document container (i.e., a single policy document may exist in multiple languages as different PDFs). It should be noted that while there is significant growth in the total number of documents indexed, this does not necessarily correspond to growth in the publication of policy documents overall; it only reflects how many resources are currently discoverable on the web. In this sense, our analysis shows that the availability of data is improving.
To illustrate global coverage, we also supply a map in Figure 2. The map includes an inset showing the number of documents indexed for the top eight regions. Because of the large difference in scale between the number of documents indexed from the United States and from other countries, four color bins are used rather than a straightforward linear gradient.
Clearly, Overton is dominated by policy documents published by sources in the United States, but it also includes significant coverage for Canada, the United Kingdom, Japan, Germany, France, and Australia, with the majority of content originating from governmental sources. The IGO grouping (including organizations such as the WHO, UNESCO, World Bank, and United Nations) and the European Union also make up a sizable portion of the database. In terms of the makeup of sources and languages, Figure 3 is included to show the percentage makeup of documents from the top 30 regions according to source type (left) and language (middle-left). For language, three values are shown: those in English, those in a local language, and those in other languages. For the regions IGO and EU, no local languages are specified. For reference, the total policy document count for each is shown (middle-right, log scale), along with the 2018 count of articles attributed to the country in the SCImago journal ranking.
The balance of source types in each country does vary, with some regions almost entirely represented by governmental sources, such as Japan, Taiwan, Turkey, and Uruguay. The unusually high percentage of documents from Australian sources categorized as other is due to articles indexed from the Analysis & Policy Observatory (also known as APO). Another large aggregator, PubMed Central, is also indexed by Overton (for practice and clinical guidelines), but is attributed to the United States and hence only appears as a small fraction of their output, which is very large overall.
In terms of language balance, many countries have a significant proportion of content in local languages—more than 80% for France, Japan, Switzerland, Netherlands, Brazil, Taiwan, Sweden, Spain, Norway, Peru, Czech Republic, and Denmark. Those that do not are either English-speaking (United States, United Kingdom, Australia, New Zealand) or have strong colonial ties (India and Singapore).
The comparison of Overton content to SCImago article counts is included to show possible over- and underrepresentation. For example, China produces the second largest number of academic articles (after the United States) but is only the eighth most frequently indexed country (excluding IGO and EU) in Overton. In contrast, Peru and Uruguay produce far fewer research articles than Brazil and Chile, but a similar amount of content is indexed in Overton.
4.2. How Many Scholarly References Are Extracted and Over What Time Period?
For each PDF indexed by Overton, references to research literature are identified and extracted. The number of PDFs indexed and the corresponding number of scholarly references extracted are shown for each year in the period 2000–2020 in Figure 4(a). Only references to DOIs are included in this analysis—2,027,440 references to other policy documents are excluded. The left axis (green) shows the totals and the right axis (blue) shows the average number of references per PDF. These data are also broken down by publication source type in Figure 4(b) where the average (mean) is shown for each through the period 2000–2020. The type “other” includes articles from PubMed Central, which would account for the relatively high rate of reference extraction for that source type compared to others, albeit for a small fraction of the database (about 1% of PDFs).
Data are also summarized in Table 1 where each row corresponds to a set of policy PDFs that contain a minimum number of scholarly references. For example, row ≥ 10 counts all PDFs that have 10 or more references to scholarly articles. There are 214,082 of these (4.4% of the corpus), accounting for 8,633,884 reference links, or 89% of references overall. The data indicate that although there are many policy documents that have no references, a core set of documents (approximately 200,000) may contain a sufficient number of references to build useful citation indicators. It is also possible that the documents that have no references may be linked to other entities in Overton, such as researchers, institutions and topics of interest, providing other analytical value.
| Refs. count | PDFs | % PDFs | Total refs. | % Refs. |
|---|---|---|---|---|
| ≥ 0 | 4,854,919 | 100.00 | 9,747,436 | 100.00 |
| ≥ 1 | 570,830 | 11.76 | 9,747,436 | 100.00 |
| ≥ 5 | 305,637 | 6.30 | 9,248,600 | 94.88 |
| ≥ 10 | 214,082 | 4.41 | 8,633,884 | 88.58 |
| ≥ 50 | 38,235 | 0.79 | 4,772,402 | 48.96 |
| ≥ 100 | 14,162 | 0.29 | 3,139,856 | 32.21 |
| ≥ 500 | 794 | 0.02 | 725,307 | 7.44 |
| ≥ 1000 | 181 | 0.00 | 312,596 | 3.21 |
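The thresholds reported in Table 1 are simple cumulative counts over per-PDF reference totals. A minimal sketch of this computation, assuming a hypothetical array refs_per_pdf holding the number of DOI references extracted from each PDF, is:

```python
import numpy as np

# Toy example; in practice this would be loaded from the database.
refs_per_pdf = np.array([0, 0, 3, 12, 57, 0, 1, 110])

total_refs = refs_per_pdf.sum()
for threshold in (0, 1, 5, 10, 50, 100, 500, 1000):
    mask = refs_per_pdf >= threshold
    pdfs = mask.sum()
    refs = refs_per_pdf[mask].sum()
    print(f">= {threshold:>4}: {pdfs} PDFs "
          f"({100 * pdfs / len(refs_per_pdf):.2f}%), "
          f"{refs} refs ({100 * refs / total_refs:.2f}%)")
```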
Perhaps of more interest from the perspective of building citation indicators, Figure 5(a) presents the number of citations received by DOIs according to their year of publication, dating back to 1970. The database total is shown in red, along with the corresponding totals for main research areas (as defined by ASJC). The data show that publications from each year since 2000 have received at least 200,000 citations, with a maximum of 404,271 for those published in 2009. We also use the same data to plot Figure 5(b), which shows the number of unique journals receiving citations in each year. The maximum total of around 10,000 corresponds well with the core set of global journals, for example in the Web of Science flagship collection or the core publication list in the Leiden Ranking (Van Eck, 2021).
4.3. How Long Does It Take Research Articles to Accumulate Policy Citations and How Does This Vary Across Disciplines?
To appreciate the dynamics of how research articles accumulate citations from policy literature, we plot the number of citations received in the years following original publication for DOIs published in 2000, 2005, 2010, and 2015. In Figure 6(a), the total number of citations received in each year is plotted, and in Figure 6(b) the cumulative total is displayed. These data indicate that the citation lifetime for DOIs is not even across years: older publications have received fewer citations overall, accumulated over a longer time period, than those published more recently. Articles published in 2005 peaked 7 years after publication, those published in 2010 peaked after 4 years, and those published in 2015 after only 2 years. Further investigation is necessary to understand these differences, but they might be accounted for by the way the database is growing: an increasing number of documents indexed year-on-year could manifest as a recency bias.
Differences in the rate of citation accumulation between disciplines were also analyzed. In terms of broad research areas, Figure 7(a) shows cumulative citation rates for articles that were published in 2010. DOIs published in journals categorized as Social Science and Humanities received the most citations, followed by Health Sciences and then Life Sciences. There is a marked drop in citation rate for Physical Sciences and Engineering journals. The data for Social Science and Humanities are further decomposed into disciplines in Figure 7(b), revealing that most citations in this area are to journals in the Social Sciences and Economics fields. This subject balance is in contrast to traditional bibliometric databases, which tend to be dominated by citations to papers in the biological and physical sciences, but could reasonably be expected given the typical domains of policy setting (e.g., social, economic, and environmental).
4.4. What Is the Time Lag Between the Publication of Scholarly Works and Their Citation Within Policy Literature and How Does This Vary Between Disciplines?
For each year between 2000 and 2020, we analyze the age of cited references in all policy documents indexed. For example, a policy document published in 2015 that references a DOI published in 2010 has a cited reference age of 5 years. For the purposes of this analysis, any reference ages that are calculated to be negative (i.e., the policy document publication date is before that of the cited reference) are removed on the assumption that they represent data errors. The distribution of these ages is displayed using standard box and whisker plots in Figure 8 (orange lines denoting median values, blue triangles for means). The upper plot (Figure 8(a)) aggregates by the publication year of the citing policy document, and the lower plot (Figure 8(b)) aggregates by the year of publication of the cited DOI. The right inset in each shows the mean of the distribution for each of the ASJC research areas. Over the 21-year period sampled, there is little variation in the distribution of cited reference ages, with a mean of around 10 years (Figure 8(a)), and no significant differences between research areas (right plot). As a result, the distribution of reference ages aggregated by cited DOI publication year (Figure 8(b)) shows a consistent trend whereby the oldest publications have had the longest period to accumulate citations.
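A minimal sketch of this age calculation, assuming a hypothetical table of citation links holding the publication year of the citing policy document and of the cited DOI, is:

```python
import pandas as pd

# Illustrative citation links: policy publication year and cited DOI year.
links = pd.DataFrame({
    "policy_year": [2015, 2015, 2018, 2020],
    "doi_year":    [2010, 2016, 2003, 2019],
})

# Cited reference age; negative ages are treated as data errors and dropped.
links["age"] = links["policy_year"] - links["doi_year"]
links = links[links["age"] >= 0]

# Distribution of reference ages by citing policy year (cf. Figure 8(a)).
print(links.groupby("policy_year")["age"].describe())
```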
Although cited reference age appears to be consistent at a broad level, we also checked for differences in the age of references between different policy organizations. Two examples are provided in Figure 9, showing four organizations classified as either Healthcare Agency (Figure 9(a)) or Government Bank (Figure 9(b)). In both of these plots, it is apparent that different organizations cite research with different age ranges. The Canadian Agency for Drugs and Technologies in Health (Canada) cites many more recent articles on average than the Centers for Disease Control and Prevention (United States). Of course, there are many factors that could influence such a difference, so any interpretation should be mindful of context and comparability of items.
4.5. What Distribution Best Models Policy Citation Counts to DOIs?
When examining the policy citation counts of DOIs, it is apparent that the distribution is heavy-tailed (Asmussen, 2003). For example, for DOIs published between 2010 and 2014 (n = 731,696), 425,268 are cited only once (58%), and only 25,190 are cited 10 or more times (3.4%). Prior research using conventional bibliographic databases has investigated possible statistical distributions that model citation data (Brzezinski, 2015; Eom & Fortunato, 2011; Golosovsky, 2021; Thelwall, 2016), although there is some disagreement on whether power law, log-normal, or negative binomial distributions are best. Results vary depending on time period and discipline analyzed, database used, and if documents with zero citations are included. For this analysis, uncited DOIs are not known because the database is generated by following references made at least once from the policy literature.
Figure 10 provides the probability distribution function (PDF, left), cumulative distribution function (CDF, middle), and complementary cumulative distribution function (CCDF, right) for citations received by DOIs published between 2010 and 2014. We use the Python package powerlaw (Alstott, Bullmore, & Plenz, 2014) to fit exponential, power law, and lognormal distributions. None of these provide an excellent fit to the data, although lognormal is the closest. In all cases, the fitted distributions slightly overestimate the frequency of low-cited DOIs (i.e., those cited fewer than 10 times). Broadly speaking, it appears that the distribution of policy document citations is similar in nature to that of academic citations.
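As an illustration of this fitting procedure, the sketch below applies the powerlaw package to a toy list of per-DOI citation counts (the values are illustrative, not the study data):

```python
import powerlaw

# Illustrative per-DOI policy citation counts for one publication-year window.
citation_counts = [1, 1, 1, 2, 1, 3, 1, 7, 2, 1, 15, 4, 1, 2, 42, 1, 6]

# Discrete fit; xmin is fixed at 1 so that all cited DOIs are included.
fit = powerlaw.Fit(citation_counts, discrete=True, xmin=1)

# Pairwise likelihood-ratio tests: R > 0 favors the first-named distribution.
for candidate in ("power_law", "exponential"):
    R, p = fit.distribution_compare("lognormal", candidate)
    print(f"lognormal vs {candidate}: R = {R:.2f}, p = {p:.3f}")

# Empirical and fitted complementary CDFs (cf. the right panel of Figure 10).
ax = fit.plot_ccdf(label="empirical")
fit.lognormal.plot_ccdf(ax=ax, linestyle="--", label="lognormal fit")
```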
As prior research has shown some variation in citation distributions according to subject (Wallace, Larivière, & Gingras, 2009), we analyzed a sample of subjects from the ASJC research areas Social Sciences and Humanities (Figure 11(a)) and Health Sciences (Figure 11(b)). In both cases, it is evident that substantial differences occur between subjects. For example, within the Social Sciences and Humanities, subjects such as Economics and Finance receive significantly more citations than Clinical Psychology or the Arts. This is important to note, as it informs the selection of granularity for any field-based normalization. These findings suggest that variation at the subject level is present and that subject-level normalization is therefore preferable, provided that sufficiently large benchmark sets can be constructed.
4.6. How Feasible Is Field-Based Citation Normalization?
As with standard citation metrics, citation counts from policy documents to DOIs also vary according to year of publication and field. Hence, we consider the feasibility of producing field-normalized citation indicators by analyzing the number of DOIs cited at least once according to subject and year. From a practical point of view, it is necessary to have a minimum number of DOIs to compare for any combination of subject and publication year. If the data are too sparse (i.e., there are only a handful of DOIs to compare for any subject-year), more specialized techniques are required to give robust results (Bornmann & Haunschild, 2018b).
To illustrate coverage, Figure 12 provides a heatmap of subjects in the discipline Social Sciences in terms of the number of DOIs cited each year from 2000 to 2020. The color coding indicates the number of cited documents n in each subject-year: n < 150 (red), 150 ≤ n < 250 (orange), 250 ≤ n < 1,000 (green), and n ≥ 1,000 (blue). According to Rogers et al. (2020), a minimum sample size of 250 is advised for bibliometric samples. The image clearly shows variation in the availability of data. In some subjects, large enough samples could be drawn throughout the study period (e.g., Development, Education, Law), but in other subjects the data are sparser and it would be ill-advised to construct normalized indicators (e.g., Human Factors and Ergonomics). As expected, sample sizes are much smaller in the most recent years, as these articles are yet to accumulate a significant number of citations. The above analysis was carried out for all 330 ASJC subjects linked in the data, grouped into 26 disciplines, to determine the overall spread of data availability. For each row in Table 2, a discipline is listed along with:
- Subjects: The total number of subjects in the discipline.
- 2000–2020%: The percentage of subjects where n ≥ 250 in every year 2000–2020.
- 2000–2018%: The percentage of subjects where n ≥ 250 in every year 2000–2018.
- years%: Across all subjects in the discipline, the percentage of subject-years where n ≥ 250.
- dois%: Across all subjects, the percentage of DOIs that are in a subject-year where n ≥ 250.
| Discipline | Subjects | 2000–2020% | 2000–2018% | years% | dois% |
|---|---|---|---|---|---|
| Agricultural and Biological Sciences | 12 | 41.7 | 83.3 | 88.9 | 98.3 |
| Arts and Humanities | 14 | 14.3 | 14.3 | 35.7 | 82.8 |
| Biochemistry, Genetics and Molecular Biology | 16 | 31.2 | 68.8 | 73.2 | 95.9 |
| Business, Management and Accounting | 11 | 54.5 | 63.6 | 74.5 | 93.1 |
| Chemical Engineering | 9 | 0.0 | 0.0 | 16.4 | 59.6 |
| Chemistry | 8 | 12.5 | 25.0 | 41.1 | 83.1 |
| Computer Science | 13 | 7.7 | 7.7 | 27.5 | 63.8 |
| Decision Sciences | 5 | 0.0 | 20.0 | 36.2 | 70.6 |
| Dentistry | 5 | 0.0 | 0.0 | 18.1 | 52.4 |
| Earth and Planetary Sciences | 14 | 7.1 | 42.9 | 52.0 | 85.9 |
| Economics, Econometrics and Finance | 4 | 75.0 | 75.0 | 95.2 | 99.6 |
| Energy | 6 | 16.7 | 16.7 | 53.2 | 87.1 |
| Engineering | 17 | 0.0 | 29.4 | 45.4 | 82.3 |
| Environmental Science | 13 | 84.6 | 92.3 | 97.8 | 99.7 |
| Health Professions | 16 | 0.0 | 6.2 | 9.2 | 50.8 |
| Immunology and Microbiology | 7 | 57.1 | 71.4 | 81.6 | 98.8 |
| Materials Science | 9 | 0.0 | 0.0 | 23.3 | 51.5 |
| Mathematics | 15 | 0.0 | 6.7 | 15.9 | 62.4 |
| Medicine | 49 | 53.1 | 73.5 | 82.9 | 98.8 |
| Multidisciplinary | 1 | 100.0 | 100.0 | 100.0 | 100.0 |
| Neuroscience | 10 | 10.0 | 30.0 | 53.3 | 83.3 |
| Nursing | 23 | 4.3 | 8.7 | 17.2 | 66.6 |
| Pharmacology, Toxicology and Pharmaceutics | 6 | 33.3 | 33.3 | 55.6 | 94.5 |
| Physics and Astronomy | 11 | 0.0 | 0.0 | 19.0 | 53.4 |
| Psychology | 8 | 62.5 | 62.5 | 73.8 | 95.8 |
| Social Sciences | 23 | 60.9 | 69.6 | 87.8 | 98.1 |
| Veterinary | 5 | 20.0 | 20.0 | 21.0 | 74.9 |
From these data, it is clear that some disciplines are well covered and others are not. The best covered (i.e., with years% > 80 and dois% > 90) are Agricultural and Biological Sciences, Economics, Econometrics and Finance, Environmental Science, Immunology and Microbiology, Medicine, and Social Sciences. The least well covered in terms of dois% are Materials Science, Dentistry, Physics and Astronomy, Health Professions, and Chemical Engineering.
Of the 2,270,711 cited DOIs that were published between 2000 and 2018, 2,009,302 (88%) are in a subject that contains at least 250 other cited articles in the same year. This means a subject-level normalization approach is practical and could be applied to a large portion of scholarly references.
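A minimal sketch of this feasibility check, assuming a hypothetical table of cited DOIs with their publication year and ASJC subject, is:

```python
import pandas as pd

# Illustrative records: one row per cited DOI, with its ASJC subject and year.
cited = pd.DataFrame({
    "subject": ["Economics", "Economics", "Law", "Dentistry"],
    "year":    [2010, 2010, 2012, 2010],
})

MIN_SAMPLE = 250  # minimum benchmark size suggested by Rogers et al. (2020)

# Number of cited DOIs in each subject-year cell.
cell_sizes = cited.groupby(["subject", "year"]).size()
print(cell_sizes)

# Share of cited DOIs falling in a cell large enough to benchmark against
# (with these toy rows every cell is too small; real data is needed).
cited["cell_size"] = cited.groupby(["subject", "year"])["year"].transform("size")
print(f"{(cited['cell_size'] >= MIN_SAMPLE).mean():.1%} of cited DOIs "
      "are in a subject-year with a sufficient sample")
```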
4.7. Do the Citations Tracked in the Policy Literature Correlate With Policy Influence Outcomes Attributed to Funded Grants?
To validate the citation data linked via the Overton database, we perform an analysis using data gathered by UK funders from the Gateway to Research (GTR) portal (UKRI, 2018). Following funding of certain grants in the United Kingdom, academics are required to submit feedback using the ResearchFish platform stating publications that resulted from the funding, as well as various research outcomes, including engagement activities, intellectual property, spin-out companies, clinical trials, and more. One of these categories, policy influence, is used to report various outcomes, including citations from policy documents, clinical guidelines, and systematic reviews. Data are collected at the project level; each project is associated with various DOIs and policy outcomes. For this analysis, a data set is constructed using all funded grants with a start year between 2014 and 2020, recording the funder and research subjects specified. The funders analyzed are the Arts and Humanities Research Council (AHRC), Biotechnology and Biological Sciences Research Council (BBSRC), Engineering and Physical Sciences Research Council (EPSRC), Economic and Social Research Council (ESRC), Medical Research Council (MRC), and Natural Environment Research Council (NERC). 2014 is the earliest year surveyed as it is the year in which ResearchFish was first adopted across all seven research councils.
For the analysis, data are aggregated at the project level, noting the number of DOIs linked to the project, the total number of policy outcomes reported (referred to as all policy influence), the number of policy outcomes of the specific type citation (referred to as citation influence), and the total number of Overton citations. Effectively, this gives two features to compare: self-reported policy outcomes declared by academics, and citations from policy documents tracked via the Overton database. If Overton is able to index a sufficiently broad set of materials, these two features should be correlated.
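A minimal sketch of this project-level comparison, using toy values rather than the GTR data and illustrative column names, is:

```python
import pandas as pd
from scipy.stats import pearsonr

# Illustrative project-level aggregates: reported outcomes vs. Overton links.
projects = pd.DataFrame({
    "reported_policy_outcomes": [0, 2, 0, 5, 1, 0, 3],
    "overton_citations":        [0, 4, 1, 9, 0, 0, 6],
})

# Pearson correlation between the two project-level features.
r, p = pearsonr(projects["reported_policy_outcomes"],
                projects["overton_citations"])
print(f"Pearson r = {r:.2f}")
```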
Table 3 provides the correlation statistics (as measured using Pearson) for the complete data set (All row) and for each research council. Pearson's r measures the linear correlation between two sets of data, providing a normalized coefficient between −1 (a perfect negative correlation) and +1 (a perfect positive correlation). Positive effect sizes are commonly characterized as small (0.1–0.3), medium (0.3–0.5), and large (> 0.5) (Cohen, 1988). The p-values are omitted because they will only depend on sample size, which is suitably large in this experiment. In each row, the total number of projects and the DOIs they link to is reported (columns Projects and DOIs), along with two sets of statistics: one testing Overton citation counts against the total number of policy influence outcomes reported (All policy influence, middle columns), and the other testing Overton citation counts against the number of policy influence outcomes that are specifically for citations in policy documents, clinical guidelines, or systematic reviews (Citation influence only, right columns). In each case, the correlation coefficient r is provided, as well as the percentage of projects that were linked to any policy influence outcomes. This percentage figure is given to contextualize results, as for some funders the number of projects associated with any policy outcomes is low. According to these results, the correlation between the count of policy influence outcomes and the total number of citations in Overton is larger when considering all policy influence types rather than only those specifically for citation, although for EPSRC the values are similar, and for ESRC the citation-only correlation is higher (r = 0.70). There is a medium correlation over all funders (r = 0.42), and large correlations for the EPSRC (r = 0.66) and MRC (r = 0.63).
| Funder | Projects | DOIs | All policy influence: r | All policy influence: Projects% | Citation influence only: r | Citation influence only: Projects% |
|---|---|---|---|---|---|---|
| All | 67,702 | 383,642 | 0.42 | 7.13 | 0.32 | 1.17 |
| AHRC | 3,902 | 14,254 | 0.26 | 13.84 | 0.19 | 2.26 |
| BBSRC | 9,031 | 40,642 | 0.30 | 7.60 | 0.23 | 0.68 |
| EPSRC | 17,799 | 106,312 | 0.66 | 4.72 | 0.65 | 0.51 |
| ESRC | 5,732 | 37,503 | 0.48 | 16.99 | 0.70 | 4.41 |
| MRC | 5,992 | 60,854 | 0.63 | 16.41 | 0.20 | 2.42 |
| NERC | 4,727 | 30,035 | 0.22 | 13.71 | 0.17 | 3.13 |
The data are further decomposed according to subject category assigned to the grant, as depicted in Figure 13. Each grant may be assigned to multiple subjects and is considered in the calculation for each subject. For each subject (a row), three columns are used to show the correlation r (red), percentage of projects reporting any policy influence (green), and the total count of DOIs linked to projects (blue). In this plot, correlations are measured against all policy influence outcomes (i.e., corresponding to the middle columns in Table 3). When analyzed at this level of granularity, there is a large spread in the correlation values: 27 are small (0.1–0.3), 23 are medium (0.3–0.5) and 17 are large (> 0.5). Thirteen are not correlated.
These results show that for some subjects, Overton citation data correlate well with policy influence outcomes reported by academics. This occurs most often in subjects that might be expected to have some policy influence, such as Management & Business Studies (r = 0.84), Psychology (r = 0.83), Human Geography (r = 0.63), Economics (r = 0.60), and Political Science & International Studies (r = 0.58), but also in others that might not, such as Mechanical Engineering (r = 0.98), Systems Engineering (r = 0.93), and Drama & Theatre Studies (r = 0.76). Essentially, the analysis shows which subjects are most amenable to analysis using Overton data.
4.8. Does the Amount of Policy Citation Correlate With Peer-Review Assessment Scores as Reported in the UK REF2014 Impact Case Study Data?
To test for possible correlation, we utilize the Impact Case Study database from REF2014. This contains 6,637 four-page documents that outline the wider socioeconomic and cultural impact of research attributed to a particular university and Unit of Assessment (UoA). Part of the case study document references the original underpinning research (up to six references per case study), which has been linked via DOIs. By means of peer review, each case study is scored as 4* (world-leading), 3* (internationally excellent), 2* (internationally recognized), or 1* (nationally recognized). Although the scores for individual case studies are not known, the aggregate scores are made available as the percentage of case studies that received each score. Hence, it is possible to test for correlations at the aggregate level (namely, institution and UoA).
For this analysis, we test the correlation between research scored as 4* (world-leading) and citations to the underpinning research as reported in the Overton database. As the assessment exercise took place in 2014, only citations from policy documents published in or before 2014 are considered. Rather than test raw citation counts, we calculate a year-subject normalized citation percentile for each DOI using ASJC journal categories (i.e., all DOIs published in a given year and subject are compared with each other). Any DOIs in a year-subject group that contains fewer than 250 examples are marked as invalid and excluded from the analysis. Of the 24,945 unique DOIs associated with an impact case study, 4,292 are referenced in Overton and have a valid citation percentile.
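A minimal sketch of the year-subject percentile normalization described above, assuming a hypothetical table of cited DOIs with publication year, ASJC subject, and Overton citation count, is:

```python
import pandas as pd

# Illustrative cited DOIs with publication year, subject, and citation count.
dois = pd.DataFrame({
    "year":      [2010, 2010, 2010, 2011],
    "subject":   ["Economics", "Economics", "Economics", "Law"],
    "citations": [1, 8, 3, 2],
})

# Year-subject groups smaller than this are invalid (all of them, in this toy data).
MIN_GROUP = 250

group = dois.groupby(["year", "subject"])["citations"]
dois["percentile"] = group.rank(pct=True, method="average") * 100
dois.loc[group.transform("size") < MIN_GROUP, "percentile"] = float("nan")

# Size-independent indicator for an institution/UoA: share of its DOIs at or
# above a chosen percentile threshold (NaN percentiles count as below it).
top90_share = (dois["percentile"] >= 90).mean()
```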
Following the methodology presented in Traag and Waltman (2019), we measure the correlation between the percentage of case studies that scored 4* and the percentage of DOIs in the top 99th, 90th, and 75th Overton citation percentiles. Multiple percentiles were tested as it is not necessarily clear where the benchmark for 4* research would lie. A total of 1,847 scores are evaluated, one for each university and UoA. A size-independent test measures the Pearson correlation between the percentage of research scored 4* and the percentage of DOIs with a normalized citation percentile above the threshold.
Table 4 provides the results of this analysis. All 36 UoAs are shown along with the Pearson correlation r for three citation percentile thresholds: 99%, 90%, and 75%. In some cases (for example Classics), when no DOIs could be found exceeding the percentile threshold, the correlation is undefined and hence, left blank. Based on these results, it is apparent that different percentile thresholds yield different results depending on UoA. For example in UoAs 18—Economics and Econometrics and 25—Education, the highest correlations of 0.52 and 0.46 respectively are obtained with a threshold of 90%, but in UoA 7—Earth Systems and Environmental Sciences, a threshold of 99% yields the highest correlation of 0.52. This suggests that the threshold for what is considered 4* impact varies across fields in terms of policy influence.
| UoA | 4*_top99 r | 4*_top90 r | 4*_top75 r |
|---|---|---|---|
| 1—Clinical Medicine | 0.20 | 0.24 | 0.25 |
| 2—Public Health, Health […] | 0.22 | −0.20 | 0.24 |
| 3—Allied Health Professions, […] | 0.02 | 0.04 | 0.13 |
| 4—Psychology, Psychiatry and […] | 0.18 | 0.13 | 0.27 |
| 5—Biological Sciences | 0.16 | 0.08 | 0.02 |
| 6—Agriculture, Veterinary and […] | 0.28 | ⋆0.57 | ⋆0.54 |
| 7—Earth Systems and […] | ⋆0.52 | 0.24 | 0.17 |
| 8—Chemistry | 0.15 | 0.00 | |
| 9—Physics | 0.07 | 0.07 | 0.02 |
| 10—Mathematical Sciences | †0.32 | 0.11 | 0.13 |
| 11—Computer Science and Informatics | 0.27 | †0.30 | |
| 12—Aeronautical, Mechanical, […] | †0.49 | ⋆0.64 | |
| 13—Electrical and Electronic […] | †0.42 | 0.11 | |
| 14—Civil and Construction Engineering | 0.17 | −0.22 | −0.11 |
| 15—General Engineering | 0.11 | 0.15 | |
| 16—Architecture, Built […] | 0.13 | †0.34 | †0.30 |
| 17—Geography, Environmental […] | 0.21 | 0.16 | 0.20 |
| 18—Economics and Econometrics | †0.39 | ⋆0.52 | †0.33 |
| 19—Business and Management Studies | 0.00 | 0.10 | 0.18 |
| 20—Law | 0.16 | 0.05 | |
| 21—Politics and International Studies | 0.16 | 0.07 | 0.26 |
| 22—Social Work and Social Policy | †0.47 | 0.24 | †0.32 |
| 23—Sociology | 0.01 | 0.07 | 0.08 |
| 24—Anthropology and Development […] | 0.10 | 0.15 | 0.21 |
| 25—Education | †0.33 | †0.46 | †0.40 |
| 26—Sport and Exercise Sciences, […] | 0.22 | †0.40 | †0.31 |
| 27—Area Studies | †0.44 | †0.42 | |
| 28—Modern Languages and Linguistics | 0.12 | 0.24 | 0.24 |
| 29—English Language and Literature | 0.00 | 0.00 | |
| 30—History | 0.12 | 0.13 | |
| 31—Classics | | | |
| 32—Philosophy | 0.24 | 0.20 | |
| 33—Theology and Religious Studies | 0.41 | | |
| 34—Art and Design: History, […] | −0.15 | | |
| 35—Music, Drama, Dance and […] | −0.01 | 0.20 | |
| 36—Communication, Cultural and […] | †0.38 | †0.33 | |
This analysis shows that for some UoAs, Overton policy citation percentiles do correlate with peer-review assessment, but less than reported for citation data (Traag & Waltman, 2019) when compared to scoring of outputs. Ideally, the test would only be performed on the subset of case studies that might reasonably be expected to have some form of policy outcome. For example, searching the database for "policy outcome" ∼ 5 OR "policy influence" ∼ 5 (where the ∼ 5 operator specifies that terms must be within five words of each other) returns only 406 results. Hence, our test effectively measures the correlation of impact in general against that of policy citation and could only be expected to find correlation in UoAs where the dominant form of impact is policy-related, such as in UoA 22 Social Work and Social Policy. Unfortunately, because scores are not known for individual case studies, this type of analysis is not possible.
5. DISCUSSION
Our analysis of the Overton policy document citation database yields a promising outlook. Using this kind of data, it is possible to link original research published in the scholarly literature to its use in a policy setting. The Overton database indexes a sufficient amount of content to create large volumes of citations (> 400,000 every year since 2014) across a wide range of research topics and journals. Unlike conventional bibliometric databases, citations are focused more towards the social sciences, economics, and environmental sciences than the biological and physical sciences, a feature that suggests the content offers novel analytical value.
The balance of content by region broadly follows that of other bibliometric databases, namely it is dominated by North America and Europe, but the representation of local language documents is much higher than in scholarly publishing, where English dominates (Márquez & Porras, 2020; Mongeon & Paul-Hus, 2016). Anecdotal evidence in this study hints that Overton may have more equitable coverage across some countries: Figure 3 shows that Peru and Uruguay have a similar volume of policy documents indexed to Brazil and Chile despite producing fewer scholarly works. However, more detailed analysis, drawing on other indicators (e.g., economic and industrial), is required to produce robust conclusions in relation to this question.
One important issue that is not addressed in this study is the question of coverage. Because indexing relies on what is freely accessible on the web, it is possible that some countries, organization types, or document types are better represented than others. However, this is not a straightforward question to tackle because the universe of policy documents is not well defined (i.e., what should and should not be considered a policy document?) and the only route to obtaining information on missing sources requires significant manual effort. A practical approach may be to survey certain policy topics and regions to estimate coverage levels.
Although a significant proportion of the policy documents indexed are not linked to DOIs (88% of PDFs), a core set of around 200,000 contains more than 8 million references. This reflects the diverse range of material indexed, including statistical reports, informal communications, proceedings, and commentary, much of which one would not expect to contain references to original research articles. A considerable pool of citations is generated: between 200,000 and 400,000 per year since 2000 across a broad set of journals. A more detailed analysis of these data could compare how citations are distributed across journals and whether citation patterns from policy documents follow the same tendencies as scholarly publishing. It may be that some journals demonstrate higher utilization in policy documents relative to a citation-based ranking.
The potential for development of field-normalized citation indicators is good. When analyzed at the ASJC subject level, many fields contain a sufficient number of cited articles to create benchmarks (i.e., ≥ 250), especially if the most recent 2 years are excluded. Overall, 88% of articles published between 2000 and 2018 that receive any policy citations could be field-normalized in this way. However, although this approach is practical, it may not be best—a more detailed analysis comparing normalization results at different levels of granularity (i.e., field-based or discipline-based) would be required to make any recommendation.
One potentially interesting line of inquiry is that of citation lag. At the macro scale, our analysis shows there is little variation in the distribution of ages, even across disciplines, but when viewed at a more granular level (such as individual policy organizations), diversity occurs. This may offer useful insights into the differences between what research is used, in terms of age and also in citation ranking. Some organizations may favor newer but less established evidence than others that prefer older but more widely recognized research.
The distribution of citations accumulated by research articles seems to follow similar trends to those seen in other altmetric indicators, especially Mendeley, Twitter, and Facebook as reported in Fang, Costas et al. (2020), and, like conventional citation data, is best matched by a lognormal distribution. It is interesting to note that in Fang et al. (2020), 12,271,991 articles published between 2012 and 2018 were matched to Altmetric data and yielded 156,813 policy citations across 137,326 unique documents. For the same time period, Overton contains 2,600,464 citations across 1,006,439 unique DOIs. These coverage statistics are not directly comparable because the original pool of articles surveyed in Fang et al. (2020) is limited to the Web of Science, whereas Overton tracks citations to any journal. Nevertheless, it does suggest that Overton tracks substantially more citations from policy literature than Altmetric.
Possibly the most striking and encouraging result is from the analysis of policy influence outcomes reported to UK funders. Our findings show that for some subjects, correlation between self-reported data and that extracted from Overton is high. This offers additional opportunities to reduce reporting burden, through either semiautomated or automated approaches. Further, it provides a basis to benchmark funders and institutions from different regions where self-reported data may not be available, although such an analysis should consider coverage variation across geographies.
Finally, our experiment to test for possible correlation between peer-review assessment of impact and Overton policy citations hints at some utility: For certain Units of Assessment, a correlation between peer-review score of impact and citation rank does exist, although less than that seen in other studies that assessed peer-review scores of academic impact against conventional citation data (Traag & Waltman, 2019). While the REF2014 impact case study data do provide a unique opportunity to understand how research is assessed from the perspective of wider socioeconomic impact, obfuscation of the individual scores prevents deeper analysis that is focused on research pertinent to policy outcomes. It may be more fruitful to utilize other sources to benchmark peer review, such as postpublication peer-review score (Waltman & Costas, 2014).
AUTHOR CONTRIBUTIONS
Martin Szomszor: Conceptualisation, Investigation, Methodology, Visualisation, Writing—original draft, Writing—review & editing. Euan Adie: Conceptualisation, Methodology, Writing—review & editing.
COMPETING INTERESTS
Martin Szomszor is an independent researcher. Euan Adie is Founder and CEO of Overton.
FUNDING INFORMATION
This research has been funded by Open Policy Ltd, which runs Overton.
DATA AVAILABILITY
Because Overton is a commercial database, data for this study cannot be shared publicly. For more information about access to Overton data for research purposes, please email [email protected].
REFERENCES
Author notes
Handling Editor: Ludo Waltman