Abstract
It has been argued that preprint coverage during the COVID-19 pandemic constituted a paradigm shift in journalism norms and practices. This study examines whether and in what ways this is the case using a sample of 11,538 preprints posted on four preprint servers—bioRxiv, medRxiv, arXiv, and SSRN—that received coverage in 94 English-language media outlets between 2014 and 2021. We compared mentions of these preprints with mentions of a comparison sample of 397,446 peer-reviewed research articles indexed in the Web of Science to identify changes in the share of media coverage that mentioned preprints before and during the pandemic. We found that preprint media coverage increased at a slow but steady rate prepandemic, then spiked dramatically. This increase applied only to COVID-19-related preprints, with minimal change in coverage of preprints on other topics. The rise in preprint coverage was most pronounced among health and medicine-focused media outlets, which barely covered preprints before the pandemic but mentioned more COVID-19 preprints than outlets focused on any other topic. These results suggest that the growth in coverage of preprints seen during the pandemic may imply only a temporary shift in journalistic norms, including a changing outlook on reporting preliminary, unvetted research.
1. INTRODUCTION
On January 10, 2020, the World Health Organization published its first set of guidelines for preventing and controlling a suspected “novel coronavirus (nCoV)” (WHO, 2020). Soon journalists found themselves plunged into an unexpected crisis, with an out-of-control, little understood infectious disease, and an influx of new scientific information to sift through and report on. Without much peer-reviewed literature to go on, especially in the early stages of the pandemic, many turned to preprint servers to share urgent new information with the public (Fraser, Brierley et al., 2021). The ensuing media coverage of preprints seen during the pandemic has since been described as a complete rupture from past reporting practices (e.g., Burke, 2021; Makri, 2021). Yet, empirical evidence supporting this assertion is lacking. As noted in previous research, there is currently an absence of longitudinal investigations that examine preprint coverage over time and assess the impact of COVID-19 on journalistic practices and norms (Fleerackers, Chtena et al., 2023; van Schalkwyk & Dudek, 2022b). This study fills this gap by examining how media coverage of preprints has evolved, both qualitatively and quantitatively, in the lead up to, and during the first year of, the COVID-19 pandemic. Using Altmetric data, it examines changes in the volume and nature of media coverage of 11,538 preprints posted between 2013 and 2021 on bioRxiv, medRxiv, arXiv, and SSRN, four of the servers most actively used to share COVID-19-related research (Waltman, Pinfield et al., 2021).
2. LITERATURE REVIEW AND RESEARCH QUESTIONS
2.1. Preprint Media Coverage Before and During the COVID-19 Pandemic
Preprints have been used extensively in physics, mathematics, and computational science since arXiv launched in 1991. However, scientists in the biological and medical fields were more reluctant to adopt them until recently (Puebla, Polka, & Rieger, 2021). The early months of the pandemic saw a sharp increase in the volume of available COVID-19-related preprints (Funk, 2023; Horbach, 2020), with preprint servers such as medRxiv and bioRxiv becoming key disseminators of pandemic research (Else, 2020; Vergoulis, Kanellos et al., 2021). One study (Kousha & Thelwall, 2020) found that preprints posted to arXiv, bioRxiv, medRxiv, and SSRN comprised 13.26% of the COVID-19 literature during March–April 2020, and an analysis by Fraser et al. (2021) found that preprints posted to 16 servers (including the four examined in this study) comprised almost 25% of the COVID-19-related research available from January–October 2020. Studies have predicted that the use of pandemic-related preprints would continue to grow at a relatively stable rate throughout 2021 and 2022, although more research is needed to confirm these predictions (Nane, Robinson-Garcia et al., 2023).
COVID-19-related preprints also gained traction within news media, receiving coverage in diverse media outlets around the world (Fleerackers, Riedlinger et al., 2021; Massarani, Neves et al., 2021; Massarani & Neves, 2021; Simons & Schniedermann, 2023; van Schalkwyk & Dudek, 2022a). One study found that more than a quarter of COVID-19-related bioRxiv and medRxiv preprints were mentioned in at least one media story during the pandemic, whereas only about 1% of those on other topics received media coverage (Fraser et al., 2021). Some journalists reported adopting novel practices to report on these unreviewed studies, something they said they had never done before (Fleerackers, Nehring et al., 2022b; Massarani et al., 2021).
This media coverage of preprints seen during the COVID-19 pandemic has been described by some journalists as a “paradigm shift” (Fleerackers et al., 2022b). Yet, although studies conducted during the COVID-19 pandemic provide important evidence about how journalists covered preprints during the evolving health crisis, little is known about whether journalists have covered preprints on other topics or during other communication contexts. For example, Fraser et al.’s (2021) widely cited study is often described as providing evidence that “During the pandemic, journalists … paid increased attention to preprints” (Kwon, 2021), but the authors did not compare pandemic preprint coverage to prepandemic levels. Instead, they provided evidence that COVID-19-related preprints received an outsized amount of media attention relative to those on other topics posted to bioRxiv and medRxiv during the same time period, but not relative to preprints posted during different time periods or on different servers (Fraser et al., 2021). One recent study begins to fill this gap through an examination of coverage of preprints by seven German newspapers from 2018 to 2021 (Simons & Schniedermann, 2023). The authors identified low and stable rates of coverage leading up to the pandemic, followed by a major surge in 2020 and 2021 that was driven by COVID-19-related preprints. However, it is unclear whether this trend is reflective of other media outlets (e.g., those outside of Germany) and whether there are disciplinary differences in coverage trends.
More broadly, although preprints made up a significant proportion of the COVID-19-related literature available within the first months of the pandemic, it is unclear how media coverage of preprints compares to coverage of peer-reviewed research. One article found that the five COVID-19-related research articles that received the most media coverage were all peer-reviewed publications; however, the analysis was descriptive and did not compare the volume of preprint coverage to that of peer-reviewed papers (Kousha & Thelwall, 2020). Another small study found no significant difference in the amount of media coverage received by medRxiv preprints and peer-reviewed publications about COVID-19-related therapies that were posted between February 1 and May 10, 2020 (Jung, Sun, & Schluger, 2021). A study of South African media found that only 3% of stories mentioning COVID-19 research included a mention of a preprint (van Schalkwyk & Dudek, 2022a). Besançon, Peiffer-Smadja et al. (2021) used Altmetric to examine news coverage of COVID-19-related preprints posted to arXiv, medRxiv, and bioRxiv from January to July 2020, finding that these preprints received more coverage than the non-COVID-19-related preprints posted to arXiv during the same time period. Again, coverage of preprints before the pandemic period was not considered. Fraser, Momeni et al. (2020) found that bioRxiv preprints submitted between November 2013 and December 2017 received far less media coverage than either their peer-reviewed versions or a control set of peer-reviewed articles that were never deposited to bioRxiv. Finally, Waltman et al. (2021) found that, although some COVID-19-related preprints were highly reported on, overall, news coverage of peer-reviewed literature outstripped coverage of preprints. Unfortunately, Waltman et al. (2021) did not report the average attention received per preprint vs. peer-reviewed article. However, the authors did examine news coverage received by a sample of high-profile preprints and their corresponding peer-reviewed articles. For 45% of these preprint–article pairs, the preprint received more than 20% of the total news attention; for 11% of the pairs, preprints received more than 80% of the coverage (Waltman et al., 2021). Again, the authors did not compare these findings to rates of coverage before the pandemic.
Collectively, these results provide some of the first evidence that preprints have historically received less media coverage than peer-reviewed research and that this trend may have started to shift during the pandemic. However, given the mixed and incomplete body of evidence, several questions remain unanswered. In particular, it is unclear whether the volume of preprint media coverage increased, decreased, or remained relatively stable in the years leading up to the pandemic, information that could help shed light on whether preprint-based media coverage is likely to continue post-COVID-19. It is also unclear whether any changes in coverage seen during the pandemic apply only to COVID-19-related preprints or reflect a change in journalists’ willingness to use preprints in general. As such, to examine whether the pandemic has truly introduced a “paradigm shift” in journalistic practice, this study uses a sample of preprints that received coverage in English-language media between 2014 and 2021 to examine the following research questions:
RQ1: Has the share of preprint coverage in the media increased during the COVID-19 pandemic?
RQ2: Do changes in media coverage of COVID-19-related preprints extend to coverage of preprints on other topics?
2.2. Preprint Media Coverage in an Evolving Media Landscape
It is also unclear from previous research which types of media outlets have driven media coverage of preprints and whether this has changed as a result of the pandemic. Journalism has evolved in important ways in the years leading up to the COVID-19 crisis, with financial pressures, shrinking news audiences, and changes to the digital communication landscape contributing to declines in specialized science journalism around the world (Saari, Gibson, & Osler, 1998; Schäfer, 2017). These declines have likely influenced the amount of media coverage that research articles, including preprints, receive, as outlets specializing in science appear to cover more research than general interest publications (Wihbey, 2017). In addition, an array of actors who have historically been considered “peripheral” (or outside of journalism) have entered the field, including bloggers, news aggregators, and other alternative outlets (Hermida, 2019; Schapals, 2022; Stocking, 2019). These peripheral actors may not always adhere to the established norms and practices that shape media coverage at traditional (or “legacy”) outlets (e.g., Harrison, Macmillan, & Rudd, 2020; Hurley & Tewksbury, 2012), which may affect how or whether they cover preprints. For example, journalists working at peripheral outlets may not be expected to adhere to professional journalism resources, such as the AP Style Guide, which recommend avoiding research that has not been peer reviewed (Froke, Bratton et al., 2020; Haelle, 2020). Yet, both peripheral and legacy outlets actively covered COVID-19-related preprints during the early months of the pandemic (Fleerackers et al., 2021). Similarly, outlets that publish content but are not considered journalism, such as university websites and press release distribution services, may also contribute to mobilizing preprint research. For example, the Science Media Centre in Germany (a nonjournalistic outlet that provides science journalists with access to research and expert perspectives) began sharing roundups of newly posted preprints during the pandemic (Broer, 2020; Broer & Pröschel, 2022). Again, however, any evidence about the nature of nonjournalistic outlets reporting on preprints is limited to the pandemic period. As such, our third research question asks:
RQ3: Have changes in media coverage of preprints occurred similarly across media outlets?
3. METHOD AND MATERIALS
To identify media coverage, this study relies on data from Altmetric¹, a company that tracks mentions of research outputs across a range of digital media, including news media. Research suggests that Altmetric’s “Mainstream Media” category is a relatively reliable source of data but only when working with a predefined list of English-language media outlets (Fleerackers et al., 2022b; Ortega, 2020a, 2020b). In addition, because Altmetric regularly updates both the list of media outlets² and research outputs³ it tracks, the volume of media coverage it collects may vary over time in ways that are unrelated to actual changes in news reporting. For these reasons, we decided to gather two data sets:
A primary data set comprising news mentions of bioRxiv, medRxiv, arXiv, and SSRN preprints;
A comparison data set comprising news mentions of peer-reviewed research indexed in the Web of Science (WoS).
3.1. Identifying and Characterizing Media Outlets That Frequently Cover Research
Data were queried from local snapshots of the WoS and Altmetric databases housed at the Observatoire des sciences et des technologies (OST) on January 30, 2023⁴. Data filtering and cleaning were performed using the Python pandas package (Pandas Development Team, 2023). To identify our predefined set of media outlets, we queried a snapshot of the Altmetric database from June 3, 2021 for news mentions of all WoS research outputs associated with a digital object identifier (DOI). We restricted our search to mentions of research outputs that had been published in 2013 or later and that were mentioned in news stories between January 1, 2014 and June 3, 2021. We then filtered for outlets that consistently covered a high volume of research, defined for the purposes of this study as outlets that mentioned at least 100 WoS research items per year from 2014–2020. We manually checked the resulting 128 media outlets by visiting the URLs for their home pages provided by Altmetric. After excluding 25 outlets that were not written in English, five that were not tracked by Altmetric from 2021–2022 (e.g., because they had changed their domain names), three whose URLs did not resolve, and one with all misidentified mentions, we were left with a final sample of 94 outlets.
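The outlet-selection rule described above can be illustrated with a minimal sketch. It uses plain Python rather than pandas to stay dependency-free, and it is not the authors' code: the input layout (one `(outlet, year, item_id)` tuple per news mention), the function name `frequent_outlets`, and the reading of "100 WoS research items per year" as 100 distinct items in each year are all assumptions made for illustration.

```python
from collections import defaultdict

def frequent_outlets(mentions, years=range(2014, 2021), min_items=100):
    """Return outlets that mention at least `min_items` distinct research
    items in every year of `years` (assumed reading of the selection rule)."""
    items = defaultdict(set)  # (outlet, year) -> distinct items mentioned
    for outlet, year, item_id in mentions:
        items[(outlet, year)].add(item_id)
    outlets = {outlet for outlet, _ in items}
    return sorted(
        o for o in outlets
        if all(len(items.get((o, y), ())) >= min_items for y in years)
    )

# Toy data (hypothetical), using a threshold of 2 instead of 100.
toy = [
    ("Outlet A", 2014, "doi1"), ("Outlet A", 2014, "doi2"),
    ("Outlet A", 2015, "doi3"), ("Outlet A", 2015, "doi4"),
    ("Outlet B", 2014, "doi1"), ("Outlet B", 2014, "doi2"),
    ("Outlet B", 2015, "doi3"),  # only one item in 2015, so excluded
]
selected = frequent_outlets(toy, years=range(2014, 2016), min_items=2)

# Manual exclusions reported in the text: 128 candidates down to 94 outlets.
final_sample = 128 - 25 - 5 - 3 - 1
```

Requiring the threshold in every year, rather than on average, is what makes the resulting outlet list stable enough to compare coverage across the whole study period.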
Next, we applied a coding protocol adapted from Hermida and Young (2019) to characterize the nature of these media outlets. We analyzed each outlet’s main topical focus (e.g., science and technology, health and medicine, general news, etc.) and assessed whether it was best described as legacy journalism (i.e., staffed by professional journalists who adhere to traditional journalistic norms), peripheral journalism (i.e., staffed by individuals who have traditionally worked outside of journalism and who adhere to emerging or alternative norms), or nonjournalism (i.e., organizations such as universities, press release services, or academic journals that do not produce journalism). A detailed version of the coding protocol, including examples, is available from Fleerackers and Fagan (2022).
Coding was performed by researchers with professional journalism experience: the lead author and a research assistant who was not aware of the study objectives (cf. Hermida & Young, 2019). The two coders independently explored the media outlets’ websites, examining their content, Mission Statement, and, if available, other relevant pages (e.g., Masthead, Editorial Guidelines, Code of Conduct). The coders compared their coding and resolved any discrepancies through discussion, and, if needed, by consulting an outside researcher (also a former journalist). Such double coding approaches are appropriate when data are not very numerous (Krippendorff, 2004), as in the present study. The results of the final coding are reported in aggregate in Table 1; coding for the full list of outlets is available at the ScholCommLab’s Dataverse (Alperin, Fleerackers, & Shores, 2023a).
3.2. Gathering News Mentions of Preprint Research
We gathered news mentions of preprints from four servers—bioRxiv, medRxiv, arXiv, and SSRN—because these servers were highly used for sharing COVID-19-related preprints (Waltman et al., 2021). These servers were also launched at different times (bioRxiv in 2013, medRxiv in 2019, arXiv in 1991, and SSRN in 1994), with different disciplinary scopes, and have seen different levels of uptake among scholars (Puebla, Polka, & Rieger, 2022), providing us with a diverse sample of preprints for our analysis. We queried Altmetric for mentions of preprints from these servers in stories published by the 94 outlets since January 1, 2014. This yielded 40,039 mentions of 15,041 preprints across 31,258 news stories. For each of these preprints, we gathered the publication dates from the arXiv and Crossref APIs using the Python arxiv and habanero packages (Chamberlain, 2020; Schwab, 2021).
Next, because previous research suggests that publication date metadata can often be incorrect or incomplete (Haustein, Costas, & Larivière, 2015), we manually checked subsamples of our data and compared the publication dates provided by Crossref, the arXiv API, and Altmetric. The most reliable publication date for each server was retained for analysis. For bioRxiv and medRxiv, this was the DOI creation date (i.e., the date that the DOI for the preprint was deposited in Crossref); for arXiv, it was the date provided by the arXiv API; and for SSRN, it was either the “first posted on” date provided by Altmetric or Crossref’s DOI creation date, whichever came first. We removed 3,619 mentions of preprints that were published before 2013, as these publication dates were particularly unreliable (perhaps because Altmetric started tracking mentions partway through 2012 and thus has incomplete data for previously published outputs)⁵. Even after excluding these preprints and selecting the most reliable publication date for each server, we noted that publication dates for arXiv and SSRN sometimes differed from the dates visible on the server web page by a few days, a limitation that we kept in mind during data cleaning and analysis.
We made several further exclusions to ensure that the mentions in our data set were mentions of true preprints (i.e., rather than postprints or journal versions of preprints). First, we removed 165 mentions of postprints, which we defined as preprints that were posted on the same day, or after, their journal versions were published. Because, as mentioned above, publication dates for preprints were often incorrect by a few days, we excluded an additional 332 mentions of preprints with a publication date within 7 days of the journal version’s publication date (i.e., suspected postprints). We also removed 327 mentions of preprints in news stories that were published before the preprint was first posted, using a 5-day cutoff to allow for the slight inconsistencies we identified in the publication metadata. Because Altmetric does not disambiguate between preprints and journal versions for some preprint servers⁶,⁷ and may thus erroneously include some mentions of peer-reviewed research, we removed 3,547 mentions in news stories published after the peer-reviewed journal version of the preprint was published, again using a 5-day margin. Although this approach may have removed some true mentions of preprints, these false removals are likely limited, as journalists strive to ensure their stories are timely and relevant (Rosen, Guenther, & Froehlich, 2016; Shoemaker & Reese, 1996) and seldom cover research outputs more than a few weeks after initial publication (Maggio, Alperin et al., 2017). Finally, we removed an additional 1,021 duplicate news mentions (where the same preprint was mentioned in the same story more than once). In total, filtering led to the exclusion of 9,011 mentions (22.5% of the original data set). The code used for filtering has been made publicly available (Alperin, Shores, & Fleerackers, 2023b). Our final preprint sample comprised 31,028 mentions of 11,538 preprints by the 94 outlets in our sample (Alperin et al., 2023a).
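The exclusion rules in this section can be sketched as a single function. This is only an illustration under stated assumptions, not the authors' published filtering code: the field names (`preprint_date`, `journal_date`, `story_date`, `preprint_id`, `story_id`) are hypothetical, the order in which the rules are checked is assumed, and the direction of the 5-day margin around the journal publication date is a conservative guess because the text does not specify it.

```python
from datetime import date, timedelta

def exclusion_reason(m, postprint_days=7, margin_days=5):
    """Return why a mention is excluded, or None if it is kept."""
    posted, journal, story = m["preprint_date"], m.get("journal_date"), m["story_date"]
    margin = timedelta(days=margin_days)
    if posted.year < 2013:
        return "published before 2013"
    if journal is not None:
        if posted >= journal:
            return "postprint"
        if journal - posted <= timedelta(days=postprint_days):
            return "suspected postprint"
    if story < posted - margin:
        return "story predates preprint"
    # Assumption: stories dated from 5 days before the journal version onward
    # are treated as possible mentions of the journal version.
    if journal is not None and story > journal - margin:
        return "story follows journal version"
    return None

def filter_mentions(mentions):
    """Apply the exclusion rules, then drop repeat mentions within a story."""
    kept, seen = [], set()
    for m in mentions:
        key = (m["preprint_id"], m["story_id"])
        if exclusion_reason(m) is None and key not in seen:
            seen.add(key)
            kept.append(m)
    return kept

def mention(pid, sid, posted, story, journal=None):
    return {"preprint_id": pid, "story_id": sid, "preprint_date": posted,
            "story_date": story, "journal_date": journal}

# Toy mentions (hypothetical data).
toy = [
    mention("p1", "s1", date(2020, 3, 1), date(2020, 3, 2)),                     # kept
    mention("p2", "s2", date(2020, 3, 1), date(2020, 3, 2), date(2020, 3, 1)),   # postprint
    mention("p3", "s3", date(2020, 3, 1), date(2020, 3, 2), date(2020, 3, 5)),   # suspected postprint
    mention("p4", "s4", date(2020, 3, 10), date(2020, 3, 1)),                    # story too early
    mention("p5", "s5", date(2020, 3, 1), date(2020, 6, 10), date(2020, 6, 1)),  # after journal version
    mention("p1", "s1", date(2020, 3, 1), date(2020, 3, 2)),                     # duplicate
]
reasons = [exclusion_reason(m) for m in toy]
kept = filter_mentions(toy)
```

Returning a reason string rather than a boolean makes it straightforward to tally how many mentions each rule removes, which is how per-rule counts like those reported above could be produced.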
3.3. Gathering News Mentions of Peer-Reviewed Research
We downloaded all the mentions of WoS research from our 94 outlets (i.e., those described in Section 3.1), resulting in 1,657,202 mentions of 466,138 distinct research outputs. From these, we filtered out 156,187 mentions of research articles that were published prior to 2013, 579 mentions that were already included in the preprint data, and 14,482 duplicate mentions (where an article was mentioned in the same news story more than once). In total, filtering led to the exclusion of 170,669 mentions (10.3% of the original data set).
The final journal research sample comprised 1,486,533 mentions of 397,446 distinct peer-reviewed research outputs by the 94 outlets (Alperin et al., 2023a).
3.4. Identifying News Mentions of COVID-19 Research
To identify COVID-19-related preprints and WoS outputs, we searched for the presence of the following COVID-19-related keywords in the outputs’ titles using R version 4.3.0 (2023): coronavirus, covid-19, sars-cov, sars-cov-2, ncov-2019, 2019-ncov, hcov-19, sars-2, pandemic, covid, Severe Acute Respiratory Syndrome Coronavirus 2, and 2019 ncov. These keywords were a combination of those used by Fraser et al. (2021) and those listed in the National Library of Medicine’s search strategy for identifying COVID-19-related literature (Chen, Allot, & Lu, 2020). We also added the term “pandemic,” which was not included in either of these lists of keywords but is likely used in many COVID-19 titles. As some keywords (e.g., “pandemic”) may have been used in non-COVID-19 contexts, we also filtered for research published in 2020 or later when identifying COVID-19-related research.
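As a rough illustration of this title-based classification, the following sketch flags a research output as COVID-19-related. The source reports using R; this is a Python analogue, and the case-insensitive substring matching and the function name `is_covid_related` are assumptions, since the text does not describe the exact matching logic.

```python
# Keywords as listed in the text, lowercased for matching.
KEYWORDS = [
    "coronavirus", "covid-19", "sars-cov", "sars-cov-2", "ncov-2019",
    "2019-ncov", "hcov-19", "sars-2", "pandemic", "covid",
    "severe acute respiratory syndrome coronavirus 2", "2019 ncov",
]

def is_covid_related(title, pub_year):
    """True if the title contains any keyword and the output was published
    in 2020 or later (the year filter guards against terms such as
    'pandemic' being used in non-COVID-19 contexts)."""
    lowered = title.lower()
    return pub_year >= 2020 and any(k in lowered for k in KEYWORDS)

# Hypothetical titles for illustration.
flags = [
    is_covid_related("SARS-CoV-2 viral load in upper respiratory specimens", 2020),
    is_covid_related("Pandemic influenza preparedness planning", 2015),
    is_covid_related("Deep learning for galaxy classification", 2021),
]
```

With substring matching, several keywords are redundant (for example, any title containing "covid-19" also contains "covid"), so the year filter does most of the work of limiting false positives.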
3.5. Statistical Analyses
Statistical analysis was performed using Stata version 17 (StataCorp, 2021). The Stata script used for the following analysis has been made publicly available (Alperin et al., 2023b). Throughout our analyses, we examined changes in preprint media coverage in terms of proportions, rather than counts. Specifically, we compared mentions of preprints against mentions of all research in our sample (i.e., mentions of preprints and WoS research). Doing so allowed us to control for any fluctuations in the volume of preprint mentions that were created by changes in Altmetric’s approach to identifying research mentions during the study period, rather than the result of changing journalistic practices. For ease of reading, we use the term “share of preprint mentions” to refer to the proportion of all research mentions that focused on preprints and “share of WoS mentions” to refer to the proportion that focused on WoS research.
It was necessary to disentangle changes in preprint coverage due to the launch of medRxiv from those due to the onset of the pandemic, as the launch of medRxiv in 2019 (Kaiser, 2019) coincided closely with the start of the COVID-19 era. As such, in Eq. 1 we estimated an ordinary least squares (OLS) regression of a binary indicator (Yit) coded as 1 if the news mention (i) referenced a preprint and coded as 0 otherwise against time (t), encoded as linear days since January 1, 2014 and modeled with third-order polynomial trends (β1 through β3), with each vector of third-order polynomial terms estimated in both the pre-COVID-19 era and COVID-19 era (α0 interacted with the vector of time trends). We differentiated pre-COVID-19 from COVID-19 era mentions through a binary indicator, coded as 1 if the preprint was mentioned in a news story published after January 10, 2020 (i.e., when the WHO first used the term “2019-nCoV” to describe the novel coronavirus; WHO, 2020), and coded as 0 otherwise. We modeled the period between the first news mention of a medRxiv preprint (i.e., on July 23, 2019, which postdates the launch of the site on June 25, 2019 by about one month) and the WHO’s statement as a linear intercept shift (β4). In practice, this variable allowed us to differentiate the change in preprint mentions that occurred with the introduction of medRxiv before (but close to the onset of) COVID-19 from the effect of COVID-19 itself. Similarly, we modeled the mentions of preprints with titles that included COVID-19-related language (i.e., “sars-cov-2” or a related term) as a linear intercept shift (β5). This last variable is important, as it allowed us to differentiate the change in preprint mentions for COVID-19-related topics in the media from changes in preprint mentions in the COVID-19 era that were not about COVID-19 topics. Last, to adjust for seasonality and periodicity effects we controlled for week-of-year intercepts (γwy; e.g., first week of 2014) and day-of-month effects (δmd; e.g., Tuesdays in January). In practice, controlling for periodicity and seasonality had little effect on model parameters but allowed us to rule out correlations between period effects and the onset of COVID-19. Standard errors are robust to heteroskedasticity in the error term (εit).
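The equation itself does not appear in this version of the text. One way to write the model consistent with the description above is sketched here; it is a reconstruction under assumptions rather than the authors' exact specification. The symbols Y, t, β1–β5, α0, γ, δ, and ε come from the text, whereas the intercept β0, the interaction coefficients α1–α3, and the indicator names C (COVID-19 era), M (period between the first medRxiv mention and the WHO statement), and V (COVID-19-related title) are notation introduced for illustration. The `aligned` environment assumes the amsmath package.

```latex
\begin{equation}
\begin{aligned}
Y_{it} ={}& \beta_0 + \sum_{k=1}^{3} \beta_k t^{k}
            + \alpha_0 C_t + \sum_{k=1}^{3} \alpha_k \left( C_t \times t^{k} \right) \\
          & + \beta_4 M_t + \beta_5 V_i
            + \gamma_{wy} + \delta_{md} + \varepsilon_{it}
\end{aligned}
\end{equation}
```

Because the outcome is binary and the model is estimated by OLS, the coefficients can be read as changes in the probability that a research mention refers to a preprint, which is why the results are reported in percentage points.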
Next, we estimated separate OLS regressions that allow us to test whether changes in preprint mentions vary across preprint servers, media outlets focused on different topics, and media outlets of different types. Because preprint servers necessarily represent preprint mentions, we discarded mentions of articles from WoS and collapsed the data so that we could observe counts of preprint mentions by day and identify any changes in these counts among the four servers (RQ2). To identify changes among the four media outlet topics and three outlet types, respectively, we kept the data as described previously, with each row representing a unique news mention of a preprint or WoS article. To identify changes in the share of preprint mentions across the four media outlet topics (RQ3), we focused on the three most prevalent topics in our sample—Health/Medicine, General News, and Science/Technology—and an “Other” category that included a variety of other topics (e.g., Business, Lifestyle, Explicit Point-of-View).
4. RESULTS
4.1. Has the Share of Preprint Coverage in the Media Increased During the COVID-19 Pandemic?
Our models suggest that the annual number and share of preprint mentions increased slowly from 2014 to 2019, then increased dramatically in 2020–2021 (Table 2, Figure 1). However, even during the pandemic period, preprint mentions made up only a small subset of media coverage of research, at less than 5% of all mentions of research. We also saw evidence of a shift in which servers received the most attention during the pandemic. Before the pandemic, most mentions of preprints cited preprints posted to arXiv or SSRN; yet during the pandemic, bioRxiv and medRxiv became the most frequently mentioned servers.
Year | Number of WoS mentions | Number of preprint mentions | arXiv preprints (%) | SSRN preprints (%) | bioRxiv preprints (%) | medRxiv preprints (%) | Total preprint proportion (%) |
---|---|---|---|---|---|---|---|
2014 | 98,580 | 753 | 0.34 | 0.41 | 0.01 | N/A | 0.76 |
2015 | 122,641 | 1,263 | 0.42 | 0.56 | 0.04 | N/A | 1.02 |
2016 | 192,414 | 2,115 | 0.44 | 0.56 | 0.09 | N/A | 1.09 |
2017 | 214,714 | 2,412 | 0.42 | 0.60 | 0.09 | N/A | 1.11 |
2018 | 218,946 | 2,597 | 0.48 | 0.58 | 0.12 | N/A | 1.17 |
2019 | 231,503 | 3,275 | 0.55 | 0.67 | 0.16 | 0.01 | 1.39 |
2020 | 279,737 | 12,484 | 0.46 | 0.56 | 0.85 | 2.40 | 4.27 |
2021* | 127,998 | 6,129 | 0.50 | 0.29 | 1.27 | 2.51 | 4.57 |
Total | 1,486,533 | 31,028 | | | | | |
% of all research mentions | | | 0.46 | 0.55 | 0.35 | 0.68 | 2.04 |
% of all preprint mentions | | | 22.62 | 26.93 | 16.95 | 33.49 | 100 |
* Partial year.
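The "Total preprint proportion" column follows from the two count columns as preprint mentions divided by all research mentions (preprint plus WoS). The short check below recomputes it from the counts in the table; the variable names are illustrative only.

```python
# (WoS mentions, preprint mentions) per year, copied from the table above.
counts = {
    2014: (98_580, 753),
    2015: (122_641, 1_263),
    2016: (192_414, 2_115),
    2017: (214_714, 2_412),
    2018: (218_946, 2_597),
    2019: (231_503, 3_275),
    2020: (279_737, 12_484),
    2021: (127_998, 6_129),  # partial year
}

# Share of preprint mentions = preprint / (preprint + WoS), in percent.
share = {
    year: round(100 * pre / (wos + pre), 2)
    for year, (wos, pre) in counts.items()
}

total_wos = sum(wos for wos, _ in counts.values())
total_pre = sum(pre for _, pre in counts.values())
overall_share = round(100 * total_pre / (total_wos + total_pre), 2)

for year, pct in share.items():
    print(year, pct)
print("overall", overall_share)
```

Working with shares rather than raw counts is what allows the analysis to absorb changes in how many outlets and outputs Altmetric tracks over time, since such changes affect the numerator and denominator alike.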
With respect to medRxiv preprints, we found that the onset of COVID-19 increased the share of preprint mentions in the media beyond any increase due to the launch of the server in 2019. Specifically, we estimate that, prior to the introduction of medRxiv and COVID-19, the share of preprints mentioned in the media was increasing at a glacial pace (an annual rate of 0.21 percentage points; p < 0.001; 95% CI [0.13–0.29]; see solid gray line, Figure 1). When medRxiv was introduced, the share of preprint mentions did not change (estimated decrease = 0.005 percentage points; p = 0.957; 95% CI [−0.17–0.16]). In contrast, the share of preprint mentions increased by an estimated 2.58 percentage points after the onset of the pandemic (p < 0.001; 95% CI [2.45–2.70]; see solid blue line). This significant but modest increase applied to all preprint mentions, but masks large differences in the proportion of preprint mentions between COVID-19-related and non-COVID-19-related research during the pandemic.
Indeed, our model strongly suggests that preprints played a far greater role in media coverage of COVID-19 specifically than in coverage of other topics. This can be seen from the “COVID-19” line (in fuchsia) in Figure 1, which represents the estimated share of preprint mentions among all the mentions of COVID-19-related research (i.e., both preprints and WoS articles that included COVID-19-related language in the title). We estimated an increase in these COVID-19-related preprint mentions of 12.94 percentage points (p < 0.001; 95% CI [12.84–13.04]), a large increase relative to predicted preprint mentions based on pre-COVID-19 trends (gray dotted line). We explore coverage of non-COVID-19 preprints in more detail in Section 4.2.
We further tested whether any changes in the share of preprint mentions seen during the pandemic could be linked to changes in mentions of WoS research during this period. We implemented this test by comparing growth rates of news mentions for preprints and WoS research over time. Given that preprint mentions comprised only about 2% of all mentions in our sample and to place preprint and WoS mentions on a common y-axis, we plotted preprint and WoS mentions as growth rates. Growth rates for preprint and WoS mentions were each calculated using the total number of mentions in the first 28 days of our data, beginning with Sunday (i.e., January 5, 2014). These mentions in the first 28 days comprised our “base rate,” and the total number of mentions in each sequential 28 days were then scaled by that base rate.
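The growth-rate scaling described here can be sketched as follows. This is a minimal illustration rather than the authors' Stata code: the function name `growth_rates` and the input format (a list of mention dates) are assumptions, and mentions dated before the first window are simply ignored.

```python
from datetime import date, timedelta

def growth_rates(mention_dates, start=date(2014, 1, 5), window_days=28):
    """Count mentions in consecutive 28-day windows starting at `start`
    and scale each window's count by the count in the first window."""
    counts = {}
    for d in mention_dates:
        offset = (d - start).days
        if offset >= 0:  # ignore mentions before the first window
            idx = offset // window_days
            counts[idx] = counts.get(idx, 0) + 1
    base = counts.get(0, 0)
    if base == 0:
        raise ValueError("no mentions in the base window")
    return [counts.get(i, 0) / base for i in range(max(counts) + 1)]

# Toy example (hypothetical dates): 2 mentions in the base window,
# then 6 mentions in the following window.
toy = [date(2014, 1, 5), date(2014, 1, 20)]
toy += [date(2014, 2, 2) + timedelta(days=i) for i in range(6)]
rates = growth_rates(toy)
```

On this scale a value of 3.0 means three times as many mentions as in the base window, which is what lets two series of very different absolute size share one y-axis.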
Here, we find that the rise in the share of preprint mentions that took place during the pandemic was not simply an artifact of a decrease in WoS mentions. As can be seen from Figure 2, WoS mentions increased about 2.3-fold between May 2014 and September 2019, and this pace of growth remained relatively unchanged after COVID-19 began and started to garner media attention. In contrast, preprint mentions had increased by about 5.7-fold by the time of the WHO’s announcement about “2019-nCoV” in January 2020, but skyrocketed to a 30-fold increase at the height of the pandemic in May 2020. This figure thus shows that the increase in the proportion of preprint mentions during the pandemic era was driven almost entirely by an increased number of preprint mentions and not a decrease in the number of WoS mentions.
4.2. Do Changes in Media Coverage of COVID-19-Related Preprints Extend to Coverage of Preprints on Other Topics?
Our results suggest that the onset of the pandemic not only increased media attention to COVID-19-related preprints but may also have decreased attention to preprints on other topics. Among all research that excluded COVID-19-related language (solid gray line, Figure 1), we found that the share of preprint mentions during the pandemic decreased by 0.18 percentage points, although this decrease was not significant (p = 0.129; 95% CI [−0.42, 0.05]). Model-based estimates suggest that, if the pandemic had not occurred, we would have expected the share of preprint mentions to reach 2.58% by June 3, 2021 (dashed gray line, Figure 1), yet the observed share of non-COVID-19-related preprint mentions comprised only 0.86% of all news mentions, a difference of 1.71 percentage points from what would have been expected (p < 0.001; 95% CI [1.13, 2.31]). This last result suggests that the pandemic may have modestly shifted media attention away from preprints about non-COVID-19-related topics. In effect, our results suggest that COVID-19-related preprint mentions eclipsed prepandemic preprint mentions.
Looking at the number of preprint mentions by server, we observed no increase in non-COVID-19-related preprint mentions during the pandemic for any server (Figure 3). All point estimates were trivially small (about 0.7 to 1.8 fewer mentions per day, on average) and not statistically significantly different from zero (p-values ranged from 0.217 to 0.699). For articles that included COVID-19-related language in the title, there was a significant increase in average daily news mentions of bioRxiv and medRxiv preprints, of 6.2 (p < 0.001; 95% CI [5.34, 6.95]) and 19.2 (p < 0.001; 95% CI [18.25, 20.08]) mentions per day, respectively, and a significant decrease for arXiv and SSRN preprints, with estimated changes of −2.7 (p < 0.001; 95% CI [−3.75, −1.68]) and −1.6 (p < 0.001; 95% CI [−2.69, −0.51]) mentions per day, respectively. In total, over the 511 days of the pandemic era in our sample, this amounted to an increase of about 9,800 mentions of medRxiv preprints and 3,170 mentions of bioRxiv preprints.
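The totals in the last sentence follow from multiplying each estimated average daily increase by the number of pandemic-era days. The short Python check below reproduces that arithmetic using only the figures reported above; the variable names are illustrative.

```python
PANDEMIC_DAYS = 511  # pandemic-era days in the sample, as reported in the text

# Estimated average increase in daily news mentions of COVID-19-related preprints.
daily_increase = {"bioRxiv": 6.2, "medRxiv": 19.2}

# Total additional mentions over the pandemic era, rounded to whole mentions.
totals = {
    server: round(rate * PANDEMIC_DAYS)
    for server, rate in daily_increase.items()
}

for server, total in totals.items():
    print(f"{server}: about {total} additional mentions")
```

The results (3,168 for bioRxiv and 9,811 for medRxiv) match the rounded totals of about 3,170 and 9,800 given in the text.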
It is important to note that the declines in mentions of arXiv and SSRN preprints were only significant for preprints that included a COVID-19-related keyword in the title. That is, the media were less likely to mention preprints from these servers that were about COVID-19; instead, when communicating about pandemic research, they tended to mention bioRxiv or medRxiv preprints. These results could suggest that the media drew on the servers they expected would house the research most relevant to their area of interest. They also suggest that COVID-19-related coverage tended to focus on medical aspects of the pandemic and less on social or economic aspects.
4.3. Have Changes in Media Coverage of Preprints Occurred Similarly Across Media Outlets?
We next tested how preprint mentions changed across media outlets with four different topical foci (i.e., General News, Science/Technology, Health/Medicine, Other) and of three different types (i.e., legacy, peripheral, or nonjournalism). For mentions of COVID-19-related research, we found that outlets in all four topic categories increased their preprint coverage dramatically during the pandemic, but to different extents. Increases ranged from 8.3 percentage points (Science/Technology) to 15.6 percentage points (Health/Medicine) and were all statistically significant (p < 0.001 for all coefficients) (Figure 4). Changes in the share of mentions for non-COVID-19-related preprints were trivial, with only the “Other” category seeing a small but statistically significant increase (0.9 percentage points).
Similarly, none of the outlet types (i.e., legacy, peripheral, nonjournalism) saw a statistically significant increase in the share of preprint mentions of non-COVID-19-related articles, and the coefficients themselves were trivially small, never reaching 1 percentage point. However, just as with the topic-based data, all three outlet types increased the share of mentions of COVID-19-related preprints after the WHO announcement in 2020 (estimates ranged from 4 percentage points for nonjournalism outlets to 14 percentage points for legacy outlets).
Finally, to provide a better sense of the nature of the outlets that frequently rely on preprints, we identified the 25 media outlets whose coverage included the largest share of research mentions in general (i.e., mentions of preprints and WoS outputs) and calculated their share of preprint mentions both before and during the COVID-19 era (Table 3). The list represents about 75% of all research mentions in our sample and includes a mix of legacy media, such as BBC News and the New York Times, and peripheral outlets, such as Reason or The Conversation. Several nonjournalism outlets also appear on the list, mostly services such as EurekAlert! and Newswise, which do not publish original articles but distribute science press releases (many of which include mentions of new research). Among outlets that tended to cover a high proportion of preprints in general, the U.S. libertarian magazine Reason stood out, mentioning approximately one preprint for every three WoS outputs—far more than any other outlet in our sample prior to the COVID-19 pandemic. Interestingly, the outlet’s share of mentions actually decreased slightly during the pandemic, from 27% to 24%. Among the outlets that saw the largest increase in their share of preprint mentions, the peripheral Health/Medicine outlet News Medical topped the list, with essentially no preprint mentions before the pandemic but a share of 43% during the pandemic. Several major legacy General News outlets, such as BBC News, the Daily Mail, the New York Times, and the Guardian, also saw notable increases in preprint coverage, moving from minimal use of preprints to covering about one preprint for every four or five mentions of research. Although some specialized Science/Technology outlets (e.g., Scientific American, Phys.org) increased their coverage of preprints during COVID-19, these increases tended to be less pronounced than those seen among the major General News outlets.
Outlet | Outlet’s share of all research mentions (%) | Outlet’s share of preprint mentions, prepandemic era (%) | Outlet’s share of preprint mentions, pandemic era (%) |
---|---|---|---|
BBC News | 1.75 | 1.69 | 21.85 |
Business Insider | 2.51 | 2.61 | 17.45 |
Business Insider Australia | 1.37 | 2.03 | 16.32 |
Daily Mail | 2.07 | 1.45 | 22.24 |
EurekAlert! | 1.55 | 0.24 | 2.83 |
Forbes | 6.37 | 9.94 | 12.37 |
Gizmodo | 1.00 | 3.33 | 14.83 |
MedicalXpress | 2.42 | 0.19 | 7.75 |
New Scientist | 1.09 | 4.15 | 20.84 |
New York Times | 8.12 | 3.69 | 23.32 |
Newswise | 1.28 | 0.60 | 6.98 |
Phys.org | 3.86 | 0.99 | 5.63 |
Quartz | 1.47 | 4.93 | 16.38 |
Reason | 2.03 | 26.83 | 24.01 |
Salon | 1.25 | 4.27 | 17.27 |
Science/AAAS | 0.98 | 2.85 | 33.62 |
Scientific American | 1.02 | 2.13 | 17.17 |
The Atlantic | 1.57 | 6.48 | 23.82 |
The Conversation | 6.84 | 1.75 | 9.76 |
The Guardian | 1.88 | 2.01 | 20.56 |
Medical News | 8.73 | 0.30 | 42.70 |
Times of India | 1.03 | 0.99 | 11.96 |
Vox | 1.60 | 4.79 | 19.23 |
Washington Post | 3.16 | 4.66 | 13.68 |
Yahoo! News | 11.44 | 1.57 | 14.27 |
Note: Top 25 outlets shown based on their share of mentions of Web of Science (WoS) outputs and preprints, representing about 75% of all mentions in our sample. Column 2 (outlet’s share of all research mentions) shows each outlet’s share of all mentions of WoS outputs and preprints (i.e., the number of preprint and WoS mentions for the outlet divided by the total number of preprint and WoS mentions in our sample). Columns 3 and 4 (outlet’s share of preprint mentions) show the share of each outlet’s research mentions that were mentions of preprints (i.e., the number of preprint mentions for the outlet divided by the outlet’s total number of preprint and WoS mentions) before and during COVID-19, respectively. Column 2 does not sum to 100% because only the top 25 outlets are shown.
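To make the two kinds of share in Table 3 concrete, the Python sketch below computes them from raw counts. It reads the preprint share as the fraction of an outlet's own research mentions that point to preprints. That reading is an interpretation, chosen because it matches the description of Reason as mentioning roughly one preprint for every three WoS outputs. The function names and example counts are hypothetical.

```python
def outlet_shares(outlet_preprints, outlet_wos, sample_total):
    """Return (share of all research mentions, preprint share), both in percent.

    outlet_preprints / outlet_wos: mention counts for one outlet.
    sample_total: preprint plus WoS mentions across the whole sample.
    All inputs are hypothetical counts used only for illustration.
    """
    outlet_total = outlet_preprints + outlet_wos
    share_of_all = 100 * outlet_total / sample_total
    preprint_share = 100 * outlet_preprints / outlet_total
    return share_of_all, preprint_share


def wos_per_preprint(preprint_share):
    """Convert a preprint share (in percent) into WoS mentions per preprint mention."""
    return (100 - preprint_share) / preprint_share


# Hypothetical outlet: 30 preprint mentions and 70 WoS mentions
# in a sample of 1,000 research mentions.
print(outlet_shares(30, 70, 1000))

# Reason's prepandemic preprint share, taken from Table 3.
print(round(wos_per_preprint(26.83), 1))
```

Under this reading, a preprint share of 26.83% corresponds to about 2.7 WoS mentions per preprint mention.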
5. DISCUSSION
It has been argued that preprint coverage during the pandemic constituted a break from journalism norms and a paradigm shift in how emergent research is reported on and shared with the public (Burke, 2021; Makri, 2021). Using longitudinal data from the WoS and four preprint servers, we sought to establish whether, in what ways, and to what extent this is the case. By identifying how the volume and nature of preprint media coverage has changed over time and what role the pandemic has played in this change, our study makes an important contribution to our understanding of journalists’ use of preprints—a topic about which much has been written, but very little is actually known.
A key finding from our analysis is that the volume of preprint media coverage increased roughly fourfold in the pandemic period, a clear break from the slight but steady upward trend that preceded it. Virtually all of this increase was driven by coverage of COVID-19-related preprints, with little change in coverage of preprints on other topics. Although coverage of peer-reviewed research continued to exceed preprint coverage, even during the height of the crisis, the growth in coverage of preprints seen during this period may imply a shift in journalistic norms and practices, including a changing outlook on preliminary, unvetted research and its reporting.
At the same time, however, we observed a slight (but nonsignificant) decrease in coverage of non-COVID-19-related preprints during the pandemic. This lack of coverage of non-COVID-19-related preprints may simply be the result of outsized media attention to COVID-19 in general (i.e., not just COVID-19-related preprints), which may have come at the expense of coverage on other topics. Yet, it could also indicate that the surge in preprint coverage observed during the pandemic was a temporary change—a break from established norms that journalists made to cover a rapidly evolving crisis, rather than a true shift in practice. More research is needed to assess the degree to which increases in preprint coverage will persist in the coming years, as media outlets and scientists turn their attention away from COVID-19 and toward other issues.
Interestingly, the sharp rise in preprint coverage seen during the pandemic was most pronounced for health and medical outlets, which appear to have been resistant to covering preprints until relatively recently. Although outlets specializing in other topics, such as science and technology, covered preprints at least occasionally before the pandemic, our findings suggest that, for health and medical outlets, the crisis seems to have created something closer to the “paradigm shift” described by journalists in previous research (Fleerackers et al., 2022b). Preprints were barely mentioned in health and medical outlets up until 2019, even after medRxiv was launched, but became a frequent source of coverage in these outlets after 2020, particularly when reporting on COVID-19. Again, more research is needed to assess whether this trend will continue beyond the pandemic.
The factors that motivated health and medical journalists to adapt their practices during COVID-19 also remain unclear. Although the medical nature of the crisis likely played a primary role, at least some of this shift may be linked to a parallel shift in preprint use among health and medical scholars themselves. Like journalists, researchers in these areas have historically been hesitant to post or cite unreviewed research (Flanagin, Fontanarosa, & Bauchner, 2020; Maslove, 2018), but became active users of preprints during the pandemic (Fraser et al., 2021; Waltman et al., 2021). Because journalists who report on research rely heavily on interviews with scientific experts (Schultz, 2023), changing attitudes toward preprints among medical scientists would likely affect reporting practices on medical and related issues. It is possible, in other words, that the uptake of preprints by medical and health outlets reflects the growing acceptance of preprints within the medical and health sciences. This may also be true of preprint-based journalism more broadly, as preprint adoption also grew during the study period (Nane et al., 2023). Waltman et al. (2021) report that the number of preprints in 2020 was about 150% larger than the number of preprints in 2015, and Penfold and Polka (2020), working with data from PubMed and 10 preprint servers, found that the number of biology preprints increased almost tenfold between 2013 and 2019 (from 0.24% to 2.36%). However, the fact that more preprints are becoming available does not mean that journalists will automatically cover them. By covering preprint science, journalists may be adapting their own norms to follow those of scientists working in the disciplines they report on. It is also possible that journalistic norms in terms of preprint coverage are changing as journalists are increasingly pressured to attract reader attention away from competing outlets. In other words, the rapid and competitive nature of the media landscape may encourage journalists to use preprints even outside the pandemic context, and thus get an “edge” over other media outlets, as suggested in previous research (Fleerackers et al., 2022b).
In terms of outlet types, we found that both traditional legacy outlets (e.g., the New York Times) and peripheral media outlets (e.g., News Medical) were covering preprints to some extent before the pandemic, but greatly increased this coverage during the crisis. The similar pattern seen for the two outlet types is surprising, as peripheral media outlets are often conceptualized as following different norms, ethics, and practices than legacy media and as being less beholden to professional guidelines, such as those that urge journalists to avoid covering unreviewed research (Froke et al., 2020). Our findings thus align with previous scholarship, which has suggested that the boundaries between legacy and peripheral journalism are blurring and that categorizing outlets this way may no longer be meaningful (Deuze & Witschge, 2018; Witschge, Anderson et al., 2019). Although more research is needed, it is possible that such blurring boundaries are especially likely in contexts where professional guidelines and best practices are still emerging, such as when reporting on preprints (van Schalkwyk & Dudek, 2022a). Future studies could explore whether the similarities we observed in preprint coverage among peripheral and legacy outlets also apply to larger and more diverse outlet samples, or to other situations in which journalistic practices are rapidly evolving.
Collectively, our findings provide some of the first evidence that journalists are increasingly using preprints—at least in some areas—and that the pandemic has greatly accelerated this use. However, this conclusion should be considered alongside several limitations. First, there are known challenges of working with Altmetric data to identify media coverage of research, particularly in languages other than English (see Ortega (2020a) for a review). We have attempted to mitigate these challenges by working with a predefined set of English-language media outlets, as recommended in previous research (Fleerackers, Moorhead et al., 2022a). Yet, although the restricted nature of our sample of outlets is a strength of this study, it is also a limitation, as the patterns we observed among these 94 outlets may not apply to those that less frequently report on research or do so in different languages. Relatedly, an increasing number of regional preprint servers are coming online, which offer spaces for authors to share preprints that are relevant to their own geographic context in languages other than English (e.g., SciELO Preprints, AfricArXiv, Jxiv). These servers may become preferred sources for media outlets publishing non-English content. Replicating our findings with a larger set of outlets and servers, or through complementary data gathering methods, would be a fruitful avenue for future research.
We also aimed to make our findings more robust by contextualizing any increases in preprint media coverage alongside changes in coverage of peer-reviewed research during the same period. To do so, we relied on WoS data, which is biased towards studies from scientific, technical, and medical disciplines and published in English-language journals from the Global North (Alperin, Babini, & Fischman, 2014; Mongeon & Paul-Hus, 2016). Given our study’s focus on English-language media outlets, the impact of the language bias is likely minimal (i.e., it is relatively unlikely that a journalist working for an English-language outlet would cover non-English research). However, the disciplinary and geographic biases are limitations of our study that should be kept in mind when interpreting the results.
Finally, the nature of our data only enabled us to explore changes in preprint media coverage from 2014 through the first year and a half of the pandemic, leaving many questions unanswered about what the future will bring. We hope that scholars will build on our findings to provide further insight into whether the preprint coverage seen during COVID-19 persists in the long term and what the implications of that persistence would be.
ACKNOWLEDGMENTS
The authors wish to thank Altmetric for providing this study’s data free of charge for research purposes and Stefanie Haustein for her input on the methodology.
AUTHOR CONTRIBUTIONS
Alice Fleerackers: Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Software, Writing—original draft, Writing—review & editing. Juan Pablo Alperin: Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Software, Supervision, Writing—original draft, Writing—review & editing. Kenneth Shores: Formal analysis, Methodology, Software, Visualization, Writing—original draft, Writing—review & editing. Natascha Chtena: Writing—original draft, Writing—review & editing.
FUNDING INFORMATION
This research is supported by a Social Sciences and Humanities Research Council of Canada (SSHRC) insight grant, Sharing health research (#453-2020-0401). AF is supported by a Social Sciences and Humanities Research Council Joseph-Armand Bombardier Doctoral Fellowship (#767-2019-0369).
COMPETING INTERESTS
The authors have no competing interests.
DATA AVAILABILITY
Preprint and WoS mention data for this study can be accessed via the Harvard Dataverse at https://dataverse.harvard.edu/ with the following link: https://doi.org/10.7910/DVN/ZHQUFD. Scripts used to analyze the data can be accessed via Zenodo at https://doi.org/10.5281/ZENODO.8125008.
Notes
NISO Altmetrics Working Group C “Data Quality” – Code of Conduct Self-Reporting Table.
REFERENCES
Author notes
Handling Editor: Rodrigo Costas