The surge in preprint server use, especially during the COVID-19 pandemic, necessitates a reexamination of their significance in science communication. This study investigates discussions surrounding preprints, framing them within the contexts of systems theory and boundary objects in scholarly communication. A curated selection of COVID-19-related preprints from bioRxiv and medRxiv was analyzed, with emphasis on those that were later published in journals, together with the associated comments and Twitter activity. The data set was divided into comments by biomedical experts and comments by nonexperts, encompassing both academic and general public perspectives. The findings show that while peers dominated nearly half the preprint discussions, their presence on Twitter was markedly lower. The themes explored by the two groups also diverged considerably. Preprints emerged as potent boundary objects, reinforcing, rather than obscuring, the delineation between scientific and nonscientific discourse. They serve as crucial conduits for knowledge dissemination and foster interdisciplinary engagement. Nonetheless, the interplay between scientists and the wider public remains nuanced, and strategies are needed to incorporate these diverse discussions into the peer review continuum without compromising academic integrity and to cultivate sustained engagement from both experts and the broader community.

Preprint servers are becoming increasingly popular for publishing scientific articles prior to the review process (Abdill & Blekhman, 2019; Fraser, Momeni et al., 2019; Kirkham, Penfold et al., 2020; Polka & Penfold, 2020). The use of preprints offers several advantages for researchers. They can make their research results publicly available early on without restrictions (Green open access) and generate a DOI for their manuscripts. In addition, many of these platforms allow for publicly visible commenting on the manuscript, thereby making peer review at least technically possible for anyone.

During the COVID-19 pandemic, particularly in 2020, interest in preprints in the medical field increased significantly, as the public and the media had a heightened need for access to the latest epidemiological, virological, and immunological research on the pandemic (Kodvanj, Homolak et al., 2022; Kwon, 2020). As a result, visitor numbers on preprint servers increased significantly (Fraser, Brierley et al., 2021). Commenting on preprint servers and on social media also increased (Fabiano, Hallgrimson et al., 2020). While the commenting function of preprint archives is primarily intended for a knowledgeable expert audience, nonexperts and nonacademics have also started to comment. However, these processes have been little studied so far, and there is a need for clarification regarding the nature of the discussion as well as the provenance of the commentators. This is also important because preprint servers represent a data infrastructure in line with the Open Science movement, which enables peer-review-like processes with the participation of the public (Desjardins-Proulx, White et al., 2013; Vicente-Saez & Martinez-Fuentes, 2018). Therefore, the subject offers potential insights into the functioning and consequences of such opening processes, such as the relationship between the scientific system and its environment as well as the associated quality risks (Mirowski, 2018; Widener, 2020).

The question arises as to which subpublics participate in commenting on preprints with COVID-19-related content and in disseminating them on social media, and what can be concluded from this regarding the opening of scientific quality assurance procedures such as peer review.

To investigate this question, Section 2 will first discuss the research on how systems theory and boundary objects can be applied to preprints. Theoretical considerations will be made for or against the blurring or stabilization of the boundary between science and its environment, leading to hypotheses (Section 3). This will be followed by the development and description of the empirical approach based on a sample of preprints from the field of biomedicine with a focus on COVID-19 (Section 4). The results of the study will be presented in Section 5 and discussed in Section 6. Finally, a conclusion will be drawn (Section 7).

2.1. Preprints as an Element of Open Science

Preprints belong to the field of Open Science. In short, Open Science aims to enable transparent and accessible knowledge, which is shared and further developed through collaborative networks (Vicente-Saez & Martinez-Fuentes, 2018). There are very different schools of thought in Open Science, ranging from efficiency considerations, democratization of processes, public transparency, and open infrastructures to new evaluation criteria (Fecher & Friesike, 2014). Preprints are mostly discussed in relation to Open Access, as they enable unrestricted access to scientific results, and in relation to Open Evaluation, as they concern scientific quality assurance.

Open Peer Review is another topic that is related to Open Evaluation (Bezjak, Clyburne-Sherin et al., 2018). Because manuscripts can be commented on publicly, preprints enable an extended form of the peer review process, in which experts and laypeople alike can contribute their perspectives and criticisms. There are several variants of Open Peer Review, such as Crowdsourced Review or Open Participation Peer Review (anyone can contribute to the review), as well as Prepublication Review (review takes place before a formal review at a journal) (Ford, 2013; Ross-Hellauer & Görögh, 2019). The advantage of crowdsourcing reviewers lies in countering disciplinarily isolated perspectives: interdisciplinary perspectives and a higher number of reviewers can both contribute to the quality assurance of manuscripts (Ross-Hellauer & Görögh, 2019, p. 9). However, with regard to commenting on preprints, it must be noted that a comment on the website does not automatically represent an assessment of scientific quality. In addition to commenting on preprint pages, there are now also dedicated forms of prepublication Open Peer Review offered by some operators, such as ReviewCommons, PeerRef, and CrowdPeer. However, they currently serve more of a niche. There are also providers of postpublication Open Peer Reviews that enable fast publication, but they cannot address all concerns regarding the peer review process and the quality of articles (Kirkham & Moher, 2018).

2.2. Preprints and the Problems of Peer Review

The growing popularity of preprints cannot be understood without considering the problems of traditional peer review, which include low reliability, long publication delays, lack of transparency in the review process, potential biases of reviewers towards certain groups, rejection of negative results, and potential abuse of confidentiality (Bohannon, 2013; Dwan, Altman et al., 2008; Helmer, Schottdorf et al., 2017; Lee, Sugimoto et al., 2013; Powell, 2016; Schroter, Black et al., 2008; Smith, 2006). Additionally, the traditional peer review system struggles to ensure reproducibility of research results (Ioannidis, 2005; Nosek, Alter et al., 2015). Furthermore, with the increasing number of scientific publications and journals, it is becoming increasingly difficult to find enough suitable reviewers for submitted manuscripts.

Preprints offer solutions to some of these problems: They allow for the rapid dissemination of research results by bypassing long publication delays (Fraser et al., 2019). Early publication of preprints enables researchers to share their work, receive feedback, and respond to current scientific developments (Kirkham et al., 2020). Preprints that open up discussion and criticism of research results can help draw attention to reproducibility problems and find solutions more quickly (Nosek et al., 2015). Furthermore, opening up the peer review process to a wider community, including the interested public, can improve the quality and accountability of scientific work and increase trust in science (Desjardins-Proulx et al., 2013). This can lead to greater engagement of subpublics outside of science in the scientific discourse and a better understanding of the dynamics of research. By allowing for public commenting on manuscripts, preprints enable an expanded form of peer review in which experts and laypeople alike can contribute their perspectives and criticism. However, preprints can also bring new challenges for quality assurance. As they are published before formal review, there is a risk of disseminating erroneous or misleading results (Widener, 2020). This can be particularly problematic in controversial or fast-moving research areas, such as in the case of the COVID-19 pandemic (Kodvanj et al., 2022), especially when such information is picked up and used by media or political decision-makers without considering the underlying scientific quality and validity of the results. Public commenting can also be problematic if a mass of unsavory or hostile contributions poisons the discussion climate in the comment section, thereby pushing constructive contributions into the background. This underscores the need to develop clear communication strategies and educational approaches to ensure that the wider public appropriately understands the limits and provisional nature of preprints (Desjardins-Proulx et al., 2013; Vicente-Saez & Martinez-Fuentes, 2018).

2.3. Systems-Theoretical Perspective on Preprints

Systems theory describes science as an autopoietic (i.e., closed and self-sustaining) social system that operates with the guiding code of true/false (Luhmann, 1995). This guiding code enables the generation and communication of scientific statements that are capable of being true or false. However, science is not the only social system in society but coexists with other systems, each with its own guiding codes, such as politics (power), law (right/wrong), or art (beautiful/ugly) (Luhmann, 1997). These systems are operationally closed and can only communicate with each other through structural couplings that organize the exchange of information through the mode of self-observation and other-observation. Self-reference refers to the internal mechanisms and communication processes within the scientific system, while other-reference encompasses the consideration of information from other systems (Luhmann, 1990, 1992). A subsystem processes information from the environment by observing it through its own operations, which is referred to as other-reference. Structural couplings can be said to occur when a system relies structurally on certain properties of its environment (Luhmann, 1993, p. 441).

Preprints accelerate the dissemination of knowledge and promote exchange with other systems. However, studies show that increased exchange does not always correlate with higher scientific quality (Bornmann & Haunschild, 2018). It is possible that increased other-reference may influence the scientific system and lead to de-differentiation, in which the system boundaries of science expand. Here, de-differentiation means that the functional differentiation of society changes (Stichweh, 2014). The guiding distinction between “true” and “false” could, for example, be replaced by a distinction between “useful” and “not useful.” However, it would be questionable whether we could still speak of a scientific system in that case. Breaking the operational closure of the scientific system would erode its functionally justified basis for existence. But the pressure for change does not necessarily have to materialize so radically. A weaker form of de-differentiation could occur if other-reference increases in scientific practice, that is, if structural couplings deepen without shifting system boundaries. Here, the so-called program level plays a special role, structuring institutional arrangements and social organization within the subsystem below the guiding codes (Aretz, 2022, pp. 1164–1165; Luhmann, 1995). There is no need for adjustment of the guiding codes, as the programs can already process the communicative signals from the environment in such a way that the operations do not need to be fundamentally changed.

The question of opening up internal scientific communication to external groups, here using preprints as an example, is also important from the perspective of science communication. Where members of the scientific system regularly interact with nonscientific groups (i.e., take on tasks of other-observation), they form “boundary points” (“Grenzstelle”) of the subsystem. Boundary points make the environment understandable to the system (here in terms of organizations) and the system understandable to its environment (Luhmann, 1999 [1964], p. 223). This contributes to the stabilization of organizations and mediates between subsystems (Tacke, 1997). If science communication represents a boundary point, then opening up peer review to nonscientific actors would be equivalent to expanding the action areas of science communication and thus would be an additional boundary point of the scientific system.

2.4. Preprints as Boundary Objects

The concept of “boundary objects” was introduced by Susan Leigh Star and James R. Griesemer (1989) to describe collaboration in heterogeneous social worlds. It is very similar to the concept of the boundary point in Luhmann’s sense, although it is not derived from systems theory. Boundary objects are places, documents, terms, or models that facilitate the exchange of information between groups with different backgrounds, disciplines, or perspectives. They enable the groups to integrate their different perspectives within a common framework and make joint decisions. Boundary objects are flexible and adapt to local needs, but are also stable enough to maintain a shared identity and meaning.

Preprints as boundary objects represent complex scientific phenomena. Interaction and commentary allow for abstraction and accessibility for different groups of readers, promote interdisciplinary exchange, and inspire research. Hence, translation processes and communication with nonscientific groups are possible. Preprints allow for rapid sharing of unpublished results and flexible feedback. Challenges arise from different levels of quality and lack of standards and norms in commentary. Massive criticism and misunderstandings in public reception can damage the legitimacy and authority of preprints, which impairs their constructive function as boundary objects.

Boundary objects and systems theory have some similarities. Both theories emphasize the importance of structures that allow for the integration of different perspectives and views. While systems theory emphasizes structures within a system, the theory of boundary objects emphasizes structures that connect different groups at the boundary. Some authors have suggested that boundary objects can facilitate learning, innovation, and transformation in social systems (Akkerman & Bakker, 2011; Carlile, 2002; Wenger, 2000). In systems theory, this function is achieved through structural coupling. Luhmann argues that a system is structurally coupled when it is dependent on its environment but also able to influence its environment. This interaction between a system and its environment is a central aspect of structural coupling (Luhmann, 1995). In the context of the theory of boundary objects, boundary objects can be seen as a mechanism that enables the structural coupling between different social systems or groups. Unlike the possibility of de-differentiation, boundary objects are more suited to stabilizing system boundaries, as the contact points of exchange are clearly delineated and can resist deeper intervention in the inner scientific order.

3.1. Preliminary Conclusions on the Role of Preprints for the Relationship Between Science and the Environment

Within the previously established framework, arguments can be found as to why preprints and the possibility of their free commenting can contribute both to the softening of boundaries between science and nonscience—such as journalists, practitioners, or interested laypeople—and to the stabilization of these boundaries. Softening is understood here as an intensification of structural couplings to such an extent that the environment of science influences the internal operations of science. Stabilization means an increase in structural coupling, but in the sense of boundary objects (i.e., as a contact point for shared information exchange between subsystems) whose boundaries, however, remain clearly distinguishable due to their respective roles and guiding distinctions.

The following arguments speak for a blurring of boundaries through preprints:

  • Faster dissemination: Preprints enable faster dissemination of research results before they have gone through the traditional peer-review process. This can lead to greater uncertainty about the quality of research and make it difficult to distinguish between validated scientific findings and public opinions, although this is not empirically supported (Fraser et al., 2021; Kirkham et al., 2020).

  • Interaction effects: The commenting function on preprint platforms allows the public to interact directly with scientists and express feedback or criticism. This can lead to scientists aligning their research results more strongly with public interests and concerns (Weingart, 2012).

  • Media presence: The dissemination of preprints in social media and news portals can further blur the boundaries between scientific communication and public discourses, especially when controversial topics or immature research results receive broad media attention (Fabiano et al., 2020; Widener, 2020).

On the other hand, these points speak for a stabilization of boundaries through preprints:

  • Flexibility: Preprints can function as boundary objects because they are relevant for both scientific and public audiences. They are written in a format that meets scientific standards but is also publicly accessible to interested laypeople.

  • Adaptability: Different subsystems can use and interpret preprints in different ways. Scientists can treat them as preliminary research results and draw on them for further scientific discourse, while the public can use them as a source of information or discussion material. As a supplement to the traditional peer-review process, they can serve a quality assurance function (Tennant, Crane et al., 2019). In particular, an integration of preprints and open peer review could help with this (Nosek et al., 2015; Priem & Hemminger, 2012; de Silva & Vance, 2017).

  • Mediation: Preprints can help promote communication and collaboration between different subsystems. For example, journalists can use preprints to report on scientific advances and thus promote knowledge transfer and public discussions.

  • Transdisciplinarity: Furthermore, preprints can serve as common reference points for different scientific disciplines and social groups, thus contributing to the formation of interdisciplinary and transdisciplinary research networks (Borgman, 2007).

3.2. Derivation of Hypotheses

Based on the previous considerations, some hypotheses can now be formulated for the following empirical investigations:

  • (H1a) Preprints are relevant to professional and lay audiences. This is related to the theory of boundary stabilization, where science (experts) and nonscience (public) use the same platform to communicate, but the standards of scholarly communication remain unchanged. To empirically test this, the participation of a nonscience audience in the commenting of preprints is investigated.

  • (H1b) Professional and lay audiences have different interests in preprints. Also in line with the theory of stabilization, the nonscience public discusses other issues in their comments than the professional peers. To test this, the topics discussed by both groups are compared.

  • (H1c) Faster dissemination can be problematic. Related to boundary blurring, a potentially problematic side effect of fast dissemination is the media presence of problematic preprints, which could undermine trust in scientific results. To test this, the social media presence of preprints that have not yet been published in a journal is compared to that of preprints that were published after peer review.

  • (H2) Comments on preprints can improve the peer review process. This hypothesis links with the theory of boundary stabilization, as it leverages preprints as boundary objects that provide a common platform for experts to give feedback and improve research quality. As there is no formal obligation to consider the comments during the review process, an indirect testing approach would be to compare the reception in social media and news outlets (amounts and positive mentions) of preprints that had many comments to that of preprints that had none or only a few.

  • (H3) Comments on preprints strengthen communication between scientists and the public. This hypothesis relates to boundary stabilization, as it implies that preprints provide a platform for direct communication between scientists and the public. Testing criteria could be direct replies from authors in the comments or, implicitly, frequent replies to comments. Interactions between commentators should at least in some cases cross boundaries between professional and lay audiences and, thus, establish communication.

  • (H4) Comments on preprints promote interdisciplinary collaboration among scientists from different fields. This hypothesis aligns with the theory of boundary stabilization, where preprints act as boundary objects that facilitate information exchange and collaboration across different fields. A test criterion here could be differentiating between commentators who discuss statistical topics (i.e., interdisciplinary knowledge) and those who discuss medical topics (disciplinary knowledge).

4.1. Data Generation

To empirically investigate the topic, a sample of preprint articles from the biological and medical fields related to the COVID-19 pandemic was examined to determine how they are commented on, both on preprint servers and on social media. The preprint archives bioRxiv and medRxiv were selected for this purpose. At the time of analysis (as of August 2, 2022), there were 24,438 preprint articles on the topic of COVID-19 or SARS-CoV-2, including 18,593 on the medRxiv portal and 5,845 on bioRxiv. For the sample of preprint articles from both portals, the “commented publication list” of the Science Media Center Germany (SMC) was used. In addition to published research articles, the SMC also selects preprints that are listed with short summaries of their content. Therefore, these preprints represent a preselection of articles that have been recommended for journalistic reception or have been included in it. As of January 30, 2022, 229 preprints (175 from medRxiv, 54 from bioRxiv) were recorded, along with their URLs and DOIs, 222 of which were functioning URLs. The URLs always lead to a specific version of the preprint article, and comments on the preprint on the portal are only displayed for that specific version. As preprints are updated from time to time, usually due to revision after peer review, several versions of many articles are online, each of which has been commented on separately. Therefore, URLs had to be added retrospectively for other article versions that contained comments. Overall, 266 cases were analyzed (202 from medRxiv, 64 from bioRxiv).

Based on these cases, the comments on the entries were collected and stored via web scraping. The RStudio program and in particular the R packages tidyverse (version 1.3.1) (Wickham, Averick et al., 2019) and RSelenium (version 1.7.7) (Harrison, 2022) were used for this purpose. The program first accessed the entry to retrieve central elements such as title, DOI, abstract, published version URL, number of comments, and number of tweets, as well as the links to embedded comments hosted by the Disqus service and the list of all tweets about the article, which can be accessed via the Altmetrics service. The content of comments and tweets was then collected via web scraping of the content elements on these links. For each comment, the username, comment, date, user profile link, and upvotes and downvotes by other users were collected. In total, 1,992 comments (1,905 from medRxiv, 87 from bioRxiv) were identified (Figure 1).

Figure 1.

Base sample of preprints, data preparation, and data analysis.


4.2. Analysis of Comments

The primary distinction to be made in the comments is between a scientific professional audience and a lay audience. A secondary distinction lies between peers (specifically those in human medicine or preferably in virology, immunology, pharmacology, or infectious diseases) and scientists from other disciplines. Such information is not available a priori in the data. The profile information of users commenting on the preprint pages via the Disqus online platform is too sparse or unreliable for this purpose. Thus, such distinction cannot be based on user profiles alone.

To determine group affiliation, quantitative methods will be employed and evaluated qualitatively. The basis for this is exclusively the content of the comments themselves, which are analyzed using text mining. Three distinct strategies were pursued:

  1. Manually Curated Keyword List (Manual): Initially, all comments were reviewed, and terms were identified that are clearly attributable to biomedicine or ostensibly belong to the usual statistical language of empirical biomedical studies. Through text mining, the most frequent words in the comments were also determined, and the list was supplemented with similar terms that were overlooked during the initial analysis (see Supplementary material).

  2. CIDO-based List (CIDO): During the pandemic, a Coronavirus Infectious Disease Ontology (CIDO) emerged, offering a standardized representation of various coronavirus infectious diseases, symptoms, transmission methods, etc., that is interpretable by both humans and computers (He, Yu et al., 2020). This is freely available online and could be processed using the R package OntologyIndex (Greene, Richardson, & Turro, 2017). In the subsequent step, keywords were generated by aligning the abstracts of the preprints from the sample with the ontology. The matched terms formed the list for classifying the comments.

  3. List of Statistical Terms (Stat): Using GPT-4, a list of 31 words was compiled that contains standard statistical terms (e.g., regression, p-value, median). Categorization based on this list allows inferences about whether the commenter is academically trained and can interpret empirical research literature (see Supplementary material).

The grouping procedure remains consistent in each case: If a keyword from the list appears in a comment, the comment is labeled as a professional or scientific internal comment (“intra”). All other comments are designated “extra.” Scientists from other disciplines can be approximately separated from the internal scientists by taking the comments in the Stat group and subtracting those that were simultaneously counted as internal for the CIDO list. The assumption is that anyone who expresses themselves using statistical terms but does not use biomedical technical terms is likely an external researcher engaging in the discourse due to methodological expertise. To control for variations in capitalization, special characters, and other word variations, both comments and keywords were converted to lowercase and truncated to their word stems (e.g., “antibod” for “antibodies”).
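The grouping rule can be illustrated with a minimal Python sketch. It is not the study's R code: the stem lists are hypothetical and heavily shortened, prefix matching stands in for the stemming step, and the function names and the three-way label are assumptions introduced for illustration.

```python
import re

# Hypothetical, shortened stem lists (assumption); the study's actual lists
# are described as being in its supplementary material.
CIDO_STEMS = {"antibod", "vaccin", "viral", "infect"}
STAT_STEMS = {"regress", "median", "confiden", "sampl"}


def tokens(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def matches(text, stems):
    """True if any token begins with one of the stems (stand-in for stemming)."""
    return any(tok.startswith(stem) for tok in tokens(text) for stem in stems)


def classify(text):
    """Combine the CIDO and Stat lists into one label per comment."""
    if matches(text, CIDO_STEMS):
        return "peer"              # "intra" for the CIDO list
    if matches(text, STAT_STEMS):
        return "academic nonpeer"  # Stat hit without a CIDO hit
    return "everyday"              # "extra" for both lists


for comment in (
    "The antibodies were measured after vaccination.",
    "Your regression ignores the median age.",
    "This is scary, thanks for sharing!",
):
    print(classify(comment), "|", comment)
```

Prefix matching is used here only because it keeps the sketch dependency-free; a real stemmer could behave differently on irregular word forms.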

To evaluate the quality of these groupings, a manual qualitative coding of a sample of 367 comments (∼18%) was additionally conducted. Membership of the peers (A) as well as the scientific community (B) was coded (thus, the group of external researchers is derived from the difference B − A). This grouping serves as a reference point for the quantitative groupings. The method exhibiting the best correspondence will subsequently be defined as the distinguishing characteristic for further evaluations.

The evaluation of comments along the now-defined groups is structured as follows:

  1. Comparison of comments with characteristics of peer review reports: Building upon earlier studies (Spezi, Wakeling et al., 2018; Wakeling, Willett et al., 2020), the comments are examined for four attributes: novelty, significance, relevance, and soundness. For this purpose, lists of approximately 20 keywords each were developed using GPT-4 (see Supplementary material), which were then matched with the comments.

  2. Comparison of word frequencies among the groups: A comparison is made to see if certain terms particularly appear more frequently in specific groups.

  3. Comparison of discussed topics among the groups: Topic Modeling is a popular method in quantitative content analysis to extract topics from textual content. The Latent Dirichlet Allocation (LDA) method was employed for this purpose. LDA is a relatively straightforward tool for topic extraction with sufficient expressiveness (Miner, Stewart et al., 2023; Blei, Ng, & Jordan, 2003). It is based on the assumption that documents are created from a mix of topics, and that these topics in turn consist of a mixture of words. The exact topics and topic-word distributions are estimated using probabilistic inference methods; here the Gibbs sampling method is applied (Griffiths & Steyvers, 2004) using the R package topicmodels (Grün & Hornik, 2011).

  4. Sentiment analysis: As a further content-specific deepening, sentiments in the comments are identified using the Bing method (Hu & Liu, 2004). This compares the words in the comments with its own corpus, where positive or negative emotions are ascribed to them.
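The LDA assumption named in step 3, and its estimation by Gibbs sampling, can be made concrete with a small pure-Python collapsed Gibbs sampler. This is a teaching sketch under stated assumptions, not the topicmodels implementation used in the study; the hyperparameters `alpha` and `beta`, the iteration count, the toy documents, and all names are illustrative choices.

```python
import random


def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA on tokenised documents (lists of words).

    Returns the vocabulary, the document-topic counts, and the topic-word counts.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)

    doc_topic = [[0] * n_topics for _ in docs]             # tokens of doc d in topic k
    topic_word = [[0] * n_words for _ in range(n_topics)]  # tokens of word v in topic k
    topic_total = [0] * n_topics                           # tokens in topic k
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Conditional weight of each topic given all other assignments.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1
    return vocab, doc_topic, topic_word


def top_words(vocab, topic_word, n=3):
    """Most frequent words per topic, by raw topic-word count."""
    return [
        [vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]]
        for row in topic_word
    ]


# Toy documents (hypothetical), already tokenised.
docs = [
    ["vaccine", "dose", "trial"],
    ["mask", "school", "policy"],
    ["vaccine", "trial", "dose", "dose"],
]
vocab, doc_topic, topic_word = lda_gibbs(docs, n_topics=2, n_iter=50)
print(top_words(vocab, topic_word))
```

With real comment or tweet corpora, stop-word removal and a much larger number of iterations would normally be needed before the topics become interpretable.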

In the next step, the collected tweets that mention the sampled preprints were incorporated into the analysis. The following evaluations were conducted:

  1. Grouping of the tweets: Analogous to the preprint comments, the keyword list was utilized to differentiate the tweets into professional and layperson groups. The 333,872 tweets captured via altmetrics were processed into a text corpus suitable for text mining and topic modeling. For the analysis, a sample of 10,000 tweets was drawn from this corpus to minimize the computational load.

  2. Topic modeling: Again employing the LDA method, topics within the tweets were identified and subsequently evaluated separately by group.

  3. Sentiment analysis: The analysis is finally supplemented by a sentiment analysis of the tweets, analogous to the procedure with the comments on the preprint pages.
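Lexicon-based sentiment scoring of the kind applied to comments and tweets can be sketched as follows. The miniature word lists are hypothetical stand-ins for the Bing lexicon (Hu & Liu, 2004), and scoring a text by the difference between positive and negative hits is an assumption of this sketch, not a documented detail of the study.

```python
import re

# Hypothetical miniature lexicon (assumption); the real Bing lexicon is far larger.
POSITIVE = {"good", "great", "helpful", "clear", "important"}
NEGATIVE = {"bad", "flawed", "misleading", "wrong", "poor"}


def sentiment_score(text):
    """Number of positive lexicon hits minus number of negative hits."""
    toks = re.findall(r"[a-z']+", text.lower())
    positive_hits = sum(tok in POSITIVE for tok in toks)
    negative_hits = sum(tok in NEGATIVE for tok in toks)
    return positive_hits - negative_hits


def polarity(text):
    """Map the net score to a coarse polarity label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(polarity("Great study, very clear methods."))
print(polarity("The sample is flawed and the conclusion is misleading."))
print(polarity("See the supplementary table."))
```

A word-list approach like this ignores negation and irony, which is worth keeping in mind when reading sentiment results for short texts such as tweets.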

Last, to offer an additional comparative perspective of the results, tweets related to the published versions of the preprints were collected and examined using the aforementioned methods. In total, 155 of the preprints had been published in a scientific journal by the time of the evaluation (September 2023). For these, 187,792 tweets referencing them were identified on the Altmetrics portal.

5.1. Commenting on Preprint Servers

A total of 2,113 unique comments were identified for the 222 preprints. The median number of comments per preprint was 0, whereas the mean was 9.44 (sd = 55); this gap arises because a few articles were commented on very often, while many were not commented on at all. For 113 of the 222 preprints, no comments were found, and the five preprints with the most comments accounted for 1,570 (74.3%) of the 2,113 entries. This shows that particularly large participation in commenting on preprints is achieved in only very few cases. However, 27 preprints had between 5 and 50 comments, indicating active reception. The length of comments also varies. On average, a comment is 92.6 words long (sd = 150), with a median length of 55 words. There are 121 comments that are over 250 words long, six comments that are over 1,000 words long, and the longest comment is 3,596 words long. A typical comment is thus about one paragraph long: it cannot cover many substantive points, but it can discuss a specific aspect.
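The contrast between a median of zero and a much larger mean is typical of a heavily skewed distribution. The following sketch shows how such summary figures can be computed; the per-preprint counts are hypothetical and chosen only to reproduce the pattern, and `summarize` is an illustrative helper name.

```python
from statistics import mean, median


def summarize(counts, top_k=5):
    """Summary statistics for a skewed list of per-item comment counts."""
    total = sum(counts)
    top = sum(sorted(counts, reverse=True)[:top_k])
    return {
        "mean": mean(counts),
        "median": median(counts),
        "zero_share": sum(c == 0 for c in counts) / len(counts),
        "top_share": top / total if total else 0.0,
    }


# Hypothetical counts: most preprints have no comments, one has very many.
counts = [0, 0, 0, 0, 0, 0, 1, 2, 7, 90]
print(summarize(counts, top_k=1))

# Share of the five most-commented preprints, using the figures reported above.
print(round(100 * 1570 / 2113, 1))
```

In the toy data a single item carries 90% of all comments, which pulls the mean to 10 while the median stays at 0.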

5.2. Grouping the Comments

The grouping techniques employed yielded the results shown in Table 1. Using the manually curated list of terms, 828 comments (39%) were categorized as peer comments and 1,285 comments (61%) as nonpeer comments (referred to as the “extra” group). In contrast, selection based on CIDO terms allocated peer status to only 696 comments (33%), meaning that two-thirds of the comments did not use relevant technical terms related to SARS-CoV-2 and COVID-19. Statistical terms were used in 837 comments (40%). Subtracting from these the comments that also used CIDO terms leaves 466 comments (22%) that used academic but not specifically biomedical language. Adding the CIDO hits results in 1,162 comments (55%) in academic language or, conversely, 951 comments (45%) formulated in everyday language. The manually coded assignments (n = 367) present a relatively similar picture: 33% of the comments are in everyday language, 44% can be attributed to peers, and 23% are academically formulated comments from nonspecialists.
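The arithmetic behind the combined categories can be retraced in a few lines. The counts are taken from the text; the overlap between statistical and CIDO terms is not reported directly and is only implied by the other figures, and the variable names are illustrative.

```python
total = 2113             # all unique comments
cido = 696               # comments using CIDO terms
statistical = 837        # comments using statistical terms
academic_nonpeer = 466   # statistical terms but no CIDO terms

# Comments using both statistical and CIDO terms (implied, not reported).
overlap = statistical - academic_nonpeer

# Academic language = CIDO terms or statistical terms.
academic = cido + academic_nonpeer
everyday = total - academic


def pct(n):
    """Share of all comments, rounded to whole percent."""
    return round(100 * n / total)


summary = {
    "academic nonpeer": (academic_nonpeer, pct(academic_nonpeer)),
    "academic": (academic, pct(academic)),
    "everyday": (everyday, pct(everyday)),
}
```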

Table 1.

Grouping results according to the different techniques

| Coding | Grouping technique | Extra group (N) | Extra group (%) | Intra group (N) | Intra group (%) | Total |
| --- | --- | --- | --- | --- | --- | --- |
| Computed | Manual terms | 1,285 | 61 | 828 | 39 | 2,113 |
| Computed | CIDO terms | 1,417 | 67 | 696 | 33 | 2,113 |
| Computed | Statistical terms | 1,276 | 60 | 837 | 40 | 2,113 |
| Computed | Academic nonpeer | 1,647 | 78 | 466 | 22 | 2,113 |
| Computed | Academic | 951 | 45 | 1,162 | 55 | 2,113 |
| Handcoded | Peers | 205 | 56 | 162 | 44 | 367 |
| Handcoded | Academic nonpeer | 283 | 77 | 84 | 23 | 367 |
| Handcoded | Academic | 121 | 33 | 246 | 67 | 367 |

The classifications can be pairwise evaluated using Cohen’s kappa as a measure of inter-rater reliability (Landis & Koch, 1977). Thus, Manual terms and CIDO terms have a kappa of 0.29 (fair agreement), Manual terms and Statistical terms have a kappa of 0.42 (moderate agreement), and Statistical terms and CIDO terms have a kappa of 0.19 (slight agreement). When compared with manually coded peer comments, the kappa for the Manual terms is the highest (0.38), closely followed by the CIDO terms (0.36), and only a slight agreement with the Statistical terms (0.17).
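For readers unfamiliar with the measure, Cohen's kappa relates the observed agreement between two classifications to the agreement expected by chance. The sketch below uses made-up labels for eight comments, not the study's data; the function names are illustrative, and the verbal bands follow the commonly cited Landis and Koch scale.

```python
def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two equally long lists of categorical labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    categories = set(labels_a) | set(labels_b)
    # Share of items on which both classifications agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected if both classifications were independent.
    expected = sum(
        (labels_a.count(c) / n) * (labels_b.count(c) / n) for c in categories
    )
    if expected == 1.0:
        return 1.0
    return (observed - expected) / (1 - expected)


def landis_koch_label(kappa):
    """Verbal interpretation of kappa following the Landis and Koch bands."""
    if kappa < 0:
        return "poor"
    bands = [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"), (0.80, "substantial")]
    for upper, label in bands:
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical classifications of the same eight comments by two techniques.
technique_a = ["intra", "intra", "extra", "extra", "intra", "extra", "extra", "extra"]
technique_b = ["intra", "extra", "extra", "extra", "intra", "extra", "intra", "extra"]

kappa = cohens_kappa(technique_a, technique_b)
label = landis_koch_label(kappa)
```

Because both techniques assign most comments to the “extra” group, a high raw agreement can coexist with a modest kappa, which is why the chance-corrected measure is reported.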

Taking manually coded peer comments as a standard, the Venn diagram (Figure 2) shows the greatest overlap with the Manual terms (108 out of 162 classifications). The CIDO terms cover only 85 of the 162 peer classifications. On this basis, the list of Manual terms can be regarded as a sufficiently meaningful way to separate specialist from nonspecialist comments. It combines key biomedical technical terms with statistical terms typical for biomedical research. At the same time, the analysis so far has shown that, judging by the mode of expression, a significant number of nonspecialist researchers have participated in the commenting.

Figure 2.

Venn diagram of different grouping techniques. Note: The sample was reduced to n = 367 to compare matches with the handcoded grouping “Peer.”

A glance at some randomly selected comments (Table 2) reveals that even among comments not attributed to peers, there are certainly knowledgeable authors. However, comments from the peer group tend to be more specific and technical. They are also, on average, slightly longer (98.2 words vs. 88.6 words).

Table 2.

Random sample of comments (truncated) from “intra” (peer review equivalent) and “extra” (other) groups

| Intra group | Extra group |
| --- | --- |
| An important paper and carefully conducted study, but it would be useful if the authors would provide a figure or table starting with the overall cohort size, indicating the total numbers of events according to vaccination versus infection or first (…) | In Iceland they have 10 deaths out of 1509 recoveries with 270 active confirmed infections. 0.66% CFR. |
| They are not even injections of antigens, they are injections of mRNA which induce your cells – once they take up the mRNA – to produce spike proteins. | The study indicated zero re-infection. Unless they never left the house again that would be nearly impossible to avoid. |
| The crucial issue here is sample selection. The participants essentially self-selected, but that is a potential source of huge bias. As a hypothetical scenario, if the people who chose to participate were predominantly people who’d had a cold and wer (…) | Check out Sweden. Natural immunity reins! |
| Yes figures and tables would be nice … they do not provide what extra level of protection you get if you have had sars cov2 and then get vaccinate … is it measured in folds like the actual infected over the vaccinated or is it like 13%? | So why are they recommending a booster? |
| I think you’re adjusting the 0.85% the wrong way on each point you’ve raised. Cellex’s FDA EUA letter for its rapid antibody test represented that “IgM antibodies to SARS-CoV-2 are generally detectable in blood several days after initial infection.” | One of the points I see in this study is that there is no reason for discrimination against previously infected people who opt out of getting the vaccine. Why ostracize people who are a lower risk for spreading the virus in favor of those who’ve only (…) |

5.3. Analysis of the Comments

The comments can now be further evaluated in terms of content. Initially, the question of interest is whether the comments resemble a peer review. Spezi et al. (2018) and Wakeling et al. (2020) have developed an approach to assess whether community-based evaluations of scientific papers exhibit features of conventional peer reviews: Novelty, Significance, Relevance, and Soundness. The data indicate a clear pattern (Table 3): Peers within biomedicine consistently highlight these peer review criteria more often than their external counterparts. For instance, while only 12% of the “extra” group commented on the Novelty of a preprint, a substantial 24% of the “intra” group did. Similarly, Significance was mentioned by just 14% of the “extra” group, but was referenced by 32% of the “intra” group. The same trend is evident for Relevance and Soundness, with 15% vs. 29% and 19% vs. 38% mentions respectively between the two groups. This pronounced difference underscores a heightened emphasis or awareness among biomedicine peers toward these criteria when evaluating and commenting on preprints, compared to individuals outside the biomedicine realm.
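The percentages quoted above follow directly from the counts reported in Table 3 and the group sizes (1,285 “extra” and 828 “intra” comments). A short sketch of that calculation, with illustrative variable names:

```python
group_sizes = {"extra": 1285, "intra": 828}

# Number of comments in which each criterion was mentioned (Table 3).
mentions = {
    "Novelty": {"extra": 153, "intra": 198},
    "Significance": {"extra": 174, "intra": 262},
    "Relevance": {"extra": 190, "intra": 236},
    "Soundness": {"extra": 242, "intra": 313},
}

# Percentage of each group's comments that mention the criterion.
shares = {
    criterion: {
        group: 100 * count / group_sizes[group] for group, count in by_group.items()
    }
    for criterion, by_group in mentions.items()
}

# How many times more often the "intra" group mentions each criterion.
ratios = {c: s["intra"] / s["extra"] for c, s in shares.items()}
```

For every criterion the “intra” share is roughly twice the “extra” share.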

Table 3.

Distribution of comments matching peer review criteria

| Criterion | Result | Extra group | Intra group |
| --- | --- | --- | --- |
| Novelty | not found | 1,132 (88.09%) | 630 (76.09%) |
| Novelty | mentioned | 153 (11.91%) | 198 (23.91%) |
| Significance | not found | 1,111 (86.46%) | 566 (68.36%) |
| Significance | mentioned | 174 (13.54%) | 262 (31.64%) |
| Relevance | not found | 1,095 (85.21%) | 592 (71.50%) |
| Relevance | mentioned | 190 (14.79%) | 236 (28.50%) |
| Soundness | not found | 1,043 (81.17%) | 515 (62.20%) |
| Soundness | mentioned | 242 (18.83%) | 313 (37.80%) |

Now, text mining will be used to delve deeper into the specific content of the comments. Comparing the most frequent (stemmed) words relative to all words used in a group in the “extra” and “intra” groups, we notice overlapping trends as well as distinct differences (Figure 3). Both groups prioritize discussions on vaccines, infections, and studies. For the “extra” group, the word “vaccin” has the highest relative frequency at 2.3%, followed by “infect” and “studi” at 1.6%. Meanwhile, in the “intra” group, “infect” takes the lead at 1.6%, followed closely by “vaccin” and “studi” at 1.5% and 1.4%. This suggests a shared focus on these topics. However, the “extra” group also places a notable emphasis on “covid,” “peopl,” and “test,” with frequencies of 1.4%, 1.2%, and 1.1%. In contrast, the “intra” group has a slightly higher interest in “test” at 1.4%. Overall, while there’s a shared thematic interest in vaccines, infections, and studies between the two groups, nuanced differences highlight unique priorities within each group. This suggests variations in the thematic focus or the depth of content between the two groups.
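The relative frequencies reported here are each stem's share of all words in a group. A minimal sketch, assuming the tokens are already stemmed and that the grouping terms are removed before the denominator is computed (the study does not state which denominator was used); the token lists and the excluded term are made up:

```python
from collections import Counter


def relative_frequencies(tokens, excluded=frozenset()):
    """Share of each token among all kept tokens of one group."""
    kept = [t for t in tokens if t not in excluded]
    total = len(kept)
    if total == 0:
        return {}
    return {word: count / total for word, count in Counter(kept).items()}


# Hypothetical stemmed tokens for each group.
extra_tokens = ["vaccin", "vaccin", "infect", "studi", "peopl", "antibodi"]
intra_tokens = ["infect", "infect", "vaccin", "studi", "test", "antibodi"]

# Hypothetical grouping term that is dropped before counting.
grouping_terms = frozenset({"antibodi"})

extra_freq = relative_frequencies(extra_tokens, grouping_terms)
intra_freq = relative_frequencies(intra_tokens, grouping_terms)
```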

Figure 3.

Relative frequency of used words in comments comparing “intra” and “extra” groups. Note: All words that were used to distinguish “intra” and “extra” group are excluded.

Looking further into relative frequencies of both groups, a keyness analysis can provide more insights (Figure 4). Keyness analysis is a statistical method used to determine the words that are particularly characteristic or key to a specific corpus compared to another reference corpus. It identifies terms that are overrepresented in one data set relative to another, thus providing insights into the thematic distinctiveness of corpora (Bondi & Scott, 2010). The computation was done using the R package quanteda (Benoit, Watanabe et al., 2018). Here (Figure 4), terms such as “data*,” “antibodi*,” “bias,” and “analysi*” stood out with significantly higher frequencies in the target group, having a χ2 value greater than 50 and a p-value of essentially zero, indicating that these terms are uniquely characteristic of the “intra” group. Conversely, terms such as “vaccine*,” “death*,” and “risk*” had negative χ2 values below –20, suggesting that although they were frequently used in both groups, they appeared disproportionately more in the “extra” group than would be expected by chance, given their frequency in the “intra” group. Thus, both groups seem to talk about different topics.
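To make the signed χ2 values easier to read, the following sketch computes a keyness score for one term from a 2×2 table of counts. It is an illustration in Python rather than the quanteda computation used in the study, it applies no continuity correction (which quanteda may apply), and all counts and names are hypothetical.

```python
def signed_chi2(target_count, target_total, reference_count, reference_total):
    """Chi-squared keyness of a term, signed by the direction of the difference.

    Positive: the term is overrepresented in the target corpus.
    Negative: the term is overrepresented in the reference corpus.
    """
    a = target_count                        # term in target corpus
    b = reference_count                     # term in reference corpus
    c = target_total - target_count         # other words in target corpus
    d = reference_total - reference_count   # other words in reference corpus
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        return 0.0
    chi2 = n * (a * d - b * c) ** 2 / denominator
    # Expected count of the term in the target corpus under independence.
    expected_a = (a + b) * (a + c) / n
    return chi2 if a > expected_a else -chi2


# Hypothetical counts: a term used 120 times in 80,000 target words
# and 40 times in 100,000 reference words.
key_in_target = signed_chi2(120, 80_000, 40, 100_000)
key_in_reference = signed_chi2(40, 100_000, 120, 80_000)
```

Swapping target and reference flips the sign but not the magnitude, which matches the reading of Figure 4: positive values mark terms characteristic of the “intra” group, negative values terms characteristic of the “extra” group.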

Figure 4.

Keyness analysis of frequent words comparing “intra” and “extra” groups.

The topics discussed in the comments will now be looked at more closely. Latent Dirichlet Allocation (LDA) will be used for this (see Section 4.2). LDA is a probabilistic topic modeling technique that uncovers hidden thematic structures within a large collection of documents. It assumes that documents are mixtures of topics, and topics are mixtures of words, and then iteratively refines topic-document and word-topic assignments to best capture the observed documents. In the LDA statistics, the beta value represents the distribution of words over topics, while the gamma value indicates the distribution of topics over documents.
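The relationship between the two statistics can be illustrated with a toy example. The sketch below does not fit an LDA model; it assumes that each token has already been assigned a topic and shows how beta and gamma arise as normalized counts. The Dirichlet priors that a real LDA fit includes are omitted, and all documents, words, assignments, and function names are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical token-level topic assignments: (document id, word, topic).
assignments = [
    ("c1", "vaccine", 1), ("c1", "people", 1), ("c1", "study", 1), ("c1", "test", 6),
    ("c2", "antibodies", 6), ("c2", "test", 6), ("c2", "analysis", 6), ("c2", "vaccine", 1),
]


def normalize(counter):
    """Turn a Counter of counts into shares that sum to 1."""
    total = sum(counter.values())
    return {key: value / total for key, value in counter.items()}


def estimate_beta(assignments):
    """Per-topic word distribution: how likely each word is within a topic."""
    counts = defaultdict(Counter)
    for _, word, topic in assignments:
        counts[topic][word] += 1
    return {topic: normalize(words) for topic, words in counts.items()}


def estimate_gamma(assignments):
    """Per-document topic distribution: how much of a document each topic covers."""
    counts = defaultdict(Counter)
    for doc, _, topic in assignments:
        counts[doc][topic] += 1
    return {doc: normalize(topics) for doc, topics in counts.items()}


def mean_gamma_by_group(gamma, groups):
    """Average the document-level gamma values within each group."""
    sums = defaultdict(Counter)
    sizes = Counter()
    for doc, topic_shares in gamma.items():
        group = groups[doc]
        sizes[group] += 1
        for topic, share in topic_shares.items():
            sums[group][topic] += share
    return {g: {t: s / sizes[g] for t, s in ts.items()} for g, ts in sums.items()}


beta = estimate_beta(assignments)
gamma = estimate_gamma(assignments)
group_gamma = mean_gamma_by_group(gamma, {"c1": "extra", "c2": "intra"})
```

Averaging gamma within a group, as in the last step, is one way to obtain group-level topic prevalences of the kind compared between the “extra” and “intra” groups.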

The LDA analysis of the comments on preprint pages revealed six distinct topics (Figure 5). Topic 1, titled “Vaccine Efficacy,” centers on vaccine effectiveness and the societal response, marked by keywords such as “vaccine,” “people,” and “study.” Topic 2, “Vaccine Demographics,” concerns the demographic patterns of those vaccinated and possible outcomes, marked by terms such as “vaccinated,” “children,” and “women.” Topic 3, “Post-Vaccine Mortality,” addresses consequences after vaccination, with terms such as “results,” “deaths,” and “mortality” standing out. Topic 4, “Transmission & Immune,” focuses on aspects of infection transmission and immunity, highlighted by words such as “cov,” “sars,” and “infection.” Topic 5, “Epidemic Metrics,” covers metrics used to understand the epidemic’s spread and scale, distinguished by keywords such as “data,” “infected,” and “rate.” Last, topic 6, “Diagnostic & Immunity,” draws attention to diagnostic tools, potential biases, and the role of antibodies, as indicated by words such as “analysis,” “antibodies,” and “test.”

Figure 5.

Identified topics with most frequent words (LDA topic modeling).

The gamma statistics reveal distinct topic preferences between the “extra” and “intra” groups (Figure 6). The “extra” group displays a pronounced inclination towards topic 1 “Vaccine Efficacy” (γ = 0.4406) and topic 5 “Epidemic Metrics” (γ = 0.2479), whereas these topics have lesser presence in the “intra” group with γ-values of 0.2122 and 0.2235, respectively. On the other hand, the “intra” group exhibits a stronger affinity for topic 6 “Diagnostic & Immunity” (γ = 0.2455) and topic 4 “Transmission & Immune” (γ = 0.1925). In contrast, these topics are minimally present in the “extra” group with gamma values of 0.0008 and 0.0009 respectively. It’s also noteworthy that topic 2 “Vaccine Demographics” has minimal presence in the “intra” group with a γ-value of 0.0006, while the “extra” group has a moderate γ-value of 0.1368 for the same topic.

Figure 6.

Popular topics by group.

Do the groups also differ in the tone of their comments? To answer this, a sentiment analysis was conducted using the Bing method (Hu & Liu, 2004), which matches the words in a text against a lexicon in which words are labeled as positive or negative. It should be noted that the topic is already biased towards negative terms, as COVID-19 is a serious disease. Of the 186,118 words in the comments, 12,124 were found in the Bing lexicon and thus classified with a sentiment. The groups differ only slightly in the ratio of positive to negative sentiments (Table 4). The “intra” group has 60% negative and 40% positive sentiments (n = 7,429 words); the “extra” group has 58% negative and 42% positive (n = 4,695 words).
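The lexicon-based procedure can be sketched as follows. The lexicon here is a tiny, hypothetical stand-in for the Bing lexicon, the tokenizer is deliberately simple, and the example comments are invented; only words found in the lexicon contribute to the counts.

```python
import re
from collections import Counter

# Tiny hypothetical stand-in for a positive/negative word lexicon.
LEXICON = {
    "good": "positive",
    "important": "positive",
    "effective": "positive",
    "infection": "negative",
    "death": "negative",
    "risk": "negative",
}


def sentiment_counts(texts, lexicon=LEXICON):
    """Count lexicon words by polarity across all texts."""
    counts = Counter()
    for text in texts:
        for token in re.findall(r"[a-z']+", text.lower()):
            polarity = lexicon.get(token)
            if polarity is not None:
                counts[polarity] += 1
    return counts


def sentiment_shares(counts):
    """Convert polarity counts into shares of all matched words."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {polarity: n / total for polarity, n in counts.items()}


# Hypothetical comments.
comments = [
    "An important and good study, but the risk of infection is high.",
    "Death rates are unclear.",
]

counts = sentiment_counts(comments)
shares = sentiment_shares(counts)
```

Because words such as “infection” count as negative regardless of context, a corpus about a disease leans negative by construction, which is the bias noted above.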

Table 4.

Distribution of positive and negative sentiments across groups

| Criterion | Result | Extra group | Intra group |
| --- | --- | --- | --- |
| Sentiment (Bing method) | Negative | 2,737 (58.30%) | 4,484 (60.36%) |
| Sentiment (Bing method) | Positive | 1,958 (41.70%) | 2,945 (39.64%) |

The result is somewhat more differentiated when looking at which terms each group uses in a positive or negative context (Figure 7). The “intra” and “extra” groups both predominantly feature the words “infection” and “infected,” indicating overlapping areas of interest. Notably, the “extra” group has slightly higher relative frequencies for these words, with “infection” at 0.1548 (relative frequency among all identified words with sentiments) compared to 0.1368 in the “intra” group and “infected” at 0.1149 against 0.1039. This suggests a slightly heightened emphasis on these concerns in the “extra” group. Words such as “symptoms,” “virus,” and “death” do appear in both groups; however, while “virus” and “death” have a higher frequency in the “extra” group, “symptoms” is more frequent in the “intra” group. Interestingly, the “intra” group displays a mix of positive terms such as “positive,” “like,” and “important,” hinting at a diverse emotional tone in their discussions. While both groups exhibit a blend of sentiments, the “intra” group demonstrates a richer assortment of sentiment-indicative words. Although the sentiment leans negative for both groups, the “extra” group might have a slightly more pronounced negative sentiment, given the prominence and relative frequencies of negative terms compared to the “intra” group.

Figure 7.

Frequent positive and negative sentiments by word and group. Note: Blue bars indicate positive sentiment, red bars negative. Only words that occurred > 50 times are plotted and all words that were used to distinguish “intra” and “extra” group are excluded.

5.4. Twitter Comments

A total of 333,872 tweets and retweets from 186,102 unique users were identified that referenced one of the preprints in this sample (77% of cases were retweets). On average, each user made 12.2 contributions (sd = 64.8), while the median was even lower. However, some very active users (n = 221) had more than 100 tweets. Each preprint had an average of 1,505 tweets (sd = 2,595) and a median of 454 tweets. There is also a moderate positive correlation between the number of comments and the number of tweets per preprint (Spearman’s rank correlation, ρ = 0.513): preprints with many comments tend to be discussed more on Twitter.
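Spearman's rank correlation is the Pearson correlation of the ranks, which makes it suitable for heavily skewed counts such as these. A self-contained sketch with made-up comment and tweet counts for five preprints (tied values receive their average rank; the function names are illustrative):

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical counts for five preprints.
comment_counts = [0, 0, 3, 12, 250]
tweet_counts = [40, 454, 300, 1500, 9000]

rho = spearman_rho(comment_counts, tweet_counts)
```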

As with comments on the preprint pages, it was also checked here which posts used medical technical terms. In the end, this was the case for only 14% (n = 45,474) of all posts, which is considerably less than in the preprint comment sections (39%). Two obvious explanations are that, on the one hand, Twitter posts were limited to 280 characters at that time, so there is significantly less room to use technical terms, and on the other hand, the Twitter medium (also because of the character limit) is designed to make user writing as concise and understandable as possible. This significantly limits the opportunities for a differentiated technical debate, except for multiple linked posts (“threads”), in which longer texts can be published by breaking them down into short segments through multiple replies to one’s own post.

The “intra” and “extra” groups both frequently discuss topics related to the pandemic, but their focal points differ in places (Figure 8). The stem “vaccin” is used almost equally often in both groups, constituting 1.56% of all words in the “extra” corpus compared to 1.55% in the “intra” corpus. The stem “studi” is somewhat more frequent in the “extra” group (1.37%) than in the “intra” group (1.25%). Terms such as “covid” and “infect” are consistently present in both groups: the “extra” group stresses “covid” slightly more (1.24% vs. 1.16% in the “intra” group), whereas the “intra” group puts more emphasis on “infect” (1.38% vs. 1.02% in the “extra” group). Another term, “variant,” appears at similar rates in both groups, with “intra” at 0.84% and “extra” at 0.71%. Interestingly, words such as “data,” “neutral,” and “preprint” are prominent only in the “intra” group and have no comparable presence in the “extra” group. Both groups thus discuss similar themes but with different emphases.

Figure 8.

Word frequencies of tweets for both groups. Note: Only words that account for 0.4% of total words or higher are plotted and all words that were used to distinguish “intra” and “extra” group are excluded.

The topics identified with Latent Dirichlet Allocation (LDA) show the range of discussion in the tweets about COVID-related preprints (Figure 9). “Topic 1: COVID & Immunity Trends” covers overarching themes about the virus, such as immediate community responses, immunity, and virus variants such as “delta.” “Topic 2: Vaccine & Virus Variants” focuses on the widespread discourse regarding vaccines, touching on infections and particular vaccine brands such as “pfizer.” “Topic 3: Safety & Public Responses” appears to concern prevention measures, the development of safety protocols, and the public reception of these measures. The more scientific discourse is found in “Topic 4: Antibodies & Data,” which centers on the role of antibodies in fighting SARS-CoV-2 and the data surrounding it. “Topic 5: Preventative Measures” appears to concern strategies to curb the spread of the virus, from the importance of masks to various research methodologies. Finally, “Topic 6: Vaccine Impact & Age” addresses the effectiveness of vaccines across different age groups, breakthrough cases, and the implications for the unvaccinated population.

Figure 9.

Identified tweet topics with most frequent words (LDA topic modeling).

As with the preprint comments, the topics on Twitter are distributed differently between the two groups (Figure 10). The “intra” group focused primarily on topic 4 (“Antibodies & Data”), which dominated its discussions. In contrast, the “extra” group engaged with a broad spectrum of topics, particularly topics 1 (“COVID & Immunity Trends”), 2 (“Vaccine & Virus Variants”), 5 (“Preventative Measures”), and 6 (“Vaccine Impact & Age”), while topics 3 (“Safety & Public Responses”) and 4 (“Antibodies & Data”) saw minimal engagement from this group. The tweeters who did not use the defined set of biomedical terms thus discussed the preprints in more diverse ways than the “intra” group, whose discussion centered predominantly on one topic.

Figure 10.

Popular topics in tweets by group.

The sentiment analysis of tweets, segmented by groups “extra” and “intra,” reveals distinct emotional tendencies in the discourse surrounding the topic (Table 5). For the “extra” group, the majority of tweets (55.3%) conveyed negative sentiments, suggesting a prevailing tone of skepticism, concern, or criticism. In contrast, the “intra” group demonstrated a slightly more optimistic outlook with 50.9% of tweets categorized as positive. This group’s sentiment reflects a generally favorable or supportive perspective on the subject at hand. The marked difference between the two groups suggests disparate perceptions or reactions to the same topic.

Table 5.

Distribution of positive and negative sentiments in tweets across groups

| Criterion | Result | Extra group | Intra group |
| --- | --- | --- | --- |
| Sentiment (Bing method) | Negative | 27,371 (55.33%) | 7,095 (49.11%) |
| Sentiment (Bing method) | Positive | 22,096 (44.67%) | 7,351 (50.89%) |

Sentiment analysis of words used in tweets, categorized by the “extra” and “intra” groups, reveals significant variations in tone and subjects discussed (Figure 11). The discourse of the “extra” group leans towards negative sentiments, with words such as “die,” “infection,” “virus,” “infected,” and “risk” emerging as prominent. Their relative frequencies vary, with “die” registering 0.0801 and “risk” at 0.0389. On the other hand, within the “intra” group, the term “infection” is highlighted with a striking relative frequency of 0.1849. The “extra” group also mentions positive terms such as “good,” “effective,” and “breakthrough,” but these appear less frequently than the negative ones. The “intra” group shows a mix of sentiments. While its members do mention negative terms such as “virus” and “infected,” positive expressions such as “good,” “interesting,” and “protection” also stand out. In summary, while the “extra” group leans more towards negative sentiments, the “intra” group shows a more balanced mix of positive and negative terms.

Figure 11.

Frequent positive and negative sentiments in tweets by word and group. Note: Blue bars indicate positive sentiment, red bars negative. Only words with relative frequency above 0.02 are plotted and all words that were used to distinguish “intra” and “extra” group are excluded.

5.5. Comparison with Published Preprints

What about the preprints that were eventually published? For 155 of the 222 preprints, a journal publication URL could be retrieved. Those papers were not stored on the preprint servers and, thus, no comments could be observed for them. However, Altmetrics data delivered 187,792 tweets that referenced these papers, which allows for some comparisons (see Supplementary material for more details):

  • Activity: On average, each paper received 1,212 tweets (sd = 2,339) and a median of 197.

  • Grouping: Only 7.9% of the tweets (n = 14,859) matched the criteria for the “intra” group, whereas 92% (n = 172,933) were assigned to the “extra” group.

  • Sentiments: Distributions match those from the preprint related tweets with 44% of negative sentiments in the “intra” group and 55% for the “extra” group.

Published preprints receive a notably higher number of comments (n = 1,897 or 89%) compared to not yet published preprints (n = 224 or 11%). When analyzing the proportion of comments, published preprints have a larger share of comments from the “extra” group at 54.1%, whereas unpublished preprints have a slightly smaller share at 47.3%. Shifting the lens to tweets, published preprints have garnered 302,220 tweets, vastly outnumbering the 83,585 tweets for unpublished preprints. Hence, discussions on preprints that were published later were not only significantly more extensive, they also attracted more nonpeers.

The mean comment count further accentuates this difference: published preprints received an average of 11.7 comments (sd = 64), over three times the average of 3.54 (sd = 9.1) for unpublished ones. The average tweet count for published preprints is 1,877 (sd = 5,388), notably higher than the 1,370 (sd = 2,455) for unpublished ones. The sizable standard deviation for published preprints points to a wide dispersion in their tweet counts.

6.1. Evaluation of Hypotheses

In this section, we will discuss the outcomes of our hypotheses as they relate to the data analyzed.

  • (H1a) 

    Preprints are relevant to professional and lay audiences. The data present a clear indication of interest from both professional and lay audiences. This was evident from the substantial number of comments and tweets related to preprints, which was notably higher for those that were eventually published. The variance between the “intra” and “extra” groups in terms of participation and sentiments further affirms this. The activity level on preprints that were subsequently published shows extensive discourse, especially from the “extra” group, which consists of nonpeers. Therefore, in light of the analyzed data, the relevance of preprints to both professional and lay audiences is strongly supported.

  • (H1b) 

    Professional and lay audiences have different interests in preprints. Our data analysis reflects distinct thematic and topical preferences between professional and lay audiences. The “extra” group emphasized terms such as “covid,” “people,” and “immune,” manifesting a more generalized concern, while the “intra” group leaned towards specialized terms such as “test.” Furthermore, the topic extraction showed the “extra” group’s preference for broader themes such as “Vaccine Efficacy,” juxtaposed against the “intra” group’s inclination towards data-centric topics. This difference in thematic focus confirms that professional and lay audiences indeed engage with preprints in distinct manners, resonating with the theoretical underpinnings of boundary stabilization.

  • (H1c) 

    Faster dissemination can be problematic. The distinct themes and content criteria among the groups might hint at potential areas of misunderstanding. While the data do not directly showcase problematic preprints or misinformation, the varying themes underscore potential pitfalls. The “extra” group’s broader thematic focus might, in a faster dissemination context, lead to misunderstandings or misinterpretations, especially if nuances aren’t properly conveyed. Thus, the hypothesis finds indirect support in the data, highlighting the nuances and potential challenges of rapid scientific communication in a digital era.

  • (H2) 

    Comments on preprints can improve the peer review process. Published preprints witnessed a significantly higher volume of comments, predominantly from the “extra” group. This suggests a potential enriching of content through diverse perspectives. Moreover, the substantial social media and news outlet reception for preprints with increased comments might hint at their amplified relevance or improved quality, as they may have incorporated feedback from these comments. While it is an indirect approach, the positive reception of preprints with extensive comments can be seen as an endorsement of the value added by early feedback in the peer review process.

  • (H3) 

    Comments on preprints strengthen communication between scientists and the public. The vast difference in comments between published and unpublished preprints and the substantial presence of “extra” group comments indicate that preprints do serve as a platform fostering dialog between scientists and the public. Furthermore, the range of topics and language styles in the comments showcases an active exchange of ideas, suggesting that the platform does not just act as a passive medium but stimulates engagement and conversation.

  • (H4) 

    Comments on preprints promote interdisciplinary collaboration among scientists. The diverse array of topics extracted, coupled with a significant fraction of comments being academically formulated by nonspecialists, underscores interdisciplinary dialog. The distinction between comments discussing statistical topics (potentially interdisciplinary) versus those focusing on medical topics (disciplinary) hints at a cross-field collaboration and knowledge exchange.

In summary, the hypotheses find substantial support in the data. Preprints emerge not merely as scientific drafts but as dynamic platforms fostering diverse dialogs, interdisciplinary collaborations, and public engagement. The challenges and nuances of rapid dissemination are evident, but the overarching narrative positions preprints as pivotal boundary objects in contemporary scientific communication.

6.2. Contributions to the Existing Literature

The role of preprints in scientific communication has been a focal point of contemporary research. In this study, evaluating the formulated hypotheses against the extant literature offers a nuanced understanding of preprints as instruments of both boundary blurring and boundary stabilization.

The results resonate with Fraser et al. (2021) and Kirkham et al. (2020), suggesting that while preprints enhance rapid dissemination, they do not necessarily confuse scientific findings with public opinions. However, with preprints gaining traction in digital media, as underscored by Fabiano et al. (2020) and Widener (2020), there is an evident fusion of scientific dialog with public conversations. Consistent with Weingart’s (2012) observations, the commenting feature of preprint platforms fosters direct public–scientist interactions. The results indicate that these interactions may lead scientists to align research outcomes more closely with public sentiment. Together, these points highlight a potential blurring of the boundary between science and the public.

Viewed through Star and Griesemer’s (1989) conceptualization, preprints exhibit the attributes of boundary objects. They maintain scientific rigor while remaining decipherable to a broader audience (e.g., through dissemination via Twitter), which underlines their dual utility. Our findings echo this multifaceted adaptability: for researchers, preprints are seeds of academic deliberation, whereas the public views them as informative focal points. This adaptability aligns with the perspectives of Nosek et al. (2015) on their potential to enhance the peer-review process. The results further highlight preprints as catalysts for interdisciplinary dialog, a view shared by Borgman (2007). They seem to foster discussions that transcend disciplinary silos, contributing to a richer scientific discourse.

The study underscores the intricate role of preprints in modern scholarly communication. While they exhibit the attributes of boundary blurring, they simultaneously act as stabilizers, reinforcing traditional scientific norms even as they adapt to contemporary communication paradigms. This dynamic dual role situates preprints as pivotal players in the continuum of academic discourse, reinforcing and redefining how science interfaces with diverse stakeholders.

The study mainly used quantitative methods to evaluate the content of comments and tweets, which can be a limiting factor. Qualitative methods are less prone to error when several people evaluate the same texts and intercoder analyses are performed. To account for this, group assignments and peer-review criteria were based on manually chosen terms rather than automated procedures. It is conceivable that more complex machine learning procedures would lead to even better classifications, but the results obtained are nevertheless plausible.
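Intercoder analyses of this kind are commonly summarized with a chance-corrected agreement statistic. As a minimal sketch, assuming two coders who each assign one categorical label per comment, Cohen's kappa could be computed as follows. The function name `cohens_kappa`, the label names, and the six example codings are illustrative assumptions and are not taken from the study's data or R scripts; Python is used here only for illustration.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders who labelled the same items."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty set of items")
    n = len(coder_a)
    # Observed agreement: share of items that received the same label from both coders.
    p_observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: product of the coders' marginal label shares, summed over labels.
    counts_a, counts_b = Counter(coder_a), Counter(coder_b)
    p_expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both coders used one identical label throughout
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical codings of six comments; not data from the study.
coder_a = ["peer", "peer", "extra", "extra", "peer", "extra"]
coder_b = ["peer", "extra", "extra", "extra", "peer", "extra"]
kappa = cohens_kappa(coder_a, coder_b)
print(round(kappa, 2))
```

For these hypothetical codings, the coders agree on five of six comments and kappa is about 0.67. On the scale proposed by Landis and Koch (1977), values between 0.61 and 0.80 are conventionally read as substantial agreement.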

Four central facets emerge from our exploration into the dynamic world of preprints, particularly in the context of the COVID-19 pandemic:

  1. The traction preprints have gained among nondisciplinary entities accentuates their influence and penetration in nondisciplinary subpublics.

  2. The reception of preprints is not restricted to the scientific community. Both nondisciplinary and nonscientific subpublics are now stakeholders in the preprint ecosystem.

  3. Not all preprints garner equal attention, pointing towards selective consumption patterns.

  4. Different factions involved in the preprint reception exhibit diverse thematic focuses and interpretations.

The COVID-19 era witnessed an upsurge in the acceptance and reception of preprints. Traditionally a conduit for sharing scientific findings before peer review, preprints have been recast by the pandemic as arenas for public discourse. What is conspicuously absent, however, is a systematic method for feeding public discussions of preprints into their review and improvement. Consequently, preprints, in their present state, do not diminish scientific quality benchmarks. They continue to accentuate the demarcation between scientific and nonscientific realms. Yet, there is a growing confluence between the spheres of internal scientific dialog and broader science communication. Conversations on unreviewed research are no longer confined to expert circles. They now extend to interdisciplinary and transdisciplinary cohorts and, intriguingly, the general populace.

However, a caveat remains. The public’s intrigue with preprints is often tethered to prevailing events, as exemplified by the COVID-19 crisis. This presents dual challenges for preprint archive custodians. First, there is a pressing need to ingeniously weave discussions on these platforms into the quality enhancement spectrum, ensuring an open participatory peer review paradigm without compromising on scientific rigor. Second, moving beyond the ephemeral allure of topical issues, there is a mandate to cultivate an enduring community of scholars and invested nonexperts. This collective should be poised to critically and constructively navigate topics based on their novelty and innovation quotient, rather than being swayed solely by the immediacies of the contemporary.

I would like to express my gratitude to the peer reviewers, whose critical evaluation, insightful comments, and constructive feedback have been invaluable to the enhancement of this paper.

The author has no competing interests.

The study was partly conducted as part of the project “Under the radar: The dynamics of multidirectional science communication in times of Corona” funded by the Volkswagen Foundation as part of the research initiative “Corona Crisis and Beyond—Perspectives for Science, Scholarship and Society”.

The data and R scripts used for this study are publicly available at https://osf.io/72a8j/.

1. https://covid19publikationsliste.smc.page/ (accessed: February 2, 2022).

2. https://github.com/CIDO-ontology/cido (last accessed September 1, 2023).

3. GPT-4 specifically refers to the Large Language Model ChatGPT (Pro) by OpenAI.

4. Control parameters set for Gibbs method: iterations = 2,000, burn-in = 0, thinning = 0.

Abdill, R. J., & Blekhman, R. (2019). Tracking the popularity and outcomes of all bioRxiv preprints. eLife, 8, e45133.

Akkerman, S. F., & Bakker, A. (2011). Boundary crossing and boundary objects. Review of Educational Research, 81(2), 132–169.

Aretz, H.-J. (2022). Neofunktionalismus und autopoietische Systemtheorie. In H.-J. Aretz (Ed.), Funktionalismus und Neofunktionalismus (pp. 1057–1166). Springer VS.

Benoit, K., Watanabe, K., Wang, H., Nulty, P., Obeng, A., … Matsuo, A. (2018). Quanteda: An R package for the quantitative analysis of textual data. Journal of Open Source Software, 3(30), 774.

Bezjak, S., Clyburne-Sherin, A., Conzett, P., Fernandes, P., Görögh, E., … Heller, L. (2018). Open science training handbook. Zenodo.

Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3, 993–1022.

Bohannon, J. (2013). Who’s afraid of peer review? Science, 342(6154), 60–65.

Bondi, M., & Scott, M. (Eds.) (2010). Keyness in texts. John Benjamins Pub. Co.

Borgman, C. L. (2007). Scholarship in the digital age: Information, infrastructure, and the Internet. Cambridge, MA: MIT Press.

Bornmann, L., & Haunschild, R. (2018). Do altmetrics correlate with the quality of papers? A large-scale empirical study based on F1000Prime data. PLOS ONE, 13(5), e0197133.

Carlile, P. (2002). A pragmatic view of knowledge and boundaries: Boundary objects in new product development. Organization Science, 13(4), 442–457.

de Silva, P. U. K., & Vance, C. K. (2017). Preserving the quality of scientific research: Peer review of research articles. In P. U. K. de Silva & C. K. Vance (Eds.), Fascinating life sciences. Scientific scholarly communication (pp. 73–99). Cham: Springer International.

Desjardins-Proulx, P., White, E. P., Adamson, J. J., Ram, K., Poisot, T., & Gravel, D. (2013). The case for open preprints in biology. PLOS Biology, 11(5), e1001563.

Dwan, K., Altman, D. G., Arnaiz, J. A., Bloom, J., Chan, A.-W., … Williamson, P. R. (2008). Systematic review of the empirical evidence of study publication bias and outcome reporting bias. PLOS ONE, 3(8), e3081.

Fabiano, N., Hallgrimson, Z., Kazi, S., Salameh, J.-P., Wong, S., … McInnes, M. D. F. (2020). An analysis of COVID-19 article dissemination by Twitter compared to citation rates. medRxiv.

Fecher, B., & Friesike, S. (2014). Open Science: One term, five schools of thought. In S. Bartling & S. Friesike (Eds.), Opening science (pp. 17–47). Cham: Springer International.

Ford, E. (2013). Defining and characterizing open peer review: A review of the literature. Journal of Scholarly Publishing, 44(4), 311–326.

Fraser, N., Brierley, L., Dey, G., Polka, J. K., Pálfy, M., … Coates, J. A. (2021). The evolving role of preprints in the dissemination of COVID-19 research and their impact on the science communication landscape. PLOS Biology, 19(4), e3000959.

Fraser, N., Momeni, F., Mayr, P., & Peters, I. (2019). The effect of bioRxiv preprints on citations and altmetrics. Quantitative Science Studies, 1(2), 618–638.

Greene, D., Richardson, S., & Turro, E. (2017). ontologyX: A suite of R packages for working with ontological data. Bioinformatics, 33(7), 1104–1106.

Griffiths, T. L., & Steyvers, M. (2004). Finding scientific topics. Proceedings of the National Academy of Sciences, 101(suppl_1), 5228–5235.

Grün, B., & Hornik, K. (2011). Topicmodels: An R package for fitting topic models. Journal of Statistical Software, 40(13), 1–30.

Harrison, J. (2022). RSelenium: R bindings for ‘Selenium WebDriver’. https://docs.ropensci.org/RSelenium/

He, Y., Yu, H., Ong, E., Wang, Y., Liu, Y., … Smith, B. (2020). CIDO, a community-based ontology for coronavirus disease knowledge and data integration, sharing, and analysis. Scientific Data, 7(1), 181.

Helmer, M., Schottdorf, M., Neef, A., & Battaglia, D. (2017). Gender bias in scholarly peer review. eLife, 6, e21718.

Hu, M., & Liu, B. (2004). Mining and summarizing customer reviews. In W. Kim (Ed.), Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 168–177). ACM.

Ioannidis, J. P. A. (2005). Why most published research findings are false. PLOS Medicine, 2(8), e124.

Kirkham, J. J., & Moher, D. (2018). Who and why do researchers opt to publish in post-publication peer review platforms? Findings from a review and survey of F1000 Research. F1000Research, 7, 920.

Kirkham, J. J., Penfold, N. C., Murphy, F., Boutron, I., Ioannidis, J. P., … Moher, D. (2020). Systematic examination of preprint platforms for use in the medical and biomedical sciences setting. BMJ Open, 10, e041849.

Kodvanj, I., Homolak, J., Virag, D., & Trkulja, V. (2022). Publishing of COVID-19 preprints in peer-reviewed journals, preprinting trends, public discussion and quality issues. Scientometrics, 127(3), 1339–1352.

Kwon, D. (2020). How swamped preprint servers are blocking bad coronavirus research. Nature, 581(7807), 130–131.

Landis, J. R., & Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33(1), 159–174.

Lee, C. J., Sugimoto, C. R., Zhang, G., & Cronin, B. (2013). Bias in peer review. Journal of the American Society for Information Science and Technology, 64(1), 2–17.

Luhmann, N. (1990). Essays on self-reference. Columbia University Press.

Luhmann, N. (1992). Die Wissenschaft der Gesellschaft. Suhrkamp.

Luhmann, N. (1993). Das Recht der Gesellschaft. Suhrkamp.

Luhmann, N. (1995). Social systems. Stanford University Press.

Luhmann, N. (1997). Die Gesellschaft der Gesellschaft (Bd. 1). Suhrkamp.

Luhmann, N. (1999 [1964]). Funktionen und Folgen formaler Organisation (5. Aufl.). Duncker & Humblot.

Miner, A. S., Stewart, S. A., Halley, M. C., Nelson, L. K., & Linos, E. (2023). Formally comparing topic models and human-generated qualitative coding of physician mothers’ experiences of workplace discrimination. Big Data & Society, 10(1).

Mirowski, P. (2018). The future(s) of open science. Social Studies of Science, 48(2), 171–203.

Nosek, B. A., Alter, G., Banks, G. C., Borsboom, D., Bowman, S. D., … Yarkoni, T. (2015). Promoting an open research culture. Science, 348(6242), 1422–1425.

Polka, J. K., & Penfold, N. C. (2020). Biomedical preprints per month, by source and as a fraction of total literature. Zenodo.

Powell, K. (2016). Does it take too long to publish research? Nature, 530(7589), 148–151.

Priem, J., & Hemminger, B. M. (2012). Decoupling the scholarly journal. Frontiers in Computational Neuroscience, 6, 19.

Ross-Hellauer, T., & Görögh, E. (2019). Guidelines for open peer review implementation. Research Integrity and Peer Review, 4, 4.

Schroter, S., Black, N., Evans, S., Godlee, F., Osorio, L., & Smith, R. (2008). What errors do peer reviewers detect, and does training improve their ability to detect them? Journal of the Royal Society of Medicine, 101(10), 507–514.

Smith, R. (2006). Peer review: A flawed process at the heart of science and journals. Journal of the Royal Society of Medicine, 99(4), 178–182.

Spezi, V., Wakeling, S., Pinfield, S., Fry, J., Creaser, C., & Willett, P. (2018). “Let the community decide”? The vision and reality of soundness-only peer review in open-access mega-journals. Journal of Documentation, 74(1), 137–161.

Star, S. L., & Griesemer, J. R. (1989). Institutional ecology, “translations” and boundary objects: Amateurs and professionals in Berkeley’s Museum of Vertebrate Zoology, 1907–39. Social Studies of Science, 19(3), 387–420.

Stichweh, R. (2014). Differenzierung und Entdifferenzierung: Zur Gesellschaft des frühen 21. Jahrhunderts. Zeitschrift für Theoretische Soziologie, 3(1), 8–19.

Tacke, V. (1997). Systemrationalisierung an ihren Grenzen—Organisationsgrenzen und Funktionen von Grenzstellen in Wirtschaftsorganisationen. In G. Schreyögg & J. Sydow (Eds.), Gestaltung und Organisationsgrenzen (pp. 1–44). De Gruyter.

Tennant, J. P., Crane, H., Crick, T., Davila, J., Enkhbayar, A., … Vanholsbeeck, M. (2019). Ten hot topics around scholarly publishing. Publications, 7(2), 34.

Vicente-Saez, R., & Martinez-Fuentes, C. (2018). Open Science now: A systematic literature review for an integrated definition. Journal of Business Research, 88, 428–436.

Wakeling, S., Willett, P., Creaser, C., Fry, J., Pinfield, S., … Medina Perea, I. (2020). “No comment”? A study of commenting on PLOS articles. Journal of Information Science, 46(1), 82–100.

Weingart, P. (2012). The lure of the mass media and its repercussions on science. In S. Rödder, M. Franzen, & P. Weingart (Eds.), The sciences’ media connection—Public communication and its repercussions (pp. 17–32). Springer Netherlands.

Wenger, E. (2000). Communities of practice and social learning systems. Organization, 7(2), 225–246.

Wickham, H., Averick, M., Bryan, J., Chang, W., McGowan, L., … Yutani, H. (2019). Welcome to the Tidyverse. Journal of Open Source Software, 4(43), 1686.

Widener, A. (2020). Pandemic puts preprints first. C&EN Global Enterprise, 98(22), 16–19.

Author notes

Handling Editor: Vincent Larivière

This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. For a full description of the license, please visit https://creativecommons.org/licenses/by/4.0/legalcode.

Supplementary data