In this paper, I examine journal peer review by focusing on factors that potentially hamper its sound functioning. I argue that scientific literature is not only skewed by the individual level biases of reviewers acting as gatekeepers, but also by the institutional context in which peer review operates. I show that peer review and its efficacy in improving the quality of published work should not be evaluated without heeding the different forms of academic publishing and the role that publishing plays in academic career development.
Journal peer review is an essential part of academic practices.1 But how well does it serve its purpose and which factors have an influence on how close it comes to achieving its aims? Peer review has been widely discussed in empirical literature: it has been studied both qualitatively (e.g., by Michèle Lamont, who in her 2009 book examines how reviewers evaluate academic excellence in the context of multidisciplinary evaluation panels, and by Liv Langfeldt, who in her 2001 article studies which institutional factors influence the decision making of grant peer review panels) and quantitatively (e.g., by Cole, who in his 1992 book uses data on how grant applications submitted to the National Science Foundation were reviewed to evaluate the levels of consensus in science). However, philosophical perspectives on the topic are scarce.2 This paper contributes to filling this gap in the literature by examining factors that may bias published literature and by questioning an understanding that, according to Lee et al. (2013), is common in empirical studies on peer review, namely that reviewer disagreement should be taken as a sign of bias.
As the following quotes demonstrate, peer review is taken to be a mechanism for ensuring the high quality of published work and making possible the give and take of constructive criticism:
The practice of peer review is to ensure that good science is published. It is an objective process at the heart of good scholarly publishing and is carried out on all reputable scientific journals. (Reviewer Policy Statement 2016)
In the most general sense, journal peer review is the formal expression of the principle that science works best in an environment of unrestrained criticism. (Rennie 2003, p. 7)
Despite its prominence, critics (e.g., Frey 2003; Smith 2006) have stated that peer review does not serve its goals in its current form. According to empirical studies (e.g., Rothwell and Martyn 2000; Bornmann and Daniel 2008; Bornmann, Mutz, and Daniel 2010), inter-rater reliability in peer review is low, i.e., reviewers tend to disagree on the quality of submissions. As Lee et al. (2013) note, in discussions concerning peer review this is often taken to be a sign of the presence of detrimental biases and to denote that in practice peer review does not achieve its goals. In addition, peer review is accused of both not upholding accurate standards of good research and maintaining standards too rigidly so that innovative proposals have unfairly low chances of succeeding (Eisenhart 2002, pp. 243–44). It has also been criticized for being a tool for stealing ideas (Smith 2006, p. 180), being slow and expensive (Rennie 2003, p. 9; Smith 2006, p. 179), and failing at detecting fraud and error (e.g., Smith 2006, p. 180; Shamoo and Resnik 2009, p. 118).
Even though philosophers of science often mention peer review as one of the mechanisms for improving the quality of research, the conditions for its proper functioning have attracted less attention. This is puzzling since peer review is so central to current scientific activities. In this paper, I ask which factors have an influence on whether the published literature is of high quality. I do this from a perspective that evaluates scientific practices as social processes, in which possible individual errors and biases can be disclosed by critical interactions (e.g., Longino 1990). The paper has two aims: First, I want to show that the ideal of so-called concordant objectivity (Douglas 2004), on which much of empirical research on peer review is based, is an insufficient tool for assessing whether the evaluation of submissions has been fair. In other words, I show that it is wrong to assume that reviewer disagreement is the sign of biases being present. Second, I examine how the institutional context of peer review influences the consequences of different individual level biases. My argument is that the problems of peer review cannot be reduced to individuals’ undue preferences only. Some of the defects of the system have to do with its current set-up and academic publishing in general. Certain institutional conditions, such as the publish or perish culture, the size of the pool of potential reviewers, and the number of prestigious journals in a field of study, can amplify the effect of biases that operate at the individual level.
My analysis has implications for more general debates on the objectivity of science. First of all, discussions on peer review underscore the practical implications of different understandings of the meaning of the term “objective,” and show how adopting a certain ideal of objectivity may result in some of the detrimental practices remaining invisible, or in condemning some practices that are not actually harmful. Furthermore, the fact that minimizing the effects of individual biases in peer review requires institution-level counteractions gives support to the claim that the objectivity of science should not be assessed solely in terms of the actions and intentions of individuals, but also in terms of the institutional conditions of research.
Journal peer review can take different forms. For example, the identities of authors and reviewers can be either disclosed or concealed, editors may follow reviewers’ recommendations more or less closely, and review reports may be published with the articles or be available for the author only. Some journals also have the policy of sharing the reviews among reviewers, which gives them the possibility of evaluating their assessments according to the views of the other reviewers.3 All these factors have an influence on how the effects of individual level biases eventually become actualized in what is taken to be scientific knowledge. This multifacetedness of the process itself enables counteracting some of the unfairness that individual reviewers’ undue preferences might entail.4
Even though the focus of this article is on journal peer review, I shall also be referring to studies that have examined peer reviewing of grant applications or conference abstracts. This is because many of the studies trying to measure the prevalence of different forms of biases have been conducted on grant or conference peer reviews. I introduce their results in the context of discussing what I call individual level biases, i.e., biases that have an influence on how individual reviewers evaluate submissions. I posit that we can expect similar individual level biases to be at play when an individual evaluates grant applications, journal submissions, and conference abstracts. How these individual level biases then influence which projects are funded, which manuscripts are published, or which papers are presented at conferences can be different depending on the context of these peer review practices. I aim at demonstrating this with a special focus on the community level mechanisms that have an effect on the outcomes of journal peer review.
The argument will proceed in the following manner: in section 2, I outline the basics of journal peer review. Section 3 reviews previous literature on different biases in peer review. Here I introduce an understanding of bias that, according to Lee et al. (2013), underlies much empirical research on peer review. In this section, I do not yet question this understanding of what constitutes bias. After section 3, I move on to examining the issue of peer review and biases from the perspective of social epistemology. First, in subsection 4.1, I show that the concordant ideal of objectivity—on which many empirical studies on peer review seem to be based—is a problematic ideal for peer review. This also means that some of the factors mentioned in section 3 that are often labelled as causing biases should not be taken to be detrimental. Second, in subsection 4.2, I argue that what research is eventually published depends on both the preferences of reviewers and the common institutional practices of academia; the system in which peer review operates can either correct or amplify individual level biases. Conclusions are drawn in section 5.
2. The Basics of Journal Peer Review
Peer review has been defined as “an organized method for evaluating scientific work which is used by scientists to certify the correctness of procedures, establish the plausibility of results, and allocate scarce resources” (Chubin and Hackett 1990, p. 2). Even though the main objectives of peer review are not clearly defined (Overbeke and Wager 2003, p. 47), on the basis of the account cited in the introduction it could be stated that the aim of “peer review is to ensure that good science is published.”5 That is to say, when working successfully, this process should make sure that deserving pieces of work will get published and undeserving ones will be excluded. Additionally, peer review is expected to further improve the quality of published articles.
Even though different journals use different peer review systems, the basic forms of review are the following. In single-blinded review, the authors of evaluated submissions are not told the identities of reviewers. In double-blinded review, authors’ identities are also concealed. The blinded forms of peer review are the most popular; open peer review, where the identities of all parties are known, also exists but has not gained popularity. In addition to blind and open peer review there are hybrid systems. In these options for reviewing scientific work, community members can comment on the scrutinized piece either before the formal peer review (a priori peer review) or after the traditional review process (post peer review) (Lee et al. 2013, pp. 11–13). In all of the above-mentioned forms the evaluators should have expertise in the field to which the scrutinized work belongs, but should be too closely affiliated neither with the person(s) whose work is being evaluated nor with the agent ordering the review (Lee et al. 2013, p. 2); i.e., they should be free of conflicts of interest.
The use of peer review is supported by referring to the supposition that the criteria of “good research” are interpreted and applied impartially (Lee et al. 2013, p. 5). What these criteria are varies according to the discipline in question and the agent ordering the review: soundness, originality, scope, and clarity of submissions are among the most common characteristics that reviewers are advised to pay attention to. It is thought that in the peer review process, reviewers should be able to evaluate the quality of submissions and be impartial in voicing their interpretation of those qualities without paying attention to irrelevant factors such as authors’ gender or affiliations (Lee et al. 2013, p. 5). Thus, according to this understanding, in an ideal peer review process the reviewers would agree beforehand on a set of criteria and apply the criteria in the same ways, and the reviewers’ reports would agree on the merits and weaknesses of the submission. However, studies, many of which I refer to in the next section, have found that reviewers often disagree, and worries about different biases contaminating the process are common.6
3. Biases in Journal Peer Review
Despite widespread concerns about reviewers being biased, in empirical examinations of peer review there are multiple notions of what actually constitutes a bias (Langfeldt 2001, p. 821; Lee et al. 2013, p. 5). I shall now look at previously published literature on different types of biases that have been stated as damaging journal peer review (e.g., Godlee and Dickersin 2003; Rennie 2003; Wood and Wessely 2003; Lee et al. 2013). This brief review is not intended to give a detailed classification of all possible biases. Rather, I introduce different forms of reported biases in order to exemplify the undue preferences that may influence the process. Also, I do not yet take a stance on whether all of the alleged biases should really be called biases; this happens in the next section.
A common way of classifying biases in peer review is to make a distinction between 1) biases that are related to the characteristics of the authors of evaluated submissions, 2) biases that are related to the characteristics of reviewers, and 3) biases that are related to the content of the evaluated submissions (e.g., Lee et al. 2013). I shall start with this three-fold distinction. In the empirical literature the first way of conceiving bias is to see it as a function of author characteristics (Lee et al. 2013, p. 6). Some of the author characteristics that have been mentioned as having an influence on review outcomes are as follows:
Bias has also been interpreted as a function of reviewer characteristics (Lee et al. 2013, p. 8). Leniency or strictness can naturally be an idiosyncratic feature of a reviewer, but in empirical studies, certain classes of reviewers have been observed to be more lenient in their evaluations. In the literature, the following characteristics of reviewers have been mentioned as having an influence on review outcomes:
The academic background of reviewers: members of some disciplinary groups tend to be stricter in their evaluations. For instance, philosophers seem to be stricter than psychologists (Lee and Schunn 2011, p. 361; Lee et al. 2013, p. 8).
American reviewers are more lenient than those of some other nationalities (Lee et al. 2013, p. 8).
Female reviewers tend to be stricter than their male counterparts (Lee et al. 2013, p. 8).
In grant peer review, reviewers who were suggested by the authors of evaluated manuscripts were more lenient than reviewers nominated by editors (Schroter, Tite, Hutchings, and Black 2006; Marsh, Jayasinghe, and Bond 2008, p. 163).
Yet another way of examining bias in peer review is to study whether a submission’s content has an influence on reviews, i.e., view bias as a partiality against or for a submission based on its content:
Non-English language manuscripts receive worse evaluations: when reviewers had to evaluate “two non-authentic, but realistic” manuscripts of the same quality, one written in English and the other in the native language of the reviewer, they tended to give more positive evaluations to manuscripts written in English (Nylenna, Riis, and Karlsson 1994).
Publication bias: studies with positive and statistically significant results are published more often than studies with inconclusive or negative results (Lee et al. 2013, p. 10).
Manuscripts that quote the reviewers’ work receive higher ratings, as do manuscripts that are in line with the reviewers’ own theoretical preferences (Lee et al. 2013, p. 8).
Interdisciplinary manuscripts receive harsher evaluations (Lee et al. 2013, p. 10).
In addition to the preferences and biases of reviewers, the editors’ interests naturally play a part in which papers are published. Editors not only decide whether or not to publish a manuscript based on peer review reports, they may also decide to reject a manuscript without sending the text for external review. Furthermore, the referees are usually chosen by the editor(s) (Young, Ioannidis, and Al-Ubaydli 2008, p. 7). As Godlee and Dickersin note, making choices is part of editors’ job, and as such is not objectionable (2003, p. 93). The choices that are made can be based on valid preferences, but they can also be influenced by biases similar to those that affect reviewers’ evaluations.
The existence of some of the above-mentioned biases has been questioned. For instance, Lee et al. cite studies that have found no evidence for the existence of gender bias in journal or grant peer review (2013, p. 8). However, in order to make the argument at hand, we do not have to wait for decisive evidence on the issue. This is because the aim of this article is not to focus on any of the individual biases as such, but to examine the issue at a meta-level: how do different individual level biases interact with the institutional practices of peer review and academia in general?
4. Evaluating the Effects of Biases in Peer Review
In many social epistemological theories, individual level biases are not necessarily seen as detrimental to science. This is because science is taken to be a social process in which interactions between individuals and communities minimize the effects of individuals’ undue preferences. Also, when published research is evaluated, the focus is not on individual studies but on larger bodies of knowledge. For example, according to Helen Longino (1990, 2002), scientific claims need to be subjected to the widest possible range of criticism in order to control the effect of individuals’ biases on research outcomes. This means that the influence of individuals’ preferences on which views are eventually accepted can be kept in check when researchers with different viewpoints participate in discussions that take place under certain institutional conditions. Also, a certain amount of diversity of preferences is desirable, according to Longino; in a community in which points of view do not differ, some assumptions steering research may become invisible and thus immune to criticism. Likewise, according to Miriam Solomon’s social empiricism (2001), the rationality of research activities should be evaluated on the level of communities because individual level biases (or as she calls them, non-empirical decision vectors) can be epistemically beneficial as they contribute to distributing research efforts in a way that contributes to the goals of inquiry.
Peer review is considered to be one of the mechanisms through which the inadequacies of individuals’ reasoning can be removed. For instance, Longino takes it to be one of the necessary venues for criticism: “The function of peer review is not just to check that the data seem right and conclusions well-reasoned but to bring to bear another point of view on the phenomena, whose expression might lead the original author(s) to revise the way they think about and present their observations and conclusions” (Longino 1990, pp. 68–9). As this quote makes clear, according to Longino’s theory, the existence of different perspectives on the objects of study is invaluable: by evaluating her research from a new perspective, a researcher may realize that her original approach was based on questionable assumptions and, thus, will be able to improve it accordingly.
However, criticism in peer review functions differently than in research in general. In peer review, the dialogue between authors and reviewers is usually limited. If reviewers find a submission significantly lacking, the author seldom has an opportunity to engage in a debate in which she could try to defend the work. Instead, critical reviews tend to permanently forestall publication in the given journal.9 Even though an author can use the criticism to revise her manuscript and later submit it to another journal, the current publish or perish culture does not allow countless submission cycles, especially not at the early stages of an academic career. Thus, there are limits to peer review embodying the critical interaction that Longino seeks. This issue is discussed in more detail in subsection 4.2.
It seems unreasonable to assume that the above-mentioned individual level biases could be fully eliminated from peer review. However, their effects on review outcomes may be more or less severe depending on the conditions in which peer review takes place. Likewise, reviewers’ preferences concerning the substance of submissions naturally have some influence on their evaluations and it seems exorbitant to claim that all preferences should be denounced as harmful. For instance, editors and reviewers may legitimately favor original and particularly important pieces of research (Godlee and Dickersin 2003, p. 92). Also, journal level specialization in publishing articles with certain theoretical approaches may be epistemically beneficial. The question is, therefore, when is the influence of these preferences excessive to the extent that they constitute bias? For instance, under which conditions can a preference against a new theoretical approach be taken to be a detrimental bias against unorthodox work? Before focusing on this issue, however, I shall take a look at an assumption that, according to Lee et al. (2013), is implicit in much of the research on biases in peer review, namely that disagreement between reviewers is a sign of harmful biases.
4.1. Journal Peer Review and the Ideal of Concordant Objectivity
According to Lee et al. (2013, p. 5), different genres of empirical, quantitative research on peer review are based on the assumptions that impartial reviewers arrive at similar assessments of a given submission and, thus, that disagreement implies that biases have undermined the ability of reviewers to apply the evaluative criteria consistently. An important question to be tackled is whether we should hold on to this assumption while evaluating peer review practices. This question is particularly relevant with respect to those biases that are a function of reviewer characteristics, as identified above. In empirical studies, low inter-rater reliability and the observation that reviewers from different groups (e.g., defined by discipline, nationality, or gender) tend to hold different criteria of acceptable work are often (e.g., Jayasinghe et al. 2003) taken to mark the existence of biases. Heather Douglas (2004, pp. 462–63) has called this understanding the ideal of concordant objectivity; according to this view, unanimity concerning a judgment among individuals denotes that the judgment has been objective.
As Douglas (2004, p. 453) has stated, establishing something as objective has strong rhetorical force: “I endorse this and so should you.” If a method is found to be objective, it is thought to produce results that we can trust and should base our actions on. However, there are several different senses in which the term “objective” can be used, and being objective in one sense does not guarantee that the process would qualify as being objective in other senses (Douglas 2004). Because different understandings of what constitutes objectivity direct our attention to evaluating different phases of processes, and because of the rhetorical force of calling something objective, the ideals of objectivity can have practical consequences. For example, if we adopt the so-called procedural ideal of objectivity (Douglas 2004, pp. 461–62), then we strive to develop procedures that obliterate the need for making judgments. The high evidential status of the results of meta-analyses in evidence-based medicine is at least partly based on this ideal. However, the procedural ideal does not help to evaluate all factors that are relevant to producing medical knowledge, and consequently, following this ideal may leave some problematic practices undetected (Jukola 2015).
In the context of peer review, editors are assumed to base their decision on reviewers’ recommendations and accept submissions only when they have received positive reports (e.g., Frey 2003). However, an editor who strove for procedural objectivity and followed this policy even in those cases in which the negative reports were clearly biased would act in a way that is detrimental to the goals of peer review. For instance, if we think that reviews should be based on the content of the submissions, it is not acceptable to make an editorial decision on the basis of reviews that recommend rejection on the grounds of the personal characteristics of an author. It therefore appears that, in the absence of universal assurance that the reviewers’ reports are not biased, procedural objectivity cannot be a universal ideal for editors.
As mentioned, empirical research on peer review seems to be based on the idea that the soundness of peer review practices should be evaluated on the basis of the ideal of concordant objectivity. Nonetheless, when applied to peer review, this ideal is problematic for two reasons. First of all, as noted by Lee (2012, p. 863) and Lee et al. (2013, p. 6), not all cases of reviewer disagreement are unjustified or problematic with respect to the goals of peer review. This is because reviewers may have appropriate reasons for applying or interpreting the evaluative criteria differently, in which case the difference of opinions is not caused by undue preferences (Lee 2012, p. 863). Additionally, even discordant review reports may help authors to improve their submissions and, thus, help to achieve one of the aims of the process. In this way a peer review process that does not qualify as objective in the sense of concordant objectivity could still contribute to the goals of peer review.
Second, agreement between reviewers is not a guarantee of the absence of bias. As Douglas (2004, p. 463) has noted, a practice that produces results that are objective according to the concordant ideal can be judged to be biased from the perspective of another ideal of objectivity; a situation is conceivable in which all reviewers share a bias, for example, against authors belonging to a certain social group, or against novel approaches. Therefore, a process that would qualify as objective in the sense of concordant objectivity could generate results that we would not be willing to call sound. It ought to be recognized that this is in line with Longino’s (1990, p. 80) remark on the possibility of certain problematic assumptions staying unquestioned in an overly unanimous community. The ideal of concordant objectivity as the guiding rule in evaluation of peer review practices should be abandoned. Following this ideal does not help us discern between warranted and unwarranted reasons for disagreements concerning the quality of a submission. In addition, it may also have consequences that are epistemically detrimental as it directs us away from considering the possibility of widespread biases in peer review.
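The point that concordance does not rule out bias can be made concrete with a toy calculation. The following is a minimal sketch under purely hypothetical assumptions: the quality score, the size of the shared penalty, and the group labels "A" and "B" are invented for illustration and are not taken from any of the studies cited here. It shows a panel whose members agree perfectly with one another while still scoring equally good submissions differently.

```python
from statistics import mean, pstdev

# Hypothetical numbers: every submission has the same underlying quality.
TRUE_QUALITY = 7.0
SHARED_PENALTY = 2.0  # assumed bias that ALL reviewers apply to group "B" authors


def review(author_group):
    """Score given by any reviewer; every reviewer shares the same bias."""
    penalty = SHARED_PENALTY if author_group == "B" else 0.0
    return TRUE_QUALITY - penalty


def panel_scores(author_group, n_reviewers=3):
    """Scores from a panel of reviewers for one submission."""
    return [review(author_group) for _ in range(n_reviewers)]


scores_a = panel_scores("A")
scores_b = panel_scores("B")

# Disagreement within each panel (0.0 means perfect concordance).
spread_a = pstdev(scores_a)
spread_b = pstdev(scores_b)

# Gap between the groups despite identical underlying quality.
group_gap = mean(scores_a) - mean(scores_b)

print(spread_a, spread_b, group_gap)  # 0.0 0.0 2.0
```

In this sketch inter-rater reliability is perfect, so a measure based on reviewer agreement alone would report no problem, even though group "B" submissions are systematically scored two points lower.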
It should also be noted that it is not clear that all of the biases mentioned in the category of biases as a function of reviewer characteristics result in low inter-rater reliability, particularly when the effect of the reviewers’ academic background is considered. First of all, as Lee et al. note (2013, p. 8), if submissions in a field of study are generally reviewed by experts working in the same field, then the degree to which manuscripts must fulfill the agreed criteria is the same for all submissions. In other words, even if different disciplines had distinct evaluation cultures, in the local context of a certain discipline the criteria would be applied consistently. The fact that certain reviewers as a group within a discipline tend to be less critical in their evaluations would skew the outcomes of peer review only if there were a mechanism that would assign members of this group to systematically evaluate submissions of a certain type and, thus, increase the likelihood of a certain group of manuscripts being accepted. Receiving reviews from a particularly strict reviewer may feel unfair from the perspective of an individual author. However, if manuscripts have as high a chance of being reviewed by a more lenient referee as of being evaluated by a stricter one, the effects of the reviewers’ differing leniency levels should disappear at the population level.
Consequently, when discussing biases in peer review, it is important to note when their effects are examined on the level of individual submissions and when the focus is on the body of literature as a whole. Even though having a manuscript evaluated by an unacceptably partial reviewer is certainly unfortunate, at the level of the whole community the effect of a biased review is generally minute. The situation is more alarming if all, or a considerable portion of submissions, are evaluated in conditions under which biases have an influence on review outcomes. This is because what we call scientific knowledge is a combination of results from multiple studies, and an individual study alone seldom has the potency to radically alter what is considered to be the state of the field. For example, in evidence-based medicine, treatment guidelines are based on the amalgamated evidence from multiple studies (e.g., Howick 2011). Even if the publication of an individual article were delayed or, in the most unfortunate case, prevented, the practical and epistemic consequences of this omission would not be dire in most cases.10
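The contrast between random and systematic assignment of reviewers can be sketched as a small expected-value calculation. This is only an illustration under assumed values: the leniency offsets, the quality score, and the labels "orthodox" and "unorthodox" are hypothetical and stand in for any two types of submission.

```python
# Illustrative assumptions only: two reviewer types with opposite leniency
# offsets, and manuscripts of identical underlying quality.
QUALITY = 6.0
LENIENCY = {"lenient": 1.0, "strict": -1.0}


def expected_score(reviewer_probs):
    """Expected score when a reviewer type is drawn with the given probabilities."""
    return sum(p * (QUALITY + LENIENCY[kind]) for kind, p in reviewer_probs.items())


def gap(assignment):
    """Expected score advantage of orthodox over unorthodox manuscripts."""
    return expected_score(assignment["orthodox"]) - expected_score(assignment["unorthodox"])


# Every manuscript is equally likely to meet either reviewer type.
random_assignment = {
    "orthodox": {"lenient": 0.5, "strict": 0.5},
    "unorthodox": {"lenient": 0.5, "strict": 0.5},
}

# A hypothetical mechanism always routes unorthodox work to strict reviewers.
systematic_assignment = {
    "orthodox": {"lenient": 1.0, "strict": 0.0},
    "unorthodox": {"lenient": 0.0, "strict": 1.0},
}

print(gap(random_assignment))      # 0.0
print(gap(systematic_assignment))  # 2.0
```

With equal chances of drawing either reviewer type, the differing leniency levels cancel out in expectation; only when the assignment is tied to the type of submission does a gap between the two groups of manuscripts appear.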
The roots of disagreements between reviewers from different academic backgrounds can be better understood in the light of Michèle Lamont’s work (2009). In order to examine how multidisciplinary panels manage to find agreement on which applicants deserve to be funded, she observed and interviewed grant review panels that were responsible for allocating funding for social sciences and humanities in the United States. Lamont studied how the criteria of good research are interpreted and how criteria, such as originality and significance, are weighted by members of different disciplinary groups. Lamont’s key finding was that the differences in reviewers’ preferences concerning which projects to fund can be understood as an outcome of differences in epistemic styles, i.e., different preferences for how to produce knowledge and how to (or whether to) test theories. According to her, diverging theoretical and methodological preferences result in distinct definitions of quality (Lamont 2009, p. 54; see also Mallard, Lamont, and Guetzkow 2009).
Lamont’s research shows how, contrary to the ideal of concordant objectivity, disagreements can appear even if discussants have legitimate reasons for their differing views. These conclusions are in line with Longino’s theory on epistemic pluralism. Longino (2002) has argued that research communities have their own local epistemologies, consisting of substantive and methodological assumptions concerning how research should be conducted and which theoretical virtues are the most important. The diversity of epistemologies is justified on the grounds of the diversity of the goals of inquiry. Thus, local epistemologies should not be seen as a sign of the immaturity of a field or a transient state, but as an epistemically beneficial condition for studying phenomena from different perspectives and cultivating critical discussions between communities (Longino 2002, pp. 184–89). The different views on how to conduct inquiry may include differences in the evaluation styles of reviewers, which can explain disagreements between review reports on a single submission.
Sometimes, receiving conflicting reviewer reports can actually contribute to a goal of peer review, namely improving the quality of manuscripts. Receiving criticism from reviewers who come from different academic backgrounds can be beneficial to authors if review reports help them to evaluate manuscripts from new perspectives. According to Longino, hypotheses should be exposed “to the broadest range of criticism” (2002, p. 132), as receiving criticism from a perspective that questions her previous assumption may help an author to revise her work, and therefore the existence of different theoretical preferences and review styles may be taken to be epistemically beneficial. However, a precondition for this is that the author and reviewers agree on at least some criteria for good scholarship; researchers must have at least some shared standards for evaluating academic work (Longino 1990).
One of the ways in which superfluous unanimity among reviewers can become epistemically harmful might be called a bottleneck effect. If the pool of potential reviewers is small, and the same experts are used by several journals, the field can be expected to suffer more from the possible individual level biases those reviewers have than would be the case if a more diverse group of reviewers were available. In these situations, the published literature is in danger of becoming skewed even if the review practices seemed to satisfy the ideal of concordant objectivity.
Even though using reviewers with differing evaluative styles may contribute to the goals of peer review, in a multidisciplinary context where submissions from different disciplines are evaluated together and where they compete for the same resources, a deviant yet strict evaluative style may have negative consequences for the discipline in question. Lee and Schunn’s (2011) analysis of reviews of conference abstracts submitted to a multidisciplinary conference revealed that philosophers’ evaluations included very negative comments on submissions more often than the evaluations written by psychologists did. According to Lee and Schunn (2011, p. 361), the evaluative culture of emphasizing the weaknesses of submissions may have a detrimental influence on philosophy as a discipline because harsh reviews result in high rejection rates for philosophy submissions (in the studied case the rejection rate for philosophy submissions was 41%, whereas 20% of psychology submissions were rejected). The same effect may take place when funding is allocated. Funding decisions are often based on the recommendations of panels consisting of experts from different disciplines, and submissions from different fields are competing against each other. As experts from other disciplines find philosophy submissions difficult to evaluate (Lamont 2009, p. 64), decisions on whether or not to fund philosophy projects are based on the judgments of philosophers. Thus, the outcome of review processes may be unduly influenced by philosophers’ harsh evaluation style, and the discipline’s applicants receive fewer resources due to the negative reviews. The effect can be even stronger in interdisciplinary panels, where evaluators are harder on submissions from their own field of expertise (Lamont, Mallard, and Guetzkow 2006).
To sum up, in this section I have discussed the idea that underlies empirical research on biases in peer review, namely that the existence of a bias can be recognized by a disagreement between reviewers. This idea, which is based on the so-called concordant ideal of objectivity, is questionable: unanimous reviewers may have made their evaluations on the basis of undue preferences, while disagreeing reviewers may have valid reasons for their assessments.
4.2. The Effect of the Institutional Context of Peer Review and Individual Level Biases
Next I shall move on to evaluating how the context in which peer review operates affects the influences of individual level biases in the evaluation of submissions. The use of peer review is based on the assumption that critical interaction can improve the quality of scientific work. But how do factors external to the preferences of individual reviewers hamper or further this goal?
The effect of some of the aforementioned individual level biases can be controlled by institutional arrangements. This is particularly the case with biases as a function of author characteristics. Biases against or for authors of certain groups are problematic from the perspective of fairness (e.g., Benétreau-Dupin and Beaulac 2015) but they are also epistemically detrimental as they constitute a violation of a principle that has been epitomized in the Mertonian norm of Universalism: scientific statements should be evaluated on the grounds of their content only, regardless of the qualities of the persons voicing them (Merton 1972).11 Only if the personal characteristics had a negative influence on an individual’s intellectual performance would it be justifiable to let these characteristics impact the evaluation of the individual’s work. If it is assumed that the decision on whether or not to accept a submission should be based on the content of that submission, then these biases apparently work against this goal of peer review. One of the goals of peer review, i.e., determining which proposals best fulfill the criteria of good science, cannot be reached if extraneous factors, such as the nationality of an author, have an influence on the review outcomes. Injustice may result in resources being wasted if talented researchers are tempted to leave the field because of unwarranted criticism. Also, if the claims made by individuals with certain social characteristics face less criticism than what is needed for making sure that they are not flawed, the risk of unfit papers being published increases. From a social epistemological perspective that highlights the need for scrutinizing individuals’ claims as thoroughly as possible, insufficiently strict criticism may be seen as increasing the risk of some faulty beliefs going unnoticed. Because of this, the practices that undermine effective criticism need to be combatted.
Double-blinding is a common way of trying to minimize the effect of referees’ undue preferences and implicit biases. For example, Benétreau-Dupin and Beaulac (2015) advocate the wider adoption of anonymous refereeing in philosophy, where the low representation of women has been associated with implicit biases. According to them, even though the evidence is inconclusive for stating that the gender imbalance is due to biases, implementing preventive measures like double-blinding does not require committing to the assumption that there actually is a bias against women. This is because double-blinding is thought to be a mechanism for merely preventing “blatant and indisputable injustice” (Benétreau-Dupin and Beaulac 2015, p. 71). The studies reviewed by Lee et al. are inconclusive on whether double-blind reviews in practice are fairer in evaluating submissions (2013, p. 11). This may be partly caused by the fact that in practice the content of a submission may give the reviewers enough hints to successfully guess the identity of the author. However, this should not be taken as a reason against blinding, according to Benétreau-Dupin and Beaulac (2015, p. 73) who state, “as good as a guess can be, it is still only a guess.” As long as there are suspicions about the existence of biases for or against authors of certain groups, and as long as better methods for combatting these biases are missing, blinding should be implemented.
A contradictory suggestion on how to identify and eliminate biases is to open peer review, i.e., let both reviewers’ and authors’ identities be known. Among the possible advantages of open peer review is encouraging reviewers to be less aggressive and to devote more effort to their work, along with being able to evaluate submissions in the light of the previous work of the authors.12 The main objection to open peer review is that the reviewers might fear the potential reprisals from authors and thus be less critical (e.g., Shamoo and Resnik 2009, p. 121). In other words, different models of peer review have their own strengths in neutralizing the effects of biases. Unfortunately, a detailed comparison of the benefits of anonymous and open peer review is not possible in this article.
Biases for or against certain author groups are not the only problematic individual level factors that have to be controlled in peer review. Conflicts of interest are another way in which extraneous factors may influence individuals’ evaluations in journal peer review. Conflicts of interest are known to have an effect on an individual’s judgments and research outcomes (see, e.g., Babcock, Loewenstein, Issacharoff, and Camerer 1995; Bekelman, Li and Gross 2003), and it should not be assumed that humans are less prone to these biases when acting as reviewers. Because of this, journals have policies governing conflicts of interest. Potential reviewers are asked to inform editors about financial or personal ties that might have or appear to have an impact on their evaluation. However, defining when financial or personal ties should prevent a person from reviewing a manuscript is less complicated than identifying when an intellectual conflict of interest is present (McLellan and Riis 2003, p. 237). Especially in small fields of study, the pool of potential reviewers is limited and it may thus be difficult to find a reviewer who is not a competitor or collaborator of the author.
As reviewers in peer review should be experts on the subject, they usually have strong views about the content of submissions (Ioannidis 2005, p. 698). This, in turn, may result in the aforementioned bottleneck effect: if the authorization for public dissemination is in the hands of a small number of experts, it can be expected that the accepted submissions are more uniform than would be the case if the pool of potential reviewers were more diverse. This may increase the influence of different content-based biases. Confirmation bias has been found to have an impact on scientific practices, and peer review is no exception: in this context this bias denotes that reviewers tend to prefer manuscripts that support their own theoretical backgrounds. Likewise, conservatism, or “bias against groundbreaking and innovative research” has raised concerns (Lee et al. 2013, p. 9). Biases of this type can be particularly problematic if they prevent the development of perspectives that are critical of current governing views. For instance, according to Longino (2002), alternative points of view need to be cultivated so that critical discussions can flourish, and thus the factors that suppress the development of critical points of view should be considered as epistemically detrimental. As peer review functions as the gatekeeping mechanism of science, it plays an essential role in determining which viewpoints get resources to develop and participate in discussions. Since content-based biases may result in nontraditional views being excluded from academic discussions, they pose a threat to objectivity of research in Longino’s sense, and philosophers of science interested in examining the conditions for objective research should pay attention to the ways in which these biases could be combatted.
As blinding does not help to remove content-based biases, other tools for controlling their influence must be considered. Recent discussion of academic publishing gives rise to the worry that the institutional context in which peer review practices take place can amplify the effect of some biases. For example, Bruno Frey (2003) has argued that the institutional structure of peer review in journals and the importance of publication records on the job markets in academia have a distorting effect on publishing. His basic claim is that the current practices force authors to formulate their ideas in a way that can be accepted by editors and likely reviewers. This is because prestigious journals receive such a high number of submissions that getting affirmative evaluations from all reviewers is a prerequisite for an article to be accepted by editors. What Frey’s view boils down to is that authors try to draft their submissions to be in line with the preferences of editors and reviewers, both before initial submission and after receiving the reviews (Frey 2003, pp. 210–11). Thus, there is an amplifying effect that further strengthens individual level content-based biases. This effect is sustained by the institutional context in which peer review practices take place. On the current academic job markets, publication records play a central role in defining the quality of applicants. Academics are evaluated on the basis of their publications, both the number and the journals in which they have been published (Frey 2003, p. 210; Young, Ioannidis, and Al-Ubaydli 2008, p. 2; Siler et al. 2015). As receiving reviewer comments usually takes several months, and because especially early career researchers are working under pressure to publish a certain number of papers in a short period of time, it is reasonable for the authors to adjust their manuscripts to match the assumed preferences of reviewers. 
Thus, the publish or perish culture of academia may have a negative effect on the quality and diversity of published literature.
According to Young et al. (2008, p. 1419), scientists may be tempted to focus on those areas and styles of research that are preferred in prestigious journals, and thus ignore novel ideas. A study by Siler et al. (2015, p. 361) suggests that for individuals trying to maximize their chances of getting manuscripts accepted in a highly cited journal, this “herding behavior” (Young et al. 2008, p. 1419) may be a rational way of acting: the study found that editors and reviewers were prone to reject unconventional work that with hindsight turned out to be influential. The need to conform to the anticipated demands of reviewers is particularly strong in those fields of study where rejection rates are high. For instance, in biomedical sciences, there are only a few highly visible journals, and these journals tend to have high rejection rates (Young et al. 2008, p. 1421).13 The size of the pool of potential journals may have an effect on how the possible biases eventually impact published literature as researchers working in fields with only a few highly prestigious journals may be more prone to herding behavior.
If we take seriously Longino’s arguments for the epistemic benefits of the plurality of available viewpoints and the need for publicly recognized forums where critical discussions can take place (Longino 2002, pp. 129, 132), the lack of journals with alternative publishing profiles is epistemically problematic. If all respectable journals in a given field favor research with a certain approach to the object of a study or a theoretical starting point, research that questions this dominant view cannot develop. The existence of adequate channels for making different views public is a precondition for functional scientific exchange of criticism. In this way, the number and diversity of journals is a factor that influences how individual level biases impact the published literature.
The fact that authors anticipate reviewer comments is not detrimental in itself, as preparing for criticism can improve the quality of manuscripts. However, if certain approaches or questions are systematically omitted from submitted manuscripts due to the expected preferences of reviewers and editors, harmful lacunae may develop in the field of study. In the terms of Longino’s view on science, the emphasized significance of publishing in certain highly valued journals may create artificial limits to choosing research questions and, thus, be epistemically detrimental. As Young et al. state, “the self-correcting mechanism in science is retarded” by issues related to publication practices, such as the herding behavior (2008, p. 1418).
In the current system only a small number of submissions become publicly disseminated after blind or double-blind review, and alternative ways of distributing new scientific information have been suggested. Both Frey and Young et al. propose that a possible way of tackling the problem of “artificial scarcity,” i.e., the extremely high rejection rates in prestigious journals, could be changing publication practices, for instance, by publishing papers online, even without peer review or with post-publication review. In this way, even more unconventional views could become public (Frey 2003, p. 211; Young et al. 2008, p. 1421). Moreover, to a certain degree, blogs already serve as a channel for discussing novel ideas that have not yet passed the filter of journal peer review.
However, removing the filter of prepublication peer review can be expected to result in information overload. Today huge numbers of scientific articles are published, to be read by only a few people. The core of the problem is not that we have too few published articles, but that the published material may focus on examining the objects of interest from a narrow perspective and consequently give a skewed picture of phenomena. A possible solution could be to change the way in which researchers are evaluated. Instead of focusing on quantity, such as h-factors, emphasis should be given to the quality of publications. One criterion for evaluating publication records could be taking into account how they contribute to the epistemic diversity of the field. This would lessen the pressure to publish a lot as quickly as possible and give researchers the possibility to choose alternative ways of exploring the world.
The discussion in this subsection can be encapsulated by stating that peer review practices and their efficacy in improving the quality of published work should not be evaluated without heeding the different forms of academic publishing and the role that publishing plays in academic career development. The weight that is given to publication records in academic job markets, the number of prestigious journals, and the pool of reviewers used by those journals all have an effect on the degree to which the reviewers’ possible biases influence the content of submissions that are eventually published. If there are only a few journals worth submitting to, if their publishing profiles are similar, and if they rely on a limited pool of experts, then the published literature can be expected to reflect the preferences of reviewers more than in the case where an individual trying to pursue a career in academia has more options for which journals to target.
In this paper, I have argued that the pool of published scientific literature is not only skewed by the individual level biases of reviewers acting as gatekeepers, but also by the institutional context in which peer review operates. The impact of the steering effect of institutional factors becomes more notable especially when the focus is shifted from the assessment of the fate of individual submissions to evaluating the influence of peer review practices on the published literature as a whole. Because of the publish or perish culture, it is rational for individual authors to adjust their manuscript to match the preferences of potential reviewers, i.e., to act in a way that may hinder the development of science.
Reliance on peer review is one of the most prominent features of the current academic culture, and, as Lee (2012, p. 868) points out, an important social epistemic aspect of knowledge production. The use of journal peer review is often advocated by using arguments that are common to social epistemology: it is thought that interaction between individuals makes it possible to disclose faulty reasoning and problematic assumptions. However, in order to make sure that this mechanism functions soundly, we need to seriously consider another message from social epistemology, namely that the institutional context of scientific practices has an impact on whether individuals’ biases have harmful consequences for the products of the process in question.
Reviewers, like all human beings, are prone to biases. Discussions on biases in peer review are in line with the views of those who have highlighted the importance of social justification mechanisms in science. Despite what some authors have claimed (e.g., Smith 2004), the objectivity of research should not be seen as dependent on individual scientists being able to reason logically and not let their own preferences interfere with the conclusions they draw from relevant facts. As empirical studies (e.g., Katz et al. 2003; Uhlmann and Cohen 2007) have shown, individuals do not always recognize how their preferences and social biases impact the evaluations they make. In the current publishing system, individual level biases of reviewers and editors seem to be working together with institutional factors, such as the publish or perish culture, the number of journals in a given field of study, the size of the pool of potential reviewers, and the higher status of paper journal publication, in a way that potentially distorts science.
An interesting example of the way in which the published literature may become skewed by interrelated individual level biases and institutional structures is publication bias. Some authors, for example Lee et al. (2013, p. 10), name publication bias as one of the biases in peer review and it has even been called “the most important” (Godlee and Dickersin 2003, p. 112) of the biases influencing journal peer review. This bias is particularly problematic because in medical research it distorts the pool of studies that are used for systematic reviews of literature, and thus has an impact on treatment guidelines (e.g., Godlee and Dickersin 2003, p. 112). In other words, this bias may have severe non-epistemic consequences. The evidence for reviewers’ and editors’ preference for positive studies seems to be inconclusive (Chan et al. 2014; Godlee and Dickersin 2003, p. 103). However, at least in medical research, there is evidence that the main cause of publication bias is that negative studies are never even submitted for evaluation—not that negative studies are rejected more often than positive ones (Chan et al. 2014; Sismondo 2008). Reasons for this so-called “file-drawer” problem can be various. Authors may anticipate that negative results will not gain editors’ or reviewers’ acceptance. They may be tempted to invest all resources in producing positive results that are more valuable on the academic job markets (Young et al. 2008). Studies may also be discontinued by their sponsors when it starts to seem likely that their results will not be desirable, i.e., positive (Sismondo 2008). Thus, the emergence of publication bias is a joint product of potential reviewer and editorial preferences, and more general operational modes in current research culture that tend to reward positive results, whether financially or immaterially.
In academia, peer review is also used for evaluating project proposals, manuscripts of books and book chapters, as well as individuals who are candidates for promotion or tenure. The peer review of grant proposals differs from the peer review of journal submissions in several ways. For instance, in the peer review of grant proposals the object of evaluation is research that has not yet been conducted (Holbrook and Frodeman 2011). Journal submissions are often anonymous, while the evaluation of the person(s) conducting the study is an important part of grant peer review. Also, revise and resubmit cycles are typical of journal peer review while grant proposals are most often evaluated in one stage (Marsh, Jayasinghe, and Bond 2008, pp. 166–67). The focus of this paper is on journal peer review because, as the allocation of grants is partly based on the publication records of applicants, examining the mechanisms that may bias the outcomes of journal peer review can be taken to be more elementary.
Some exceptions are Fitzpatrick (2010), Lee and Schunn (2011), Lee (2012), and Lee, Sugimoto, Zhang and Cronin (2013). In the recent debate concerning women’s low representation in philosophy, gender biases in peer review have been mentioned as one possible explanation for the gender disparity (e.g., Haslanger 2008).
I thank Heather Douglas for pointing this out.
By undue preferences I refer to preferences that do not reflect the relevant merits of submissions. What merits are relevant, in turn, is at least partially discipline and journal dependent.
Even though I state here that the goal of peer review is to decide whether a given piece of academic work is “good science,” I do not mean to imply 1) that there is a clear and universal understanding of what constitutes “good science” that would be applied to evaluation of all submissions, or 2) that other factors, such as the focus of the journal, would not have an influence on the decision on whether the submission should be accepted or not.
As an anonymous reviewer noted, the low inter-rater reliability could be interpreted as caused by the lack of competence in judging the quality of submissions instead of biases. Indeed, according to a study by Jayasinghe, Marsh and Bond (2003), inter-rater reliability in the evaluation of grant applications did improve when reviewers rated more applications, which suggests that training and experience can decrease disagreement. On the other hand, attending a workshop on peer review did not improve the performance of reviewers, according to a study by Callaham, Wears, and Waeckerle (1998). See Callaham (2003) for evaluation and training of reviewers.
It certainly is somewhat baffling that young reviewers have been found to be stricter than their older colleagues while at the same time less experienced reviewers are more lenient than experienced reviewers. As Hans Radder noted after reading a manuscript of this article, it seems unlikely that, on average, inexperienced reviewers are older.
There are journals that explicitly state a preference for novel approaches. However, even if this is the case, it could be that in general there is a bias against submissions suggesting novel approaches.
I am grateful to Uskali Mäki, who pointed out that some editors encourage authors to respond to reviews that have led to rejection when they feel that the reviewers’ comments have been unfair. In these cases, the interaction between an author, reviewer, and editor is closer to the kind of genuine critical discourse that Longino advocates.
There are obviously exceptions to this rule.
An anonymous reviewer suggested that there are personal characteristics that might not be extraneous when research is assessed, for example, a researcher’s temperament or accolades. These features can indeed be relevant when the object of evaluation is a person, for instance in the case of hiring decisions. However, in journal peer review the object of evaluation is a manuscript. The previous track record of the author in question may correspond to the quality of the manuscript, but the assessment should be based on the content of the manuscript, not the merits of its author.
I thank Alex Broadbent for highlighting this.
Similarly, in philosophy on average 92% of submissions received either “reject” or “revise and resubmit” decisions according to Lee and Schunn (2011).