Kalpana Shankar
Journal Articles
Publisher: Journals Gateway
Quantitative Science Studies (2022) 3 (3): 832–856.
Published: 01 November 2022
Abstract
One of the main critiques of academic peer review is that interrater reliability (IRR) among reviewers is low. We examine an underinvestigated factor that may contribute to low IRR: reviewers’ diversity in their topic-criteria mapping (“TC-mapping”). TC-mapping diversity refers to differences among reviewers in which topics they choose to emphasize in their evaluations and in how they map those topics onto the various evaluation criteria. In this paper, we examine the review process for grant proposals at one funding agency and ask: How much do reviewers differ in TC-mapping, and do these differences contribute to low IRR? Through a content analysis of review forms submitted to a national funding agency (Science Foundation Ireland) and a survey of its reviewers, we find evidence of interreviewer differences in TC-mapping. Using a simulation experiment, we show that, under a wide range of conditions, even strong differences in TC-mapping have only a negligible impact on IRR. Although further empirical work is needed to corroborate the simulation results, they tentatively suggest that reviewers’ heterogeneous TC-mappings might not be a concern for designers of peer review panels seeking to safeguard IRR.
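The link between TC-mapping diversity and IRR can be illustrated with a toy simulation. This sketch is not the authors’ simulation model. It assumes that each proposal has a latent quality on a few topics, that a reviewer’s score on a single criterion is a weighted sum of those topic qualities plus random noise, and that each reviewer’s topic weights stand in for their TC-mapping. The hypothetical `heterogeneity` parameter perturbs each reviewer’s weights away from a shared baseline. IRR is measured here as the mean pairwise Pearson correlation between reviewers’ scores across proposals, which is only one of several possible IRR measures.

```python
import random
import statistics
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length score lists."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def simulate_irr(n_proposals=50, n_reviewers=3, n_topics=4,
                 heterogeneity=0.0, noise=1.0, seed=0):
    """Toy model (hypothetical): mean pairwise Pearson correlation between reviewers.

    heterogeneity=0 gives every reviewer identical topic weights; larger
    values add reviewer-specific Gaussian perturbations to those weights,
    a stand-in for diverse TC-mappings.
    """
    rng = random.Random(seed)
    # Latent quality of each proposal on each topic.
    quality = [[rng.gauss(0, 1) for _ in range(n_topics)]
               for _ in range(n_proposals)]
    base = [1.0] * n_topics  # shared baseline weighting of topics
    scores = []
    for _ in range(n_reviewers):
        # Each reviewer's own mapping of topics onto the criterion.
        weights = [w + heterogeneity * rng.gauss(0, 1) for w in base]
        scores.append([
            sum(w * q for w, q in zip(weights, proposal))
            + noise * rng.gauss(0, 1)
            for proposal in quality
        ])
    pair_correlations = [pearson(a, b) for a, b in combinations(scores, 2)]
    return statistics.fmean(pair_correlations)


if __name__ == "__main__":
    for h in (0.0, 0.5, 2.0):
        print(h, round(simulate_irr(heterogeneity=h), 3))
```

Comparing `heterogeneity=0.0` with larger values lets one check how much weight diversity alone lowers agreement in this toy setup. The size of any effect depends entirely on the assumed noise level and weight distribution, so the sketch says nothing definitive about the paper’s results.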
Includes: Supplementary data
Journal Articles
Publisher: Journals Gateway
Quantitative Science Studies (2021) 2 (4): 1271–1295.
Published: 01 December 2021
Abstract
Using a novel combination of methods and data sets from two national funding agency contexts, this study explores whether review sentiment can serve as a reliable proxy for peer reviewers’ opinions. We measure reviewer opinions through the sentiment of their reviews, both on specific review subjects and on proposals’ overall funding worthiness, using three methods: manual content analysis and two dictionary-based sentiment analysis (SA) algorithms (TextBlob and VADER). We assess how reliably review sentiment captures reviewer opinions by correlating it with review scores, proposal rankings, and funding decisions. In our samples, we find that (a) review sentiments correlate positively with review scores or rankings, and the correlation is stronger for manually coded than for algorithmic results; (b) manual and algorithmic results are correlated overall across funding programs, review sections, languages, and agencies, but the correlations are not strong; and (c) manually coded review sentiments predict quite accurately whether proposals are funded, whereas the two algorithms predict funding success with moderate accuracy. The results suggest that manual analysis of review sentiment can provide a reliable proxy for grant reviewers’ opinions, whereas the two SA algorithms can be useful only in some specific situations.
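The validation step described above, checking sentiment against scores and funding outcomes, can be sketched as follows. This is an illustration under stated assumptions, not the study’s procedure. The abstract does not specify the correlation measure or the prediction method, so this sketch assumes Spearman’s rank correlation (a common choice for ordinal review scores) and a hypothetical threshold rule (“funded if sentiment exceeds `threshold`”). The sentiment values are hypothetical numbers. In practice they might come from TextBlob’s `sentiment.polarity` or VADER’s `compound` score, both of which lie in [-1, 1].

```python
from statistics import fmean


def average_ranks(values):
    """Rank values 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    mx, my = fmean(x), fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def funding_accuracy(sentiments, funded, threshold=0.0):
    """Share of proposals where 'sentiment > threshold' matches the actual outcome."""
    hits = sum((s > threshold) == f for s, f in zip(sentiments, funded))
    return hits / len(funded)


if __name__ == "__main__":
    # Hypothetical data: one sentiment value, score, and outcome per proposal.
    sentiment = [0.62, 0.15, -0.30, 0.48, 0.05, -0.10]
    score = [5, 3, 1, 4, 3, 2]
    funded = [True, False, False, True, False, False]
    print("rho:", round(spearman(sentiment, score), 3))
    print("accuracy:", funding_accuracy(sentiment, funded, threshold=0.3))
```

Rank correlation is used here because review scores are usually ordinal and bounded, so only their order is treated as meaningful. The threshold rule is the simplest possible classifier; a real analysis would more likely fit a model such as logistic regression.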