Abstract
The ongoing reproducibility crisis in psychology and cognitive neuroscience has sparked increasing calls to re-evaluate and reshape scientific culture and practices. Heeding those calls, we recently launched the EEGManyPipelines project as a means to assess the robustness of EEG research under naturalistic conditions and to experiment with an alternative model of conducting scientific research. One hundred sixty-eight analyst teams, encompassing 396 individual researchers from 37 countries, independently analyzed the same unpublished, representative EEG data set to test the same set of predefined hypotheses and then provided their analysis pipelines and reported outcomes. Here, we lay out how large-scale scientific projects can be set up in a grassroots, community-driven manner without a central organizing laboratory. We explain our recruitment strategy, our guidance for analysts, the eventual outputs of this project, and how it might have a lasting impact on the field.
INTRODUCTION
The scientific community in psychology and neuroscience faces increasing pressure to rethink and improve its current culture and practices. Low replicability and reproducibility of research findings have eroded trust in even some of the most established results (e.g., Szucs & Ioannidis, 2017; Open Science Collaboration, 2015). At the same time, these issues have made clear that a traditional research structure based on individual laboratories with a solo lead will likely be insufficient to overcome the many obstacles inherent to studying the human brain and mind. Restoring trust and advancing knowledge requires not only evaluating the robustness of a specific field's research outcomes to lay the groundwork for evidence-based, reproducible, and robust scientific standards, but also developing and implementing alternative models of research culture.
Here, we introduce the EEGManyPipelines project to (1) map the real-life analytical flexibility in EEG research and its effects on the robustness of reported results, and (2) serve as a blueprint for setting up and conducting research in a grassroots, community-driven manner without a central organizing laboratory.
EVALUATING THE ROBUSTNESS OF EEG RESEARCH “IN THE WILD”
Although credibility issues in psychology and cognitive neuroscience concern a wide range of topics and methodologies, EEG research represents an ideal candidate for investigating current scientific practices. Not only is EEG one of the most widely used tools for studying human cognition, but its large analytical flexibility may also render it particularly susceptible to low replicability and robustness (Figure 1A).
One possible source of such a lack of replicability and robustness of research findings could be that reported results are affected by differences in analysis pipelines (Wagenmakers, Sarafoglou, & Aczel, 2022). Systematically and meaningfully mapping the variability of reported results to the variability in analysis pipelines can only be achieved by applying multiple genuine and plausible analysis pipelines to the very same data set. Recent reviews highlighted considerable variability in analysis pipelines, even when researchers pursued similar research questions (Šoškić, Jovanović, Styles, Kappenman, & Ković, 2022). Likewise, permuting analysis parameters in multiverse approaches (Steegen, Tuerlinckx, Gelman, & Vanpaemel, 2016) has shown strong effects on EEG results (Clayson, Baldwin, Rocha, & Larson, 2021).
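To make the multiverse logic concrete, the following is a minimal Python sketch of how systematically permuting analysis parameters generates a family of pipelines. The parameter grid is hypothetical and purely illustrative; it is not the set of choices examined by Steegen et al. (2016) or Clayson et al. (2021).

```python
# Minimal sketch of a "multiverse" of EEG preprocessing pipelines:
# every combination of defensible parameter choices defines one pipeline.
# The grid below is hypothetical and illustrative only.
from itertools import product

choices = {
    "highpass_hz": [0.1, 0.5, 1.0],               # high-pass filter cutoff (Hz)
    "reference": ["average", "linked-mastoids"],  # re-referencing scheme
    "baseline_s": [(-0.2, 0.0), (-0.1, 0.0)],     # baseline window (s)
    "reject_uv": [75, 100, 150],                  # peak-to-peak rejection (µV)
}

# The Cartesian product of all options yields one pipeline per combination.
pipelines = [dict(zip(choices, combo)) for combo in product(*choices.values())]
print(f"{len(pipelines)} pipelines in this toy multiverse")  # 3 * 2 * 2 * 3 = 36
```

Even this small grid yields 36 distinct pipelines; running each on the same data and comparing the resulting effect estimates is the essence of the multiverse approach.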
However, neither of these approaches can fully relate meaningful variability in analytic choices to meaningful variability in results. Evaluating the robustness of research outcomes in the context of the published literature relies on comparing analysis pipelines applied to different data sets, thereby introducing uncontrolled variability. By contrast, systematically permuting analysis parameters does not necessarily produce analysis pipelines that the community of researchers would actually use or deem plausible.
To address this important blind spot and reveal the true extent of diversity in existing analytical practices and the ensuing variability of findings, we launched the EEGManyPipelines project in 2020. Complementing the EEGManyLabs project (Pavlov et al., 2021), a large-scale effort to replicate experimental findings in EEG research by pooling data collection efforts across multiple laboratories, here, we rely on a multi-analyst approach to investigate EEG analysis practices: Many independent analysis teams test the same set of hypotheses on the same data and report their analyses in detail, providing a record of their results and analysis code. In that sense, our project targets EEG analyses as conducted “in the wild”: (1) It goes beyond summarizing analysis practices reported in published EEG studies by observing in detail how such analyses are conducted and implemented in actual research environments; (2) the analyses are executed by a large, representative sample of analysts rather than a single team; and (3) the analysts are granted the autonomy to make their own analytic choices, mirroring their own “natural” research work.
Multi-analyst studies of a similar kind have already been conducted successfully or are currently underway in other domains and fields. For instance, the Neuroimaging Analysis Replication and Prediction Study (NARPS; Botvinik-Nezer et al., 2020) analyzed fMRI pipelines and demonstrated large methodological flexibility, which affected the reported results for a subset of the tested hypotheses. Similarly, the recently announced Coscience EEG Personality project (Paul et al., 2022) and the Team 4 TMS-EEG project (Bortoletto et al., 2023) complement our project with more targeted approaches: the former leverages a multi-analyst component to investigate personality and individual differences with EEG recordings, whereas the latter uses a multilab design to probe the robustness of TMS-EEG data to heterogeneous data collection situations and analytic strategies.
So far, however, this type of multi-analyst approach has never been taken to evaluate the robustness of EEG research as a whole; our project thus expands comparative research on analytical variability across research fields and designs. What is more, although previous multi-analyst studies (e.g., NARPS) have already started alerting the community to variability and its consequences in domain-specific analysis practices, there is much more to be learned about the more general research culture and scientific decision-making. In contrast to NARPS, in the EEGManyPipelines project, analysts not only reported the outcomes of their analysis in the form of a detailed questionnaire and yes/no answers as to whether the hypotheses were confirmed by the data, but also wrote a free-text “results section” interpreting their findings and submitted their actual analysis code. We expect that the free-text results section might shed light on how people draw conclusions from their statistical findings while data and hypotheses are held constant. Moreover, comparing these reports to the analysts' actual code promises to reveal novel patterns about, for instance, where and how scientists are most error-prone in reporting and interpreting their results. Findings such as these should be relevant not only to the EEG community but also to cognitive neuroscientists at large. In the context of multi-analyst studies from different fields, the EEGManyPipelines project is one of the largest ever conducted and yields an unprecedentedly rich data set of choices, outcomes, and analyst-level variables, enhancing meta-scientific opportunities to investigate determinants of variability.
RECRUITING A LARGE AND REPRESENTATIVE SAMPLE OF ANALYSTS TO BUILD A RICH, OPEN-ACCESS DATA REPOSITORY AND DERIVE PRACTICAL RECOMMENDATIONS
Multi-analyst approaches can reveal how real-world variability in research outcomes relates to variability in analysis pipelines. At the same time, they provide the necessary empirical data to derive evidence-based, reproducible, and robust standards for data analysis and reporting (Aczel et al., 2021). This full potential, however, can only be unlocked if the analysts contributing the pipelines as well as the contributed pipelines are themselves representative of researchers and actual analyses in the field.
To achieve this objective, we selected a data set that is as representative as possible. It stems from an EEG experiment on visual long-term memory for scene photographs—a cognitive process that we expected most analysts to be familiar with—in which a group of 33 participants saw a stream of scene images from different categories and decided on each trial whether the image was new or had been presented before. Thus, this paradigm featured a conventional factorial design and typical indicators of behavioral performance, allowing us to formulate several “research questions.” The data set itself was recorded with standard parameters and comprises a typical range of noise and signal artifacts (see Algermissen et al., 2022, for details). Importantly, previous studies using similar paradigms have reported significant but modest overall effect sizes for memory-related effects (e.g., Van Strien, Hagenbeek, Stam, Rombouts, & Barkhof, 2005; Burgess & Gruzelier, 2000; Friedman, 1990), rendering the results potentially more susceptible to variations induced by different analysis pipelines and avoiding strong expectations about the presence or absence of an effect. We provided the data in an almost completely unprocessed form, except for downsampling (to facilitate data sharing), referencing, and export to a variety of formats, such as the EEG-Brain Imaging Data Structure (BIDS) standard (Pernet et al., 2019). No analyses or results associated with this data set had been published at the time of sharing, avoiding any potential bias in analytical decisions because of prior knowledge of the data or results. Instructions to analysts were carefully worded to encourage an analysis approach typical of their standard analysis pipeline and real-life approach to hypothesis testing.
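For illustration, the following is a minimal sketch of how an analyst might load a single participant from an EEG-BIDS data set, assuming the MNE-Python and MNE-BIDS toolchain (one common option; analysts were free to use any software). The subject ID, task label, file path, sampling rate, and reference are hypothetical placeholders, not the project's actual parameters.

```python
# Hypothetical example of reading one participant from an EEG-BIDS data set
# with MNE-Python/MNE-BIDS; all identifiers and paths are placeholders.
from mne_bids import BIDSPath, read_raw_bids

bids_path = BIDSPath(subject="01", task="memory", datatype="eeg",
                     root="/path/to/eeg_bids_dataset")
raw = read_raw_bids(bids_path)       # loads raw EEG together with BIDS metadata
raw.load_data()                      # pull data into memory before processing
raw.resample(256)                    # downsample (e.g., to ease data sharing)
raw.set_eeg_reference("average")     # re-reference; one of many possible choices
```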
Recruiting as large and representative a sample of analysts as possible was a guiding principle for our decisions at all stages of the EEGManyPipelines project, from its very conceptualization to analyst recruitment and guidance. We defined—and later verified—inclusion criteria such that each team of analysts (composed of up to three individual researchers) had to include at least one member with expertise in electrophysiological data analysis (i.e., one or more publications in a peer-reviewed journal). This recruitment strategy, combined with outreach efforts (on social media, major software mailing lists, and through direct contact with research institutions and colleagues outside of Europe and the United States), allowed us to recruit, to the best of our knowledge, the largest multi-analyst sample to date: 396 researchers across 168 analysis teams. Importantly, this sample also seems to capture some of the main features of the research community at large. Team composition in terms of gender distribution (Figure 1B) and level of expertise with EEG and cognitive neuroscience—as measured by subjective ratings, number of EEG publications, and academic seniority (Figure 1C)—suggests a profile of diversity similar to what is encountered in real life. In particular, the geographical origin of individual analysts (Figure 1D) mirrors the geographical distribution of the authors of EEG articles published in the last 5 years (Figure 1E).
With such a large and fairly representative sample of analysts, any variability in analytical choices and/or reported outcomes we may discover will likely capture analytical decisions and flexibility as encountered in the wild, that is, in the community's everyday research work that forms the basis of the scientific literature. Thus, we will be in a position not only to infer the robustness of published findings but also to identify those parameters and analytic choices that shape observed results the most. We hope that this knowledge will help sensitize researchers to the impact of their analytical decisions and lay the groundwork for the development of evidence-based, robust, and standardized analysis pipelines and reporting guidelines.
We will also release all project materials (i.e., raw EEG data, data/code provided by analysts, data/code derived as part of the EEGManyPipelines project) in an open-access database. This database will represent a rich repository of easily accessible, searchable, and (re-)usable data that we hope will inspire further inquiries into cognitive, methodological, and meta-scientific questions for years to come. Ultimately, we aim to deliver insights that matter—not only for the EEG community but also for those relying on related tools, such as magnetoencephalography, intracranial EEG, or electrocorticography.
SHAPING THE FUTURE OF (NEURO-)SCIENCE BY PROVIDING A BLUEPRINT FOR CONDUCTING LARGE-SCALE, GRASSROOTS, COMMUNITY-DRIVEN SCIENCE
The EEGManyPipelines project is a child of the COVID-19 pandemic: Sparked by a single Tweet,1 it was envisioned entirely online by a group of researchers from around the world and at all career stages (i.e., the “steering committee”; https://www.eegmanypipelines.org/#ref-steering-committee). Unlike traditional laboratory-style science or previous big team science collaborations (e.g., adversarial collaborations, ManyLabs, and NARPS), from the get-go, the EEGManyPipelines project has been a bottom–up, community-driven effort without a rigidly defined hierarchy or a central lead laboratory or researcher. In particular, although there is a clear hierarchy between the steering committee and further contributors to the project (e.g., analysts, advisory board), the internal structure within the steering committee is flat. Moreover, apart from invaluable financial support for one full-time research position and in-person meetings, acquired 1.5 years into the project's lifetime (i.e., during the data collection phase in March 2022), the bulk of the efforts of the vast majority of steering committee members—in terms of both overall duration and person-hours—has not been supported by a dedicated source of funding and instead relies entirely on voluntary contributions.
Running a large-scale science collaboration without a clearly identifiable (quasi-)solo lead in the current scientific “incentive” structure poses challenges at all stages: For instance, different ideas, perspectives, and priorities have to be translated into a coherent, feasible, and testable research agenda, and results have to be shared in a way that respects individual contributions, even as project roles remain fluid. Some of these obstacles (e.g., data sharing) may also arise in the context of traditional laboratory or collaborative science. However, in a grassroots, community-driven setting without a central lead, they can be amplified. We highlight some of the most pertinent challenges encountered and lessons learned in setting up and running this kind of decentralized, big team science effort in Box 1.
With the EEGManyPipelines project, we show that a meaningful, interesting, and solid scientific question can be successfully addressed with this kind of bottom–up, community-driven collaboration. Although this particular model of scientific inquiry might be indispensable for research agendas such as ours, we believe that it is not limited to multi-analyst studies or comparable scientific questions. Indeed, decentralized, democratic “big team science,” in which researchers pool both their physical and intellectual resources, might confer several critical advantages over a traditional, centralized research approach (Baumgartner et al., 2023; Coles, Hamlin, Sullivan, Parker, & Altschul, 2022): Perhaps most important, this kind of decentralized collaboration allows for a larger and, critically, more freely moving pool of ideas. It might thereby foster creativity, become larger than the sum of its parts, and open the door to potentially unexpected scientific discoveries.
Alongside other current big team science projects in neuroscience and beyond (e.g., #EEGManyLabs, the Coscience EEG Personality project, the Team 4 TMS-EEG project), we believe the EEGManyPipelines project to be unique in opening up discussion about, and giving visibility to, other models of science. There will not be a “one size fits all” solution. Further inspiration might be drawn from similar projects, such as the Psychological Science Accelerator (PSA; Moshontz et al., 2018) or the International Brain Lab (IBL, 2017), which, in addition to being large-scale collaborations, focus on setting up collaborative infrastructures (e.g., committees that decide which study proposals to put forward and how to allocate funding to collaborators) for running experiments across countries and laboratories (cf. Table 1). We hope that by setting an example; opening up our scientific practices; providing guidance on how to set up and run a grassroots, community-driven project; and successfully finishing this venture, the EEGManyPipelines project may serve as one potential blueprint for a more collaborative, community-driven, flat, and open scientific culture.
| Project | Governance | Decision-making | Membership | Infrastructure | Scientific Focus |
|---|---|---|---|---|---|
| EMP | Steering committee with flat hierarchy | Democratic vote in steering committee | Individual researchers | No | Data analysis |
| IBL | General assembly (i.e., all IBL PIs) | Consent-based within the IBL | Franchise of laboratories | Yes | Data collection |
| PSA | Hierarchical leadership team with committees | Democratic vote of all members | Individual researchers | Yes | Data collection |
We highlight some similarities and differences between the EEGManyPipelines project (EMP) and two other big team science initiatives: the International Brain Lab (IBL) and the Psychological Science Accelerator (PSA). Projects are compared across a non-exhaustive list of five criteria: (1) governance, that is, the decision-making body for setting shared scientific goals and policies; (2) decision-making process; (3) membership, based either on entire laboratories under a PI or on individual researchers; (4) development of dedicated infrastructure; and (5) main scientific focus.
CONCLUSION
The fields of experimental psychology and cognitive neuroscience currently find themselves at a critical crossroads: Faced with uncertainties about replicability, reproducibility, and robustness, the community increasingly recognizes the need for a shift toward better scientific practices and an improved scientific culture. With the EEGManyPipelines project, we tackle analytical flexibility “in the wild” and explicitly address the impact of methodological choices on research outcomes. Our results will provide a roadmap toward more reproducible, robust, and transparent standards for conducting and reporting EEG studies and, ultimately, contribute to shaping a more credible, inclusive, and collaborative science.
Acknowledgments
We thank all analysts for their efforts and for making this project possible. We thank Balazs Aczel, Mike X. Cohen, Arnaud Delorme, Anna Dreber, Alexandre Gramfort, Jörg Hipp, Felix Holzmeister, Magnus Johannesson, Steven J. Luck, Vanja Ković, Robert Oostenveld, Yuri Pavlov, Cyril Pernet, Russell Poldrack, Aina Puce, Anđela Šoškić, Tom Schonberg, Martin Schweinsberg, Barnabas Szaszi, and Erik L. Uhlmann for serving on the project's advisory board (names are listed in alphabetical order). We thank Josh Koen, Navid Muhammad Samran, Mehdi Senoussi, Britta Westner, and Jeremy Yeaton for participating in the initial stages of this project as steering committee members. We also thank the Max Planck Institute for Empirical Aesthetics (MPI-EA) graphics team for their help in designing the figure.
Corresponding author: Darinka Trübutschek, Research Group Neural Circuits, Consciousness and Cognition, Max Planck Institute for Empirical Aesthetics, Frankfurt/Main, Germany, or via e-mail: [email protected].
Data Availability Statement
The scripts used to create figures and summaries in this article are available at https://github.com/EEGManyPipelines/metadata_summary.
Author Contributions
Darinka Trübutschek: Conceptualization; Formal analysis; Investigation; Project administration; Visualization; Writing—Original draft; Writing—Review & editing. Yu-Fang Yang: Conceptualization; Formal analysis; Investigation; Project administration; Visualization; Writing—Original draft; Writing—Review & editing. Claudia Gianelli: Investigation; Project administration; Writing—Original draft; Writing—Review & editing. Elena Cesnaite: Formal analysis; Investigation; Project administration; Visualization; Writing—Original draft; Writing—Review & editing. Nastassja L. Fischer: Investigation; Project administration; Writing—Original draft; Writing—Review & editing. Mikkel C. Vinding: Formal analysis; Investigation; Project administration; Visualization; Writing—Original draft; Writing—Review & editing. Tom R. Marshall: Investigation; Project administration; Writing—Original draft; Writing—Review & editing. Johannes Algermissen: Investigation; Project administration; Writing—Review & editing. Annalisa Pascarella: Investigation; Project administration; Writing—Review & editing. Tuomas Puoliväli: Investigation; Project administration; Writing—Review & editing. Andrea Vitale: Investigation; Project administration; Visualization; Writing—Review & editing. Niko A. Busch: Conceptualization; Funding acquisition; Investigation; Project administration; Resources; Writing—Original draft; Writing—Review & editing. Gustav Nilsonne: Conceptualization; Funding acquisition; Investigation; Project administration; Resources; Writing—Original draft; Writing—Review & editing.
Funding Information
Darinka Trübutschek is supported by the European Union's Horizon 2020 research and innovation program under the Marie Sklodowska-Curie, grant number: 101023805. Nastassja L. Fischer is supported by the Cambridge-NTU Centre for Lifelong Learning and Individualised Cognition (CLC), a project by the National Research Foundation, Prime Minister's Office, Singapore, under its Campus for Research Excellence and Technological Enterprise (NRF-CREATE SoL) Programme with the funding administered by the Cambridge Centre for Advanced Research and Education in Singapore Ltd. (CARES) and housed at the Centre for Research and Development in Learning (CRADLE@NTU). Niko A. Busch and Elena Cesnaite are supported by the DFG priority program “META-REP: A Meta-scientific Programme to Analyse and Optimise Replicability in the Behavioural, Social, and Cognitive Sciences.” Gustav Nilsonne and Mikkel C. Vinding are supported by Riksbankens Jubileumsfond, grant number: P21-0384. Mikkel C. Vinding is supported by a collaborative grant from the Lundbeck Foundation, grant number: R336-2020-1035.
Diversity in Citation Practices
A retrospective analysis of the citations in every article published in this journal from 2010 to 2020 has revealed a persistent pattern of gender imbalance: Although the proportions of authorship teams (categorized by estimated gender identification of first author/last author) publishing in the Journal of Cognitive Neuroscience (JoCN) during this period were M(an)/M = .408, W(oman)/M = .335, M/W = .108, and W/W = .149, the comparable proportions for the articles that these authorship teams cited were M/M = .579, W/M = .243, M/W = .102, and W/W = .076 (Fulvio et al., JoCN, 33:1, pp. 3–7). Consequently, JoCN encourages all authors to consider gender balance explicitly when selecting which articles to cite and gives them the opportunity to report their article's gender citation balance.
Author notes
These authors contributed equally.
Senior authors.