Abstract
It is easy to argue that open data are critical to enabling faster and more effective research discovery. In this article, we describe the approach we have taken at Wiley to support open data and to start enabling more data to be FAIR data (Findable, Accessible, Interoperable and Reusable) with the implementation of four data policies: “Encourages”, “Expects”, “Mandates” and “Mandates and Peer Reviews Data”. We describe the rationale for these policies and levels of adoption so far. In the coming months we plan to measure and monitor the implementation of these policies via the publication of data availability statements and data citations. With this information, we'll be able to celebrate adoption of data-sharing practices by the research communities we work with and serve, and we hope to showcase researchers from those communities leading in open research.
1. Background and Motivation
“Open research” and “open science” are two interchangeable terms that encompass a number of practices that are becoming widely adopted [1, 2]. While definitions of open research and open science come in many flavors (see Table 1), their core elements include open accessibility and dissemination of research outputs including more than traditional journal articles.
Attribution | Definition |
---|---|
Foster Open Science | “Open Science is the practice of science in such a way that others can collaborate and contribute, where research data, lab notes and other research processes are freely available, under terms that enable reuse, redistribution and reproduction of the research and its underlying data and methods.” [3] |
European Commission | “A broad term, covering the many exciting developments in how science is becoming more open, accessible, efficient, democratic, and transparent. This Open Science revolution is being driven by new, digital tools for scientific collaboration, experiments and analysis and which make scientific knowledge more easily accessible by professionals and the general public, anywhere, at any time.” [4] |
Michael Nielsen | “the idea that scientific knowledge of all kinds should be openly shared as early as is practical in the discovery process” [5] |
Center for Open Science | “Openness and reproducibility are core scientific values because science is a distributed, non-hierarchical culture for accumulating knowledge. No individual is the arbiter of truth. Knowledge accumulates by sharing information and independently reproducing results.” [6] |
At Wiley, the researcher is our “North Star” as explained by Judy Verses (Executive Vice President, Wiley) in her keynote talk at the APE2019 conference in Berlin, Germany [7]. This means that we put researchers at the heart of our research publishing and educational services. We listen to the research communities we serve and – by tailoring open research initiatives to the needs of researchers in particular disciplines – we support their open research aspirations. Adopting open practices, but phasing their implementation to suit different communities, is our focus. We organize our work in five key areas: open access, open practices, open collaboration, open recognition and reward, and of course, open data [8].
“Open data” is an often-used term for sharing data, and is perhaps made more meaningful by the term FAIR (Findable, Accessible, Interoperable and Reusable) [9]. After open access, “open data” (or better: FAIR data) is probably one of the most important elements of open research [10]. FAIR data have the potential to revolutionize the way research is done and communicated and we are seeing benefits in research discoveries as a result [11]. Open research initiatives, like open data, bring many benefits including increased transparency as well as, potentially, enhanced reproducibility and amplified impact [12]. Funders and institutions recognize this and are increasingly requiring researchers to share data [13].
However, given the scale and variety of data, the complexity of how best to share data, the need for new practices and habits by research communities, and the need for technology and infrastructure to support data sharing, it is clear that collaboration across all stakeholders is key. This is a challenge we all must embrace, if we are going to make progress.
To reflect our commitment to open research and to support researchers in sharing their data, Wiley recently updated its data sharing and citation policies [14]. In the rest of this article, we will share the approach we took with our data policies, how this fits with approaches taken by other publishers, and how this helps Wiley begin to achieve the goals of FAIR data.
2. Research Data Sharing Policy at Wiley
At Wiley, we are making open research not just the future of research and research communication, but the here and now. We have four policy-level requirements for data sharing, adopted across our portfolio of journals [15].
“Encourages data sharing” is our entry-level policy to encourage data sharing. It enables journals serving researchers in communities where data sharing is not common to start their journey toward data sharing. There are no enforced requirements.
“Expects data sharing” is a policy for journals that require every author to provide a data availability statement, which confirms the presence or absence of shared data, and a data citation. It is equivalent to level 1 of the Transparency and Openness Promotion (TOP) guidelines [16].
“Mandates data sharing” is a policy for journals that require a data availability statement, a data citation, and sharing of data (it is equivalent to TOP level 2 [16]).
“Mandates data sharing and peer reviews data” is a policy for journals that take the additional step of peer reviewing data (it is equivalent to TOP level 3 [16]).
Of course, we recognize that the process of adopting open research practices can be challenging and requires cultural change as emphasized by Henriikka Mustajoki (Head of Development, Federation of Finnish Learned Societies) [17]. Our four policy levels give flexibility so that journals can adopt policies that are right for their research communities.
Tiered policies like these, adopted by major publishers, enable journals to adapt to the communities they serve [18]. The Wiley data sharing policies are shown in Table 2, which maps each against the Transparency and Openness Promotion (TOP) guidelines [16] that are used by publishers and funders to increase transparency.
Policy | Data availability statement is published (a) | Data have been shared (b) | Data have been peer reviewed (c) | Example Wiley journals | TOP guideline level |
---|---|---|---|---|---|
Encourages Data Sharing | Optional | Optional | Optional | | Not TOP compliant (i.e., “Level 0”) |
Expects Data Sharing | Required | Optional | Optional | British Journal of Social Psychology | TOP Level 1 |
Mandates Data Sharing | Required | Required | Optional | Ecology and Evolution | TOP Level 2 |
Mandates Data Sharing and Peer Reviews Data | Required | Required | Required | Geoscience Data Journal; American Journal of Political Science | TOP Level 3 |
Note:
(a) A data availability statement confirms the presence or absence of shared data.
(b) Links to data in data availability statements are checked to ensure they link to the data that the authors intended. If data have been stored in a data repository, the data availability statement includes a permanent link to the data. Shared data are also cited.
(c) Quality and/or replicability of linked data are peer reviewed. Depending on the journal, this may be to peer review the quality of the data by ensuring that the results in the paper and the data in the repository align (for example, sample sizes and variables match), or it may be to peer review the replicability of the data to ensure that the claims presented in the journal article are valid and can be reproduced.
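The four policy levels and their requirements can also be written down as a small lookup structure, which may help when checking a submission against a journal's policy. The following is a minimal sketch in Python: the class name `DataPolicy`, the list `POLICIES` and the function `missing_requirements` are hypothetical names chosen for illustration and are not part of any Wiley system. The required/optional values are taken from Table 2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataPolicy:
    """One row of Table 2 (hypothetical representation, for illustration only)."""
    name: str
    statement_required: bool    # data availability statement is published
    sharing_required: bool      # data have been shared
    peer_review_required: bool  # data have been peer reviewed
    top_level: int              # 0 stands for "not TOP compliant"


POLICIES = [
    DataPolicy("Encourages Data Sharing", False, False, False, 0),
    DataPolicy("Expects Data Sharing", True, False, False, 1),
    DataPolicy("Mandates Data Sharing", True, True, False, 2),
    DataPolicy("Mandates Data Sharing and Peer Reviews Data", True, True, True, 3),
]


def missing_requirements(policy, has_statement, data_shared, data_peer_reviewed):
    """Return the requirements of `policy` that a submission does not yet meet."""
    missing = []
    if policy.statement_required and not has_statement:
        missing.append("data availability statement")
    if policy.sharing_required and not data_shared:
        missing.append("shared data")
    if policy.peer_review_required and not data_peer_reviewed:
        missing.append("peer review of data")
    return missing
```

For example, a submission to a “Mandates Data Sharing” journal that includes a data availability statement but no shared data would be reported as missing only the shared data.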
3. The Research Data Sharing Landscape
Many publishers are adopting data sharing policies either encouraging or requiring researchers to share their underlying data [18]. These developments are going hand-in-hand with requirements from institutions and funders [13]. The characteristic features of data policies from major publishers can be compared with how they map to TOP guidelines [17]. However, while publisher data policies have common elements, there is a recognized need for further standardization [19].
With the adoption of data sharing policies comes the possibility of evaluating the impact of these policies. Are researchers compliant? How are data shared? Findings from a recent analysis suggest that the majority of researchers share data within a published article (rather than via a repository) [20] but more research is needed to understand the issues researchers face in sharing their data.
4. Understanding Researcher Needs
The 2016 Wiley Open Science survey, built on earlier work by Wiley in 2014, gathered opinions on data-sharing from over 4,600 researchers worldwide [21]. It identified researchers' motivations to share data (Figure 1), as well as what they find most challenging about data sharing. Wiley is continuing to collect data on how researchers across all disciplines approach open access, open data, peer review and collaboration, and will report the new data in 2019.
5. Promoting Data Sharing
In November 2018, during International Data Week [23], we began a campaign to implement the Wiley “Expects Data” data sharing policy more broadly. Our goal was to step up the support we offered to researchers who want or need to share their data, by transitioning journals from our “Encourages Data” data sharing policy to “Expects Data” [14].
Our first step was to create a toolkit that would brief publishing colleagues, so they could effectively liaise with editors of journals, and then – together – to implement the requirements of the “Expects Data” policy, namely by including data availability statements and data citations in every article. The data sharing team provided everything that journals would need, including support for authors in the form of template data availability statements, instructions for how to cite the data they are sharing, and advice on finding appropriate repositories at which to share their data [15]. We began our implementation plan by selecting journals serving disciplines that were most ready for data sharing, and introduced our new Expects Data policy to those journals first.
At that time, November 2018, more than 1,500 journals had the entry-level “Encourages Data” data sharing policy, and had no specific requirements for data sharing by researchers. We also published a much smaller number of journals (more than 20) that had adopted an earlier version of our “Expects Data” policy, which emphasized the benefits of sharing data to researchers, but that still had no specific requirements for data sharing. Alongside this, we published a similarly small number of journals with a “Mandates Data” policy (about 20), among which are the leading journals from the Wiley evolutionary biology portfolio.
Since 2018, we have made significant progress at Wiley. By April 2019, 90 journals had adopted and implemented our “Expects Data” policy and 70 journals had adopted our “Mandates Data” policy. Examples of these journals are shown in Table 3 below. Each of these journals now requires data availability statements in every article it publishes, as well as data citations. To make the whole process straightforward for research authors, we created a series of standard templates that they can use to complete their data availability statements, shared in Table 4.
Journal title | ISSN | Homepage |
---|---|---|
Acta Neurologica Scandinavica | 1600-0404 | https://onlinelibrary.wiley.com/journal/16000404 |
Applied Stochastic Models in Business and Industry | 1526-4025 | https://onlinelibrary.wiley.com/journal/15264025 |
Brain and Behavior | 2162-3279 | https://onlinelibrary.wiley.com/journal/21579032 |
Chemical Biology and Drug Design | 1747-0285 | https://onlinelibrary.wiley.com/journal/17470285 |
Clinical Endocrinology | 1365-2265 | https://onlinelibrary.wiley.com/journal/13652265 |
Clinical Genetics | 1399-0004 | https://onlinelibrary.wiley.com/journal/13990004 |
Environmetrics | 1099-095X | https://onlinelibrary.wiley.com/journal/1099095x |
Immunity, Inflammation and Disease | 2050-4527 | https://onlinelibrary.wiley.com/journal/20504527 |
Research Synthesis Methods | 1759-2887 | https://onlinelibrary.wiley.com/journal/17592887 |
Pharmaceutical Statistics | 1539-1612 | https://onlinelibrary.wiley.com/journal/15391612 |
Availability of data | Template for data availability statement |
---|---|
Data openly available in a public repository that issues data sets with DOIs | The data that support the findings of this study are openly available in [repository name e.g “figshare”] at http://doi.org/[doi], reference number [reference number]. |
Data openly available in a public repository that does not issue DOIs | The data that support the findings of this study are openly available in [repository name] at [URL], reference number [reference number]. |
Data derived from public domain resources | The data that support the findings of this study are available in [repository name] at [URL/DOI], reference number [reference number]. These data were derived from the following resources available in the public domain: [list resources and URLs] |
Embargo on data due to commercial restrictions | The data that support the findings will be available in [repository name] at [URL/DOI link] following an embargo from the date of publication to allow for commercialization of research findings. |
Data available on request due to privacy/ethical restrictions | The data that support the findings of this study are available on request from the corresponding author. The data are not publicly available due to privacy or ethical restrictions. |
Data subject to third party restrictions | The data that support the findings of this study are available from [third party]. Restrictions apply to the availability of these data, which were used under license for this study. Data are available [from the authors/at URL] with the permission of [third party]. |
Data available on request from the authors | The data that support the findings of this study are available from the corresponding author upon reasonable request. |
Data sharing not applicable – no new data generated | Data sharing is not applicable to this article as no new data were created or analyzed in this study. |
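Because the templates in Table 4 are fixed sentences with a few bracketed slots, filling them in can be automated. The sketch below, in Python, shows one way this might be done for three of the templates. The dictionary keys, the function name `data_availability_statement` and the field names (`repository`, `doi`, `reference`) are assumptions made for illustration; only the template wording comes from Table 4.

```python
# Three of the Table 4 templates, with the bracketed slots rewritten as
# Python format fields. Keys and field names are hypothetical.
TEMPLATES = {
    "public_repository_with_doi": (
        "The data that support the findings of this study are openly available "
        "in {repository} at http://doi.org/{doi}, reference number {reference}."
    ),
    "on_request": (
        "The data that support the findings of this study are available from "
        "the corresponding author upon reasonable request."
    ),
    "not_applicable": (
        "Data sharing is not applicable to this article as no new data were "
        "created or analyzed in this study."
    ),
}


def data_availability_statement(kind, **fields):
    """Fill in the template named `kind` with the given field values."""
    if kind not in TEMPLATES:
        raise ValueError(f"unknown statement kind: {kind!r}")
    try:
        return TEMPLATES[kind].format(**fields)
    except KeyError as exc:
        # A slot in the template was not supplied by the caller.
        raise ValueError(f"missing field for {kind!r}: {exc.args[0]}") from None


# Example with placeholder values (the DOI and reference number are not real).
statement = data_availability_statement(
    "public_repository_with_doi",
    repository="figshare",
    doi="10.0000/example",
    reference="12345",
)
```

Raising an error when a slot is left unfilled, rather than emitting a statement with a gap in it, keeps incomplete statements from reaching the published article.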
We also publish several journals – including EMBO Reports, The EMBO Journal, and EMBO Molecular Medicine – that have adopted our highest data policy of “Mandates and Peer Reviews Data”, setting the standard for data transparency (and also data citation, discussed in the section that follows). Beyond our data sharing policy, we partner with repositories like Figshare and Dryad to make it easier for authors to share data in approved repositories. We develop standards and guidance that enable researchers to share and cite their research data more readily [24, 25]. We adopt and encourage the use of Center for Open Science badges, and over 30 journals use these to recognize and celebrate authors who share data. We are launching an Open Science Ambassador Program in China, and Open Data contribution and sharing will be important components.
6. Citing Data
Wiley also endorses the FORCE11 Joint Declaration of Data Citation Principles [26], a set of guiding principles for citing data within scholarly literature, another data set, or any other research object. We recommend the format for data citation proposed in this Joint Declaration, and that data held within institutional, subject-focused, or more general data repositories should be cited. At the same time, we do not intend to replace community standards such as in-line citation of GenBank accession codes; instead, we hope to supplement those with formal data citations. This is one way to begin to enable researchers who share data to be recognized in the same way that researchers are recognized when they collect citations to their research articles. Data citation like this is not new to Wiley policies, but the emphasis on data citation within the new Wiley data sharing policies is new and is in line with industry standards and initiatives to recognize data as a primary research object.
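To make the idea of a formal data citation concrete, the Python sketch below assembles one from its parts. The choice of elements and their order (authors; year; data set title; repository; optional version; persistent identifier) is an assumption made for illustration and is not quoted from the Joint Declaration or from Wiley guidance. The function name `format_data_citation` and the example values are hypothetical.

```python
def format_data_citation(authors, year, title, repository, identifier, version=None):
    """Join the elements of a data citation with semicolons.

    Assumed element order (illustrative only): authors; year; title;
    repository; version (if any); persistent identifier such as a DOI.
    """
    parts = [", ".join(authors), str(year), title, repository]
    if version:
        parts.append(f"Version {version}")
    parts.append(identifier)
    return "; ".join(parts)


# Example with placeholder values (not a real data set or DOI).
citation = format_data_citation(
    authors=["A. Author", "B. Author"],
    year=2019,
    title="Example data set",
    repository="Example Repository",
    identifier="https://doi.org/10.0000/example",
)
```

Placing the persistent identifier last keeps the citation resolvable even when other elements, such as the version, are absent.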
7. Working Toward FAIR Data
At Wiley, we believe that the introduction of data sharing policies is the first step toward supporting and embracing the FAIR guiding principles [9]. While our policies actively support data sharing (“Expects Data Sharing”) or require data sharing (“Mandates Data Sharing”), the task of making shared data actually FAIR remains with researchers. For many this will be a new responsibility, and it can present some challenges. We have begun work to help overcome those challenges.
For example, research authors who select the first of our template data availability statements (“Data openly available in a public repository that issues data sets with DOIs”) [15] are indicating that their data are “F” (Findable; with a unique and persistent identifier, the DOI) and “A” (Accessible; retrievable by that identifier). Journals that adopt our fourth and highest policy level, “Mandates Data Sharing and Peer Reviews Data”, conduct peer review on data submitted alongside journal articles and, by doing that, help research authors make their data ready to be “R” (Reusable).
Each of these steps moves us closer to the goal of turning open data into FAIR data, although often the “I” of FAIR (Interoperable) remains a challenge. The following section shares examples of work we are leading or contributing to at Wiley that take us even closer to that goal. Collaboration from all parties – researchers, funders, institutions, policy-makers, infrastructure providers (like repositories), and publishers – is vital to make FAIR data a reality.
8. Examples of Progress Toward FAIR Data
American Geophysical Union and Enabling FAIR Data
The American Geophysical Union (AGU), together with Wiley and other partners (including repositories and supporting organizations), has an ongoing project to enable FAIR data across the earth and space sciences, sensibly called Enabling FAIR Data [27]. This builds on the work of the Coalition on Publishing Data in the Earth and Space Sciences (COPDESS) [28]. Large and complex data sets are common in the earth and space sciences, which makes this initiative particularly welcome.
Remarkable practice at Geoscience Data Journal and American Journal of Political Science
Journals that “Mandate Data Sharing and Peer Review Data,” for example, the Geoscience Data Journal published by Wiley [29] and the American Journal of Political Science published by Wiley for the Midwest Political Science Association [30], have already adopted remarkable practices toward sharing FAIR-compliant data, and are to be applauded.
SourceData at EMBO Press
The team at EMBO Press, for which Wiley provides publishing services, introduced SourceData in 2018 [31]. SourceData provides a significant step on the road to FAIR data: It makes data findable, accessible, interconnected and downloadable.
“Next” journals at Wiley: Genetics & Genomics Next and Neuroscience Next
Research Data Alliance and standard data sharing policies
Wiley is a member of the community that comes together as the Research Data Alliance (RDA). RDA is creating the social and technical infrastructure that researchers need to share data successfully. For example, the RDA's Data Policy Standardization Interest Group is creating a unified approach to setting data policies, by identifying standard requirements for data sharing, as well as how these can be put together into a robust data policy [34].
9. Conclusions: Next Steps
At Wiley, we believe that open research is not just the future of research communications; it is the here and now [8]. Publishers are fundamentally service providers for researchers, whether those researchers are acting as authors, peer reviewers, editors or readers. Our careful implementation of open research practices, including data-sharing policies and open data badges, is intended to help researchers adopt new practices and to benefit from extra impact. We are excited about seeing the results of this work in terms of published data availability statements, and data citations in future. Looking further ahead we intend to measure the success of our “Expects Data” policy implementation, and to measure publication of data availability statements and data citations. With this information, we will be able to celebrate adoption of new practices by the research communities we work with and serve, and showcase researchers from those communities leading in open research.
Author Contributions
All authors made substantial contributions to the design of this paper. Y. Wu wrote the first draft; E. Moylan, C. Graf and H. Inman revised the first draft. All authors approved the version to be published and are accountable for the paper.
According to CRediT taxonomy, each author's contribution is listed below:
Conceptualization: Y. Wu, E. Moylan, H. Inman and C. Graf; Formal analysis: Y. Wu, E. Moylan, H. Inman and C. Graf; Methodology: Y. Wu, E. Moylan, H. Inman and C. Graf; Resources: E. Moylan, H. Inman and C. Graf; Writing – original draft: Y. Wu; Writing – review & editing: E. Moylan, H. Inman and C. Graf.
Acknowledgments
We are grateful to Wiley's data sharing team and the people who helped design and implement the data sharing policies including: Erin Arndt (Associate Director, Editorial System), Hope Inman (Manager, Editorial Operations & Communications), Kate Perry (Product Manager), Kathryn Sharples (Director, Editorial Development), Terri Teleen (Director, Editorial Operations & Communications), Natasha White (Director, Open Access Product Marketing). We would like to thank Elisha Morris (Editorial Assistant), and Sarah Pedder (Intern) for support as we made revisions to this manuscript following peer review.
References
Author notes
Wiley, 9600 Garsington Road, Oxford, OX4 2DQ, UK
Wiley, 111 River Street, Hoboken, NJ 07030, USA
Wiley, 9600 Garsington Road, Oxford, OX4 2DQ, UK