1-3 of 3
Hannes Thiemann
Journal Articles
Publisher: Journals Gateway
Data Intelligence (2022) 4 (2): 212–225.
Published: 01 April 2022
FIGURES (4)
Abstract
In this paper we present the derivation of Canonical Workflow Modules from current workflows in simulation-based climate science, in support of the elaboration of a corresponding framework for simulation-based research. We first identified the different users and user groups in simulation-based climate science based on their reasons for using the resources provided at the German Climate Computing Center (DKRZ). What is special here is that the DKRZ provides the climate science community with resources such as high performance computing (HPC), data storage and specialised services, and hosts the World Data Center for Climate (WDCC). Users can therefore perform their entire research workflows, up to the publication of the data, on the same infrastructure. Our analysis shows that the resources are used by two primary user types: those who require the HPC system to perform resource-intensive simulations and subsequently analyse them, and those who reuse, build on and analyse existing data. We then further subdivided these top-level user categories based on their specific goals and analysed the typical, idealised workflows applied to achieve the respective project goals. We find that, owing to the subdivision and finer granularity of the user groups, the workflows show apparent differences. Nevertheless, similar “Canonical Workflow Modules” can clearly be identified. These modules are “Data and Software (Re)use”, “Compute”, “Data and Software Storing”, “Data and Software Publication” and “Generating Knowledge”, and in their entirety they form the basis for a Canonical Workflow Framework for Research (CWFR). It is desirable that parts of the workflows in a CWFR act as FAIR Digital Objects (FDOs), but we view this aspect critically. We also reflect on whether the derivation of Canonical Workflow Modules from the analysis of current user behaviour will still hold for future systems and work processes.
Data Intelligence (2020) 2 (1-2): 230–237.
Published: 01 January 2020
Abstract
Since 2009, initiatives selected for the roadmap of the European Strategy Forum on Research Infrastructures have been working to build research infrastructures for a wide range of research disciplines. An important result of the strategic discussions was that distributed infrastructure scenarios were now seen as “complex research facilities”, in addition to, for example, traditional centralised infrastructures such as CERN. In this paper we look at five typical examples of such distributed infrastructures, where many researchers working in different centres contribute data, tools/services and knowledge, and where the major task of the research infrastructure initiative is to create a virtually integrated suite of resources allowing researchers to carry out state-of-the-art research. Careful analysis shows that most of these research infrastructures worked on the Findability, Accessibility, Interoperability and Reusability dimensions before the term “FAIR” was actually coined. The definition of the FAIR principles and their wide acceptance can be seen as a confirmation of what these initiatives were doing, and it gives new impetus to close the gaps that still exist. These initiatives also seem ready to take the next steps that will emerge from the definition of FAIR maturity indicators. Experts from these infrastructures should bring their ten years' experience into this definition process.
Data Intelligence (2020) 2 (1-2): 257–263.
Published: 01 January 2020
Abstract
Institutions driving fundamental research at the cutting edge, such as those of the Max Planck Society (MPS), took steps to optimize data management and stewardship in order to address new scientific questions. In this paper we selected three MPS institutes from the areas of the humanities, environmental sciences and natural sciences as examples to illustrate the efforts to integrate large amounts of data from collaborators worldwide, creating a data space that is ready to be exploited for new insights based on data-intensive science methods. For this integration, the typical challenges of fragmentation, poor quality and social differences had to be overcome. In all three cases, the core pillars were well-managed repositories driven by scientific needs, together with harmonisation principles that had been agreed upon in the community. It is not surprising that these principles are very much aligned with what have now become the FAIR principles. The FAIR principles confirm the correctness of earlier decisions, and their clear formulation identifies the gaps which the projects need to address.