Scholarly publications and data set evidence for the Human Reference Atlas

Abstract
Experts from 17 consortia are collaborating on the Human Reference Atlas (HRA), which aims to map the human body at single-cell resolution. To bridge across scales, from the meter-sized human body to the micrometer-sized single-cell level, organ experts are constructing anatomical structures, cell types plus biomarkers (ASCT+B) tables, and associated spatial reference objects. The 3rd HRA (v1.2) release features 26 organ-specific ASCT+B tables that cite 456 scholarly papers and are linked to 61 spatial reference objects and Organ Mapping Antibody Panels (OMAPs); it is authored by more than 120 experts. This paper presents the first analyses and visualizations showcasing what data and scholarly evidence exist for which organs and how experts relate to the organs covered in the HRA. To identify potential HRA authors and reviewers, we queried the Web of Science database for authors who work on the 33 organs targeted for the next HRA release (v1.3). To provide scientific evidence for the HRA, we identified 620 high-quality, single-cell experimental data sets for 58 organs published in 561 unique papers. The results presented are critical for understanding and communicating the quality of the HRA, planning for future tissue data collection, and inviting leading experts to contribute to the evolving atlas.


Introduction
Constructing an atlas of the healthy human body at the single-cell level is a massive undertaking that requires close collaboration by researchers and practitioners with expertise in human anatomy, pathology, surgery, and single-cell studies. Datasets at different levels of spatial scale, from computerized tomography and magnetic resonance imaging (MRI) scans at the whole-body level to single-cell assay data at the biomolecular level, need to be federated and combined to construct a multi-level atlas. Fig. 1 illustrates these levels of detail, zooming from the whole body to the kidney, to the nephron, and down to the single-cell level.
Supported by the National Institutes of Health and other funders, experts from 17 consortia are working on the HRA (Börner, Teichmann, et al., 2021). The HRA captures ontology-aligned terms for naming anatomical structures (AS), cell types (CT), and biomarkers (B) in so-called ASCT+B tables. It links these terms to two-dimensional and three-dimensional spatial representations of major anatomical structures and the cell types commonly located in them, and to the biomarkers (genes, proteins, lipids, metabolites) used to characterize cell types. The HRA also records ORCID IDs of expert authors and reviewers as well as paper DOIs at the organ level and the cell type level. The HRA data can be explored via the ASCT+B Reporter. Recently, high-quality, single-cell experimental datasets have been linked to the anatomical structures, cell types, and biomarkers in the ASCT+B tables. For example, cell-by-gene matrices from single-cell studies now provide experimental evidence for which cells are located in which anatomical structures and which genes are highly expressed in which cell types. Azimuth references make it easy to assign cell type names to clusters of cells that have similar gene expression values. Organ Mapping Antibody Panels (OMAPs) (Hickey et al., 2022) save time and money by providing validated antibody panels for proteins commonly used to characterize cell types in different healthy human organs. Crosswalks exist from Azimuth and OMAPs to the ASCT+B tables; hence, experimental datasets that used them to identify cell types via gene or protein biomarkers can be easily compared with the HRA.
Keeping track of hundreds of experts working on the HRA, hundreds of experimental datasets, and thousands of papers that provide scholarly evidence for the HRA is non-trivial. This paper presents the first analyses and visualizations that showcase what data and scholarly evidence exist for which organs and how experts relate to the organs covered in the current and future HRA releases.
The remainder of the paper is organized as follows: The subsequent section introduces prior work. Next, we detail the data sources used in this paper and the preprocessing performed on the data. We then analyze and tabulate scholarly paper and experimental data evidence for the HRA. Next, we use Web of Science to analyze and visualize experts and the organs they study to identify additional experts that we plan to invite to review the HRA or contribute to it in the future. We conclude with a summary of results and a discussion of next steps.

Prior work
Given recent advances in biomolecular experimental studies, it has become possible to study humans and other species at the single cell level. A key goal of many studies is the development of a healthy reference atlas that can be compared to data for diverse diseases to understand associated structural and functional changes in tissues across scales. Data used for atlas design comes from many experimental studies conducted by teams around the globe. Harmonizing and interlinking this data is non-trivial. Most efforts focus on experimental data exclusively while some aim to capture links to scholarly publications and expertise. We discuss five exemplary efforts here.
(1) The human Ensemble Cell Atlas (hECA) effort (Chen, Luo et al., 2022) aims to build an atlas of human cells as a reference for future biological and medical studies of human health and disease. hECA compiles data of cells across organs and studies into one data repository using a unified hierarchical annotation framework (uHAF) to harmonize data. As of 2021, hECA provided access to scRNA-seq data of more than 1 million human cells from diverse studies.
(2) CellMarker (Hu et al., 2022) (http://speedatlas.net) is a single-cell pan-species atlas that covers more than 5 million cells across 127 species, aiming to advance our collective understanding of the heterogeneities among cells, tissues, organs and species. However, to the best of our knowledge, none of these five efforts has systematically studied or visualized the network of how experimental data and scholarly papers provide evidence for reference atlas construction.
Several teams within HuBMAP (HuBMAP Consortium et al., 2019) are working on the development of general and computational methods (Börner, Teichmann, et al., 2021; Manz et al., 2022; Zhang et al., 2022), demonstration projects (Burnum-Johnson et al., 2022), organ-specific atlases (Becker et al., 2022; Kruse and Spraggins, 2022), and novel technologies that can be used to map human tissue at the single-cell level (Deng et al., 2022; Melani et al., 2022; Schachner et al., 2022; Stockwell, 2022). This paper is unique in that it shows for the first time how elements of the HRA are linked to scholarly papers and experimental data in order to understand and communicate atlas quality, to guide future tissue data collection, and to identify other leading experts who might be interested in serving as authors or reviewers of the evolving atlas.

Data and data processing
This section details all data used in this study: nearly 500 papers cited for the 33 reference organs covered in the Human Reference Atlas and Azimuth; 250,620 papers on the 33 organs retrieved from the Web of Science; and 308 experimental datasets that are associated with 83 scholarly papers. Data details and code are available at https://github.com/cns-iu/hra-evidence-supporting-information.
Publication evidence from HRA
The Human Reference Atlas captures data on ASCT+B tables (see Introduction and Fig. 1), associated two-dimensional (2D) and three-dimensional (3D) spatial representations of major anatomical structures and the cell types commonly located in them, and the biomarkers (genes, proteins, lipids, metabolites) used to characterize cell types. Gene biomarkers used to characterize cell types in different organs are published via Azimuth; see existing organ references at https://azimuth.hubmapconsortium.org. This section details how paper evidence was retrieved from different websites and processed to get summary statistics.

ASCT+B References
The CCF ASCT+B Reporter (https://hubmapconsortium.github.io/ccf-asct-reporter) lets users explore ASCT+B table visualizations and download table reports of key statistics (e.g., number of cell types per organ). Table 1 shows the number of unique references listed in the 26 ASCT+B tables from the 3rd HRA release (v1.2). Note that references are cited at the level of the entire organ and also for specific cell types in the organ. In total, the 26 tables cite 456 unique references: 12 unique books, 439 unique papers (305 of them in the WoS core collection), and 5 papers from PubMed or other sources.

2/3D reference objects & OMAPs
In the 3rd HRA release, there are 19 2D reference objects for functional tissue units in 7 organs with 90 unique cell types; 53 3D reference organs with 1,542 named anatomical structures; and 7 Organ Mapping Antibody Panels (OMAPs) for 187 anatomical structures, 179 cell types, and 197 protein biomarkers across the 7 organs. Papers for the 2D and 3D Reference Library objects and OMAPs were downloaded from the HuBMAP CCF Portal (https://hubmapconsortium.github.io/ccf) and are listed and summed up in Table 2. The publication references include 2 unique books and 14 unique scientific papers (with unique DOIs). None of the 16 scholarly works are cited in ASCT+B tables, likely because ASCT+B table references focus on the cell type level.
* Note that the kidney organ has two 2D objects ("Kidney, 2D Nephron FTU v.1.0" and "Kidney, 2D Renal Corpuscle FTU v.1.0") with one book (ISBN 978-3-662-02676-2) each, and another two 2D objects ("Kidney, 2D Nephron FTU v.1.0" and "Kidney, 2D Renal Corpuscle FTU v.1.0") with one paper (ISBN 978-3-642-08106-4) each.
** Note that the brain organ has four 3D objects with one paper each.

Azimuth references
Azimuth references support cell type annotation for tissue datasets. They exist for 10 organs; references to associated publications can be downloaded from https://azimuth.hubmapconsortium.org and are listed in Table 3. HuBMAP focuses on adults (excluding fetal development), and ASCT+B tables do not yet exist for adipose and tonsils. In total, there are 38 unique papers associated with the 10 Azimuth references, 2 of which are preprints. The remaining 36 unique papers have DOIs, and 8 of these are also cited in ASCT+B tables.
Papers listed in Azimuth single-cell references for which ASCT+B tables exist have been shared with table lead authors for possible inclusion in the ASCT+B tables.

Experimental data references
A total of 308 datasets from single-cell studies of healthy human adults were retrieved from HuBMAP Portal (Cao et al., 2019; Stuart et al., 2019), CxG Portal (Domínguez Conde et al., 2022; The Tabula Sapiens Consortium et al., 2022), NeMO (Orvis et al., 2021), and GTEx (Eraslan et al., 2022) in October 2022. These high-quality datasets cover 57 organs and are associated with 83 unique papers. Exactly 78 of these papers have DOIs, and 67 of these DOIs are not cited in the existing 26 ASCT+B tables. A table showing the count of experimental data references per organ can be found on GitHub at https://github.com/cnsiu/hra-evidence-issi-2023-supporting-information. Note that there are 38 organs for which no ASCT+B table exists yet.
Papers associated with high-quality experimental datasets for organs that have ASCT+B tables were shared with table lead authors for possible inclusion in the ASCT+B tables.
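The overlap counts reported in this section (for example, how many dataset papers with DOIs are or are not yet cited in the ASCT+B tables) amount to set operations over normalized DOIs. The following is a minimal sketch of that comparison, not the code used for this study; the function names and the list of DOI prefixes are assumptions made for illustration.

```python
# Illustrative sketch: compare two DOI lists after normalization.
# Function names and the handled prefixes are hypothetical choices.

DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(doi):
    """Lowercase a DOI and strip a leading resolver URL or 'doi:' prefix."""
    doi = doi.strip().lower()
    for prefix in DOI_PREFIXES:
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
            break
    return doi.strip()


def doi_overlap(candidate_dois, cited_dois):
    """Return (already_cited, not_yet_cited) as sets of normalized DOIs."""
    candidates = {normalize_doi(d) for d in candidate_dois if d}
    cited = {normalize_doi(d) for d in cited_dois if d}
    return candidates & cited, candidates - cited
```

Called with the DOIs of papers linked to datasets as `candidate_dois` and the DOIs listed in the ASCT+B tables as `cited_dois`, the second returned set would be the list of papers to share with table lead authors.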

Summary
In sum, there are 12 unique books, 439 unique papers (including 305 WoS core papers), and 5 papers from PubMed or other sources listed in the 26 ASCT+B tables from the 3rd HRA release; 16 scholarly works (14 of them with DOIs) cited for the 2D and 3D reference objects and OMAPs; 26 unique papers associated with the 10 Azimuth single-cell annotation references; and, in the set of 380 unique datasets, 195 datasets that are associated with 49 unique papers.

WoS papers for 33 organs
To better understand which major papers were recently published on the 33 organs planned for the next HRA release, we ran a query over the Web of Science core collection provided via the Collaborative Archive & Data Research Environment (CADRE) (Wittenberg et al., 2020). The retrieval result comprises 250,620 papers that were published from 2018 to 2022, contain at least one of the organ names in their title or keywords, and were cited at least 10 times. These papers cover all of the 33 organs except Blood pelvis.
The papers were tagged with HRA-specific organ tags based on the 33 organ names occurring in the title or keywords. Next, we used the Web of Science (WoS) standard format to retrieve clean author names and affiliations.
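The selection and tagging steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the query run on CADRE: the record fields (`title`, `keywords`, `year`, `citations`), the function names, and the short organ list are hypothetical, and whole-word matching is only one possible matching rule.

```python
import re

# Hypothetical subset of organ names; the full list of 33 is not reproduced here.
ORGANS = ["kidney", "liver", "lung", "skin", "large intestine"]


def tag_organs(title, keywords, organs=ORGANS):
    """Return the sorted organ names that occur as whole words in title or keywords."""
    text = " ".join([title, *keywords]).lower()
    return sorted(
        organ for organ in organs
        if re.search(r"\b" + re.escape(organ) + r"\b", text)
    )


def select_papers(papers, min_citations=10, years=(2018, 2022)):
    """Keep papers within the year range, with enough citations and >= 1 organ tag."""
    selected = []
    for paper in papers:
        tags = tag_organs(paper["title"], paper.get("keywords", []))
        in_range = years[0] <= paper["year"] <= years[1]
        if in_range and paper["citations"] >= min_citations and tags:
            selected.append({**paper, "organs": tags})
    return selected
```

A paper that mentions several organs receives several tags, so per-organ paper counts computed this way can sum to more than the number of papers. Whole-word matching as written does not catch plural or adjectival forms (e.g., "kidneys", "renal"), which a production query would need to handle.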
A closer look at the 82 unique papers reveals that 77 of them have a DOI and 10 of these 77 are cited in the ASCT+B tables. A table of organ-specific papers that ASCT+B lead authors should review and consider for inclusion was compiled and published on GitHub at https://cnsiu.github.io/hra-evidence-issi-2023-supporting-information. This table was shared with table lead authors for possible inclusion in the ASCT+B tables.

Quality and coverage of the HRA
A comparison of experimental data to anatomical structures, cell types, plus biomarkers covered in the ASCT+B tables helps individuals understand and communicate the coverage of the existing HRA and plan future tissue data collection (e.g., to collect a minimum amount of experimental data for major anatomical structures and cell types). The ASCT+B Reporter (https://hubmapconsortium.github.io/ccf-asct-reporter) was used to visualize the network of anatomical structures, cell types, plus biomarkers in an ASCT+B Master table as a basemap and to overlay experimental data so that coverage can be compared and communicated. See the workflow detailed at https://hubmapconsortium.github.io/hra-previews/pilots/pilot1.html.

Figure 2. ASCT+B Reporter comparison visualization of an ASCT+B table and experimental data.
Network of anatomical structures (red nodes), cell types (blue nodes), and protein biomarkers (green nodes) for the skin ASCT+B table is used as a basemap. Experimental data is overlaid in orange, making it easy to explore and communicate (non)matching anatomical structures, cell types plus biomarkers covered in a study.
The visualization shows what data and publication evidence (here 10 datasets published in 1 paper) exist for which anatomical structures, cell types, and protein biomarkers. The ASCT+B Reporter makes it possible to overlay data from multiple studies using different colors. Insights gained are valuable for planning future tissue data collection, e.g., to collect a minimum amount of experimental data that maximally improves HRA coverage and quality.

Mapping experts by organ and geolocation
The 26 ASCT+B tables list 88 directly involved experts who serve as authors or internal and external reviewers. Each expert is listed with an ORCID ID in the ASCT+B tables: in total, there are 52 unique authors, 4 unique project leaders, and 47 unique reviewers. Some experts serve in multiple roles across organs. For the 2D reference objects, 14 unique experts are listed; for the 3D reference objects, 32 experts; and for the OMAPs, 29 experts. Across the HRA, there are 116 unique experts, 113 of whom have ORCID IDs.
Using the WoS data comprising 250,620 papers that featured any of the 33 organ names in their title and/or keywords, we identified 672,892 indirectly involved expert authors. The authors have 114,965 unique affiliations in 189 unique countries. A map of the world with a country-level overlay of authors and their co-author relationships is shown in Fig. 3. The original network was almost fully connected; hence, the MST-Pathfinder Network (PFnet) algorithm (Sci2 Team, 2009) was applied to remove less important edges. In the resulting network, the US has 84,287 papers and is the most highly connected node, with these top-5 collaborators: CN (9,541 papers), UK (6,338 papers), CA (5,605 papers), DE (5,131 papers), and IT (4,271 papers). To ascertain what organ expertise the paper authors bring to the table, we computed the distribution of the number of organs per expert and the number of papers per organ, see Fig. 4. Most papers exist for the liver, followed by papers on the brain, lung, breast, heart, kidney, and skin.
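To illustrate the kind of edge pruning that MST-Pathfinder scaling performs, the sketch below keeps exactly those edges that belong to at least one maximum spanning tree of a similarity-weighted network, which corresponds to a Pathfinder network with r = infinity and q = n - 1. It is a simplified stand-in written for this explanation, not the Sci2 implementation; the function name and the toy co-authorship counts in the usage note are hypothetical.

```python
from collections import defaultdict


def mst_pathfinder(edges):
    """Prune a similarity network to the union of its maximum spanning trees.

    edges: iterable of (u, v, similarity), where higher similarity means a
    stronger link (e.g., a co-authored paper count). Returns the kept edges.
    """
    parent = {}

    def find(node):
        # Union-find lookup with path halving.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    by_weight = defaultdict(list)
    for u, v, weight in edges:
        by_weight[weight].append((u, v))

    kept = []
    for weight in sorted(by_weight, reverse=True):
        group = by_weight[weight]
        # Decide for the whole tie group first, so that equally strong
        # alternative links are all retained, and only then merge components.
        selected = [(u, v) for u, v in group if find(u) != find(v)]
        for u, v in selected:
            parent[find(u)] = find(v)
        kept.extend((u, v, weight) for u, v in selected)
    return kept
```

For a toy network with edges US-CN (10), US-UK (8), UK-DE (5), US-DE (5), and CN-UK (3), the CN-UK edge is dropped because CN and UK are already connected through stronger links via US, while both tied DE edges are kept.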
To discern who funds work on the 33 organs, we extracted the bimodal network of organs and funding agencies. As the network was rather dense, we applied PFnet to retain the strongest linkages, see result in Fig. 5. The National Institutes of Health (NIH), United States Department of Health & Human Services (HHS), European Commission and UK Research & Innovation (UKRI) fund 32 organs (note that 'bone marrow pelvis' was excluded from this study as this combination did not retrieve any papers). Most papers on brain topics are funded by NIH, which is acknowledged in 24,352 papers.
At the author level, Fig. 6 shows the bimodal network of highly cited experts (100 or more citations) and the organs they study. As can be seen, highly cited experts study liver (66 experts) and lung (40 experts). The paper with the most authors is entitled 'Osimertinib in Untreated EGFR-Mutated Advanced Non-Small-Cell Lung Cancer' and has 171 authors and 3,612 citations. In terms of the geographical distribution of authors, Fig. 7 presents the number of authors per country per organ for country-organ combinations with more than 1,000 authors. The number of authors from China and the United States is notably high, with over 10,000 experts specializing in liver, brain, and lung studies. Specifically, China has 15,526 liver experts and 11,119 lung experts, while the United States has 11,391 brain experts and 11,086 liver experts.
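The per-country, per-organ author counts behind a chart such as Fig. 7 can be derived by counting distinct authors for each country-organ pair and keeping pairs above a threshold. The sketch below shows one way to do this; the input record layout of (author identifier, country, organ) and the function name are assumptions for illustration, not the study's actual data format.

```python
from collections import defaultdict


def authors_per_country_organ(records, min_authors=1000):
    """Count distinct authors per (country, organ) pair.

    records: iterable of (author_id, country, organ) tuples, one per
    author-paper-organ occurrence. Only pairs with more than `min_authors`
    distinct authors are returned.
    """
    seen = defaultdict(set)
    for author_id, country, organ in records:
        # A set ensures an author with many papers is counted once per pair.
        seen[(country, organ)].add(author_id)
    return {
        pair: len(authors)
        for pair, authors in seen.items()
        if len(authors) > min_authors
    }
```

An author who publishes on several organs, or who has affiliations in several countries, is counted once in each matching pair, so the counts should not be summed to obtain a total number of experts.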

Summary and next steps
This paper presents the initial analyses and visualizations of scholarly papers and experimental dataset evidence for the Human Reference Atlas. We analyze the number and type of scholarly evidence for subgraphs of the HRA and show that 96.15% of the 26 ASCT+B tables, all of the 2D reference objects, 12% of the 3D reference objects, 28.57% of the OMAPs, and all Azimuth references have scholarly publications associated with them; all 26 organs have experimental cell type by biomarker data evidence, but coverage varies, see exemplary coverage for skin in Fig. 2. We have shared and will continue to share results with the larger HRA community to highlight organ teams that have managed to provide extensive publication and experimental data evidence and to inspire other teams that only recently joined the HRA effort to do the same.
We analyzed the network of experts currently collaborating on the HRA and used WoS data to identify and visualize experts who work on the 33 existing and planned organs. The geospatial and bimodal networks showcase the number of experts and funders and their countries, and we will use the results to invite other leading experts to serve as authors or reviewers of the evolving atlas. Connecting experts across projects and time zones will make it possible to benefit from international expertise, technologies, and datasets in support of the highest-quality HRA construction and the usage of HRA data in future scholarly publications.
Over the coming five years, we expect the number of active authors to grow from 200 to 1,000. The current set of organs will double to about 60 organs and organ parts, and we expect the final HRA will cover ca. 5,000 cell types and 10,000 unique anatomical regions. Managing the systematic authoring, review, and validation of the Human Reference Atlas is non-trivial. Visualizations that show the coverage and quality of the evolving atlas, relevant expertise around the globe, and high quality experimental datasets will be critically important for communicating progress to experts and funders engaged in constructing or using the atlas.