Exploring evidence selection with the inclusion network

Although systematic reviews are intended to provide trusted scientific knowledge to meet the needs of decision-makers, their reliability can be threatened by bias and irreproducibility. To help decision-makers assess the risks in systematic reviews that they intend to use as a basis for action, we designed and tested a new approach to analyzing the evidence selection of a review: its coverage of the primary literature and its comparison to other reviews. Our approach could also help anyone using or producing reviews understand diversity or convergence in evidence selection. The basis of our approach is a new network construct called the inclusion network, which has two types of nodes: primary study reports (PSRs, the evidence) and systematic review reports (SRRs). The approach assesses risks in a given systematic review (the target SRR) by first constructing an inclusion network of the target SRR and other systematic reviews studying similar research questions (the companion SRRs) and then applying a three-step assessment process that uses visualizations, quantitative network metrics, and time series analysis. This paper introduces our approach and demonstrates it in two case studies. We identified the following risks: missing potentially relevant evidence, epistemic division in the scientific community, and recent instability in evidence selection standards. We also compare our inclusion network approach to knowledge assessment approaches based on another influential network construct, the claim-specific citation network, discuss current limitations of the inclusion network approach, and present directions for future work.


Introduction
Systematic reviews have been widely adopted in the social sciences, ecological sciences, software engineering, and health sciences to synthesize scholarly literature and provide guidance for critical decisions. Their authority depends on several factors, among them the relatively standardized procedure for finding, selecting, and synthesizing evidence. Evidence selection is thus a crucial step in a systematic review. It produces a list of publications, called the included primary study reports, which form a special subset of a systematic review's citations. Unlike other citations, this set of citations has a direct causal effect on the validity and trustworthiness of the conclusions a systematic review reaches. We proposed a new network construct called the inclusion network to study evidence selection practices in systematic reviews (Fu, Clarke, et al., 2022; Hsiao et al., 2020). The inclusion network is a bipartite network with two types of nodes: systematic review reports (SRRs) and primary study reports (PSRs). A PSR is "included" in an SRR if it is used in that SRR's evidence synthesis; in the inclusion network, this relationship is represented by a directed edge from the SRR to the PSR. In our manuscript under peer review (Fu, Clarke, et al., 2022), we use the inclusion network, along with two inclusion network datasets, to address three research questions:
RQ1: Do systematic reviews on a given topic consistently include the same evidence?
RQ2: Does evidence inclusion become more or less similar over time in systematic reviews on the same topic?
RQ3: Can we derive insights from the structure of inclusion networks regarding evidence selection?
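The inclusion network described above can be held in two plain adjacency maps, one per direction. The following Python sketch is illustrative only: the SRR and PSR identifiers are hypothetical toy data, the function name `build_inclusion_network` is ours rather than the authors', and the check that no node appears on both sides of the network is an assumption of this sketch (a dataset in which one review is itself included as evidence by another would need different handling).

```python
from collections import defaultdict

# Hypothetical toy data: each tuple (SRR, PSR) is a directed inclusion edge,
# meaning the SRR used the PSR in its evidence synthesis.
edges = [
    ("SRR1", "PSR_a"), ("SRR1", "PSR_b"),
    ("SRR2", "PSR_b"), ("SRR2", "PSR_c"),
    ("SRR3", "PSR_c"),
]


def build_inclusion_network(edges):
    """Return (srr_to_psrs, psr_to_srrs) adjacency maps of a bipartite inclusion network.

    Edges point from SRR to PSR; the reverse map is kept only so that
    "which reviews included this study?" is a single lookup.
    """
    srr_to_psrs = defaultdict(set)
    psr_to_srrs = defaultdict(set)
    for srr, psr in edges:
        srr_to_psrs[srr].add(psr)
        psr_to_srrs[psr].add(srr)
    # Assumption of this sketch: the two node types are disjoint (bipartite).
    overlap = set(srr_to_psrs) & set(psr_to_srrs)
    if overlap:
        raise ValueError(f"nodes used as both SRR and PSR: {sorted(overlap)}")
    return dict(srr_to_psrs), dict(psr_to_srrs)


srr_to_psrs, psr_to_srrs = build_inclusion_network(edges)
print(sorted(srr_to_psrs["SRR2"]))  # ['PSR_b', 'PSR_c']
print(sorted(psr_to_srrs["PSR_b"]))  # ['SRR1', 'SRR2']
```

A graph library such as NetworkX could hold the same structure; plain dictionaries of sets are used here only to keep the sketch dependency-free.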

Data
We prepared two inclusion network datasets, each based on data initially collected by researchers interested in studying systematic reviews addressing a given topic (Clarke, 2019; Trinquart et al., 2016): (1) the exercise and depression (ExRx) dataset (Clark et al., 2022) and (2) the salt controversy dataset (Fu, Hsiao, et al., 2022). The 27 SRRs in the ExRx dataset all address the relationship between physical activity and depressive symptoms, and the 14 SRRs in the salt controversy dataset all address the effect of sodium intake on cerebrocardiovascular disease or mortality.

Analytical Approaches
Adjusted Jaccard Similarity
We introduced the adjusted Jaccard similarity (adjusted JS) to answer RQ1 and RQ2. When evaluating the consistency of evidence inclusion, we must consider that the evidence is not static but grows over time. Adjusted JS accounts for differences in the evidence available at different times: for a pair of SRRs, research that was not yet published or not yet in search indexes at the time of the earlier search could be included only in the SRR with the later search date.

We denote the entire inclusion network as a graph $G(U, V, E)$, where $U$ is the set of all SRR nodes, $V$ the set of all PSR nodes, and $E$ the set of all edges in the inclusion network. To compute the adjusted JS of two SRR nodes $u_i$ and $u_j$, we create an induced subgraph $G'(\{u_i, u_j\}, V', E')$, where $V'$ is the set of nodes representing all PSRs published by a threshold year $t$. The two SRRs' search years (the years in which they completed their search for evidence) are denoted $t_i$ and $t_j$, respectively, and we take the threshold $t = \min(t_i, t_j)$. The nodes in $V'$ that are connected to $u_i$ and $u_j$ are denoted $N'(u_i)$ and $N'(u_j)$, respectively. The adjusted JS can then be expressed as:

$$\mathrm{JS}_{\mathrm{adj}}(u_i, u_j) = \frac{|N'(u_i) \cap N'(u_j)|}{|N'(u_i) \cup N'(u_j)|}$$

To answer RQ1, we computed the adjusted JS for all SRR node pairs within each dataset and used descriptive statistics (mean, median, standard deviation) to characterize similarity among the SRRs. To answer RQ2, we computed two types of time series of adjusted JS. Each SRR is time-stamped by its last search date rather than its publication date, because the search date is more consequential for the resulting PSR list than the publication date, and there is often considerable delay between the last search and first publication. First, we computed the population average of adjusted JS over all existing SRRs at each point when a new SRR completed its search, which gives a population-level cumulative description of the similarity in evidence selection.
Second, to observe how a given SRR's similarity with other SRRs changes over time, we computed the average adjusted JS of the target SRR with all other existing SRRs at each point when a new SRR completed its search. The quantitative analysis with adjusted JS was coded by YF using the R igraph package (Csardi & Nepusz, 2006).
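The adjusted JS and the two time series described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' R/igraph code: search and publication times are whole years, the time points of each series are the distinct search years, a pair of SRRs with no eligible PSRs is assigned a similarity of 0, and all function names and toy data are hypothetical.

```python
from itertools import combinations
from statistics import mean, median, stdev


def adjusted_js(inclusions, search_year, pub_year, a, b):
    """Jaccard similarity of SRRs a and b over PSRs published by t = min(t_a, t_b)."""
    t = min(search_year[a], search_year[b])
    na = {p for p in inclusions[a] if pub_year[p] <= t}  # N'(u_a)
    nb = {p for p in inclusions[b] if pub_year[p] <= t}  # N'(u_b)
    union = na | nb
    if not union:
        return 0.0  # assumption: no eligible evidence is treated as zero similarity
    return len(na & nb) / len(union)


def pairwise_summary(inclusions, search_year, pub_year):
    """RQ1: descriptive statistics over all SRR pairs (stdev needs at least 3 SRRs)."""
    scores = [adjusted_js(inclusions, search_year, pub_year, a, b)
              for a, b in combinations(sorted(inclusions), 2)]
    return {"mean": mean(scores), "median": median(scores), "sd": stdev(scores)}


def population_series(inclusions, search_year, pub_year):
    """RQ2: mean adjusted JS over all pairs of SRRs whose search was done by each time point."""
    series = []
    for year in sorted(set(search_year.values())):
        existing = sorted(s for s in inclusions if search_year[s] <= year)
        pairs = list(combinations(existing, 2))
        if pairs:  # skip time points with fewer than two SRRs
            scores = [adjusted_js(inclusions, search_year, pub_year, a, b) for a, b in pairs]
            series.append((year, mean(scores)))
    return series


def individual_series(inclusions, search_year, pub_year, target):
    """RQ2: mean adjusted JS of `target` with every other SRR existing at each time point."""
    series = []
    for year in sorted(set(search_year.values())):
        if year < search_year[target]:
            continue  # the target SRR has not completed its search yet
        others = [s for s in inclusions if s != target and search_year[s] <= year]
        if others:
            scores = [adjusted_js(inclusions, search_year, pub_year, target, s) for s in others]
            series.append((year, mean(scores)))
    return series


# Hypothetical toy data: SRR -> included PSRs, SRR search years, PSR publication years.
inclusions = {"A": {"p1", "p2", "p3"}, "B": {"p2", "p3", "p4"}, "C": {"p3", "p5"}}
search_year = {"A": 2010, "B": 2012, "C": 2014}
pub_year = {"p1": 2008, "p2": 2009, "p3": 2009, "p4": 2011, "p5": 2013}

print(round(adjusted_js(inclusions, search_year, pub_year, "A", "B"), 3))  # 0.667
print(population_series(inclusions, search_year, pub_year))
print(individual_series(inclusions, search_year, pub_year, "A"))
```

In the toy data, p4 was published after SRR A's 2010 search, so it is excluded when comparing A and B: their adjusted JS is 2/3, whereas the plain Jaccard similarity of their full inclusion lists would be 2/4 = 0.5. The adjustment keeps the later review from being counted as dissimilar merely for including evidence the earlier review could not have found.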

Network Visualization
We created temporal visualizations of the two inclusion networks, which show how the networks formed over time. The temporal visualizations were produced by MVM using the Python NetworkX (Hagberg et al., 2008) and Matplotlib (Hunter, 2007) libraries (Van Moer, 2022). The last snapshot serves as the visualization of the overall inclusion network.

Results
Figure 1 shows that systematic reviews on the same topic do not consistently include the same evidence. The population-level time series shows that for the ExRx SRRs, the similarity of evidence selection remained relatively constant at a low level, except during the first year (2012-2013) (Figure 2(a)). For the salt controversy SRRs, by contrast, evidence inclusion grew less similar over time (Figure 2(b)).

We also examined whether an individual SRR's evidence inclusion grew more or less similar to that of other SRRs, using the individual-level cumulative adjusted JS time series (Figures 3 and 4). For the ExRx SRRs, the 27 time series show some variation (Figure 3), but more detailed analysis, such as statistical testing, is required to determine whether certain time series differ significantly from others. For the salt controversy SRRs (Figure 4), we can discern some notable differences: SRR #4's time series shows an upward trend, while most of the time series show a downward trend (e.g., SRRs #1, #2, #3, #5). SRR #4 was the first SRR to propose the need to synthesize observational studies (Strazzullo et al., 2009). Its upward trend indicates that observational studies, the type of evidence SRR #4 included, were also selected by later SRRs.

We also observe structural features related to evidence selection practices in the visualized networks.
The ExRx inclusion network has a large number of PSR nodes that are connected to only one SRR (Figure 5), which helps explain the low adjusted JS seen in Figure 1. By contrast, the salt controversy inclusion network shows signs of community formation: SRRs #1, #2, #3, #5, #6, and #12 lie close together; SRRs #4, #7, #8, #9, #11, #13, and #14 lie close together; and SRR #10 is distant from the others (Figure 6). We later found that this clustering was mainly due to the SRRs' differing positions on the most appropriate study design to include (Figure 7). The temporal visualizations also help us discover events of interest in the formation of the inclusion network. For example, Figure 8(d) shows that seven ExRx SRRs (#8, #9, #11, #12, #13, #15, #17) completed their search in 2015. Five of them come from a group of researchers who regularly publish together, which may explain why these SRRs cluster together.
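The structural observations above rest on simple counts that can be computed before any drawing. The following Python sketch does not reproduce the authors' NetworkX/Matplotlib visualization code; its toy data and function names are hypothetical, and the snapshot function assumes that an SRR and its inclusion edges enter the network in the year its search was completed. It computes each PSR's in-degree, the share of PSRs included by exactly one SRR, the SRRs that completed their search in a given year, and the edges present by that year.

```python
from collections import Counter

# Hypothetical toy data: SRR -> included PSRs, and each SRR's search year.
inclusions = {"A": {"p1", "p2", "p3"}, "B": {"p2", "p3", "p4"}, "C": {"p3", "p5"}}
search_year = {"A": 2010, "B": 2012, "C": 2014}


def psr_in_degree(inclusions):
    """Number of SRRs that include each PSR (in-degree of the PSR node)."""
    return Counter(p for psrs in inclusions.values() for p in psrs)


def singleton_share(inclusions):
    """Fraction of PSRs included by exactly one SRR.

    Such PSRs enter the union but never the intersection of any pair,
    so a high share pulls pairwise Jaccard similarity down.
    """
    degrees = psr_in_degree(inclusions)
    return sum(1 for d in degrees.values() if d == 1) / len(degrees)


def newly_searched(search_year, year):
    """SRRs that completed their search in `year`."""
    return sorted(s for s, y in search_year.items() if y == year)


def snapshot_edges(inclusions, search_year, year):
    """Inclusion edges present by `year`, assuming edges appear when the SRR's search ends."""
    return sorted((s, p) for s, psrs in inclusions.items()
                  if search_year[s] <= year for p in psrs)


print(singleton_share(inclusions))  # 0.6
print(newly_searched(search_year, 2014))  # ['C']
print(len(snapshot_edges(inclusions, search_year, 2012)))  # 6
```

Drawing one snapshot per search year with a layout algorithm would then give a sequence of frames similar in spirit to a temporal visualization; the layout and styling choices are left open here.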

Conclusions
We proposed a new network construct called the inclusion network to serve as the foundational data structure for network-based study of evidence selection practices in systematic reviews. We prepared two inclusion network datasets. We defined adjusted Jaccard similarity, a measure of similarity in evidence selection that takes into account the growing body of available evidence. Using descriptive statistics and time series based on the adjusted Jaccard similarity, we assessed the overall consistency of evidence selection among the systematic reviews in the two datasets and its temporal trends. Network visualization reveals structural features related to aspects of evidence selection practices. More work is required to understand the inclusion network and fully realize its analytical potential.

[Figure caption fragment, partially recovered: ... (2) SRRs that have newly completed their search; (3) newly formed inclusion relationships. Some PSRs appear after the SRRs that included them (e.g., PSR 273 in 6(b)) because PSRs can be published online or indexed in databases before their official publication date, allowing SRRs to find them.]