Although systematic reviews are intended to provide trusted scientific knowledge to meet the needs of decision-makers, their reliability can be threatened by bias and irreproducibility. To help decision-makers assess the risks in systematic reviews that they intend to use as the foundation of their action, we designed and tested a new approach to analyzing the evidence selection of a review: its coverage of the primary literature and its comparison to other reviews. Our approach could also help anyone using or producing reviews understand diversity or convergence in evidence selection. The basis of our approach is a new network construct called the inclusion network, which has two types of nodes: primary study reports (PSRs, the evidence) and systematic review reports (SRRs). The approach assesses risks in a given systematic review (the target SRR) by first constructing an inclusion network of the target SRR and other systematic reviews studying similar research questions (the companion SRRs) and then applying a three-step assessment process that utilizes visualizations, quantitative network metrics, and time series analysis. This paper introduces our approach and demonstrates it in two case studies. We identified the following risks: missing potentially relevant evidence, epistemic division in the scientific community, and recent instability in evidence selection standards. We also compare our inclusion network approach to knowledge assessment approaches based on another influential network construct, the claim-specific citation network, discuss current limitations of the inclusion network approach, and present directions for future work.

Systematic reviews have been widely adopted in a number of fields in the social sciences, ecological sciences, software engineering, and health sciences, in order to synthesize scholarly literature and provide guidance for practice. To limit bias and ensure the validity of their conclusions, systematic reviews typically use relatively standardized procedures (Cooper, Hedges, & Valentine, 2019; Haddaway, Macura et al., 2018; Kitchenham, Budgen, & Brereton, 2015; Okoli, 2015; Page, McKenzie et al., 2021a), which have been customized and refined over decades in multiple fields (Chalmers, Hedges, & Cooper, 2002; Gurevitch, Koricheva et al., 2018). Despite these attempts at standardization and reducing bias, it is common for systematic reviews published at the same time on the same topic to yield different conclusions (Gøtzsche, 1994; Jadad, Cook, & Browman, 1997; Papatheodorou & Evangelou, 2022), which can pose challenges for practical application.

One crucial step in a systematic review is the selection of evidence. This consists of defining the selection principles (known as the “inclusion criteria”) and exhaustively identifying relevant literature on a given topic¹. This search and screening process is often specified in advance in a protocol (Page, Shamseer, & Tricco, 2018) and typically uses a comprehensive search in multiple databases, with well-defined and consistently applied evidence selection criteria (Lefebvre, Glanville et al., 2022; Rethlefsen, Kirtley et al., 2021). Evidence selection results in a list of publications, called the included primary study reports, which form a special subset of a systematic review’s citations. However, this step is known to admit variation: Even methodologically rigorous reviews that address the same question can be discordant in their evidence selection (Créquit, Trinquart et al., 2016; Trinquart, Johns, & Galea, 2016). In particular, discordance in evidence selection is frequently observed in conflicting systematic reviews (Bolland & Grey, 2014; Coarasa, Das et al., 2017; Useem, Brennan et al., 2015).

Network analysis has been used in the task of assessing the reliability of scientific knowledge (Duyx, Urlings et al., 2017a; Greenberg, 2009, 2011; Leng, 2018; Shwed & Bearman, 2010). The purpose of this paper is to design and test a new approach that uses network analysis to assess risks in a systematic review that decision-makers want to use as the foundation of their action. We define risk as any information concerning a systematic review that can reduce a user’s willingness to use the systematic review as the foundation of their action. In this paper, we aim to answer two research questions:

RQ1: How can we design an approach based on network analysis to assess risks in a systematic review?

RQ2: What risks can we discover by applying our approach to real cases?

The remainder of the paper is organized as follows. Section 2 presents background, including reliability issues in systematic reviews and the use of network analysis to assess the reliability of scientific knowledge. Section 3 answers RQ1 by detailing the design of our new approach. Sections 4 and 5 answer RQ2. In particular, Section 4 introduces our two case studies and how we applied our approach to them, and Section 5 presents the discoveries we made from the two case studies. We discuss the difference between our approach and related work, especially claim-specific citation network analysis (Greenberg, 2009, 2011), as well as our limitations and future work in Section 6. We conclude with our primary findings in Section 7.

Our work is built upon two threads of prior research: first, reliability issues in systematic reviews, and second, the use of network analysis to assess the reliability of scientific knowledge.

2.1. Reliability Issues in Systematic Reviews

Systematic reviews are a methodology developed to ensure that scientific evidence can be accumulated scientifically and summarize the state of knowledge on a particular issue (Chalmers et al., 2002). The validity of review conclusions has long concerned researchers involved in methodological development. As early as 1982, Harris Cooper, one of the founders of modern research synthesis, conducted a comprehensive analysis of threats to validity in the systematic review process. He divided the review process into five stages: (a) problem formulation; (b) data collection; (c) evaluation of data points; (d) data analysis and interpretation; and (e) presentation of results. The evidence selection process is influenced by stage a and spans stages b and c. Within these three stages, Cooper (1982) identified the following threats to validity: a lack of consensus on the meaning of a concept, too narrow a definition of a concept, a lack of attention to how studies operationalized a concept, failure to locate all studies pertinent to the topic of interest, a mismatch between the target population and the population covered in the studies collected, improper evaluative criteria (e.g., confirmatory bias), and unreliable primary research (the only threat that is out of reviewers’ control, as Cooper (1982) pointed out).

Later, as conflicting systematic reviews emerged and started to concern researchers, Felson (1992) formulated a framework to describe threats to validity using the language of bias. In this context, a bias is a factor that can systematically drive conclusions away from the truth. Felson (1992) also proposed a taxonomy of bias. For each type of bias, Felson described corresponding methods and principles that can help eliminate or alleviate it.

Conflicting conclusions in systematic reviews have been an ongoing subject of study even in recent decades (Bolland & Grey, 2014; Coarasa et al., 2017; Hacke & Nunan, 2020; Khamis, El Moheb et al., 2019; Lucenteforte, Moja et al., 2015; Osnabrugge, Head et al., 2015; Useem et al., 2015). The overwhelming number of systematic reviews (Bastian, Glasziou, & Chalmers, 2010) and their production and selective publication for marketing purposes (Ioannidis, 2016) intensify the importance of studying conflicting conclusions in systematic reviews.

Evidence selection discordance is frequently observed in conflicting systematic reviews. Two systematic reviews on healthcare in low- and middle-income countries searched for evidence in the same month, but came to different conclusions; the studies reviewed hardly overlapped: “There were only 16 citations in common although one reviewed 102 studies and the other reviewed 80 studies” (Coarasa et al., 2017). Meanwhile, Useem et al. (2015) studied 40 matched pairs of systematic reviews (matching each review by well-known review producer Cochrane with a review by a non-Cochrane review group) that were published within 5 years of one another. They found substantial differences in primary study inclusion, and the publication sequence (evidence published subsequent to one review and prior to another review) was unable to explain discordance in 47% of the pairs of reviews examined. In another study comparing Cochrane and non-Cochrane matched pairs, Hacke and Nunan (2020) found only one out of 24 matched pairs included the same studies. For two-thirds of the matched pairs, they could not explain the evidence inclusion discrepancies. Bolland and Grey (2014) analyzed a sample of seven meta-analyses reporting the relationship between vitamin D and bone fracture published in high-ranking general medical journals. They identified a set of trials deemed eligible for the meta-analyses. Of the seven meta-analyses, Bolland and Grey (2014) found that four reviews included all eligible trials; the remaining three missed three to eight trials. Siontis, Hernandez-Boussard, and Ioannidis (2013) discovered that even when meta-analyses stated similar eligibility criteria, they still selected different evidence, although in this case, the reviews Siontis et al. studied did not reach conflicting conclusions.

Recently, attention has turned to reproducing systematic reviews. Low, Ross et al. (2017) investigated whether two independent systematic review teams provided with identical objectives, clinical data, resources, and time would yield the same results when reviewing industry data. The two systematic review teams came to broadly the same findings, with differences regarding how the treatment studied impacted the risk of cancer and the benefits in spinal fusion surgery. Their methods, results, and interpretations still differed slightly. Low et al. (2017) took a positive view of the variations in different meta-analyses, because closed data with a single interpretation of data may “lead people to believe there is no other possible approach.” Concerns regarding the reproducibility and replicability of evidence synthesis led to a community effort called the REProducibility and Replicability In Syntheses of Evidence (REPRISE) project, which plans to crowdsource reviewers in order to replicate the search and inclusion of evidence from a sample of published systematic reviews (Page, Moher et al., 2021b).

Current approaches for studying discordance developed by the systematic review community primarily rely on content analysis (Bolland & Grey, 2014; Hacke & Nunan, 2020; Katrak, Bialocerkowski et al., 2004; Osnabrugge et al., 2015) combined with methods referred to as “critical appraisal” or “critical assessment” of systematic reviews (Katrak et al., 2004; Osnabrugge et al., 2015). The conclusions of such studies report discordance in evidence selection as the number or fraction of overlapping included studies and number or fraction of differences. Occasionally, data analytics and visualization have been used. Lucenteforte et al. (2015) used colored circles (“traffic light visualizations”) to visually compare pairs of heart attack treatments by clustering 36 systematic reviews; colors indicate superiority (green), nonsuperiority (yellow), and discordance (red). Créquit et al. (2016)’s Figure 4 used network visualization to show the percentage of treatment comparisons covered by systematic reviews in each year from 2009 to 2015, from 77 randomized controlled trials of lung cancer treatments; nodes are treatments and edges are treatment comparisons. The only visual and data analytical approach used in studying evidence inclusion we are aware of is a tool called GROOVE (short for Graphical Representation of Overlap for OVErview), which provides spreadsheet templates for authors developing overviews of multiple systematic reviews (Pérez-Bracchiglione, Meza et al., 2022). GROOVE’s focus is on assessing the overlap of the primary studies (evidence inclusion) to derive appropriate weights for statistical summaries in the overviews rather than understanding evidence selection discordance.

2.2. The Use of Network Analysis to Assess the Reliability of Scientific Knowledge

Greenberg (2009, 2011) studied scientific belief systems, focusing on the impact of uneven citation of nonconcordant evidence. He introduced the concept of a claim-specific citation network, “the set of all papers (nodes) containing statements regarding a specific claim or related set of claims and the citations (arrows; directed edges) from these statements to other papers” to represent the scientific belief system (Greenberg, 2011). Using his claim-specific citation network construct, Greenberg traced the factual foundation of a widely held belief important to the care of patients with a muscle disorder called inclusion body myositis and concluded that the belief was unfounded. He identified citation bias (“When relevant data addressing a claim are not cited” (Greenberg, 2011)) and the amplification or the “lens effect” by a few influential papers “containing no data on claim validity” (Greenberg, 2009).

Claim-specific citation networks were later adapted² by Leng (2018) to reveal research underutilization, biased research utilization, and their impact on the conclusions drawn by narrative scientific reviews. Leng (2018) constructed a claim-specific citation network of reviews centered on the four pre-1984 randomized controlled trials (RCTs) investigating whether fat-controlled diets were effective for preventing coronary heart diseases. Forty-five percent of the reviews were supportive, with the remainder evenly split between neutral and unsupportive. Consideration of more evidence was associated with a neutral position on the benefit of fat-controlled diets, and many reviews only cited the one RCT supportive of a fat-controlled diet while ignoring the three RCTs unsupportive of this claim. The four RCTs Leng collected constituted a tiny evidence set that reviewers had no excuse to miss, and all RCTs should have been utilized, because Leng selected only reviews published at least 2 years after the last RCT report was published. Yet reviews made fewer than half the possible citations to RCTs: In the network as a whole, Leng found evidence utilization of only 49%. By contrast, the neutral reviews had higher evidence utilization (68%). A Pearson’s chi square test confirmed citation bias (Leng, 2018).

Trinquart et al. (2016) used an exponential random graph model (Robins, Pattison et al., 2007) to test citation homophily based on the stance a report took towards a scientific claim (supportive, against, and inconclusive) and found that publications tend to cite other publications with the same stance. Their work further consolidated the association between citation bias and unreliable knowledge: When papers on each side only cited papers of the same stance, the literature left readers little impression of an ongoing scientific controversy. Duyx, Urlings, Swaen, Bouter, and Zeegers published a series of empirical studies on determinants of citation (Duyx et al., 2017a, Duyx, Urlings et al., 2017b, 2019, 2020; Urlings, Duyx et al., 2019a, 2019b, 2020, 2021). Most importantly, they contributed a universal method for detecting citation bias with statistical rigor. Their method can identify publication characteristics that are associated with citation frequency by modeling the relationship between any chosen characteristic of the publication (e.g., conclusion, author affiliation, sample size) and the probability of a potential citation path being realized (Urlings et al., 2021).

Knowing whether a controversy has reached its conclusion can help one assess the risk of adopting a particular knowledge claim. To characterize the level of consensus in a network, Shwed and Bearman (2010) measured network cohesion using the modularity score (Newman, 2018), scaled by the logarithm of the size of the network. They also introduced a dynamic window approach to subset a citation network temporally to reflect the scientific belief system at a given time; for a given year, the window width is determined by the median age of publications cited in a given year (where age is publication year of the citing paper minus publication year of each cited paper). Shwed and Bearman argued that their dynamic window approach was superior to the cross-sectional or fixed-window approach because the window width was theoretically justified and sensitive to the level of scientific activity. It was also shown to be more sensitive to shifts in consensus. By combining the scaled modularity score and dynamic window approach, Shwed and Bearman were able to measure the change in consensus over time with higher sensitivity. They found that changes in scaled modularity score using their dynamic window approach tracked their expectations for studies that were (a) never contested (noncarcinogenicity of coffee), (b) once contentious but currently consensual (carcinogenicity of smoking, climate change), and (c) currently contested (carcinogenicity of cell phones, relationship between vaccination and autism).
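The window-width rule described above can be illustrated with a minimal sketch. It follows only the one-sentence description given here (median age of the publications cited in a given year), not Shwed and Bearman's full procedure; the function name `window_width` and the toy citation pairs are hypothetical.

```python
from statistics import median


def window_width(citations, year):
    """Median age of the publications cited by papers published in `year`.

    `citations` is a list of (citing_year, cited_year) pairs (toy data below).
    Age = publication year of the citing paper minus that of the cited paper.
    Returns None when no citations were made in `year` (an assumption).
    """
    ages = [citing - cited for citing, cited in citations if citing == year]
    return median(ages) if ages else None


# Invented citation pairs, for illustration only.
toy_citations = [
    (2005, 2001), (2005, 2003), (2005, 2004),
    (2006, 2000), (2006, 2005),
]

print(window_width(toy_citations, 2005))
print(window_width(toy_citations, 2006))
```

Under this reading, a year in which authors cite mostly recent work yields a narrow window, and a year in which they reach further back yields a wider one.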

Controversy mapping is a “conceptual and methodological toolbox” for studying scientific controversy, and it incorporates network analysis (Venturini & Munk, 2022). However, controversy mapping differs from the methods introduced so far in at least three aspects: (striving for) a more sympathetic attitude towards all players³, resistance to a (hasty) reduction of a controversy, and a refusal to specify their approach in a conventional way⁴ (Venturini, 2010). Although not readily accessible to those uninitiated in its use, controversy mapping is extremely powerful and adaptable for navigating epistemic communities in conflict with each other, as Jacomy (2020) exemplified in his analysis of the controversy of scale-freeness in the field of network science.

2.3. Situating Our Work

Reliability issues in systematic reviews make it important to assess systematic reviews before they are used for decision making. Network analysis has shown its strength in assessing the reliability of scientific knowledge, primarily through approaches based on claim-specific citation network analysis. The methodological insight of claim-specific citation network analysis is that a claim-specific citation network is “a published record of a belief system” and the propagation from data to belief was documented in it (Greenberg, 2011). In this case, the entire network is about one claim or one piece of scientific knowledge. Our work joins these two threads of research with an important alteration: We treat a given systematic review (the target) as a documentary representation of some scientific knowledge, and we assess the risk of adopting its knowledge by looking at how the given systematic review’s chosen evidence is compared to evidence chosen by other systematic reviews studying similar topics (the companion reviews).

3.1. Design Principles

The purpose of this paper is to design and test a new approach that uses network analysis to assess risks in a systematic review that decision-makers want to use as the foundation of their action. We define risk as any information concerning a systematic review that can reduce a user’s willingness to use the systematic review as the foundation of their action. The definition also forms our first design principle that users, based on their own knowledge and values, make the final judgment of whether they find a risk.

Based on what we reviewed in Section 2.1, prior research (e.g., Cooper (1982) and Felson (1992)) has laid out many threats to the validity of review conclusions. Yet in addition to those threats to validity, we are aware that some risks can be unexpected, clear only on retrospection or through the participation of people with specific knowledge related to an issue. With this in mind, the second design principle is that the approach must have a component that enables users to discover unknown risks.

We now come to the third design principle, the principle that differentiates our approach from claim-specific citation network analysis (introduced above in Section 2.2). This principle is also specific to the study of evidence selection: We treat a systematic review as a documentary representation of some scientific knowledge, and we assess the risks of adopting the knowledge by looking at how the systematic review’s chosen evidence compares to evidence chosen by other systematic reviews studying similar topics. This principle incorporates the social and historical aspect of scientific knowledge, as the standards for valid scientific knowledge vary across different scientific communities and evolve with time (Cetina, 1999; Kuhn, 2012). Thus, one way, and perhaps the only way, to discover risks associated with community differences and temporal evolution in evidence selection, is by comparing one systematic review to other systematic reviews researching similar research topics.

Network analysis cannot discover all possible risks under our definition. Further considerations thus include whether network analysis techniques have the analytical affordance to detect a risk and whether network analysis can justify its superiority in detecting that risk over other methods. For example, a mismatch between the target population and the population covered in the studies collected (Cooper, 1982) can be detected by reading a systematic review’s inclusion criteria, or even the title. We focus on evidence selection in this paper for two reasons. First, evidence selection discordance is a frequent observation in conflicting systematic reviews and a common concern (see Section 2.1). Second, network analysis allowed us to vividly see evidence selection discordance in our pilot study (Hsiao, Fu, & Schneider, 2020).

Therefore, the fourth design principle is that appropriate risks to be assessed by this approach are most likely those that can reveal themselves through quantification by network metrics but not by other means.

We document all four design principles in Table 1 for easy reference.

Table 1.

Four principles for designing the inclusion network approach

Design Principle 1 Users make the final judgment of whether they find a risk. 
Design Principle 2 The approach must have a component that enables users to discover unknown risks. 
Design Principle 3 We treat a systematic review as a documentary representation of some scientific knowledge, and we assess risks of adopting the knowledge by looking at how the systematic review’s chosen evidence compares to evidence that was chosen by other systematic reviews studying similar topics. 
Design Principle 4 Appropriate risks to be assessed by this approach are most likely those that can reveal themselves through quantification by network metrics but not by other means. 

3.2. Data Structure: The Inclusion Network

We propose a network construct called the inclusion network. The inclusion network is a bipartite network with two types of nodes: One represents systematic review reports (SRRs), and the other represents primary study reports (PSRs). A PSR is “included” in an SRR if it is used in that SRR’s evidence synthesis. In an inclusion network, if an SRR includes a PSR, there is a directed edge from the SRR to the PSR. Each included PSR is cited in the SRR, but not all references cited in an SRR are included in the evidence synthesis; thus the inclusion network is a subset of the paper’s citation network. Included PSRs can be determined with high confidence, as SRRs are expected to report them in their text, tables, or appendices.
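As an illustration of this construct, the following minimal sketch stores an inclusion network as two node sets and a set of directed SRR-to-PSR edges. The class name `InclusionNetwork`, its method names, and the toy identifiers are hypothetical choices made for this example, not part of the published data sets.

```python
from dataclasses import dataclass, field


@dataclass
class InclusionNetwork:
    """Bipartite, directed network: an edge (srr, psr) means 'srr includes psr'."""
    srrs: set = field(default_factory=set)    # systematic review reports
    psrs: set = field(default_factory=set)    # primary study reports
    edges: set = field(default_factory=set)   # (srr_id, psr_id) pairs

    def add_inclusion(self, srr_id: str, psr_id: str) -> None:
        """Record that srr_id used psr_id in its evidence synthesis."""
        self.srrs.add(srr_id)
        self.psrs.add(psr_id)
        self.edges.add((srr_id, psr_id))

    def included(self, srr_id: str) -> set:
        """All PSRs included by one SRR."""
        return {p for s, p in self.edges if s == srr_id}

    def including(self, psr_id: str) -> set:
        """All SRRs that include one PSR."""
        return {s for s, p in self.edges if p == psr_id}


# Toy data, invented for illustration only.
net = InclusionNetwork()
for srr, psr in [("SRR_A", "PSR_1"), ("SRR_A", "PSR_2"),
                 ("SRR_B", "PSR_2"), ("SRR_B", "PSR_3")]:
    net.add_inclusion(srr, psr)

print(sorted(net.included("SRR_A")))
print(sorted(net.including("PSR_2")))
```

Keeping the two node types in separate sets makes the bipartite structure explicit: every edge starts at an SRR and ends at a PSR.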

3.3. Data Collection Process

The data collection process reflects Design Principle 3. To prepare their data set, users first identify a collection of SRRs (i.e., the companion SRRs) that study similar research questions to the SRR to be assessed (i.e., the target SRR). Currently, we allow users to define their collection strategy. After collecting SRRs, they extract all included PSRs from the SRRs. At a minimum, three data files should be prepared: an attribute list recording each node’s unique identifier, publication date, and type (SRR or PSR); an edge list specifying all inclusion relationships; and a file containing the dates on which each SRR completed its last search. Users can refer to Section 4.1 and the documentation of the two data sets associated with our case studies (Clarke, Lischwe Mueller et al., 2023; Fu, Hsiao et al., 2023) for more details about data preparation.
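The three data files described above could be loaded and checked along the following lines. This is a sketch under stated assumptions: the column names (`id`, `type`, `publication_date`, `srr_id`, `psr_id`, `last_search_date`), the ISO date format, and the inline toy contents are all hypothetical and may differ from the layout of the published data sets.

```python
import csv
import io
from datetime import date

# Hypothetical file contents; real files would be read from disk.
ATTRIBUTES_CSV = """id,type,publication_date
SRR_A,SRR,2013-09-12
SRR_B,SRR,2014-03-01
PSR_1,PSR,2010-05-20
PSR_2,PSR,2012-11-02
PSR_3,PSR,2013-08-15
"""
EDGES_CSV = """srr_id,psr_id
SRR_A,PSR_1
SRR_A,PSR_2
SRR_B,PSR_2
SRR_B,PSR_3
"""
SEARCH_DATES_CSV = """srr_id,last_search_date
SRR_A,2013-03-01
SRR_B,2013-10-01
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def load_inclusion_data(attr_text, edge_text, search_text):
    """Load the attribute list, edge list, and search dates, with basic checks."""
    attrs = {
        r["id"]: {"type": r["type"],
                  "published": date.fromisoformat(r["publication_date"])}
        for r in read_rows(attr_text)
    }
    edges = [(r["srr_id"], r["psr_id"]) for r in read_rows(edge_text)]
    search = {r["srr_id"]: date.fromisoformat(r["last_search_date"])
              for r in read_rows(search_text)}

    # Every edge must run from an SRR node to a PSR node.
    for srr, psr in edges:
        if attrs[srr]["type"] != "SRR" or attrs[psr]["type"] != "PSR":
            raise ValueError(f"edge ({srr}, {psr}) is not SRR -> PSR")
    # Every SRR needs a last search date.
    missing = [n for n, a in attrs.items()
               if a["type"] == "SRR" and n not in search]
    if missing:
        raise ValueError(f"SRRs without a search date: {missing}")
    return attrs, edges, search


attrs, edges, search_dates = load_inclusion_data(
    ATTRIBUTES_CSV, EDGES_CSV, SEARCH_DATES_CSV)
print(len(attrs), "nodes,", len(edges), "edges")
```

The two checks catch the most likely preparation errors: an edge whose endpoints have the wrong node type, and an SRR whose last search date was not recorded.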

3.4. Network Metrics

Our design of network metrics reflects Design Principle 4. The main metric we rely on for risk assessment is the adjusted Jaccard similarity. The Jaccard similarity (JS), a widely used similarity measure in graph analytics, quantifies a pair of SRRs’ similarity in choice of evidence. The adjusted JS modifies the JS to account for differences in the pool of available evidence due to the temporal sequence depicted in Figure 1. Two reviews (even SRRs published in the same month) typically searched the literature at different times; consequently, research not yet published or not yet in search indexes at the time of the earlier search can be included only in the SRR with the later search date. In the example shown in Figure 1, PSR 3 was published after the search date for SRR A and therefore could be included only in SRR B, not SRR A.

Figure 1.

A graphic depiction of the temporal sequence.

The equation for the adjusted JS is shown below as Eq. 1. We denote the entire inclusion network as a graph G(U, V, E), where U is the set of all SRR nodes, V the set of all PSR nodes, and E the set of all edges in the inclusion network. To represent the temporal sequence based on the publication date of PSRs and the search date of SRRs, we create a rank value for all vi ∈ V and uj ∈ U, denoted as Rank(vi) or Rank(uj), with the earliest publication (based on the publication date of a PSR or the search date of an SRR) having the smallest rank value. To compute the adjusted JS on two SRR nodes ui and uj, we create an induced subgraph of G as G({ui, uj}, V′, E′). V′ is the set of nodes representing all PSRs with rank value smaller than or equal to min(Rank(ui), Rank(uj)). The nodes in V′ that are connected to ui and uj are denoted as V′(ui) and V′(uj), respectively. Thus, the adjusted JS can be expressed as
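$$\mathrm{AdjustedJS}(u_i, u_j) = \frac{|V'(u_i) \cap V'(u_j)|}{|V'(u_i) \cup V'(u_j)|} \tag{1}$$

where

$$V' = \{\, v : \mathrm{Rank}(v) \le \min(\mathrm{Rank}(u_i), \mathrm{Rank}(u_j)),\ v \in V \,\}$$

$$E' = \{\, (u, v) : u \in V' \lor v \in V',\ (u, v) \in E \,\}$$

$$V'(u_i) = \{\, v : (u_i, v) \in E',\ v \in V' \,\}$$

$$V'(u_j) = \{\, v : (u_j, v) \in E',\ v \in V' \,\}$$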
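Eq. 1 can be sketched in a few lines of code. The sketch below compares dates directly instead of building explicit rank values, which gives the same V′ as long as ranks preserve date order; returning 0 when both restricted sets are empty is an assumption of this example. All identifiers and dates are toy values invented for illustration and mirror the situation in Figure 1, where PSR 3 appears after SRR A's search.

```python
from datetime import date


def jaccard(a: set, b: set) -> float:
    """Regular Jaccard similarity; 0.0 when both sets are empty (assumption)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def adjusted_js(ui, uj, included, psr_published, srr_searched):
    """Jaccard similarity restricted to PSRs available to both SRRs (Eq. 1)."""
    # Earlier of the two search dates: only PSRs published by then are in V'.
    cutoff = min(srr_searched[ui], srr_searched[uj])
    available = {v for v, d in psr_published.items() if d <= cutoff}
    return jaccard(included[ui] & available, included[uj] & available)


# Toy data, invented for illustration only.
psr_published = {
    "PSR_1": date(2010, 5, 20),
    "PSR_2": date(2012, 11, 2),
    "PSR_3": date(2013, 8, 15),   # published after SRR_A's search
}
srr_searched = {"SRR_A": date(2013, 3, 1), "SRR_B": date(2013, 10, 1)}
included = {"SRR_A": {"PSR_1", "PSR_2"}, "SRR_B": {"PSR_2", "PSR_3"}}

regular = jaccard(included["SRR_A"], included["SRR_B"])
adjusted = adjusted_js("SRR_A", "SRR_B", included, psr_published, srr_searched)
print(regular, adjusted)
```

In this toy case the regular JS is 1/3, while the adjusted JS is 1/2, because PSR 3 is removed from the comparison: SRR A could not have found it.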

We use the individual-level average adjusted JS to assess the stability of an SRR’s evidence selection standards: whether an SRR’s choice of evidence continued to be accepted by other SRRs or the evidence it selected was no longer used by other SRRs.

The equation for computing the time series of individual-level average adjusted JS is shown below as Eq. 2:
$$y(t_s, u_i) = \bar{x} \tag{2}$$

where

$$t_s = \text{the search date of the SRRs with rank } r(t_s)$$

$$x \in \{\, \mathrm{AdjustedJS}(u_i, u_k) : \mathrm{Rank}(u_k) \le r(t_s),\ i \ne k \,\}$$

The individual-level average adjusted JS is computed at each search date accounted for in the temporal sequence (ts). The individual-level average adjusted JS for a particular SRR ui at ts, written y(ts, ui), is defined as the average of all adjusted JS values (x) of ui with all other SRRs uk having Rank(uk) less than or equal to r(ts)⁵. When computing the time series of this statistic, each SRR is time-stamped by its last search date rather than its publication date, because, again, the search date is more consequential to the resulting PSR list than the publication date, and there is often considerable delay between the last search and publication. This statistic has a particular property that can help detect changes in evidence selection standards: If the value computed at the time point with rank r(ts) is larger than the value computed at the previous time point (rank r(ts) − 1), it means that the SRR or SRRs with rank r(ts) are more similar to ui in evidence selection than the SRR or SRRs with rank lower than r(ts) (i.e., adding the adjusted JS between ui and the SRR or SRRs with rank r(ts) into the calculation of the average helps to bring up the average).
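The time series of Eq. 2 can be sketched as follows. As assumptions of this example, dates stand in for rank values, search dates at which no other SRR is yet available are skipped, and the adjusted JS of two empty sets is taken to be 0. The identifiers and dates are toy values invented for illustration.

```python
from datetime import date
from statistics import mean

# Toy data, invented for illustration only.
psr_published = {
    "PSR_1": date(2010, 5, 20),
    "PSR_2": date(2012, 11, 2),
    "PSR_3": date(2013, 8, 15),
    "PSR_4": date(2014, 6, 1),
}
srr_searched = {
    "SRR_A": date(2013, 3, 1),
    "SRR_B": date(2013, 10, 1),
    "SRR_C": date(2015, 1, 1),
}
included = {
    "SRR_A": {"PSR_1", "PSR_2"},
    "SRR_B": {"PSR_2", "PSR_3"},
    "SRR_C": {"PSR_1", "PSR_2", "PSR_4"},
}


def adjusted_js(ui, uj):
    """Adjusted JS (Eq. 1): compare only PSRs published by the earlier search."""
    cutoff = min(srr_searched[ui], srr_searched[uj])
    a = {v for v in included[ui] if psr_published[v] <= cutoff}
    b = {v for v in included[uj] if psr_published[v] <= cutoff}
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def average_adjusted_js_series(ui):
    """y(ts, ui) for each search date ts that has at least one other SRR."""
    series = []
    for ts in sorted(set(srr_searched.values())):
        xs = [adjusted_js(ui, uk) for uk in srr_searched
              if uk != ui and srr_searched[uk] <= ts]
        if xs:  # skip dates where no other SRR has searched yet (assumption)
            series.append((ts, mean(xs)))
    return series


series = average_adjusted_js_series("SRR_A")
for ts, y in series:
    print(ts.isoformat(), y)
```

Here the value for SRR_A rises from 0.5 to 0.75 when SRR_C enters the calculation, which under the property described above indicates that SRR_C's evidence selection is closer to SRR_A's than SRR_B's was.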

3.5. Assessment Process

Our assessment process consists of three steps. Step 1 (Visualization Assessment) reflects Design Principles 1 and 2 and is exploratory and mostly analyst-driven. Mainly, we examine the entire network and the position of the target SRR in it, notice features that capture our attention, and move back and forth between the network and the content of the SRRs to identify anything we judged to be a risk.

Step 2 (Quantitative Similarity Assessment) reflects Design Principle 4 by using network metrics to detect risks that would be hard to uncover using other methods. Step 2 identifies evidence potentially relevant to the user that was either overlooked by the target SRR or published after the search date of the target SRR. We use adjusted JS, regular JS, and the fold change from regular JS to adjusted JS (referred to as JS fold change later) to identify such evidence. JS fold change (Eq. 3) is the ratio of adjusted JS to regular JS. When regular JS is zero, adjusted JS will also be zero, as in this case there is no shared PSR between the pair of SRRs. In this special case, the fold change is defined as 1.
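$$\mathrm{JSFoldChange}(u_i, u_j) = \begin{cases} \dfrac{\mathrm{adjustedJS}(u_i, u_j)}{\mathrm{regularJS}(u_i, u_j)}, & \mathrm{regularJS}(u_i, u_j) \ne 0 \\[1ex] 1, & \mathrm{regularJS}(u_i, u_j) = 0 \end{cases} \tag{3}$$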
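Eq. 3 translates directly into a small function. The sketch below takes the two similarity values as plain numbers; the function name `js_fold_change` and the example values (an adjusted JS of 1/2 against a regular JS of 1/3) are invented for illustration.

```python
def js_fold_change(adjusted: float, regular: float) -> float:
    """Fold change from regular JS to adjusted JS (Eq. 3).

    When regular JS is zero the pair shares no PSR, so adjusted JS is also
    zero and the fold change is defined as 1.
    """
    if regular == 0:
        return 1.0
    return adjusted / regular


# Invented example values: adjusted JS = 1/2, regular JS = 1/3.
example = js_fold_change(0.5, 1 / 3)
print(round(example, 3))

# Special case: no shared PSRs between the two SRRs.
print(js_fold_change(0.0, 0.0))
```

A fold change above 1 means that restricting the comparison to evidence available to both reviews raised their similarity.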

Step 3 (Time Series Assessment) reflects Design Principle 4 and detects instability of evidence selection standards using time series based on the individual-level average adjusted Jaccard similarity. We describe the three steps of our assessment process in detail in Section 4.2.

In two case studies, we applied our inclusion network approach shown in Figure 2.

Figure 2.

The inclusion network approach.

4.1. Data Collection

We prepared two inclusion networks, each based on data initially collected by researchers interested in studying systematic reviews addressing a given topic. This resulted in two data sets and we have made both available in the Illinois Databank: the exercise prescription (ExRx) data set (Clarke et al., 2023) and the salt controversy data set (Fu et al., 2023).

4.1.1. ExRx data collection

For the ExRx data set (Clarke et al., 2023), we obtained 27 SRRs⁶ investigating the relationship between physical activity and depressive symptoms by extending a previous data set from domain expert Caitlin Clarke (Clarke, 2019), which has also been used to explore syndemic approaches to exercise science intervention research (Clarke & Adamson, 2023). Our SRRs were limited to systematic reviews and meta-analyses published between 2013 and 2020 that investigated the relationship between physical activity and depressive symptoms. Exercise is often regarded as a subgroup of physical activity in the field of kinesiology/exercise science (Caspersen, Powell, & Christenson, 1985). For the rest of this paper, we generally use more colloquial terminology: “exercise” instead of “physical activity” and “depression” instead of “depressive symptoms.”⁷ A search conducted between November 2019 and February 2020 ultimately generated 27 included manuscripts. To construct the inclusion network, contributor NLM manually screened the PDFs of the 27 SRRs to identify all included PSRs (365 PSRs in total). The ExRx inclusion network contains 27 SRR nodes, 365 PSR nodes, and 589 edges.

We chose SRR #2, “Exercise for Depression” by Cooney, Dwan et al. (2013), as our target SRR. We did not strictly follow the Data Collection Process specified in Section 3.2: Rather than starting from a target review, we used an existing data set. SRR #2 was appropriate as a target review in our inclusion network on the relationship between physical activity and depressive symptoms due to its objective, “[t]o determine the effectiveness of exercise in the treatment of depression in adults compared with no treatment or a comparator intervention” (Cooney et al., 2013). In addition, SRR #2, a Cochrane Review by a distributed team and with at least four of its six authors outside of the field of exercise science, produced a less positive conclusion regarding the relationship between exercise and depression than the other 26 SRRs and received serious criticism from prominent researchers in the field (Clarke, 2019), making it an interesting SRR to study.

We treated the remaining 26 SRRs as companion SRRs. Although they all investigated the relationship between physical activity and depressive symptoms, their exact research questions and inclusion criteria are noticeably different8.

Sometimes SRRs study a facet of the research question raised by the target SRR (i.e., the relationship between exercise and depression). We name such SRRs faceting SRRs with respect to the target SRR. A facet is a certain aspect of the research question. Our target SRR, SRR #2, studies the relationship between exercise and depression, and several facets of its research question are considered in other ExRx SRRs: particular types of exercise, particular age groups, and particular levels of disease severity. For example, SRR #4 only considers yoga and depression, and SRR #25 only considers resistance exercise training and depression, making these both faceting SRRs with respect to SRR #2. Likewise, SRR #1 is limited to the elderly and SRR #24 is limited to adolescents and young adults. SRR #21 concerns a particular level of disease severity, major depression. Occasionally, facets were combined to further limit an SRR’s research question. For example, SRR #19 (Cramer, Anheyer et al., 2017) studied the “efficacy and safety of yoga interventions in treating patients with major depressive disorder.”

Faceting SRRs of the target SRR can be quite useful for assessing the target SRR using the value of adjusted JS. For example, if the adjusted JS between the target SRR (SRR #2) and an SRR that studies only yoga (e.g., SRR #4) is zero, we can infer that our target SRR has very likely overlooked evidence from yoga. This property will be used in Step 2 of our assessment process.

4.1.2. Salt controversy data collection

For the salt controversy data set (Fu et al., 2023), we obtained 14 SRRs and 68 PSRs from a claim-specific citation network collected by epidemiologists (Trinquart et al., 2016), who carried out a systematic search to identify publications addressing the effect of salt reduction on all-cause mortality and cerebro-cardiovascular diseases that were published between 1978 and 2014. The name, salt controversy, refers to the debate among scientists over the relationship between elevated salt consumption and public health risks such as stroke and cardiovascular disease (Bayer, Johns, & Galea, 2012); in particular, Trinquart et al. (2016) analyzed disagreement on the top-level question (Is reducing salt beneficial for certain health outcomes?). Some of the SRRs reviewed the impact of salt reduction on other health outcomes, such as reduction in blood pressure, which fell outside of the scope of the question formulated by Trinquart et al. As a result, eight of the 14 SRRs included PSRs that were not among the 68 PSRs identified by Trinquart et al. (2016) (see Table S1 in the Supplementary material). Also, among the 68 PSRs, 18 were not included by any of the 14 SRRs but were identified by Trinquart et al. (2016) through their systematic search. We retained these in the inclusion network because they were technically “included” by Trinquart et al.’s search and because keeping them can help users see unused yet potentially relevant evidence. The resulting salt controversy inclusion network consists of 14 SRR nodes, 68 PSR nodes, and 184 edges.

All SRRs in the salt controversy inclusion network can be target SRRs with the exception of SRR #10, which is a faceting SRR studying a particular group: patients with systolic heart failure. We chose the target SRR from a series of Cochrane reviews9: SRR #5 (Taylor, Ashton et al., 2011b), #6 (Taylor, Ashton et al., 2011a), and #12 (Adler, Taylor et al., 2014). We chose the last publication of the series, SRR #12, as our target SRR and treated the 11 SRRs outside this series as companion SRRs (see Table 2(b)).

4.2. Implement the Assessment Process

In Step 1 (Visualization Assessment), we created visualizations of the two inclusion networks. We examined the position of the target SRR and the entire network, noticed features that captured our attention, and moved back and forth between the network and the content of the SRRs to discover anything we judged as a risk.

In Step 2 (Quantitative Similarity Assessment), we computed regular JS, adjusted JS, and JS fold change between the target SRRs and all their companion SRRs. This step identifies evidence potentially relevant to the user that might have been overlooked by the target SRR or was published after the search date of the target SRR (referred to later simply as new evidence). To find such evidence, we looked for two types of companion SRRs.

The first type is companion SRRs that have zero adjusted JS with the target SRR. If such SRRs are faceting SRRs of the target SRR, they might have examined potentially relevant evidence overlooked by the target SRR.

The second type is companion SRRs that finished their search after the target SRR, have a high adjusted JS with the target SRR, and have a high JS fold change with the target SRR. High adjusted JS means that the target SRR and the companion SRR share similar evidence selection criteria, and a higher JS fold change means that the companion SRR includes more new evidence (i.e., PSRs published in the interval between the target SRR’s last search date and the companion SRR’s last search date)10. By examining the PSRs included by such companion SRRs, users may be able to identify evidence that is potentially relevant but could not be incorporated into the target SRR due to the constraint of the temporal sequence.
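The Python sketch below illustrates how the three quantities and the two types of companion SRRs fit together. It rests on assumptions that are not restated in this section: adjusted JS is taken to be the Jaccard similarity computed only over PSRs published on or before the earlier of the two last search dates, and JS fold change is taken to be adjusted JS divided by regular JS (a reading consistent with the values in Table 2). The function names, the thresholds `min_adjusted` and `min_fold`, and all data are hypothetical.

```python
from datetime import date
from typing import Dict, Optional, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Regular Jaccard similarity between two sets of included PSRs."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def adjusted_jaccard(a: Set[str], b: Set[str], search_a: date, search_b: date,
                     published: Dict[str, date]) -> float:
    """ASSUMED definition: ignore PSRs published after the earlier search date."""
    cutoff = min(search_a, search_b)
    findable = {psr for psr in a | b if published[psr] <= cutoff}
    return jaccard(a & findable, b & findable)


def fold_change(adjusted: float, regular: float) -> Optional[float]:
    """ASSUMED ratio adjusted / regular; undefined when regular JS is zero."""
    return adjusted / regular if regular > 0 else None


def months_between(target_search: date, companion_search: date) -> int:
    """Search date interval; positive if the companion searched later."""
    return (companion_search.year - target_search.year) * 12 + (
        companion_search.month - target_search.month)


def classify(adjusted: float, fold: Optional[float], interval: int,
             is_faceting: bool, min_adjusted: float = 0.3,
             min_fold: float = 1.1) -> str:
    """Flag the two types of companion SRRs; thresholds are hypothetical."""
    if adjusted == 0.0 and is_faceting:
        return "possibly overlooked evidence"
    if interval > 0 and adjusted >= min_adjusted and fold is not None and fold >= min_fold:
        return "possible new evidence"
    return "no flag"


# Hypothetical PSR publication dates and inclusion sets.
published = {
    "p1": date(2009, 5, 1), "p2": date(2011, 2, 1), "p3": date(2012, 8, 1),
    "p4": date(2014, 6, 1), "p5": date(2015, 9, 1),
    "p6": date(2008, 1, 1), "p7": date(2010, 3, 1),
}
target, target_search = {"p1", "p2", "p3"}, date(2013, 3, 1)
companions = {
    "later_srr": ({"p1", "p2", "p3", "p4", "p5"}, date(2016, 4, 1), False),
    "facet_srr": ({"p6", "p7"}, date(2012, 1, 1), True),
}

results = {}
for name, (psrs, search, is_faceting) in companions.items():
    regular = jaccard(target, psrs)
    adjusted = adjusted_jaccard(target, psrs, target_search, search, published)
    fold = fold_change(adjusted, regular)
    interval = months_between(target_search, search)
    results[name] = (regular, adjusted, fold, interval,
                     classify(adjusted, fold, interval, is_faceting))
    print(name, results[name])
```

In this toy case, the later companion shares every PSR the target could have found, so its adjusted JS is 1 while its regular JS is pulled down by the two newer PSRs; the faceting companion shares nothing with the target and is flagged for a closer look.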

In Step 3 (Time Series Assessment), we computed and visualized the individual-level average adjusted JS time series traces of all SRRs. We colored the time series traces according to whether a time point was before or after the publication date of the SRR whose time series was being plotted. The coloring helps us to see whether the SRR’s publication date marked a significant turning point in the trend. If an SRR’s time series started to increase after the SRR’s publication date, then the SRR may have influenced evidence selection practices.
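One plausible reading of this step can be sketched in Python as follows. The formal definition of the individual-level average adjusted JS is given earlier in the paper and is not restated here, so the sketch assumes that, at each time point, an SRR's value is the mean of its adjusted JS with every other SRR published by that time point. The pairwise values, publication years, and function name are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, List, Optional, Tuple

Point = Tuple[int, Optional[float], str]


def average_adjusted_js_series(
    srr: str,
    pub_year: Dict[str, int],
    pairwise: Dict[FrozenSet[str], float],
    years: Iterable[int],
) -> List[Point]:
    """Return (year, mean adjusted JS with peers, phase) for one SRR.

    ASSUMPTION: the peers at a time point are all other SRRs published by
    that year. `phase` records whether the time point falls before the SRR's
    own publication year, which is what the coloring in the plot encodes.
    """
    series: List[Point] = []
    for year in years:
        peers = [o for o, y in pub_year.items() if o != srr and y <= year]
        mean = (
            sum(pairwise[frozenset((srr, o))] for o in peers) / len(peers)
            if peers
            else None  # no peer exists yet, so the average is undefined
        )
        phase = "before" if year < pub_year[srr] else "on_or_after"
        series.append((year, mean, phase))
    return series


# Hypothetical publication years and pairwise adjusted JS values.
pub_year = {"SRR_A": 2013, "SRR_B": 2015, "SRR_C": 2017}
pairwise = {
    frozenset(("SRR_A", "SRR_B")): 0.5,
    frozenset(("SRR_A", "SRR_C")): 0.1,
    frozenset(("SRR_B", "SRR_C")): 0.2,
}

trace_a = average_adjusted_js_series("SRR_A", pub_year, pairwise, range(2013, 2018))
trace_b = average_adjusted_js_series("SRR_B", pub_year, pairwise, range(2013, 2018))
for point in trace_b:
    print(point)
```

In this toy trace, SRR_B's average stays flat until SRR_C appears and then drops, the kind of post-publication change that the coloring is meant to make visible.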

All computer scripts, coded using the R igraph library version 1.5.0.1 (Csardi & Nepusz, 2006), tidygraph version 1.2.3 (Pedersen, 2023), and ggraph version 2.1.0 (Pedersen, 2022), can be found on GitHub (Fu, 2023).

5.1. Results from Step 1: Visualization Assessment

Figure 3 visualizes the ExRx inclusion network. The target SRR, SRR #2, is situated in the middle of the largest weakly connected component11. There are two other, smaller weakly connected components. In the weakly connected component at lower right (centered on SRR #23), 86.2% (25/29) of PSRs were published after the search date of SRR #2; in the weakly connected component at lower left (centered on SRR #3 and SRR #26), 38.5% (25/65) of PSRs were published after the search date of SRR #2.
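The component-level percentages above can be reproduced in outline with the following Python sketch, which finds weakly connected components by ignoring edge direction and then computes, for each component, the share of PSRs published after a cutoff date. The toy edge list, dates, and function names are hypothetical; the paper's own analysis used R.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, List, Set, Tuple


def weakly_connected_components(edges: Iterable[Tuple[str, str]]) -> List[Set[str]]:
    """Group nodes into components, treating every edge as undirected."""
    neighbors: Dict[str, Set[str]] = defaultdict(set)
    for srr, psr in edges:
        neighbors[srr].add(psr)
        neighbors[psr].add(srr)
    seen: Set[str] = set()
    components: List[Set[str]] = []
    for start in sorted(neighbors):
        if start in seen:
            continue
        component: Set[str] = set()
        stack = [start]
        while stack:  # depth-first traversal from `start`
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbors[node] - component)
        seen |= component
        components.append(component)
    return components


def share_published_after(component: Set[str], published: Dict[str, date],
                          cutoff: date) -> float:
    """Fraction of the component's PSRs published after `cutoff`."""
    psrs = [node for node in component if node in published]
    if not psrs:
        return 0.0
    return sum(1 for psr in psrs if published[psr] > cutoff) / len(psrs)


# Hypothetical inclusion edges (SRR -> PSR) and PSR publication dates.
toy_edges = [
    ("SRR_A", "p1"), ("SRR_A", "p2"), ("SRR_B", "p2"),
    ("SRR_C", "p3"), ("SRR_C", "p4"),
]
toy_published = {
    "p1": date(2010, 1, 1), "p2": date(2011, 1, 1),
    "p3": date(2015, 1, 1), "p4": date(2012, 1, 1),
}
target_search_date = date(2013, 3, 1)  # hypothetical search date of the target SRR

components = weakly_connected_components(toy_edges)
shares = [share_published_after(c, toy_published, target_search_date)
          for c in components]
print(components, shares)
```

Nodes without any edge do not appear in the edge list and would need to be added separately as singleton components.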

Figure 3.

The ExRx inclusion network. The arrow points to the target SRR, SRR #2. PSRs’ nodes were colored to indicate whether they were published before (yellow circles) or after (blue circles) SRR #2’s last search date.

SRR #23 synthesized evidence regarding one particular type of exercise, Baduanjin, a form of medical Qigong. The most plausible reason for its isolation is that the authors of SRR #23 conducted extensive searching in Chinese-language databases (Chinese National Knowledge Infrastructure, Wanfang, and the Chinese Clinical Trial Registry) for studies on Baduanjin, and many included PSRs were written in Chinese. Therefore, language and database access may explain why SRR #23’s included PSRs were not used by any other SRRs in the inclusion network, making SRR #23 and its included PSRs an isolated cluster. Decision-makers serving a Chinese community may wish to read SRR #23 to complement SRR #2’s findings.

SRRs #3 and #26 were interested in the effect of exercise in preventing rather than treating depression. They included only PSRs reporting observational studies, whereas the other SRRs (except #17 and #22) restricted themselves to PSRs reporting randomized controlled trials (RCTs). Depending on the decision-makers’ situation, exercise’s preventive effect on depression may be considered relevant or out of scope.

In contrast to the ExRx inclusion network in Figure 3, the target SRRs in the salt controversy inclusion network reside in their own community (Figure 4). SRRs #1, #2, #3, #5, #6, and #12 are closer together; SRRs #4, #7, #8, #9, #11, #13, and #14 are closer together; SRR #10 is distant from the others. Community formation raises concern because our target SRRs did not include evidence that had been accepted by other SRRs.

Figure 4.

The salt controversy inclusion network. The arrow points to the target SRR to be assessed, SRR #12. PSR nodes are colored according to whether they were published before (yellow circles) or after (blue circles) SRR #12’s search date.

We found that the community formation aligned with SRRs’ differences in the choice of the most appropriate study design(s) to include. In Figure 5, SRRs clustered on the upper right side of the figure and within the light green shaded area are always connected to nodes representing PSRs using the RCT study design. SRRs clustered on the left side of the figure within the light orange shaded area are mainly connected to nodes representing PSRs using the cohort study design. Furthermore, SRRs on the boundary (#8, #9, and #11) are connected to both the RCT and cohort study PSR nodes, indicating that they straddle the study design division. SRR #10 (retracted) forms its own community because it studies patients with heart failure, a severe condition, whereas other SRRs, with the exception of SRR #11, study healthy or nonacutely ill people.

Figure 5.

Visualization of the weakly connected component in the salt controversy inclusion network with nodes colored by their study designs. Three shaded areas—light orange (left), light green (upper right), and light purple (lower right)—indicate three communities detected by the edge betweenness community detection algorithm, applied to the undirected version of the network (Nepusz & Csardi, 2022). Another algorithm, infomap, produced almost identical community detection results, except that it classified node #60 into the light purple community (lower right) rather than the light green community (upper right).

The study design differences arose from a struggle between inferential rigor and cost/feasibility in medical research. RCTs are often considered the highest quality evidence in health sciences for topics that permit RCTs, yet some (especially Strazzullo, D’Elia et al., 2009) argued that the RCT design was unlikely to produce definitive evidence for the salt controversy due to its inherent cost and infeasibility.

To test whether we could recover the study design differences (node attributes) through the network structure, we applied community detection algorithms implemented in the R igraph library (Csardi & Nepusz, 2006) to the weakly connected components (Supplementary material, Table S2). Two algorithms, edge betweenness and infomap, recovered the community structure shown in Figure 5, with only a minor difference regarding the assignment of node #60. Notably, the success of the edge betweenness algorithm suggests that edge betweenness is an important metric to explore in the future12.
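To give a sense of how edge betweenness separates communities, the Python sketch below implements a simplified Girvan-Newman-style procedure on an undirected toy network: it repeatedly removes the edge with the highest betweenness until a requested number of components remains. This is not the igraph implementation used in the paper, which selects the partition differently; the stopping rule `n_communities`, the function names, and the toy data are all assumptions made for illustration.

```python
from collections import deque
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

Graph = Dict[str, Set[str]]


def undirected_graph(edges: Iterable[Tuple[str, str]]) -> Graph:
    graph: Graph = {}
    for srr, psr in edges:
        graph.setdefault(srr, set()).add(psr)
        graph.setdefault(psr, set()).add(srr)
    return graph


def edge_betweenness(graph: Graph) -> Dict[FrozenSet[str], float]:
    """Brandes-style edge betweenness for an unweighted, undirected graph."""
    score = {frozenset((u, v)): 0.0 for u in graph for v in graph[u]}
    for source in graph:
        # Breadth-first search that counts shortest paths from `source`.
        dist = {source: 0}
        paths = {node: 0 for node in graph}
        paths[source] = 1
        parents: Dict[str, List[str]] = {node: [] for node in graph}
        order: List[str] = []
        queue = deque([source])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in graph[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    paths[v] += paths[u]
                    parents[v].append(u)
        # Accumulate path shares from the farthest nodes back to the source.
        dependency = {node: 0.0 for node in graph}
        for v in reversed(order):
            for u in parents[v]:
                share = paths[u] / paths[v] * (1.0 + dependency[v])
                score[frozenset((u, v))] += share
                dependency[u] += share
    # Each undirected shortest path was counted once from each endpoint.
    return {edge: value / 2.0 for edge, value in score.items()}


def connected_components(graph: Graph) -> List[Set[str]]:
    seen: Set[str] = set()
    result: List[Set[str]] = []
    for start in graph:
        if start in seen:
            continue
        component: Set[str] = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(graph[node] - component)
        seen |= component
        result.append(component)
    return result


def split_by_edge_betweenness(graph: Graph, n_communities: int) -> List[Set[str]]:
    """Remove highest-betweenness edges until `n_communities` components exist."""
    work = {node: set(nbrs) for node, nbrs in graph.items()}
    while len(connected_components(work)) < n_communities:
        scores = edge_betweenness(work)
        if not scores:  # no edges left to remove
            break
        u, v = max(scores, key=scores.get)
        work[u].discard(v)
        work[v].discard(u)
    return connected_components(work)


# Hypothetical network: two groups of SRRs joined only by the edge SRR_B-PSR_4.
toy_edges = [
    ("SRR_A", "PSR_1"), ("SRR_A", "PSR_2"), ("SRR_A", "PSR_3"),
    ("SRR_B", "PSR_2"), ("SRR_B", "PSR_3"), ("SRR_B", "PSR_4"),
    ("SRR_C", "PSR_4"), ("SRR_C", "PSR_5"),
    ("SRR_D", "PSR_4"), ("SRR_D", "PSR_5"), ("SRR_D", "PSR_6"),
]
communities = split_by_edge_betweenness(undirected_graph(toy_edges), 2)
print(communities)
```

In the toy network, every shortest path between the two groups crosses the single bridging edge, so that edge has the highest betweenness and is removed first.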

Our investigation with Figure 5 revealed a significant risk of using SRR #12 as the foundation of action: Many other SRRs have moved towards using observational studies solely or in combination with RCTs to evaluate the relationship between salt intake and all-cause mortality and cerebro-cardiovascular disease. Although Cochrane reviews are known for excellence in review methodology, the prespecification of inclusion criteria can cause limitations when Cochrane reviews limit themselves to RCTs. Users interested in learning the relationship between salt intake and cardiovascular disease and stroke are strongly advised to seek information from systematic reviews that consider non-RCT evidence.

5.2. Results from Step 2: Quantitative Similarity Assessment

Table 2 reports the results from a quantitative comparison between the target SRRs and their companion SRRs. We also list the interval between the target SRRs’ search date and their companion SRRs’ search date (i.e., “search date interval”): A positive search date interval means that the companion SRR’s last search date comes after the target SRR’s last search date. We sorted the companion SRRs according to their adjusted JS in ascending order.

Table 2.

Adjusted and regular JS between target SRRs and companion SRRs and corresponding JS fold change13

(a) ExRx SRR #2 “Exercise for depression” compared to

| Companion SRR No. and title14 | Search date interval (in months) | Adjusted JS | Regular JS | JS fold change (defined by Eq. 3) |
| --- | --- | --- | --- | --- |
| (ExRx SRR #3) “Physical activity and the prevention of depression” | | 0.000 | 0.000 | |
| (ExRx SRR #14) “The effect of exercise on depressive symptoms in adolescents: a systematic review and meta-analysis” | 21 | 0.000 | 0.000 | |
| (ExRx SRR #17) “Should we recommend exercise to adolescents with depressive symptoms? a meta-analysis: exercise and depression in adolescents” | 31 | 0.000 | 0.000 | |
| (ExRx SRR #19) “A systematic review of yoga for major depressive disorder” | 54 | 0.000 | 0.000 | |
| (ExRx SRR #22) “Sedentary behavior and physical activity levels in people with schizophrenia, bipolar disorder and major depressive disorder: a global systematic review and meta-analysis” | 57 | 0.000 | 0.000 | |
| (ExRx SRR #23) “Mindfulness-based Baduanjin exercise for depression and anxiety in people with physical or mental illnesses: a systematic review and meta-analysis” | 64 | 0.000 | 0.000 | |
| (ExRx SRR #26) “Physical activity and incident depression: a meta-analysis of prospective cohort studies” | 64 | 0.000 | 0.000 | |
| (ExRx SRR #18) “A systematic review of cognitive effects of exercise in depression” | 50 | 0.024 | 0.022 | 1.09 |
| (ExRx SRR #20) “Efficacy of home-based non-pharmacological interventions for treating depression: a systematic review and network meta-analysis of randomised controlled trials” | 49 | 0.041 | 0.037 | 1.11 |
| (ExRx SRR #4) “Yoga for depression: a systematic review and meta-analysis” | | 0.043 | 0.041 | 1.05 |
| (ExRx SRR #27) “Can physical exercise modulate cortisol level in subjects with depression? a systematic review and meta-analysis” | 69 | 0.051 | 0.048 | 1.06 |
| (ExRx SRR #24) “Treating depression with physical activity in adolescents and young adults: a systematic review and meta-analysis of randomised controlled trials” | 51 | 0.067 | 0.057 | 1.18 |
| (ExRx SRR #25) “Association of efficacy of resistance exercise training with depressive symptoms: meta-analysis and meta-regression analysis of randomized clinical trials” | 62 | 0.070 | 0.059 | 1.19 |
| (ExRx SRR #13) “Exercise improves physical and psychological quality of life in people with depression: A meta-analysis including the evaluation of control group response” | 37 | 0.077 | 0.071 | 1.08 |
| (ExRx SRR #10) “Exercise and depressive symptoms in older adults: a systematic meta-analytic review” | 18 | 0.087 | 0.082 | 1.06 |
| (ExRx SRR #28) “Aerobic exercise for adult patients with major depressive disorder in mental health services: A systematic review and meta-analysis” | 57 | 0.103 | 0.087 | 1.18 |
| (ExRx SRR #1) “Physical activity in depressed elderly. a systematic review” | | 0.116 | 0.114 | 1.02 |
| (ExRx SRR #11) “Moderators of response in exercise treatment for depression: a systematic review” | 33 | 0.122 | 0.111 | 1.10 |
| (ExRx SRR #8) “Using exercise to fight depression in older adults” | 32 | 0.133 | 0.140 | 0.95 |
| (ExRx SRR #6) “Treating major depression with physical activity: a systematic overview with recommendations” | 21 | 0.171 | 0.159 | 1.08 |
| (ExRx SRR #15) “Exercise for depression in older adults: a meta-analysis of randomized controlled trials adjusting for publication bias” | 37 | 0.184 | 0.175 | 1.05 |
| (ExRx SRR #5) “Physical exercise intervention in depressive disorders: Meta-analysis and systematic review: Exercise intervention in depressive disorders” | −3 | 0.268 | 0.256 | 1.05 |
| (ExRx SRR #21) “Exercise for patients with major depression: a systematic review with meta-analysis and trial sequential analysis” | 60 | 0.317 | 0.213 | 1.49 |
| (ExRx SRR #16) “Exercise as a treatment for depression: A meta-analysis” | 29 | 0.415 | 0.409 | 1.01 |
| (ExRx SRR #12) “Exercise as a treatment for depression: A meta-analysis adjusting for publication bias” | 37 | 0.462 | 0.391 | 1.18 |
| (ExRx SRR #9) “Dropout from exercise randomized controlled trials among people with depression: a meta-analysis and meta regression” | 37 | 0.634 | 0.491 | 1.29 |

(b) Salt controversy SRR #12 “Reduced dietary salt for the prevention of cardiovascular disease” compared to

| Companion SRR No. and title14 | Search date interval (in months) | Adjusted JS | Regular JS | JS fold change (defined by Eq. 3) |
| --- | --- | --- | --- | --- |
| (Salt SRR #7) “High salt intake and stroke: meta-analysis of the epidemiologic evidence” | −16 | 0.000 | 0.000 | |
| (Salt SRR #10) “Low sodium versus normal sodium diets in systolic heart failure: systematic review and meta-analysis” | −13 | 0.000 | 0.000 | |
| (Salt SRR #14) “Daily sodium consumption and CVD mortality in the general population: systematic review and meta-analysis of prospective studies” | | 0.000 | 0.000 | |
| (Salt SRR #4) “Salt intake, stroke, and cardiovascular disease: meta-analysis of prospective studies” | −53 | 0.048 | 0.045 | 1.07 |
| (Salt SRR #13) “Compared with usual sodium intake, low- and excessive-sodium diets are associated with increased mortality: a meta-analysis” | | 0.056 | 0.056 | |
| (Salt SRR #11) “Sodium intake in populations: assessment of evidence” | −5 | 0.059 | 0.057 | 1.04 |
| (Salt SRR #8) “Effect of reduced sodium intake on cardiovascular disease, coronary heart disease and stroke” | −21 | 0.261 | 0.240 | 1.09 |
| (Salt SRR #9) “Effect of lower sodium intake on health: systematic review and meta-analyses” | −21 | 0.261 | 0.240 | 1.09 |
| (Salt SRR #1) “Systematic review of long term effects of advice to reduce dietary salt in adults” | −156 | 1.000 | 0.600 | 1.67 |
| (Salt SRR #2) “Reduced dietary salt for prevention of cardiovascular disease” | −156 | 1.000 | 0.600 | 1.67 |
| (Salt SRR #3) “Advice to reduce dietary salt for prevention of cardiovascular disease” | −156 | 1.000 | 0.600 | 1.67 |

In principle, adjusted JS should be at least as large as regular JS, but there is one anomaly, between ExRx SRR #2 and ExRx SRR #8 (adjusted JS = 0.133, regular JS = 0.140). The cause of this anomaly is that we use the publication date to anchor PSRs in the temporal sequence. However, an SRR can still include a PSR published after its last search date if the PSR was findable through a bibliographic database (e.g., already indexed as online first before its publication in a journal issue) or a clinical trial registry, resulting in an underestimation of adjusted JS.
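A check of this kind can be automated. The Python sketch below takes a few rows copied from Table 2, confirms that each reported JS fold change matches adjusted JS divided by regular JS within rounding, and flags rows where adjusted JS falls below regular JS. Reading the fold change as this ratio is an inference from the tabulated values, since Eq. 3 is not restated in this section; the function name and tolerance are hypothetical.

```python
from typing import List, Tuple

# (label, adjusted JS, regular JS, reported JS fold change), copied from Table 2.
Row = Tuple[str, float, float, float]

TABLE_2_SAMPLE: List[Row] = [
    ("ExRx SRR #18", 0.024, 0.022, 1.09),
    ("ExRx SRR #8", 0.133, 0.140, 0.95),
    ("ExRx SRR #21", 0.317, 0.213, 1.49),
    ("ExRx SRR #9", 0.634, 0.491, 1.29),
    ("Salt SRR #1", 1.000, 0.600, 1.67),
]


def check_rows(rows: List[Row], tolerance: float = 0.01) -> Tuple[List[str], List[str]]:
    """Return (rows whose fold change is not adjusted/regular, anomalous rows).

    A row is anomalous when adjusted JS is smaller than regular JS.
    The tolerance allows for rounding in the published values.
    """
    mismatches: List[str] = []
    anomalies: List[str] = []
    for label, adjusted, regular, reported_fold in rows:
        if abs(adjusted / regular - reported_fold) > tolerance:
            mismatches.append(label)
        if adjusted < regular:
            anomalies.append(label)
    return mismatches, anomalies


mismatches, anomalies = check_rows(TABLE_2_SAMPLE)
print("fold change mismatches:", mismatches)
print("adjusted JS below regular JS:", anomalies)
```

Rows with a regular JS of zero are left out of the sample because the ratio is undefined for them.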

5.2.1. Assessing ExRx SRR #2

5.2.1.1. Identifying potentially relevant evidence overlooked by ExRx SRR #2

By examining the first few rows of Table 2(a), we identify other ExRx SRRs that do not share any PSR with our target SRR, SRR #2, besides those we discovered in the visualization assessment step. SRR #19 is an intriguing case: It studies yoga for major depression but shares no PSR with SRR #2. Although SRR #2’s inclusion criteria neither mentioned yoga nor explicitly stated that PSRs studying yoga would be excluded, in practice SRR #2 filtered out several PSRs studying yoga (see “Characteristics of excluded studies” in Cooney et al. (2013)) but retained two PSRs studying laughter yoga. Due to these two PSRs on laughter yoga, SRR #2 has a minuscule but nonzero adjusted JS with SRR #4, which specifically studies yoga (adjusted JS = 0.043).

Both SRRs #14 and #17 concern the adolescent population and have zero adjusted JS and regular JS with SRR #2, indicating that our target SRR may have overlooked the adolescent population. This is indeed the case, because SRR #2 only studies the adult population.

5.2.1.2. Identifying potentially relevant evidence published after the search date of ExRx SRR #2

SRRs #9, #12, and #16 are the three ExRx companion SRRs with the highest adjusted JS with SRR #2. SRRs #2 and #9 have a high adjusted JS (0.634) and their JS fold change (1.29) indicates a considerable amount of new evidence. However, SRR #9 concerns the dropout rate from randomized controlled trials among people with depression, and therefore may not be of direct interest to users wanting to know whether exercise helps alleviate depression.

By contrast, the other two SRRs with high adjusted JS with our target SRR, SRRs #12 and #16, are both of particular interest. Their titles both contain the keywords “exercise” and “depression.” According to Table 2(a), SRR #12 included more new evidence (JS fold change = 1.18, search date interval = 37 months) than SRR #16 (JS fold change = 1.01, search date interval = 29 months). In addition to checking SRR #12 for new evidence, users should also consider comparing SRRs #12 and #16 to understand why SRR #16 does not include more new evidence as compared to SRR #12.

5.2.2. Assessing salt controversy SRR #12

5.2.2.1. Identifying potentially relevant evidence overlooked by salt controversy SRR #12

By examining the first few rows of Table 2(b), we identify salt controversy SRRs that do not share any PSR with SRR #12. SRR #12 has zero adjusted JS and regular JS with SRRs #7 and #14 due to the difference in study designs. SRR #12 does not have any overlap with SRR #10, the only faceting SRR in the salt controversy inclusion network, because “[d]oubts have been raised about the integrity of research from the Paterna group” (Adler et al., 2014), the group that produced SRR #10. Consequently, PSRs included in SRR #10 were not selected by SRR #12. In summary, we did not identify new potentially relevant evidence overlooked by SRR #12.

5.2.2.2. Identifying potentially relevant evidence published after the search date of salt controversy SRR #12

SRR #12 is relatively late among all salt controversy SRRs, as only two SRRs, SRRs #13 and #14, finished their search after SRR #12 (Table 2(b)). Their adjusted JS with SRR #12 are both low (0.056 for #13 and 0 for #14) and their JS fold changes are both 1 (i.e., no change). As we cannot find SRRs that (a) finished their search after the target SRR; (b) have a high adjusted JS with the target SRR; and (c) have a high JS fold change with the target SRR, we did not identify potentially relevant evidence published after the search date of the target SRR in this case.

5.3. Results from Step 3: Time Series Assessment

We designed the individual-level average adjusted JS to assess the stability of an SRR’s evidence selection standards: whether an SRR’s choice of evidence continued to be used by other SRRs, or whether the evidence it selected was no longer used by other SRRs. Figure 6 shows that ExRx SRR #2’s time series increased steadily from 2014 to 2016. Thus, despite the criticism it received (Clarke, 2019), SRR #2’s choice of evidence gained acceptance. The increase also took place after its publication, which adds to the significance of SRR #2: It may have influenced the evidence selection of later SRRs by increasing the exposure of the PSRs it selected. After 2016, however, SRR #2’s time series plunged, as did the time series of many other SRRs (Figure 6). The growing dissimilarity can also be observed in Figure 3, the visualization of the ExRx inclusion network. SRRs that finished their search in 2016 or later appear either in isolated clusters (SRR #26 and SRR #23) or on the periphery of the largest weakly connected component (SRRs #18, #19, #20, #21, #22, #24, #25, #27, and #28). Because SRR #2 is situated at the center of the largest weakly connected component, the distance among these 12 SRRs (SRR #2 and the 11 later SRRs) indicates the dissimilarity among them. The 11 SRRs that finished their search in 2016 or later appear to have made a concerted effort to select evidence that differs from that of our target SRR, and their selections also differ from each other. Our domain expert Caitlin Clarke suggested a plausible explanation for this diversification: Perhaps researchers determined that the overall topic (the relationship between physical activity and depression) was too broad and chose to narrow their research questions and redesign their inclusion criteria accordingly, resulting in disjoint sets of selected evidence.

Figure 6.

Time series of individual-level average adjusted Jaccard similarity for all SRRs in the ExRx inclusion network. We have omitted the “20” from the x-axis labels to save space and improve readability (e.g., “13” means “2013”). The color indicates whether the time point is before (purple) or after (blue) the publication date of the SRR whose time series is plotted.

Our target salt controversy SRR (SRR #12) shows a downward trend, along with other RCT-only SRRs (SRRs #1, #2, #3, #5, and #6) (Figure 7). Among the companion SRRs, the majority show upward trends (#4, #7, #11, #13, and #14). Two show downward trends (#8 and #9), and one is nearly flat (#10). The average adjusted JS decreased for SRRs that only include RCTs (all target SRRs) because emerging SRRs started to include observational studies. Meanwhile, the average adjusted JS increased among SRRs that include observational studies (most companion SRRs) because more peers with similar evidence selection criteria started to emerge.

Figure 7. Time series of individual-level average adjusted Jaccard similarity for each SRR in the salt controversy inclusion network. We have omitted the “20” from the x-axis labels to save space and improve readability (e.g., “04” means “2004”). The color indicates whether the time point is before (purple) or after (blue) the publication date of the SRR whose time series is plotted.

We also detected salt controversy SRR #4 as a potentially influential SRR by using the time series in Figure 7: The rise of the time series took place after its publication. It was the first among all salt controversy SRRs to include observational studies. Although it is impossible to know whether SRR #4’s argument in support of including observational studies influenced others, among the subsequent 10 SRRs that completed their search after SRR #4’s publication date (November 2009), 80% (8/10) cited SRR #4, indicating their awareness of SRR #4.

Table 3 summarizes all the risks we identified through the inclusion network approach. Evaluating risks is beyond the scope of the inclusion network approach itself. In general, decision-makers must evaluate the risks before deciding whether a given review is sufficient to use as the foundation of their action.

Table 3. A summary of risks identified

| Target SRR | Assessment step | Risk |
| --- | --- | --- |
| ExRx SRR #2 | Step 1. Visualization Assessment | ExRx SRR #2 did not include PSRs studying Baduanjin, which may be relevant to decision-makers. |
| | Step 2. Quantitative Similarity Assessment | 1. Although ExRx SRR #2 did not explicitly exclude PSRs studying yoga, it filtered out many PSRs studying yoga. 2. Users can examine ExRx SRR #16 for potentially relevant evidence published after the search date of SRR #2. |
| | Step 3. Time Series Assessment | Recent instability in evidence selection standards: The decreasing time series after 2016 indicates that ExRx #2’s standards are losing broader acceptance due to a recent diversification in evidence selection. |
| Salt controversy SRR #12 | Step 1. Visualization Assessment | Many other SRRs included observational studies solely or in combination with RCTs to evaluate the relationship between salt intake and all-cause mortality and cerebro-cardiovascular disease. |
| | Step 2. Quantitative Similarity Assessment | None. |
| | Step 3. Time Series Assessment | Recent instability in evidence selection standards: The time series is decreasing due to a recent trend for SRRs to include observational studies. |

For example, our salt controversy case study identified two risks in SRR #12: Many other SRRs included observational studies solely or in combination with RCTs, and including observational studies is a more recent trend. To investigate whether SRR #12 would be sufficient to use as the foundation of their action, decision-makers need to evaluate those two risks against the fact that SRR #12 only includes RCTs and the fact that observational studies are, in general, of lower quality than RCTs. One next step would be for decision-makers to assess the merits of observational studies for this topic.

6.1. A Comparison Between the Inclusion Network Approach and Claim-Specific Citation Network Analysis

In comparing the inclusion network approach and claim-specific citation network, we highlight several high-level similarities as well as four significant differences.

First, the similarities: Both aim to assess scientific knowledge. Both can trace error propagation: claim-specific citation network analysis by identifying citation distortion, and the inclusion network approach by making evident whether any SRRs have included problematic PSRs. Both rely on network representations and are therefore sensitive to community formation. In fact, a common risk identified by both approaches is the existence of unresolved disagreement manifested through these network structures. Through their claim-specific citation network, Trinquart et al. (2016) identified communities that are “distinct and disparate lines of scholarship” and that bear “little imprint of an ongoing controversy.” Using the inclusion network approach, we discovered community formation in the inclusion network (Figure 5), which represents disparate evidence selection standards.

Among the differences, first, in claim-specific citation network analysis, the entire network is about a single scientific claim, and the structure of the entire network is used to assess that claim. In the inclusion network approach, we assess only one systematic review (the target SRR) using the inclusion network of the target SRR and its companion SRRs. And the conclusions drawn by a systematic review are usually more nuanced and complex than a scientific claim.

Second, claim-specific citation network analysis connects the structure of the claim-specific citation network to violations of scientific objectivity: for example, biased representation of the available evidence, distorting the meaning of a claim from the source literature, or inventing a claim that was not contained in the source literature. These violations concern trust in scientific knowledge but mainly from the position of trained scientists. By contrast, we define risk inclusively as any information concerning a systematic review that can reduce a user’s willingness to use the systematic review as the foundation of their action. The judgment of risk is left to the users of the inclusion network approach because we, as designers, will never have all the knowledge pertinent to the assessment.

Third, the inclusion network approach is more analyst driven than claim-specific citation network analysis, a characteristic that we demonstrated through the two case studies above. It involves a process of sensemaking of the network, particularly in the first step, Visualization Assessment, which relies on the analyst’s attention and judgment.

Fourth, because of the specific meanings encoded in its edges, the inclusion network is better positioned than the claim-specific citation network to study evidence selection standards. In both networks, an edge represents that a citing paper’s authors selected the cited paper. In general, authors’ decisions regarding what to cite are not explicitly documented; but in the special cases of systematic reviews and other publications that support decision-making, the selection criteria are explicitly stated in a publication’s inclusion criteria. Because it is not restricted to particular types of scientific publications, a claim-specific citation network provides a bird’s-eye view of a scientific belief system and is better positioned to study the overall establishment process of a scientific belief. In the inclusion network, by contrast, the edges connecting an SRR and its included PSRs are a consequence of explicitly stated evidence selection standards. This enables measuring the stability of the evidence selection standards of a target SRR using the individual-level average adjusted JS time series, as we demonstrated in Section 5.3. It also enables community detection on an inclusion network to highlight disparate evidence selection standards, as shown in Figure 5. In general, the consistent correspondence between the existence of an edge and a set of documented and traceable evidence selection standards enables quantitative analysis of evidence selection standards based on the structure of the inclusion network.
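As an illustration of how network structure follows directly from inclusion edges, the sketch below stores a hypothetical inclusion network as (SRR, PSR) pairs and finds its weakly connected components, the structure used to describe Figure 3 (see footnote 11). It is a simple breadth-first search, not a community detection algorithm, and all identifiers and data are made up for illustration.

```python
from collections import defaultdict, deque


def weakly_connected_components(edges):
    """Group nodes that are connected by some path when edge direction is ignored."""
    neighbors = defaultdict(set)
    for srr, psr in edges:
        # Each edge means "this SRR included this PSR"; direction is dropped here.
        neighbors[srr].add(psr)
        neighbors[psr].add(srr)

    seen = set()
    components = []
    for start in sorted(neighbors):
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        component = set()
        while queue:
            node = queue.popleft()
            component.add(node)
            for nxt in neighbors[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        components.append(component)
    # Largest component first.
    return sorted(components, key=len, reverse=True)


# Hypothetical inclusion edges: SRR_A and SRR_B share psr2; SRR_C shares nothing.
edges = [
    ("SRR_A", "psr1"),
    ("SRR_A", "psr2"),
    ("SRR_B", "psr2"),
    ("SRR_C", "psr3"),
]
components = weakly_connected_components(edges)
```

Here SRR_C ends up in its own component because it shares no included PSR with the other two SRRs, which is the toy analogue of an SRR appearing in an isolated cluster.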

6.2. Comparison With Other Related Work

Parallel to the temporal analysis by Shwed and Bearman (2010), we used the inclusion network to study the stability of the evidence selection standards of the target SRRs through the individual-level average adjusted JS time series. We must highlight that this time series analysis of evidence selection standards was made possible by the explicit evidence selection procedures and reporting standards of systematic reviews.

6.3. Limitations of This Work and Related Future Work

The inclusion network spotlights the choice of evidence; however, it does not consider many other factors that can influence the conclusions drawn by a systematic review, such as data extraction and statistical methods (Bolland & Grey, 2014) or the handling of heterogeneity and risk of bias assessments (Osnabrugge et al., 2015). This is a fundamental limitation of the approach. Moreover, inferring a causal relation between the evidence selected and the conclusion drawn by an SRR must proceed with care.

Choosing appropriate companion SRRs is of paramount importance. We have yet to develop a sound methodology for choosing appropriate companion SRRs. Currently, users can collect the SRRs using systematic search as Trinquart et al. (2016) did for their data set or as Clarke did for her data set (Clarke, 2019; Clarke et al., 2023) and design their search queries based on the target SRR’s research question. However, without experimenting with various search strategies to compare their outcomes, we will not know whether systematic search is the most appropriate for the inclusion network approach. Such experimentation will be a part of the future work. Furthermore, we hypothesize that the search strategy for appropriate companion SRRs should also deliberately investigate certain known risks (Cooper, 1982; Felson, 1992), which brings us to the next limitation of the current work.

We carried out the current work without a clear idea of what risks may exist, which made the process more explorative but less systematic. In the future, we plan to seek advice from decision-makers and other users of systematic reviews and other evidence synthesis literature about which risks they find most concerning. We will develop a taxonomy of risks to guide the improvement of our approach. This taxonomy will incorporate user feedback and existing taxonomies, such as the taxonomy of bias by Felson (1992). Furthermore, we will invite decision-makers and other users of systematic review literature to test our approach; we will use their feedback to improve our approach.

We focused solely on SRRs in this study, but a study of PSRs using the inclusion network construct is also a promising direction. For example, we could study PSR utilization using the construct, following Leng’s work on research underutilization (Leng, 2018). Because many primary studies require tremendous resources to conduct, it would be extremely valuable to know how they are utilized. An analysis of PSR utilization deserves a publication of its own, as the analytical methods must be carefully designed from the perspective of PSRs to fit specific analytical purposes (e.g., detecting changes in utilization). At this point, we have presented a metric called the PSR-utilization rate in Supplementary material Section 5.

Although the publication date is easy to obtain, using the publication date to rank PSRs in the temporal sequence can be imperfect because an SRR can still include a PSR published after its last search date if the PSR was findable through a bibliographic database (e.g., already indexed as online first before its publication in a journal issue) or a clinical trial registry, resulting in an underestimation of adjusted JS. In the future, we may refine how a temporal sequence is constructed by systematically researching the findability of a PSR before its publication date and identifying a more precise way to obtain the rank value for PSRs.

6.4. Other Future Work

The most important future work is at the data infrastructure level: indexing included PSRs separately, which requires community effort. Currently, we must manually extract included PSRs, assign them unique identifiers, and curate the inclusion relationship to form an inclusion network, which is neither scalable nor timely. Databases of scientific evidence and/or their synthesis, such as the Collaboration for Environmental Evidence Database of Evidence Reviews (CEEDER) and Meta-Analytic Psychotherapy Research (Metapsy), can preserve the record of inclusion relationships and make them retrievable for users. Over time, such efforts will yield large-scale inclusion networks that can be retrieved and analyzed like today’s regular citation data, as an important data source to improve the usage of scientific evidence.

We designed and tested the inclusion network approach to help decision-makers assess risks in systematic reviews they intend to use as the foundation of their action. Our approach stemmed from two lines of research: reliability issues in systematic reviews and the use of network analysis to assess scientific knowledge. It is indebted to but also significantly different from the influential claim-specific citation network approach.

We started the design process by specifying four design principles. Our design principles led us to design the data structure (the inclusion network), the network metric (adjusted Jaccard similarity), and a three-step assessment process.

We presented two case studies. Potential users should learn from our case studies how they may explore the inclusion network, move back and forth between the network and the content of the literature, and develop their own understanding about the SRRs and PSRs. For our case studies, we prepared two inclusion network data sets, each based on data initially collected by researchers interested in studying systematic reviews addressing a given topic. We used these two data sets as two case studies and assessed two SRRs: ExRx SRR #2 and salt controversy SRR #12. Risks we identified include missing potentially relevant evidence (e.g., Baduanjin and yoga in ExRx SRR #2), epistemic division in the scientific community (e.g., disagreement regarding the most appropriate study design(s) to include among salt controversy SRRs), and recent instability in evidence selection standards.

Given the potentially momentous consequences of decision-makers applying our inclusion network approach to their work, we strived for an ethical, inclusive, and transparent design that respects the agency and knowledge of the decision-makers. We hope this paper provides an example for future quantitative science researchers who want to design methods for diverse users. The inclusion network approach still needs improvement, and in the near future, we hope it can increase confidence in the applicability of evidence synthesis products such as systematic reviews or, by contrast, can rationalize the commissioning of up-to-date or more expansive syntheses when the risks are deemed unacceptable.

Tzu-Kun (Esther) Hsiao coauthored our previous paper on the salt controversy and gave useful feedback on this manuscript. Thanks to George Chacko, Jana Diesner, Ly Dinh, David Hopping, David Johns, Jay Patel, Zoi Rapti, Malik Salami, and Linda Smith for comments and to Trinquart et al. for data sharing. Thanks to Maria Gillombardo for introducing CC and JS. Thanks to the anonymous reviewers of Quantitative Science Studies for their extremely helpful comments. Thanks to Randi Proescholdt for providing the Multi-Tagger classification outputs used to cross-check the study design classifications shown in Figure 5.

Mark Van Moer: Software, Visualization, Writing—review & editing. Caitlin Vitosky Clarke: Data curation, Resources, Writing—original draft, Writing—review & editing. Jodi Schneider: Conceptualization, Data curation, Funding acquisition, Methodology, Supervision, Writing—original draft, Writing—review & editing. Yuanxi Fu: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Software, Writing—original draft, Writing—review & editing. Natalie Lischwe Mueller: Data curation. Manasi Joshi: Data curation. Tzu-Kun Hsiao: Data curation, Methodology, Visualization.

The authors have no competing interests.

This research was supported by NSF 2046454 CAREER: Using network analysis to assess confidence in research synthesis and by the Campus Research Board of the University of Illinois at Urbana-Champaign Grant RB21012. MVM’s participation was supported by the University of Illinois Research Software Collaborative Service. Multi-Tagger was funded by NIH/NLM R01LM010817.

Two data sets were used in this work: ExRx (Clarke et al., 2023) and salt controversy (Fu et al., 2023).

1

Subject to reviewers’ definitions and inclusion criteria. On the challenges in comprehensive search, see, for example, Delaney and Tamás (2018).

2

Leng (2018)’s adaptation was to group publications (RCT reports) that reported the same RCT study into a single unit, a typical approach in evidence synthesis. Further, Leng only considered citations between reviews and RCTs, excluding citations between reviews.

3

For example, a controversy mapping analyst would not judge the claim studied in Greenberg (2009) as “unfounded.” They would analyze the text to see the matter from the perspectives of the researchers who supported the claim.

4

“Indeed, as suspect as this may sound, controversies mapping entails no conceptual assumptions and requires no methodological protocols. There are no definitions to learn; no premises to honor; no hypothesis to demonstrate; no procedure to follow; no correlations to establish.” (Venturini, 2010).

5

It does not matter whether u_i has completed its search at the time point t because the adjusted JS only considers PSRs available to u_i and the other SRR at the time point t and what u_i and the other SRR both ended up choosing.

6

SRRs are numbered 1–6 and 8–28. We excluded SRR #7 because it was not an SRR that synthesized primary studies but an umbrella review that synthesized SRRs.

7

The difference between depression and depressive symptoms is still a debate within psychology and psychiatry, but the field of kinesiology/exercise science uses them interchangeably.

8

See file “systematic_review_inclusion_criteria.csv,” in our data set (Clarke et al., 2023) for an analysis of the research questions and inclusion criteria of the 27 ExRx SRRs.

9

Dual publication of a single systematic review—one as a report and one as a scientific journal article with mention or citation of each other—is normal, because journal publications are shorter and more streamlined, with the potential to reach a larger audience.

10

For a detailed mathematical proof, see Supplementary material Section 6.

11

In a weakly connected component, all vertices are connected to each other by some path without considering edge directionality.

12

Greenberg (2011) uses vertex betweenness centrality. The edge betweenness community detection algorithm (Girvan-Newman algorithm) uses edge betweenness. Vertex betweenness and edge betweenness are analogous concepts but not the same.

13

A bibliography of all SRRs can be found in Supplementary material Section 4.

14

Companion SRRs are ordered by their adjusted JS in ascending order.

Adler
,
A. J.
,
Taylor
,
F.
,
Martin
,
N.
,
Gottlieb
,
S.
,
Taylor
,
R. S.
, &
Ebrahim
,
S.
(
2014
).
Reduced dietary salt for the prevention of cardiovascular disease
.
Cochrane Database of Systematic Reviews
,
2014
(
12
),
CD009217
. ,
[PubMed]
Bastian
,
H.
,
Glasziou
,
P.
, &
Chalmers
,
I.
(
2010
).
Seventy-five trials and eleven systematic reviews a day: How will we ever keep up?
PLOS Medicine
,
7
(
9
),
e1000326
. ,
[PubMed]
Bayer
,
R.
,
Johns
,
D. M.
, &
Galea
,
S.
(
2012
).
Salt and public health: Contested science and the challenge of evidence-based decision making
.
Health Affairs
,
31
(
12
),
2738
2746
. ,
[PubMed]
Bolland
,
M. J.
, &
Grey
,
A.
(
2014
).
A case study of discordant overlapping meta-analyses: Vitamin D supplements and fracture
.
PLOS ONE
,
9
(
12
),
e115934
. ,
[PubMed]
Caspersen
,
C. J.
,
Powell
,
K. E.
, &
Christenson
,
G. M.
(
1985
).
Physical activity, exercise, and physical fitness: Definitions and distinctions for health-related research
.
Public Health Reports
,
100
(
2
),
126
131
.
[PubMed]
Cetina
,
K. K.
(
1999
).
Epistemic cultures: How the sciences make knowledge
.
Cambridge, MA
:
Harvard University Press
.
Chalmers
,
I.
,
Hedges
,
L. V.
, &
Cooper
,
H.
(
2002
).
A brief history of research synthesis
.
Evaluation & the Health Professions
,
25
(
1
),
12
37
. ,
[PubMed]
Clarke
,
C. V.
(
2019
).
Exercise science depression studies: A cultural, interpretive, and science studies perspective
[
Dissertation
,
University of Illinois at Urbana-Champaign
]. https://hdl.handle.net/2142/105193
Clarke
,
C. V.
, &
Adamson
,
B. C.
(
2023
).
A syndemics approach to exercise is medicine
.
Health
,
27
(
3
),
323
344
. ,
[PubMed]
Clarke
,
C. V.
,
Lischwe Mueller
,
N.
,
Joshi
,
M. B.
,
Fu
,
Y.
, &
Schneider
,
J.
(
2023
).
The inclusion network of 27 review articles published between 2013–2018 investigating the relationship between physical activity and depressive symptoms
[Dataset]
.
University of Illinois at Urbana-Champaign Databank
.
Coarasa
,
J.
,
Das
,
J.
,
Gummerson
,
E.
, &
Bitton
,
A.
(
2017
).
A systematic tale of two differing reviews: Evaluating the evidence on public and private sector quality of primary care in low and middle income countries
.
Globalization and Health
,
13
(
1
),
24
. ,
[PubMed]
Cooney
,
G. M.
,
Dwan
,
K.
,
Greig
,
C. A.
,
Lawlor
,
D. A.
,
Rimer
,
J.
, …
Mead
,
G. E.
(
2013
).
Exercise for depression
.
Cochrane Database of Systematic Reviews
,
2013
(
9
),
CD004366
. ,
[PubMed]
Cooper
,
H.
(
1982
).
Scientific guidelines for conducting integrative research reviews
.
Review of Educational Research
,
52
(
2
),
291
302
.
Cooper
,
H.
,
Hedges
,
L. V.
, &
Valentine
,
J. C.
(Eds.). (
2019
).
The handbook of research synthesis and meta-analysis
(3rd ed.).
New York, NY
:
Russell Sage Foundation
.
Cramer
,
H.
,
Anheyer
,
D.
,
Lauche
,
R.
, &
Dobos
,
G.
(
2017
).
A systematic review of yoga for major depressive disorder
.
Journal of Affective Disorders
,
213
,
70
77
. ,
[PubMed]
Créquit
,
P.
,
Trinquart
,
L.
,
Yavchitz
,
A.
, &
Ravaud
,
P.
(
2016
).
Wasted research when systematic reviews fail to provide a complete and up-to-date evidence synthesis: The example of lung cancer
.
BMC Medicine
,
14
,
8
. ,
[PubMed]
Csardi
,
G.
, &
Nepusz
,
T.
(
2006
).
The iGraph software package for complex network research
.
InterJournal, Complex Systems
,
1695
(
5
),
1
9
.
Delaney
,
A.
, &
Tamás
,
P. A.
(
2018
).
Searching for evidence or approval? A commentary on database search in systematic reviews and alternative information retrieval methodologies
.
Research Synthesis Methods
,
9
(
1
),
124
131
. ,
[PubMed]
Duyx
,
B.
,
Urlings
,
M. J. E.
,
Swaen
,
G. M. H.
,
Bouter
,
L. M.
, &
Zeegers
,
M. P.
(
2017a
).
Scientific citations favor positive results: A systematic review and meta-analysis
.
Journal of Clinical Epidemiology
,
88
,
92
101
. ,
[PubMed]
Duyx
,
B.
,
Urlings
,
M. J. E.
,
Swaen
,
G. M. H.
,
Bouter
,
L. M.
, &
Zeegers
,
M. P.
(
2017b
).
Selective citation in the literature on swimming in chlorinated water and childhood asthma: A network analysis
.
Research Integrity and Peer Review
,
2
,
17
. ,
[PubMed]
Duyx
,
B.
,
Urlings
,
M. J. E.
,
Swaen
,
G. M. H.
,
Bouter
,
L. M.
, &
Zeegers
,
M. P.
(
2019
).
Selective citation in the literature on the hygiene hypothesis: A citation analysis on the association between infections and rhinitis
.
BMJ Open
,
9
(
2
),
e026518
. ,
[PubMed]
Duyx
,
B.
,
Urlings
,
M. J. E.
,
Swaen
,
G. M. H.
,
Bouter
,
L. M.
, &
Zeegers
,
M. P.
(
2020
).
Determinants of citation in the literature on diesel exhaust exposure and lung cancer: A citation analysis
.
BMJ Open
,
10
(
10
),
e033967
. ,
[PubMed]
Felson
,
D. T.
(
1992
).
Bias in meta-analytic research
.
Journal of Clinical Epidemiology
,
45
(
8
),
885
892
. ,
[PubMed]
Fu
,
Y.
(
2023
).
Code for the inclusion network manuscript
(1.0.0) [R]
. https://github.com/infoqualitylab/code_for_the_inclusion_net_manuscript
Fu
,
Y.
,
Hsiao
,
T.-K.
,
Joshi
,
M. B.
, &
Lischwe Mueller
,
N.
(
2023
).
The salt controversy systematic review reports and primary study reports network dataset
[Dataset]
.
University of Illinois at Urbana-Champaign
.
Gøtzsche
,
P. C.
(
1994
).
Steroids and peptic ulcer: An end to the controversy?
Journal of Internal Medicine
,
236
(
6
),
599
601
. ,
[PubMed]
Greenberg
,
S. A.
(
2009
).
How citation distortions create unfounded authority: Analysis of a citation network
.
BMJ
,
339
,
b2680
. ,
[PubMed]
Greenberg
,
S. A.
(
2011
).
Understanding belief using citation networks
.
Journal of Evaluation in Clinical Practice
,
17
(
2
),
389
393
. ,
[PubMed]
Gurevitch
,
J.
,
Koricheva
,
J.
,
Nakagawa
,
S.
, &
Stewart
,
G.
(
2018
).
Meta-analysis and the science of research synthesis
.
Nature
,
555
(
7695
),
175
182
. ,
[PubMed]
Hacke
,
C.
, &
Nunan
,
D.
(
2020
).
Discrepancies in meta-analyses answering the same clinical question were hard to explain: A meta-epidemiological study
.
Journal of Clinical Epidemiology
,
119
,
47
56
. ,
[PubMed]
Haddaway
,
N. R.
,
Macura
,
B.
,
Whaley
,
P.
, &
Pullin
,
A. S.
(
2018
).
ROSES RepOrting standards for Systematic Evidence Syntheses: Pro forma, flow-diagram and descriptive summary of the plan and conduct of environmental systematic reviews and systematic maps
.
Environmental Evidence
,
7
,
7
.
Hsiao
,
T.-K.
,
Fu
,
Y.
, &
Schneider
,
J.
(
2020
).
Visualizing evidence-based disagreement over time: The landscape of a public health controversy 2002–2014
.
Proceedings of the Association for Information Science and Technology
,
57
(
1
),
e315
. ,
[PubMed]
Ioannidis
,
J. P. A.
(
2016
).
The mass production of redundant, misleading, and conflicted systematic reviews and meta-analyses
.
The Milbank Quarterly
,
94
(
3
),
485
514
. ,
[PubMed]
Jacomy
,
M.
(
2020
).
Epistemic clashes in network science: Mapping the tensions between idiographic and nomothetic subcultures
.
Big Data & Society
,
7
(
2
).
Jadad
,
A. R.
,
Cook
,
D. J.
, &
Browman
,
G. P.
(
1997
).
A guide to interpreting discordant systematic reviews
.
Canadian Medical Association Journal
,
156
(
10
),
1411
1416
.
[PubMed]
Katrak
,
P.
,
Bialocerkowski
,
A. E.
,
Massy-Westropp
,
N.
,
Kumar
,
V. S.
, &
Grimmer
,
K. A.
(
2004
).
A systematic review of the content of critical appraisal tools
.
BMC Medical Research Methodology
,
4
,
22
. ,
[PubMed]
Khamis
,
A. M.
,
El Moheb
,
M.
,
Nicolas
,
J.
,
Iskandarani
,
G.
,
Refaat
,
M. M.
, &
Akl
,
E. A.
(
2019
).
Several reasons explained the variation in the results of 22 meta-analyses addressing the same question
.
Journal of Clinical Epidemiology
,
113
,
147
158
. ,
[PubMed]
Kitchenham
,
B. A.
,
Budgen
,
D.
, &
Brereton
,
P.
(
2015
).
Evidence-based software engineering and systematic reviews
.
Boca Raton, FL
:
CRC Press
.
Kuhn
,
T.
(
2012
).
The structure of scientific revolutions
(4th ed.).
Chicago, IL
:
University of Chicago Press
.
Lefebvre
,
C.
,
Glanville
,
J.
,
Featherstone
,
R.
,
Littlewood
,
A.
,
Marshall
,
C.
, …
Wieland
,
L.
(
2022
).
Chapter 4: Searching for and selecting studies
. In
J.
Higgins
,
J.
Thomas
,
J.
Chandler
,
M.
Cumpston
,
T.
Li
, …
V.
Welch
(Eds.),
Cochrane handbook for systematic reviews of interventions version 6.3
(updated February 2022)
.
Cochrane
. https://training.cochrane.org/handbook/current/chapter-04
Leng
,
R. I.
(
2018
).
A network analysis of the propagation of evidence regarding the effectiveness of fat-controlled diets in the secondary prevention of coronary heart disease (CHD): Selective citation in reviews
.
PLOS ONE
,
13
(
5
),
e0197716
. ,
[PubMed]
Low
,
J.
,
Ross
,
J. S.
,
Ritchie
,
J. D.
,
Gross
,
C. P.
,
Lehman
,
R.
, …
Krumholz
,
H. M.
(
2017
).
Comparison of two independent systematic reviews of trials of recombinant human bone morphogenetic protein-2 (rhBMP-2): The Yale Open Data Access Medtronic Project
.
Systematic Reviews
,
6
(
1
),
28
. ,
[PubMed]
Lucenteforte
,
E.
,
Moja
,
L.
,
Pecoraro
,
V.
,
Conti
,
A. A.
,
Conti
,
A.
, …
Virgili
,
G.
(
2015
).
Discordances originated by multiple meta-analyses on interventions for myocardial infarction: A systematic review
.
Journal of Clinical Epidemiology
,
68
(
3
),
246
256
. ,
[PubMed]
Nepusz
,
T.
, &
Csardi
,
G.
(
2022
).
Detecting community structure
.
R iGraph manual pages
. https://igraph.org/c/doc/igraph-Community.html
Newman
,
M.
(
2018
).
Homophily and assortative mixing
. In
Networks
(2nd ed.) (pp.
201
211
).
Oxford
:
Oxford University Press
.
Okoli
,
C.
(
2015
).
A guide to conducting a standalone systematic literature review
.
Communications of the Association for Information Systems
,
37
,
879
910
;
Article 43
.
Osnabrugge
,
R. L.
,
Head
,
S. J.
,
Zijlstra
,
F.
,
ten Berg
,
J. M.
,
Hunink
,
M. G.
, …
Janssens
,
A. C. J. W.
(
2015
).
A systematic review and critical assessment of 11 discordant meta-analyses on reduced-function CYP2C19 genotype and risk of adverse clinical outcomes in clopidogrel users
.
Genetics in Medicine
,
17
(
1
),
3
11
. ,
[PubMed]
Page
,
M. J.
,
McKenzie
,
J. E.
,
Bossuyt
,
P. M.
,
Boutron
,
I.
,
Hoffmann
,
T. C.
, …
Moher
,
D.
(
2021a
).
The PRISMA 2020 statement: An updated guideline for reporting systematic reviews
.
BMJ
,
372
,
n71
. ,
[PubMed]
Page
,
M. J.
,
Moher
,
D.
,
Fidler
,
F. M.
,
Higgins
,
J. P. T.
,
Brennan
,
S. E.
, …
McKenzie
,
J. E.
(
2021b
).
The REPRISE project: Protocol for an evaluation of REProducibility and Replicability In Syntheses of Evidence
.
Systematic Reviews
,
10
(
1
),
112
. ,
[PubMed]
Page
,
M. J.
,
Shamseer
,
L.
, &
Tricco
,
A. C.
(
2018
).
Registration of systematic reviews in PROSPERO: 30,000 records and counting
.
Systematic Reviews
,
7
(
1
),
32
. ,
[PubMed]
Papatheodorou
,
S. I.
, &
Evangelou
,
E.
(
2022
).
Umbrella reviews: What they are and why we need them
. In
E.
Evangelou
&
A. A.
Veroniki
(Eds.),
Meta-research: Methods and protocols
(pp.
135
146
).
New York, NY
:
Springer US
. ,
[PubMed]
Pedersen
,
T. L.
(
2022
).
ggraph: An implementation of grammar of graphics for graphs and networks
(2.1.0) [Computer software]
. https://ggraph.data-imaginist.com, https://github.com/thomasp85/ggraph
Pedersen, T. L. (2023). Tidygraph: A tidy API for graph manipulation (1.2.3) [Computer software]. https://tidygraph.data-imaginist.com, https://github.com/thomasp85/tidygraph
Pérez-Bracchiglione, J., Meza, N., Bangdiwala, S. I., Niño de Guzmán, E., Urrútia, G., … Madrid, E. (2022). Graphical Representation of Overlap for OVErviews: GROOVE tool. Research Synthesis Methods, 13(3), 381–388.
Rethlefsen, M. L., Kirtley, S., Waffenschmidt, S., Ayala, A. P., Moher, D., … PRISMA-S Group. (2021). PRISMA-S: An extension to the PRISMA statement for reporting literature searches in systematic reviews. Systematic Reviews, 10(1), 39.
Robins, G., Pattison, P., Kalish, Y., & Lusher, D. (2007). An introduction to exponential random graph (p*) models for social networks. Social Networks, 29(2), 173–191.
Shwed, U., & Bearman, P. S. (2010). The temporal structure of scientific consensus formation. American Sociological Review, 75(6), 817–840.
Siontis, K. C., Hernandez-Boussard, T., & Ioannidis, J. P. A. (2013). Overlapping meta-analyses on the same topic: Survey of published studies. BMJ, 347, f4501.
Strazzullo, P., D’Elia, L., Kandala, N.-B., & Cappuccio, F. P. (2009). Salt intake, stroke, and cardiovascular disease: Meta-analysis of prospective studies. BMJ, 339, b4567.
Taylor, R. S., Ashton, K. E., Moxham, T., Hooper, L., & Ebrahim, S. (2011a). Reduced dietary salt for the prevention of cardiovascular disease. Cochrane Database of Systematic Reviews, 7, CD009217.
Taylor, R. S., Ashton, K. E., Moxham, T., Hooper, L., & Ebrahim, S. (2011b). Reduced dietary salt for the prevention of cardiovascular disease: A meta-analysis of randomized controlled trials (Cochrane Review). American Journal of Hypertension, 24(8), 843–853.
Trinquart, L., Johns, D. M., & Galea, S. (2016). Why do we think we know what we know? A metaknowledge analysis of the salt controversy. International Journal of Epidemiology, 45(1), 251–260.
Urlings, M. J. E., Duyx, B., Swaen, G. M. H., Bouter, L. M., & Zeegers, M. P. A. (2019a). Selective citation in scientific literature on the human health effects of bisphenol A. Research Integrity and Peer Review, 4, 6.
Urlings, M. J. E., Duyx, B., Swaen, G. M. H., Bouter, L. M., & Zeegers, M. P. A. (2019b). Citation bias in the literature on dietary trans fatty acids and serum cholesterol. Journal of Clinical Epidemiology, 106, 88–97.
Urlings, M. J. E., Duyx, B., Swaen, G. M. H., Bouter, L. M., & Zeegers, M. P. A. (2020). Determinants of citation in epidemiological studies on phthalates: A citation analysis. Science and Engineering Ethics, 26(6), 3053–3067.
Urlings, M. J. E., Duyx, B., Swaen, G. M. H., Bouter, L. M., & Zeegers, M. P. A. (2021). Citation bias and other determinants of citation in biomedical research: Findings from six citation networks. Journal of Clinical Epidemiology, 132, 71–78.
Useem, J., Brennan, A., LaValley, M., Vickery, M., Ameli, O., … Gill, C. J. (2015). Systematic differences between Cochrane and non-Cochrane meta-analyses on the same topic: A matched pair analysis. PLOS ONE, 10(12), e0144980.
Venturini, T. (2010). Diving in magma: How to explore controversies with actor-network theory. Public Understanding of Science, 19(3), 258–273.
Venturini, T., & Munk, A. K. (2022). Controversy mapping: A field guide. Cambridge: Polity.

Author notes

Handling Editor: Vincent Larivière

This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. For a full description of the license, please visit https://creativecommons.org/licenses/by/4.0/legalcode.

Supplementary data