## Abstract

Visibility algorithms are a family of methods that map time series into graphs, such that the tools of graph theory and network science can be used for the characterization of time series. This approach has proved a convenient tool, and visibility graphs have found applications across several disciplines. Recently, an approach has been proposed to extend this framework to multivariate time series, allowing a novel way to describe collective dynamics. Here we test their application to fMRI time series, following two main motivations, namely that (a) this approach allows vs to simultaneously capture and process relevant aspects of both local and global dynamics in an easy and intuitive way, and (b) this provides a suggestive bridge between time series and network theory that nicely fits the consolidating field of network neuroscience. Our application to a large open dataset reveals differences in the similarities of temporal networks (and thus in correlated dynamics) across resting-state networks, and gives indications that some differences in brain activity connected to psychiatric disorders could be picked up by this approach.

## Author Summary

Here we present the first application of multivariate visibility graphs to fMRI data. Visibility graphs are a way to represent a time series as a temporal network, evidencing specific aspects of its dynamics, such as extreme events. Multivariate time series, as those encountered in neuroscience, and in fMRI in particular, can be seen as a multiplex network, in which each layer represents a time series (a region of interest in the brain in our case). Here we report the method, we describe some relevant aspects of its application to BOLD time series, and we discuss the analogies and differences with existing methods. Finally, we present an application to a high-quality, publicly available dataset, containing healthy subjects and psychotic patients, and we discuss our findings. All the code to reproduce the analyses and the figures is publicly available.

Visibility graphs were recently introduced as a method to map time series into networks (Lacasa, Luque, Ballesteros, Luque, & Nuño, 2008; Luque, Lacasa, Ballesteros, & Luque, 2009), with the aims of using the tools of network science (Boccaletti et al., 2014; Newman, 2010) to describe the structure of time series and their underlying dynamics. This strategy of transforming time series into graphs has been exploited in recent years by some authors and several alternative methods have been put forward, contributing to the nascentfield of performing graph theoretical time series analysis (see Donner, Zou, Donges, Marwan, & Kurths, 2010; Xu, Zhang, & Small, 2008; Zhang & Small, 2006, for a few seminal examples and Gao, Small, & Kurths, 2016 and references therein for a recent overview). Research on visibility graphs has since then focused essentially on two separated avenues. First, analytic studies have primarily explored the foundations of this mapping (Gutin, Mansour, & Severini, 2011; Iacovacci & Lacasa, 2016; Lacasa, 2016; Luque & Lacasa, n.d.) and elaborated on mathematical methods (Lacasa, 2014) to extract rigorous results on the topology of visibility graphs associated to canonical dynamics such as stochastic or chaotic processes (Brú, Gómez-Castro, & Nuño, 2017; Gonçalves, Carpi, Rosso, & Ravetti, 2016; Lacasa, Luque, Luque, & Nuño, 2009; Luque, Ballesteros, Núñez, & Robledo, 2013; Luque, Lacasa, Ballesteros & Robledo, 2011) and to obtain combinatoric analogues of different dynamical quantities (Lacasa, Nuñez, Roldán, Parrondo, & Luque, 2012). The second avenue deals with applications of this machinery, primarily by using this method as a feature extraction procedure with which to build feature vectors that can properly characterize time series with the purpose of making statistical learning (see Bhaduri & Ghosh, 2016; Hou, Li, Wang, & Yan, 2016; Long, Fonseca, Aarts, Haakma, & Foussier, 2014;
Shao, 2010, for a few examples in the life sciences). A visibility graph is a network in which the nodes are time points, and the links are defined according to the visibility criteria described in the text. In this latter context, the application to neuroscience is in its infancy and has been essentially limited so far to the analysis of electroencephalogram (EEG) data (see Ahmadlou, Adeli, & Adeli, 2010, 2012; Ahmadlou, Ahmadi, Rezazade, & Azad-Marzabadi, 2013; Bhaduri & Ghosh, 2016; Mira-Iglesias, Conejero, & Navarro-Pardo, 2016 for a few examples). The study of fMRI recordings under this lens has been scarce, and in this work we would like to motivate and justify why we think this is a promising enterprise, both from a
univariate and—perhaps more interestingly— from a multivariate time series perspective (Lacasa, Nicosia, & Latora, 2015). Multilayer networks incorporate multiple types of interactions between the same nodes. This means that multivariate time series can be represented in a multilayer visibility graph. Among other strategies to map time series intro graphs, using the repertoire of visibility graphs is particularly interesting, not just because its current application is scarce, but also because these methods are well suited to handle the specificities of fMRI data. More concretely, these methods have been shown to be efficient in extracting information and dealing with (a) data polluted with noise (Lacasa et al., 2009), (b) multivariate (Nicosia & Latora, 2015), and (c) non stationary time
series (Luque et al., 2009). In order to showcase the usefulness of visibility graphs in neuroscience we will choose a biggish, high-quality public dataset of resting-state fMRI data (Poldrack et al., 2016), and will make use of the family of visibility algorithms to build a multilevel graph of temporal networks, where each node represents a time point, and two nodes are connected if they are *visible* to each other, according to the algorithm explained below. In the case of multivariate time series—as the ones acquired in neuroimaging—each of these networks is actually the layer of a *multiplex* network (usually associated with a recording in a different region of interest (ROI). Being able to integrate all the data in a single structure enables both the intralayer (univariate) and the interlayer (multivariate) analysis simultaneously. We will show
that a direct analysis of this network provides genuine and nontrivial information on fMRI data, potentially including the description and possible noninvasive classification of some brain diseases.

## MATERIALS AND METHODS

### fMRI data

We used the public dataset described in Poldrack et al. (2016). These data were obtained from the OpenfMRI database, with accession number ds000030. We use resting-state fMRI data from 121 healthy controls, 50 individuals diagnosed with schizophrenia, 49 individuals diagnosed with bipolar disorder, and 40 individuals diagnosed with ADHD (attention-deficit/ hyperactivity disorder). The demographics are reported in the original paper, and they can additionally be found in the GitHub page containing the results of this study (Marinazzo, 2017a).

The fMRI data were preprocessed with FSL (FMRIB Software Library v5.0). The volumes were corrected for motion, after which slice timing correction was applied to correct for temporal alignment. All voxels were spatially smoothed with a 6 mm FWHM (full width at half maximum) isotropic Gaussian kernel and after intensity normalization, a band pass filter was applied between 0.01 and 0.08 Hz. In addition, linear and quadratic trends were removed. We next regressed out the motion time courses, the average cerebrospinal fluid signal, and the average white matter signal. Global signal regression was not performed. Data were transformed to the MNI152 template, such that a given voxel had a volume of 3 mm × 3 mm × 3mm. Finally, we averaged the signal in 278 ROIs using the template described in Shen, Tokoglu, Papademetris, & Constable (2013). In order to localize the results within the intrinsic connectivity network of the resting brain, we assigned each of these ROIs to one of the nine resting-state networks (seven cortical networks, plus subcortical regions and cerebellum) as described in Yeo et al. (2011).

### Construction of the visibility graphs

The procedure to build up a visibility graph is extensively and clearly described in Lacasa et al. (2008, 2009, 2012) for univariate and Lacasa et al. (2015) for multivariate time series. Here we will recall the basic steps and provide a visualization of the application of the methodology to BOLD data.

*N*data, any two time points

*i*and

*j*in which the measured quantity takes the values

*y*

_{i}and

*y*

_{j}, respectively, will have visibility and consequently will become two connected nodes in the associated

*natural visibility*graph if any other data point

*y*

_{k}placed between them fulfills the following condition:

*natural visibility*, an ordering criterion, named

*horizontal visibility*, has also been defined (Lacasa et al., 2009). According to the latter, two time points

*i*and

*j*, in which the measured quantity takes the values

*y*

_{i}and

*y*

_{j}, respectively, will now have horizontal visibility if any other data point

*y*

_{k}placed between them is smaller; that is,

In either case, the resulting graphs have *N* nodes, are connected by a trivial Hamiltonian path that induces a natural ordering in the degree sequence, and are undirected (see Figure 1 for an illustration). In the event that the time arrow turns out to be a relevant aspect, directed graphs can be easily constructed, as detailed in Lacasa et al. (2012). Note that the resulting horizontal visibility graph (HVG) is simply a core subgraph of the natural visibility graphs (NVG), the former being analytically tractable (Lacasa, 2014). As a matter of fact, HVG can be understood as an order statistic (Lacasa & Flanagan, 2015) and therefore filters out any dependency on the series marginal distributions (this is not true for NVG, so in applications where marginal distributions are relevant, one should use
NVG rather than HVG).

Both algorithms are fast: naive implementations of NVGs have a runtime complexity O(*N*^{2}); however, a divide-and-conquer strategy already reduces it to O(*N* log *N*) (Lan, Mo, Chen,Liu, & Deng, 2015). Naive implementation of HVG is already O(*N* log *N*) in most of the cases of practical interest. Finally, these methods are well suited to handle several degrees of nonstationarity in the associated time series (Lacasa & Flanagan, 2015).

In this work we will be analyzing BOLD data, and for that task we decided to choose NVG over HVG. This is because NVGs are in principle better suited to handle and extract long-range correlations than HVGs, as the former naturally allow for the development of hubs, which will be typically associated with extreme events in the data and can correlate with data at all scales. Correlations in time series are actually inherited in graph space in the degree distribution. It is somewhat easier to find fat-tailed degree distributions in NVGs (which account for hubs with extremely large degrees). On the other hand, HVGs (which have shown to work fine with processes evidencing short-range correlations) typically display exponentially decaying degree distributions; this feature is linked to short-scale visibility, making this method more local.

For illustration, Figure 1 depicts how the links are established in the visibility graph according to both visibility criteria. The code used to compute the visibility graphs is available on GitHub (Marinazzo, 2017b) and it is basically a translation to Matlab of the original visibility scripts in Fortran 90 (see http://www.maths.qmul.ac.uk/lacasa/VG.f90).

When it comes to the application to multivariate time series formed by *M* series, note that each of the *M* time series yields a different visibility graph to begin with, so in principle the multivariate series can always be mapped into a multilayer graph with *M* layers (Lacasa et al., 2015). Moreover, since for every node *i* there is a natural correspondence across layers (node *i* corresponds to time stamp *i*, and this is the same time stamp for all components), there exist a natural alignment between every node of each layer, so the multilayer graph is effectively a so-called multiplex network (Boccaletti et al., 2014; Lacasa et al., 2015, see Figure 2 for an illustration). Of course, other smarter alignments between
graphs could be investigated (for instance, one could try to find the alignment that minimizes some sort of Hamming distance between ordered node sets), but in this work we keep it simple and consider the natural alignment induced by the time arrow.

Interestingly, this multiplex visibility graph encodes the complex structure of each time series in the topology of each layer. One can therefore extract in each layer any desired topological feature (say for instance, the entropy over the degree distribution, which would provide a different number for each layer), with which one could build a feature vector that provides a compact representation of the multivariate time series complexity. A similar procedure was followed, for instance, in Ahmadlou et al. (2010) to extract markers of Alzheimer’s disease from a graph theoretical characterization of the Hurst index of EEG data.

*P*(

*k*

^{α}) and

*P*(

*k*

^{β}) of two arbitrary layers

*α*and

*β*, it is defined as

As the degree distribution captures the structure of each layer, this measure is in turn capturing the information shared between the two layers, that is, the information shared across each time series component of the multivariate time series. Now, since this is an *M* × *M* matrix whose *ij* entry provides the mutual information between layers (ROIs) *i* and *j*, one can then, for instance, average across pairs (that is, across ROIs) to find a scalar quantity 〈MI〉: the mean value of the mutual information for each intrinsic connectivity network. This methodology is depicted in Figure 3. Note that other informational or similarity measures between layers could be used instead (e.g., edge overlap, conditional or partial mutual information, transfer entropy). Here for the sake of exposition, we consider only mutual information.

The visibility algorithms produce networks whose nodes are time points. As one can observe in Figure 3, these networks have a modular structure, in which subnetworks are constituted by time points that are mainly adjacent. A network has a modular structure if it can be divided into subnetworks (modules) characterized by a higher probability of connections within each model than across models. A modular structure in a temporal network is thus an indication of different temporal regimes. The existence of these temporal regimes is what motivated the study of dynamical functional connectivity (see, for example, Hansen, Battaglia, Speigler, Deco, &Jirsa, 2013; Hutchison et al., 2013). Dynamic functional connectivity can be seen in the visibility framework as the comparison of the temporal networks, taking their modular structure into account. This comparison can be done in the first place considering the modular network as a whole. In our case we partitioned the visibility graphs for each ROI using 100 runs of the Louvain algorithm. We then quantified the distance between the two partitions by means of the mutual information, using the function in the Brain Connectivity Toolbox (Rubinov & Sporns, 2010). The results of the partition of two ROIs, one in the anterior cingulate cortex (ACC) and one in the precuneus (PCC), are shown in Figure 4. The modules of the graphs correspond to consecutive time points (left panels); that is partitioning the visibility graph provides a natural decomposition of the time series in time intervals. Turning to the interdependency between the two time series, the right panel of Figure 4 represents the Sorensen similarity between each pair of modules in the two time series. It shows that there are segments with high Sorensen indexes, and it is likely that during these segments the two ROIs reflect similar neural events.

## RESULTS

We start by reporting in Figure 5 the results of 〈MI〉 within each of the intrinsic connectivity networks, for the four groups of subjects considered. For each group of subjects, each circle corresponds to 〈MI〉 of a given subject, and random average shifted histograms are also provided. This representation is not parametric, and it is bounded. The plots report the median of the Harrell-Davis estimator, and the 95% high density intervals using a Bayesian bootstrap. The Harrell-Davis estimator doe is independent of the distribution (nonparametric) and is a weighted linear combination of order statistics.

The outliers are detected based on the distance between each pair of data points without assuming symmetry of distributions.

In order to account for departure from normality of these distributions, we used a graphical approach and computed the Kolmogorov-Smirnov distance (Rousselet, Pernet, & Wilcox, 2017), obtaining values up to 0.7 (a value of 0.39 would correspond to rejecting the null hypothesis at a level *α* < 0.001 for the smallest population). The Kolmogorov-Smirnov statistic is a nonparametric test of the equality of continuous, one-dimensional probability distributions.

The number of ROIs constituting each intrinsic state network (thus a proxy for the network size, given that Shen’s parcellation has ROIs of similar size) is not correlated with the average value of the mutual information. In particular, it is interesting to observe that the intrinsic connectivity network called limbic in Yeo’s parcellation is the smallest one, but nonetheless it has a low interlayer mutual information compared with the other networks for all the clinical groups.

The network that showed the clearest differentiation in terms of the average interlayer mutual information among the four clinical groups is indeed the Limbic one (Figure 6). This evidence was assessed by means of a multivariate response test with age of the subjects and framewise displacement as covariates. The *p* value of 0.005 was corrected for multiple comparisons using the Bonferroni-Holm criterion with *α* = 0.05. The Kolmogorov-Smirnov statistics of the pairwise comparison between the distributions of average interlayer mutual information values for these particular networks ranged from 0.15 to 0.3. The null hypothesis of values for controls and schizophrenics drawn from the same distribution would be rejected with an *α* < 0.005. Figure 6 also reports the shift functions to
visualize the difference between two distributions, in this case controls and schizophrenics. The shift function can help us understand and quantify how the two distributions differ. The shift function describes how one distribution should be rearranged to match the other one: it estimates how and by how much one distribution must be shifted.

This function (Wilcox, 1995) does not assume (as *t* tests do) that two distributions differ only in the location of the bulk of the observations, and it enables determination of how, and by how much, two distributions differ. Here the Harrell-Davis quantile estimator is used. Confidence intervals of the decile differences with a bootstrap estimation of the standard error of the deciles are computed, and one controls for multiple comparisons so that the type I error rate remains around 0.05 across the nine confidence intervals (Rousselet et al., 2017). In this specific case we can observe a clear separation for all the quantiles but the ninth one.

To complement this analysis, in Figure 1 we further report two additional ways in which results of this kind are often represented (mean and standard errors). According to this plot, it is already evident to the naked eye that the method easily distinguishes controls from patients with any mental disorder, suggesting that visibility graphs do indeed extract informative features that can be used for noninvasive diagnosis. Visualizing results in such a way is indeed suboptimal and sometimes problematic (nicely explained in Rousselet, Foxe, & Bolam, 2016); for this reason we initially chose the visualizations provided in Figures 5 and 6 (Rousselet et al., 2017).

## DISCUSSION

### Why the (Multivariate) Visibility Graph?

All in all, there are several reasons why we think that visibility graphs are a convenient tool. We discuss some of these reasons below.

#### Usefulness

Visibility graphs have been shown to inherit in their topology the essence of the associated dynamics, including nontrivial fingerprints that are both descriptive and informative for statistical learning purposes.

#### Fit for purpose

These methods can be used directly in both stationary and nonstationary signals (i.e., nonstationarity is not required to be removed). Also, series do not require ad hoc phase partitioning or symbolization. Also, visibility graphs naturally filter out linear trends, so they do not require such detrending (Lacasa et al., 2008). Furthermore, since HVG is an order statistic, it is also invariant under monotonic (order-preserving) rescaling on the data (Lacasa & Flanagan, 2015). The NVG is not invariant under this latter transformation however, so nonlinear rescaling to make data more “peaky” will necessarily modify the associated NVG in a nontrivial way.

#### Computationally easy and efficient

The method is numerically straightforward to implement and the runtime algorithms are quite decent, varying from *O*(*N*) for so-called visibility sequential motifs (Iacovacci & Lacasa, 2016) to O(*N* log *N*) for the full adjacency matrices using a divide-and-conquer strategy.

#### Amenable to analytical insight

Unlike other strategies for graph theoretical time series analysis, visibility graphs are not computational black boxes. More particularly for HVG (but not only Iacovacci & Lacasa, 2016; Luque et al., 2009), there exist several theorems available and methods to build rigorous results of HVG properties (Lacasa, 2014, 2016; Lacasa et al., 2009, 2012; Luque et al., 2013). The latter is an area of intense research activity at the interface between combinatorics and dynamical systems.

#### Versatile

The methods are not context dependent but are generally applicable to both univariate and multivariate time series across the disciplines. A drawback of this property is that the topological features one can extract from these graphs are themselves not context dependent.

#### Novel

It builds a bridge between time series and networks and thus opens the exciting possibility of exploring the usefulness of a large set of new tools in the endeavor to describe and classify complex signals.

Coming back to the specific reason why we think that natural visibility graphs are particularly suited for BOLD data, it has been shown that relevant information on the time course of the BOLD signal and on correlated activity can be extracted by looking at single frames, corresponding to peaks in the signal (Liu & Duyn, 2013; Tagliazucchi, Balenzuela, Fraiman, & Chialvo, 2012), and that these events could be the proxy for an innovation signal at the neural level (Karahanoglu & Van De Ville, 2015; Wu et al., 2013). In this framework, the degree of the nodes corresponding to the BOLD peaks in the adjacency matrix constructed according to natural visibility emphasizes the functional relevance of the neural events and of the cor responding patterns of coactivation across the brain. However, both NVG and HVG have been shown to be useful in different contexts, so there is no general rule of thumb on what method should we use: this choice shall be addressed on a case-by-case basis.

Finally, what is important and informative when describing the properties of a certain cognitive state? Is it the complex pattern underlying the structure of individual time series (that is, local activity of ROIs) of different regions? Or are the correlations and interdependencies (understood in a broad sense) between these regions the key aspect to look at? When the latter is the case, a functional network analysis approach (Bullmore & Sporns, 2009) seems appropriate. In the former case where the nature of local activity across regions already captures information (He, 2011; Zang, Jiang, Lu, He, & Tian, 2004), one does not need to resort to functional dependencies and local analysis is the correct thing to do. This is obviously an open question that should be addressed, from a biological point of view, on a case-by-case basis. A recent study suggests that both conceptual frameworks can indeed be connected (Sethi, Zerbi, Wenderoth, Fornito, & Fulcher, 2017). In general, both aspects likely play a relevant role, and some studies have already successfully merged the two (Ciuciu, Abry, & He, 2014; Tagliazucchi et al., 2016). Nevertheless, the multiplex visibility framework offers a compact way of extracting at once both the local temporal structure (via the network intra layer properties) and the global interconnection pattern (via multiplex interlayer similarities).

### Similarity with other measures

We discussed at the end of the Methods section that the modular temporal graphs resulting from the visibility algorithm are a natural way to describe different dynamical regimes of individual time series, and their interdependence, without arbitrary and possibly problematic choices such as a sliding window and its length (Hindriks et al., 2016; Kudela, Harezlak, & Lindquist, 2017).

Features of the visibility graph, such as the modularity, the clustering coefficient, or the node degree, could be used as features in classification algorithms aimed to detect modulations of the local and correlated dynamical regime of BOLD signals.

Furthermore, using the excellent resource NeuroVault (http://neurovault.org/), we also looked at the maps depicting the results of other measures and noticed that the areas belonging to the limbic Yeo network are associated with lower levels of regional homogeneity (Zang et al., 2004), higher coefficient of variation of the BOLD signal (Wu & Marinazzo, 2016), and lower value of the fractional amplitude of low-frequency fluctuations (fALFF) (Zou et al., 2008). This evidence speaks to the fact that interlayer mutual information in multiplex visibility networks is associated with decreased predictability and increased independence between the degrees of freedom of the measured time series.

### Classification of neural disorders

The main focus of this paper is methodological, and a thorough discussion of the implications of our results on neuroimaging studies of psychiatric disorders is beyond its scope; moreover, we would not want to hypothesize after the results are known (HARKing) (Poldrack et al., 2017). However, it is interesting to highlight that the limbic network has been previously associated with mental disorders (Kiehl, 2006; Liston, Cohen, Teslovich, Levenson, & Casey, 2011; Potvin, Lungu, Tikàsz, & Mendrek, 2017; Rdulescu & Mujica-Parodi, 2009; Roberts et al., 2016; Whalley et al., 2012). In the same way, we refer the reader to recent studies specifically aimed at using advanced neuroimaging data analysis tools to map and classify neural disorders (Cetin et al., 2016; Demirci et al., 2008; Miller, Vergara, Keator, & Calhoun, 2016), and Fornito, Zalesky, & Breakspear (2015) for a review. Our results shown here using visibility graphs confirm some of this previous work and further showcase that visibility graphs extract informative features with which we can find statistically significant signatures of different neural disorders.

To conclude, given the exposition and results reported in this study, we hope to have motivated our colleagues to consider visibility graphs as a valuable tool for both exploratory and focused studies.

## AUTHOR CONTRIBUTIONS

Speranza Sannino: Formal analysis; Software; Writing original draft. Sebastiano Stramaglia: Conceptualization; Methodology; Software; Writing review & editing. Lucas Lacasa: Conceptualization; Software; Writing original draft; Writing review & editing. Daniele Marinazzo: Conceptualization; Investigation; Methodology; Project administration; Software; Supervision; Validation; Visualization; Writing original draft; Writing review & editing.

## ACKNOWLEDGMENTS

We thank Matteo Fraschini (University of Cagliari) for setting up the Erasmus mobility for Speranza. We thank Caroline Garcia Forlim for consulting on the mutual information code. We thank Enzo Nicosia (Queen Mary University of London) for stimulating discussions on visibility graphs.

## TECHNICAL TERMS

- Visibility graph:
A visibility graph is a network in which the nodes are time points, and the links are defined according to the visibility criteria described in the text.

- Multilayer networks:
Multilayer networks incorporate multiple types of interactions between the same nodes. This means that multivariate time series can be represented in a multilayer visibility graph.

- Harrell-Davis estimator:
The Harrell-Davis estimator does is independent of the distribution (nonparametric) and is a weighted linear combination of order statistics.

- Kolmogorov-Smirnov statistic:
The Kolmogorov-Smirnov statistic is a nonparametric test of the equality of continuous, one-dimensional probability distributions.

- Shift function:
The shift function can help us understand and quantify how the two distributions differ. The shift function describes how one distribution should be rearranged to match the other one: it estimates how and by how much one distribution must be shifted.

- Modularity in networks:
A network has a modular structure if it can be divided into subnetworks (modules) characterized by a higher density of connections within each module than across modules.

## REFERENCES

## External Supplements

## Competing Interests

Competing Interests: The authors have declared that no competing interests exist.

## Author notes

Handling Editor: Olaf Sporns