Abstract
Graph matching algorithms attempt to find the best correspondence between the nodes of two networks. These techniques have been used to match individual neurons in nanoscale connectomes—in particular, to find pairings of neurons across hemispheres. However, since graph matching techniques deal with two isolated networks, they have only utilized the ipsilateral (same hemisphere) subgraphs when performing the matching. Here, we present a modification to a state-of-the-art graph matching algorithm that allows it to solve what we call the bisected graph matching problem. This modification allows us to leverage the connections between the brain hemispheres when predicting neuron pairs. Via simulations and experiments on real connectome datasets, we show that this approach improves matching accuracy when sufficient edge correlation is present between the contralateral (between hemisphere) subgraphs. We also show how matching accuracy can be further improved by combining our approach with previously proposed extensions to graph matching, which utilize edge types and previously known neuron pairings. We expect that our proposed method will improve future endeavors to accurately match neurons across hemispheres in connectomes, and be useful in other applications where the bisected graph matching problem arises.
INTRODUCTION
Graph matching is a widely used optimization technique whereby one can find a matching between the nodes in one network and those in another. Solving the graph matching problem yields a matching (i.e., the correspondence between the nodes of the two networks) that minimizes edge disagreements between the two networks. The graph matching problem has found uses in fields as disparate as computer vision (Conte, Foggia, Sansone, & Vento, 2004), biometrics (Conte et al., 2004), social networks (Saad-Eldin, Pedigo, Priebe, & Vogelstein, 2021), and natural language processing (Marchisio et al., 2021), to name just a few.
Most important for this work is the use of graph matching techniques to find bilaterally homologous neurons across the two sides of a nervous system (Chung et al., 2021; Sussman, Park, Priebe, & Lyzinski, 2020). Connectomes—maps of neural connectivity—can naturally be represented by networks, wherein a node represents a neuron and an edge represents synapses from one neuron to another (Bassett & Sporns, 2017; Vogelstein et al., 2019). Previous works used graph matching techniques to predict neuron pairings between brain hemispheres based on the observed connectivity (Chung et al., 2021; Sussman et al., 2020). The graph matching problem by its very formulation is concerned with two separate networks; as such, previous applications of graph matching to find homologous neuron pairings across hemispheres have considered one network to be the set of nodes and edges within the left hemisphere, and the other network to be defined likewise for the right hemisphere (see Figure 1). In other words, they have only considered the ipsilateral connections which connect within a brain hemisphere, and ignored the contralateral connections which connect one side of the nervous system to the other. Contralateral connections are quite common in connectomes studied thus far: in subset of the larval Drosophila melanogaster (vinegar fly) connectome (Berck et al., 2016; Burgos et al., 2018; Carreira-Rosario et al., 2018; Eichler et al., 2017; Eschbach et al., 2020, 2021; Fushiki et al., 2016; Gerhard, Andrade, Fetter, Cardona, & Schneider-Mizell, 2017; Heckscher et al., 2015; Hückesfeld et al., 2021; Jovanic et al., 2016, 2019; Larderet et al., 2017; Mark et al., 2021; Miroschnikow et al., 2018; Ohyama et al., 2015; Schlegel et al., 2016; Takagi et al., 2017; Tastekin et al., 2018; Zarin, Mark, Cardona, Litwin-Kumar, & Doe, 2019; Zwart et al., 2016), an edge picked at random from the network has about a 35% chance of being a contralateral connection. It is natural to wonder, then, whether these connections can be used to improve automated neuron pairing.
Here, we show that rather than ignoring the contralateral connections for the purposes of predicting neuron pairs, they can be explicitly included in the optimization by generalizing graph matching to a single network that has been split into two parts. We demonstrate via simulation that when sufficient edge correlations exist between the contralateral subgraphs, our proposed method provides an improvement in matching accuracy. We then show that this methodology indeed improves matching accuracy in our motivating example of a bilateral nervous system by comparing our algorithm to traditional graph matching on five connectome datasets. Further, we describe how our method can be combined with previously proposed generalizations of graph matching to further improve performance.
RESULTS
From Graph Matching to Bisected Graph Matching
Given this notion of what it means to find a good matching between the hemispheres, our goal was to develop an algorithm that could efficiently solve this problem (Equation 2). Unfortunately, graph matching problems in general are known to be NP-hard (Burkard, Dell’Amico, & Martello, 2009), and as such efficient algorithms for solving these problems are approximations. One popular approximation-based algorithm is the Fast Approximate Quadratic (FAQ) algorithm of Vogelstein et al. (2015). This algorithm first relaxes the (discrete) graph matching problem to the relaxed graph matching problem, allowing the tools of continuous optimization to be used (see section Graph Matching Algorithms for discussion of this approximation). FAQ then uses the Frank-Wolfe method (Frank & Wolfe, 1956) to attempt to minimize Equation 1.
Matching Simulated Networks
Here, we demonstrate that this approach improves matching accuracy in simulated datasets when there is sufficient correlation in the contralateral subgraphs. To understand how this correlation affects the usefulness of bisected graph matching, we created simulated data using the correlated Erdős-Rényi model (CorrER). The correlated Erdős-Rényi model is a special case of the correlated stochastic block model introduced in Lyzinski et al. (2015). Briefly, a pair of networks is distributed CorrER(n, p, ρ) if both networks marginally are distributed as Erdős-Rényi models (Erdős & Rényi, 1960; Gilbert, 1959) with n nodes and global connection probability p, but the edges of the two networks have Pearson correlation ρ. Note that this correlation of edges also requires specifying an alignment of one network to the other, which we can use as ground truth for evaluating our algorithm. Here, we use the version of this model for a directed network to more closely resemble nanoscale connectome data, which has directed edges. We used the correlated Erdős-Rényi model (as implemented in graspologic; Chung et al., 2019) to construct a simulation of a “bilateral” network as follows:
- The ipsilateral subgraphs were sampled from a correlated Erdős-Rényi model (CorrER):
- Independently of the ipsilateral networks, the contralateral subgraphs were sampled from a correlated Erdős-Rényi model:
- The full network was defined as
- To simulate an unknown correspondence between the nodes of the left and right hemispheres, we applied a random permutation (Prand) to the nodes of the “right hemisphere” in each sampled network:
Ainput is the network that was input to the matching algorithms, obscuring the true matching from the true alignment in order to evaluate their performance. We varied the value of ρcontra from zero (ALR and ARL have no correlation, and thus are not helpful for matching) to one (ALR and ARL are isomorphic, providing extremely helpful information for matching). For each value of ρcontra, we simulated 1,000 networks. For each network, we ran the graph matching (GM) and bisected graph matching (BGM) algorithms to attempt to uncover the correct permutation that would realign the left and right hemispheres. For both algorithms, we used default parameters, and one initialization for each algorithm. For each run of each algorithm, we examined the matching accuracy, which is the proportion of nodes correctly matched.
Figure 2 shows the matching accuracy for both algorithms as a function of ρcontra. For low values of ρcontra, using bisected graph matching actually degrades performance. When ρcontra = 0, the match accuracy drops by ∼29%. This is unsurprising, as in this case the contralateral subgraphs are effectively noise with respect to the correct matching between the left and right. For small values of ρcontra, bisected graph matching often found permutations of the contralateral subgraphs which had fewer edge disagreements than the alignment used to generate the correlated networks (Supporting Information: Understanding Matching for Weakly Correlated Networks), explaining why including these connections pulls the solution away from the true matching. However, as ρcontra increases, bisected graph matching eventually outperforms graph matching. For this simulation, when ρcontra is greater than 0.4, the accuracy for bisected graph matching is higher (by more than ∼20% when ρcontra ≥ 0.9). We also found that this phenomenon was consistent across a range of network sizes (Supporting Information: Effect of Network Size in Simulated Experiments).
Whether bisected graph matching improves accuracy is determined by many factors, including the correlation in contralateral edge structure studied here in this simple simulation. We next sought to see whether this bisected graph matching would be helpful in our motivating example of matching neurons between two sides of a nervous system.
Matching Connectomes
We examined the performance of both graph matching algorithms on a set of real connectome datasets. To ensure we could evaluate the performance of both algorithms, we restricted our analysis to connectomes for which pairings of individual neurons between sides of the nervous system were already known. We studied the (chemical) connectomes of both a hermaphrodite and a male Caenorhabditis elegans worm (Cook et al., 2019), the pharynges of two Pristionchus pacificus worms (Bumbarger, Riebesell, Rödelsperger, & Sommer, 2013), and a subset of a larval Drosophila melanogaster (Berck et al., 2016; Burgos et al., 2018; Carreira-Rosario et al., 2018; Eichler et al., 2017; Eschbach et al., 2020, 2021; Fushiki et al., 2016; Gerhard et al., 2017; Heckscher et al., 2015; Hückesfeld et al., 2021; Jovanic et al., 2016, 2019; Larderet et al., 2017; Mark et al., 2021; Miroschnikow et al., 2018; Ohyama et al., 2015; Schlegel et al., 2016; Takagi et al., 2017; Tastekin et al., 2018; Zarin et al., 2019; Zwart et al., 2016). For all these datasets, neuron pairings across sides of the nervous system are not complete—indeed, some neurons appear only on one side of the organism or exactly in the center (Cook et al., 2019). Thus, we restricted our analysis to the subset of neurons that were present as a bilaterally homologous pair and for which this pairing was known. We then ensured that the remaining set of nodes was fully (weakly) connected for each dataset, removing nodes not part of the largest connected component. Table 1 shows summary statistics for each of the connectome datasets considered here. We treat each network as weighted (by synapse count) and directed (since the direction of chemical synapses is known). Note that for each dataset, contralateral edge correlation was high (≥0.7), suggesting that bisected graph matching could facilitate better matching.
Dataset . | Number nodes . | Number edges . | % contralateral synapses . | Ipsilateral correlation . | Contralateral correlation . |
---|---|---|---|---|---|
P. pacificus pharynx 1 | 18 | 35 | 30 | 0.79 | 0.83 |
P. pacificus pharynx 2 | 22 | 33 | 32 | 0.84 | 0.88 |
C. elegans hermaphrodite | 286 | 2,838 | 41 | 0.87 | 0.86 |
C. elegans male | 360 | 2,482 | 37 | 0.81 | 0.70 |
D. melanogaster larva subset | 1,240 | 32,564 | 31 | 0.88 | 0.83 |
Dataset . | Number nodes . | Number edges . | % contralateral synapses . | Ipsilateral correlation . | Contralateral correlation . |
---|---|---|---|---|---|
P. pacificus pharynx 1 | 18 | 35 | 30 | 0.79 | 0.83 |
P. pacificus pharynx 2 | 22 | 33 | 32 | 0.84 | 0.88 |
C. elegans hermaphrodite | 286 | 2,838 | 41 | 0.87 | 0.86 |
C. elegans male | 360 | 2,482 | 37 | 0.81 | 0.70 |
D. melanogaster larva subset | 1,240 | 32,564 | 31 | 0.88 | 0.83 |
Note. Correlations are Pearson’s correlation coefficient. All metrics are with respect to the datasets after processing to select fully connected networks composed of neurons who have bilateral pairs, as described in Matching Connectomes section. The number of nodes per hemisphere and (and the number of known pairs) is always half the total number of nodes, since only paired neurons are considered.
For each connectome, we predicted each neuron’s pair on the other side of the nervous system by applying either the graph matching or bisected graph matching algorithms. We ran 50 initializations, each from the barycenter, since neither algorithm is deterministic (see Graph Matching Algorithms section for more explanation on initialization and randomness in the algorithm). For each initialization, we ran both algorithms with default parameters (Chung et al., 2019) and measured the matching accuracy with respect to the known pairing of neurons.
We observed that for all the connectomes studied here, bisected graph matching improves matching performance (Figure 3), sometimes dramatically so. For all five connectomes, the match ratio for the bisected graph matching algorithm was significantly higher (p < 0.0005 for each dataset, two-sided Mann-Whitney U tests). Matching accuracy increased by ∼44% and ∼20% for the two P. pacificus samples, ∼29% for the hermaphrodite C. elegans, ∼15% for the male C. elegans, and ∼22% for the Drosophila larva subset, respectively. We also found that this trend holds when we relaxed the requirement that all nodes in the connectome have a homologous partner, finding that bisected graph matching always provided increased matching accuracy even when some neuron pairs in these connectomes were artificially unmatched (Supporting Information: Matching With Simulated Unpaired Neurons). These results demonstrate the practical utility of our proposed algorithm for improving bilaterally homologous pair prediction in connectomes. We next sought to show how our proposed method can be combined with previously described extensions of graph matching to further improve performance.
Matching Multiplex Networks
While connectomes are often described as networks, many of these datasets actually lend themselves to multiplex network representations. For the purposes of this paper, we consider multiplex networks to have one set of nodes, but potentially multiple types of edges between these nodes (see Kivelä et al. (2014) for a review of multilayer networks more generally). For instance, in C. elegans, both chemical (synaptic) and electrical (gap junction) connections have been mapped (Cook et al., 2019). If we consider these connections to each be of their own “type,” then we can construct an adjacency matrix for each—these become the “layers” of our multiplex network. As further examples of edge types in connectomics, Drosophila connectomes are beginning to have neurotransmitter information associated with each synapse (Eckstein et al., 2020), as well as a differentiation between axo-axonic, axo-dendritic, dendro-axonic, and dendro-dendritic connections (Buhmann et al., 2021; Schneider-Mizell et al., 2016).
To match neurons based on connectivity using this multiplex network information, one can generalize the graph matching problem to a multiplex graph matching problem. Pantazis et al. (2022) proposed a generalization of the FAQ algorithm to solve this problem. This multiplex graph matching scheme can easily be combined with the bisected graph matching proposed in this work, again by simply modifying the graph matching objective function and its gradient to account for these multiple connection types (see Multilayer Graph Matching section for more details).
We applied graph matching and bisected graph matching to the connectomes of both C. elegans sexes, and varied whether the networks used were either chemical, electrical, or both (multiplex network). Figure 4 displays matching accuracy for both algorithms using each combination of edge types. We observed a clear advantage to using multilayer graph matching on these datasets: in both connectomes and for both GM and BGM, matching with the multilayer network outperformed matching for either chemical or electrical connections alone. We also found that BGM outperforms GM for any combination of network layers for both connectomes. These results highlight the advantages of combining BGM with a previously described extension of graph matching (Pantazis et al., 2022) when multiple edge types are available, as the highest accuracy on both datasets came from using both contralateral connections and multiple edge types.
Matching Networks of Differing Sizes and With Seeds
Next, we studied how BGM would work with two further extensions to graph matching based on the work of Fishkind et al. (2019). Often, the two hemispheres being matched may not have exactly the same number of neurons, but one still wishes to find a matching between them. Further, partial matching information is also common—for instance, one could have complete matching information about a subset of nodes in some brain region, and would like to use this partial matching to improve matching of the rest of the brain. Fishkind et al. (2019) studied exactly this setting, proposing “padding” schemes to deal with networks which have different sizes, as well as a method for incorporating a partial matching or “seed” nodes into the FAQ algorithm.
We applied the corresponding generalizations of these ideas to the bisected graph matching case (see Seeded Graph Matching section for details), allowing us to apply our proposed algorithm to a dataset where a partial matching was known ahead of time and the number of neurons in the two hemispheres was not the same. To demonstrate these capabilities, we applied this method to the Drosophila larva partial connectome (Berck et al., 2016; Burgos et al., 2018; Carreira-Rosario et al., 2018; Eichler et al., 2017; Eschbach et al., 2020, 2021; Fushiki et al., 2016; Gerhard et al., 2017; Heckscher et al., 2015; Hückesfeld et al., 2021; Jovanic et al., 2016, 2019; Larderet et al., 2017; Mark et al., 2021; Miroschnikow et al., 2018; Ohyama et al., 2015; Schlegel et al., 2016; Takagi et al., 2017; Tastekin et al., 2018; Zarin et al., 2019; Zwart et al., 2016). In the Matching Connectomes section, we restricted our analysis to the set of nodes for which published pairings existed, such that we could evaluate matching accuracy. Here, we relaxed this restriction, and used these published pairs as seed nodes. We also note that the full collection of published neurons has 942 neurons on the left hemisphere and 938 neurons on the right hemisphere. The padded graph matching of Fishkind et al. (2019) allowed us to perform a matching on these two networks of differing sizes (resulting in some neurons on the larger hemisphere not being matched in each run of the algorithm).
To examine the effect of seeds, we performed a cross-validation-like experiment, wherein some seeds (20%) were reserved for evaluation so that we could compute matching accuracy. We used some number of the remaining pairings as seeds for either graph matching or bisected graph matching. Figure 5 shows matching accuracy on these held-out known pairings as a function of the number of seeds used. We found that for any number of seeds, bisected graph matching always had a higher mean matching accuracy than graph matching. Conversely, BGM can be viewed as allowing the user to reach the same accuracy level for a smaller number of previously known seeds, which can be effortful to obtain for a new dataset.
Given the superiority of BGM over GM across a range of experiments, we finally sought to examine the matches for neurons where we did not know of a previously presented pairing. We reran 100 initializations of bisected graph matching on the Drosophila larva subset, using all known pairings as seeds. Figure 6 shows the morphology of six example-predicted neuron pairs that were always matched together across all 100 initializations. We found that the morphology of these frequently matched neurons was generally similar, suggesting that they may represent true bilaterally homologous pairings. Further investigation will be required to confirm or reject these candidate matches, but our results demonstrate how bisected graph matching can be used to easily provide well-informed guesses for these pairings. We also provide examples of neurons that were very infrequently paired across each initialization (implying a lower confidence in these matches; Fishkind et al., 2019), suggesting that these pairs are less likely to be true homologs (Supporting Information Figure S4). We include all matching results for these previously unpaired neurons in the Supporting Information (see Code and Data section).
DISCUSSION
Summary
We proposed a simple generalization of the graph matching problem, which incorporates any connections between the two sets of nodes being matched. We then showed how this problem could be solved by using a new objective function in the framework of a state-of-the-art graph matching algorithm, FAQ (Vogelstein et al., 2015). In simulations, we saw that as the strength of the correlation between the contralateral subgraphs increases, these connections become more useful to include in the matching process. By running both graph matching and bisected graph matching on five connectome datasets, we provided compelling evidence that for practical purposes in neuroscience, including these contralateral connections in the optimization is beneficial. We further showed how our algorithm can be applied to settings involving multiplex networks, networks of differing sizes, and how the algorithm can leverage a partial, known pairing of neurons to improve matching performance for the remaining neurons. We have provided a documented, open-source implementation of our algorithm (Python 3) to enable its easy application to future connectome datasets (see Code and Data section).
Limitations
As we showed in simulation in Matching Simulated Networks section, bisected graph matching is only likely to improve matching accuracy in connectomes when there is sufficient correlation between the contralateral subgraphs. For a new organism (or possibly even just a new sample), this will not be known in practice. Domain knowledge as to the nature of the contralateral connections in an organism’s brain may be important when choosing whether to include them in the matching as described in this work, though we note that all five connectomes studied in this work had a high (>0.7) contralateral correlation (Table 1). In practice, it may be best to evaluate different matching algorithms (and hyperparameters) on a subset of the connectome prior to matching a complete dataset.
Further, other approaches to matching neurons in neuroscience do not use connectivity at all: Costa, Manton, Ostrovsky, Prohaska, and Jefferis (2016) introduced an algorithm for matching neurons on the basis of morphology, which has been widely used on connectomic reconstructions. In practice, it is likely that the best neuron pairings will be achieved by a joint optimization that considers morphology, multiple edge types, seeds, and the contralateral connections as proposed in this work. We did not explore this possibility here, but it remains an intriguing future pursuit.
Outlook
As more connectomes are mapped both from new organisms or for more individuals of the same species, the tools provided here will accelerate the process of finding correct pairings of neurons between the two sides of a nervous system, while requiring less human labor to annotate pairs by hand. These neuron pairings appear to be a fundamental property of the invertebrate nervous systems studied in connectomics thus far. Finding these neuron pairs is important for understanding the stereotypy in an organism (i.e., how similar is the connectivity of the left and the right) (Cook et al., 2019; Randel et al., 2015; Schlegel, Bates, et al., 2021; Witvliet et al., 2021). Additionally, these neuron pairs can be useful for statistical approaches which leverage a one-to-one correspondence of nodes across networks (Athreya et al., 2017; Tang et al., 2017). We also note that the bisected graph matching problem (and the analogous version of the quadratic assignment problem, which is equivalent to the graph matching problem up to a sign change (Vogelstein et al., 2015) may arise in other settings where one wishes to match nodes in a single graph that can be split into two parts and some level of symmetry exists between them.
METHODS
Graph Matching Algorithms
Since graph matching is an NP-hard problem (Burkard et al., 2009; Conte et al., 2004), no efficient algorithm exists that will always yield a perfect matching. Part of the difficulty of this problem is that the search space of permutations is large (there are n! permutations of n nodes) and discrete (there is no way to interpolate between two permutations and still have a permutation). Thus, many algorithms relax this constraint, enabling efficient solutions of the relaxed problem (Fiori, Sprechmann, Vogelstein, Muse, & Sapiro, 2013; Lyzinski et al., 2016; Vogelstein et al., 2015; Zaslavskiy, Bach, & Vert, 2009). A common approach (used by FAQ (Vogelstein et al., 2015) and other algorithms (Fiori et al., 2013; Zaslavskiy et al., 2009)) is to relax the (discontinuous) search for a permutation matrix to the convex hull of this set, the set of doubly stochastic matrices, 𝒟 (Vogelstein et al., 2015). Note that Lyzinski et al. (2016) showed that under a model of correlated random Bernoulli graphs, the best solution to the relaxation used by FAQ almost always yields the correct permutation matrix as the size of the networks grows (this does not say that FAQ will always find this solution, however).
In this relaxed space of doubly stochastic matrices, FAQ requires an initial position to start its search. A common default value (which we use in this work) is the barycenter, which is the centroid of the set of all doubly stochastic matrices, and is simply an n × n matrix where all elements are . FAQ then proceeds by using the Frank-Wolfe method to iteratively update its search for a doubly stochastic matrix that maps one adjacency matrix to another. The algorithm terminates after either a maximum number of iterations or when the search positions change very little (less than some tolerance parameter) between iterations. After this doubly stochastic solution has been found, FAQ then projects back onto the set of permutation matrices by solving the linear assignment problem.
Algorithm 1 details the BGM-via-Frank-Wolfe algorithm (referred to simply as BGM in the text), which simply adapts this procedure by replacing the objective function and its gradient to solve the bisected graph matching problem as described in the From Graph Matching to Bisected Graph Matching section. We refer the interested reader to Vogelstein et al. (2015) for further details on the original algorithm, and to Fishkind et al. (2019) for many interesting extensions. We also note that implementations of the FAQ algorithm are available in SciPy (Virtanen et al., 2020) and graspologic (Chung et al., 2019).
Require: Adjacency matrices for each of the four subgraphs: ALL, ARR, ALR, ARL ∈ ℝn×n. |
Initialize:P(0) ∈ 𝒟, barycenter (P(0) = 1n × ) unless otherwise specified |
fori = 1, 2, 3, … while (i ≤ MAXITER) and (∥Pi − Pi−1∥F ≥ TOLERANCE)) do |
1. Compute ∇f(P(i)) = −(ALLP(i) + P(i)ARR + ALRARL + ALR) |
2. Compute Q(i) ∈ argmin tr(QT∇f(P(i))) over Q ∈ 𝒟 via linear assignment problem solver, e.g., Hungarian algorithm (Kuhn, 1955) |
3. Compute step size α(i) ∈ argmin f(αP(i) + (1 − α)Q(i)), for α ∈ [0, 1] |
4. Set P(i+1) = αP(i) + (1 − α)Q(i) |
end for |
return ∈ argmin tr(QT∇f(P(final))) over Q ∈ 𝒫 via linear assignment problem solver. |
Require: Adjacency matrices for each of the four subgraphs: ALL, ARR, ALR, ARL ∈ ℝn×n. |
Initialize:P(0) ∈ 𝒟, barycenter (P(0) = 1n × ) unless otherwise specified |
fori = 1, 2, 3, … while (i ≤ MAXITER) and (∥Pi − Pi−1∥F ≥ TOLERANCE)) do |
1. Compute ∇f(P(i)) = −(ALLP(i) + P(i)ARR + ALRARL + ALR) |
2. Compute Q(i) ∈ argmin tr(QT∇f(P(i))) over Q ∈ 𝒟 via linear assignment problem solver, e.g., Hungarian algorithm (Kuhn, 1955) |
3. Compute step size α(i) ∈ argmin f(αP(i) + (1 − α)Q(i)), for α ∈ [0, 1] |
4. Set P(i+1) = αP(i) + (1 − α)Q(i) |
end for |
return ∈ argmin tr(QT∇f(P(final))) over Q ∈ 𝒫 via linear assignment problem solver. |
Two nuances of this algorithm for practical usage are worth commenting on. First, we note that FAQ is not guaranteed to find the correct solution to the graph matching problem (and again, neither is any polynomial-time algorithm (Burkard et al., 2009; Conte et al., 2004)). Even if the minimizer to the indefinite relaxed graph matching problem is the correct permutation (as described in Lyzinski et al., 2016), the Frank-Wolfe method may get stuck in a local minimum, and not find this best solution. Second, this algorithm is not deterministic—different initializations can lead to different solution paths, which may get stuck in local minima. Even from the same initialization, there may be more than one-step direction (see Algorithm 1 Step 2) at any given position in the solution space, since multiple step directions can be deemed equally suitable. Our implementation simply chooses one of these at random. Thus, even from the same initialization, it is often beneficial to restart the algorithm multiple times, and choose the solution with the best objective function value. For this reason, a number of the experiments in the main text specify the number of initializations used.
Multilayer Graph Matching
Seeded Graph Matching
Fishkind et al. (2019) considered modifying FAQ to solve the so-called seeded graph matching problem, wherein a subset of the nodes of the two networks are matched ahead of time. The goal is to leverage these previously known pairings to improve the pairings of the rest of the network. This seeded graph matching problem can be thought of as restricting the search space of all permutations to only those which respect a particular seed set.
Padded Graph Matching
Code and Data
Analyses relied on graspologic (Chung et al., 2019), NumPy (Harris et al., 2020), SciPy (Virtanen et al., 2020), Pandas (McKinney, 2010), NetworkX (Hagberg, Swart, & S Chult, 2008), and pymaid (Schlegel, Phelps, & Jefferis, 2021). Plotting was performed using matplotlib (Hunter, 2007), Seaborn (Waskom, 2021), and NAVis (Schlegel, Barnes, Jagannathan, Pedigo, & Court, 2021).
All code for this paper (implemented in Python 3) can be found on GitHub at github.com/neurodata/bgm (Pedigo, 2022a) and viewed as a Jupyter Book (Executable Books Community, 2020) at docs.neurodata.io/bgm. There are no primary data in the paper (see references in Matching Connectomes section). All data is included in the GitHub repository, including the matching results (Pedigo, 2022b) on the Drosophila larva connectome subset. The source code and data is also archived at doi.org/10.5281/zenodo.6561550.
ACKNOWLEDGMENTS
We thank Thomas Athey for helpful comments.
SUPPORTING INFORMATION
Supporting information for this article is available at https://doi.org/10.1162/netn_a_00287, https://github.com/neurodata/bgm (Pedigo, 2022a), and https://github.com/neurodata/bgm/blob/main/results/outputs/connectome_seeded/pair_predictions.csv (Pedigo, 2022b). The algorithm proposed in this work has also been implemented in graspologic at https://github.com/microsoft/graspologic (Chung et al., 2019).
AUTHOR CONTRIBUTIONS
Benjamin David Pedigo: Conceptualization; Data curation; Formal analysis; Funding acquisition; Investigation; Methodology; Project administration; Software; Supervision; Validation; Visualization; Writing – original draft; Writing – review & editing. Michael Winding: Conceptualization; Data curation; Investigation; Writing – review & editing. Carey E. Priebe: Conceptualization; Funding acquisition; Investigation; Methodology; Supervision; Writing – review & editing. Joshua T. Vogelstein: Conceptualization; Funding acquisition; Investigation; Methodology; Project administration; Resources; Supervision; Writing – review & editing.
FUNDING INFORMATION
Benjamin David Pedigo, National Science Foundation (https://dx.doi.org/10.13039/100000001), Award ID: DGE1746891. Joshua T. Vogelstein, National Science Foundation (https://dx.doi.org/10.13039/100000001), Award ID: 1942963. Joshua T. Vogelstein, National Science Foundation (https://dx.doi.org/10.13039/100000001), Award ID: 2014862. Joshua T. Vogelstein, Foundation for the National Institutes of Health (https://dx.doi.org/10.13039/100000009), Award ID: 1RF1MH123233-01. Carey E. Priebe, Foundation for the National Institutes of Health (https://dx.doi.org/10.13039/100000009), Award ID: 1RF1MH123233-01.
TECHNICAL TERMS
- Graph matching:
The process of inferring an alignment of the nodes of one network with respect to another, often by trying to maximize the agreement of edges between those networks.
- Bilaterally homologous neurons:
Neurons which appear to be “mirror images” of one another across the two sides of an organism’s nervous system, often in terms of both morphology and connectivity.
- Ipsilateral:
Relating to the same side of the body. In this work, this refers to connections among neurons on the same side of a nervous system.
- Contralateral:
Relating to the opposite side of the body. In this work, this refers to connections from neurons on one side of a nervous system to the other.
- Bisected graph matching:
The process of inferring the alignment of the nodes of a split network (i.e., the left and right sides of a connectome) by trying to maximize the edge agreement of both ipsilateral and contralateral connections.
- Correlated Erdős-Rényi model:
A generative statistical model of a pair of networks wherein both networks come from the Erdős-Rényi model (random, independent edges with the same connection probability between all nodes), but corresponding potential edges in the two networks are generated with some correlation.
- Barycenter:
The centroid of the doubly stochastic matrices, i.e., an n × n matrix of all . For the purposes of initializing a graph matching algorithm, this can be thought of as the initialization wherein all potential permutations are equally likely.
- Doubly stochastic:
Referring to matrices which are of shape n × n and have all row and column sums equal to one.
- Multiplex network:
A generalization of a network wherein there can be multiple edge types, such as those belonging to different neurotransmitter types.
REFERENCES
Supporting Information
Author notes
Competing Interests: The authors have declared that no competing interests exist.
Handling Editor: Olaf Sporns