## Abstract

In systems neuroscience, most models posit that brain regions communicate
information under constraints of efficiency. Yet, evidence for efficient
communication in structural brain networks characterized by hierarchical
organization and highly connected hubs remains sparse. The principle of
efficient coding proposes that the brain transmits maximal information in a
metabolically economical or compressed form to improve future behavior. To
determine how structural connectivity supports efficient coding, we develop a
theory specifying minimum rates of message transmission between brain regions to
achieve an expected fidelity, and we test five predictions from the theory based
on random walk communication dynamics. In doing so, we introduce the metric of
compression efficiency, which quantifies the trade-off between lossy compression
and transmission fidelity in structural networks. In a large sample of youth
(*n* = 1,042; age 8–23 years), we analyze structural
networks derived from diffusion-weighted imaging and metabolic expenditure
operationalized using cerebral blood flow. We show that structural networks
strike compression efficiency trade-offs consistent with theoretical
predictions. We find that compression efficiency prioritizes fidelity with
development, heightens when metabolic resources and myelination guide
communication, explains advantages of hierarchical organization, links higher
input fidelity to disproportionate areal expansion, and shows that hubs
integrate information by lossy compression. Lastly, compression efficiency is
predictive of behavior—beyond the conventional network efficiency
metric—for cognitive domains including executive function, memory,
complex reasoning, and social cognition. Our findings elucidate how macroscale
connectivity supports efficient coding and serve to foreground communication
processes that utilize random walk dynamics constrained by network
connectivity.

## Author Summary

Macroscale communication between interconnected brain regions underpins most aspects of brain function and incurs substantial metabolic cost. Understanding efficient and behaviorally meaningful information transmission dependent on structural connectivity has remained challenging. We validate a model of communication dynamics atop the macroscale human structural connectome, finding that structural networks support dynamics that strike a balance between information transmission fidelity and lossy compression. Notably, this balance is predictive of behavior and explanatory of biology. In addition to challenging and reformulating the currently held view that communication occurs by routing dynamics along metabolically efficient direct anatomical pathways, our results suggest that connectome architecture and behavioral demands yield communication dynamics that accord with neurobiological and information theoretical principles of efficient coding and lossy compression.

## INTRODUCTION

The principle of compensation states that “to spend on one side, nature is forced to economize on the other side” (West-Eberhard, 2003). In the economics of brain connectomics, natural selection optimizes network architecture for versatility, resilience, and efficiency under constraints of metabolism, materials, space, and time (Bullmore & Sporns, 2012; Laughlin, 2001; West-Eberhard, 2003). Brain networks—composed of nodes representing cortical regions and edges representing white matter tracts—strike evolutionary compromises between costs and adaptations (Avena-Koenigsberger, Goñi, Solé, & Sporns, 2015; Buckner & Krienen, 2013; Laughlin, 2001; Reardon et al., 2018; West-Eberhard, 2003; Whitaker et al., 2016). Further, network disruptions may contribute to the development of neuropsychiatric disorders (Crossley et al., 2014; Di Martino et al., 2014; Gollo et al., 2018; Kaczkurkin et al., 2018). Because the limits of computations are intertwined with the limits of communication between brain regions (Cover, 1999), to understand how the brain efficiently balances resource constraints with pressures of information processing, one must begin with models of information transmission in brain networks.

Principles of neurotransmission established at the cellular level suggest that
biophysical constraints on information processing may apply to the macroscopic
levels of brain regions and networks (Laughlin,
2001; Levy & Baxter,
1996; Sterling & Laughlin,
2015). The principle of efficient
coding proposes that the brain transmits maximal information in a
metabolically economical or *compressed* form to improve future
behavior (Chalk, Marre, & Tkačik,
2018). Efficient coding at the macroscale offers a parsimonious principle
of compression characterizing the dimensionality of neural representations (Mack, Preston, & Love, 2020; Shine et al., 2019; Stringer, Pachitariu, Steinmetz, Carandini, & Harris,
2019; Tang et al., 2019), as
well as a parsimonious principle of transmission characterizing a spectrum of
network communication mechanisms (Avena-Koenigsberger et al., 2015, 2017; Avena-Koenigsberger, Misic,
& Sporns, 2018; Bullmore
& Sporns, 2012; Goñi et
al., 2013; Goñi et al.,
2014; Mišić et al.,
2015). However, it remains incompletely understood how this principle
generalizes from cells and sensory systems to the macroscale connectome (Chalk et al., 2018; Sterling & Laughlin, 2015).

An unexplored link between the efficient coding of compressed transmissions and macroscale brain network communication dynamics is rate-distortion theory, a major branch of information theory that establishes the mathematical foundations of lossy data compression for any communication channel (Shannon, 1959). Rate-distortion theory formalizes the link between compression and communication by determining the minimum amount of information that a source should transmit (the rate) for a target to approximately receive the input signal without exceeding an expected amount of noise (the distortion) (Shannon, 1959). Lossy data compression reduces the amount of information transmitted (the rate) at the cost of some loss of data fidelity (the distortion). Using a rate-distortion model, we sought to explain how the macroscale connectome supports efficient coding from minimal assumptions.

We modeled information transmission as the passing of stochastic messages in parallel along the wiring of the human connectome (Figure 1A). More precisely, we modeled information transmission using a repetition code (Barlow, 1961; Cover, 1999). A repetition code uses redundancy—here by sending multiple copies of a message—to overcome errors in communication arising from the stochasticity of neural processes (Cover, 1999; Sterling & Laughlin, 2015). A natural trade-off emerges between the redundancy and efficiency of a message: while redundant messages are more robust to errors in transmission, they also incur greater cost (Barlow, 1961). Thus, given an allowed error rate (or, equivalently, an expected fidelity), maximizing the efficiency of information transmission requires minimizing the redundancy of messages (Cover, 1999). By quantifying the minimal number of repeated messages needed to achieve a given fidelity, questions of connectome computation and communication can be formulated as a tractable mathematical problem using stochastic processes and redundancy reduction (Avena-Koenigsberger et al., 2018; Sims, 2018; Sterling & Laughlin, 2015).

A parsimonious model of information transmission in connectomes emerges naturally from two key assumptions. The first assumption is that stochastic transmission entails an economy of discrete impulses where the immediate future state only depends on the current state (Barlow, 1961; Sterling & Laughlin, 2015). Mathematically, this assumption casts transmission as a linear process, or a random walk, wherein message copies (identical random walkers) propagate along structural connections with probabilities proportional to the connection’s microstructural integrity (Avena-Koenigsberger et al., 2018; Fornito, Zalesky, & Bullmore, 2016). Biologically, macroscale random walk models are supported by their ability to predict the transsynaptic spread of pathogens (Henderson et al., 2019; Raj et al., 2015; Zheng et al., 2019), as well as the directionality and spatial distribution of neural dynamics from structural connectivity (Abdelnour, Dayan, Devinsky, Thesen, & Raj, 2018; Goñi et al., 2014; Paquola et al., 2020; Seguin, Razi, & Zalesky, 2019). The second assumption is that the impulse can lose information but never generate additional information over successive steps of propagation (Amico et al., 2021). Mathematically, this assumption represents the data processing inequality, which states that a random walker can only lose (and never gain) information about an information source (Cover, 1999). Biologically, the assumption is supported by increasing temporal delay, signal mixing, and signal decay introduced by longer paths (Murray et al., 2014; Sterling & Laughlin, 2015).
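The first assumption can be illustrated with a minimal sketch in which step probabilities are obtained by row-normalizing a weighted adjacency matrix. The toy network `W`, the function names, and the use of plain row normalization are illustrative assumptions and are not taken from the study's pipeline.

```python
def transition_matrix(weights):
    """Row-normalize a weighted adjacency matrix into one-step walk probabilities."""
    probs = []
    for row in weights:
        total = sum(row)
        # An isolated region has no outgoing connections; leave its row at zero.
        probs.append([w / total if total > 0 else 0.0 for w in row])
    return probs


def step(distribution, probs):
    """Advance a probability distribution over regions by one random walk step."""
    n = len(probs)
    return [sum(distribution[i] * probs[i][j] for i in range(n)) for j in range(n)]


# Hypothetical 3-region network; weights stand in for connection integrity.
W = [[0.0, 2.0, 1.0],
     [2.0, 0.0, 1.0],
     [1.0, 1.0, 0.0]]
P = transition_matrix(W)

start = [1.0, 0.0, 0.0]          # all message copies begin in region 0
after_one_step = step(start, P)  # depends only on the current state
print(P[0])
print(after_one_step)
```

In this sketch a walker leaving region 0 is twice as likely to step to region 1 as to region 2, because the connection to region 1 carries twice the weight.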

Combining these two assumptions, if packets of information propagate along structural
pathways and information can only be lost with each step, then the shortest pathway
between two brain regions yields an upper bound on the fidelity with which they can
communicate. This key conclusion allows one to formulate the probability that a
message propagates along the shortest path as an effective fidelity for the
communication between two regions, thereby operationalizing the notion of *distortion*. Moreover, by modeling messages as random walkers,
one can operationalize the notion of *rate* by computing the number
of messages that must be sent to ensure that at least one transmits along the
shortest path, that is, to ensure that at least one message reaches a specified
receiver with maximum fidelity (Goñi et
al., 2013). We applied our model to 1,042 youth (aged 8–23 years)
from the Philadelphia Neurodevelopmental Cohort who underwent diffusion-weighted
imaging (DWI; see Supporting Information Figure
S1) (Satterthwaite, Elliott, et al.,
2014). To operationalize metabolic expenditure, we used arterial-spin
labeling (ASL) MRI, which measures cerebral blood flow (CBF) and is correlated with
glucose expenditure (Gur et al., 2008; Vaishnavi et al., 2010).
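As a rough illustration of the fidelity described above, the probability that a single walker follows a given path is the product of the step probabilities along it. The transition matrix `P` and the path below are hypothetical, and the path is simply assumed to be the shortest one rather than computed.

```python
def path_probability(probs, path):
    """Probability that one random walker traverses `path` node by node."""
    p = 1.0
    for src, dst in zip(path, path[1:]):
        p *= probs[src][dst]
    return p


# Hypothetical 4-region transition matrix (each row sums to 1).
P = [[0.00, 0.50, 0.50, 0.00],
     [0.25, 0.00, 0.25, 0.50],
     [0.50, 0.50, 0.00, 0.00],
     [0.00, 1.00, 0.00, 0.00]]

shortest_path = [0, 1, 3]  # assumed shortest path from region 0 to region 3
single_walker_fidelity = path_probability(P, shortest_path)
single_walker_distortion = 1.0 - single_walker_fidelity
print(single_walker_fidelity, single_walker_distortion)
```

A single walker is rarely enough for high fidelity, which is why the model sends multiple copies of the same message.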

To evaluate the validity of the efficient coding model, we assessed five published predictions of any communication system adhering to rate-distortion theory, which we adapted to connectomes and distinguished from alternative explanations of brain network communication dynamics (Figure 1B) (Goñi et al., 2013; Goñi et al., 2014; Marzen & DeDeo, 2017; Sims, 2018; van den Heuvel et al., 2012). First, information transmission should produce a characteristic rate-distortion gradient in biological and artificial networks, where exponentially increasing information rates are required to minimize signal distortion. Second, transmission efficiency should improve with manipulations of the communication system designed to facilitate signal propagation, where information costs decrease when randomly walking messages are biased with regional differences in metabolic rates and intracortical myelin. Third, the information rate should vary as a function of the costs of error, with discounts when costs are low and premiums when costs are high. Fourth, brain network complexity should flexibly support communication regimes of varying fidelity, where a high-fidelity regime predicts information rates that monotonically increase as the network grows more complex, and a low-fidelity regime predicts asymptotic information rates indicative of lossy compression. Fifth and finally, structural hubs should integrate incoming signals to efficiently broadcast information, where hubs (compared to other brain regions) have more compressed input rates and higher transmission rates for equivalent input–output fidelity. As described below, this model advances the current understanding of how information processing is associated with behaviors in a range of cognitive domains, subject to constraints on metabolic resources and network architecture.

## RESULTS

### Macroscale Efficient Coding Can Be Understood by Communication Processes of Random Walks But Not the Alternative Model of Shortest Path Routing

To understand how the brain balances the transmission rate of stochastic messages and signal distortion across different network architectures, we developed a model positing random walk communication dynamics atop the structural connectome. We tested this model by comparing it to the alternative hypothesis of shortest path routing (see Supporting Information Modeling/Math Notes). The random walk model and shortest path model are currently viewed as opposing extremes of a spectrum of communication processes (Avena-Koenigsberger et al., 2015, 2018; Goñi et al., 2013). In network neuroscience, shortest path routing anchors metrics of communication dynamics and information integration (Avena-Koenigsberger et al., 2018; Seguin, van den Heuvel, & Zalesky, 2018; Sporns, 2013). In cognitive neuroscience, the neural circuit related to behavior is commonly depicted as a subset of brain regions communicating their specialized information to each other across shortest and direct anatomical connections (Saleeba, Dempsey, Le, Goodchild, & McMullan, 2019). Although shortest path routing has acknowledged shortcomings as a model of communication dynamics (see Supporting Information Modeling/Math Notes), a key extenuating hypothesis of the model is reduced metabolic cost (Avena-Koenigsberger et al., 2018; Bullmore & Sporns, 2012). Yet, existing macroscale evidence for this model remains sparse (Várkuti et al., 2011).

We sought to determine how brain metabolism is associated with structural
signatures of shortest path versus random walk models. To quantify the extent to
which a person’s brain is structured to support shortest path routing, we
used a statistical quantity known as the *global efficiency* (Latora & Marchiori, 2001),
a commonly used measure of the average shortest path strength between all pairs
of brain regions. Intuitively, global efficiency represents the ease of routing
information by shortest paths and is proportional to the strength of shortest
paths in a network (Sporns, 2013). As
an operationalization of metabolic running cost, we considered CBF, which is
correlated with glucose consumption (Gur et
al., 2008; Vaishnavi et al.,
2010). To test the spatial correlation between CBF and glucose
consumption, we used a spatial permutation test that generates a null
distribution of randomly rotated brain maps that preserves the spatial
covariance structure of the original data; we denote the *p* value that reflects significance as *p*_{SPIN} (Materials and Methods). We observed a
linear association between CBF and glucose consumption (Figure 1C; Pearson’s correlation coefficient *r* = 0.47, *df* = 358, *p*_{SPIN} < 0.001).
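For readers unfamiliar with global efficiency, the following sketch computes it as the average inverse shortest path length over all ordered pairs of nodes (Latora & Marchiori, 2001). Converting connection weights to lengths by taking reciprocals and using the Floyd-Warshall algorithm are assumptions made for illustration; the study's own implementation may differ.

```python
import math


def global_efficiency(weights):
    """Average inverse shortest path length over all ordered node pairs."""
    n = len(weights)
    # Assumption: a stronger connection is a shorter hop (length = 1 / weight).
    dist = [[0.0 if i == j
             else (1.0 / weights[i][j] if weights[i][j] > 0 else math.inf)
             for j in range(n)] for i in range(n)]
    # Floyd-Warshall all-pairs shortest paths.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    # Disconnected pairs contribute zero efficiency.
    total = sum(1.0 / dist[i][j]
                for i in range(n) for j in range(n)
                if i != j and dist[i][j] != math.inf)
    return total / (n * (n - 1))


# Hypothetical chain of three regions: 0 - 1 - 2.
chain = [[0, 1, 0],
         [1, 0, 1],
         [0, 1, 0]]
print(global_efficiency(chain))
```

A fully connected network with unit weights attains the maximum value of 1 under this definition, whereas the chain scores lower because regions 0 and 2 communicate only through region 1.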

Next, we tested if shortest path routing is linked to a decrease in metabolic
expenditure, operationalized as a negative correlation between global efficiency
and CBF (Bullmore & Sporns,
2012; Várkuti et al.,
2011). Controlling for mean gray matter density, sex, mean degree,
network density, and in-scanner motion, we found that the global efficiency was
negatively correlated with CBF (*r* = −0.20, *df* = 1039, *p* < 0.001), consistent
with prior reports (Várkuti et al.,
2011). Notably, we did not regress out age in the previous analysis
in order to align with the prior analysis that we aimed to replicate (Várkuti et al., 2011). We were
also interested in determining whether development had any effect on the
relationship between global efficiency and CBF. Our interest was justified by
the positive correlation between age and global efficiency (Figure 2A; *F* = 50, estimated *df* = 3.46, *p* < 2 ×
10^{−16}) and the negative correlation between age and CBF
(*F* = 69.22, estimated *df* = 3.74, *p* < 2 × 10^{−16}). After
controlling for age, we did not find a significant relationship between global
efficiency and CBF (*r* = 0.01, *df* = 1039, *p* = 0.79), suggesting that collinearity with age drove the
initial observed association between CBF and global efficiency. This null result
undermines the claim that shortest path routing is associated with reduced
metabolic expenditure.

Rather than being driven by shortest path routing, metabolic expenditure could
instead be associated with communication by random walks. Each brain region can
reach every other brain region via random walks along paths of five connections
(Figure 2B). A random walker will
likely not take the most efficient paths and must instead rely on the structural
strengths of longer paths. Hence, if brain metabolism is associated with
communication by random walks, then CBF should correlate with the strength of
the white matter paths greater than length 5. To evaluate this prediction, we
computed the strength of connections across different path distances using the
matrix exponent of the structural network (see Materials and Methods and Supporting Information Figure S1B). We then tested the association
between longer paths and metabolic expenditure across individuals (Figure 2C) and across regions (Figure 2D). In first considering variation across *individuals*, we found that the average node strengths for
walks of length 2 to 15 were negatively correlated with CBF (*t* = −1.59 to −2.81, estimated model *df* = 11.45,
FDR-corrected *p* < 0.05), after controlling for age, sex,
age-by-sex interaction, average node degree, network density, and in-scanner
motion (Figure 2C). The negative
correlations between CBF and the average connection strengths suggest that the
greater the connection integrity, the lower the metabolic expenditure. In next
considering variation across *brain regions*, we found that the
average node strengths for walks of length 2 to 15 were positively correlated
with CBF (Spearman’s rank correlation coefficient *ρ* = 0.12 to 0.14, *df* = 358,
FDR-corrected *p* < 0.05), after controlling for age, sex,
age-by-sex interaction, average node degree, network density, and in-scanner
motion (Figure 2D). The positive
correlations between CBF and the average connection strengths suggest that brain
regions with greater path strengths tended to have higher metabolic expenditure.
See Supporting Information Figures
S2–S4 for metabolic
costs associated with other connectivity metrics supporting random walks.
Together, we found no evidence of metabolic expenditure associated with shortest
path routing, whereas the convergent findings of an association between CBF and
random walk path strengths across individuals and regions provided some evidence
that metabolic running costs were linked to random walk communication
dynamics.
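One way to picture the strength of longer walks is through powers of the adjacency matrix: the entry (i, j) of the k-th power accumulates the weight of all walks of length k from region i to region j. The sketch below reads the "matrix exponent" as a matrix power, which is an assumption, and the function names and toy network are hypothetical.

```python
def matmul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def walk_node_strengths(adj, max_length):
    """Map each walk length k to the row sums (node strengths) of adj**k."""
    strengths = {}
    power = [row[:] for row in adj]  # adj**1
    for length in range(1, max_length + 1):
        strengths[length] = [sum(row) for row in power]
        power = matmul(power, adj)
    return strengths


# Hypothetical chain of three regions: 0 - 1 - 2.
A = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
strengths = walk_node_strengths(A, 3)
for length, values in strengths.items():
    print(length, values, sum(values) / len(values))
```

Averaging the node strengths at each walk length gives one value per individual and per length, which could then be related to CBF in a regression with the covariates listed above.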

Using the random walk model, we formalized a rate-distortion model of efficient
coding by assuming that the minimal amount of noise is achieved by messages that
randomly walk along shortest paths (Figure 3A–D). We defined rate as
the number of random walkers per transmission, and distortion as the probability
of the random walkers *not* taking the shortest path. We
evaluated the validity of the redundancy reduction implementation of efficient
coding (Barlow, 1961). Reducing
redundancy in repetition coding is equivalent to minimizing the number of random
walkers (Barlow, 1961; Cover, 1999; Shannon, 1959). To understand how the brain balances
information rate and distortion, we measured the number of random walkers that
are required for at least one to randomly walk along the shortest path to a
target cortical region, with an expected probability (Materials and Methods). This measure of random walk
dynamics is based on a prior metric (Goñi et al., 2013). The number of random walkers can be used
to calculate the transmission length of a neural message (in units of bits) or
the information rate (in units of bits per second; see Materials and Methods and Supporting Information Figure S3). To evaluate the
roles of random walk dynamics and rate-distortion theory in the brain, we
assessed five previously published predictions of rate-distortion theory and
information processing (Figure 1B) (Marzen & DeDeo, 2017; Sims, 2018; van den Heuvel et al., 2012).
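The number of walkers needed for at least one to take the shortest path can be sketched under the assumption that walkers are independent. If each walker takes the shortest path with probability π, then r walkers succeed at least once with probability 1 − (1 − π)^r, and solving for r at a target fidelity gives the closed form below. The independence assumption and the function name are illustrative; the exact expression used in the study is given in its Materials and Methods.

```python
import math


def walkers_needed(pi_shortest, fidelity):
    """Walkers required so that at least one takes the shortest path
    with probability `fidelity`, assuming independent walkers."""
    if not 0.0 < pi_shortest < 1.0:
        raise ValueError("pi_shortest must lie strictly between 0 and 1")
    if not 0.0 <= fidelity < 1.0:
        raise ValueError("fidelity must lie in [0, 1)")
    return math.log(1.0 - fidelity) / math.log(1.0 - pi_shortest)


pi = 0.25  # hypothetical single-walker shortest path probability
for distortion in (0.6, 0.1, 0.02, 0.001):
    r = walkers_needed(pi, 1.0 - distortion)
    print(distortion, math.ceil(r))
```

The required number of walkers grows as the tolerated distortion shrinks, which is the trade-off that the rate-distortion gradient summarizes.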

### Hypothesis 1: Rate-Distortion Gradient

The first prediction of rate-distortion theory is that communication systems including both brain networks and artificial random networks should produce an information rate that is an exponential function of distortion because biological and engineered systems are governed by the same information-theoretic trade-offs (Sims, 2018). To test this prediction, we computed the average number of random walkers over all nodes in the network for a given individual, with the probability of randomly walking along the shortest path ranging from 10% to 99.9% (Figure 3D). We compared the number of random walkers required of structural connections in the brain with that of random walkers required of connections in random Erdős-Rényi networks, which have larger probabilities of shortest path communication than other canonical random networks (Goñi et al., 2013; Latora & Marchiori, 2001). Hence, Erdős-Rényi networks serve as an optimal benchmark for the efficiency of random walk communication and shortest path routing (Materials and Methods). For each individual network, the information-theoretic trade-off between information rate and signal distortion was defined by a rate-distortion gradient (Figure 3E). The gradient shows that distortion increases as the information rate decreases, which is the hallmark feature of lossy compression. Next, we considered the extent to which the brain’s structural connectome prioritizes compression versus fidelity. We refer to this trade-off as the compression efficiency (Figure 3E), and define it as the slope of the rate-distortion gradient (Materials and Methods). With random walk communication dynamics, increased compression efficiency prioritizes lossy compression, while decreased compression efficiency prioritizes transmission fidelity.
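To make the idea of a gradient slope concrete, the sketch below builds a toy rate-distortion gradient and estimates its slope by ordinary least squares. Everything specific here is an assumption for illustration: the independent-walker formula for the rate, the single shortest path probability `pi`, the distortion grid, and the choice to regress the raw number of walkers on distortion. The units, fitted range, and sign convention of compression efficiency in the study are those given in its Materials and Methods.

```python
import math


def walkers_needed(pi_shortest, fidelity):
    """Walkers required for at least one shortest-path traversal,
    assuming independent walkers (illustrative closed form)."""
    return math.log(1.0 - fidelity) / math.log(1.0 - pi_shortest)


def ols_slope(x, y):
    """Least-squares slope of y regressed on x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / sxx


distortions = [0.02, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6]  # hypothetical grid
pi = 0.05                                            # hypothetical probability
rates = [walkers_needed(pi, 1.0 - d) for d in distortions]

slope = ols_slope(distortions, rates)
print(slope)
```

In this toy setting the slope is negative because fewer walkers are needed as more distortion is tolerated; a steeper slope means that each unit of accepted distortion saves more walkers.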

Consistent with the first prediction of rate-distortion theory, we observed an
exponential gradient in every individual brain network and the
Erdős-Rényi random networks. Furthermore, the random networks,
which are composed of more short connections than empirical brain networks,
required significantly fewer random walkers than the empirical brain networks
(Figure 3D and Supporting Information Figure S3; *F* =
10 × 10^{5}, *df* = 29120, *p* < 2 × 10^{−16}), consistent with the intuition that
a greater prevalence of short connections in the random network translates to
greater likelihood of shortest path propagation (Goñi et al., 2013; Latora & Marchiori, 2001). Rate-distortion
trade-offs varied as a function of age and sex, where compression efficiency
(Figure 3E) was negatively correlated
with age (*F* = 27.54, estimated *df* = 2.17, *p* < 0.001), suggesting that neurodevelopment places
a premium on fidelity (Figure 3F).
Compression efficiency was greater on average in females compared to males
(*t* = 9.53, *df* = 996.82, *p* < 0.001). The data, therefore, indicate that random walk communication
dynamics on biological brain networks differ from random walks on artificial
networks, yet each accords well with the prediction of rate-distortion
trade-offs governing all communication systems.

### Hypothesis 2: Redundancy Reduction

The second prediction of rate-distortion theory is that manipulations to the physical communication system (the connectome) that are designed to facilitate information transmission will improve communication efficiency. The prediction stems from two key observations. First, between the rate-distortion gradients for structural connectomes and random networks (depicted in Figure 3D) exists a range of possible rate-distortion gradients produced by some other construction of communication networks (Shannon, 1959). Second, it is well known that brain signaling relies extensively on metabolic diffusion and devotes much of its metabolic resources to maintaining a chemical balance that supports neuron firing (Attwell & Laughlin, 2001; Sterling & Laughlin, 2015). At the longer distances of the connectome, myelin in the white matter and cerebral cortex supports the speed and efficiency of electrical signaling in subcortical fiber tracts and in cortico-cortical communication (Barbas & Rempel-Clower, 1997; Deco, Roland, & Hilgetag, 2014; Laughlin, 2001). Hence, modifying the connectome to bias random walk dynamics according to metabolic resources and myelin mimics biological investments in communication efficiency.

The second observation above leads to the hypothesis that including biological biases in random walk probabilities based on the strength of structural connections will improve the efficiency of information transmission. We hypothesized that biasing random walkers with metabolic resources and myelin would reduce the information rate required to communicate a message with a given fidelity compared to random walkers that propagate only by connectome topology. To test this hypothesis, we biased edge weights (Materials and Methods) representing structural connection strength by multiplying the edge weight by a bias term. Across pairs of connected brain regions, the bias term was either defined as the average, normalized metabolic rate using CBF or the average, normalized cortical myelin content using published maps of T1w/T2w MRI measures with histological validation (Figure 4A) (Glasser et al., 2014). By modeling network communication dynamics, one can calculate directed patterns of transmission as inputs into (receiver) and outputs from (sender) brain regions (Seguin et al., 2019). We separately computed the send and receive compression efficiency of brain regions to better understand the biological relevance of transmitted information sent or received across the connectome (Figure 4B; Materials and Methods).
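The biasing step can be sketched as follows: each edge weight is multiplied by the average of the normalized regional values at its two endpoints. Normalizing by the maximum value is an assumption, since the normalization is specified in the Materials and Methods rather than here, and the regional values and function names are hypothetical.

```python
def normalize(values):
    """Scale regional values by their maximum (assumed normalization)."""
    top = max(values)
    return [v / top for v in values]


def bias_edges(weights, regional_values):
    """Multiply each edge weight by the mean normalized value of its endpoints."""
    b = normalize(regional_values)
    n = len(weights)
    return [[weights[i][j] * (b[i] + b[j]) / 2.0 for j in range(n)]
            for i in range(n)]


# Hypothetical 3-region network with equal structural weights.
W = [[0.0, 1.0, 1.0],
     [1.0, 0.0, 1.0],
     [1.0, 1.0, 0.0]]
cbf = [40.0, 60.0, 80.0]  # hypothetical regional CBF values

biased = bias_edges(W, cbf)
for row in biased:
    print(row)
```

Although the structural weights are equal in this toy network, the biased weights favor the edge between the two regions with the highest values, so a random walker on the biased network is steered toward them.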

We found that brain regions that prioritized input fidelity and output
compression tended to have greater myelin content (Figure 4B; sender *r* = 0.23, *df* =
358, *p*_{SPIN,Holm-Bonferroni} = 0.02, receiver *r* = −0.14, *df* = 358, *p*_{SPIN,Holm-Bonferroni} = 0.046), consistent with
myelin’s function in neurotransmission efficiency and speed. For a
channel communicating at 0.1% distortion (and across all distortions; Supporting Information Figure S6C),
biasing structural edge weights by the biological properties of metabolic and
electrical signaling resulted in more efficient communication by reducing the
number of redundant random walkers required (Figure 4C; *t*_{metabolic} =
20.87, bootstrap 95% CI [18.72, 23.14], *df* = 1993.9; *t*_{electrical} = 295.93, bootstrap
95% CI [281.19, 312.76], *df* = 1225.9, *p* < 0.001). While both metabolic and electrical signaling supported more
efficient communication, electrical signaling was more efficient than metabolic
signaling (Deco et al., 2014; Sterling & Laughlin, 2015).
Compared to rewired null networks preserving the degree sequence (Supporting Information Figure S6D),
structural topology and metabolic resources support communication that
prioritizes fidelity
(*t*_{topological,degree-preserving}(2074.4) = 121.02, *p* < 0.001; *t*_{metabolic,degree-preserving}(2025.8) = 87.78, *p* < 0.001), while myelination supports communication
that prioritizes compression efficiency
(*t*_{electrical,degree-preserving}(1208.3) =
−122.62, *p* < 0.001). The minimum number of random
walkers required for distortion levels less than 60% was explained by the
interaction of the distortion level with the type of biased random walk (see Figure 4C, *F* = 6
× 10^{5}, *df* = 29120, *p*_{Holm-Bonferroni} < 0.05). Together, these
results support the prediction that biological investments can augment
connectivity to support efficient communication, especially when transmission
prioritizes fidelity.

### Hypothesis 3: Error-Dependent Costs

The third prediction of rate-distortion theory is that the information rate should vary as a function of the costs of errors in communication systems that interact with their environment. If errors are more costly for networks operating at high fidelity, then we should observe an information rate surpassing the minimum predicted by rate-distortion theory. In contrast, if errors are less costly for networks operating at low fidelity, then we should observe no more than the minimum predicted information rate. In testing this prediction, we observed that brain networks commit more random walkers than required for very low levels of distortion, such as 0.1%, but allocate the predicted number of random walkers or fewer to guarantee levels of distortion between 2% and 60% (Figure 4D). Hence, the third prediction of rate-distortion theory was consistent with our observation of a premium placed on very low signal distortion and a discounted cost of greater distortion.

### Hypothesis 4: Flexible Coding Regimes

The fourth prediction of rate-distortion theory proposes that communication systems, including the connectome, have distinct network properties that support information transmission in a flexibly high- or low-fidelity regime. With increasing information processing demands, a high-fidelity regime will continue to place a premium on accuracy, whereas a low-fidelity regime will tolerate noise in support of lossy compression. The operating regime depends on the behavioral demands of the environment, indicating the need for a flexible regime that can simultaneously support both high- and low-fidelity communication. In this section, we assess the fourth prediction of rate-distortion theory by testing the more precise hypotheses that large brain networks support communication in a high-fidelity regime, indirect pathways support a low-fidelity regime, and hierarchical organization supports a flexible regime. We explain and test each hypothesis in turn.

#### Large networks support a high-fidelity regime and indirect pathways support a low-fidelity regime.

Rate-distortion theory predicts that, in a high-fidelity regime, the information rate will monotonically increase with the complexity of the communication system in order to continue to place a premium on accuracy (Figure 5A). To evaluate this prediction, we operationalized complexity as network size because size determines the number of possible states (or nodes) available to each random walker (Marzen & DeDeo, 2017). We re-parcellated each individual brain network at different spatial resolutions to generate brain and random null networks of five different sizes representing the original network. We compared the rate for brain networks to the rate for random networks of matched size. In testing this prediction, we observed that the minimum number of random walkers increased monotonically with network size, consistent with a high-fidelity regime, and at a rate different from that of random networks with matched sizes (Figure 5B). Larger brain networks support high-fidelity communication by placing a greater premium on accuracy than do larger random networks.

Rate-distortion theory also predicts that, in a low-fidelity regime, the information rate should plateau as a function of network complexity in order to tolerate noise in support of lossy compression. To evaluate this prediction, we operationalized network complexity supporting low-fidelity communication by using path transitivity. Path transitivity quantifies the number of indirect pathways that detour from the shortest path by one additional connection (Figure 5A, Materials and Methods). Along the shortest paths, the transitivity is the number of triangles formed by one edge in the shortest path and two connected edges composing an indirect pathway that exits and immediately returns to the shortest path. Operationalizing low fidelity with path transitivity stemmed from interpreting path transitivity using our model assumptions of random walk dynamics.

Applying our random walk model assumptions to path transitivity, if the shortest
path represents the structure supporting highest fidelity because longer paths
introduce information loss, then path transitivity’s indirect pathways,
which are just one connection longer than the shortest path, are the next-best
paths for fidelity. Thus, path transitivity quantifies indirect pathways that
offer a random walker the best *approximations* of the highest
fidelity path or the best lossy compression. To operationalize the complexity of
the communication system contributing to low-fidelity transmission (better able
to tolerate noise in support of lossy compression), we measured the number of
nodes in the indirect pathways of path transitivity, which we termed shortest
path complexity. In testing the prediction that information rate should plateau
as a function of network complexity in low-fidelity regimes, we found that the
number of random walkers plateaued nonlinearly as a function of shortest path
complexity, consistent with low-fidelity communication (Figure 5C). Model selection criteria, for which lower values indicate the preferred model, support the
nonlinear form compared to a linear version of the same model (nonlinear *AIC* = 7902, linear *AIC* = 7915; nonlinear *BIC* = 7964, linear *BIC* = 7968). The
nonlinear fit of these data suggests that path transitivity supports low-fidelity
communication that is tolerant to noise.

#### Hierarchical organization flexibly and efficiently supports both a high-fidelity and a low-fidelity regime.

Our findings suggest that high fidelity depends on network *size* while low fidelity depends on network *transitivity*. Prior work has shown that hierarchical
organization supports the simultaneous presence of networks of *large
size and high transitivity* (Ravasz & Barabási, 2003). Therefore, we
hypothesized that if the brain network has hierarchical organization, then
such organization may enable flexible switching between high- and
low-fidelity regimes. Hierarchical organization, wherein submodules are
nested into successively larger but less densely interconnected modules, is
thought to support efficient spatial embedding as well as specialized
information transfer (Figure 5D, left)
(Bassett et al., 2010). We
first sought to assess if brain networks exhibit hierarchical organization.
Hierarchical organization imposes a strict scaling law between transitivity
and degree; the slope of this relationship can be used to identify the
presence of hierarchically modular organization in real networks. This
strict scaling is distinctive of hierarchical networks because increasing
the size of a generic nonhierarchical network containing highly connected
hubs (degree) will tend to diminish transitivity (clustering); however,
hierarchical organization is known to decouple the size and transitivity of
a network, which allows each property to vary individually (Ravasz & Barabási,
2003). The characteristic scaling of reduced transitivity of brain
regions with higher degree was present across the five brain network sizes
(83 nodes: *t* = −15.31, bootstrap 95% CI
[−19.37, −12.29], *p* < 0.001, *R*^{2} = 0.74; 129 nodes: *t* =
−14.82, bootstrap 95% CI [−18.48, −11.93], *p* < 0.001, *R*^{2} =
0.63; 234 nodes: *t* = −16.96, bootstrap 95% CI
[−20.43, −13.75], *p* < 0.001, *R*^{2} = 0.55; 463 nodes: *t* =
−26.47, bootstrap 95% CI [−30.02, −23.16], *p* < 0.001, *R*^{2} =
0.60; 1015 nodes: *t* = −41.88, bootstrap 95%
CI [−46.10, −38.11], *p* < 0.001, *R*^{2} = 0.63; *F* = 917.3, *df* = 1913, *p* < 0.001; Figure 5D, right). Hence, the connectome
exhibits hierarchical organization.
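
The degree-transitivity scaling test described above can be illustrated with a minimal sketch. It assumes a binary undirected graph, the standard local clustering coefficient as the per-node transitivity, and an ordinary least-squares fit in log-log space; whether the reported regressions were fit in log space or on weighted networks is not stated here, so the graph, function names, and fitting choice are all illustrative assumptions.

```python
import math

def degree_and_clustering(adj):
    """Return (degree, clustering coefficient) per node of a binary undirected graph."""
    n = len(adj)
    stats = []
    for i in range(n):
        nbrs = [j for j in range(n) if j != i and adj[i][j]]
        k = len(nbrs)
        # Count connected pairs among the neighbors of node i (closed triangles).
        links = sum(1 for a in range(k) for b in range(a + 1, k)
                    if adj[nbrs[a]][nbrs[b]])
        c = 2.0 * links / (k * (k - 1)) if k > 1 else 0.0
        stats.append((k, c))
    return stats

def loglog_slope(stats):
    """Ordinary least-squares slope of log(clustering) on log(degree)."""
    # Nodes with degree < 2 or zero clustering are skipped (log undefined).
    pts = [(math.log(k), math.log(c)) for k, c in stats if k > 1 and c > 0]
    mx = sum(x for x, _ in pts) / len(pts)
    my = sum(y for _, y in pts) / len(pts)
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    return sxy / sxx

# Hypothetical toy graph: hub node 0 joins three otherwise separate pairs.
edges = [(0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (0, 6),
         (1, 2), (3, 4), (5, 6)]
adj = [[0] * 7 for _ in range(7)]
for u, v in edges:
    adj[u][v] = adj[v][u] = 1

stats = degree_and_clustering(adj)
slope = loglog_slope(stats)  # negative: the high-degree hub has lower clustering
```

In this toy graph the hub has high degree and low clustering while the peripheral nodes show the reverse, so the fitted slope is negative, which is the qualitative signature the text attributes to hierarchical organization.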

We next assessed the hypothesis that hierarchical organization permits
networks to have simultaneously large size and transitivity: two core
network properties hypothesized to underlie high- and low-fidelity
communication regimes (Ravasz &
Barabási, 2003). Hence, we sought to test whether
hierarchical organization exhibits the hallmarks of both high- and
low-fidelity regimes. If hierarchical organization supports a high-fidelity
regime, then the information rate will monotonically increase with the
complexity of the communication system. In testing this hypothesis, we found
that the scaling characteristic of hierarchical organization in brain
networks was associated with monotonically greater information rate
(*F* = 2002, bootstrap 95% CI [1946.6, 2061.4], *df* = 13389, *p* < 0.001, *R*^{2} = 0.62, bootstrap 95% CI [0.62,
0.63], Figure 5E). Considering the
slope of the increasing rate, networks with greater hierarchical
organization supported high fidelity communication more efficiently
(*t* = 32.4, bootstrap 95% CI [29.96, 34.79], *p* < 0.001) than larger networks
(*t* = 2640.32, bootstrap 95% CI [2556.3,
2725.07], *p* < 0.001). Taken together, our findings
suggest that hierarchical organization, which is already known to be
spatially efficient (Bassett et al.,
2010), also supports high-fidelity network communication and does
so more efficiently than large networks without hierarchical
organization.

To enable flexible switching, hierarchical organization should also support a
low-fidelity regime. If path transitivity supports a low-fidelity regime
better able to tolerate noise, then hierarchical organization may support a
low-fidelity regime by allowing for greater global transitivity in
hierarchical networks compared to nonhierarchical networks. In testing this
prediction, we found that the hierarchical organization of the connectome
exhibited greater transitivity than nonhierarchical scale-free networks
(generated with Materials and
Methods); and this increased transitivity became more pronounced
with increasing network size (size-by-hierarchical network type interaction *t* = −37.22, bootstrap 95% CI
[−40.57, −34.12], *p* < 0.001;
size-by-scale-free network type interaction *t* =
−125.32, bootstrap 95% CI [−131.71, −118.70], *p* < 0.001; Figure 5F). Thus, hierarchical organization could contribute to the
flexibility of efficient coding by preserving high network transitivity for
low-fidelity communication that prioritizes noise tolerance and large
network size for high-fidelity communication that prioritizes accuracy.

#### Brain regions disproportionately scale in relation to regional prioritization of network communication fidelity or lossy compression.

Having found that hierarchical organization preserves high transitivity
despite large network size, we sought to understand how transitivity is
associated with the areal expansion of brain regions. Evolutionarily new
connections may support higher order and flexible information processing,
emerging from disproportionate expansion of the association cortex (Buckner & Krienen, 2013). In
contrast, brain regions that are disproportionately out-scaled by total
brain expansion may save material, space, and metabolic resources. To
explore how sender and receiver compression efficiency relates to cortical
areal expansion, we used published maps of areal scaling; here, allometric
scaling coefficients were defined by the nonlinear ratios of surface area
change to total brain size change over development. Brain regions that
disproportionately expanded in relation to total brain size during
neurodevelopment tended to have greater transitivity (*r* =
0.26, *df* = 358, *p*_{SPIN,Holm-Bonferroni} = 0.001), which may
support noise-tolerant lossy compression with minimal loss of fidelity in
association cortex regions thought to underpin higher order and flexible
information processing. Disproportionately expanding brain regions tended to
have greater sender (*r* = 0.14, *df* = 358, *p*_{SPIN,Holm-Bonferroni} = 0.045; Figure 5H) and reduced receiver
(*r* = −0.20, *df* = 358, *p*_{SPIN,Holm-Bonferroni} = 0.03; Figure 5H) compression efficiency. While brain
regions that disproportionately expanded tend to prioritize messages
received with high fidelity, brain regions that are disproportionately
out-scaled by total brain expansion tend to receive compressed messages.

### Hypothesis 5: Integrative Hubs

The fifth and final hypothesis of our model posits that the structural hubs of
the brain’s highly interconnected rich club support information
integration of randomly walking signals (van
den Heuvel et al., 2012). To explain the hypothesized information
integration roles of rich-club structural hubs, we investigated the compression
efficiency of messages randomly walking into and out of hub regions compared to
that of other regions. In order to identify the rich-club hubs, we computed the
normalized rich-club coefficient and identified 43 highly interconnected
structural hubs (Figure 6A). Next, we
computed the send and receive compression efficiency of rich-club hubs compared
to all other regions. In support of their hypothesized function, we found that
the rich-club hubs required receiving fewer random walkers compared to other
regions (Wilcoxon rank-sum test, *W* = 12829, *p* < 0.001), suggesting prioritization of information compression (or
integration). For the rich-club hub to transmit outgoing messages with a
fidelity that is equivalent to the incoming messages, the rich-club hubs
required sending more random walkers compared to other brain regions
(*W* = 64, *p* < 0.001), supporting the
notion that rich-club hubs serve as high-fidelity information broadcasting
sources. The function of hubs apparently compensates for high metabolic cost
(Collin, Sporns, Mandl, & van den
Heuvel, 2013). However, we did not find evidence of greater CBF in
rich-club hubs compared to nonhubs, suggesting that hubs may be more
metabolically efficient than previously known (Supporting Information Figure S8). This may potentially be due to
hubs being compression-efficient receivers (see Supporting Information Results and Discussion for additional
explanations). Though we did not find evidence of high metabolic cost, the
contrasting roles of prioritizing input compression and output fidelity within
rich-club hubs were consistent with our hypothesis and the current understanding
of rich-club hubs as the information integration centers and broadcasters of the
brain’s network (van den Heuvel et
al., 2012).

#### Compression efficiency predicts behavioral performance.

Efficient coding predicts that the brain should not only balance fidelity and
information compression as quantified by compression efficiency, but should
do so in a way that improves future behavior (Chalk et al., 2018; Niven, Anderson, & Laughlin, 2007). Thus, we next sought
to evaluate the association between compression efficiency in rich-club hubs
and cognitive performance in a diverse battery of tasks. In light of
trade-offs between communication fidelity and information compression, we
hypothesized that compression efficiency would correlate with cognitive
efficiency, defined as the combined speed and accuracy of task performance.
Compression efficiency prioritizing lossy compression and low-dimensionality
should predict worse performance (Rigotti
et al., 2013). To assess the relationships between compression
efficiency and cognitive efficiency, we used four independent cognitive
domains that have been established by confirmatory factor analysis to assess
individual variation in tasks of complex reasoning, memory, executive
function, and social cognition (Materials
and Methods). Consistent with the hypothesis, we found that
individuals having rich-club hubs with reduced compression efficiency,
prioritizing transmission fidelity, tended to exhibit increased cognitive
efficiency of complex reasoning (all *p* values corrected
using the Holm–Bonferroni family-wise error method; *t* = −4.73, bootstrap 95% CI
[−6.58, −2.82], model adjusted *R*^{2} = 0.21, bootstrap 95% CI [0.16, 0.26], estimated model *df* = 10.59, *p* = 2 ×
10^{−7}), memory (*t* = −2.60,
bootstrap 95% CI [−4.41, −0.79], model adjusted *R*^{2} = 0.24, bootstrap 95% CI [0.16,
0.26], estimated model *df* = 9.85, *p* =
0.03), executive function (*t* = −2.80, bootstrap
95% CI [−4.75, −0.87], model adjusted *R*^{2} = 0.50, bootstrap 95% CI [0.45,
0.56], estimated model *df* = 10.69, *p* =
0.03), and social cognition (*t* = −2.55, bootstrap
95% CI [−4.50, −0.73], model adjusted *R*^{2} = 0.22, bootstrap 95% CI [0.16,
0.25], estimated model *df* = 10.47, *p* =
0.02). See Supporting Information Figure
S9 for similar nonhub correlations, as well as speed and accuracy
modeled separately. Our finding that the compression efficiency of rich-club
hubs was associated with cognitive efficiency is consistent with the notion
that the integration and broadcasting function of hubs contributes to
cognitive performance (van den Heuvel et
al., 2012). Importantly, compression efficiency explained
variation in cognitive efficiency even when controlling for the commonly
used shortest path measure of global efficiency (Figure 6D; compression efficiency *t* = −4.95, estimated model *df* =
11.52, *p* < 0.001; global efficiency *t* = 2.68, estimated model *df* = 11.52, *p* < 0.01). Collectively, individuals with
connectomes prioritizing fidelity tended to perform with greater cognitive
efficiency in a diverse range of functions, consistent with the
understanding of high-dimensionality neural representations predicting
flexible behavioral performance (Rigotti
et al., 2013; van den Heuvel
et al., 2012).

## DISCUSSION

The efficient coding principle and rate-distortion theory constrain models of brain network communication, explaining how connectome biology and architecture prioritize either communication fidelity or lossy compression (Achard & Bullmore, 2007; Avena-Koenigsberger et al., 2015, 2017, 2018; Bullmore & Sporns, 2012; Goñi et al., 2013; Johansen-Berg, 2010; Laughlin, 2001; Levy & Baxter, 1996; Palmer, Marre, Berry, & Bialek, 2015; Rubinov, 2016). Information compression can improve prediction and generalization, separate relevant features of information, and efficiently utilize limited capacity by balancing the unreliability of stochasticity with the redundancy in messages (Olshausen & Field, 2004; Palmer et al., 2015; Sims, 2016, 2018; Sterling & Laughlin, 2015). We observed a rate-distortion gradient for every individual, consistent with the conceptualization of the brain structural network as a communication channel with limited capacity. While each individual’s network adheres to rate-distortion theory, differences in structural connectivity lead to substantial variance in the compression efficiency. In addition to some of the variation being explained by age and sex, our results also indicate that variance could be driven by biological differences, such as in metabolic expenditure and myelin, as well as topological differences, such as transitivity, degree, and hierarchical organization. We found that biological investments of metabolic resources and myelin in connectome topology, hierarchically interconnected brain regions, and highly connected network hubs each supported efficient coding and compressed communication dynamics. We found evidence consistent with five predictions of rate-distortion theory adapted from prior literature, corroborating the validity of this theoretical model of macroscale efficient coding (Marzen & DeDeo, 2017; Sims, 2018; van den Heuvel et al., 2012).

In addition to these findings, we reported two important null results that could be explained in part by the macroscale efficient coding model. First, we did not find evidence for the hypothesis that metabolic savings are associated with shortest path routing communication dynamics, a key redeeming feature against acknowledged theoretical shortcomings of this routing model (Avena-Koenigsberger et al., 2018; Bullmore & Sporns, 2012; Várkuti et al., 2011). Rather, we found evidence of efficient coding, implemented using random walks, as in other areas of neuroscience (Barlow, 1961; Chalk et al., 2018; Denève, Alemi, & Bourdoukan, 2017; Laughlin, 2001; Laughlin, van Steveninck, & Anderson, 1998; Olshausen & Field, 2004; Palmer et al., 2015; Sterling & Laughlin, 2015; Weber, Krishnamurthy, & Fairhall, 2019; Wei & Stocker, 2015). Efficient coding generated unique predictions of communication as a stochastic process governed by trade-offs between lossy compression and transmission fidelity (MacKay, 2003). The evidence we presented was consistent with these predictions and would be difficult to explain with alternative models of shortest path routing (see Supporting Information Modeling/Math Notes and Discussion). This null result challenges the validity of the shortest path routing model as anchoring one end of a hypothesized spectrum of communication mechanisms and the usage of shortest path routing metrics to quantify the integrative capacity of particular brain areas or whole-brain connectivity (Avena-Koenigsberger et al., 2018). Rather, the macroscale efficient coding model implemented by random walk dynamics provides a strong theoretical interpretation of information integration in hubs as lossy compression (Chanes & Barrett, 2016; van den Heuvel et al., 2012).

Second, we did not find evidence for high metabolic cost associated with structural network hubs. Rather, we found that hubs are regions that receive inputs with greater compression efficiency; regions prioritizing input compression tend to be disproportionately out-scaled by total brain expansion during development, consistent with cortical consolidation (thinning and myelination) (Whitaker et al., 2016). In contrast, regions prioritizing input fidelity tend to disproportionately expand in relation to total brain growth during development, consistent with theories positing that evolutionary expansion and new connections support the flexibility of neural activity for higher-order cognition (Buckner & Krienen, 2013; Bullmore & Sporns, 2012; Glasser et al., 2014; Goldman-Rakic, 1988; Reardon et al., 2018; Scholtens, Schmidt, de Reus, & van den Heuvel, 2014; Theodoni et al., 2020; Vij, Nomi, Dajani, & Uddin, 2018). Structural network hubs support unique information integrative processes, apparently offsetting high metabolic, spatial, and material costs (Avena-Koenigsberger et al., 2019; Chanes & Barrett, 2016; Crossley et al., 2014; Oldham & Fornito, 2018; Scholtens et al., 2014; van den Heuvel et al., 2012; Vértes, Alexander-Bloch, & Bullmore, 2014). Hence, hub dysfunction may be particularly costly. Our null result motivates reconsideration of metabolic costs, at least for the simplest and most common definition of a structural connectivity hub using the measure of degree centrality (see Supporting Information Figure S8, Results, and Discussion) (Oldham & Fornito, 2018). An alternative hypothesis to high metabolic cost of hubs is that long-run metabolic savings depend on frequent usage of hubs to compensate for the expense of maintaining their size and connectivity (Harris & Attwell, 2012; S. S.-H. Wang et al., 2008). 
Although further investigation of the metabolic costs of rich-club hubs is warranted, our findings nevertheless reinforce a wealth of evidence emphasizing the importance of the development, resilience, and function of hubs in cognition and psychopathology (Chanes & Barrett, 2016; Crossley et al., 2014; Gollo et al., 2018; Liang, Zou, He, & Yang, 2013; Mišić et al., 2015; Scholtens et al., 2014; van den Heuvel et al., 2012; Whitaker et al., 2016).

Our work admits several theoretical and methodological limitations. First, regionally aggregated brain signals are not discrete Markovian messages and do not have goals like reaching specific targets. As in recent work, our model is a deliberately simplified but useful abstraction of macroscale brain network communication (Mišić et al., 2015). Second, although we modeled random walk dynamics in light of prior methodological decisions and information theory benchmarks (Goñi et al., 2013), compression efficiency can be implemented using alternative approaches. Several methodological limitations should also be considered. The accurate reconstruction of white matter pathways using diffusion imaging and tractography remains limited (Zalesky et al., 2016). While we chose to model structural connections using fractional anisotropy because it is one of the most widely used measures of white matter microstructure, future research could evaluate these same hypotheses in structural networks built from different measures and of other species (Chang et al., 2017; Jones, Knösche, & Turner, 2013; Teich et al., 2021). Moreover, noninvasive measurements of CBF with high sensitivity and spatial resolution remain challenging. However, we acquired images using an ASL sequence providing greater sensitivity and approximately four times higher spatial resolution than prior developmental studies of CBF (Satterthwaite, Shinohara, et al., 2014). Lastly, our data were cross-sectional, limiting the inferences that we could draw about neurodevelopmental processes.

In summary, our study advances understanding of how brain metabolism and architecture at the macroscale support communication dynamics in complex brain networks as efficient coding. Our model provides a simple framework for investigating efficient coding at the whole-brain network level and an important basis to build toward computational network models of special cases of efficient coding, such as robust, sparse, and predictive coding (Chalk et al., 2018). In quantifying integrative communication dynamics and lossy compression of the macroscale brain network, these results are relevant to future research on low-dimensional neural representations (Rigotti et al., 2013; Shine et al., 2019; Tang et al., 2019), efficient control of functional network dynamics (Srivastava et al., 2020; Tang et al., 2017), and hierarchical abstraction of behaviorally relevant cortical representations (Momennejad, 2020; Schapiro, Turk-Browne, Botvinick, & Norman, 2017; Stachenfeld, Botvinick, & Gershman, 2017). Future research could investigate whether other biological properties of the network support efficient coding, such as gradients in brain structure and function (Barbas, 1986; Charvet, Cahalane, & Finlay, 2015; Charvet & Finlay, 2014; Huntenburg, Bazin, & Margulies, 2018; Kingsbury & Finlay, 2001; Paquola et al., 2020; Vazquez-Rodriguez, Liu, Hagmann, & Misic, 2020). Neurodevelopmental processes vary with the naturalistic environment and socioeconomic background, suggesting that they are shaped by exposure to different experiences and expectations (Tooley, Bassett, & Mackey, 2021). We hypothesize that the neurodevelopmental range of compression efficiency reflects a relationship between source coding (statistics or compressibility of the naturalistic input information) and channel coding (compression efficiency), which demarcate zones of possible brain network communication (MacKay, 2003; Olshausen & Field, 2004). 
Lastly, the compression efficiency metric offers a novel tool to test leading hypotheses of dysconnectivity (Di Martino et al., 2014), hubopathy (Crossley et al., 2014; Gollo et al., 2018), disrupted information integration (Chanes & Barrett, 2016; Hernandez, Rudie, Green, Bookheimer, & Dapretto, 2015), and neural noise (Dinstein et al., 2012) in neuropsychiatric disorders.

## MATERIALS AND METHODS

### Participants

As described in detail elsewhere (Satterthwaite, Elliott, et al., 2014), diffusion-weighted imaging
(DWI) and arterial-spin labeling (ASL) data were acquired for the Philadelphia
Neurodevelopmental Cohort (PNC), a large community-based study of
neurodevelopment. The subjects used in this paper are a subset of the 1,601
subjects who completed the cross-sectional imaging protocol. We excluded
participants with health-related exclusionary criteria (*n* =
154) and with scans that failed a rigorous quality assurance protocol for DWI
(*n* = 162) (Roalf et al.,
2016). We further excluded subjects with incomplete or poor ASL and
field map scans (*n* = 60). Finally, participants with poor
quality T1-weighted anatomical reconstructions (*n* = 10) were
removed from the sample. The final sample contained 1042 subjects (mean age =
15.35, SD = 3.38 years; 467 males, 575 females). Study procedures were approved
by the Institutional Review Board of the Children’s Hospital of
Philadelphia and the University of Pennsylvania. All adult participants provided
informed consent; all minors provided assent and their parent or guardian
provided informed consent.

### Cognitive Assessment

All participants were asked to complete the Penn Computerized Neurocognitive
Battery (CNB). The battery consists of 14 tests adapted from tasks typically
applied in functional neuroimaging, and which measure cognitive performance in
five broad domains (Satterthwaite, Elliott, et
al., 2014). The domains included (1) executive control (i.e.,
abstraction and flexibility, attention, and working memory), (2) episodic memory
(i.e., verbal, facial, and spatial), (3) complex cognition (i.e., verbal
reasoning, nonverbal reasoning, and spatial processing), (4) social cognition
(i.e., emotion identification, emotion intensity differentiation, and age
differentiation), and (5) sensorimotor and motor speed. Performance was
operationalized as *z*-transformed accuracy and speed. The speed
scores were multiplied by −1 so that higher indicates faster performance,
and efficiency scores were calculated as the mean of these accuracy and speed *z*-scores. The efficiency scores were then *z*-transformed again, to achieve mean = 0 and *SD* = 1.0 for all scores. Confirmatory factor analysis
supported a model of four latent factors corresponding to the cognitive
efficiency of executive function, episodic memory, complex cognition, and social
cognition (Moore, Reise, Gur, Hakonarson,
& Gur, 2015). Hence, we used these four cognitive efficiency
factors in our analyses. In a factor solution separately modeling accuracy and
speed, the accuracy factors correspond to (1) executive and complex cognition,
(2) social cognition, and (3) memory. The speed factors correspond to (1) fast
speed (e.g., working memory and attention tasks requiring constant vigilance),
(2) episodic memory speed, and (3) slow speed (e.g., tasks requiring complex
reasoning).
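
The efficiency scoring described above (standardize accuracy and speed, flip the sign of speed, average, and standardize again) can be sketched as follows. This is an illustration under stated assumptions, not the battery's scoring code: the toy values are hypothetical, speed is assumed to be recorded as response time, and the population standard deviation is used because the text does not specify which estimator was applied.

```python
from statistics import mean, pstdev

def zscore(values):
    """Standardize a list to mean 0 and SD 1 (population SD, an assumption)."""
    mu = mean(values)
    sd = pstdev(values)
    return [(v - mu) / sd for v in values]

def efficiency_scores(accuracy, response_time):
    """Combine accuracy and speed into one z-scored efficiency value per participant."""
    z_acc = zscore(accuracy)
    # Multiply z-scored response times by -1 so that higher means faster.
    z_speed = [-z for z in zscore(response_time)]
    raw = [(a + s) / 2 for a, s in zip(z_acc, z_speed)]
    # Standardize again so the final scores have mean 0 and SD 1.
    return zscore(raw)

# Hypothetical toy data: proportion correct and response time in ms.
accuracy = [0.90, 0.75, 0.60, 0.85]
response_time = [820.0, 1010.0, 1300.0, 900.0]
eff = efficiency_scores(accuracy, response_time)
```

With these toy inputs, the participant who is both most accurate and fastest receives the highest efficiency score, and the slowest, least accurate participant receives the lowest.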

### Image Acquisition, Preprocessing, and Network Construction

Neuroimaging acquisition and pre-processing were as previously described (Satterthwaite, Elliott, et al., 2014). We depict the overall workflow of the neuroimaging and network extraction pipeline in Supporting Information Figure S1.

#### Diffusion-weighted imaging.

As was previously described (Baum et al.,
2017; Tang et al.,
2017), diffusion imaging data and all other MRI data were acquired on
the same 3T Siemens Tim Trio whole-body scanner and 32-channel head coil at
the Hospital of the University of Pennsylvania. DWI scans were obtained
using a twice-focused spin-echo (TRSE) single-shot EPI sequence (TR = 8,100
ms, TE = 82 ms, FOV = 240 mm^{2}/240 mm^{2}; Matrix = RL:
128/AP: 128/Slices: 70, in-plane resolution (x & y) 1.875
mm^{2}; slice thickness = 2 mm, gap = 0; FlipAngle =
90°/180°/180°, volumes = 71, GRAPPA factor = 3,
bandwidth = 2170 Hz/pixel, PE direction = AP). The sequence employs a
four-lobed diffusion encoding gradient scheme combined with a 90-180-180
spin-echo sequence designed to minimize eddy current artifacts. The complete
sequence consisted of 64 diffusion-weighted directions with *b* = 1,000 s/mm^{2} and 7 interspersed scans
where *b* = 0 s/mm^{2}. Scan time was about 11 min.
The imaging volume was prescribed in axial orientation covering the entire
cerebrum with the topmost slice just superior to the apex of the brain
(Roalf et al., 2016).

#### Connectome construction.

Cortical gray matter was parcellated according to the Glasser atlas (Glasser et al., 2016), defining 360
brain regions as nodes for each subject’s structural brain network,
denoted as the weighted adjacency matrix **A**. To assess multiple
spatial scales, cortical and subcortical gray matter was parcellated
according to the Lausanne atlas (Cammoun
et al., 2012). Together, 83, 129, 234, 463, and 1,015 dilated
brain regions defined the nodes for each subject’s structural brain
network in the analyses of Figure 5.

DWI data were imported into DSI Studio software and the diffusion tensor was estimated at each voxel (Yeh, Verstynen, Wang, Fernández-Miranda, & Tseng, 2013). For deterministic tractography, whole-brain fiber tracking was implemented for each subject in DSI Studio using a modified fiber assessment by continuous tracking (FACT) algorithm with Euler interpolation, initiating 1,000,000 streamlines after removing all streamlines with length less than 10 mm or greater than 400 mm. Fiber tracking was performed with an angular threshold of 45°, a step size of 0.9375 mm, and a fractional anisotropy (FA) threshold determined empirically by Otsu’s method, which optimizes the contrast between foreground and background (Yeh et al., 2013). FA was calculated along the path of each reconstructed streamline. For each subject, edges of the structural network were defined where at least one streamline connected a pair of nodes. Edge weights were defined by the average FA along streamlines connecting any pair of nodes. The resulting structural connectivity matrices were not thresholded and contain edges weighted between 0 and 1.

#### Arterial-spin labeling.

We denote *f* as CBF, *δM* as the difference of the signal between the control and label acquisitions, *R*_{1a} as the longitudinal relaxation rate of blood, *τ* as the labeling time, *ω* as the postlabeling delay time, *α* as the labeling efficiency, *λ* as the blood/tissue water partition coefficient, and *M*_{0} as the approximated control image intensity. Together, CBF *f* can be calculated according to the equation:

$$
f = \frac{\delta M \, \lambda \, R_{1a} \, \exp(\omega R_{1a})}{2 M_{0} \, \alpha} \left[ 1 - \exp(-\tau R_{1a}) \right]^{-1}
$$

Because prior work has shown that the T1 relaxation time changes substantially in development and varies by sex, this parameter was set according to previously established methods, which enhance CBF estimation accuracy and reliability in pediatric populations (Jain et al., 2012; Wu et al., 2010). As in prior work, the global network CBF was calculated as the average CBF across all brain regions to obtain an individual participant’s CBF (Satterthwaite, Shinohara, et al., 2014).
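
As a worked illustration of the CBF quantification, the sketch below evaluates the commonly used single-compartment form *f* = *δM* *λ* *R*_{1a} exp(*ω* *R*_{1a}) / (2 *M*_{0} *α* [1 − exp(−*τ* *R*_{1a})]), built from the symbols defined above. The functional form is assumed here to match the one used in the study, and every numeric value is a hypothetical placeholder rather than an acquisition parameter of this dataset.

```python
import math

def cbf(delta_m, m0, r1a, tau, omega, alpha, lam):
    """Cerebral blood flow f from one control-label difference (assumed model form)."""
    numerator = delta_m * lam * r1a * math.exp(omega * r1a)
    denominator = 2.0 * m0 * alpha * (1.0 - math.exp(-tau * r1a))
    return numerator / denominator

# Hypothetical inputs, for illustration only.
params = dict(
    delta_m=10.0,    # control minus label signal (arbitrary units)
    m0=1000.0,       # approximated control image intensity (arbitrary units)
    r1a=1.0 / 1.65,  # longitudinal relaxation rate of blood (1/s)
    tau=1.5,         # labeling time (s)
    omega=1.2,       # postlabeling delay time (s)
    alpha=0.85,      # labeling efficiency
    lam=0.9,         # blood/tissue water partition coefficient (ml/g)
)
f = cbf(**params)
```

With times in seconds and *λ* in ml/g, *f* is in ml/g/s, and multiplying by 6000 converts it to ml/100 g/min. Setting *R*_{1a} per participant, as the text describes for age and sex, only requires passing a different `r1a`.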

### Brain Maps

#### Cortical myelin.

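The ratio of T1w to T2w image intensities approximates *x*^{2} in the following manner:

$$
\frac{\text{T1w}}{\text{T2w}} \approx \frac{x \cdot b}{(1/x) \cdot b} = x^{2}
$$

where *x* is the myelin contrast in the T1w image, 1/*x* is the myelin contrast in the T2w image, and *b* is the receive bias field in both T1w and T2w images. We used a published atlas generated by this method (Glasser et al., 2014).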

#### Cortical areal scaling.

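Allometric scaling coefficients *β* were estimated for log_{10}(total cortical surface area) as a covariate predicting log_{10}(vertex area) using spline regression models that incorporated effects of age and sex on vertex area (Wood, 2004). We used the following relational form:

$$
\log_{10}(\text{vertex area}) \sim s(\text{age}) + \text{sex} + \beta \, \log_{10}(\text{total cortical surface area})
$$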

When *β* is 1, the scaling between total brain size and
brain regions is linear. When *β* is greater than 1, a region expands
disproportionately relative to total brain size; when *β* is less than 1,
the region is disproportionately out-scaled by total brain expansion. We used the published atlas generated using the same data as in
our study (Reardon et al., 2018; Satterthwaite, Elliott, et al.,
2014).

### Network Statistics

#### Global efficiency.

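The global efficiency 𝓔 of a graph *G* is defined as:

$$
\mathcal{E}(G) = \frac{1}{n(n-1)} \sum_{i \neq j \in G} \frac{1}{d_{ij}} \tag{1}
$$

where *n* is the number of nodes and *d*_{ij} is the shortest distance between node *i* and node *j*. Intuitively, a high 𝓔 value indicates greater potential capacity for global and parallel information exchange along shortest paths, and a low 𝓔 value indicates decreased capacity for such information exchange (Latora & Marchiori, 2001).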
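
A minimal sketch of global efficiency as the average inverse shortest distance over all ordered node pairs is given below. It assumes that connection weights are converted to distances by taking their reciprocal, a common convention that is not confirmed by the text, and it uses a plain Floyd-Warshall search; the function names and toy graph are illustrative.

```python
import math

def shortest_distances(weights):
    """All-pairs shortest distances via Floyd-Warshall, with distance = 1 / weight."""
    n = len(weights)
    dist = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        dist[i][i] = 0.0
        for j in range(n):
            if i != j and weights[i][j] > 0:
                dist[i][j] = 1.0 / weights[i][j]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist

def global_efficiency(weights):
    """Mean of 1 / d_ij over ordered pairs i != j; unreachable pairs contribute 0."""
    n = len(weights)
    dist = shortest_distances(weights)
    total = sum(1.0 / dist[i][j]
                for i in range(n) for j in range(n)
                if i != j and dist[i][j] < math.inf)
    return total / (n * (n - 1))

# Hypothetical toy graph: a three-node chain with unit weights.
chain = [[0, 1, 0],
         [1, 0, 1],
         [0, 1, 0]]
eff_chain = global_efficiency(chain)
```

Adding the missing connection between the two end nodes shortens their distance from 2 to 1 and therefore raises the efficiency, which matches the intuition that shorter paths imply greater capacity for exchange along shortest paths.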

#### Path strengths.

We denote by $S$ the path strengths, comprising the paths of multiple connections. As global efficiency measures the capacity of brain networks for shortest path routing, path strengths measure the capacity for stochastic communication. Path strengths are apt for assessing the network capacity for a model of stochastic transmission of impulses because paths can be represented as random walks $p = (i, j, \ldots, k)$, where $p$ is a path and $i$, $j$, and $k$ are nodes in the path. As in prior work (Becker et al., 2018), the strength of the weighted connections in a path, denoted $\omega(p)$, in the graph $G$ with adjacency matrix $\mathbf{A}$ is defined as:

$$\omega(p) = \prod_{(u,v) \in p} A_{uv},$$

where the product runs over the consecutive connections $(u, v)$ in the path $p$, as depicted in the schematic Figure 1B. Then, for walks of length $n$, the strengths of the paths from node $i$ to node $j$ are defined as:

$$S^{(n)}_{ij} = \left[\mathbf{A}^{n}\right]_{ij} = \sum_{p \in \mathcal{P}^{(n)}_{ij}} \omega(p),$$

where $\mathcal{P}^{(n)}_{ij}$ is the set of paths from node $i$ to node $j$ with length $n$. When $n = 1$, the matrix exponent produces a matrix with elements equal to $d_{ij}$ from Equation 1, or the shortest distance between node $i$ and node $j$. Intuitively, a high path strength represents structural paths that consist of higher integrity connections measured by DWI, whereas a low path strength indicates paths consisting of low integrity connections. To compute node strengths, the values for each node were summed. An average value was also calculated across node strengths per individual participant.
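The matrix-power view of path strengths can be sketched in a few lines of Python. The helper names and the toy adjacency matrix below are hypothetical. The sketch assumes that entry $(i, j)$ of the $n$-th matrix power sums, over all walks of length $n$ from $i$ to $j$, the product of the edge weights along each walk.

```python
def matmul(a, b):
    """Multiply two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def path_strengths(adjacency, length):
    """Return the `length`-th matrix power of `adjacency` (length >= 1)."""
    result = adjacency
    for _ in range(length - 1):
        result = matmul(result, adjacency)
    return result

# Hypothetical weighted, undirected 3-node network.
A = [[0.0, 0.5, 0.2],
     [0.5, 0.0, 0.4],
     [0.2, 0.4, 0.0]]
S2 = path_strengths(A, 2)
# Node strengths: sum the path strengths in each node's row.
node_strengths = [sum(row) for row in S2]
# Average across nodes, giving one value per participant.
mean_strength = sum(node_strengths) / len(node_strengths)
```

Here the only walk of length 2 from node 0 to node 2 passes through node 1, so `S2[0][2]` equals 0.5 × 0.4 = 0.2.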

#### Path transitivity.

We first computed the matching index $m_{ij}$ between nodes $i$ and $j$ along the shortest path $\pi_{s \to t}$, with neighboring nonshortest path nodes $k$ as:

$$m_{ij} = \frac{\sum_{k \neq i,j} (w_{ik} + w_{jk})\, \Theta(w_{ik})\, \Theta(w_{jk})}{\sum_{k \neq j} w_{ik} + \sum_{k \neq i} w_{jk}},$$

where $w$ is the connection weight, and $\Theta(w_{ik}) = 1$ if $w_{ik} > 0$, and 0 otherwise. Intuitively, the numerator is nonzero if and only if there are two locally detouring connections that make a closed triangle along the shortest path. If either of the two connections $w_{ik}$ or $w_{jk}$ does not exist, then the numerator is 0. With the denominator representing the strength of all cumulative connections of the shortest path nodes, the matching index fraction then represents the density of closed triangles (i.e., transitivity) around the shortest path.

Given the matching index $m_{ij}$ for each pairwise connection $\Omega$ from source node $s$ to target node $t$ by the set of shortest path edges $\pi_{s \to t}$, we compute path transitivity $M$ as:

$$M(\pi_{s \to t}) = \frac{2 \sum_{i \in \Omega} \sum_{j \in \Omega} m_{ij}}{|\Omega|\,(|\Omega| - 1)},$$

where the numerator sums the matching index $m_{ij}$ for all edges in $\Omega$, the scale factor of 2 indicates an undirected graph, and the denominator sums over all possible edges. Intuitively, a high path transitivity $M$ indicates that the shortest path is more densely encompassed by locally detouring triangular motifs. Low path transitivity indicates that the shortest path is surrounded by connections that deviate from the shortest path without an immediate avenue of return. An individual-level value of path transitivity was calculated as the average path transitivity across brain regions.

#### Modularity.

We assessed the modularity of the structural connectivity matrix $\mathbf{A}$. The modularity quality function is defined as:

$$Q = \frac{1}{2\mu} \sum_{ij} \left[A_{ij} - \gamma P_{ij}\right] \delta(g_{i}, g_{j}),$$

where $\mu = \frac{1}{2} \sum_{ij} A_{ij}$ denotes the total weight of $\mathbf{A}$, $A_{ij}$ encodes the weight of an *edge* between node $i$ and node $j$ in the structural connectivity matrix, $\mathbf{P}$ represents the expected strength of connections according to a specified null model (Newman, 2006), $\gamma$ is a structural resolution parameter that determines the size of modules, and $\delta$ is the Kronecker function, which is 1 if nodes $i$ and $j$ are assigned to the same community ($g_{i} = g_{j}$) and zero otherwise. As in prior work, we set $\gamma$ to the default value of 1 (Bassett et al., 2011). Intuitively, a high $Q$ value indicates that the structural connectivity matrix contains communities, where nodes within a community are more densely connected to one another than expected under a null model. Modularity maximization is commonly used to detect community structure, and to quantitatively characterize that structure by assessing the strength and number of communities (Bassett et al., 2011; Baum et al., 2017; Newman, 2006). Regional contributions to the modularity quality function were used for analyses of brain regions.
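To make the quality function concrete, the sketch below evaluates $Q$ for a given community assignment. It assumes the commonly used Newman-Girvan null model, in which the expected weight between two nodes is the product of their strengths divided by the total weight. The text specifies only "a specified null model", so this choice, the function name, and the toy network are illustrative assumptions. The sketch scores a partition and does not perform modularity maximization.

```python
def modularity(adjacency, communities, gamma=1.0):
    """Modularity Q of a partition of a weighted, undirected network.

    `communities[i]` is the community label of node i. The null model
    P_ij = s_i * s_j / (2 * mu) is an assumption of this sketch.
    """
    n = len(adjacency)
    strength = [sum(row) for row in adjacency]
    two_mu = sum(strength)  # equals sum_ij A_ij, i.e., 2 * mu
    q = 0.0
    for i in range(n):
        for j in range(n):
            if communities[i] == communities[j]:  # Kronecker delta
                expected = strength[i] * strength[j] / two_mu
                q += adjacency[i][j] - gamma * expected
    return q / two_mu

# Hypothetical network: two disconnected pairs of nodes.
A = [[0, 1, 0, 0],
     [1, 0, 0, 0],
     [0, 0, 0, 1],
     [0, 0, 1, 0]]
q_split = modularity(A, [0, 0, 1, 1])   # each pair is its own community
q_merged = modularity(A, [0, 0, 0, 0])  # everything in one community
```

For this toy network the two-community partition gives $Q = 0.5$, whereas placing all nodes in one community gives $Q = 0$, matching the intuition that higher $Q$ reflects denser within-community connectivity than expected under the null model.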

#### Resource efficiency.

We quantified resource efficiency from the number of random walkers starting at node $i$ that were required for at least one to travel along the shortest path to another node $j$ with probability $\eta$ (Fornito et al., 2016; Goñi et al., 2013). To begin, we consider the transition probability matrix $\mathbf{U}$, defined as $\mathbf{U} = \mathbf{W}\mathbf{L}^{-1}$, where each entry $W_{ij}$ of $\mathbf{W}$ describes the weight of the directed edge from node $i$ to node $j$, and each entry $L_{ii}$ of the diagonal matrix $\mathbf{L}$ is the strength of each node $i$, defined as $\sum_{i} W_{ij}$. Intuitively, each entry $U_{ij}$ of $\mathbf{U}$ defines the probability of a random walker traveling from node $i$ to node $j$ in one step. Next, to compute the probability that a random walker travels from node $i$ to node $j$ along the shortest path, we define a new matrix $\mathbf{U}'(i)$ that is equivalent to $\mathbf{U}$ but with the nondiagonal elements of row $i$ set to zero and $U_{ii} = 1$ as an absorbent state. Then, the probability of randomly walking from $i$ to $j$ along the shortest path, written here as $P_{i \to j}$, is obtained from the matrix power $[\mathbf{U}'(i)]^{H}$, where $H$ is the number of connections composing the shortest path from $i$ to $j$. Similarly, the probability $\eta$ of releasing $r$ random walkers at node $i$ and having at least one of them reach node $j$ along the shortest path is given by:

$$\eta = 1 - \left(1 - P_{i \to j}\right)^{r}.$$

Given a fixed probability $\eta$, we can then solve for the number of random walkers $r$ required to guarantee (with probability $\eta$) that at least one of them travels from $i$ to $j$ along the shortest path, denoted by:

$$r_{ij}(\eta) = \frac{\log(1 - \eta)}{\log\left(1 - P_{i \to j}\right)}.$$

The number of random walkers *r*_{ij} has been referred to as resources in prior literature (Goñi et al., 2013). In our analyses, we
calculate resources *r*_{ij} over a
range of values of *η* for each participant. Finally,
to calculate the resource efficiency of each participant, the resource
efficiency of an entire network is taken to be
1/(*r*_{ij}(*η*))
averaged over all pairs of nodes *i* and *j*.
With the right stochastic matrix $\mathbf{U}'_{i}$,
the resource efficiency of brain regions as message senders is
1/(*r*_{ij}(*η*))
averaged over *i*, while that of brain regions as message receivers
is
1/(*r*_{ji}(*η*))
averaged over *j*.
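The relation between resources and the shortest-path probability can be illustrated with a short Python sketch. It assumes that the per-walker probability of taking the shortest path between each pair of nodes is already available, here as a hypothetical matrix `P` with entries strictly between 0 and 1. The function names are illustrative. The formula follows from requiring that at least one of $r$ independent walkers succeeds with probability $\eta$.

```python
import math

def resources(p_shortest, eta):
    """Walkers needed so that at least one takes the shortest path
    with probability eta, given per-walker probability p_shortest."""
    return math.log(1.0 - eta) / math.log(1.0 - p_shortest)

def resource_efficiency(p_matrix, eta):
    """Mean of 1 / r_ij(eta) over all ordered pairs i != j."""
    n = len(p_matrix)
    values = [1.0 / resources(p_matrix[i][j], eta)
              for i in range(n) for j in range(n) if i != j]
    return sum(values) / len(values)

# Hypothetical shortest-path probabilities for a 3-node network.
P = [[0.0, 0.5, 0.25],
     [0.5, 0.0, 0.5],
     [0.25, 0.5, 0.0]]
r_easy = resources(0.5, 0.5)    # r = 1: one walker succeeds with probability 0.5
r_hard = resources(0.25, 0.5)   # a less probable shortest path needs more walkers
efficiency = resource_efficiency(P, 0.5)
# Resources over a range of eta values, as described in the text.
curve = [resources(0.25, eta / 10.0) for eta in range(1, 10)]
```

Resources grow as the tolerated probability of failure shrinks, so `curve` increases monotonically with $\eta$.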

#### Compression efficiency.

In rate-distortion theory, a signal $x$ is encoded as $\hat{x}$ with a level of distortion $D$ that depends on the information rate $R$. The greater the rate, the less the distortion. The rate-distortion function $R(D)$ defines the minimum information rate required to transmit a signal corresponding to a level of signal distortion (see Figure 3A). Lossy compression arises from the choice of the distortion function $d(x, \hat{x})$, which implicitly determines the relevant and irrelevant features of a signal. With the true signal $x$ mapped to the compressed signal $\hat{x}$ described by $p(\hat{x} \mid x)$, the rate-distortion function is defined by minimizing the mutual information of the signal and compression over the expected distortion defined as $\langle d(x, \hat{x}) \rangle_{p(x,\hat{x})} = \sum_{x \in X} \sum_{\hat{x} \in \hat{X}} p(x, \hat{x})\, d(x, \hat{x})$:

$$R(D) = \min_{p(\hat{x} \mid x)\,:\,\langle d(x, \hat{x}) \rangle_{p(x,\hat{x})} \le D} I(X, \hat{X}).$$

By minimizing the mutual information $I(X, \hat{X})$,
we arrive at a probabilistic map from the signal to the compressed
representation, where the information gain between the signal and
compression is as small as possible (i.e., high fidelity) to favor the most
compact representations.

Similar to the mathematical framework of rate-distortion theory, we sought to
specify a distortion function reflecting communication over the
brain’s structural network. Prior work building models of perceptual
and cognitive performance has inferred distortion functions through
Bayesian inference of a loss function (Sims, 2016, 2018). For
instance, the loss function could be the squared error denoting the residual
values of the true signal minus the compression, $L = (\hat{x} - x)^{2}$ (Figure 3A). A neural rate-distortion theory has been
theoretically developed (Marzen & DeDeo, 2017), but remains empirically untested due in part to a
lack of methodological tools at the level of brain systems. Moreover, it has
been difficult to define a distortion function that incorporates both true
signals $x$ and compressed signals $\hat{x}$, in part
because the measurement of these signals in human brain networks remains
challenging. Here, we define an analogous framework of information transfer
through capacity-limited channels in the structural network of the brain.
Particularly, we build a distortion function from the simple intuition that
the shortest path is the route that most reliably preserves signal fidelity,
as depicted in Figure 3B.

Assuming that a signal sent from brain region $i$ along the shortest path to node $j$ retains the greatest signal fidelity, we define the distortion function of any signal $x$ from brain region $i$ to a compressed representation $\hat{x}$ decoded in brain region $j$ as:

$$d(x, \hat{x})_{ij} = 1 - \eta_{ij},$$

where $\eta$ denotes the probability that a walker gets from node $i$ to node $j$ along the shortest path. A signal with greater probability $\eta$ of propagating by the shortest path between brain region $i$ and brain region $j$ is at a lower risk of distortion (see Figure 3D). Intuitively, increased topological distance adds greater risk of signal distortion due to further transmission through capacity-limited channels (i.e., structural connections), temporal delay, and potential mixing with other signals. Given the measure of resources in Equation 12, we develop and test predictions of a novel definition of the rate $R(D)$; here, we define $R(D)$ as the resources $r_{ij}(\eta)$ required to achieve a tolerated level of distortion $d(x, \hat{x})_{ij}$:

$$R(D) = r_{ij}(\eta), \qquad D = d(x, \hat{x})_{ij}.$$

When the rate (resources $r_{ij}$) is plotted against our metric of distortion $D = d = 1 - \eta_{ij}$, the exponential gradient is depicted linearly (see Figure 3E). Because prior work focused on 50% distortion during analyses, we required the slope to intersect the mean midpoint rate at 50% distortion (Goñi et al., 2013). In addition to the precedent offered by prior work, this requirement is also reasonable given that we sought to model both high and low distortions equitably. The slope denotes the minimum number of resources required to achieve a tolerated level of distortion, which we refer to as the *compression efficiency* (Figure 3E, bottom). A steeper slope (i.e., a more negative relation) reflects reduced compression efficiency, or prioritization of message fidelity. A flatter slope (i.e., a more positive relation) reflects increased compression efficiency, or prioritization of lossy compression. Individual variation in compression efficiency can be assessed by using the average resource efficiency across brain regions. When compression efficiency is computed for sets of brain regions by averaging across individuals, the slope can denote either messages sent from or arriving to a brain region by using the average resource efficiency over either all nodes $j$ or all nodes $i$, respectively.

While prior work calculates the bit rates that arise from the stochasticity of opening and closing ion channels or releasing a synaptic vesicle (Laughlin, 2001), in this article we specifically chose to focus on macroscale-level neuroimaging data. Our choice stemmed in part from the fact that the surrounding background literature in neuroimaging motivated many components of our theory, and in part from the fact that the theory of efficient coding had not yet been extended to this scale. We sought to understand the role of the whole connectome in supporting efficient communication and information processing.

To calculate numerical measures of bit rates at any spatial scale, we require a probabilistic information source. In the random walk model of interregional communication, each brain region is an information source that activates while sending a message (that is, a random walker) to another region along the wiring of the connectome. The information content of an individual message is determined by the probability of a message being sent, which, in our setting, is equivalent to the probability of a brain region changing its activity (activating or deactivating) in statistical association with behaviors and cognitive functions. At the level of macroscale neuroimaging data, individual studies describe localized neural activation and deactivation associated with specific cognitive tasks and behavior. Meta-analyses of these individual studies can help aggregate and summarize a large quantity of data on the general probability of brain region activation across a wide range of behaviors and cognitive functions (Sterling & Laughlin, 2015; Yarkoni, Poldrack, Nichols, Essen, & Wager, 2011). We used the meta-analytic approach of NeuroSynth to obtain probability distributions across statistical maps of neural activity (Yarkoni et al., 2011). We then used this average probability that a brain region is active to calculate the information content of each communication event, or the transmission of one neural message.

$$H = -\log_{2} P(\mathit{Activation} \mid \mathit{Terms}),$$

where $H$ denotes the information content, or surprisal, of a neural signal measured by fMRI, and $P(\mathit{Activation} \mid \mathit{Terms})$ defines the average conditional probability of brain region activation given all >3,200 psychological terms encompassing an ontology that spans sensation, behavior, cognition, emotion, and disorders (Yarkoni et al., 2011). The average information content of a message, which we model with a random walker, is 5.74 bits. A characteristic timescale of fMRI activity is given by the repetition time (TR) of each measurement, which is 0.72 seconds in the Human Connectome Project neuroimaging sequence (Barch et al., 2013) and can vary across studies. Hence, the information rate is 5.74 bits/0.72 seconds = 7.97 bits/second per channel use. As with the estimation of information content, the main results of our paper do not depend on the choice of a characteristic timescale, because all values are converted using the same unit of bits per message. In selected early graphs, we include a secondary $y$-axis to indicate the conversion between units of random walkers and units of bits. To obtain the average bits per transmission for different levels of distortion, we multiplied the number of messages (random walkers) by 5.74 bits/message. We note that our theory can be applied to different neural activity measurements with connectomes reconstructed from a range of spatial scales. In future work, it will be important to analyze the predictability of bit rates measured directly from neural recordings based on theoretical estimates derived from the connectome.
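The unit conversions in this paragraph amount to simple arithmetic, sketched below in Python. The constants 5.74 bits and 0.72 seconds are taken from the text. The function and variable names are illustrative, and the use of a base-2 logarithm is an assumption that follows from reporting information in bits.

```python
import math

def surprisal_bits(p_activation):
    """Information content (in bits) of an event with the given probability."""
    return -math.log2(p_activation)

BITS_PER_MESSAGE = 5.74   # average information content reported in the text
TR_SECONDS = 0.72         # repetition time of the Human Connectome Project sequence

# Information rate in bits per second per channel use.
bit_rate = BITS_PER_MESSAGE / TR_SECONDS

# Activation probability that would correspond to 5.74 bits of surprisal.
implied_probability = 2.0 ** (-BITS_PER_MESSAGE)

def walkers_to_bits(n_walkers):
    """Convert a number of messages (random walkers) to bits."""
    return n_walkers * BITS_PER_MESSAGE
```

Rounded to two decimals, `bit_rate` reproduces the 7.97 bits/second stated above, and `walkers_to_bits` performs the conversion used for the secondary axis.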

#### Biased random walk.

We constructed a matrix $\mathbf{T}$ of CBF-biased transition probabilities as:

$$T_{ij} = \frac{\alpha_{ij} A_{ij}}{\sum_{j} \alpha_{ij} A_{ij}},$$

where $T_{ij}$ defines the transition probabilities of a random walker traversing edges of the structural connectivity matrix $\mathbf{A}$, which are multiplied by a bias term $\alpha$. For random walkers attracted to brain regions of high CBF, the bias term $\alpha$ was defined as the average CBF value for each pair of brain regions. Hence, a random walker propagates over the brain’s structural connections with transition probabilities of $T_{ij}$ that reflect the integrity of structural connections and the average level of CBF between pairs of brain regions. We then substituted the $\mathbf{U}$ matrix in the resources $r_{ij}(\eta)$ of Equation 12 with $T_{ij}$ in Equation 17 to compute the number of resources required for a biased random walker to propagate by the shortest path with a specified probability. To model electrical signal propagation, we performed the same matrix transformation as described above, using regional measurements of intracortical myelin.
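A biased transition matrix of this kind can be sketched as follows. The example assumes that each structural edge weight is multiplied by the mean regional value (for example, CBF or myelin) of its two endpoints and that each row is then normalized to sum to 1. The function name and toy values are hypothetical, and the row normalization is an assumption consistent with a random walker leaving node $i$.

```python
def biased_transition_matrix(adjacency, regional_values):
    """Row-normalized transition probabilities biased by regional values."""
    n = len(adjacency)
    transitions = []
    for i in range(n):
        # Bias each edge by the mean regional value of its two endpoints.
        biased = [adjacency[i][j] * (regional_values[i] + regional_values[j]) / 2.0
                  for j in range(n)]
        total = sum(biased)
        # Isolated nodes (no outgoing weight) keep an all-zero row.
        transitions.append([b / total if total > 0 else 0.0 for b in biased])
    return transitions

# Hypothetical fully connected 3-node network with equal edge weights.
A = [[0.0, 1.0, 1.0],
     [1.0, 0.0, 1.0],
     [1.0, 1.0, 0.0]]
cbf = [40.0, 60.0, 80.0]   # hypothetical regional CBF values
T = biased_transition_matrix(A, cbf)
```

With equal structural weights, the walker at node 0 is more likely to step to node 2 than to node 1 because the pairwise average CBF is higher for the pair (0, 2).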

#### Rich club.

We computed the weighted rich-club coefficient $\Phi^{z}(k)$ as:

$$\Phi^{z}(k) = \frac{Z_{>k}}{\sum_{l=1}^{E_{>k}} Z^{\mathrm{ranked}}_{l}},$$

where $Z^{\mathrm{ranked}}$ is a vector of ranked network weights, $k$ is the degree, $Z_{>k}$ is the set of edges connecting the group of nodes with degree greater than $k$, and $E_{>k}$ is the number of edges connecting the group of nodes with degree greater than $k$. Hence, the rich-club coefficient $\Phi^{z}(k)$ is the ratio between the set of edge weights connected to nodes with degree greater than $k$ and the strongest $E_{>k}$ connections. The rich-club coefficient was normalized by comparison to the rich-club coefficient of random networks (Colizza et al., 2006). Random networks were created by rewiring the edges of each individual’s brain network while preserving the degree distribution. The rich-club coefficient for the randomized networks $\Phi_{\mathrm{random}}(k)$ was computed using Equation 18. Then, the normalized rich-club coefficient $\Phi_{\mathrm{norm}}(k)$ was calculated as follows:

$$\Phi_{\mathrm{norm}}(k) = \frac{\Phi^{z}(k)}{\Phi_{\mathrm{random}}(k)},$$

where $\Phi_{\mathrm{norm}}(k) > 1$ indicates the presence of a rich-club organization. We tested the statistical significance of $\Phi_{\mathrm{norm}}(k)$ using a 1-sample $t$ test at each level of $k$, with family-wise error correction for multiple tests over $k$. Each individual was assigned the value of their highest degree > $k$ rich-club level and their nodes were ranked by rich-club level. Over the group of individuals, the nodal ranks were averaged and the top 12% of nodes were selected as the rich club, following prior work (Collin et al., 2013).
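The weighted rich-club coefficient can be sketched in plain Python as the summed weight of edges among high-degree nodes divided by the summed weight of the same number of strongest edges in the whole network. The function name and toy network are hypothetical, the sketch assumes a symmetric weighted matrix, and normalization against rewired networks is not shown.

```python
def weighted_rich_club(adjacency, k):
    """Weighted rich-club coefficient at degree threshold k.

    Returns None when no edges connect nodes with degree greater than k.
    """
    n = len(adjacency)
    degree = [sum(1 for w in row if w > 0) for row in adjacency]
    rich = [i for i in range(n) if degree[i] > k]
    # Weights of edges among rich nodes (each undirected edge counted once).
    rich_weights = [adjacency[i][j] for i in rich for j in rich
                    if i < j and adjacency[i][j] > 0]
    if not rich_weights:
        return None
    # All edge weights in the network, ranked from strongest to weakest.
    ranked = sorted((adjacency[i][j] for i in range(n) for j in range(i + 1, n)
                     if adjacency[i][j] > 0), reverse=True)
    strongest = ranked[:len(rich_weights)]
    return sum(rich_weights) / sum(strongest)

# Hypothetical network: a triangle (nodes 0, 1, 2) plus a strong pendant edge 0-3.
A = [[0, 3, 2, 5],
     [3, 0, 1, 0],
     [2, 1, 0, 0],
     [5, 0, 0, 0]]
phi = weighted_rich_club(A, 1)         # rich nodes: 0, 1, 2
phi_empty = weighted_rich_club(A, 2)   # only node 0 has degree > 2
```

In this example the three edges among nodes with degree greater than 1 carry a total weight of 6, whereas the three strongest edges in the network carry 10, giving a coefficient of 0.6. Dividing such a value by the coefficient obtained from degree-preserving rewired networks would give the normalized coefficient.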

#### Hierarchical organization.

We quantified hierarchical organization with the coefficient $\zeta$. To assess hierarchical structure, we examined the relationship between the nodal transitivity (clustering coefficient) and nodal degree (Ravasz & Barabási, 2003):

$$C(k) \sim k^{-\zeta},$$

where $C$ is the transitivity (clustering coefficient), $k$ is the degree, and $\zeta$ represents the extent of hierarchical organization. To estimate $\zeta$, we used similar curve-fitting methods as in estimating the compression efficiency gradient. The transitivity $C$ was defined as:

$$C_{i} = \frac{2 n_{i}}{k_{i}(k_{i} - 1)},$$

where $n$ is the number of links between the $k$ neighbors of node $i$. Intuitively, a high $\zeta$ value indicates hierarchical organization with less clustering about central nodes (with higher degree), and a low $\zeta$ value indicates hierarchical organization with more clustering about central nodes (with higher degree).
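One simple way to estimate the exponent is an ordinary least-squares fit of log transitivity on log degree, sketched below. The text states only that curve-fitting methods were used, so the log-log regression, the function names, and the synthetic data are assumptions made for illustration.

```python
import math

def clustering_coefficient(adjacency, i):
    """Fraction of possible links that exist among the neighbors of node i."""
    neighbors = [j for j, w in enumerate(adjacency[i]) if w > 0 and j != i]
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(1 for a in range(k) for b in range(a + 1, k)
                if adjacency[neighbors[a]][neighbors[b]] > 0)
    return 2.0 * links / (k * (k - 1))

def fit_zeta(degrees, clustering):
    """Negated least-squares slope of log C on log k, so that C(k) ~ k**(-zeta)."""
    pairs = [(math.log(k), math.log(c))
             for k, c in zip(degrees, clustering) if k > 0 and c > 0]
    mean_x = sum(x for x, _ in pairs) / len(pairs)
    mean_y = sum(y for _, y in pairs) / len(pairs)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in pairs)
             / sum((x - mean_x) ** 2 for x, _ in pairs))
    return -slope

# Synthetic data obeying C(k) = 1 / k exactly, so zeta should be 1.
degrees = [2, 4, 8, 16]
clustering = [1.0 / k for k in degrees]
zeta = fit_zeta(degrees, clustering)

# In a triangle, every node's two neighbors are linked, so C = 1.
triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
c_triangle = clustering_coefficient(triangle, 0)
```

Nodes with zero degree or zero clustering are dropped before taking logarithms, which is a design choice of this sketch rather than a documented step of the analysis.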

### Network Null Models

Random graphs are commonly used in network science to test the statistical significance of the role of some network topology against null models. We used randomly rewired graphs generated by shuffling each individual’s empirical networks 20 times, as in prior work (Maslov & Sneppen, 2002). Furthermore, we generated Erdős-Rényi random networks for each individual brain network where the presence or absence of an edge was generated by a uniform probability calculated as the density of edges existing in the corresponding brain network. Edge weights were randomly sampled from the edge weight distribution of the brain network. While the randomly rewired graphs retain empirical properties such as the degree and edge weight distributions of the individual brain networks, the Erdős-Rényi networks do not. Hence, the randomly rewired null network was used in all analyses where the degree distribution should be retained (e.g., normalized rich-club coefficient), while the Erdős-Rényi network was used in analyses assessing the overall contribution of the brain network topology (e.g., compression efficiency). We used the Erdős-Rényi networks as the null network for two reasons. First, Erdős-Rényi networks are particularly appropriate for comparing the random walk model to the alternative shortest path routing model. Prior work found that Erdős-Rényi networks had very high global efficiency, though the networks were biologically implausible due to their connection costs (Sporns, 2013). Thus, the Erdős-Rényi networks serve as a benchmark for a communication architecture supporting highly efficient shortest path routing.

Second, we chose to use Erdős-Rényi networks because we sought to compare random walk communication on brain network connectivity to random walk communication on synthetic networks that optimally minimize our distortion function, that is, networks that maximize the shortest path probability (Goñi et al., 2013). In doing so, we can address the major criticism of the plausibility of random walk models: inefficiency (Avena-Koenigsberger et al., 2018). Prior work compared random walk dynamics on canonical graph models, finding that the Erdős-Rényi networks had the highest probability of shortest path propagation (Goñi et al., 2013). Thus, we expected that the Erdős-Rényi networks would approximate a limit on the efficiency for random walk communication dynamics according to the rate-distortion model. Compared to this approximate limit of efficiency, we aimed to show that biological investments in brain network communication modeled using random walks could indeed be efficient. Moreover, we aimed to show that inefficiency, when viewed in the light of the biologically established efficient coding framework, uses the redundancy of communication to overcome noise introduced by intrinsic stochasticity of neural processes to balance the transmission fidelity and lossy compression of information (Barlow, 1961).

Our tests using the randomly rewired network evaluate the null hypothesis that an apparent rich-club property of brain networks is a trivial result of topology characteristic of random networks with some empirical properties preserved, as in prior work (Colizza et al., 2006; van den Heuvel & Sporns, 2013). The alternative hypothesis is that the brain network has a rich-club organization beyond the level expected in the random networks. Our tests using the Erdős-Rényi network evaluate the null hypothesis that the rates in the rate-distortion function modeling information processing capacity in brain networks do not differ from the rates in the rate-distortion function of random networks. The alternative hypothesis is that the rate of the brain network’s rate-distortion function differs from that of random networks, consistent with the notion that Erdős-Rényi networks have a greater prevalence of shortest paths compared to brain networks. We additionally used the Erdős-Rényi network to assess the hypothesis of rate-distortion theory that synthetic networks should exhibit the same information processing trade-offs (the monotonic rate-distortion gradient) as empirical brain networks (Sims, 2018). We selected Erdős-Rényi networks to assess these hypotheses for two reasons. First, Erdős-Rényi networks do not retain core architectures of brain networks, such as modularity, and therefore reflect an extreme synthetic network. Second, Erdős-Rényi networks are commonly used as a benchmark for assessing shortest path prevalence due to the prominence of uniformly distributed direct pairwise connections (Avena-Koenigsberger et al., 2018; Sporns, 2013). In light of the central assumption that shortest paths represent the route of highest signal fidelity in our definition of distortion, we used Erdős-Rényi networks to verify our intuition that compression efficiency should be greater in the Erdős-Rényi network than in brain networks.
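The Erdős-Rényi null network described in this section can be sketched as follows: each possible edge is present with probability equal to the empirical edge density, and each present edge receives a weight drawn from the empirical edge weights. The function name, the seeded generator, and the toy matrix are illustrative assumptions, and the degree-preserving rewired null is not shown.

```python
import random

def erdos_renyi_null(adjacency, seed=None):
    """Random undirected network matched to the density and edge weight
    distribution (but not the degree distribution) of `adjacency`."""
    rng = random.Random(seed)
    n = len(adjacency)
    # Empirical weights of existing edges (upper triangle only).
    weights = [adjacency[i][j] for i in range(n) for j in range(i + 1, n)
               if adjacency[i][j] > 0]
    density = len(weights) / (n * (n - 1) / 2)
    null = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < density:
                # Sample a weight from the empirical weight distribution.
                w = rng.choice(weights)
                null[i][j] = null[j][i] = w
    return null

# Hypothetical 4-node empirical network.
A = [[0.0, 0.8, 0.0, 0.3],
     [0.8, 0.0, 0.5, 0.0],
     [0.0, 0.5, 0.0, 0.0],
     [0.3, 0.0, 0.0, 0.0]]
null_network = erdos_renyi_null(A, seed=0)
```

Because edges are placed independently, the number of edges in any single null network varies around the empirical count. Only the expected density is matched.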

Scale-free networks have a degree distribution resulting in a subset of highly
connected hubs. Prior work established that global transitivity decreases
exponentially with network size in a scale-free network, but that networks that
are scale-free *and* have hierarchical organization can decouple
transitivity from size. We created nonhierarchical scale-free networks using the
Barabási-Albert algorithm (with the preferential attachment parameter set
to 1 for linear preferential attachment) and reevaluated whether the structural
brain networks with hierarchical organization similarly decoupled transitivity
and size (Barabási & Albert,
1999). Decoupling transitivity and size allows for simultaneously
high levels of transitivity despite large size. In the context of efficient
coding theory, we chose to compare the global transitivity of structural brain
networks with scale-free networks across five network sizes to examine if
hierarchical organization in the human structural brain networks confers a
similar outcome.

### Statistical Analyses

To assess the covariation of our measurements across individuals and brain
regions, we used generalized additive models (GAMs) with penalized splines. GAMs
allow for statistically rigorous modeling of linear and nonlinear effects while
minimizing over-fitting (Wood, 2004).
Throughout, the potential for confounding effects was addressed in our model by
including covariates for age, sex, age-by-sex interaction, network degree,
network density, and in-scanner motion. Due to the likelihood of inflated
estimates of brain–behavior associations despite well-powered analyses
(Marek et al., 2020), we report
bootstrap 95% confidence intervals for the test statistic and adjusted *R*^{2} of each model across 1,000 bootstrap
samples.

#### Metabolic running costs associated with brain network architectures.

To evaluate the importance of age as a confound for the relationship between global efficiency and CBF, we also performed sensitivity analyses by removing selected covariates and reassessing the model. In addition, for consistency with prior work (Várkuti et al., 2011), we performed the same analysis including covariates for gray matter volume and density.

Assessments of path strengths were corrected for false discovery rate across the statistical tests performed over the discrete path lengths.

#### Trade-offs between modularity and random walk architecture.

To visualize the landscape of CBF as a function of modularity and path
transitivity, we plotted the GAM model response function. We described the
distribution of modularity and path transitivity across individuals using
frequency histograms. We calculated a map of change in global CBF with
respect to both modularity and path transitivity using first-order
derivatives. A saddle point suggests that adaptive compromises in network
architecture are constrained by dual objectives. To quantify the location of
the saddle point coordinate within the change map, we performed a *k*-nearest neighbor search of the value 0 in the
gradient of first-order derivatives indicating minima and maxima.

#### Compression efficiency and development.

To quantify the differences between the resources calculated from resource
efficiency and the compression efficiency across 14 levels of distortion, we
performed two-sample *t* tests while controlling for
family-wise error rate across multiple comparisons. For each level of
distortion, we calculated the *t*-statistic comparing all
individual resources to all individual resources predicted by the linear
rate-distortion gradient.

#### Compression efficiency of biased random walks.

Compression efficiency of biased random walks was compared using *t* tests while controlling for family-wise error rate across multiple comparisons.

#### Compression efficiency in a low- or high-fidelity regime.

We tested whether networks with hierarchical organization *and* the scale-free property achieved greater transitivity than random networks with only the scale-free property, as predicted by prior findings (Ravasz & Barabási, 2003). To evaluate the differences in transitivity according to the network properties of hierarchical organization and the scale-free property, we used an ANOVA model.

#### Compression efficiency and patterns of neurodevelopment.

To explore how compression efficiency might relate to patterns of cortical
myelination and areal scaling, we assessed the Spearman’s correlation
coefficient between myelination or scaling and send or receive compression
efficiency. We also assessed the relationship between cortical areal scaling
and nodal transitivity, in light of our hypothesis that greater path
transitivity may support greater fidelity and the efficient coding
hypothesis, which states that the organization of the brain allocates neural
resources according to the physical distribution of information. To further
test correspondence between brain maps, we used a spatial permutation test,
which generates a null distribution of randomly rotated brain maps that
preserve the spatial covariance structure of the original data (Alexander-Bloch et al., 2018). Using
spatially constrained null models is the state-of-the-art for comparing
brain maps (Alexander-Bloch et al.,
2018). We used a spin test variant that reassigns parcels with no
duplication. This method implements an iterative procedure that uniquely
assigns parcels based on Euclidean distance of the rotated parcels, ignoring
the medial wall and its location (Váša et al., 2018). Our choice of null model
introduces a trade-off between permitting slightly more liberal critical
thresholds than other spatially constrained null models and retaining the
exact distribution of the original brain map to better test network-based
statistics like compression efficiency and transitivity (Markello & Misic, 2021). We
refer to the *p* value of this statistical test as *p*_{SPIN}. Finally, we applied
the conservative Holm–Bonferroni correction for family-wise error
across these tests.
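The Holm–Bonferroni step-down procedure applied to these tests can be sketched as follows. The function name, the significance level of 0.05, and the example *p* values are hypothetical. The sketch returns which hypotheses are rejected rather than adjusted *p* values.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return a list of booleans marking which hypotheses are rejected."""
    m = len(p_values)
    # Visit p values from smallest to largest.
    order = sorted(range(m), key=lambda idx: p_values[idx])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # Compare the (rank + 1)-th smallest p value to alpha / (m - rank).
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # once one test fails, all larger p values also fail
    return reject

# Hypothetical p values from four spatial permutation tests.
p_spin = [0.01, 0.04, 0.03, 0.005]
decisions = holm_bonferroni(p_spin)
```

In this example the two smallest *p* values pass their thresholds (0.0125 and about 0.0167), the third (0.03) exceeds 0.025, and the procedure stops, so only the first and fourth tests are rejected.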

#### Compression efficiency and rich-club hubs.

### Code Availability

Code can be found at https://github.com/dalejn/economicsConnectomics.

### Data Availability

Neuroimaging and cognitive test data were acquired from the Philadelphia Neurodevelopmental Cohort. The data reported in this paper have been deposited in the database of Genotypes and Phenotypes under accession number dbGaP: phs000607.v2.p2 (https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000607.v2.p2). The allometric cortical scaling maps were downloaded from a NeuroVault repository (Reardon et al., 2018) and the cortical myelin maps were downloaded from a public resource (Glasser & Van Essen, 2011).

### Citation Diversity Statement

Recent work in several fields of science has identified a bias in citation practices such that papers from women and other minority scholars are undercited relative to the number of such papers in the field (Bertolero et al., 2020; Caplar, Tacchella, & Birrer, 2017; Chatterjee & Werner, 2021; Dion, Sumner, & Mitchell, 2018; Dworkin et al., 2020; Fulvio, Akinnola, & Postle, 2021; Maliniak, Powers, & Walter, 2013; Mitchell, Lange, & Brus, 2013; X. Wang et al., 2021). Here we sought to proactively consider choosing references that reflect the diversity of the field in thought, form of contribution, gender, race, ethnicity, and other factors. First, we obtained the predicted gender of the first and last author of each reference by using databases that store the probability of a first name being carried by a woman (Dworkin et al., 2020; Zhou et al., 2020). By this measure (and excluding self-citations to the first and last authors of our current paper), our references contain 18.07% woman (first)/woman (last), 5.75% man/woman, 23.33% woman/man, and 52.85% man/man. This method is limited in that (1) names, pronouns, and social media profiles used to construct the databases may not, in every case, be indicative of gender identity and (2) it cannot account for intersex, nonbinary, or transgender people. Second, we obtained the predicted racial/ethnic category of the first and last author of each reference by using databases that store the probability of a first and last name being carried by an author of color (Ambekar, Ward, Mohammed, Male, & Skiena, 2009; Sood & Laohaprapanon, 2018). By this measure (and excluding self-citations), our references contain 7.38% author of color (first)/author of color (last), 12.18% white author/author of color, 24.82% author of color/white author, and 55.62% white author/white author. This method is limited in that (1) the names and Florida Voter Data used to make the predictions may not be indicative of racial/ethnic identity, and (2) it cannot account for Indigenous and mixed-race authors, or those who may face differential biases due to the ambiguous racialization or ethnicization of their names. We look forward to future work that could help us to better understand how to support equitable practices in science.

## ACKNOWLEDGMENTS

We acknowledge helpful discussions with Dr. Jennifer Stiso, Dr. Richard Betzel, Dr. David Lydon-Staley, Dr. Lorenzo Caciagli, Adon Rosen, and Dr. Bart Larsen.

## SUPPORTING INFORMATION

Supporting information for this article is available at https://doi.org/10.1162/netn_a_00223.

## AUTHOR CONTRIBUTIONS

Dale Zhou: Conceptualization; Data curation; Formal analysis; Investigation; Methodology; Visualization; Writing – original draft; Writing – review & editing. Christopher W. Lynn: Methodology; Writing – review & editing. Zaixu Cui: Data curation; Validation; Writing – review & editing. Rastko Ciric: Data curation; Writing – review & editing. Graham L. Baum: Data curation; Writing – review & editing. Tyler M. Moore: Methodology; Writing – review & editing. David R. Roalf: Data curation; Methodology; Writing – review & editing. John A. Detre: Data curation; Resources; Writing – review & editing. Ruben C. Gur: Funding acquisition; Resources; Writing – review & editing. Raquel E. Gur: Funding acquisition; Resources; Writing – review & editing. Theodore D. Satterthwaite: Conceptualization; Funding acquisition; Investigation; Project administration; Resources; Supervision; Writing – review & editing. Danielle S. Bassett: Conceptualization; Funding acquisition; Investigation; Methodology; Project administration; Resources; Supervision; Writing – review & editing.

## FUNDING INFORMATION

The work was largely supported by the John D. and Catherine T. MacArthur Foundation, the ISI Foundation, the Paul G. Allen Family Foundation, the Alfred P. Sloan Foundation, the NSF CAREER award PHY-1554488, NIH R01MH113550, NIH R01MH112847, and NIH R21MH106799. Secondary support was also provided by the Army Research Office (Bassett-W911NF-14-1-0679, Grafton-W911NF-16-1-0474) and the Army Research Laboratory (W911NF-10-2-0022). D.Z. acknowledges support from the National Institute of Mental Health F31MH126569. C.W.L. acknowledges support from the James S. McDonnell Foundation 21st Century Science Initiative Understanding Dynamic and Multi-scale Systems - Postdoctoral Fellowship Award. The content is solely the responsibility of the authors and does not necessarily represent the official views of any of the funding agencies.

## TECHNICAL TERMS

- Efficient coding:
A principle of neural signaling that predicts a trade-off between the amount of information conveyed and the limited resources available to convey it.

- Rate-distortion theory:
A mathematical framework of information theory for communicating information given a tolerated level of information distortion in channels with limited capacity.

- Lossy data compression:
A procedure that sacrifices information fidelity to make transmission more compact and economical.

- Repetition code:
A simple strategy to overcome noise that corrupts a transmission by sending multiple copies of the same message, each conveyed by the same mechanisms.

- Redundancy reduction:
A theory for how efficient coding may be implemented by exploiting statistical regularities of information inputs.

- Random walk:
A decentralized model of network communication wherein a brain region sends an input to a target region by discrete messages propagating across the structural network.

- Shortest path routing:
A model of network communication wherein a source brain region sends an input to a target region by selecting the shortest pathway between regions.

- Compression efficiency:
A metric that quantifies the trade-off between information compression and communication fidelity that is afforded by an individual’s structural connectivity.
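
The contrast between the random walk and shortest path routing entries above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the five-region toy network, the function names, and the use of unweighted, unbiased walks are hypothetical and do not reproduce the weighted structural networks or the analysis pipeline used in this article. The sketch compares the number of edges on the shortest path between two regions with the average number of steps a random walker needs to first reach the same target.

```python
import random
from collections import deque

# Hypothetical toy network (not from the article): each key is a region,
# each value lists the regions it is directly connected to.
TOY_NETWORK = {
    "A": ["B", "C"],
    "B": ["A", "C", "D"],
    "C": ["A", "B", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}


def shortest_path_length(graph, source, target):
    """Shortest path routing: fewest edges from source to target (breadth-first search)."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return distances[node]
        for neighbor in graph[node]:
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    raise ValueError("target is not reachable from source")


def random_walk_steps(graph, source, target, rng):
    """Random walk: steps taken by one walker until it first arrives at target."""
    node, steps = source, 0
    while node != target:
        # The walker has no global knowledge; it picks a neighbor uniformly at random.
        node = rng.choice(graph[node])
        steps += 1
    return steps


def mean_random_walk_steps(graph, source, target, n_walkers=2000, seed=0):
    """Average first-arrival time over many independent walkers (seeded for repeatability)."""
    rng = random.Random(seed)
    total = sum(random_walk_steps(graph, source, target, rng) for _ in range(n_walkers))
    return total / n_walkers


if __name__ == "__main__":
    print("shortest path length A -> E:", shortest_path_length(TOY_NETWORK, "A", "E"))
    print("mean random walk steps A -> E:", mean_random_walk_steps(TOY_NETWORK, "A", "E"))
```

Because every walker must traverse at least as many edges as the shortest path contains, the random walk average can never fall below the shortest path length. The gap between the two numbers gives an intuition for the cost of decentralized communication, which does not require any region to know the global network layout.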

## Author notes

Competing Interests: The authors have declared that no competing interests exist.

Co-senior authors.

Handling Editor: Petra Vertes