Abstract
The formation of large-scale brain networks, and their continual
refinement, represent crucial developmental processes that can drive
individual differences in cognition and which are associated with
multiple neurodevelopmental conditions. But how does this organization
arise, and what mechanisms drive diversity in organization? We use
generative network modeling to provide a computational framework for
understanding neurodevelopmental diversity. Within this framework
macroscopic brain organization, complete with spatial embedding of its
organization, is an emergent property of a generative wiring equation
that optimizes its connectivity by renegotiating its biological costs
and topological values continuously over time. The rules that govern
these iterative wiring properties are controlled by a set of tightly
framed parameters, with subtle differences in these parameters steering
network growth towards different neurodiverse outcomes. Regional
expression of genes associated with the simulations converges on
biological processes and cellular components predominantly involved in
synaptic signaling, neuronal projection, catabolic intracellular
processes and protein transport. Together, this provides a unifying
computational framework for conceptualizing the mechanisms and
diversity in neurodevelopment, capable of integrating different levels
of analysis—from genes to cognition.
Subject terms: Network models, Neuronal development, Dynamic networks,
Neurodevelopmental disorders
__________________________________________________________________
The formation of large-scale brain networks represents crucial
developmental processes that can drive individual differences in
cognition and which are associated with multiple neurodevelopmental
conditions. Here, the authors use generative network modelling to
provide a computational framework for understanding neurodevelopmental
diversity.
Introduction
The human brain is highly organized at multiple scales. At the broadest
scale, neuronal populations are structurally connected across large
anatomical distances with white-matter fiber bundles, forming a set of
interconnected networks. This macroscopic organization can be studied
via diffusion-weighted magnetic resonance imaging (MRI), which measures
the direction of water diffusion in vivo^[30]1–[31]3. During childhood
the emergence and continual refinement of these large-scale brain
networks allows for increasing functional integration and
specialization^[32]4,[33]5. This process is thought crucial for the
growth of complex cognitive processes such as language^[34]6 and
executive function^[35]7–[36]12. However, there are individual
differences in the organization of these networks across children, and
these differences mirror important developmental outcomes. Indeed,
differences in macroscopic networks have been implicated across
multiple neurodevelopmental conditions^[37]13, including ADHD^[38]14,
autism^[39]15,[40]16, and language disorders^[41]17.
But what mechanisms drive the diversity of macroscopic brain networks?
And how do these mechanisms give rise to individual differences in
children’s outcomes? There are numerous descriptive
theories^[42]18–[43]22 that speculate about how different levels of
analysis (e.g., genes, brain structure, and function) interact to
produce these neurodevelopmental differences. However, to date no
theory is specified precisely enough to simulate individual-level
brain networks. In the absence of computational
models, it is difficult to establish mechanistic links between
individual differences in observations (e.g., gene expression,
biological pathways, system-wide organization). This theory gap
represents a major limitation for understanding neurodevelopmental
diversity. The purpose of this study is to address precisely this gap,
by modeling the generative wiring properties of a large sample of
children at heightened neurodevelopmental risk of poor outcomes. The
computational model we implemented is guided by a simple principle: the
brain’s structural organization is shaped by an economic trade-off
between minimizing wiring costs and adaptively enhancing valuable
topological features^[44]23. We hypothesize that the emergence of
whole-brain organization reflects the continual trade-off of these
factors over time and that tiny differences in the parameters governing
the trade-off can produce the neurodiverse outcomes we observe.
Somewhat counterintuitively, tight parameter constraints likely enable
macroscopic neurodiversity, because large changes in these parameters
would produce networks with configurational states that are not
observed in reality. Instead, narrow boundaries reflect parameter
conditions within which networks can be different, but still maintain
adequate structural properties to be functional.
Our work utilizes generative network modeling^[45]24,[46]25, in which
connections within a physically embedded network are formed
probabilistically over time according to a wide range of potential
mathematical constraints. Varying the parameters and wiring rules that
govern network formation provides a way of establishing which
statistics likely create real networks—in this case structural brain
networks in our large heterogeneous sample of children. Specifically,
we: (1) tested which topological features should be valued in the
wiring trade-off to produce highly accurate individual child
connectomes; (2) tested how small changes in these parameters alter the
organizational properties of the resulting networks; (3) established
relationships between these different wiring parameters and cognitive
outcomes; (4) identified genes with expression profiles that were
spatially co-located with those topological features; and (5)
established the biological pathways that are enriched in these gene
lists. Together, this provides a computational framework that
mathematically specifies the formation of a network over time, captures
individual differences in brain organization and cognition, and
incorporates the genetic and biological pathways that likely constrain
network formation.
Results
The generative network model
The generative network model (GNM) can be expressed as a simple wiring
equation^[47]24,[48]25 (Fig. [49]1a). Imagine a series of locations
within the brain: at each moment in time, the wiring equation
calculates which two locations will become connected. It calculates
this wiring probability by trading-off the cost of a connection
forming, against the potential value of the connection being formed.
The equation can be expressed as:
P[i,j] ∝ (D[i,j])^η (K[i,j])^γ,
1
where D[i,j] represents the Euclidean distance between nodes i and j
(i.e., “costs”), and K[i,j] reflects the value (i.e., “attractiveness”)
in forming a connection. P[i,j] represents the wiring probability as a
function of the product of the parameterized costs and value. The end
result is a wiring probability matrix which updates over time as new
connections are added.
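In code, Eq. (1) amounts to an element-wise product of the parameterized cost and value matrices. Below is a minimal numpy sketch; the epsilon guarding pairs with zero matching value, and the toy distances and values, are our assumptions rather than part of the published model:

```python
import numpy as np

def wiring_probabilities(D, K, eta, gamma):
    # P[i,j] proportional to (D[i,j])^eta * (K[i,j])^gamma (Eq. 1).
    # The small epsilon handles K = 0 pairs; this is our assumption,
    # not part of the published equation.
    eps = 1e-5
    P = (D ** eta) * ((K + eps) ** gamma)
    np.fill_diagonal(P, 0.0)      # no self-connections
    return P / P.sum()            # normalize into a probability distribution

# Toy example: three regions; node 2 is far from node 0 but highly "valuable"
D = np.array([[1.0, 2.0, 5.0],
              [2.0, 1.0, 3.0],
              [5.0, 3.0, 1.0]])   # Euclidean distances (diagonal unused)
K = np.array([[0.0, 0.5, 0.9],
              [0.5, 0.0, 0.1],
              [0.9, 0.1, 0.0]])   # matching values
P = wiring_probabilities(D, K, eta=-1.0, gamma=1.0)
```

With η = −1 the distant-but-valuable pair (0, 2) still receives a lower wiring probability here than the nearer pair (0, 1), illustrating the trade-off the equation encodes.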
Fig. 1. Updating wiring probabilities within the generative network model
iteratively, based on dynamically changing graphical structures.
a The brain’s structural connectivity is modeled as a generative
network which grows over time according to parametrized connection
costs, (D[i,j])^η and values, (K[i,j])^γ. In this illustration, we use
subject one’s optimal model. b Early in network development, the
absence of a topology leads to proximal nodes being much more likely to
form connections. The displayed distances and probabilities are from
the right caudal anterior cingulate (n^2), which corresponds to row
(D[2],:)^η and (P[2],:). We display its six nearest cortical regions.
c Later, the relative values (K[i,j]) between nodes influence
connection probabilities, such that nodes which are more distant (e.g.,
left rostral anterior cingulate, n^[52]59 in red) may be preferred to
nodes which are closer (e.g., right superior frontal cortex, n^[53]27
in cyan). d As costs and values are decoupled, the wiring probability
can be rapidly recomputed when dynamic changes in graphical structure
occur over developmental time.
D[i,j] is parameterized by the scalar η, which changes how node-to-node
distances influence their probability of connecting. For example, when
η is negative, wiring probabilities decline when distances increase,
and this reflects the costliness of forming connections with nodes that
are distant. This is traded-off against K[i,j], which represents some
relationship between nodes, which can be thought of as a topological
value (or “rule”) driving the intention for node i to connect with node
j. K[i,j] is parameterized by a distinct scalar γ. K[i,j] can take a
range of different forms and can, in principle, be selected from any
non-geometric growth rule used to model social and economic
networks^[54]26–[55]28. One simple example is the “matching”
rule^[56]24: nodes form connections with other nodes on the basis of
their normalized neighborhood overlap, i.e., whether two nodes are
connected to similar sets of nodes (also termed homophily).
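The matching rule can be sketched directly from this definition. A minimal numpy implementation follows; the exact normalization varies between published implementations, so treat this variant as illustrative:

```python
import numpy as np

def matching_index(A):
    # Normalized neighborhood overlap ("matching"/homophily) between every
    # pair of nodes in a binary adjacency matrix A. Each node's neighborhood
    # excludes the other node of the pair; this normalization is one common
    # variant, not necessarily the authors' exact formula.
    n = A.shape[0]
    K = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            ni = set(np.flatnonzero(A[i])) - {j}
            nj = set(np.flatnonzero(A[j])) - {i}
            union = ni | nj
            if union:
                K[i, j] = K[j, i] = len(ni & nj) / len(union)
    return K

# Nodes 0 and 2 share their only neighbor (node 1), so their overlap is 1;
# nodes 0 and 1 share no neighbors, so their overlap is 0
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
K = matching_index(A)
```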
To make this more concrete, imagine the following scenario: a network
is growing according to the matching rule, preferentially attaching to
nodes which are both similarly connected and spatially proximal. In the
wiring equation, this would be represented as η being negative (e.g.,
η = −1), K[i,j] represented as normalized neighborhoods between nodes
(i.e., matching) and its parameter γ being positive (e.g., γ = 1). In
short, a node being far away makes a new connection less likely, but a
similar neighborhood increases the
likelihood. Suppose that the right caudal anterior cingulate (Node 2,
n^2) is going to wire to one of its six nearest neighbors. Initially,
due to the absent network topology, spatial proximity has a strong
influence on the formation of new connections—it will wire to its
nearest neighbor (Fig. [57]1b). However, gradually over time, the
network’s developing structural topology means that K[i,j] (i.e., the
relationships between nodes) may now have a greater influence on wiring
probabilities. Indeed, the right caudal anterior cingulate may later
wire with a node that, although further away than other available
nodes, has a greater value (i.e., matching) than the others
(Fig. [58]1c). As the wiring equation separately parameterizes costs
and values, the presence of a single connection can heavily influence the
topology of the network and thus the future updated wiring
probabilities. This is because new connections can lead to entirely new
overlapping neighbors, which may include distant nodes. As a result,
wiring probabilities can change considerably from moment to moment,
despite costs remaining fixed (Fig. [59]1d).
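The iterative process described above can be sketched as a growth loop: recompute matching values, form wiring probabilities, sample one new edge, repeat. This is an illustrative reimplementation under our own simplifying assumptions (binary networks, one edge per iteration, an epsilon for zero matching values), not the authors' code:

```python
import numpy as np

def matching(A):
    # Normalized neighborhood overlap between every pair of nodes
    # (the homophily value K[i,j]); repeated here so the sketch is
    # self-contained.
    n = A.shape[0]
    K = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            ni = set(np.flatnonzero(A[i])) - {j}
            nj = set(np.flatnonzero(A[j])) - {i}
            union = ni | nj
            if union:
                K[i, j] = K[j, i] = len(ni & nj) / len(union)
    return K

def grow_network(D, A_seed, m_target, eta=-1.0, gamma=1.0, seed=0):
    # At each iteration: recompute values, form wiring probabilities
    # P proportional to (D^eta) * (K^gamma), zero out existing edges,
    # sample one new edge, repeat until the target count is reached.
    rng = np.random.default_rng(seed)
    A = A_seed.copy()
    while int(A.sum()) // 2 < m_target:
        K = matching(A)
        P = (D ** eta) * ((K + 1e-5) ** gamma)  # epsilon: our assumption
        np.fill_diagonal(P, 0.0)
        P[A > 0] = 0.0                          # no duplicate edges
        iu, ju = np.triu_indices_from(P, k=1)
        w = P[iu, ju]
        k = rng.choice(w.size, p=w / w.sum())
        A[iu[k], ju[k]] = A[ju[k], iu[k]] = 1
    return A

# Toy run: 8 regions at random 2-D positions, a 2-edge seed network,
# grown to 10 connections (positions and sizes are illustrative only)
rng = np.random.default_rng(1)
X = rng.random((8, 2))
D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
np.fill_diagonal(D, 1.0)   # dummy diagonal; removed from P anyway
A0 = np.zeros((8, 8))
A0[0, 1] = A0[1, 0] = A0[1, 2] = A0[2, 1] = 1.0
A = grow_network(D, A0, m_target=10)
```

Because `matching` is recomputed after every edge, a single new connection can reshape K and hence the next wiring probabilities, which is the decoupling of static costs and dynamic values described above.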
The GNM simulates this process across the whole brain, until the
overall number of connections matches those found in the observed brain
network. Subsequently, to test the accuracy of the simulation, an
energy function, E, must be defined which measures the dissimilarity
between simulated and observed networks^[60]24,[61]25:
E = max(KS[k], KS[c], KS[b], KS[e]),
2
where KS is the Kolmogorov–Smirnov statistic comparing degree k,
clustering coefficient c, betweenness centrality b, and edge length e
distributions of simulated and observed networks. Minimizing E finds
parameters η and γ which generate networks most closely approximating
the observed network.
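Eq. (2) can be sketched in a few lines. The KS statistic is implemented from its definition here, and the dictionary keys naming the four distributions are our own convention:

```python
import numpy as np

def ks_stat(x, y):
    # Two-sample Kolmogorov-Smirnov statistic: the maximum distance
    # between the two empirical cumulative distribution functions.
    x, y = np.sort(x), np.sort(y)
    z = np.concatenate([x, y])
    cdf_x = np.searchsorted(x, z, side="right") / x.size
    cdf_y = np.searchsorted(y, z, side="right") / y.size
    return np.abs(cdf_x - cdf_y).max()

def energy(obs, sim):
    # E = max(KS[k], KS[c], KS[b], KS[e]) over degree, clustering,
    # betweenness and edge-length distributions (Eq. 2). The dict keys
    # are our naming, not the authors'.
    keys = ("degree", "clustering", "betweenness", "edge_length")
    return max(ks_stat(obs[k], sim[k]) for k in keys)

# Identical distributions give E = 0; fully disjoint ones give E = 1
names = ("degree", "clustering", "betweenness", "edge_length")
obs = {k: np.arange(10.0) for k in names}
sim_same = {k: v.copy() for k, v in obs.items()}
sim_far = {k: v + 100.0 for k, v in obs.items()}
```

Taking the maximum (rather than, say, the mean) means a simulation is only as good as its single worst-matched distribution.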
The four measures in the energy equation are good candidates for
evaluating the plausibility of simulated networks. They are critical
statistical properties of realistic networks and have featured within
the most well-documented simulated network models^[62]29–[63]31.
Moreover, these statistical properties have been implicated in a number
of neuropsychiatric conditions^[64]32,[65]33 in addition to being shown
to be heritable^[66]34.
Small variations in GNM parameter combinations produce accurate and spatially
embedded networks
From a basic seed network common to all participants (for detail, see
Supplementary Fig. 1 and Methods), we computed the subject-wise optimal
GNM (i.e., network with lowest energy) over a range of 10,000 evenly
spaced parameter combinations (−7 ≤ η ≤ 7, −7 ≤ γ ≤ 7) using 13
different generative rules (for rule formulae, see Supplementary
Table [67]1) across our large sample of children (N = 270, 178 males,
92 females, mean age = 9 years 10 months, SD age = 2 years 2 months;
full sample details can be found at Holmes et al.^[68]35). In each
case, we computed energy landscapes to contextualize how they perform
(Fig. [69]2a–[70]d). Mirroring findings in adult
samples^[71]24,[72]25,[73]35, we found that models driven by both
geometry and topology outperform the purely geometric spatial model,
and that homophily-based models achieve the lowest energy for our
pediatric sample (Fig. [74]2e). In other words, when one combines the distance
penalty with the “matching rule” we described in our concrete example
(as shown in Fig. [75]1), it produces the most plausible simulated
brain networks. This difference between generative rules is extremely
robust. A post hoc power calculation revealed that the homophily-based
rules could be distinguished from the next best class with near-perfect
statistical power (t = −10.210, p = 6.705 × 10^−21; N = 270,
power > 0.99), and that this difference could be detected with around
70 participants.
Fig. 2. Sample-averaged energy landscape visualization and generative rule
comparisons.
a Homophily-based methods. Matching and neighbors algorithms calculate
a measure of shared neighborhood between nodes. b The spatial method.
This ignores γ entirely, judging networks only on the basis of their
spatial relationship. c Clustering-based methods. These calculate a
summary measure between two nodes in terms of their clustering
coefficients. d Degree-based methods. These calculate a summary measure
between two nodes in terms of their degree. e Energy statistics from
the best performing simulation across 13 generative rules, showing that
matching can achieve the lowest energy networks given the appropriate
parameter combination. In total, there are N = 270 data points for each
of the 13 boxplots. A tabulated form of this figure is provided in
Supplementary Table [78]1. The boxplot presents the median and IQR.
Outliers are demarcated as small black crosses, and are those which
exceed 1.5 times the interquartile range away from the top or bottom of
the box. f A further 50,000 simulations were undertaken in the refined
matching window, as these defined boundary conditions for which
low-energetic networks were consistently achieved. Each cross
represents a subject’s individually specific wiring parameters that
achieved their lowest energy simulated network.
It is notable that across the matching energy landscape, these
plausible networks exist within an extremely narrow window of parameter
solutions. That is, as a proportion of the parameter space, the
matching rule (and the other homophily-based model “neighbors”) contains
the smallest number of low-energy networks relative to other rules. But
as Fig. [79]2e shows, these networks are the closest to real networks.
Thus, varying homophily-based parameters produces the most realistic
networks, yet within the narrowest region of the parameter space
(Supplementary Fig. [80]2).
While small, variability within this narrow matching window determines
inter-individual differences in brain network growth. This is because
small changes in parameters (i.e., the magnitude and direction in which
costs and values influence wiring probabilities) can lead to networks
which are diverse yet include basic structural properties common to all
subjects. To derive more precise estimations of optimal generative
parameter combinations, we subsequently generated a new set of 50,000
evenly spaced simulated networks over this narrow low-energy matching
window (−3.606 ≤ η ≤ 0.354, 0.212 ≤ γ ≤ 0.495). Focusing on this energy
crevasse allows us to detect individual differences in optimal
parameter combinations with much greater specificity. In other words,
we resampled the parameter combinations focusing within the low-energy
window, to make sure we have the most precise estimate of each
individual child’s optimal parameters.
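Computationally, this resampling amounts to evaluating the energy over an evenly spaced grid and averaging the top-N lowest-energy combinations. The sketch below uses a toy energy surface standing in for the expensive per-subject simulations, and the 250 × 200 factorization of the 50,000-point grid is our assumption:

```python
import numpy as np

# 50,000 evenly spaced (eta, gamma) pairs spanning the refined matching
# window reported in the text; the 250 x 200 shape is our assumption
etas = np.linspace(-3.606, 0.354, 250)
gammas = np.linspace(0.212, 0.495, 200)
grid = np.array([(e, g) for e in etas for g in gammas])

def best_parameters(energies, grid, top_n=1):
    # Average the top-n lowest-energy parameter combinations, as done
    # in the text for N = 1, 10, 100 and 500.
    idx = np.argsort(energies)[:top_n]
    return grid[idx].mean(axis=0)

# Toy energy surface with a known minimum, in place of running the
# generative model at every grid point (illustration only)
target = np.array([-1.5, 0.35])
energies = np.linalg.norm(grid - target, axis=1)
eta_opt, gamma_opt = best_parameters(energies, grid, top_n=10)
```

Averaging over the top N rather than taking the single minimum is one way to absorb the run-to-run stochasticity of the GNM noted in the text.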
In Fig. [81]2f, we show the spatial distribution of these top
performing parameter combinations and Supplementary Table [82]2
documents their summary statistics. These finely calibrated networks
achieve even lower energies than in the previous analysis. In
Supplementary Fig. 3a–d we detail how KS statistics vary across the
same space. Importantly, due to the stochastic nature of GNMs, the
energy of optimal parameter combinations varies with an average
standard deviation (SD) of 0.045 across the sample (1000 independent
runs). Therefore, for the rest of this study, we quote our parameter
analyses averaged across a variable number of wiring parameters which
achieved networks with the lowest energy in the space: N = 1 (equating
to 0.002% of the space), N = 10 (0.02%), N = 100 (0.2%), and N = 500
(1.0%).
The optimal η and γ parameters are significantly negatively correlated
with each other, such that subjects with large γ parameters tend to
have larger negative η (Best N = 1 network: r = −0.284,
p = 2.07 × 10^−6; N = 10 networks: r = −0.403, p = 6.08 × 10^−12;
N = 100 networks: r = −0.460, p = 1.58 × 10^−15; N = 500 networks:
r = −0.497, p = 3.21 × 10^−18)
(Supplementary Fig. [83]3f). Optimally simulated networks, using this
simple wiring equation, are so similar to the actual networks that a
support vector machine is unable to distinguish them using the
statistics from the energy Eq. ([84]2) (mean accuracy = 50.45%,
SD = 2.85%).
Replicating previous work, we find that our simulated networks,
optimized via the statistical properties included in the energy Eq.
([85]2) via homophily generative mechanisms, accurately capture these
properties in observed networks^[86]24,[87]25,[88]36. But do these
simulations capture crucial network properties not included in the energy equation,
like their spatial embedding? We next examined if the spatial
patterning of these network properties arises simply from the
generative model.
Averaged across the sample, optimally performing generative models
(i.e., those using the “matching” rule) produce networks which
significantly correlate with observed networks in terms of their degree
(r = 0.522, p = 4.96 × 10^−5), edge length (r = 0.686,
p = 1.11 × 10^−11), and betweenness centrality (r = 0.304, p = 0.012)
but not clustering coefficient (r = −0.054, p = 0.663) (Fig. [89]3).
That is, the spatial embedding of these network properties seemingly
emerges, to mirror those of the observed networks, despite this not
being specified in the growth process. We extended this analysis to new
measures outside of the energy equation (Supplementary Fig. [90]4).
While local efficiency and assortativity cannot be significantly
predicted across the sample (r = 0.211, p = 0.084 and r = −0.096,
p = 0.116, respectively), optimally performing simulated and observed
networks correlate positively in terms of their global number of rich
clubs (r = 0.316, p = 1.11 × 10^−7), maximized modularity (r = 0.349,
p = 3.84 × 10^−9), and transitivity scalar (r = 0.411,
p = 2.11 × 10^−12). In short, despite not being specified in the growth
process, the simple homophily rule generates many properties of
observed brain networks.
Fig. 3. Spatial embedding of simulated networks grown via optimized homophily
generative mechanisms.
For each network measure, we present the cumulative density functions
across all observed versus simulated nodes within each network. Each
point in the scatter plot shows one of the 68 across-subject average
nodal measures from the observed and optimally simulated networks. We
also show a visualization of these measures. All statistics were
computed via two-tailed linear correlations, quoting the Pearson’s
correlation coefficient. a Degree between observed and simulations are
significantly positively correlated (r = 0.522, p = 4.96 × 10^−5). b
Clustering between observed and simulations are not correlated
(r = −0.054, p = 0.663). c Betweenness centrality between observed and
simulations are significantly positively correlated (r = 0.304,
p = 0.012). d Edge length (as a summation of all edges from each node)
between observed and simulations are significantly positively
correlated (r = 0.686, p = 1.11 × 10^−11). Bold values indicate
significant correlations at p < 0.05. In Supplementary Fig. 4, we
present a parallel analysis including local and global measures not
included in the energy equation. In Supplementary Fig. 5, we demarcate
for each measure the generative error in spatial embedding, and show
the ranked performance for each region.
One criticism of our simulations is that their embedding may be an
artifact of the seed network (which is on average 10.8% of the density of
the observed/simulated networks). In short, if by chance the seed
network mirrors the final network, it could be inevitable that spatial
embedding would emerge, in terms of node degree, betweenness centrality
and edge length. If so, one would expect initial local statistical
properties of the seed to be significantly associated with regional
accuracy of the simulation. To determine if this is the case, we
analyzed the regional accuracy of our homophily simulations by
determining their regional generative error (as a mismatch between
observed and simulated outcome, depicted in Supplementary Fig. [93]5).
Importantly, the average ranked error (Supplementary Fig. [94]5e) is
not correlated with the seed network’s connectivity (r = −0.0711,
p = 0.5643). Furthermore, seed features do not correlate with their own
feature’s resultant error (Degree, r = −0.0408, p = 0.7410;
Betweenness, r = 0.1833, p = 0.1345; Edge length, r = 0.1114,
p = 0.3659).
The large heterogeneous sample we chose is ideal for this computational
approach to understanding diversity, but it is highly likely that this
approach will work for more standard typically developing cohorts of
children. In Supplementary Figure 6 we replicate our key findings in an
independent sample of N = 140 children recruited from local primary
schools in the same area (for more details about the cohort, see
“Methods”; “Participants” and Johnson et al.^[95]37).
Individual differences in wiring parameters mirror connectome organization,
gray matter morphology, and cognitive scores
A critical benefit of a generative modeling approach is that it allows
us to probe the underlying mechanisms occurring over the development of
the network^[96]38. As one explicitly specifies the generative
mechanisms involved, the statistical properties which “fall out” of the
network can be considered spandrels^[97]39: epiphenomena of the
network’s development according to the much simpler economical
trade-off (in this case, according to the homophily principle). If the
generative model is indeed capturing biologically relevant processes,
one would expect the wiring equation—at a minimum—to reduce the
network’s dimensionality simply into the two wiring parameters used to
construct it. Under this view, individual differences in wiring
parameters should (1) map to wide-ranging statistical properties of the
observed network, which may be considered spandrels and (2)
appropriately reduce dimensionality of the connectome such that, for
example, one can equivalently predict cognitive scores from parameters
as one would be able to from the network properties.
To explore this, we first examined how wiring parameters reflect
observed features of brain organization by quantifying how a subject’s
η and γ relate to global measures of their observed connectome.
Furthermore, for all 270 subjects, cortical morphology data were
available. In Fig. [98]4, we document how global network and
morphological measures (most of which are not included in the energy
equation) relate to each other, in addition to their reasonably stable
association with a varying number of high performing η and γ wiring
parameters (specific results are provided in Supplementary
Table [99]3), and their associations with age. Figure [100]4b shows
that η is significantly associated with age—the parameters needed to
form optimal networks favor longer-distance connections for older
participants, relative to younger participants^[101]25.
from individual differences, we repeated all of our correlations across
measures whilst partialling out age (Supplementary Fig. [102]7).
Associations remain when age has been controlled for, demonstrating
that age-related changes in optimal parameters are relatively
independent of the individual differences in those parameters. This is
not only an important step in demonstrating that these parameters
generalize to distinct measures (e.g., morphological observations) not
used to train the generative models, but also demonstrates that the
generative approach is consistent with the notion that wiring
parameters themselves have significant associations with numerous
statistical properties in a network.
Fig. 4. Statistical properties of the connectome and cortical morphology, and
their relationships with wiring parameters and age.
a The correlation matrix of connectome and morphological findings show
how each measure correlates with every other measure. Measures 3–6 were
included in the energy equation. Measures 7–11 are connectome measures
not included in the energy equation. Measures 12–19 are cortical
morphological measures. η and γ are each significantly correlated with
a range of measures, both inside and outside of the energy equation.
Correlation coefficient matrices are shown, the bottom row of which is
highlighted and is reflected in the above radar plots (middle), in
addition to the significance matrix (bottom), across varying numbers of
top performing parameters, for each of the 19 measures investigated. b
Radar plots depict the correlations between all measures and η (left)
and γ (right) averaged across the top N = 500 parameters in the
parameter space. All statistics were computed via two-tailed linear
correlations, quoting the Pearson’s correlation coefficient. The
asterisk, *, reflects significant correlations at p < 0.05. Note, the
inner edge of the radar plot reflects negative correlations and the
outer edge reflects positive correlations. Specific results for
variable top performing parameters are provided in Supplementary
Table [105]3. Further scatter plots are provided highlighting the
relationship of wiring parameters with age. η has a significantly
positive relationship with age (r = 0.325, p = 4.518 × 10^−8) while γ
has a weak non-significant negative relationship with age (r = −0.117,
p = 0.054).
Next, we tested the ability of the wiring equation to reduce the
dimensionality of the connectome. Specifically, if wiring parameters
are accurate decompositions of an individual’s structural network they
should predict cognitive outcomes comparably to observed features of
the connectome. For all 270 subjects we had data from a battery of
cognitive tasks, including measures of executive function, phonological
awareness, working memory, fluid reasoning and vocabulary (for details
of the tasks see “Methods”; “Cognitive and learning assessments”).
We tested the relationship between a subject’s age-standardized
cognition and (1) their optimal wiring parameters and (2) global
measures of their structural connectome (using measures included in the
energy equation). This was done using partial least squares (PLS), a
multivariate statistical technique which extracts optimally covarying
patterns from two data domains^[106]40. We undertook two separate PLS
analyses, which correlated (1) optimal wiring parameter combinations or
(2) global connectome measures across our sample, with cognitive
performance in the nine tasks, respectively (Fig. [107]5a). For both
analyses, PLS1 was significant in the amount of explained covariance
(p[cov] = 0.009 and p[cov] = 0.049, respectively). PLS1 score
predictions, and their cognitive loadings, are extremely similar
between wiring parameters and connectome features (r = 0.191,
p = 1.63 × 10^−3 and r = 0.210, p = 5.29 × 10^−4; p[corr] = 7 × 10^−4
and p[corr] = 4 × 10^−4, respectively, from 10,000 permutations of
scores) (Fig. [108]5b, c).
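The core of a PLS analysis of this kind can be sketched as an SVD of the cross-covariance between two z-scored data blocks. This minimal version is our own sketch of the "optimally covarying patterns" idea and omits the permutation-based significance testing the authors used:

```python
import numpy as np

def pls_first_component(X, Y):
    # First component of a PLS/SVD analysis: z-score both blocks, then
    # take the leading singular vectors of the cross-covariance X'Y.
    Xz = (X - X.mean(0)) / X.std(0)
    Yz = (Y - Y.mean(0)) / Y.std(0)
    U, s, Vt = np.linalg.svd(Xz.T @ Yz, full_matrices=False)
    x_scores = Xz @ U[:, 0]        # e.g., wiring-parameter scores
    y_scores = Yz @ Vt[0]          # e.g., cognitive scores
    cov_explained = s[0] ** 2 / (s ** 2).sum()
    return x_scores, y_scores, cov_explained

# Toy data: 200 "subjects", 2 wiring parameters, 9 cognitive measures,
# all driven by one shared latent factor plus noise (fabricated numbers,
# for illustration only)
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 1))
X = latent @ np.array([[1.0, -0.5]]) + 0.5 * rng.normal(size=(200, 2))
Y = latent @ rng.normal(size=(1, 9)) + 0.5 * rng.normal(size=(200, 9))
x_scores, y_scores, cov_explained = pls_first_component(X, Y)
r = abs(np.corrcoef(x_scores, y_scores)[0, 1])
```

The correlation between the two score vectors plays the role of the parameter-versus-cognition associations reported in Fig. 5b, c.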
Fig. 5. Covarying patterns of wiring parameters and connectome features with
cognitive performance across nine cognitive tasks.
a A visual representation of the two PLS analyses undertaken. b There
is a significant positive correlation (two-tailed linear correlation,
quoting the Pearson’s correlation coefficient) between parameter scores
and PLS-derived cognitive scores. PLS1 was statistically significant
(p[corr] = 7 × 10^−4 and p[corr] = 4 × 10^−4, respectively) for both
analyses using n = 10,000 permutations. Each parameter loads with
similar magnitude onto PLS1. c There is an analogous significant
positive correlation between connectome scores and PLS-derived
cognition scores, using the same statistical procedure.
Variability in neurodevelopmental trajectories arises through value-updating
over time
While small generative parameter differences result in differential
network properties, we have yet to show how this variability may occur
over the development of the networks. That is, how do differences in
parameter combinations across subjects manifest themselves when the
network is developing? To address this, we examined how between-subject
variability in optimal GNMs emerge at the level of cortical nodes and
their connections. This is possible by simply decomposing the optimal
simulation into its constituent parametrized costs (D[i,j])^η, values
(K[i,j])^γ, and wiring probabilities (P[i,j]) at each time point, for
each subject (Fig. [111]6a, b). This allows us to quantify growth
trajectories and thus establish which aspects of network emergence vary
most in the sample.
Fig. 6. Wiring Eq. ([112]1) decomposition and the subsequent variability
across subjects in our heterogeneous sample.
a For each subject, a simulated network is produced by minimizing the
energy between the observed and simulated network. Here, we present
visualizations for subject one (red). b Costs (D[i,j]) are static,
while values (K[i,j]) dynamically update according to the matching
rule, which enables the computation of wiring probability (P[i,j]). c
The mean and standard deviation for each subject of their edge-wise
parameterized costs, d parameterized values and e wiring probabilities.
f Histograms of each subject’s coefficient of variation (CV) showing
that subjects are more variable in their value-updating compared to
costs, leading to large wiring probability variability. g Regional
patterning of sample-averaged nodal parameterized costs and values,
showing highly “valuable” patterning in the left temporal lobe and
“cheap” regions generally occupying medial aspects of the cortex.
Variability declines as value increases, but increases for costs.
For each subject, we computed the coefficient of variation (CV, σ/μ) of
their parameterized costs, matching values and wiring probabilities to
compare subject-specific variability, as it emerges throughout the
simulated growth of connectomes. While subjects exhibit some
variability in how parameterized costs influence wiring probabilities
(mean CV 2.27), this is dwarfed by their parameterized values over time
(mean CV 33.02). This is because the matching value is dynamic,
changing at each iteration (as in Fig. [115]1d), unlike relative
Euclidean distance between nodes which is static. The result is that
significant inter-individual variability arises in the probability of
connections forming (mean CV 53.08), leading to the emergence of
divergent brain organization (Fig. [116]6c–f). Furthermore, the
regional patterning of costs and values is not random (Fig. [117]6g).
Nodes and edges with high matching values decline in their variability,
suggesting a consistency across subjects in highly “attractive” nodal
structures and their connections. Across the sample, cheaper regions
occupy the medial aspects of the cortex while highly valuable regions
generally reside in the temporal cortex.
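The comparison of subject-level variability can be sketched as follows. The arrays below are hypothetical stand-ins (a static, narrowly distributed cost term versus a dynamic, heavy-tailed value term), not the study's data.

```python
import numpy as np

def coefficient_of_variation(x):
    """CV = sigma / mu over an array of edge-wise quantities."""
    x = np.asarray(x, dtype=float)
    return x.std() / x.mean()

# Hypothetical stand-ins: static costs (D_ij)^eta are narrowly distributed,
# while matching values (K_ij)^gamma, pooled across iterations, are
# dynamic and heavy-tailed.
rng = np.random.default_rng(0)
costs = rng.uniform(0.5, 1.5, size=1000)
values = rng.lognormal(mean=0.0, sigma=2.0, size=1000)

cv_costs = coefficient_of_variation(costs)    # small, as for static costs
cv_values = coefficient_of_variation(values)  # much larger, as for dynamic values
```

The qualitative point mirrors the text: a term that is re-evaluated at every iteration accumulates far more variability than one fixed by geometry.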
Genomic patterning of network growth
Underlying these macroscopic changes in brain organization across time
are a series of complex molecular mechanisms. These are partly governed
by genetically coded processes that vary across individuals. We next
tested whether these processes may steer the brain network toward a
particular growth trajectory within our GNMs.
Nodal cost and nodal “matching” value patterning, alongside regional gene expression profiles of 10,027 genes from human adult brain microarray data^[118]41,[119]42, were integrated into two partial least squares (PLS) analyses for each subject. For all analyses, gene expression scores at each node
were used as the predictor. For each subject’s first analysis, their parameterized nodal costs (calculated as the subject’s ∑ (D[i,:])^η, as visualized in Fig. [120]6g; fourth row, right) were used as the response variable. For each subject’s second analysis, their mean parameterized values (calculated as the subject’s ∑ (K[i,:])^γ averaged over time, as visualized in Fig. [121]6g; second row, right) were used as the response variable. Each analysis defined PLS components independently
which were linear combinations of the weighted gene expression scores
at each node (predictor variables) that were most strongly correlated
with the subject’s nodal costs and nodal values of their simulated
growth trajectory. To limit the variability across regions in terms of
the samples available, only left hemispheric gene data were
analyzed^[122]42.
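A minimal sketch of one such analysis, reduced to a single PLS component on synthetic data: the region and gene counts and the signal structure below are hypothetical (the study used 10,027 genes), and the one-component computation is a simplification of a full PLS fit.

```python
import numpy as np

rng = np.random.default_rng(1)
n_nodes, n_genes = 34, 200                  # illustrative sizes only
X = rng.normal(size=(n_nodes, n_genes))     # gene expression per region (predictor)
y = X[:, :5].sum(axis=1) + rng.normal(scale=0.5, size=n_nodes)  # nodal costs (response)

# First PLS component: the gene-weight vector maximizing covariance between
# the expression block and the response is proportional to X^T y (after centering).
Xc = X - X.mean(axis=0)
yc = y - y.mean()
w = Xc.T @ yc
w /= np.linalg.norm(w)                      # gene loadings on PLS1
pls1_scores = Xc @ w                        # PLS1 score at each region

r = np.corrcoef(pls1_scores, yc)[0, 1]      # scores track the response
```

The per-gene entries of `w` play the role of the PLS1 gene loadings examined in the permutation analysis that follows.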
Across our sample, the first PLS component (PLS1) explained on average
65.0% (SD 1.3%) and 56.9% (SD 9.2%) of the covariance between genetic
expression and nodal costs, and nodal values, respectively. The average
nodal costs PLS1 score significantly correlates with average nodal
costs (r = 0.794, p = 2.07 × 10^−8, p[corr] = 2 × 10^−4). Similarly,
the average nodal values PLS1 score significantly correlates with
average nodal values (r = 0.718, p = 1.71 × 10^−6, p[corr] = 1 × 10^−4)
(Fig. [123]7a, b). To then characterize the genetic profiles associated
with each PLS analysis, we permuted the response variable 1000 times to
form a null distribution for the loading of each gene, across each
subject’s PLS1. This provides an estimate of how strong each loading would be by chance, and thus which genes surpass the p[corr] < 0.05 threshold. Across
subjects, PLS1 provided an average of 581.5 significant genes (SD
101.4) for nodal costs and 437.6 significant genes (SD 167.4) for nodal
values (Supplementary Fig. [124]8a).
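The permutation scheme can be sketched as follows, on synthetic data; the loading computation is a simplified one-component PLS, and the sizes and signal structure are illustrative assumptions.

```python
import numpy as np

def pls1_loadings(X, y):
    """Gene loadings of a one-component PLS (proportional to X^T y, centered)."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    w = Xc.T @ yc
    return w / np.linalg.norm(w)

rng = np.random.default_rng(2)
n_nodes, n_genes, n_perm = 34, 50, 1000
X = rng.normal(size=(n_nodes, n_genes))
y = 3.0 * X[:, 0] + 0.5 * rng.normal(size=n_nodes)  # only gene 0 carries signal

observed = pls1_loadings(X, y)
null = np.empty((n_perm, n_genes))
for i in range(n_perm):
    null[i] = pls1_loadings(X, rng.permutation(y))  # shuffle the response variable

# Per-gene permutation p value: fraction of null loadings at least as extreme.
p = (np.abs(null) >= np.abs(observed)).mean(axis=0)
```

Genes whose observed loadings exceed the null distribution (small `p`) are the "significant" genes counted per subject in the text.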
Fig. 7. Overexpressed genes that explain variance in brain wiring across subjects.
Both PLS1 components across subjects are enriched for functionally
specific biological processes and cellular components. Node size
represents the number of genes in the set. The edges relate to gene
overlap. a Sample-averaged parameterized costs significantly correlate with sample-averaged PLS1 nodal gene scores, explaining on average
65.0% covariance. Statistics were computed via two-tailed linear
correlations, quoting the Pearson’s correlation coefficient, followed
by n = 10,000 permutations. b Sample-averaged parameterized values
significantly correlate with sample-averaged PLS1 nodal gene scores, explaining on average 56.9% covariance. Statistics were computed via
two-tailed linear correlations, quoting the Pearson’s correlation
coefficient, followed by n = 10,000 permutations. c Nodal costs PLS1 is
enriched for genes predominantly associated with protein localization,
catabolic processes, and ribosomal/membrane cellular components. d
Nodal values PLS1 is enriched for genes predominantly associated with
synaptic signaling, neuronal projection and synaptic membranes.
Genes do not act in isolation, but instead converge to govern
biological pathways across spatial scales. To move from individual
genes to biological processes (BPs) and cellular components (CCs), we
performed a pathway enrichment analysis^[127]43. Pathway enrichment
analysis summarizes large gene sets as a smaller list of more easily
interpretable pathways that can be visualized to identify main
biological themes. Genes were ordered by how frequently they were significantly associated with connectome growth across subjects for that component. For example, for nodal values PLS1, the top of the list
was the gene associated with connectome growth in the most subjects
(CHI3L1; significant for 49.4% of our sample), the next was the second
most frequent gene (PRKAB2; 36.4% of our sample) and so on. Our list
stopped when genes were significant for <10% of the sample. This left
the nodal costs PLS1 with a list of 1427 genes and the nodal values
PLS1 with a list of 1584 genes ordered in terms of importance, which
were submitted to pathway enrichment analysis in g:Profiler
([128]https://biit.cs.ut.ee/gprofiler/gost) (Supplementary
Fig. [129]8b)^[130]43. g:Profiler searches a collection of gene sets
representing GO terms. In the ordered test, it repeats a modified
Fisher’s exact test on incrementally larger sub-lists of the input
genes and reports the sub-list with the strongest enrichment.
Multiple-test correction is applied to produce an adjusted p value
(p[adj]) for each enrichment^[131]43,[132]44 (as visualized in
Supplementary Fig. 8c, d, which can be accessed via the links presented
in Supplementary Table [133]4).
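The ordered test can be sketched as below. This is a simplified illustration of the idea (one-sided Fisher's exact tests on incrementally larger prefixes of the ranked list against a fixed background), not g:Profiler's exact algorithm or correction scheme; the gene names beyond CHI3L1 and PRKAB2, and the background size, are placeholders.

```python
from scipy.stats import fisher_exact

def ordered_enrichment(ranked_genes, gene_set, background_size):
    """One-sided Fisher's exact test on incrementally larger prefixes of a
    ranked gene list; returns the strongest p value and its prefix length."""
    gene_set = set(gene_set)
    best_p, best_k, hits = 1.0, 0, 0
    for k, gene in enumerate(ranked_genes, start=1):
        hits += gene in gene_set
        misses_in_set = len(gene_set) - hits
        table = [[hits, k - hits],
                 [misses_in_set, background_size - k - misses_in_set]]
        p = fisher_exact(table, alternative="greater")[1]
        if p < best_p:
            best_p, best_k = p, k
    return best_p, best_k

# Toy ranked list: the three most frequent genes all belong to the test set.
ranked = ["CHI3L1", "PRKAB2", "GENE3", "GENE4", "GENE5", "GENE6"]
p, k = ordered_enrichment(ranked, {"CHI3L1", "PRKAB2", "GENE3"}, background_size=100)
```

Because all three set members sit at the top of the ranking, the strongest enrichment is found at the three-gene prefix.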
The genes identified within the subject-wise PLS are not random, but
instead converge on particular BPs and CCs. The nodal costs PLS1 was
most prominently enriched for genes associated with BPs including
catabolic processes and protein localization (32 BPs; all
p[adj] < 9.58 × 10^−3), cell projection (14 BPs; all
p[adj] < 4.39 × 10^−2), immunological processes (34 BPs; all
p[adj] < 4.82 × 10^−2), regulation of metabolic processes (8 BPs; all
p[adj] < 4.75 × 10^−2), and regulation of cell development and
differentiation (4 BPs; all p[adj] < 3.87 × 10^−2). In terms of CCs,
nodal costs PLS1 was enriched for genes associated with the ribosome
(14 CCs; all p[adj] < 2.15 × 10^−2), vesicular and endoplasmic
membranes (19 CCs; all p[adj] < 4.90 × 10^−2) and intracellular
organelles (8 CCs; all p[adj] < 4.97 × 10^−2) (Fig. [134]7c).
The nodal values PLS1 was most prominently enriched for genes
associated with BPs including synaptic signaling (29 BPs; all
p[adj] < 3.96 × 10^−2), neuronal projection and development (26 BPs;
all p[adj] < 4.21 × 10^−2), and synapse organization (2 BPs; all
p[adj] < 2.92 × 10^−2). In terms of CCs, nodal values PLS1 was enriched
for genes associated with synaptic membranes (60 CCs; all
p[adj] < 3.15 × 10^−2) and ion channel complexes (7 CCs; all
p[adj] < 1.18 × 10^−2) (Fig. [135]7d).
In Supplementary Table [136]4 we provide links so that readers can run
our precise gene ontology (GO) queries within a browser and in
Supplementary Fig. 8c, d we show a visualization of these enriched gene
sets.
Discussion
Diversity in macroscopic human brain organization can be modeled using a generative network model. The framework does not include time
itself as an explicit parameter, but instead models it as a sequence of
processes, optimizing its connectivity by renegotiating its costs and
value^[137]24,[138]25 continuously over iterations. Despite the
simplicity of this equation, it results in the dynamic updating of
wiring probabilities over time, with multiple network properties, like
spatial embedding, being an emergent property of this dynamic updating.
This resonates with theoretical perspectives that implicate dynamic
interactions between brain systems over development in progressive,
integrative specialization^[139]45. We have formalized this process in
the context of neurodevelopmental diversity; offering a new perspective
on the formation of organized macroscopic networks, their possible
biological underpinnings, and their association with functional
outcomes like cognitive performance. This reflects a theoretical
step-change in understanding diversity in neurodevelopment, being
sufficiently well-specified to generate macroscopic brain networks. In
turn, this formalization allows for the unpacking of the computational
and/or biological constraints that shape the trajectories of networks.
Indeed, we anticipate that GNMs may be a powerful tool to model real
and biologically feasible artificial networks across many scales.
Small changes in wiring parameters of the GNM lead to divergent
macroscopic brain networks, with systematically different network
properties. Within the model, the key factor that drives individual
differences in growth trajectory is the dynamic nature of updating
preferences over time. Specifically, as nodes form new connections this