Abstract The formation of large-scale brain networks, and their continual refinement, represent crucial developmental processes that can drive individual differences in cognition and which are associated with multiple neurodevelopmental conditions. But how does this organization arise, and what mechanisms drive diversity in organization? We use generative network modeling to provide a computational framework for understanding neurodevelopmental diversity. Within this framework macroscopic brain organization, complete with its spatial embedding, is an emergent property of a generative wiring equation that optimizes its connectivity by renegotiating its biological costs and topological values continuously over time. The rules that govern these iterative wiring properties are controlled by a set of tightly framed parameters, with subtle differences in these parameters steering network growth towards different neurodiverse outcomes. Regional expression of genes associated with the simulations converges on biological processes and cellular components predominantly involved in synaptic signaling, neuronal projection, catabolic intracellular processes and protein transport. Together, this provides a unifying computational framework for conceptualizing the mechanisms and diversity in neurodevelopment, capable of integrating different levels of analysis—from genes to cognition. Subject terms: Network models, Neuronal development, Dynamic networks, Neurodevelopmental disorders The formation of large-scale brain networks represents a crucial developmental process that can drive individual differences in cognition and which is associated with multiple neurodevelopmental conditions. Here, the authors use generative network modelling to provide a computational framework for understanding neurodevelopmental diversity. Introduction The human brain is highly organized at multiple scales.
At the broadest scale, neuronal populations are structurally connected across large anatomical distances with white-matter fiber bundles, forming a set of interconnected networks. This macroscopic organization can be studied via diffusion-weighted magnetic resonance imaging (MRI), which measures the direction of water diffusion in vivo^1–3. During childhood the emergence and continual refinement of these large-scale brain networks allows for increasing functional integration and specialization^4,5. This process is thought to be crucial for the growth of complex cognitive processes such as language^6 and executive function^7–12. However, there are individual differences in the organization of these networks across children, and these differences mirror important developmental outcomes. Indeed, differences in macroscopic networks have been implicated across multiple neurodevelopmental conditions^13, including ADHD^14, autism^15,16, and language disorders^17. But what mechanisms drive the diversity of macroscopic brain networks? And how do these mechanisms give rise to individual differences in children’s outcomes? There are numerous descriptive theories^18–22 that speculate about how different levels of analysis (e.g., genes, brain structure, and function) interact to produce these neurodevelopmental differences. However, to date no theories are sufficiently specified to simulate individual-level brain networks. In the absence of computational models, it is difficult to establish mechanistic links between individual differences in observations (e.g., gene expression, biological pathways, system-wide organization). This theory gap represents a major limitation for understanding neurodevelopmental diversity. The purpose of this study is to address precisely this gap, by modeling the generative wiring properties of a large sample of children at heightened neurodevelopmental risk of poor outcomes.
The computational model we implemented is guided by a simple principle: the brain’s structural organization is shaped by an economic trade-off between minimizing wiring costs and adaptively enhancing valuable topological features^23. We hypothesize that the emergence of whole-brain organization reflects the continual trade-off of these factors over time, and that tiny differences in the parameters governing the trade-off can produce the neurodiverse outcomes we observe. Somewhat counterintuitively, tight parameter constraints likely enable macroscopic neurodiversity, because large changes in these parameters would produce networks with configurational states that are not observed in reality. Instead, narrow boundaries reflect parameter conditions within which networks can be different, but still maintain adequate structural properties to be functional. Our work utilizes generative network modeling^24,25, in which connections within a physically embedded network are formed probabilistically over time according to a wide range of potential mathematical constraints. Varying the parameters and wiring rules that govern network formation provides a way of establishing which statistics likely create real networks—in this case structural brain networks in our large heterogeneous sample of children. Specifically, we: (1) tested which topological features should be valued in the wiring trade-off to produce highly accurate individual child connectomes; (2) tested how small changes in these parameters alter the organizational properties of the resulting networks; (3) established relationships between these different wiring parameters and cognitive outcomes; (4) identified genes with expression profiles that were spatially co-located with those topological features; and (5) established the biological pathways that are enriched in these gene lists.
Together, this provides a computational framework that mathematically specifies the formation of a network over time, captures individual differences in brain organization and cognition, and incorporates the genetic and biological pathways that likely constrain network formation. Results The generative network model The generative network model (GNM) can be expressed as a simple wiring equation^24,25 (Fig. 1a). If you imagine a series of locations within the brain, at each moment in time the wiring equation calculates which two locations will become connected. It calculates this wiring probability by trading off the cost of a connection forming against the potential value of the connection being formed. The equation can be expressed as:

P[i,j] ∝ (D[i,j])^η (K[i,j])^γ, (1)

where D[i,j] represents the Euclidean distance between nodes i and j (i.e., “costs”), and K[i,j] reflects the value (i.e., “attractiveness”) in forming a connection. P[i,j] represents the wiring probability as a function of the product of the parameterized costs and value. The end result is a wiring probability matrix which updates over time as new connections are added. Fig. 1. Updating wiring probabilities within the generative network model iteratively, based on dynamically changing graphical structures. a The brain’s structural connectivity is modeled as a generative network which grows over time according to parametrized connection costs, (D[i,j])^η, and values, (K[i,j])^γ. In this illustration, we use subject one’s optimal model. b Early in network development, the absence of a topology leads to proximal nodes being much more likely to form connections. The displayed distances and probabilities are from the right caudal anterior cingulate (n^2), which corresponds to rows (D[2,:])^η and P[2,:]. We display its six nearest cortical regions.
c Later, the relative values (K[i,j]) between nodes influence connection probabilities, such that nodes which are more distant (e.g., left rostral anterior cingulate, n^59 in red) may be preferred to nodes which are closer (e.g., right superior frontal cortex, n^27 in cyan). d As costs and values are decoupled, the wiring probability can be rapidly recomputed when dynamic changes in graphical structure occur over developmental time. D[i,j] is parameterized by the scalar η, which changes how node-to-node distances influence their probability of connecting. For example, when η is negative, wiring probabilities decline as distances increase, reflecting the costliness of forming connections with nodes that are distant. This is traded off against K[i,j], which represents some relationship between nodes, and can be thought of as a topological value (or “rule”) driving the intention for node i to connect with node j. K[i,j] is parameterized by a distinct scalar γ. K[i,j] can take a range of different forms and can, in principle, be selected from any non-geometric growth rule used to model social and economic networks^26–28. One simple example is the “matching” rule^24: nodes form connections with other nodes on the basis of their normalized overlap in neighborhood—i.e., whether nodes are connected to similar nodes to themselves (also termed homophily). To make this more concrete, imagine the following scenario: a network is growing according to the matching rule, preferentially attaching to nodes which are both similarly connected and spatially proximal. In the wiring equation, this would be represented as η being negative (e.g., η = −1), K[i,j] represented as normalized neighborhoods between nodes (i.e., matching), and its parameter γ being positive (e.g., γ = 1). In short, a node being far away makes it less likely that a new connection will be formed, but having a similar neighborhood increases the likelihood.
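As a minimal sketch of this update rule (the function names and the small `eps` floor on zero-overlap pairs are our own illustrative choices, not taken from the study's implementation), the matching index and the resulting wiring probabilities might be computed as:

```python
import itertools

def matching_index(A, i, j):
    """One common definition of the homophily 'matching' value: shared
    neighbors of i and j divided by their combined neighbors, excluding
    i and j themselves."""
    n = len(A)
    Ni = {k for k in range(n) if A[i][k] and k != j}
    Nj = {k for k in range(n) if A[j][k] and k != i}
    union = Ni | Nj
    if not union:
        return 0.0
    return len(Ni & Nj) / len(union)

def wiring_probabilities(A, D, eta, gamma, eps=1e-5):
    """P[i][j] proportional to (D_ij)^eta * (K_ij)^gamma over pairs that
    are not yet connected; eps keeps zero-overlap pairs reachable (an
    implementation choice for this sketch)."""
    n = len(A)
    P = [[0.0] * n for _ in range(n)]
    total = 0.0
    for i, j in itertools.combinations(range(n), 2):
        if A[i][j]:
            continue  # already wired; only new connections compete
        K = matching_index(A, i, j) + eps
        p = (D[i][j] ** eta) * (K ** gamma)
        P[i][j] = P[j][i] = p
        total += p
    return [[p / total for p in row] for row in P]
```

With η = −1 and γ = 1, a distant pair with strongly overlapping neighborhoods can outcompete a nearer pair with none, which is exactly the trade-off the wiring equation describes.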
Suppose that the right caudal anterior cingulate (Node 2, n^2) is going to wire to one of its six nearest neighbors. Initially, due to an absent network topology, spatial proximity has a strong influence on the formation of new connections—it will wire to its nearest neighbor (Fig. 1b). However, gradually over time, the network’s developing structural topology means that K[i,j] (i.e., the relationships between nodes) may now have a greater influence on wiring probabilities. Indeed, the right caudal anterior cingulate may later wire with a node that, although further away than other available nodes, has a greater value (i.e., matching) than the others (Fig. 1c). As the wiring equation separately parameterizes costs and value, the presence of a single connection can heavily influence the topology of the network and thus the future updated wiring probabilities. This is because new connections can lead to entirely new overlapping neighbors, which may include distant nodes. As a result, wiring probabilities can change considerably from moment to moment, despite costs remaining fixed (Fig. 1d). The GNM simulates this process across the whole brain, until the overall number of connections matches those found in the observed brain network. Subsequently, to test the accuracy of the simulation, an energy function, E, must be defined which measures the dissimilarity between simulated and observed networks^24,25:

E = max(KS[k], KS[c], KS[b], KS[e]), (2)

where KS is the Kolmogorov–Smirnov statistic comparing the degree (k), clustering coefficient (c), betweenness centrality (b), and edge length (e) distributions of simulated and observed networks. Minimizing E finds parameters η and γ which generate networks most closely approximating the observed network. The four measures in the energy equation are good candidates for evaluating the plausibility of simulated networks.
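The energy evaluation in Eq. (2) can be sketched as follows. Here the four distributions (degree, clustering, betweenness centrality, edge length) are assumed to be precomputed lists of values for each network; the function names are illustrative, not the study's own code:

```python
def ks_statistic(x, y):
    """Two-sample Kolmogorov-Smirnov statistic: the maximum absolute
    difference between the two empirical CDFs."""
    def ecdf(sample, v):
        # fraction of the sample less than or equal to v
        return sum(1 for a in sample if a <= v) / len(sample)
    grid = sorted(set(x) | set(y))
    return max(abs(ecdf(x, v) - ecdf(y, v)) for v in grid)

def energy(observed, simulated):
    """E = max(KS_k, KS_c, KS_b, KS_e): the worst-fitting of the four
    distributions determines the energy, so a simulation is only as
    good as its weakest statistical property."""
    keys = ("degree", "clustering", "betweenness", "edge_length")
    return max(ks_statistic(observed[k], simulated[k]) for k in keys)
```

Because E takes the maximum, minimizing it forces the simulated network to match the observed one on all four distributions simultaneously.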
They are critical statistical properties of realistic networks and have featured within the most well-documented simulated network models^29–31. Moreover, these statistical properties have been implicated in a number of neuropsychiatric conditions^32,33, in addition to being shown to be heritable^34. Small variations in GNM parameter combinations produce accurate and spatially embedded networks From a basic seed network common to all participants (for detail, see Supplementary Fig. 1 and Methods), we computed the subject-wise optimal GNM (i.e., the network with lowest energy) over a range of 10,000 evenly spaced parameter combinations (−7 ≤ η ≤ 7, −7 ≤ γ ≤ 7) using 13 different generative rules (for rule formulae, see Supplementary Table 1) across our large sample of children (N = 270, 178 males, 92 females, mean age = 9 years 10 months, SD age = 2 years 2 months; full sample details can be found in Holmes et al.^35). In each case, we computed energy landscapes to contextualize how they perform (Fig. 2a–d). Mirroring findings in adult samples^24,25,35, we found that models driven by geometry and topology outperform the pure geometric spatial model, and homophily-based models achieve the lowest energy for our pediatric sample (Fig. 2e). In other words, when one combines the distance penalty with the “matching rule” we described in our concrete example (as shown in Fig. 1), it produces the most plausible simulated brain networks. This difference between generative rules is extremely robust. A post hoc power calculation revealed that the homophily-based rules could be distinguished from the next best class with near-perfect statistical power (t = −10.210, p = 6.705 × 10^−21; N = 270, power > 0.99), and that this difference could be detected with around 70 participants. Fig. 2. Sample-averaged energy landscape visualization and generative rule comparisons. a Homophily-based methods.
Matching and neighbors algorithms calculate a measure of shared neighborhood between nodes. b The spatial method. This ignores γ entirely, judging networks only on the basis of their spatial relationship. c Clustering-based methods. These calculate a summary measure between two nodes in terms of their clustering coefficients. d Degree-based methods. These calculate a summary measure between two nodes in terms of their degree. e Energy statistics from the best performing simulation across 13 generative rules, showing that matching can achieve the lowest energy networks given the appropriate parameter combination. In total, there are N = 270 data points for each of the 13 boxplots. A tabulated form of this figure is provided in Supplementary Table 1. The boxplot presents the median and IQR. Outliers are demarcated as small black crosses, and are those which exceed 1.5 times the interquartile range away from the top or bottom of the box. f A further 50,000 simulations were undertaken in the refined matching window, as these defined boundary conditions for which low-energy networks were consistently achieved. Each cross represents a subject’s individually specific wiring parameters that achieved their lowest energy simulated network. It is notable that across the matching energy landscape, these plausible networks exist within an extremely narrow window of parameter solutions. That is, as a proportion of the parameter space, the matching rule (and the other homophily-based model, “neighbors”) contains the fewest low-energy networks relative to other rules. But as Fig. 2e shows, these networks are the closest to real networks. Thus, varying homophily-based parameters produces the most realistic networks, yet has the lowest variability in the space (Supplementary Fig. 2). While small, variability within this narrow matching window determines inter-individual differences in brain network growth.
This is because small changes in parameters (i.e., the magnitude and direction in which costs and values influence wiring probabilities) can lead to networks which are diverse yet include basic structural properties common to all subjects. To derive more precise estimations of optimal generative parameter combinations, we subsequently generated a new set of 50,000 evenly spaced simulated networks over this narrow low-energy matching window (−3.606 ≤ η ≤ 0.354, 0.212 ≤ γ ≤ 0.495). Focusing on this energy crevasse allows us to detect individual differences in optimal parameter combinations with much greater specificity. In other words, we resampled the parameter combinations focusing within the low-energy window, to make sure we have the most precise estimate of each individual child’s optimal parameters. In Fig. 2f, we show the spatial distribution of these top performing parameter combinations, and Supplementary Table 2 documents their summary statistics. These finely calibrated networks achieve even lower energy than in the previous analysis. In Supplementary Fig. 3a–d we detail how KS statistics vary across the same space. Importantly, due to the stochastic nature of GNMs, the energy of optimal parameter combinations varies with an average standard deviation (SD) of 0.045 across the sample (1000 independent runs). Therefore, for the rest of this study, we quote our parameter analyses averaged across a variable number of wiring parameters which achieved networks with the lowest energy in the space: N = 1 (equating to 0.002% of the space), N = 10 (0.02%), N = 100 (0.2%), and N = 500 (1.0%).
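The refinement step described above amounts to a grid search followed by averaging the top-N lowest-energy parameter combinations to buffer run-to-run stochasticity. A hedged sketch, where `score_fn` stands in for running a full generative simulation and returning its energy (the helper names are ours, not the study's):

```python
import itertools

def parameter_grid(eta_range, gamma_range, n_steps):
    """Evenly spaced (eta, gamma) combinations, e.g. over the refined
    matching window -3.606 <= eta <= 0.354, 0.212 <= gamma <= 0.495."""
    def linspace(lo, hi, n):
        step = (hi - lo) / (n - 1)
        return [lo + k * step for k in range(n)]
    return list(itertools.product(linspace(*eta_range, n_steps),
                                  linspace(*gamma_range, n_steps)))

def best_parameters(grid, score_fn, top_n=10):
    """Average the top-N lowest-energy combinations; averaging over
    several near-optimal points damps the stochastic variability of
    any single simulation run."""
    ranked = sorted(grid, key=lambda p: score_fn(*p))
    top = ranked[:top_n]
    mean = lambda vals: sum(vals) / len(vals)
    return mean([e for e, _ in top]), mean([g for _, g in top])
```

Varying `top_n` (1, 10, 100, 500) reproduces the kind of robustness check quoted in the text, where parameter estimates are reported across several averaging windows.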
The optimal η and γ parameters are significantly negatively correlated with each other, such that subjects with large γ parameters tend to have larger negative η (best N = 1 network: r = −0.284, p = 2.07 × 10^−6; N = 10 networks: r = −0.403, p = 6.08 × 10^−12; N = 100 networks: r = −0.460, p = 1.58 × 10^−15; N = 500 networks: r = −0.497, p = 3.21 × 10^−18) (Supplementary Fig. 3f). Optimally simulated networks, using this simple wiring equation, are so similar to the actual networks that a support vector machine is unable to distinguish them using the statistical properties from the energy Eq. (2) (mean accuracy = 50.45%, SD = 2.85%). Replicating previous work, we find that our simulated networks, optimized via the statistical properties included in the energy Eq. (2) via homophily generative mechanisms, accurately capture these properties in observed networks^24,25,36. But do these simulations capture crucial network properties not included in the energy equation, like their spatial embedding? We next examined if the spatial patterning of these network properties arises simply from the generative model. Averaged across the sample, optimally performing generative models (i.e., those using the “matching” rule) produce networks which significantly correlate with observed networks in terms of their degree (r = 0.522, p = 4.96 × 10^−5), edge length (r = 0.686, p = 1.11 × 10^−11), and betweenness centrality (r = 0.304, p = 0.012), but not clustering coefficient (r = −0.054, p = 0.663) (Fig. 3). That is, the spatial embedding of these network properties seemingly emerges, mirroring that of the observed networks, despite this not being specified in the growth process. We extended this analysis to new measures outside of the energy equation (Supplementary Fig. 4).
While local efficiency and assortativity cannot be significantly predicted across the sample (r = 0.211, p = 0.084 and r = −0.096, p = 0.116, respectively), optimally performing simulated and observed networks correlate positively in terms of their global number of rich clubs (r = 0.316, p = 1.11 × 10^−7), maximized modularity (r = 0.349, p = 3.84 × 10^−9), and transitivity scalar (r = 0.411, p = 2.11 × 10^−12). In short, despite not being specified in the growth process, the simple homophily rule generates many properties of observed brain networks. Fig. 3. Spatial embedding of simulated networks grown via optimized homophily generative mechanisms. For each network measure, we present the cumulative density functions across all observed versus simulated nodes within each network. Each point in the scatter plot shows one of the 68 across-subject average nodal measures from the observed and optimally simulated networks. We also show a visualization of these measures. All statistics were computed via two-tailed linear correlations, quoting the Pearson’s correlation coefficient. a Degree between observed and simulated networks is significantly positively correlated (r = 0.522, p = 4.96 × 10^−5). b Clustering between observed and simulated networks is not correlated (r = −0.054, p = 0.663). c Betweenness centrality between observed and simulated networks is significantly positively correlated (r = 0.304, p = 0.012). d Edge length (as a summation of all edges from each node) between observed and simulated networks is significantly positively correlated (r = 0.686, p = 1.11 × 10^−11). Bold values are significant correlations at p < 0.05. In Supplementary Fig. 4, we present a parallel analysis including local and global measures not included in the energy equation. In Supplementary Fig. 5, we demarcate for each measure the generative error in spatial embedding, and show the ranked performance for each region.
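The spatial-embedding test above reduces to a region-wise correlation between observed and simulated nodal measures. A plain-Python sketch (in the study the 68 region-wise values come from the connectomes themselves; these helper names are illustrative):

```python
import math

def degree_sequence(A):
    """Nodal degree from a binary adjacency matrix: one value per region."""
    return [sum(row) for row in A]

def pearson_r(x, y):
    """Pearson correlation between two nodal measure vectors, e.g. the
    region-wise degrees of observed vs. simulated networks."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```

Applying `pearson_r(degree_sequence(A_observed), degree_sequence(A_simulated))` across subjects gives the kind of degree correlation reported in Fig. 3a.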
One criticism of our simulations is that their embedding may be an artifact of the seed network (which is on average 10.8% of the density of the observed/simulated networks). In short, if by chance the seed network mirrors the final network, it could be inevitable that spatial embedding would emerge, in terms of node degree, betweenness centrality and edge length. If so, one would expect initial local statistical properties of the seed to be significantly associated with regional accuracy of the simulation. To determine if this is the case, we analyzed the regional accuracy of our homophily simulations by determining their regional generative error (as a mismatch between observed and simulated outcome, depicted in Supplementary Fig. 5). Importantly, the average ranked error (Supplementary Fig. 5e) is not correlated with the seed network’s connectivity (r = −0.0711, p = 0.5643). Furthermore, seed features do not correlate with their own feature’s resultant error (degree, r = −0.0408, p = 0.7410; betweenness, r = 0.1833, p = 0.1345; edge length, r = 0.1114, p = 0.3659). The large heterogeneous sample we chose is ideal for this computational approach to understanding diversity, but it is highly likely that this approach will work for more standard typically developing cohorts of children. In Supplementary Fig. 6 we replicate our key findings in an independent sample of N = 140 children recruited from local primary schools in the same area (for more details about the cohort, see “Methods”; “Participants” and Johnson et al.^37). Individual differences in wiring parameters mirror connectome organization, gray matter morphology, and cognitive scores A critical benefit of a generative modeling approach is that it allows us to probe the underlying mechanisms occurring over the development of the network^38.
As one explicitly specifies the generative mechanisms involved, the statistical properties which “fall out” of the network can be considered as spandrels^39; epiphenomena of the network’s development according to the much simpler economical trade-off (in this case, according to the homophily principle). If the generative model is indeed capturing biologically relevant processes, one would expect the wiring equation—at a minimum—to reduce the network’s dimensionality simply into the two wiring parameters used to construct it. Under this view, individual differences in wiring parameters should (1) map to wide-ranging statistical properties of the observed network, which may be considered spandrels, and (2) appropriately reduce the dimensionality of the connectome such that, for example, one can predict cognitive scores from parameters as well as one could from the network properties. To explore this, we first examined how wiring parameters reflect observed features of brain organization by quantifying how a subject’s η and γ relate to global measures of their observed connectome. Furthermore, for all 270 subjects, cortical morphology data were available. In Fig. 4, we document how global network and morphological measures (most of which are not included in the energy equation) relate to each other, in addition to their reasonably stable association with a varying number of high performing η and γ wiring parameters (specific results are provided in Supplementary Table 3), and their associations with age. Figure 4b shows that η is significantly associated with age—the network parameters needed to form optimal networks over time need to favor longer distance connections for older participants, relative to younger participants^25. To disentangle age-related parameter differences from individual differences, we repeated all of our correlations across measures whilst partialling out age (Supplementary Fig. 7).
Associations remain when age has been controlled for, demonstrating that age-related changes in optimal parameters are relatively independent of the individual differences in those parameters. This is not only an important step in demonstrating that these parameters generalize to distinct measures (e.g., morphological observations) not used to train the generative models, but also demonstrates that the generative approach is consistent with the notion that wiring parameters themselves have significant associations with numerous statistical properties in a network. Fig. 4. Statistical properties of the connectome and cortical morphology, and their relationships with wiring parameters and age. a The correlation matrix of connectome and morphological findings shows how each measure correlates with every other measure. Measures 3–6 were included in the energy equation. Measures 7–11 are connectome measures not included in the energy equation. Measures 12–19 are cortical morphological measures. η and γ are each significantly correlated with a range of measures, both inside and outside of the energy equation. Correlation coefficient matrices are shown, the bottom row of which is highlighted and is reflected in the above radar plots (middle), in addition to the significance matrix (bottom), across varying numbers of top performing parameters, for each of the 19 measures investigated. b Radar plots depict the correlations between all measures and η (left) and γ (right) averaged across the top N = 500 parameters in the parameter space. All statistics were computed via two-tailed linear correlations, quoting the Pearson’s correlation coefficient. The asterisk, *, reflects significant correlations at p < 0.05. Note, the inner edge of the radar plot reflects negative correlations and the outer edge reflects positive correlations. Specific results for variable top performing parameters are provided in Supplementary Table 3.
Further scatter plots are provided highlighting the relationship of wiring parameters with age. η has a significantly positive relationship with age (r = 0.325, p = 4.518 × 10^−8) while γ has a weak non-significant negative relationship with age (r = −0.117, p = 0.054). Next, we tested the ability of the wiring equation to reduce the dimensionality of the connectome. Specifically, if wiring parameters are accurate decompositions of an individual’s structural network, they should predict cognitive outcomes equivalently to observed features of the connectome. For all 270 subjects we had data from a battery of cognitive tasks, including measures of executive function, phonological awareness, working memory, fluid reasoning and vocabulary (for details of the tasks see “Methods”; “Cognitive and learning assessments”). We tested the relationship between a subject’s age-standardized cognition and (1) their optimal wiring parameters and (2) global measures of their structural connectome (using measures included in the energy equation). This was done using partial least squares (PLS), a multivariate statistical technique which extracts optimally covarying patterns from two data domains^40. We undertook two separate PLS analyses, which correlated (1) optimal wiring parameter combinations or (2) global connectome measures across our sample, with cognitive performance in the nine tasks, respectively (Fig. 5a). For both analyses, PLS1 was significant in the amount of explained covariance (p[cov] = 0.009 and p[cov] = 0.049, respectively). PLS1 score predictions, and their cognitive loadings, are extremely similar between wiring parameters and connectome features (r = 0.191, p = 1.63 × 10^−3 and r = 0.210, p = 5.29 × 10^−4; p[corr] = 7 × 10^−4 and p[corr] = 4 × 10^−4, respectively, from 10,000 permutations of scores) (Fig. 5b, c). Fig. 5.
Covarying patterns of wiring parameters and connectome features with cognitive performance across nine cognitive tasks. a A visual representation of the two PLS analyses undertaken. b There is a significant positive correlation (two-tailed linear correlation, quoting the Pearson’s correlation coefficient) between parameter scores and PLS-derived cognitive scores. PLS1 was statistically significant (p[corr] = 7 × 10^−4 and p[corr] = 4 × 10^−4, respectively) for both analyses using n = 10,000 permutations. Each parameter loads with similar magnitude onto PLS1. c There is an analogous significant positive correlation between connectome scores and PLS-derived cognition scores, using the same statistical procedure. Variability in neurodevelopmental trajectories arises through value-updating over time While small generative parameter differences result in differential network properties, we have yet to show how this variability may occur over the development of the networks. That is, how do differences in parameter combinations across subjects manifest themselves when the network is developing? To address this, we examined how between-subject variability in optimal GNMs emerges at the level of cortical nodes and their connections. This is possible by simply decomposing the optimal simulation into its constituent parametrized costs (D[i,j])^η, values (K[i,j])^γ, and wiring probabilities (P[i,j]) at each time point, for each subject (Fig. 6a, b). This allows us to quantify growth trajectories and thus establish which aspects of network emergence vary most in the sample. Fig. 6. Wiring Eq. (1) decomposition and the subsequent variability across subjects in our heterogeneous sample. a For each subject, a simulated network is produced by minimizing the energy between the observed and simulated network. Here, we present visualizations for subject one (red).
b Costs (D[i,j]) are static, while values (K[i,j]) dynamically update according to the matching rule, which enables the computation of wiring probability (P[i,j]). c The mean and standard deviation for each subject of their edge-wise parameterized costs, d parameterized values and e wiring probabilities. f Histograms of each subject’s coefficient of variation (CV), showing that subjects are more variable in their value-updating compared to costs, leading to large wiring probability variability. g Regional patterning of sample-averaged nodal parameterized costs and values, showing highly “valuable” patterning in the left temporal lobe and “cheap” regions generally occupying medial aspects of the cortex. Variability declines as value increases, but increases for costs. For each subject, we computed the coefficient of variation (CV, σ/μ) of their parameterized costs, matching values and wiring probabilities to compare subject-specific variability as it emerges throughout the simulated growth of connectomes. While subjects exhibit some variability in how parameterized costs influence wiring probabilities (mean CV 2.27), this is dwarfed by the variability of their parameterized values over time (mean CV 33.02). This is because the matching value is dynamic, changing at each iteration (as in Fig. 1d), unlike the relative Euclidean distance between nodes, which is static. The result is that significant inter-individual variability arises in the probability of connections forming (mean CV 53.08), leading to the emergence of divergent brain organization (Fig. 6c–f). Furthermore, the regional patterning of costs and values is not random (Fig. 6g). Nodes and edges with high matching values decline in their variability, suggesting a consistency across subjects in highly “attractive” nodal structures and their connections. Across the sample, cheaper regions occupy the medial aspects of the cortex while highly valuable regions generally reside in the temporal cortex.
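The decomposition above rests on the coefficient of variation, CV = σ/μ. A minimal sketch (the example values below are illustrative, not the study's data; we use the population standard deviation):

```python
import math

def coefficient_of_variation(values):
    """CV = sigma / mu: standard deviation scaled by the mean, so that
    components on very different scales (static costs vs. dynamically
    updating values) can be compared on equal footing."""
    n = len(values)
    mu = sum(values) / n
    var = sum((v - mu) ** 2 for v in values) / n  # population variance
    return math.sqrt(var) / mu
```

Because the parameterized costs barely change over growth while the matching values are recomputed at every iteration, the values' CV dominates, which is the pattern the text reports (mean CV 2.27 for costs vs. 33.02 for values).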
Genomic patterning of network growth

Underlying these macroscopic changes in brain organization over time are a series of complex molecular mechanisms. These are partly governed by genetically coded processes that vary across individuals. We next tested whether these processes may steer the brain network toward a particular growth trajectory within our GNMs. For each subject, nodal cost and nodal “matching” value patterning were integrated with regional expression profiles of 10,027 genes, derived from human adult brain microarray data^[118]41,[119]42, in two PLS analyses. For all analyses, gene expression scores at each node were used as the predictor. In each subject’s first analysis, their parameterized nodal costs (calculated as the subject’s ∑ (D[i],:)^η, as visualized in Fig. [120]6g; fourth row, right) were used as the response variable. In each subject’s second analysis, their mean parameterized values (calculated as the subject’s ∑ (K[i],:)^γ averaged over time, as visualized in Fig. [121]6g; second row, right) were used as the response variable. Each analysis independently defined PLS components, which were linear combinations of the weighted gene expression scores at each node (predictor variables) that were most strongly correlated with the subject’s nodal costs and nodal values of their simulated growth trajectory. To limit the variability across regions in terms of the samples available, only left-hemispheric gene data were analyzed^[122]42. Across our sample, the first PLS component (PLS1) explained on average 65.0% (SD 1.3%) and 56.9% (SD 9.2%) of the covariance between gene expression and nodal costs and nodal values, respectively. The average nodal costs PLS1 score significantly correlates with average nodal costs (r = 0.794, p = 2.07 × 10^−8, p[corr] = 2 × 10^−4). Similarly, the average nodal values PLS1 score significantly correlates with average nodal values (r = 0.718, p = 1.71 × 10^−6, p[corr] = 1 × 10^−4) (Fig. [123]7a, b).
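The per-subject PLS setup can be illustrated with a minimal sketch of a first PLS component (predictor: node × gene expression matrix; response: nodal costs or values). For a univariate response, the first PLS weight vector that maximizes covariance between the X-scores and the response is proportional to X'y; the naming and synthetic data below are our own simplification, not the authors' pipeline.

```python
import numpy as np

def pls1_scores(X, y):
    """First PLS component relating gene expression to a nodal response.
    X: (n_nodes, n_genes) regional expression; y: (n_nodes,) nodal costs
    or values. Variables are z-scored, as is standard for PLS."""
    Xc = (X - X.mean(0)) / X.std(0)
    yc = (y - y.mean()) / y.std()
    w = Xc.T @ yc                 # per-gene covariance with the response
    w /= np.linalg.norm(w)        # PLS1 gene weights (unit norm)
    t = Xc @ w                    # PLS1 score at each node
    return t, w

# Synthetic illustration: 34 left-hemisphere nodes, 20 genes, with the
# response driven by two genes plus noise (all values hypothetical).
rng = np.random.default_rng(0)
X = rng.normal(size=(34, 20))
y = X[:, 0] - 0.5 * X[:, 1] + 0.1 * rng.normal(size=34)
t, w = pls1_scores(X, y)
r = np.corrcoef(t, y)[0, 1]       # analogue of the reported PLS1-score correlation
```

Correlating the nodal PLS1 scores t with the response y mirrors the reported nodal-cost and nodal-value correlations, here on synthetic data.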
To then characterize the genetic profiles associated with each PLS analysis, we permuted the response variable 1000 times to form a null distribution for the loading of each gene across each subject’s PLS1. This provides an estimate of how strong each loading would be by chance, and thus which genes survive p[corr] < 0.05. Across subjects, PLS1 provided an average of 581.5 significant genes (SD 101.4) for nodal costs and 437.6 significant genes (SD 167.4) for nodal values (Supplementary Fig. [124]8a).

Fig. 7. Overexpressed genes that explain variance in brain wiring across subjects. Both PLS1 components across subjects are enriched for functionally specific biological processes and cellular components. Node size represents the number of genes in the set. Edges represent gene overlap. a Sample-averaged parameterized costs significantly correlate with sample-averaged PLS1 nodal gene scores, explaining on average 65.0% of the covariance. Statistics were computed via two-tailed linear correlations, quoting the Pearson’s correlation coefficient, followed by n = 10,000 permutations. b Sample-averaged parameterized values significantly correlate with sample-averaged PLS1 nodal gene scores, explaining on average 56.9% of the covariance. Statistics were computed via two-tailed linear correlations, quoting the Pearson’s correlation coefficient, followed by n = 10,000 permutations. c Nodal costs PLS1 is enriched for genes predominantly associated with protein localization, catabolic processes, and ribosomal/membrane cellular components. d Nodal values PLS1 is enriched for genes predominantly associated with synaptic signaling, neuronal projection and synaptic membranes.

Genes do not act in isolation, but instead converge to govern biological pathways across spatial scales. To move from individual genes to biological processes (BPs) and cellular components (CCs), we performed a pathway enrichment analysis^[127]43.
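The permutation procedure for gene loadings can be sketched as follows. This is a simplified, hypothetical stand-in: loadings here are the unit-norm covariance weights of a univariate-response PLS1, the data are synthetic, and any correction for multiple comparisons (to reach p[corr]) is omitted.

```python
import numpy as np

def pls1_loadings(X, y):
    """Gene loadings on PLS1: unit-norm covariance weights between
    z-scored expression and a z-scored nodal response."""
    Xc = (X - X.mean(0)) / X.std(0)
    yc = (y - y.mean()) / y.std()
    w = Xc.T @ yc
    return w / np.linalg.norm(w)

def loading_p_values(X, y, n_perm=1000, seed=0):
    """Permute the response across nodes to build a null distribution for
    each gene's loading; returns observed loadings and two-sided empirical
    p values (a +1 small-sample correction is often added in practice)."""
    rng = np.random.default_rng(seed)
    observed = pls1_loadings(X, y)
    null = np.empty((n_perm, X.shape[1]))
    for b in range(n_perm):
        null[b] = pls1_loadings(X, rng.permutation(y))
    p = (np.abs(null) >= np.abs(observed)).mean(axis=0)
    return observed, p

# Synthetic illustration: the response is driven almost entirely by gene 0,
# so its loading should far exceed the permutation null.
rng = np.random.default_rng(1)
X = rng.normal(size=(34, 20))
y = X[:, 0] + 0.1 * rng.normal(size=34)
observed, pvals = loading_p_values(X, y, n_perm=200)
```

Genes whose empirical p value falls below the chosen threshold form that subject's significant-gene set for the component.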
Pathway enrichment analysis summarizes large gene sets as a smaller list of more easily interpretable pathways that can be visualized to identify the main biological themes. Genes were ordered according to the frequency with which they were significantly associated with connectome growth across subjects for that component. For example, for nodal values PLS1, top of the list was the gene associated with connectome growth in the most subjects (CHI3L1; significant for 49.4% of our sample), next was the second most frequent gene (PRKAB2; 36.4% of our sample), and so on. The list stopped when genes were significant for <10% of the sample. This left the nodal costs PLS1 with a list of 1427 genes and the nodal values PLS1 with a list of 1584 genes, ordered in terms of importance, which were submitted to pathway enrichment analysis in g:Profiler ([128]https://biit.cs.ut.ee/gprofiler/gost) (Supplementary Fig. [129]8b)^[130]43. g:Profiler searches a collection of gene sets representing GO terms. In the ordered test, it repeats a modified Fisher’s exact test on incrementally larger sub-lists of the input genes and reports the sub-list with the strongest enrichment. Multiple-test correction is applied to produce an adjusted p value (p[adj]) for each enrichment^[131]43,[132]44 (as visualized in Supplementary Fig. 8c, d, which can be accessed via the links presented in Supplementary Table [133]4).

The genes identified within the subject-wise PLS are not random, but instead converge on particular BPs and CCs. The nodal costs PLS1 was most prominently enriched for genes associated with BPs including catabolic processes and protein localization (32 BPs; all p[adj] < 9.58 × 10^−3), cell projection (14 BPs; all p[adj] < 4.39 × 10^−2), immunological processes (34 BPs; all p[adj] < 4.82 × 10^−2), regulation of metabolic processes (8 BPs; all p[adj] < 4.75 × 10^−2), and regulation of cell development and differentiation (4 BPs; all p[adj] < 3.87 × 10^−2).
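The gene-ordering step that builds the g:Profiler input can be sketched as a simple frequency ranking. The function name and toy gene sets are illustrative assumptions, and the alphabetical tie-break is our own choice for determinism; the text does not specify one.

```python
from collections import Counter

def ordered_gene_list(significant_sets, min_frac=0.10):
    """significant_sets: one set of significant genes per subject.
    Returns genes ranked by how often they were significant across
    subjects, truncated where a gene falls below min_frac of the sample.
    Ties are broken alphabetically (an assumption for determinism)."""
    n_subjects = len(significant_sets)
    counts = Counter(g for s in significant_sets for g in s)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [g for g, c in ranked if c / n_subjects >= min_frac]

# Toy sample of 10 subjects (gene frequencies hypothetical): CHI3L1 in 5,
# PRKAB2 in 4, RARE1 in 1 (exactly the 10% cut-off, so it is retained).
subjects = [{"CHI3L1", "PRKAB2"}] * 4 + [{"CHI3L1"}] + [{"RARE1"}] + [set()] * 4
genes = ordered_gene_list(subjects)
```

The resulting ranked list (most-frequent gene first) is what would be submitted as an ordered query.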
In terms of CCs, nodal costs PLS1 was enriched for genes associated with the ribosome (14 CCs; all p[adj] < 2.15 × 10^−2), vesicular and endoplasmic membranes (19 CCs; all p[adj] < 4.90 × 10^−2) and intracellular organelles (8 CCs; all p[adj] < 4.97 × 10^−2) (Fig. [134]7c). The nodal values PLS1 was most prominently enriched for genes associated with BPs including synaptic signaling (29 BPs; all p[adj] < 3.96 × 10^−2), neuronal projection and development (26 BPs; all p[adj] < 4.21 × 10^−2), and synapse organization (2 BPs; all p[adj] < 2.92 × 10^−2). In terms of CCs, nodal values PLS1 was enriched for genes associated with synaptic membranes (60 CCs; all p[adj] < 3.15 × 10^−2) and ion channel complexes (7 CCs; all p[adj] < 1.18 × 10^−2) (Fig. [135]7d). In Supplementary Table [136]4 we provide links so that readers can run our precise gene ontology (GO) queries within a browser, and in Supplementary Fig. 8c, d we show a visualization of these enriched gene sets.

Discussion

Diversity in macroscopic human brain organization can be modeled using a generative network model. The generative framework does not include time itself as an explicit parameter, but instead models it as a sequence of processes, optimizing connectivity by renegotiating costs and values^[137]24,[138]25 continuously over iterations. Despite the simplicity of this equation, it results in the dynamic updating of wiring probabilities over time, with multiple network properties, such as spatial embedding, emerging from this dynamic updating. This resonates with theoretical perspectives that implicate dynamic interactions between brain systems over development in progressive, integrative specialization^[139]45. We have formalized this process in the context of neurodevelopmental diversity, offering a new perspective on the formation of organized macroscopic networks, their possible biological underpinnings, and their association with functional outcomes such as cognitive performance.
This reflects a theoretical step-change in understanding diversity in neurodevelopment: the framework is sufficiently well specified to generate macroscopic brain networks. In turn, this formalization allows for the unpacking of the computational and/or biological constraints that shape the trajectories of networks. Indeed, we anticipate that GNMs may be a powerful tool for modeling real networks, and biologically feasible artificial networks, across many scales. Small changes in the wiring parameters of the GNM lead to divergent macroscopic brain networks with systematically different network properties. Within the model, the key factor that drives individual differences in growth trajectory is the dynamic nature of updating preferences over time. Specifically, as nodes form new connections this