Abstract The tangential expansion of the human cerebral cortex, indexed by its surface area (SA), occurs mainly during prenatal and early postnatal periods, and is influenced by genetic factors. Here we investigate the role of rare copy number variants (CNVs) in shaping SA, and the underlying mechanisms, by aggregating CNVs across the genome in community-based cohorts (N = 39,015). We reveal that genome-wide CNV deletions and duplications are associated with smaller SA. Subsequent analyses with gene expression in fetal cortex suggest that CNVs influence SA by interrupting the proliferation of neural progenitor cells during fetal development. Notably, the deletion of genes with strong (but not weak) coexpression with neural progenitor genes is associated with smaller SA. Follow up analyses reveal similar mechanisms at play in three clinical CNVs, 1q21.1, 16p11.2 and 22q11.2. Together, this study of rare CNVs expands our knowledge about genetic architecture of human cerebral cortex. Subject terms: Nervous system, Development of the nervous system __________________________________________________________________ Variation in cortical surface area in adults can reflect developmental events occurring during prenatal and early postnatal periods. Here, the authors find rare copy number variants associated with cortical surface area, which are also found to disrupt neural progenitor proliferation during fetal development. Introduction The human cerebral cortex, essential for multiple perceptual, motor, and cognitive processes, is a highly folded sheath of tissue with about 1800 cm^2 of surface area^[44]1. Inter-individual variations in the cortical surface area (SA) contribute to the variations in cognitive abilities and vulnerability to mental illness^[45]2–[46]4. The cerebral cortex expands dramatically during prenatal periods and the first two years of life, with subsequent age-related increases in SA being limited^[47]1,[48]5–[49]7. Thus, individual variations in SA observed in adulthood reflect mainly developmental events occurring during the prenatal and early postnatal periods. The cortical expansion during the prenatal period has been of great interest for decades. During fetal development, radial glia can self-renew and generate intermediate progenitor cells (IPCs), which can self-amplify prior to generating neurons^[50]8–[51]11. Based on a large body of work carried out in non-human primates, the radial unit hypothesis proposed that the proliferation of neural progenitor cells plays a central role in the tangential growth of cerebral cortex^[52]12. Increasing the proliferation of progenitor cells experimentally in the mouse embryo results in more neurons and a larger SA^[53]13. Tangential expansion of the cerebral cortex is largely influenced by genetic variations, with about 90% heritability of SA as estimated in twin studies^[54]14. Genome-wide association studies (GWASs) have identified hundreds of loci related to SA of the adult cerebral cortex. These common genetic variants are enriched in regulatory activities in neural progenitor cells, which is consistent with the radial unit hypothesis^[55]15–[56]17. Yet, most of these genetic variants reside in non-coding regions, posing a challenge regarding whether and how these variants influence gene expression and biological processes. In contrast, rare copy number variants (CNVs) with protein-coding gene(s) fully deleted or duplicated lead to substantial decrease and increase, respectively, in gene expression in the brain^[57]18. Thus, investigating the effects of CNVs on SA can provide valuable insights into the intricate relationship between gene expression and cortical expansion. Previous research on the association between CNVs and SA has focused on a few specific genomic loci, such as 1q21.1, 16p11.2, and 22q11.2, which confer high risk for psychiatric disorders such as autism and schizophrenia^[58]19–[59]23. These studies have reported large effects of these CNVs on SA but identifying the contributing genes within these multigenic variants remains challenging. Beyond these specific CNVs, the majority of CNVs are very rare (<1/5000), and, as such, their individual effects on the brain are difficult to assess with currently available sample sizes. A common strategy to overcome this limitation has been to aggregate rare CNVs across the genome based on gene content and characteristics (e.g., intolerance to loss-of-function) to test association with phenotypes such as cognitive abilities and autism^[60]24–[61]27. In this work, we investigated the relationship between CNVs and SA by pooling data from three community-based cohorts (IMAGEN, Saguenay Youth Study, and the UK biobank). By aggregating - across the genome - all CNVs (≥ 50 kilobases) based on genes’ intolerance to loss-of-function, we find that genome-wide deletions and duplications are associated with smaller SA. By utilizing virtual ontogeny and expression Quantitative Trait Loci (eQTLs) from the fetal cortex, we then test the explanatory power of the radial unit hypothesis vis-à-vis the development of the human cerebral cortex. Lastly, we apply similar approaches to gain insights into the developmental mechanisms underlying the effects of three specific CNVs related to neurodevelopmental disorders, namely 1q21.1, 16p11.2, and 22q11.2, on cortical expansion. Our findings suggest that CNVs influence SA by interrupting the expression of genes involved in neural proliferation in the fetal period. Results CNVs and surface area Across the genome, all CNVs that were ≥ 50 kilobases and fully encompassed at least one protein-coding gene were included in this study (“Methods”). To estimate the association between genome-wide CNVs and cortical SA, we conducted mega-analyses using data from three community-based cohorts: the IMAGEN study, Saguenay Youth Study (SYS), and the UK Biobank (UKBB), including a total of 39,015 individuals with both magnetic resonance imaging (MRI) of their brains and genome-wide assessment of CNVs (Fig. [62]1A). In this dataset, we identified 2767 unique protein-coding CNVs, of which 93% were present in fewer than five individuals (Supplementary Data [63]1). Out of the 39,015 individuals, 2733 (1423 females) and 3859 (2011 females) individuals carried, respectively, at least one deletion or one duplication fully encompassing one or more protein-coding genes. Thus, there were 6309 CNV carriers and 32,706 CNV non-carriers. Across the genome, 5822 genes were either deleted (2084 genes) and/or duplicated (4836 genes) at least once (Fig. [64]1A and Supplementary Data [65]2). Fig. 1. Genome-wide CNVs and cortical expansion. [66]Fig. 1 [67]Open in a new tab A Demographic of each cohort and the number of unique protein-coding genes being deleted and duplicated in the pooled dataset. B Workflow of the virtual ontogeny. We first estimated the effect sizes of genome-wide CNVs on cortical SA of 11 regions using data from community-based cohorts. Subsequently, “virtual ontogeny” analyses were conducted to investigate the underlying neurodevelopmental mechanisms by testing the relationships between the CNV-SA effect size profiles and the expression of cell-specific genes in the mid-gestational fetal cortex. C Estimated effect sizes of the burden of CNVs deletion/duplication on normalized SA (Z-score) in community-based cohorts (N = 39,015). We used the sum of 1/LOEUF scores of genes within CNVs for each individual to quantify the burden of CNVs. The centered dots represent the estimated effect size for each region. The error bars represent the standard error of the estimated effect size for each region. D Results of the virtual ontogeny. The dashed vertical line represents the mean biweight midcorrelation within each cell type. The gray box around zero represents 95% confidence intervals of the null distribution, which was derived by a permutation approach. The asterisk denotes p < 0.05 (FDR). DFC, dorsal frontal cortex; VFC, ventral frontal cortex; OFC, orbitofrontal cortex; MFC, medial frontal cortex; ITC, inferior temporal cortex; STC, superior temporal cortex; IPC, inferior parietal cortex; M1C, primary motor cortex; S1C, primary somatosensory cortex; A1C, primary auditory cortex; V1C, primary visual cortex. SYS Saguenay Youth Study, UKBB United Kingdom BioBank. Given the rarity of most CNVs with protein-coding genes, testing their individual effects on SA was not feasible with the currently available sample size. Instead, recent studies have aggregated rare CNVs based on general characteristics of involved genes, such as intolerance to loss-of-function, to explore the association of CNV burden with cognitive abilities and autism risk^[68]24–[69]27. Here, we used a well-established measure of genes’ intolerance to loss-of-function, namely, the inverse loss-of-function observed/expected upper-bound fraction (1/LOEUF), to estimate the total CNV burden for each individual^[70]28. Thus, a higher burden score indicates more deletions or duplications of “intolerant” genes in an individual (Supplementary Fig. [71]1). We then quantified the effect sizes of the CNV burden on normalized SA (Z-scored across participants) of 11 cortical regions using linear mixed-effect models. These 11 regions of interest were selected based on the availability of gene-expression data obtained during the prenatal period, which were used in subsequent analysis (Supplementary Fig. [72]2). In addition, for each CNV with 20 or more carriers (35 CNVs), we provide the effect sizes on SA in Supplementary Fig. [73]3. We note that these recurrent CNVs have small effects on phenotypes, likely because they do not include genes that are intolerant to loss-of-function^[74]25,[75]26,[76]29. Estimating such small effects reliably would require substantially larger sample sizes. We found that both CNV deletions and duplications were associated with a smaller SA across the 11 regions (Fig. [77]1C). On average, across the 11 regions, a one-point increase in the burden score for deletions and duplications was associated, respectively, with SA decreases of 0.0065 SD (p = 1.75E-05) and 0.0022 SD (p = 5.81E-04; Supplementary Data [78]3). Similar effect sizes were observed when utilizing the total number of genes deleted or duplicated in an individual as a simple measure of the CNV burden (Supplementary Fig. [79]4). The greater detrimental effects of deletions compared with duplications on SA aligns with previous research on cognitive abilities, where the effect of deletions is three times that of duplications^[80]24. Such difference may be attributed to larger changes in gene expression for CNV deletions compared with duplications^[81]18. Virtual ontogeny for CNV-SA relationships To investigate the neurodevelopmental mechanisms underlying the CNV-related variations in SA, we used a previously established method named “virtual ontogeny” (Fig. [82]1B). This method provides a cellular-level interpretation of the MRI-based phenotype by testing spatial correlations between inter-regional profiles of CNV effects on SA and cell-specific gene expressions in mid-gestation across the 11 cortical regions^[83]3. More generally, similar approaches that test spatial correlations between MRI matrices and molecular maps across the cerebral cortex are used increasingly to explore biological underpinnings of various MRI-based findings^[84]30. In virtual ontogeny, based on a single-cell RNA-sequencing study^[85]31, we selected 200 gene markers for each of 8 cell types present in the fetal cerebral cortex (6–22 gestational weeks): radial glia, intermediate progenitor cells (IPC), microglia, oligodendrocyte progenitor cells (OPC), excitatory and inhibitory neurons, mural cells and endothelial cells. The expression of these cell-specific genes in each of the 11 regions during mid-gestation (12–22 postconceptional weeks) was obtained from the PsychENCODE bulk RNA-sequencing dataset^[86]32. We then tested the spatial correlation between interregional profiles of CNV-SA effect size and the expression profile of genes for each of the eight cell types. To simplify the interpretation of virtual ontogeny, we used absolute values of the effect sizes such that higher values indicate larger CNV-related variations in SA. The virtual ontogeny revealed that cortical regions with higher expression of genes specific to radial glia, IPC, and microglia during mid-gestation, including OFC, MFC, and V1C, showed stronger effects of genomic deletions on SA. On the other hand, regions with higher expression of genes specific to the majority of neuronal and glial cell types showed weaker effects of CNV deletions (Fig. [87]1D, Supplementary Fig. [88]5, and Supplementary Data [89]4). For duplications, the CNV-SA effect size profile showed no spatial correlations with progenitor cells (radial glia, IPCs), and opposite correlations with the differentiated cell types, as compared with deletions. These results were validated by four sensitivity analyses: (1) 100 and 300 gene markers specific for each of the eight cell types, (2) bootstrapped estimation for the distribution of the mean correlation, (3) gene-set enrichment analysis (Supplementary Fig. [90]6 and Supplementary Datas [91]4–[92]6), and (4) gene markers specific to 16 cell types derived from another single-cell transcriptomic dataset for mid-gestational cerebral cortex^[93]33 (Supplementary Fig. [94]7).” Proximal and distal genes in network of progenitor genes To investigate which of the deleted/duplicated genes contribute to the association between genome-wide CNV burden and SA, we were inspired by the omnigenic model, which proposes that a large number of genes expressed in trait-relevant tissues contribute to the heritability of the trait through regulatory networks of core genes^[95]34. Accordingly, we hypothesized that genes that are highly co-expressed with neural progenitor genes (core) in the gene network have a stronger influence on cortical expansion than those distal to this core set of genes. To test this hypothesis, we divided all deleted and duplicated genes in our dataset into two groups based on their levels of spatial coexpression with neural progenitor genes (radial glia and IPC genes) across the 11 regions in the fetal cerebral cortex. Genes with high coexpression were labeled “proximal”, while those with low coexpression were labeled “distal” (Fig. [96]2A). To ensure a balanced sample size for CNV carriers with proximal/distal genes affected, we used the median coexpression value as a threshold (Supplementary Fig. [97]9, Supplementary Data [98]2 and Supplementary Fig. [99]11 for results obtained with a sliding-window approach). We then tested the relationship between the number of deleted/duplicated proximal and distal genes and SA of the 11 regions, as well as the total SA, while controlling for the effects of the other group of genes and genes without expression data. Fig. 2. Effects of genes on cortical surface area in relation to coexpression with neural progenitor genes. [100]Fig. 2 [101]Open in a new tab A Illustration of defining proximal and distal genes. The proximal/distal categorization was based on coexpression levels with neural progenitor genes across the 11 cortical regions. “Proximal” denotes genes with high coexpression, while “distal” refers to genes with low coexpression. B The effect of proximal/distal gene deletions on SA (N = 39,015). Deletions of proximal genes, but not distal genes, correlated with smaller SA. The centered symbols represent the estimated effect size for each region. Error bars represent the standard error of the estimated effect sizes for each region. C Physical protein-protein interactions among the proximal genes. Node size corresponds to the number of interacting partners within the network. The names of genes with more than 19 interactions are shown. D Calculation of eQTL score for each proximal gene. For participants from the UKBB, we derived the eQTL score of a gene based on the gene’s eQTLs identified in the fetal cerebral cortex. E 54 proximal genes showed nominally significant correlations between their eQTL scores and SA. F Stronger relationships between eQTL scores and SA were found in the multimodal compared with the unimodal cortical regions. G The expression of neural progenitor genes in the multimodal and the unimodal cortices. There are 200 genes for radial glia and IPC, respectively. Centerline, median; box limits, upper and lower quartiles; whiskers, 1.5x interquartile range; points, outliers. We found that, on average, the deletion of one proximal gene was associated with a 0.0512 SD reduction in total SA (equal to about 8 cm^2 in our dataset, p = 1.34E-03), and a mean 0.0385 SD reduction across the 11 regions (p = 3.42E-06, Fig. [102]2B). Deletions of the proximal genes had larger effects in “multimodal” regions, including DFC, VFC, OFC, MFC, ITC, STC and IPC which subserve complex perceptual and cognitive processes, than in “unimodal” regions (including M1C, S1C, A1C, V1C; Fig. [103]2B and Supplementary Data [104]7). Deletions of distal genes had no effect on SA (total SA: beta = 0.0027, p = 0.8695; across 11 regions: mean beta = 0.0027, p = 0.593; distal vs. proximal genes: p = 3.5E-04. Supplementary Data [105]7). An exception to this pattern was observed for V1C, where lower SA was associated with deletions of distal genes (beta = − 0.041, p = 0.044) but not of proximal genes (beta = − 0.025, p = 0.201). Note that V1C was the only cortical region with a significant effect of distal genes on SA (Fig. [106]2B). Among the 579 deleted proximal genes, 448 had at least one physical protein-protein interaction with other proximal genes; the most interacting genes were involved in DNA replication and cell cycle, such as H4C1 and CENPA (Fig. [107]2C). Pathway enrichment analysis based on these interacting proximal genes identified a number of biological processes, with the top pathways involving DNA replication and repair (q value = 5.68E-06), metabolism of co-factors and vitamins (q value = 9.39E-06), and cell growth and death (q value = 1.12E-05, Supplementary Dataset [108]8). No pathway enrichment was found for the 479 interacting distal genes (Supplementary Fig. [109]10), suggesting a heterogeneous nature of this group of genes. We found no duplication-related effects on SA for either proximal or distal genes (Proximal: total SA: beta = − 0.0042, p = 0.6406; across the 11 regions: mean beta = − 0.0049, p = 0.074. Distal: total SA: beta = − 0.0046, p = 0.5799; across the 11 regions: mean beta = − 0.0004, p = 0.778; vs. proximal genes: p = 0.321. Figure [110]2B and Supplementary Data [111]7). This is not surprising given that our virtual ontogeny analysis indicated no spatial correlation between the duplication-related variations in SA and expression of genes specific to proliferative cell types. Expression quantitative trait loci of genes and surface area In CNV studies, elucidating the impact of individual genes within multigenic variants poses a challenge. Next, we aimed to answer two questions: (1) whether the prenatal expression of individual genes belonging to the above proximal group predicts cortical SA in the adult brain; (2) as motivated by the above results (Fig. [112]2B), whether the expression of these genes has stronger effects on the multimodal regions compared with the unimodal regions. To do so, in 30,083 CNV non-carriers from the UKBB, we computed an eQTL score as a proxy for each proximal gene’s expression in the fetal cerebral cortex (Fig. [113]2D). We then assessed the correlation between each gene’s eQTL score and SA across the 11 regions using a mixed-effect model. Among 543 proximal genes with eQTL scores, 54 showed nominally significant correlations between their eQTL scores and SA; 4/54 genes (HADHB, PRRT2, TMEM44, CORO1A) survived FDR correction (Fig. [114]2E and Supplementary Data [115]9). In a replication sample of 16,212 individuals from UKBB, two of the four genes showed significant correlation, in the same direction, between their eQTL scores and SA (HADHB: beta = 0.012, p = 0.007; CORO1A: beta = 0.01, p = 0.022; PRRT2: beta = 0.006, p = 0.154; TMEM44: beta = 0.003, p = 0.508). In addition, the effect sizes (betas) of the 54 genes in the replication dataset were correlated with those in the discovery dataset (r = 0.454, p = 5.64E-04; Supplementary Fig. [116]13). To answer the second question, we compared the effect sizes of the relationships between the 54 genes’ eQTL scores and SA of the multimodal and of the unimodal regions. The correlations between eQTL scores and SA were stronger in multimodal (vs. unimodal) regions (Fig. [117]2F. Paired t test: t = − 2.23, df = 53, p = 0.030). Furthermore, to explore the underlying reasons for such unimodal-multimodal differences, we compared the expression of neural progenitor genes in the multimodal and the unimodal regions during mid-gestation. During this developmental period, the multimodal regions showed higher expression of genes specific to radial glia and IPC compared with the unimodal regions (Fig. [118]2G. radial g p = 2.90E-20; IPC: p = 3.26E-05). But note that we did not replicate the difference in the eQTL-SA relationships between unimodal and multimodal regions in the replication sample (t = 0.367, p = 0.715). Specific CNVs and surface area Previous research has reported large effects on SA of specific CNVs, including 1q21.1, 16p11.2, and 22q11.2, known to increase the risk of neurodevelopmental disorders^[119]21. Yet the underlying mechanisms and contributing genes of these multigenic variants remain elusive. Next, we investigated neurodevelopmental mechanisms underlying these CNV-related variations in SA using the same methodology as for genome-wide CNVs. First, we estimated the effect size of specific CNVs, namely 1q21.1, 16p11.2, 22q11.2, on SA by comparing normalized SA of the same 11 regions between CNV carriers (deletion or duplication) and non-carriers from clinical and community cohorts (Supplementary Data [120]10). Compared with the non-carriers, deletions of 1q21.1 and 22q11.2 had smaller SA across the 11 regions (1q21.1: − 1.04 SD, p = 5.23E-07; 22q11.2: − 0.57 SD, p = 1.40E-03), while duplications of 1q21.1 and 22q11.2 showed larger SA (1q21.1: 0.45 SD, p = 5.14E-05; 22q11.2: 0.18 SD, p = 8.26E-03. Figure [121]3A and Supplementary Data [122]3). In contrast, for 16p11.2, deletions displayed larger SA (vs. non-carriers: 0.39 SD, p = 2.59E-03), while duplications showed smaller SA (− 0.54 SD, p = 8.77E-05). Fig. 3. Neurodevelopmental CNVs and cortical expansion. [123]Fig. 3 [124]Open in a new tab (A) Case-control differences in normalized SA between CNV deletion/duplication at 1q21.1 (N deletion/duplication = 36/30), 16p11.2 (N deletion/duplication = 82/75), and 22q11.2 (N deletion/duplication = 69/19) and the control (CNV noncarriers, N = 347). The error bars represent the standard error of estimated group differences for each region. B Virtual ontogeny through the lens of the correlation between CNV effects on SA and expression of cell-specific genes. The asterisk denotes a significant spatial correlation between the group differences in SA and the expression of cell-specific genes (p < 0.05 after FDR). C Coexpression scores between genes within the three CNVs and genes specific to radial glia and IPC. Coexpression scores represent the mean spatial correlations between prenatal expression of a CNV gene and 200 cell-specific genes and thus, range from − 1 to 1. The color and size of squares indicate the magnitude of coexpression. D Effect size of correlations between CNV genes’ eQTL score and SA of the 11 regions. Second, using virtual ontogeny, we found that deletions of 1q21.1 and 22q11.2 and duplications of 16p11.2 and 22q11.2 showed results similar to genome-wide CNV deletion: regions with higher expression of progenitor genes (radial glia and IPC) in midgestation showed stronger effects of CNVs on SA (Fig. [125]3B, Supplementary Fig. [126]6, Supplementary Fig. [127]8 and Supplementary Datas [128]4–[129]6). These findings suggest that the proliferation and differentiation of neural progenitor cells may be a common cellular mechanism underlying the CNV-SA relationships. Third, we reasoned that genes coexpressed with the radial glia and IPC genes in the fetal cortex could be the main contributors to the effect of 1q21.1, 16p11.2, and 22q11.2 CNVs on the cortical expansion. Among the 65 genes (out of the 88 genes within the three genomic loci) with available prenatal expression data, 50 genes were spatially coexpressed with both radial glia and IPC genes across the 11 regions during the midgestation (6/7 genes in 1q21.1, 19/24 genes in 16p.11.2, and 25/34 genes in 22q11.2. Fig. [130]3C). We subsequently tested whether the expression of these 50 genes in the fetal cortex predicts SA of the adult brain by testing the correlation between each gene’s eQTL score and SA of the 11 regions in 30,083 CNV non-carriers from the UKBB. Among the 47 genes with eQTL scores, 6 showed significant correlations between their eQTL scores and the SA (YPEL3, PRRT2, MVP, SCARF2, SNAP29, THAP7. Figure [131]3D and Supplementary Data [132]11). Discussion Leveraging current knowledge of developmental processes underlying cortical development, this genome-wide association study of rare CNVs provides novel insights into the genetic architecture of SA of the human cerebral cortex. We demonstrated that, in general populations, individuals with a higher number of deleted protein-coding genes have a smaller cortical SA. This effect was stronger (1) in cortical regions with higher mid-gestational expression of neural progenitor genes, and (2) for genes that were strongly co-expressed with neural progenitor genes. These findings were extended to common variants affecting transcription, showing that eQTLs of some of these co-expressed genes were associated with SA. Enrichment analyses of these co-expressed genes pointed to diverse pathways involved in cell proliferation, growth, and death, such as DNA replication/repair and metabolism of co-factors and vitamins. Altogether, our findings suggest a model wherein CNVs interfere with the tangential expansion of the human cerebral cortex by altering the expression of genes involved in neural proliferation in the fetal period. Our findings align well with the radial unit hypothesis, which was formulated based on experimental work on non-human primates^[133]12. In the case of the human cerebral cortex, we showed that cortical regions with greater expression of proliferative cells (radial glia and IPCs) in the fetal cerebral cortex were more affected by CNVs. We also revealed that deletions of genes that had strong (rather than weak) coexpression with neural progenitor genes were associated with smaller SA. These effects were stronger in multimodal vs. unimodal cortical regions. The latter finding was corroborated by utilizing eQTLs in a large sample of CNV non-carriers, showing that the associations between eQTLs and SA were again stronger in the multimodal compared with unimodal regions. In addition, as revealed by the enrichment analysis of the proximal genes within a protein-protein interactions network, these genes appear to be involved in crucial cell proliferation pathways, including regulation of the cell cycle, metabolism of co-factors, and vitamins involved in DNA synthesis and repairment. For example, the E2F-enabled inhibition of the pre-replication complex formation pathway is involved in regulating the initiation of DNA synthesis in the early cell cycle (G1/S-phase)^[134]35. The formyltetrahydrofolate biosynthesis pathway is crucial for the synthesis of tetrahydrofolic acid which acts as a cofactor essential for the synthesis or anabolism of amino acids and nucleic acids^[135]36. The Proliferating Cell Nuclear Antigen (PCNA)-Dependent Long Patch Base Excision Repair pathway is a key mechanism for removing damage-containing strands of DNA^[136]37.These results are consistent with previous reports indicating that common genetic variants linked to adult SA were enriched in regulatory activities in neural progenitor cells during fetal development^[137]15,[138]17. According to the radial unit hypothesis and related experimental work, the cortical surface area of a given region depends on the size of the neural progenitor population^[139]13. The self-renewing and amplification of radial glia and IPCs are critical for determining the size of the neuronal population: two radial glial cells, which can self-renew and produce self-amplifying IPCs, could generate more than 80 neurons following 8 rounds of cellular cycles^[140]11. Thus, subtle deviations or variations in the molecular processes underlying the division of progenitor cells may have a profound impact on the tangential expansion of the cerebral cortex, especially in the multimodal regions, which have high expression of neural progenitor genes during midgestation and prolonged developmental trajectory^[141]38. Furthermore, our findings suggest that genes not specific to neural progenitor cells may also influence cortical expansion, likely via intricate interactions within various biological networks. For example, most CNVs’ effects on SA had associations with genes specific to endothelial and mural cells, components of developing cortical blood vessels, which are key regulators of neuronal proliferation^[142]39. In addition, microglia, an integral component of cortical proliferative zones, have been found to regulate the pool size of neural progenitor cells in the fetal cortex: deactivating or eliminating microglia in the fetal cerebral cortex increased the number of neural precursor cells^[143]40,[144]41. In addition to the specific pathways described above, genes proximal to the neural progenitor genes were associated with broad biological pathways involving, for example, macromolecule metabolism and transport. The genes identified as key contributors to the effects of the three specific CNVs on SA are also involved in diverse biological processes with potential direct or indirect influences on neurogenesis, such as the regulation of DNA transcription (THAP7) cellular senescence (YPEL3), and cell adhesion (SCARF2). The above observations align with the concept of omnigenic model, which suggests that complex traits are largely influenced by genes without direct links to the trait^[145]34. In other words, since the normal tangential growth of the cerebral cortex requires coordination among various cellular processes implemented by other systems, genetic variants of non-neuronal processes can have downstream effects on cortical expansion. For instance, perturbations in vascular development can lead to abnormal processes during cortical expansion, including premature neuronal differentiation and mispositioning growth of neural progenitor cells^[146]39,[147]42–[148]44. It is important to note that the generalizability and interpretation of these findings are subject to certain limitations. Our study has focused on 11 cortical regions with available gene-expression data from the prenatal period. Although this limits the generalizability of the findings throughout the cerebral cortex, note that the 11 regions include a number of unimodal and multi-modal regions in all four cerebral lobes. Furthermore, we also included results with regard to the total surface area. Our main analyses were conducted using pooled data from adults (UKBB and SYS parents) and adolescents (IMAGEN, SYS). While the majority of the data were derived from adults, we believe that age at MRI has, at most, a minor influence on generalizing our findings. This is because the most significant cortical expansion occurs during the prenatal and early postnatal periods, reaching ~ 83% of adult size by 2 years of age^[149]45,[150]46. Moreover, the eQTL scores used in our analyses were calculated based on the effects of genotypes on gene expression during fetal development. Lastly, the current study focused on testing the explanatory power of the radial unit hypothesis in the context of the CNVs, leveraging transcriptomic data from the mid-gestational fetal cortex. We acknowledge that other biological process, such as dendritic arborization and myelination during the first two years of life, may also contribute to the CNV-surface area relationship. Taken together, this genome-wide association study of rare CNVs revealed a group of genes that can influence cortical expansion, providing mechanistic insights into the early development of the human cerebral cortex, and the potential impact of rare gene deletions/duplications on this process. Our findings highlight that deficits in neural progenitor proliferation could serve as a parsimonious theoretical framework to unify hypotheses concerning the molecular mechanisms underlying CNV-related brain abnormalities. We believe that this knowledge advances our understanding of how rare genetic variants may affect cognition and mental health. Methods Participants We included data from three community-based cohorts: the IMAGEN study (n = 1661), which recruited 2090 adolescents in four European countries; the Saguenay Youth Study (SYS, n = 1493), which included both children and parents from 486 families in Quebec, Canada, and the UK Biobank (UKBB, n = 35,861). Carriers of the three specific CNVs, namely 1q21.1, 16p11.2, and 22q11.2, were ascertained in several clinical cohorts and UKBB (Supplementary Data [151]10). These CNVs were selected because they are (1) the most frequently identified CNVs in the clinic and reported in our clinical cohorts, (2) known for their risk of neurodevelopmental disorders and detrimental effects on cognition, (3) have large effects on surface area^[152]21, and (4) relatively prevalent in the population^[153]19,[154]22. As described in previous study^[155]21,[156]47, carriers of these specific CNVs were selected based on following breakpoints according to the reference genome GRCh37/hg19: 16p11.2 proximal (BP4-5, 29.6-30.2MB), 1q21.1 distal (Class I, 146.4-147.5MB & II, 145.3-147.5MB), 22q11.2 proximal (BPA-D, 18.8-21.7MB). CNV carriers from clinical cohorts were referred to the genetic clinic either because of the presence of symptoms or because they were relatives of these symptomatic individuals. Control individuals from clinical cohorts do not carry any CNV at these genomic loci. Control individuals from UKBB were non-carriers matched for age and sex with CNV carriers from UKBB. Each cohort received ethical approvals from Local Research Ethics Committees. CNV detection and annotation in community-based cohorts For the three community-based cohorts, we called CNVs using PennCNV^[157]48 and QuantiSNP^[158]49 with the same parameters as previously published: number of consecutive probes for CNV detection ≥ 3, CNV size ≥ 1 Kb, and confidence scores ≥ 15^[159]25. To minimize false discoveries, we merged CNVs detected by PennCNV and QuantiSNP with CNVision. We then applied a CNV inheritance analysis algorithm to concatenate adjacent CNVs of the same type, separated with a gap lower than 150 kb. Subsequently, we selected CNVs for analyses based on the following criteria: confidence score ≥ 30 with at least one detection algorithm, size ≥ 50 kb, and < 50% overlap with segmental duplication. A detailed description of CNV calling is available in the [160]Supplementary Methods. Note that the entire genome was searched without any pre-selection criteria beyond the three parameters reported above. No CNV regions were pre-selected based on gene characteristics, such as known neurodevelopmental effects or clinical significance. Detected CNVs were then annotated using Gencode V19 (hg19) with [161]ENSEMBL. We only considered genes with all transcripts (UTRs, start and stop codons, exons, and introns) fully encompassed within a CNV. Our dataset included a total of 5822 genes that were deleted or duplicated at least once. These genes were then annotated by the inverse loss-of-function observed/expected upper-bound fraction (1/LOEUF) using gnomAD, version 2.1.1.^[162]28. The 1/LOEUF score is available for 5525 genes and ranges from 0.5 (tolerant to loss-of-function variants) to 27.0 (intolerant to loss-of-function variants). We calculated a CNV burden score by summing up 1/LOEUF scores of all genes encompassed in the deleted or duplicated CNVs for each individual. Individuals without CNVs or with no genes encompassed in any CNVs were assigned a score of 0. Sensitivity analysis with the number of genes deleted/duplicated as a CNV burden score was done (Supplementary Fig. [163]4). Regional surface area from magnetic resonance imaging data All T1-weighted (T1w) MRI data (1.0-1.1 mm isotropic resolution) were processed with FreeSurfer. A detailed description of MRI acquisition and processing is available in [164]Supplementary Methods and Supplementary Data [165]12. Briefly, T1w scans were processed through an automated cortical reconstruction pipeline in FreeSurfer (‘recon-all’). Subsequently, the cortical reconstruction outputs were used to extract SA based on a parcellation scheme using FreeSurfer. We then derived SA for the 11 cortical regions that intersect with prenatal tissue sampling from the PsychENCODE Consortium. Further details are available in the [166]Supplementary Methods. Estimating effects of CNVs on regional surface area To estimate the effects of genome-wide CNV burden on SA in the general population, we pooled the IMAGEN, SYS, and UKBB data together and performed linear mixed-effect models with Restricted Maximum Likelihood. In the model, the CNV burden score was the independent variable of interest, and the SA (Z-score) of each of the 11 cortical regions was the dependent variable. Sex, age, scan site, and first 10 genetic principal components were included as fixed effects, and the family ID was the random effect to account for potential confounders. The effect size (beta) estimation with the model was done for deletion and duplication separately. Thus, we obtained one effect size (beta) profile (across 11 regions) for genome-wide CNV deletion and duplication, respectively. Subsequently, we tested the significance of the effect size (beta) profiles across the 11 regions using a permutation approach. Specifically, the permutation test included the following steps: (1) Permuting the SA of the 11 regions across individuals while preserving the inter-regional associations; (2) Calculating betas between the resampled SA and the CNV burden score to obtain a simulated effect-size profile using corresponding models; (3) Repeating steps 1 and 2 10,000 times and then averaging the simulated profiles to derive a null profile; (4) Using paired t test to test if the observed effect-size profile is different from the null profile. Virtual ontogeny To investigate developmental mechanisms underlying the CNV-related variations on cortical expansion at the cellular level, we tested the spatial relationships between the profile of CNV-SA effect sizes and expression of cell-specific genes in the fetal cortex, as introduced in our previous work^[167]3. We first selected 200 genes with highest specificity for each of 8 cell types in the human cerebral cortex during the first and second trimesters based on single-cell RNA-sequencing data (5 donors, 6–22 gestational weeks)^[168]31. The specificity of a gene for a cell type was defined by the proportion of the gene’s expression in that cell type relative to the total expression across all cell types^[169]50. Sensitivity analyses with the top 100 and 300 cell-specific genes, genes specific to 16 mid-gestational cortical cell types from another single-cell transcriptomic dataset^[170]33 were also conducted (Supplementary Data [171]4 and [172]Supplementary Methods). Prenatal expression levels of these cell-specific genes across 11 regions were obtained from the PsychENCODE bulk RNA sequencing dataset (14 donors, 10–22 post-conception weeks; [173]http://development.psychencode.org/). We filtered out genes with low expression (defined as less than 0.5 RPKM mean expression) and adjusted for donor-specific covariates such as sex, hemisphere, RNA integrity number, ethnicity, and sequencing site. Then the median gene expression per cortical region across all 14 donors was used for the subsequent analysis. A detailed description of the processing of single-cell and bulk RNA sequencing data is available in the [174]Supplementary Methods and previous publications^[175]3,[176]32,[177]50. We then tested the spatial correlation between the inter-regional expression profiles of genes specific to a cell type and the profile of CNV-SA effects using a permutation test. Specifically, for a given cell type, we calculated the bicor between the cell-specific genes and CNV-SA effects across the 11 regions. Bicor is median-based rather than mean-based and, thus, is less sensitive to outliers than other correlation methods. The mean of the resulting 200 correlation coefficients was then tested for significance using a permutation approach. The null hypothesis postulates that the cell-specific genes have the same correlations with the CNV-SA effect profile as any random set of genes. Thus, in the permutation test, we randomly selected 200 genes not specific to the cell type and calculated the mean of correlations between these genes’ expression and the CNV-SA effect profile. This was repeated 10,000 times to generate a null distribution. The observed mean correlation coefficient for the cell type was compared against the null distribution using a two-sided test with FDR for the 8 cell types ([178]Supplementary Methods). In addition, we also performed two different sensitivity analyses: (1) bootstrapped estimation for the distribution of the mean correlation and (2) gene-set enrichment analysis. The bootstrap analysis aimed to obtain 95% confidence intervals for the mean correlation between the profile CNV-SA effects and the expression of cell-specific genes and tested if the distribution differed significantly from zero. For each cell type, we bootstrapped the 200 cell-specific genes as well as the 14 donors to account for the uncertainty related to the limited sample of donors. The gene-set enrichment analysis was done with R package “clusterprofile”^[179]51 to test if the cell-specific genes have significant correlations with the CNV-SA effect profile than the other genes available in the bulk RNA dataset. Further details of these sensitivity analyses can be found in the [180]Supplementary Methods. Proximal and distal genes in network of progenitor genes We used prenatal expression data from PsychENCODE to estimate spatial coexpression (across the 11 cortical regions) between each of the deleted/duplicated genes within the genome-wide CNVs and genes specific to radial glia and IPC in the fetal cortical cortex. There were 3646 out of 5822 CNV genes with prenatal expression data. For each gene, we calculated the mean bicor between the expression profile of the gene and the expression profiles of 200 genes specific for radial glia and IPC, respectively. The coexpression with the neural progenitor cells was defined as the absolute value of the average coexpression between the gene and genes specific to radial glia and IPC. Thus, in this unsigned weighted network, a strong connection/link means that a gene has strong coexpression, either negative or positive, with neural progenitor cell genes in the midgestational fetal cortex. Note that genes showing strong coexpression, either negative or positive, had a larger effect size on SA compared with genes with weak coexpression (Supplementary Fig. [181]12). We divided the 3646 genes into 2 groups based on the median of their coexpression values (= 0.193). We termed the genes with higher and lower coexpression as “proximal” and “distal” genes, respectively. Then for each participant from the community cohorts, we obtained the counts of deleted or duplicated genes belonging to the proximal, distal, and other genes (genes without expression data). There were 843 and 1239 participants with at least one proximal and one distal gene deleted, respectively. A total of 5192 and 5112 participants had at least one proximal and one distal gene duplicated, respectively. We used linear models to estimate the effect size of proximal and distal genes on the SA of the 11 regions and the total SA of the whole cerebral cortex (Z-scored). Since many CNVs encompass more than one gene, we included the counts of proximal and distal genes as the independent variable of interest and the count of other genes as one of the confounders (sex, age, scan site, first 10 genetic principal components). This was done for deletion and duplication separately. The significance of the effect size (beta) profiles across the 11 regions for proximal and distal genes and differences between the two profiles was tested using t tests. In addition to the above dichotomous approach, we also used a sliding window approach to obtain a nuanced picture of the relationship between genes’ proximal/distal position to neural progenitors and their effects on cortical expansion. We first divided the 3646 genes deleted and/or duplicated in our dataset into 20 subgroups based on percentiles of their coexpression with neural progenitor genes. We then included the first four subgroups as the first “window” and moved the window forward by a step size of one subgroup. This resulted in 17 windows in total, with Window 1 having the lowest coexpression (mean = 0.034) and Window 17 having the highest coexpression (mean = 0.428) with the neural progenitor genes. On average, there were 241 and 611 unique genes within a window deleted and duplicated at least once, respectively (Supplementary Fig. [182]9). We used this division approach to ensure that each window contained an equal number of genes, thus, making the number of participants with genes within a window deleted/duplicated comparable across windows, and having enough power to estimate effects of genes on SA. For a given window, we obtained the counts of deleted or duplicated genes belonging to the window, and outside the window for each participant from the community cohorts. We then estimated the effect of genes within the window on SA (11 regions and whole cerebral cortex), while controlling the effects of genes outside the window. On average, there were 479 and 1314 participants with at least one gene within a window deleted and duplicated, respectively (Supplementary Fig. [183]9). Protein-protein interaction and pathway enrichment analysis Physical protein-protein interactions for the deleted proximal and distal genes were obtained from the Integrated Interactions Database (IID) version 2021-05 ([184]https://iid.ophid.utoronto.ca/)^[185]52. The submitted gene lists (i.e., 579 deleted proximal and 633 deleted distal genes) were mapped to proteins, and their protein-protein interactions were derived using the default settings of IID. Individual protein interaction networks were then loaded into NAViGaTOR 48 ver. 3.0.17 for further annotation, analysis, and visualization^[186]53. There were 448 proximal and 479 distal genes that had at least one physical protein-protein interaction with other proximal and distal genes, respectively. Following, pathway enrichment analysis on these proximal/distal genes was conducted using pathDip, version 5.0.32.3^[187]54. Expression quantitative trait loci of genes and surface area To investigate which of the proximal genes level of expression in the fetal cortex was associated with the SA, we used an eQTL score constructed from eQTLs of each proximal gene as the proxy for the gene’s expression level in the fetal cerebral cortex. The largest eQTL study available for this purpose (201 mid-gestational human cortices) provides the summary statistics of eQTLs for 15,925 tested genes^[188]55. In this dataset, the 543 (out of 579) deleted proximal genes have at least one eQTL (nominal p-value < 0.01). Then, for each proximal gene, we calculated the eQTL score by summing up genotypes of each CNV non-carrier from UKBB, weighted by the corresponding genotype effect size (beta) derived from the eQTL study (p-value < 0.01). The calculation of eQTL scores was done using PRSice-2 with default clumping parameters (kb = 250, r^2 = 0.1). A higher eQTL score of a gene indicates a genetic predisposition for higher expression levels of the gene in the fetal cerebral cortex. We then used linear mixed models to estimate the effect of these eQTL scores on the SA of each of the 11 cortical regions in the UKBB participants, with sex and the first 10 genetic principal components as covariates and participants as a random effect. Note that the SA and eQTL scores were normalized (Z-score) across individuals before performing the analyses. There were 54 genes that showed nominal significant correlations between their eQTL scores and SA. Following, we used the same mixed effect modal to obtain the effect size (standardized beta) of correlations between the eQTL scores of these 54 genes and SA of unimodal and multimodal regions, respectively. Then paired t-test was used to compare the absolute value of effect sizes of eQTL-SA correlation between unimodal and multimodal regions. To compare the expression of neural progenitor genes in the multimodal and the unimodal regions during midgestation, expression levels of radial glia and IPC genes for all midgestation donors (n = 14) in PsychENCODE were unit-scaled following adjustment for sex, hemisphere, RIN, ethnicity, and sequencing site. Differences in expression between the clusters were evaluated using a linear mixed effects model adjusting for the age of donors (in log2(days)), with random effects for donor id and gene id. Mechanism of specific CNVs and surface area To estimate the effect size on SA of 1q21.1, 16p11.2, and 22q11.2 CNV deletions and duplications, we conducted a linear regression model for each CNV deletion and duplication, separately. In the model, the CNV status (1 for carrier, 0 for non-carrier) was the independent variable, and the normalized (Z-score) SA of each of the 11 cortical regions was the dependent variable. Age, sex, and data sources were included as covariates. Subsequently, we tested the significance of the effect size (beta) profiles across the 11 regions for each CNV using the same permutation approach as for the genome-wide CNVs with FDR across the six genetic variants (deletion/duplication x 3 CNVs). Virtual ontogeny analyses were conducted with the derived effect size profiles to neurodevelopmental mechanisms underlying these CNV-related variations in SA. To identify potential key genes contributing to the CNV-related variations in SA, we first used prenatal expression data from PsychENCODE to identify CNV genes that were spatially co-expressed with the radial glia and IPC genes in the fetal cortex. Note that there are 88 genes within the three clinical CNVs: 8 in 1q21.1, 34 in 16p11.2, and 46 in 22q11.2. Out of these 88 genes, there were 65 genes (7 for 1q21.1, 24 for 16p11.2, and 34 for 22q11.2) with prenatal expression data in the 11 regions. For each of the 65 CNV genes, we calculated the bicor between the expression profile of the CNV gene and the expression profiles of 200 genes specific for radial glia and IPC, respectively. Then the mean of the resulting 200 bicor was tested for significance using the similar permutation approach as described above (“Virtual Ontogeny”). Thus, the mean of correlations between the expression of 200 randomly selected genes and the CNV gene’s expression profile was calculated. This was repeated 5000 times to generate a null distribution. The observed mean correlation coefficient for the cell type (radial glia or IPC) was compared against the null distribution using a two-sided test with FDR for the 65 CNV genes. There were 50 genes spatially coexpressed with both radial-glia and IPC genes across the 11 regions during the midgestation. Following, we tested the correlations between the eQTL scores of 47 out of these 50 genes (3 genes had no eQTL scores) and the SA of 11 regions with FDR correction. Reporting summary Further information on research design is available in the [189]Nature Portfolio Reporting Summary linked to this article. Supplementary information [190]Supplementary Information^ (22.3MB, docx) [191]41467_2025_56855_MOESM2_ESM.pdf^ (9.1KB, pdf) Description of Additional Supplementary Files [192]Supplementary Datasets 1-12^ (604.4KB, xlsx) [193]Reporting Summary^ (3MB, pdf) [194]Transparent Peer Review file^ (2.1MB, pdf) Source data [195]Source Data^ (976.7KB, xlsx) Acknowledgements