Abstract Cataloging the diverse cellular architecture of the primate brain is crucial for understanding cognition, behavior, and disease in humans. Here, we generated a brain-wide single-cell multimodal molecular atlas of the rhesus macaque brain. Together, we profiled 2.58 M transcriptomes and 1.59 M epigenomes from single nuclei sampled from 30 regions across the adult brain. Cell composition differed extensively across the brain, revealing cellular signatures of region-specific functions. We also identified 1.19 M candidate regulatory elements, many previously unidentified, allowing us to explore the landscape of cis-regulatory grammar and neurological disease risk in a cell type–specific manner. Altogether, this multi-omic atlas provides an open resource for investigating the evolution of the human brain and identifying novel targets for disease interventions. __________________________________________________________________ Single-cell transcriptomic and epigenomic atlas resolves cellular complexity and disease risk across the adult macaque brain. INTRODUCTION The cellular and molecular origins of complex human thought and behavior remain largely a mystery. Historically, proposed explanations have centered on the substantial relative size ([54]1–[55]3), prodigious number of cells ([56]4), or the large cortical surface area and thickness ([57]5) of the human brain. These explanations in isolation, however, fail to explain the many uniquely human faculties, nor do they explain the extreme variety and complexity of impairments that accompany human neurodevelopmental, neuropsychiatric, and neurodegenerative disorders ([58]6). The human brain is composed of myriad cell types, and this cellular heterogeneity presumably contributes to our cognitive and behavioral complexity ([59]7, [60]8). Supporting this hypothesis is the observation that the number of distinct cell types in the brain is positively correlated with behavioral complexity across vertebrates ([61]9). In recent decades, it has been proposed that certain aspects of human cognition are supported by specific cell types such as von Economo neurons ([62]10) and “mirror neurons” ([63]11), which have been hypothesized to support social intuition and empathy, respectively. These propositions, however, remain largely untested due to gaps in our understanding of the cellular landscape of the human brain and, crucially, differences in cell type composition and regional heterogeneity among the brains of humans, nonhuman primates, and other animals. In recent years, the application of rapidly developing single-cell technologies to the brain has begun to address these gaps. Single-cell molecular surveys of targeted regions of the mouse and human brain, for example, have revealed specialized species-specific cell types—e.g., rosehip neurons in humans ([64]12)—and regional biases in cell type distribution and function [e.g., ([65]13)]. Such atlases are yielding unprecedented cross-species insights into the cellular architecture supporting the structure and function of the brain ([66]14, [67]15), but the general paucity of comparative nonhuman primate brain atlases has left a conspicuous gap ([68]16). Moreover, much effort has focused on single molecular modalities (e.g., transcriptomics), typically in only one or a few regions, leaving a lacuna in our understanding of the molecular mechanisms underlying cell function across much of the primate brain. Here, we generated a 4.2 million cell (combined) transcriptomic and epigenomic atlas across the brain of the rhesus macaque (Macaca mulatta), the most widely used nonhuman primate model organism for studies of human perception, cognition, aging, and neurological disease ([69]17). These single-cell profiles derive from 30 distinct brain regions that collectively represent major cortical, subcortical, and cerebellar areas involved in sensory, cognitive, emotional, and motor functions. Many of these regions are also implicated in one or more clinically relevant neurological disorders. By integrating measures of gene expression and chromatin accessibility, we discover molecular signatures that define cell types across the macaque brain, characterize their distribution and molecular function across disparate anatomical regions, and nominate sets of cis-regulatory regions that likely contribute to mature cell fate and function across the brain. RESULTS A molecular taxonomy of cell types across the primate brain We generated single-nucleus RNA sequencing (snRNA-seq) data from 30 distinct regions across the cortex, subcortex, cerebellum, and brainstem (N = 5 animals; three females) using three-level single-cell combinatorial indexing RNA-seq (sci-RNA-seq3) ([70]Fig. 1A and table S1) ([71]18, [72]19). With the original sci-RNA-seq3 protocol ([73]20), we generated 1,008,204 single-nucleus transcriptomes from 110 age-, sex-, and hemisphere-matched samples representing 28 brain regions of 10 year-old (mid-adult aged) macaques (N = 3 animals; two females). Over the course of the study, we implemented improvements in nuclei isolation and preservation ([74]21), which increased nuclear transcriptome recovery by ~60% [median unique molecular indices (UMIs), before = 202, after = 320] and, consequently, the number of nuclei passing our UMI threshold. With the improved protocol, we generated an additional 1,702,081 single-nucleus transcriptomes from the right hemisphere of two animals, the vast majority (N = 1,579,908) of which were sampled from 27 brain regions of a single 10-year-old female macaque. Altogether, after applying quality control (QC) filters (Materials and Methods and figs. S1 and S2), we recovered transcriptome profiles for 2,583,967 nuclei (median UMI per cell = 265, median genes expressed per cell = 221; table S2). Fig. 1. Experimental setup and summary of the Macaque Brain Atlas snRNA-seq dataset. [75]Fig. 1. [76]Open in a new tab (A) Schematic of biopsied brain regions for sci-RNA-seq3 experiment. A full list of sampled regions is provided in table S1. arc, arcuate sulcus; cgs, cingulate sulcus; cs, central sulcus; ecal, external calcarine sulcus; iarc, inferior arcuate sulcus; ic, internal capsule; ips, intraparietal sulcus; ls, lateral sulcus; lv, lateral ventricle; p, principal sulcus; rf, rhinal fissure; sarc, superior arcuate sulcus. (B) UMAP visualization of all snRNA-seq profiled cells colored by cell type [with color code shown in (C)]. (C) Barplots showing the log[2] -transformed cell counts (left), regional specificity score (middle), and regional composition [right, with color code shown in (E)] of each cell type. (D) UMAP visualization of all snRNA-seq cells colored by cell type [with color code shown in (E)]. (E) Barplots showing the cell type composition [left, with color code shown in (C)], log[2]-transformed ratio of glutamatergic neurons and GABAergic neurons (middle), and log[2]-transformed ratio of neurons and glial cells (right) of each region. Regions are organized by the regional subclass to which they belong. Controlling for batch effects across sequencing runs (Materials and Methods and fig. S4), we jointly clustered single-cell profiles across all sampled brain regions to identify 17 molecularly distinct cell types, which we refer to as “cell classes” ([77]Fig. 1, B and C). On the basis of established cell markers (fig. S5 and table S3), we annotated these 17 cell classes as either: (i) neuronal cells, including cortical glutamatergic neurons (CAMK2A), cortical GABAergic neurons (GAD1 and GAD2), basket cells (GRID2 and SORCS3), other cerebellar neurons (primarily granule cells; GRM4), medium spiny neurons (DACH1, PPP1R1B, and BCL11B), serotonergic neurons (TPH2), dopaminergic neurons (TH and DBH); or (ii) non-neuronal cells, including microglia (DOCK2), oligodendrocyte precursor cells (OPCs; VCAN), astrocytes (ALDH1A1 and GFAP), oligodendrocytes (MOG and MBP), vascular cells (CFH), and ependymal cells (FOXJ1). Our broad survey also captured four rare cell populations that, to our knowledge, have not yet been identified in other studies: three RBFOX3^+ (NeuN^+) neuron-like populations (marker genes: APOA2, N = 7055 cells; F5, N = 880; KIR2DL1/2, N = 84) and one RBFOX3^− microglia-like population (marker gene: KIR3DL1/2^+, N = 44 cells, also P2RY12^+/PTPRC^+/ENTPD1^+). Given their rarity, we removed these four cell populations from downstream analyses. Hierarchical clustering of cell classes by the top 50 principal components of gene expression largely recapitulated broad ontogenetic relationships, with most neuronal classes clustering together (dopaminergic neurons being the exception) and the two mesoderm-derived classes (microglia and vascular cells) clustering together (fig. S6A). Further, analysis of pathways associated with gene expression of each cell class revealed known molecular physiological processes characteristic of cell classes (tables S4 and S5). For example, we found that gene expression in glutamatergic and GABAergic neurons were associated with synapse assembly/function, oligodendrocyte gene expression was associated with myelination, and microglia gene expression was associated with immune processes including inflammation. Dopaminergic and serotonergic neurons had high activation levels of expected pathways such as tyrosine hydroxylase/catecholamine and serotonin/melatonin biosynthesis, respectively. By sampling across a broad range of anatomical regions within the same individuals, we were able to characterize cellular composition across 30 distinct brain regions—to our knowledge, the most regionally expansive nonhuman primate single-cell brain atlas to date ([78]Fig. 1, D and E). The distribution of major cell classes was balanced between sexes and hemispheres (fig. S7) but differed extensively across regions, reflecting the cellular makeup underlying region-specific functions ([79]Fig. 1E). Unsupervised hierarchical clustering of brain regions according to cell class composition for the most part conformed to broader anatomical categorizations, with regions of the cortex, subcortex, brainstem, and cerebellum usually grouping together (fig. S6B), which was also the case when clustering regions based on the top 50 principal components of gene expression (fig. S6B). Two of these four broad regional classes were composed primarily of a single cell class: In the cortex (N = 16 regions, table S6), glutamatergic neurons were the most abundant cell type (mean = 63.7% of all cells per sample) and outnumbered GABAergic neurons by almost fourfold ([80]Fig. 1E; mean = 17.4%), while the cerebellum (N = 2 regions) was composed almost entirely of cerebellar neurons (mean = 85.1%). In contrast, the subcortex (N = 8 regions) and brainstem (N = 4 regions) were more heterogeneous with respect to their cellular composition, with samples from these regions containing roughly equal proportions of glutamatergic neurons (mean[subcortex] = 25.1%; mean[brainstem] = 25.5%), GABAergic neurons (mean[subcortex] = 20.2%; mean[brainstem] = 23.0%), and oligodendrocytes (mean[subcortex] = 18.5%; mean[brainstem] = 25.5%). We further subdivided the cortical and subcortical samples into “region subclasses” based on neuroanatomical groups (table S1), within which variation in cellular composition was more limited ([81]Fig. 1E). For instance, in the subcortex, medium spiny neurons (MSNs) comprised around half of the cells in the basal ganglia [nucleus accumbens (NAc) mean = 44.7%; caudate nucleus (CN) mean = 60.0% MSNs], while the thalamus was enriched for GABAergic neurons [lateral geniculate nucleus (LGN) mean = 55.7%; mediodorsal thalamic nucleus (mdTN) mean = 43.8%; ventrolateral thalamic nucleus (vlTN) mean = 28.6%]. Our broad survey also captured two rarer, but important, cell classes: dopaminergic and serotonergic neurons. These two classes of neurons collectively represented less than 0.3% of all profiled cells (dopaminergic = 0.14%; serotonergic = 0.12% of all cells) and 0.5% of all neurons (dopaminergic = 0.19%; serotonergic = 0.17% of all neurons), suggesting that targeted approaches that enrich for these cells [e.g., ([82]22, [83]23)] are necessary to identify transcriptional variation among subtypes. Dopaminergic neurons, which are found primarily in the substantia nigra pars compacta at low frequency [1.1% of cells sampled in the midbrain (MB) versus mean 0.1% in other sampled regions], are involved in a range of important processes, including voluntary movement, reinforcement learning, and addiction, and their loss is a neuropathological hallmark of Parkinson’s disease ([84]24). We found that serotonergic neurons were most abundant in the brainstem (mean 0.35% in the four brainstem regions versus mean 0.09% in other sampled regions), where they play a major role in sleep, mood, and appetite, and are key targets of pharmacological therapies for major depressive disorder in humans ([85]25). Regional variation in cell subtype composition To characterize heterogeneity within cell classes, we partitioned the dataset and repeated preprocessing and clustering separately for each of the 17 cell classes. Collectively, we identified 112 distinct clusters (fig. S8 and table S7), using a coarser clustering criterion, and 397 distinct clusters (table S8) using a more fine-grained criterion, which captured neuronal and non-neuronal diversity across the primate brain ([86]Fig. 2A). We refer to clusters at the coarser level as “cell subtypes” and clusters at the finer-grained level as “cell subclusters.” We identified extensive heterogeneity in glutamatergic (39 subtypes) and GABAergic (20 subtypes) neurons primarily found in the cortex and some regions of the subcortex [e.g., hippocampus (HIP) and thalamus], while neurons derived from other noncortical brain regions (e.g., cerebellum and striatum) were transcriptionally distinct and relatively homogeneous within those regions ([87]Fig. 2A). This is due in part to the large number of specialized neurons present in some of these regions, including granule and Purkinje cells in the cerebellum, and medium spiny neurons in the basal ganglia (table S7). Fig. 2. Cell subtype distribution and variation across the brain. [88]Fig. 2. [89]Open in a new tab (A) Barplots showing the region specificity score (i.e., Jensen-Shannon divergence statistic) and composition for cell subtypes (with color code shown in [90]Fig. 1E). (B) Heatmap showing scaled log[2] ratios of GABAergic neuron and astrocyte subtype compositions within cortical region, compared to the average across all regions. Cell subtypes with at least 100 cells profiled are shown in the order of abundance (x axis, left to right) in the cortical regions organized by region subclasses (y axis). The color and direction of each pie correspond to relative enrichment (blue, clockwise) and depletion (red, anticlockwise) of a cell subtype in a region. Log[2] ratios were capped at positive and negative 2 before scaling. (C) UMAP visualizations of GABAergic neurons colored by cell subtype (left) and regional subclass (right). (D) UMAP visualizations of GABAergic neurons colored by cell subtype marker gene expression. (E) UMAP visualization of astrocytes colored by the region with the highest lochNESS, indicating enrichment of a region subclass in the cell’s transcriptional vicinity. LochNESS distribution in a few example regions (occipital lobe, basal ganglia, brainstem, thalamus, and frontal lobe) is highlighted in separate panels as examples. (F) UMAP visualizations of astrocytes colored by lochNESS-derived region-related marker genes. Our systematic approach also allowed us to characterize and compare the regional cellular distribution of non-neuronal subtypes, including those of glia, which have not often been the focus on most single-cell atlases to date ([91]Fig. 2A and table S9). Overall, we identified six astrocyte, two microglial, seven oligodendrocyte, and six vascular cell subtypes, the latter including endothelial cells, smooth muscle cells, pericytes, and both perivascular and meningeal fibroblasts (fig. S8 and table S7) ([92]26). We compared cell subtypes to published datasets using a non-negative least squares (NNLS) approach ([93]19) and found broad correspondence with subtypes observed in human cortical ([94]14), human brain vascular ([95]26), and macaque hippocampal atlases (fig. S10, A to E) ([96]27). To identify cell subtypes or subclusters that were specific or biased toward a single region or set of regions, we calculated a measure of “regional specificity” using the Jensen-Shannon divergence statistic (Materials and Methods) ([97]28, [98]29). Overall, glial subtypes were more evenly distributed across all regions compared to neuronal subtypes ([99]Fig. 2A). This is reflected in lower Jensen-Shannon specificity scores for glial subtypes (mean = 0.20; median = 0.15; range = [0.04,0.81]) compared to cortical neurons (mean = 0.31; median = 0.18; range = [0.08,0.89]). A number of cell subtypes, both neuronal and non-neuronal, were highly region specific. For instance, oligodendrocyte subtype 8, the rarest oligodendrocyte subtype (N = 3439 cells; 1.5% of oligodendrocytes) overwhelmingly derived from the highly myelinated corpus callosum (CC; 93.0% of these cells; [100]Fig. 2A and table S9). Among cortical neurons, GABAergic interneuron subtypes generally exhibited a lower median regional specificities than to glutamatergic neuron subtypes, although there were a number of interneuron subtypes specific to the thalamus (cluster 6) or brainstem (clusters 3 and 16), discussed below. Given that the regional specificity of excitatory neuronal subtypes has been explored in depth in other studies [e.g., ([101]30, [102]31)], we focus here instead on populations that are vital for neuronal signal transduction but for which cellular diversity has not previously been explored across the macaque brain. Specifically, we concentrated on the regional diversity of interneurons, because they are important components of long-range circuitry and have been characterized in a few regions across mice, monkeys, and humans [e.g., ([103]12, [104]14, [105]15, [106]32)], allowing us to both benchmark our atlas and extend current knowledge to understudied regions. We also examine regional distribution among astrocytes, which are crucial for maintaining neuronal homeostasis ([107]33) and are implicated in neurological disorders ([108]34) but have been relatively understudied at the single-cell level. We pursued three main approaches to dissect the regional heterogeneity within interneuron and astrocyte subtypes, discussed in further detail below: (i) quantification of cell subtype composition to identify nuanced differences in detailed regions within the cortex; (ii) identification of regionally specific gene expression programs by analyzing region specific subtypes of interneurons; and (iii) in the case of minimal region specific subtypes, leveraging a recently developed statistic to identify region-specific gene expression patterns in astrocyte subtypes in a cell subtype–agnostic fashion. Within specific regions of the cortex, cell subtype composition differences become more subtle and require focused quantification. As a first approach, for every sufficiently abundant interneuron and astrocyte subtype in the cortex (>100 cells), we calculated the log[2]-transformed ratio of cell subtype composition in a region, compared to the average composition of that subtype across all cortical regions ([109]Fig. 2B). Within the five most abundant interneuron subtypes, we not only note general balance across all cortical regions but also observe a relative enrichment of cluster 2 (PVALB^+) in the occipital lobe [primary visual cortex (V1)] with depletion in regions within the temporal lobe and depletion of cluster 5 (ADARB/PAX6^+) in V1. In the superior temporal sulcus (STS) and middle temporal visual area (MT), there is a strong depletion of astrocyte subtype 3 (LUZP2/GPC5^+) but an enrichment of subtype 6 (KCNIP4/RBFOX1^+). Interneurons are the primary drivers of inhibitory control through the release of GABA (γ-aminobutyric acid) and thus strongly affect neural circuitry. Inappropriate development of GABAergic interneurons and subsequent loss of inhibitory regulation contribute to disorders of neurodevelopment, including epilepsy and autism ([110]35, [111]36). Despite their importance, the molecular identities and distribution of interneuron subtypes across the adult primate brain remain relatively unknown outside of a few regions ([112]15, [113]32, [114]37). Our snRNA-seq sample captured 371,548 GABAergic interneurons corresponding to 20 subtypes. As a second approach, we focused on gene markers of region-specific interneuron subtypes. Eleven interneuronal subtypes were primarily found in the cortex and could be assigned to four primary interneuron groups that are conserved between mouse and human brains ([115]32), marked by SST, PVALB, VIP, and LAMP5 expression (fig. S11). Compared to the cortex, the brainstem and thalamus had a unique distribution of interneuron subtypes ([116]Fig. 2C). Thalamic interneurons, which use feed-forward inhibition to relay and tune visual inputs to thalamocortical neurons, expressed high levels of NTNG1 and RNF220 ([117]Fig. 2D), which is indicative of long-range interneurons in the first-order relay nuclei of the thalamus ([118]38). Sampling across the striatum, which is a critical part of the reward pathway and the largest part of the basal ganglia, a recent single-cell study identified a molecularly unique primate interneuron ([119]32), which was most similar to our GABAergic cluster 18 and represented 15% of interneurons in the CN ([120]Fig. 2A). Astrocytes, the second most abundant non-neuronal cell type in our dataset, are multifaceted support cells of the brain that perform a variety of tasks related to neuronal homeostasis. These tasks can vary across brain regions ([121]33), and astrocyte dysfunction has been linked to neurological diseases, including Alzheimer’s disease ([122]39). Given these regional differences, we examined whether astrocyte subtypes exhibited regional biases in macaque, similar to what has been observed in the mouse brain ([123]40). However, while astrocyte subtypes were widely distributed across multiple regions, the cell clusters did not correspond neatly to regions of origin, making claims about inter-region differences in cell composition difficult to systematically analyze across the many regions profiled. To address this complexity, as a third approach, we adapted our recently developed statistic, lochNESS ([124]41), to quantitatively measure regional enrichment in each cell’s “neighborhood” of transcriptionally similar cells. Briefly, for each cell, we tally the number of cells from each brain region in its neighborhood and calculate a focal regional enrichment score (fig. S12A and Materials and Methods). We illustrate the utility of this approach by calculating the lochNESS score on astrocytes at the level of brain region subclasses. Each cell had 11 lochNESS scores calculated, one for each region subclass, with each such score quantifying the enrichment of the given region subclass in a cell’s transcriptional vicinity. We then identified the most enriched region subclass in a cell’s neighborhood and examined the regional heterogeneity agnostic to the cluster-assigned subtype labels ([125]Fig. 2E). We also extended lochNESS to identify genes whose expression can be predicted by lochNESS scores for given regions. We modeled gene expression as a function of the lochNESS scores of each region in each cell using generalized linear regression (Materials and Methods and table S10). The resulting set of genes with significant positive associations with a region’s lochNESS score have higher expression in, and are putatively markers for, cell subtypes in that region. Using this approach, we identified markers for astrocytes in specific regions (e.g., TCAF2 and FRK in the occipital lobe) and in combinations of regions (e.g., PGD in the brainstem, basal ganglia, and thalamus), which we would not have identified if we focused solely on discrete, computationally defined clusters ([126]Fig. 2F). This strategy thus facilitates the identification of more complex region-specific gene expression patterns. For example, EMID1, which is a marker for a subpopulation of astrocyte-like NG2 cells ([127]42), is more highly expressed in astrocytes in the cortex but not in the thalamus, brainstem, or cerebellum. In contrast, ADAP2, which is involved in protection from RNA virus infections ([128]43), is highly specific to a subset of astrocytes found in the thalamus ([129]Fig. 2F). LochNESS can thus provide a more nuanced approach to identifying regionally biased cell subtypes and gene expression than conventional clustering. While we focused on astrocytes in this example, lochNESS could be iteratively applied to regions within a subclass in each cell class, e.g., for all glutamatergic neurons across all cortical regions or oligodendrocytes across all subcortical regions (fig. S12, B to D). Joint analysis of single-nucleus transcriptomic and epigenomic data To complement our transcriptomic dataset and identify key regulatory genomic regions in brain cells, we applied three-level single-cell combinatorial indexing ATAC-seq (sci-ATAC-seq3) ([130]44, [131]45) to profile single-nucleus assay for transposase-accessible chromatin (ATAC) with sequencing (snATAC-seq) epigenomes from nearly all of the brain regions represented in our snRNA-seq dataset. To maximize comparability among datasets, we used 110 of the same age-, sex-, and hemisphere-matched tissue samples (representing the same three animals) profiled in our snRNA-seq dataset. To ensure that the snRNA-seq and snATAC-seq datasets captured the same heterogeneous populations of cells, we homogenized tissue samples on dry ice before separately preparing separate nuclei isolations for each library type (Materials and Methods). Together, the snATAC-seq samples represented 28 of the 30 regions (n = 3 animals; MB and MT snATAC-seq data were not generated). After QC (Materials and Methods and fig. S13), the total number of nuclei profiled was 1,587,880 and ranged from 5100 [in the closed medulla (MdC)] to 114,410 [in the inferior temporal gyrus (IT)] nuclei per region (median = 63,739 nuclei per region). We called peaks on a per-sample basis and combined them across all samples based on genomic overlap, resulting in (after filtering) a combined set of 1,192,873 candidate cis-regulatory elements (cCREs) spanning 24.4% (725 Mb) of the genome. We first applied uniform manifold approximation and projection (UMAP) dimensionality reduction and Leiden clustering to the batch-corrected epigenomic data ([132]Fig. 3A) and identified 42 clusters which, based on promoter accessibility, could be assigned to most major cell classes found across the brain ([133]Fig. 3B). However, given that unsupervised approaches to cell type identification are consistently more sensitive using single-cell RNA-seq or snRNA-seq data ([134]46), we drew from our transcriptionally defined cell annotations to assign cell labels to our snATAC-seq nuclei. To integrate the datasets, we used the graph-linked unified embedding (GLUE) approach ([135]47) and generated a unified transcriptomic and epigenomic embedding of 4,171,847 nuclei ([136]Fig. 3, C and D). Subsequent cell type predictions based on our multimodal integration assigned the majority of snATAC-seq nuclei to a cell class (73.7% with confidence ≥ 0.95; fig. S14) and captured all of the major cell classes ([137]Fig. 3D) with the exception of serotonergic and dopaminergic neurons, which are relatively rare and fairly specific to the MB (which as noted above was not sampled in our snATAC-seq data). The regional distribution of cell classes captured from snATAC-seq and snRNA-seq data were highly concordant, both within regions ([138]Fig. 3E) and overall ([139]Fig. 3F), which demonstrates that our homogenization and nuclei isolation protocols captured the same heterogeneous populations of cells in the same regions across both modalities. Fig. 3. Generation of the Macaque Brain Atlas sci-ATAC-seq dataset and identification of cell classes. [140]Fig. 3. [141]Open in a new tab (A) UMAP visualization of all snATAC-seq cells colored by brain region [with color code shown in (C)]. (B) UMAP visualizations of promoter accessibility scores of cell markers (GAD2: GABAergic neurons, ENTPD1: microglia, SLC1A2: astrocytes, ATP10A: vascular cells) reveal high specificity. (C) Barplots showing nuclei counts by brain region of the snRNA-seq, snATAC-seq, and integrated datasets. (D) UMAP visualizations of integrated multimodal data, with cell classes colored separately for (left) snRNA-seq and (right) snATAC-seq nuclei [with color code shown in (F)]. (E) Spearman’s rank correlation coefficients showing the correlation between cell class proportions in the snRNA-seq and snATAC-seq datasets within each region (representing data generated from the same homogenized sample). (F) Scatterplot showing the correlation between cell class proportions in the overall snRNA-seq and snATAC-seq datasets (combined across brain regions). (G) Integration-derived cell class annotations visualized over the same snATAC-seq UMAP visualization shown in (A) [with color code shown in (F)]. The gene regulatory landscape of the rhesus macaque brain We leveraged the snRNA-based cell class annotations ([142]Fig. 3G) to explore heterogeneity in cell type–specific gene regulation across the brain. To do so, we partitioned all unique snATAC-seq reads by predicted cell class ([143]Fig. 3G) and then called peaks separately for each partition using a similar peak calling approach to that used for the overall dataset, thereby generating an inventory of putative cCREs derived from each cell class in isolation (Materials and Methods). Across 11 cell classes with snATAC-seq–assigned nuclei, we identified an average of 210,572 peaks per cell class, ranging from 99,323 in microglia to 425,738 in cortical GABAergic neurons ([144]Fig. 4A). On average, for any given cell class, these peaks covered 7.7% of the genome, and 28.8% were found >2 kb from the nearest gene or promoter ([145]Fig. 4A). Fig. 4. Enrichment of TF binding site motifs in candidate regulatory elements. [146]Fig. 4. [147]Open in a new tab (A) Barplots showing summary statistics for peak sets called separately on reads derived from cells assigned to each of 11 cell classes. (B) Heatmap showing enrichment (log[2] OR) of TF binding motifs among cell classes. The top five most-enriched nonredundant TF motifs (all P[adj] < 0.05) are shown per cell class, ordered from left to right by increasing P[adj]. Log[2] OR color ranges are capped at ±1.5. (C) Position weight matrices of the most-enriched TF motifs for six example cell classes. ORs are shown in parentheses. (D) Scatterplots showing correlation between snATAC-seq accessibility of TF binding motifs and snRNA-seq gene expression of corresponding TF genes within cell classes in regional classes for four example TFs. Transcription factor regulatory networks Multimodal integration of cell-specific snATAC-seq and snRNA-seq data allowed us to examine the cis- and trans-regulatory links between chromatin accessibility and gene expression within individual cell types. We first examined putative trans-regulatory factors within cell classes and subtypes. Transcription factors (TFs) are key trans-regulatory proteins that control cell differentiation and function during neurodevelopment ([148]48–[149]51) and have been implicated in myriad neurodegenerative diseases ([150]52–[151]54). The extremely high cell type specificity of some nuclear TFs have also made them useful targets for identifying and enriching rarer cell types before single-cell sequencing ([152]55–[153]57). To identify candidate trans-acting regulatory networks in each cell class, we carried out TF binding motif enrichment analysis on each set of cell class–specific peaks, defined as the subset of a cell class’s cCREs that did not overlap with any peaks called in other cell classes (Materials and Methods and [154]Fig. 4A). Cell-class–specific cCREs were highly enriched for many TF binding motifs that are likely involved in cell-specific gene regulation ([155]Fig. 4B and table S11), including many motifs previously implicated ([156]Fig. 4C). For instance, microglial cCREs contained 6.6-fold more binding sites of the nuclear TF SPI1 (also known as PU.1) than expected by chance (P[adj] = 1.22 × 10^−284; [157]Fig. 4, B and C). In addition to such canonical examples, we identified numerous motifs that distinguish relatively similar cell classes. For instance, the TF binding motif for NFE2, from the nuclear respiratory factor (NRF) TF family, was most enriched [odds ratio (OR) > 2] in cCREs in both medium spiny neurons (OR = 3.07, P[adj]= 1.77 × 10^−87) and basket cells (OR = 2.34, P[adj] = 8.76 × 10^−7), while the binding motif for NEUROD1 was most enriched (OR > 2) in cCREs of basket cells (OR = 2.04, P[adj] = 3.53 × 10^−29), where this TF is necessary for basket cell terminal differentiation and, consequently, axon growth and inhibitory circuit formation ([158]58). We also characterized TF binding motif enrichment at the cell subtype level. To do this, we extended our multimodal integration and label-transferring approach to each cell class independently by tabulating the reads per-cell falling within cell class–specific cCREs described above for all cells of a given cell class ([159]Fig. 3D). We then integrated the data with corresponding snRNA-seq data of the same cell class using GLUE (Materials and Methods). The resulting integrated embeddings for each cell class were then used as the basis for predicting cell subtypes, which we carried out on all snATAC-seq cells within each class (fig. S15). Since cell subtypes are preselected to already share broadly similar chromatin accessibility profiles, identifying peaks that are specific to a single subtype—similar to our approach at the cell class level—was not feasible and left most cell subtypes with no or very few unique peaks to analyze. As an alternative strategy, we carried out differential accessibility analyses among cell subtypes to identify peaks that were predictive of each individual cell subtype within a given cell class (Materials and Methods). We then identified TF binding motifs enriched in highly differentially accessible regions within cell subtypes (table S12). For example, we observed numerous TF binding motifs (N = 433, P[adj] < 0.05) that were enriched within highly accessible peaks in Purkinje cells, a GABAergic neuron type of the cerebellum that is implicated in autism spectrum disorders (ASDs). In our snRNA-seq dataset, of all tested diseases ([160]59), genes associated with autism (DOID:12849) were overrepresented (Fisher’s exact test, OR = 10.2, P[adj] = 8.52 × 10^−16) among the top 100 Purkinje cell marker genes, including RORA [fold change (FC) = 331.9], AUTS2 (FC = 43.1), and SHANK2 (FC = 13.8) (table S13). Correspondingly, we found that TF motifs enriched in differentially accessible peaks included RORA, four members of the early growth response (EGR) family (EGR1 to EGR4), and CTCF (EGR1, EGR3, and CTCF were among the top five TF motifs ranked by OR; RORA ranked 182nd). RORA is a regulator of circadian rhythm that exhibits decreased expression in ASD brains and may play a role in ASD pathogenesis ([161]60, [162]61). EGR family TFs have been implicated in the disruption of human-specific developmental programs in autism ([163]62). CTCF is an insulator protein that regulates chromatin structure and may play a critical role in maintaining dendrite structure in Purkinje cells ([164]63) and CTCF is also a risk gene for ASD ([165]64). Given that families of TFs have similar binding motifs, it is often difficult to identify the specific TF in a given family that is responsible for enrichment in cell type–specific cCREs. To identify the most likely TF, we therefore used our recently developed approach ([166]45, [167]65) that uses the computationally paired snRNA-seq and snATAC-seq data. Briefly, this approach relies on the assumption that TFs will be highly expressed in cell types where they play a key role, while their associated motif should be enriched (or depleted) in that cell’s cCREs, indicating TF activation (or repression). Overall, we compared the accessibility of 369 TF binding motifs and their corresponding gene’s expression across the cell classes in four region subclasses, with 189 TFs showing positive Pearson’s correlation between gene expression and accessibility of the cognate motif and 180 showing negative correlation (fig. S16A and table S14). Among the TFs with largest positive or negative Pearson’s correlation values were strong cell class–specific activators and repressors ([168]Fig. 4D and fig. S16B). For instance, SPI1, which has been identified as a candidate gene for Alzheimer’s disease via various functional genetics approaches ([169]66), shows a strong activating effect with high expression of the SPI1 gene and high accessibility for the SPI1 binding motif in microglia. In contrast, NFATC2 has a repressing effect in microglia and vascular cells, as shown by high expression of the NFATC2 gene associated with lower NFATC2 motif binding in those cell types. We also found evidence for a clear distinction between neurons and non-neuronal cells at two TFs, with ELF1 functioning as a non-neuronal specific activator and NEUROD2 as a neuron-specific activator. In addition, we note that FLI1, an activator in vascular and microglia cell types, and ELF1 have motif sequences similar to SPI1 (fig. S16C), but their activating effects affect a broader set of cell types. The cis-regulatory landscape of brain cell variation We next sought to characterize cis-regulatory interactions between cCREs and proximate genes in the rhesus macaque brain. We used two complementary analyses to scan for interactions using our integrated multimodal dataset. First, we used the regulatory inference framework of GLUE ([170]47), which leverages the unified feature embedding (i.e., joint integration of snRNA-seq genes and snATAC-seq peaks in a common data space) generated during GLUE integration to assess similarity between peaks and genes. Putative regulatory interactions are defined as a high cosine similarity between peak and gene feature embeddings in the unified data space, with statistical significance assessed by permutation ([171]47). Second, we used a metacell-based approach to aggregate snRNA-seq transcriptomes and snATAC-seq epigenomes into multimodal metacells based on k-means clustering of the unified cell embeddings and then used logistic regression to model the relationship between gene expression and chromatin accessibility within a given metacell ([172]67). In contrast to the GLUE regulatory score, the logistic regression analysis enabled us to differentiate between positive and negative regulatory interactions between peaks and genes. We considered peak-gene pairs to be putatively regulatory if P[adj] < 0.05 for both analyses ([173]Fig. 5A and fig. S17). For each cell class, we also scanned for differentially accessible peaks using both a regularized logistic regression and a t test, testing accessibility in a given cell class against accessibility in all other cell classes. We consider cCREs with differentially high accessibility (regularized LR coefficient > 0, log[2] FC > 0, and t test P[adj] < 0.05; fig. S18) to be candidate regulators of cell type–specific genes ([174]Fig. 5A). Fig. 5. The landscape of cis-regulatory interactions in the Macaque Brain Atlas. [175]Fig. 5. [176]Open in a new tab (A) Schematic outlining criteria for identification of cCREs. Squares represent peak:gene pairs, darker colors symbolize stronger evidence for a given measure, and solid borders represent statistically significant measures. (B) Distribution of gene-peak GLUE regulatory scores binned according to the minimum signed distance (left: upstream, right: downstream) between peaks and gene TSSs. Distributions are shown separately according to whether the gene:peak pair also exhibited a significant association (P[adj] < 0.05) based on the metacell-based logistic regression analysis. The ratio between significant and not-significant gene-peak pairs for proximity bins according to the logistic regression model is shown in the top margin, while the distribution of GLUE regulatory scores is shown in the right margin. (C) Candidate regulatory elements are shown in relation to, from top to bottom, (i) the strength of and direction of inferred regulatory links connecting peaks to MBP expression (based on metacell logistic regression analysis). The height of links represents the strength (−log[10] P values) of evidence for regulatory connections and the color symbolizes the direction of the relationship; (ii) the differential accessibility (−log[2] FC) of peaks in oligodendrocytes relative to all other cell classes; (iii) the distribution of normalized snATAC-seq reads by cell class; (iv) gene and transcript boundaries of MBP and its known isoforms in the rhesus macaque genome, with exons shown in blue; (v) the distribution of normalized snRNA-seq reads by cell class. Oligodendrocyte reads are shown in relation to all other cell classes on the upper portion of the plot. On the bottom portion, the y axis is magnified ×60 and cropped to highlight more subtle differences among cell classes. We focused our analysis on the 6000 most variable genes in our snRNA-seq dataset and tested all snATAC-seq peaks that fell within 150 kb of the gene promoter (defined as TSS extended 2 kb upstream). In total, we tested 223,752 peak-gene pairs (151,083 unique peaks, 5765 unique genes), of which 142,324 peak-gene pairs (63.6%) met our criteria for being considered candidate cis-regulatory interactions (table S15). A total of 128,741 peaks (85.2%) that we evaluated were cCREs for at least one gene, and 4811 genes (83.5%) that we evaluated had at least one cCRE. Of all peak-gene pairs, 132,805 (93.3%) involved a peak that was highly differentially accessible in at least one cell class, thereby fulfilling our criteria for being considered candidate cis molecular interactions regulating cell type–specific markers. cCREs were highly differentially accessible in a maximum of seven cell classes, with 37% exclusive to a single cell class and 88% highly differentially accessible in one to three cell classes. The vast majority (133,496 or 93.8%) of candidate regulatory interactions were positively associated (i.e., had a positive effect size in the metacell logistic regression)—this held true whether peaks were upstream (13,650 of 14,575 or 93.7%), downstream (116,939 of 124,592 or 93.9%), or overlapping (2907 of 3157 or 92.1%) the gene’s transcription start site (TSS). For peak-gene pairs where the peak was upstream of the TSS, the GLUE regulatory scores were highest (indicating high similarity between peak and gene feature embeddings) when peaks were in closer proximity to the TSS ([177]Fig. 5B). For peaks downstream of the TSS, GLUE regulatory scores remained high across all distances, with only a modest decrease farther from the TSS ([178]Fig. 5B). This result was particularly notable for peaks that had significant, mainly positive, associations between accessibility and gene expression, likely reflecting (i) higher global accessibility across the gene body resulting from higher expression of the gene (as opposed to distal regulation) and/or (ii) methodological limitations of using a single gene-wide TSS (i.e., the most upstream TSS of all isoforms), thereby ignoring variation in TSS positioning among isoforms, which likely vary in their usage across tissues and contexts ([179]68). Using the cell class–specific gene expression and cCRE peak sets, we repeated our integration, regulatory inference, and differential accessibility workflows on each cell class individually. We tested a mean of 72,914 peak-gene pairs (range: 45,539 to 114,200) per cell class and identified a mean of 11,442 peak-gene pairs (range: 881 to 41,966) showing evidence of regulatory interactions (fig. S19 and table S16). To illustrate how these maps of putative interactions might be useful to investigate the regulatory landscape at the level of an individual locus, we focused on the myelin basic protein (MBP) gene ([180]Fig. 5C), which encodes one of the most abundant proteins in central nervous system myelin ([181]69, [182]70), has a range of splice isoforms ([183]71), and is a canonical marker of oligodendrocytes. MBP is located on chromosome 18 (positions 2,932,531 to 3,086,873) on the rhesus macaque (Mmul_10) genome and has eight annotated mRNA isoforms (Ensembl). In humans, classic MBP isoform 3 (18.5 kDa) predominates in adult myelin ([184]71). In our global peak set (all cells), 94 peaks fell within 150 kb of the MBP promoter and were included in our analysis. Of these peaks, 83 (88.3%) were identified as candidate regulators of MBP (crMBP), with 38 crMBPs (45.8%) positively associated with MBP expression. Of all crMBPs, only one was not located within the MBP gene boundaries—it was, however, located less than 2 kb upstream within the likely promoter region. In accordance with the well-known status of MBP as an oligodendrocyte marker, we found that MBP was differentially expressed in oligodendrocytes, with detected expression in 80.9% of cells and 1434-fold higher expression than all other cells averaged together. Fine-grained inspection of normalized read distributions from oligodendrocyte nuclei revealed the highest densities of snRNA-seq reads corresponding to the polyadenylation site (position 3,086,373) and snATAC-seq reads corresponding to the TSS (position 3,046,976) of a single transcript, ENSMMUT00000015870, indicating that it is likely the dominant MBP isoform expressed in adult macaque oligodendrocytes. By examining the genomic-distance relationships between crMBPs and the dominant MBP transcript in adult oligodendrocytes, we found that all 16 crMBPs that either overlapped or were downstream of the isoform’s TSS were positively associated with MBP expression. Among the 67 crMBPs that were located upstream of the TSS, 22 (32.8%) were positively associated with MBP expression, while 45 (67.2%) were negatively associated. Several of these negatively associated crMBPs corresponded with sci-ATAC-seq3 peaks in other cell types, particularly OPCs and microglia ([185]Fig. 5C). However, the accessibility landscape of OPCs is overall more similar to that of oligodendrocytes across the region upstream of the TSS of the dominant isoform, with greater accessibility at most peaks except for that of the promoter of the dominant isoform ([186]Fig. 5C). As OPCs play a critical role in myelinogenesis by giving rise to oligodendrocytes ([187]72), these crMBPs likely serve as critical markers of the OPC-oligodendrocyte transition, during which the expression of this gene, and this isoform in particular, is massively up-regulated. Evolutionary conservatism and divergence of candidate regulatory elements Human brain specialization could be driven in part by changes in cell type composition and function following the evolutionary divergence of cell type–specific regulatory elements. To evaluate evolutionary divergence/conservatism, we tested whether cCREs that were differentially accessible across cell classes (table S15) or unique to cell classes (table S17) were associated with regions that underwent rapid evolution in the human lineage [i.e., human accelerated regions (HARs) ([188]73); human ancestor quickly evolved regions (HAQERs) ([189]74)] or are differentially accessible between humans and chimpanzees in cerebral organoids (DA[HC] regions) ([190]75). We also tested whether cCREs were enriched for these evolutionarily salient regions (relative to all peaks, called on the global set) and did not detect any enrichments (table S17). We detected an enrichment of DA[HC] regions among differentially accessible cCREs (OR = Inf; P[adj] = 0.046) and a depletion of DA[HC] regions among glutaminergic neuron-specific cCREs (OR = 0.263; P[adj] = 0.046) (table S17). These findings indicate that regulatory areas exhibiting differential accessibility across cell types in the adult macaque brain may also exhibit differential accessibility in the developing human versus chimpanzee brain, while cell class–specific regions may be more conserved during primate brain development. Neither HARs nor HAQERs were enriched among differentially accessible or cell class–specific cCREs (P[adj] > 0.05), which was expected given that intercell class variation in orthologous gene expression is generally well-conserved in primates ([191]76). We did identify six differentially accessible or cell class–specific cCREs that overlapped with HARs or HAQERs (table S17). There was one cCRE that was differentially accessible in macaque medium spiny neurons, cerebellar neurons, and vascular cells that overlapped with two HAQERs. These cCREs are also: (i) located in the 1q21.1-2 region containing the NBPF gene cluster that contains several human-specific segmental duplications; (ii) areas of open chromatin in the developing human brain; and/or (iii) regions showing differential chromatin accessibility between human and chimpanzee cerebral organoids ([192]74). While the presence of HAQERs in these regions suggests that they may function differently in humans versus macaques, the human sequences for these HAQERs do not appear to exhibit significantly greater enhancer activity compared to nonhuman primate sequences ([193]74). Enrichment of disease heritability among candidate regulatory elements Last, we used our cCREs to identify cell type–associated regulatory networks that may drive polygenic disease risk. We tested for enrichment of disease trait heritability using the linkage disequilibrium score regression (LDSC) tool ([194]77, [195]78), after lifting over macaque cCREs to human genome coordinates ([196]28). We tested a total of 53 phenotypes relevant to neurological diseases, disorders, syndromes, behaviors, or other traits (table S18) and examined enrichment among cell class cCREs called separately in each of 11 cell classes. Our results broadly recapitulated several known roles of cell classes in neurological disease ([197]Fig. 6 and table S19). For example, sites associated with cardioembolic stroke (OR = 32.2) or ischemic stroke (OR = 9.2) were enriched (P[adj] < 0.05) only in vascular cells, which play a crucial role in forming and maintaining the blood-brain barrier ([198]79). We also found that Alzheimer’s disease–associated sites were enriched only in microglia—a result replicated using loci from three independent genome-wide association studies (GWAS) (OR range: 13.9–15.0)—consistent with the prominent role of microglia proliferation and activation in Alzheimer’s disease ([199]80). Fig. 6. Enrichment of heritable disease-relevant sites among candidate regulatory elements. [200]Fig. 6. [201]Open in a new tab The heatmap displays heritability enrichment (log[2] OR) of diseases among cell class snATAC-seq peaks for tested diseases, syndromes, and phenotypes. Only results passing a threshold of P[adj] < 0.05 are shown. The log[2] OR color range is capped at 3.0. Across all cell classes, basket cells were enriched for the greatest number (N = 37) of GWAS phenotypes, including disorders such as schizophrenia (OR range: 5.9 to 6.2), bipolar disorder (OR range: 5.6 to 6.2), and major depressive disorder (OR range: 5.1 to 5.3) and, most strongly, epilepsy (OR = 9.0)—a disease that basket cells have been connected to in animal models and some genetically linked human forms of the disease ([202]81). Other notable results included the enrichment of multiple sclerosis-associated sites among open regions in microglia (OR = 46.6), highlighting the outsized role of these immune cells in the etiology of the disease and as a putative therapeutic target ([203]82, [204]83). In multiple sclerosis, disease-associated microglia alter their transcriptional profiles and may contribute to neuroinflammatory processes underpinning this autoimmune disorder ([205]83). We also found enrichment of Parkinson’s disease–associated sites among open regions in the glial OPC, oligodendrocyte, and astrocyte cell classes (OR range: 7.0 to 8.4). In Parkinson’s disease, glial cells may play a major role in the progressive degeneration of dopaminergic neurons ([206]84), a classic hallmark of Parkinson’s disease, or in alterations to glutamatergic neurotransmission ([207]85). Last, we found that heritable sites associated with attention deficit/hyperactivity disorder (ADHD) in our analysis were enriched only among open regions of medium spiny neurons. While the magnitude of the enrichment was relatively mild (OR = 2.6, P[adj] = 0.031), genetic variants associated with ADHD have been historically difficult to identify, with the first risk loci only recently reported ([208]86). Medium spiny neurons have been linked to behavioral hyperactivity and disrupted attention via activation of astrocyte-mediated synaptogenesis ([209]87). Our results therefore suggest that medium spiny neurons may be a promising target for prospective ADHD therapeutics warranting further study. DISCUSSION Understanding the cellular architecture of the primate brain is crucial both for understanding the evolution of human cognition and behavior and for identifying mechanisms underlying neurological disorders. In service of these goals, we used snRNA-seq and snATAC-seq to derive a molecular atlas spanning the adult rhesus macaque brain, comprising data from over 4 million cells profiled from 30 brain regions. On the basis of our multimodal molecular data, we identified 112 distinct molecular cell types or subtypes and characterized their distribution across the macaque brain, adding to the growing number of primate single-cell molecular brain atlases ([210]15, [211]32). The data are freely available (NeMO archive, nemo:dat-rtmm5q2) and browsable (CELLxGENE viewer, [212]https://cellxgene.cziscience.com/collections/8c4bcf0d-b4df-45c7-88 8c-74fb0013e9e7) and will serve as a rich resource for the neuroscience and neurogenomics communities. In generating a multiregion transcriptomic and epigenomic atlas of the most widely used nonhuman primate in neuroscience, we (i) identified all of the major brain cell classes and many cell types that have been previously reported ([213]Figs. 1 and [214]2); (ii) quantified regional distribution of cell types and subtypes within individuals, which allowed us to identify compositional differences in samples collected at the same time and from the same animals ([215]Fig. 2); (iii) identified rare and regionally specific cell types (e.g., Purkinje cells), which may facilitate the development of molecular tools such as cell type–specific viral vectors that, in combination with new technologies such as CellREADR (Cell access through RNA sensing by Endogenous ADAR) ([216]88), may enable precise targeting of cell types based on their unique patterns of chromatin accessibility and gene expression; (iv) characterized multiple trans- and cis-regulatory mechanisms that differentiate cell classes and subtypes ([217]Figs. 4, A to D, and [218]5); and (v) identified numerous associations between genetic risk for neurological disorders and the epigenomic states of specific cell types ([219]Fig. 6). Notably, this single-cell atlas of the adult primate is generated from samples collected from healthy adults. The paired nature of the dataset, with regions sampled from the same individual brains, avoids many of the inter-individual variables (e.g., genotype and environment) that can affect neurological development and function. The atlas may thus be a valuable resource for characterizing molecular features that play a role in myriad neurological disorders. The relatively few unique individuals sampled also represent a limitation of the current study—we currently know very little about how brains of healthy individuals differ in cell composition and function and what that confers for disease susceptibility and/or progression. Given continuing improvements in cost and throughput of single-cell sequencing, characterizing multiregion cellular variation across many healthy individuals is becoming not only a possibility but also an emerging priority for the field. To our knowledge, these data represent the largest and most comprehensive multimodal molecular atlas in any nonhuman primate to date and provide a resource for exploring how the heterogeneous molecular and cellular composition of the brain gives rise to the behavioral complexity of primates including humans. We anticipate that these data will also provide a critical and much-needed molecular and neurobiological map of complex human-relevant social behavior and disease, as well as an extensive substrate for comparative analyses across animal brains. MATERIALS AND METHODS Study population and sample collection All animals sampled in this study are rhesus macaques (M. mulatta) from the semi-free-ranging colony on the island of Cayo Santiago, Puerto Rico. Maintained by the Caribbean Primate Research Center (CPRC) within the University of Puerto Rico, the Cayo Santiago macaque colony has been largely continuously studied since its founding in 1938 ([220]89). All present-day macaques are descended from an initial founder population of 409 animals and have since maintained an outbred population structure despite generations of isolation ([221]90). Apart from being provisioned with commercial feed and occasionally subject to capture-and-release sampling, the macaques otherwise live in naturalistic conditions, subject to minimal intervention and manipulation, as approved by Institutional Animal Care and Use Committee. The study used animals that needed to be removed from Cayo Santiago ([222]91) and were immediately euthanized. Standardized tissue collection and sample archiving were coordinated by the Cayo Biobank Research Unit (CBRU), which provided the brain samples used in this study ([223]92, [224]93). Procedures for necropsy, brain removal, and dissection followed those previously described for this population ([225]93) and are briefly outlined here. Following veterinary euthanasia, brains were perfused with sterile saline, removed from the cranium, and hemisected into left and right hemispheres using a long single-edge razor blade. After sectioning off the cerebellum/brainstem from each hemisphere, the cerebral hemispheres were placed on custom molds (designed either for left or right hemispheres) and coronally sectioned into 11 roughly 5-mm-thick blocks, numbered in order rostral to caudal. All 12 blocks (with the cerebellum/brainstem considered block 12) were then sealed in Whirl-pak bags, flash-frozen in liquid nitrogen vapor, and archived in ultralow −80°C freezers. The interval between euthanasia and permanent storage of frozen tissue averaged 51 min, with an SD of 5.8. All procedures were performed in accordance with the National Institutes of Health (NIH) Guide for the Care and Use of Laboratory Animals and were approved by the Institutional Animal Care and Use Committee at the University of Puerto Rico (protocol #338300). Five macaques were included in this study (table S2). The vast majority of the data were derived from four 10-year-old macaques, which are considered middle-aged adults in this population ([226]93, [227]94). Region selection and biopsy Frozen brain blocks were placed on a dissection tray over dry ice to keep tissue frozen during biopsy collection. Individual blocks were then moved from the dry ice to a tray sitting on wet ice, allowing for tissues to be acutely warmed to the point that biopsies could be taken from targeted structures. Biopsies were made using a cutting spoon (Fine Science Tools Inc., catalog no. 10360-13). Dissected brain regions are listed in table S1, and approximate locations for biopsy are illustrated in [228]Fig. 1A. For a given structure, attempts were made to minimize inclusion of off-target surrounding tissues (e.g., white matter underlying a targeted gray matter structure). Below, we document the most common block numbers where structures were located. Because of interindividual differences and/or variation in sectioning, regions of interest were sometimes identified and dissected from adjacent blocks based on neuroanatomical landmarks. Alternate block numbers are therefore also documented below. The most anterior block sampled for this study (block 2) contained gray matter for the dorsomedial (dmPFC), ventromedial (vmPFC), dorsolateral (dlPFC), and ventrolateral prefrontal cortices (vlPFC). dmPFC and vmPFC were defined as being on the medial side of block 2. The dmPFC biopsy was pulled from the gray matter in the ~top half of the medial edge of the block. A space along the medial edge was left to separate dmPFC from vmPFC. The vmPFC biopsy was pulled from the medial ventral half of the tissue block. Biopsies of dlPFC came from the cortical tissues surrounding the dorsal lateral portion of the block that included the superior and inferior portions of the principal sulcus. Samples from vlPFC came from the ventral and lateral portion of the block. As was the case on the medial side, a portion of the cortex was left between each lateral biopsy to avoid overlap ([229]Fig. 1A). Block 3 (sometimes 4) contained biopsies for the anterior cingulate cortex (ACC), CC, and head of the CN. The biopsy for ACC was the gray matter sitting between the CC, which is ventral to ACC and the cingulate sulcus (cs), which sits dorsal to the cingulate gyrus. CC was defined as the white matter track sitting ventral to the ACC and medial to the lateral ventricle. The CN was the gray matter sitting ventrolateral to the lateral ventricle and surrounded on all other sides by white matter. The CN was the only biopsy in the second block that was scooped out of the block face to minimize inclusion of any white matter sitting anteriorly past the CN within the block ([230]Fig. 1A). Block 5 (sometimes 4 or 6) contained the amygdala (AMY), entorhinal cortex (EC), perirhinal cortex (PC), and NAc. The NAc is located ventral to the caudate, internal capsule, and putamen (Pu). Furthermore, in fresh-frozen tissue, there was a slightly darker color to the NAc. The tissue making up the NAc was scooped out of the block face. Similarly, the AMY was identified as ventral to the Pu, medial to the ventral portion of the claustrum, and dorsal to the EC. The AMY was also scooped out to minimize the inadvertent collection of neurons within the hippocampus (HIP). Last, the EC and PC were collected, the delineation between the two was the rhinal fissure ([231]Fig. 1A). Blocks 5 and 6 (sometimes 4 or 7) contained tissue that were biopsied to represent cortical regions primary motor cortex (M1), primary somatosensory cortex (S1), primary auditory cortex (A1), superior temporal cortex (STS), and IT. Subcortical structures that were biopsied included mdTN, vlTN, LGN, and HIP. The delineation between M1 and S1 was the central sulcus and was taken from the approximate central third of the lateral portion of each respective gyrus. Within a case, attempts were made to biopsy from approximately the same putative mototopic and somatotopic regions. A1 biopsies were taken from the dorsal portion of the superior temporal gyrus, which is within the lateral sulcus (ls) (i.e., inferior operculum). The gray matter forming the STS sits ventral to the superior temporal gyrus and dorsal to IT. IT was defined as the gray matter forming the lateral portion of the inferior temporal cortex. mdTN sits bilaterally on midline, within the thalamus. It is bound by ventricles dorsally, laterally by the centrolateral thalamic nucleus and ventrally by the centromedial thalamic nucleus. vlTN is bound by the centrolateral thalamic nucleus medially, body of the CN dorsally, and the reticular thalamic nucleus laterally. Biopsies for mdTN were taken from the central and central medial portions of the nucleus, while vlTN biopsies were taken from the central portion of the nucleus. In both cases, this was in an effort to avoid inclusion of other thalamic nuclei. The LGN is a six-layered structure that is easily observed on the coronal face of fresh-frozen slabs. When observed, the biopsy was scooped out. Like the LGN, the HIP was defined by its classic cytoarchitectonic features within the medial temporal lobe. For biopsies, efforts were made to not include EC, which sits ventral and ventromedial to HIP ([232]Fig. 1A). Block 7 (sometimes 6 or 8) contained tissues representing the superior posterior parietal (SPP), inferior posterior parietal (IPP), and the middle temporal visual area (MT). SPP biopsies were from the gray matter of the superior lobule. The intraparietal sulcus sits between SPP and IPP. Therefore, IPP biopsies were taken from the gray matter of the second, more lateral lobule. Last, area MT was defined by the gray matter of the insular cortex, bound on its medial edge by white matter of the extreme external capsule and laterally by the superior and inferior operculum divided by the STS ([233]Fig. 1A) ([234]95, [235]96). The final cerebral block, block 11, contained the visual cortex. Biopsies from V1 were taken from the dorsolateral surface gray matter above the external calcarine sulcus ([236]Fig. 1A). The hemisected cerebellum/brainstem block was dissected as follows. First, the cerebellum was dissected off and the cerebellar vermis (CV) was separated from the lateral cerebellar cortex (lCb). Next, the remaining brainstem was dissected such that the MB block was separated by making a cut from just behind the inferior colliculus to the top of the basilar pons. Next, the pons was separated from the medulla by making a cut from the stria medullaris (approximate center of the fourth ventricle) to the base of the pons. A final cut at the base of the fourth ventricle to separate the open medulla (MdO) from the MdC ([237]Fig. 1A). To allow for the profiling of multiple genomic modalities from the same representative cell populations, we pulverized all biopsies on dry ice to homogenize and divide tissue for downstream experiments. We followed the tissue pulverization procedures described by Domcke et al. ([238]97) to achieve a powder consistency on a sterile aluminum foil work surface. Once sufficiently pulverized, we stirred the sample thoroughly and then divided the sample using the folded edge of foil as a funnel into new 1.5-ml prechilled and prelabeled microcentrifuge tubes. Foil and tubes were set on aluminum trays or tube racks set on dry ice to keep powdered tissue frozen throughout this process. We divided samples into roughly a 2:1 ratio given the expected efficiencies/yields for snRNA-seq and snATAC-seq protocols, respectively. Pulverized tissue was stored at −80°C up until processing for downstream library preparation procedures. snRNA-seq data generation To profile single-nucleus gene expression, we performed snRNA-seq using the sci-RNA-seq3 approach ([239]19), which is the improved version of the original sci-RNA-seq protocol ([240]18). For two of the three experimental batches in our dataset, we used a protocol closely adhering to the sci-RNA-seq3 protocol described by Cao et al. ([241]19). For the third batch, we used the improved protocol (“tiny sci”) described by Martin et al. ([242]21). Sample order was randomized between the first two batches, and within the third batch, to minimize batch effects and other technical artifacts. For the first two batches, we slightly modified the protocol described by Cao et al. ([243]19, [244]20) for a different tissue type and smaller input amounts. Briefly, we added 50 μl of cell lysis buffer to pulverized tissue in a 1.5-ml microcentrifuge tube and then homogenized the tissue using 5 to 10 strokes with a disposable ribonuclease (RNase)–free plastic pestle (Fisherbrand, catalog no. 12-141-364). We then added another 950 μl of cell lysis buffer, mixed by pipette, and then transferred the suspension through a 70-μm cell strainer (pluriSelect, catalog no. 43-10070-70) into a 15-ml conical tube containing 5 ml of ice-cold 4% paraformaldehyde. Nuclei were fixed in 4% paraformaldehyde for 15 min with occasional mixing, washed once in 1-ml ice-cold nuclei wash buffer, and then suspended in 200-μl nuclei wash buffer. Nuclei were counted by mixing with 1 μM YOYO-1 iodide (Thermo Fisher Scientific, catalog no. Y3601) using a Countess II FL automated cell counter (Life Technologies), divided into tubes in 100-μl aliquots, and then flash-frozen in liquid nitrogen. For nuclei fixed with paraformaldehyde, library construction was similar to the sci-RNA-seq3 method from Cao et al. ([245]19) with minor modifications including the substitution of Quick Ligase (New England Biolabs) for 10 min at 25°C for the second index step, instead of T4 DNA ligase (NEB) for 180 min at 16°C. For tagmentation, we used N7 adaptor-loaded Tn5 from QB3 MacroLab at the University of California Berkeley in tagmentation buffer (2× TD) as previously described in Corces et al. ([246]98): 20 mM tris-HCl (pH 7.5), 10 mM MgCl[2], and 20% (vol/vol) dimethylformamide. Libraries were sequenced on a NextSeq or NovaSeq platform (Illumina) (read 1: 34 cycles, read 2: 100 cycles, index 1: 10 cycles, index 2: 10 cycles). For the [dithiobis (succinimidyl propionate)] (DSP)/methanol nuclei isolations and library construction based on Martin et al. ([247]21), we used hypotonic lysis buffer solution B (with bovine serum albumin) for small volume tiny sci-RNA-seq3 nuclei isolation methods. For sci-RNA-seq3 library construction, we loaded ~20,000 nuclei per index 1 reverse transcriptase (RT) well in a 384 RT-well experiment with mouse and human brain added as separate QC nuclei and nuclei from cell lines human embryonic kidney (HEK) 293T (RRID:CVCL_0063) and NIH/3 T3 (RRID:CVCL_0594) combined as barnyard controls per RT plate. Nuclei from all RT plates were pooled and redistributed to ligation plates for the second index as previously published; after the addition of the second index, nuclei were again repooled for their final distribution of 4000 nuclei per well before second strand synthesis, protease digestion, tagmentation, and polymerase chain reaction all on this final third index plate. snRNA-seq preprocessing snRNA-seq sequencing reads were processed into a gene-by-nucleus expression matrix of UMI counts following the methods described by Cao et al. ([248]19). We used largely an identical pipeline which, briefly, (i) converts base calls to fastq files with bcl2fastq/v.2.20 (RRID:SCR_015058) (Illumina), (ii) removes adapter sequences using Trim Galore/v.0.6.7 (RRID:SCR_011847) ([249]99), (iii) aligns trimmed reads to a reference genome with STAR/v.2.7.6 (RRID:SCR_004463) ([250]100), (iv) extracts mapped reads, (v) removes duplicates, and (vi) generates UMI counts for exonic and intronic regions of each gene, tabulated according to the unique three-level barcode design in sci-RNA-seq3. We used the rhesus macaque reference genome (Mmul_10) ([251]101) and annotation, obtained from Ensembl (version 101) (RRID:SCR_002344). We extended the 3′ untranslated region annotations of genes and transcripts by 500 bp to avoid misclassifying genic reads as intergenic. The remainder of our pipeline followed the procedures described by Cao et al. ([252]19). After generating the count matrix, we removed all nuclei with UMI counts < 100. For each sample, we imported gene-by-nucleus count matrices into the AnnData/v.0.8.0 (RRID:SCR_018209) ([253]102) framework and then ran Scrublet/v.0.2.3 (RRID:SCR_018098) ([254]103) (expected_doublet_rate = 0.05) to calculate doublet scores. We marked nuclei as doublets if they had Scrublet doublet scores > 0.20. For each sample, we additionally marked nuclei as doublets using per-sample thresholds determined by Scrublet and adjusted by eye as necessary to separate bimodal peaks visualized on the Scrublet doublet score histogram (fig. S1). To further identify potential doublet nuclei, we used an iterative clustering strategy ([255]104) implemented with Scanpy/v.1.9.1 (RRID:SCR_018139) ([256]105). First, we combined all nuclei into a single AnnData object and filtered nuclei to those with UMI ≥ 100, number of expressed genes < 2500, and a percentage of reads mapping to the mitochondrial genome < 5%. We then removed all non-autosomal genes, genes located on unplaced scaffolds, and unexpressed genes. Next, we normalized the data to the total UMI per nucleus, logarithmized the data, and subsetted the data to the 10,000 most variable genes. For each cell, we regressed out total UMI counts per nucleus and then mean-centered and scaled the data. The dimensionality of the data was then reduced by principal components analysis (PCA) (50 components). To further reduce the dimensionality, we ran a UMAP (using umap-learn/v.​​0.5.2) (RRID:SCR_018217) analysis ([257]106) with BBKNN/v.1.5.1 (RRID:SCR_022807) ([258]107) to simultaneously correct for batch differences. For the BBKNN integration, we set neighbors_within_batch = 10 (given three batches, tantamount to UMAP n_neighbors = 30), used the cosine distance metric, and used the PyNNDescent/v.0.5.6 algorithm (RRID:SCR_022806) ([259]108). We then ran UMAP using the settings min_dist = 0, spread = 1.0, and n_components = 10 to facilitate clustering ([260]https://umap-learn.readthedocs.io/en/latest/clustering.html). For data visualization only (not clustering), we ran a similar BBKNN/UMAP pipeline with neighbors_within_batch = 5 (for three batches, tantamount to UMAP n_neighbors = 15), min_dist = 0.25, spread = 1.0, and n_components = 2. To cluster the data, we exported and imported the 10-dimensional UMAP matrix into Monocle3/v.1.2.9 (RRID:SCR_018685) ([261]19) in R/v.4.0.2 (RRID:SCR_001905) ([262]109) and then implemented the Leiden-clustering workflow in Monocle3 with a relatively high-resolution setting (resolution = 1 × 10^−4). For each cluster, we then calculated the mean Scrublet doublet score and marked all clusters with a mean Scrublet doublet score > 0.15 as doublet clusters (fig. S1). After identifying doublets as described above, we removed all marked doublets and repeated the normalization, dimensionality reduction, and clustering procedures almost exactly as described above, with the only difference being a coarser cluster resolution setting in Monocle3 (resolution = 1 × 10​​^−5). We confirmed adequate removal of doublet cells by observing the clean separation of distinct cell types and the absence of clusters expressing obviously ambiguous marker gene profiles (fig. S1). Removal of sci-RNA-seq cell contamination During the course of cell type identification (see the following section), we observed the presence of two distinct clusters of cells (fig. S2A) with expression profiles resembling embryonic progenitors (markers genes, unknown cluster 1: ASPM, CENPE, CENPF, MKI67; unknown cluster 2: COL1A1, COL1A2, FN1, VIM), an unusual finding in adult primate brain samples. Because these were present in relatively large proportions in some samples (~25%)—but at low levels overall (2.2%)—and because our sci-RNA-seq experiments included control samples of exogenous (i.e., non-macaque brain) origin (specifically, a fetal mouse brain positive control and a “barnyard” sample consisting of mixed human HEK293T and mouse NIH/3 T3 cells), we tested for the presence of contaminating nuclei of exogenous origin. We identified and removed contaminating cells as follows. Because the only non-macaque samples included in all experiments were the control samples of either human or mouse origin, we used BBSplit/v.38.38 (RRID:SCR_016965) ([263]110) to assign reads to the macaque, human, or mouse genomes. BBSplit is a competitive aligner that maps to several references simultaneously, assigning reads to the