Abstract Neuronal senescence (i.e., neurescence) is an important hallmark of aging and neurodegeneration, but it remains poorly characterized in the human brain due to the lack of reliable markers. This study aimed to identify neurescence markers based on single-nucleus transcriptome data from postmortem human prefrontal cortex. Using an eigengene approach, we integrated three gene panels: (a) SenMayo, (b) canonical senescence pathway (CSP), and (c) senescence initiating pathway (SIP), to identify neurescence signatures. We found that paired markers outperform single markers; for instance, by combining CDKN2D and ETS2 in a decision tree, a high accuracy of 99% and perfect specificity (100%) were achieved in distinguishing senescent neurons (i.e, neurescent). Differential expression analyses identified 324 genes that are overexpressed in neurescent. These genes showed significant associations with important neurodegeneration-related pathways, including Alzheimer’s disease, Parkinson’s disease, and Huntington’s disease. Interestingly, several of these overexpressed genes are linked to mitochondrial dysfunction and cytoskeletal dysregulation. These findings provide valuable insights into the complexities of neurescence, emphasizing the need for further exploration of histologically viable markers and validation in broader datasets. Subject terms: Senescence, Neurological disorders, Dementia, Alzheimer's disease, Biomarkers Introduction Cellular senescence is a complex, multi-step process characterized by stable cell cycle arrest^[34]1, changes in cell morphology^[35]2,[36]3, changes in gene expression^[37]4–[38]8, and a pro-inflammatory secretory phenotype^[39]9,[40]10. Senescence was first identified in fibroblasts^[41]11, is heterogeneous in tissues throughout the body^[42]12–[43]14, and has been studied in the context of the human brain^[44]15,[45]16. Increasing evidence implicates cellular senescence in brain aging and links it to neurodegenerative disorders^[46]17,[47]18, including Alzheimer’s disease (AD)^[48]19–[49]22. The presence of senescent cells triggers a pro-inflammatory environment^[50]23 and contributes to neuron loss, tissue dysfunction, and cognitive impairment in animal models of AD pathologies, including amyloid plaques^[51]24,[52]25 and tau-containing neurofibrillary tangles^[53]18,[54]26. Despite their significance, defining and identifying senescent cells in the human brain remains challenging due to their heterogeneous nature^[55]27. Although reliable markers have been utilized to identify senescent cells in specific tissues such as adipose tissue, retinal endothelial cells, and fibroblasts^[56]12,[57]28–[58]33, a universally applicable set of senescence markers across diverse tissues remains poorly defined. This challenge is especially pronounced in neurons, where historically defined senescence markers have not been robustly validated^[59]34. This knowledge gap has limited our understanding of when senescent cells first appear in the adult brain, inferring their contribution to neurodegenerative disease pathophysiology and our ability to develop potential therapies to modulate or remove senescent cells from the human brain^[60]21,[61]35. One recent attempt to generate a comprehensive list of senescent markers resulted in the SenMayo gene list^[62]6, which includes 125 genes that are highly correlated with age and the expression level of p16 and p21, cyclin-dependent kinase inhibitors that are upregulated in many senescent cells. The majority of genes in the SenMayo panel are senescence-associated secretory phenotype (SAPS) factors. However, cellular senescence has other aspects, including activation of senescent cell anti-apoptotic pathways (SCAPs)^[63]36, organelle dysfunction, and morphology changes. To provide a comprehensive and multidimensional perspective for neuronal senescence (neurescence), we employed an eigengene approach based on (a) the SenMayo gene list, in addition to our two previously published lists^[64]16, including (b) canonical senescence pathway (CSP) with 22 genes, which reflect cell cycle arrest^[65]1,[66]37,[67]38, and (c) senescence initiating pathway (SIP) panel with 48 genes, which are upregulated in early senescence and activate SCAPs. Technically, an eigengene is computed as a weighted average expression of all genes in a given list^[68]39,[69]40 derived using principal component analysis on those particular genes. In this study, we computed an eigengene^[70]40 for each of the SenMayo, CSP, and SIP gene lists and used all these eigengenes for three main analyses: (1) identifying senescent and non-senescent cells, (2) identifying the most accurate markers for neuronal senescence, referred to as neurescence^[71]34, and (3) identifying differentially expressed genes in senescent vs non-senescent neurons. Results We used four independent single-nucleus RNA sequencing (snRNA-seq) datasets of the dorsal prefrontal cortex from postmortem human brains were used, which are here referred to as Mathys 2019^[72]41 (n = 80,000), Zhou 2020^[73]42 (n = 70,000), Xiong 2023^[74]43 (n = 400,000), and Mathys 2024^[75]44 (n = 255,000), respectively (Table [76]1). Mathys 2019 served as the discovery dataset to identify markers and the other three datasets were used to validate the performance of the identified markers. Table 1. Number of identified senescent neurons in the discovery dataset (Mathys 2019), and the three external validation datasets: Zhou 2020, Xiong 2023, and Mathys 2024 Cohorts Mathys 2019 Zhou 2020 Xiong 2023 Mathys 2024 Number of brain samples 48 32 92 48 Number of samples with AD pathology 24 19 44 26 Median ± sd of age 89.5 ± 4.4 89.3 ± 6.2 87.5 ± 3.8 87.5 ± 5 Total number of cells (PFC) 80,000 70,000 400,000 255,000 Senescent neurons (CSP+, SIP+, & SenMayo+) 475 255 124 1178 Non-Senescent neurons (CSP^-, SIP^-, & SenMayo^-) 16,871 4003 95,144 37,061 Total excitatory neurons 34,976 12,590 173,246 112,143 Total inhibitory neurons 9196 3727 61,056 40,290 [77]Open in a new tab Performing the eigengene approach^[78]16 on the Mathys 2019 discovery dataset, we identified 475 senescent and 16,871 non-senescent neurons (Table [79]1). Specifically, the identified 475 senescent neurons (neurescent cells) expressed each of the three CSP, SIP, and SenMayo eigengenes more than the corresponding mean plus three times the standard deviation (mean + 3 s.d.). In contrast, if a neuron expressed each of these eigengenes less than the corresponding mean, then that neuron was considered non-senescent (N = 16,871). The remaining 53,288 borderline neurons were excluded from our analysis. We used the identified neurescent and non-senescent neurons to train decision trees and to perform differential expression (DE) analysis. Of note, only excitatory neurons displayed consistent expression across all three senescence eigengenes, and subsequent differential expression analysis and marker identification were restricted exclusively to excitatory neurons. To further assess the robustness of our method and the reproducibility of the eigengene-derived senescence markers, we conducted a reciprocal validation analysis. We used Mathys 2024 as the discovery dataset to independently derive the eigengene weights for CSP, SIP, and SenMayo, and then applied these derived weights to Mathys 2019 as a validation dataset. Comparing the weights from the two independent discovery datasets (Mathys 2019 vs. Mathys 2024) (Supplementary Table [80]1) revealed high concordance: specifically, correlations of 94% for CSP weights, 88% for SIP weights, and 88% for SenMayo weights, respectively (Supplementary Fig. [81]3). Furthermore, when using the Mathys 2024-derived eigengene weights to identify senescent cells in the Mathys 2019 dataset, we identified 413 senescent neurons, closely matching the original 475 neurons that were identified using Mathys 2019 as discovery. Notably, 370 neurons were identified as senescent in both approaches, representing a highly significant overlap (hypergeometric test p value [MATH: <10 1714 :MATH] ). Collectively, these results demonstrate the robust consistency and reproducibility of our eigengene-based senescence identification framework, regardless of the initial discovery dataset selection. Identification of specific markers for neurons among genes generally associated with senescence To identify potential markers for neurescence, we fitted relatively small decision trees^[82]45 to the discovery dataset. To ensure the selected genes were specific, each tree was allowed to use no more than two genes from the pool of gene sets known to be associated with cellular senescence^[83]6,[84]16, as indicated on the third row of Table [85]2. Table 2. The top-selected genes for the identification of senescent neurons based on different criteria Column number 1 2 3 4 5 6 7 Criteria for positive neurons CSP^+ SIP^+ SenMayo^+ CSP^+ & SIP^+ & SenMayo^+ CSP^+ & SIP^+ & SenMayo^+ Gene set (number of genes) CSP (22) SIP (44) SenMayo (116) CSP, SIP, SenMayo (180) CSP, SIP, SenMayo, DE (499) Excluded gene None None None None Top gene removed*** None Top gene ‍removed Number of genes in the tree Original count data 1 CDKN2D (96%)* SOD1 (98%) HMGB1 (97%) MAP2K1 (99%) CDKN2D (98%) DPYSL2 (99%) CALM3 (99%) 2 CDKN2, RB1 (99%) SOD1, GSK3B (99%) HMGB1, ETS2 (98%) MAP2K1, CDKN2D (99%) CDKN2D, HMGB1 (99%) DPYSL2, ATP6V1H (99%) CALM3, SH3GL2 (99%) Binary data 1 CDKN2D (96%) HRAS (95%) Not found Not found Not found Not found Not found 2** CDKN2, RB1 (99%) GAAD4, HRAS, MAPK14 (97%) ETS2, CTSB (98%) CDKN2D, ETS2 (99%) E2F3, RB1 (99%) UQCRHL, ETS2 (99%) CDKN2D, RB1(99%) [86]Open in a new tab *The accuracy of the corresponding tree is written in parentheses. **When no tree with two markers were found, a tree with three markers is shown. ***If no single gene was found, both selected markers would be removed in the elimination analysis. Using the original counts to quantify gene expression levels, the best marker among the 180 senescence-associated genes was MAP2K1, leading to a single-gene tree with the highest accuracy of 99%, a sensitivity of 80%, and a specificity of 99% (Table [87]2, column 4). Adding CDKN2D as the second marker increased sensitivity to 93% with a negligible (i.e., <0.1%) effect on accuracy. The corresponding decision tree, which was based on MAP2K1 and CDKN2D, had the best accuracy among all of our trees that could use two of the 180 senescence-associated genes. This suggested that these two genes could serve as effective markers and complement each other in identifying neurescence. In the corresponding tree, the thresholds for the expression levels of these genes were 2 and 1, respectively (Fig. [88]1a), suggesting relatively low expression of these putative marker genes. These thresholds are undesirable for histology markers because distinguishing between levels of expression under the microscope is practically challenging. One would prefer markers that are totally absent in negative cells and ideally, have more than one transcript per cell in senescent neurons to ensure the observable signal is above background. Fig. 1. The decision tree analysis based on CSP, SIP, and SenMayo identified senescence markers in neurons. [89]Fig. 1 [90]Open in a new tab The trees show the classification of neurons into senescent (red) and non-senescent (blue). Each node represents a decision point based on a gene expression levels of the original counts and b the binary transformed expression. This binary tree reached zero false positive and thus a specificity of 1. To provide greater confidence in these lowly expressed marker genes, we repeated our analysis using a binary transformation of the discovery data, where any expression level above zero was converted to 1. Using these binary values, no single gene was found to accurately classify the senescent and non-senescent cells. This suggests that there may not be a single marker in our gene lists specifically expressed in neurescence. Interestingly, the best tree with two genes still included CDKN2D, but MAP2K1 was replaced with ETS2 (Fig. [91]1b, last row of Table [92]2). The binary transformation did not change the accuracy but dropped the sensitivity from 93 to 83%, while increasing the specificity to 100%. After removing CDKN2D and ETS2 from the analysis, E2F3 and RB1 were selected in the second-best tree (Table [93]2, column 5), leading to an accuracy of 99%, specificity of 100%, but a relatively low sensitivity of 67%. Differentially expressed genes To investigate differentially expressed genes in neurescent compared to non-senescent, we performed a DE analysis on the Mathys 2019 discovery dataset. Taking an agnostic approach, we included all 10,768 genes that had non-negligible expression in neurons of the discovery dataset (Methods). We employed two state-of-the-art methods for DE analysis of scRNA-Seq data. The first method, MAST^[94]46, resulted in 375 differentially expressed genes, whereas using the second method, SigEMD^[95]47, we identified 576 differentially expressed genes (Supplementary Table [96]2). The two analyses shared 324 differentially expressed genes (Fig. [97]2 and Supplementary Table [98]2), which represented a significant overlap (p value [MATH: <10 205 :MATH] , hypergeometric test). These 324 genes were overexpressed in the 475 neurescent cells compared to other neurons (Fig. [99]2). In this study, we primarily focused on these 324 genes because they were identified by both DE analysis methods. Fig. 2. The heatmap of differentially expressed genes was identified by comparing senescent versus non-senescent neurons. Fig. 2 [100]Open in a new tab The heatmap shows the expression profile of 324 genes (columns) in the 17,346 cells (rows) in the discovery dataset. These genes were consistently identified as differentially expressed by both MAST and SigEMD methods. Pathway analysis To understand the functional significance of the 324 differentially expressed genes, we performed a pathway analysis using Kyoto Encyclopedia of Genes and Genomes (KEGG)^[101]48 and identified 18 significantly enriched pathways (p value <0.05, Fig. [102]3). Interestingly, the top pathways were related to neurodegeneration including Parkinson disease^[103]49,[104]50, Amyotrophic lateral sclerosis^[105]51, Alzheimer’s disease^[106]52,[107]53, and Huntington’s disease^[108]54, suggesting common underlying biological mechanisms that might be associated with neuronal senescence. Fig. 3. Pathways overrepresented by genes that are differentially expressed in senescent neurons. [109]Fig. 3 [110]Open in a new tab The bubble chart displays the significantly enriched pathways identified from the pathway analysis of 324 differentially expressed genes. Each bubble represents a pathway, with the size indicating the number of DE genes in that pathway and the color denoting the adjusted p value for overlap significance. The top five enriched pathways shared 16 DE genes, which is a significant overlap (p value < [MATH: 108 :MATH] , hypergeometric test, Fig. [111]4). These genes included COX6C, COX7A2L, COX7C, CYCS, KIF5A, NDUFA4, NDUFS1, PSMA7, TUBA4A, TUBB2A, TUBB4A, TUBB4B, UQCRB, UQCRC2, UQCRH, and UQCRHL. These 16 genes are mainly involved in mitochondrial function (COX, NDUF families)^[112]55,[113]56 and cytoskeletal structure (TUBB family)^[114]57,[115]58. Mitochondrial dysfunction leads to impaired energy metabolism and increased production of reactive oxygen species^[116]59. It is a hallmark of both cellular senescence^[117]60 and AD^[118]61. The deregulation of these genes supports prior work demonstrating mitochondrial dysfunction in neurescence, and their role in neurodegeneration^[119]18. Fig. 4. Number of DE genes in the top enriched pathways. [120]Fig. 4 [121]Open in a new tab The number of DE genes in each pathway is listed on the right. The UpSet^[122]62 plot shows the sizes of the overlaps among DE genes and the top five significantly enriched pathways, named on rows. In particular, the last column shows that 16 DE genes are members of all these five pathways, which is a significant overlap (p value [MATH: <10 8 :MATH] ). Identification of specific markers for senescent neurons among differentially expressed genes We hypothesized that there might be some specific markers for neurescence beyond the genes generally known to be associated with senescence (e.g., CSP, SIP, and SenMayo). To test this hypothesis, we extended our decision tree analysis to include the 324 differentially expressed genes identified based on the discovery dataset. Using the original count data, the top gene was DPYSL2 resulting in a relatively accurate classification (i.e., accuracy: 99%, sensitivity: 95%, and specificity: 99%) and when the gene ATP6V1H was added to make a two-gene tree, sensitivity increased to 99% (Table [123]2, column 6). However, for these trees to be useful, meaning that they identify molecules that can serve as biomarkers in histological assays, one must be able to determine whether these genes were expressed at relatively high levels in a neuron (i.e., above 4 and 3, respectively) (Fig. [124]5a). To address this practical issue, we used binary expression levels leading to selection of UQCRHL together with ETS2 (Fig. [125]5b). The recurrent selection of ETS2 (Table [126]2, columns 4 and 6) highlighted its importance even when additional differentially expressed genes were added to the analysis. Since the inclusion of UQCRHL led to very low sensitivity in the validation cohorts (Table [127]3), we considered removing this gene. Excluding UQCRHL and ETS2 led to the selection of CDKN2D and RB1 again, and increased the specificity of senescence classification to 100% in the discovery dataset. Overall, adding differentially expressed genes did not seem to be helpful, as the only reasonable tree in columns 6 and 7 was based on CDKN2D and RB1, which were already in our CSP gene lists. This observation falsified the hypothesis that additional differentially expressed genes could identify specific markers for senescent neurons. Fig. 5. Decision tree analysis based on CSP, SIP, SenMayo, and DE genes identified novel senescence markers in neurons. [128]Fig. 5 [129]Open in a new tab The trees show the classification of neurons into senescent (red) and non-senescent (blue) using a pair of selected markers from a pool of differentially expressed genes merged with CSP, SIP, and SenMayo gene lists. Each node represents a decision point based on a gene expression threshold of original counts and b binary transformed expression. Table 3. Validation performance of the top-performing two-gene decision trees that were trained using the discovery dataset (Mathys 2019) Removed Selected Accuracy Specificity Sensitivity Mathys 2019 Zhou 2020 Xiong 2023 Mathys 2024 Mathys 2019 Zhou 2020 Xiong 2023 Mathys 2024 Mathys 2019 Zhou 2020 Xiong 2023 Mathys 2024 - UQCRHL, ETS2 99% 94% 99% 98% 99% 99% 99% 99% 84% 3% 18% 45% ETS2 UQCRHL, CDKN2D 99% 94% 99% 97% 99% 99% 99% 99% 83% 3% 16% 46% UQCRHL CDKN2D, ETS2 99% 98% 99% 99% 100% 100% 100% 99% 83% 77% 63% 89% UQCRHL, ETS2 CDKN2D, RB1 99% 97% 99% 97% 100% 100% 99% 97% 81% 62% 72% 92% UQCRHL, ETS2, CDKN2D E2F3, RB1 99% 96% 98% 94% 100% 100% 98% 94% 67% 43% 67% 95% UQCRHL, ETS2, RB1 CDKN2D, CDK6 99% 96% 99% 98% 100% 100% 99% 98% 77% 49% 56% 90% UQCRHL, ETS2, RB1, CDK6 CDKN2D, ATM 99% 97% 99% 97% 100% 100% 99% 97% 70% 65% 66% 91% UQCRHL, ETS2, RB1, CDK6, CDKN2D E2F3, RBL2 98% 96% 99% 96% 100% 100% 99% 96% 58% 41% 54% 93% UQCRHL, ETS2, RB1, CDK6, ATM CDKN2D, RBL2 99% 97% 99% 98% 100% 100% 99% 98% 73% 65% 58% 90% UQCRHL, ETS2, RB1, CDK6, ATM, RBL2 CDKN2D, ATP6V1D 99% 97% 99% 98% 99% 99% 99% 98% 88% 68% 51% 88% UQCRHL, ETS2, RB1, CDK6, ATM, RBL2, ATP6V1D CDKN2D, E2F3 99% 96% 99% 98% 100% 100% 100% 99% 66% 44% 54% 88% UQCRHL, ETS2, RB1, CDK6, ATM, RBL2, ATP6V1D, E2F3 CDKN2D, CES4A 99% 97% 99% 98% 99% 99% 99% 98% 82% 69% 37% 84% [130]Open in a new tab Each model was evaluated based on three independent validation datasets: Zhou 2020, Xiong 2023, and Mathys 2024, using the same eigengene thresholds identified in the discovery cohort. Accuracy, sensitivity, and specificity were calculated by comparing the model’s predictions to ground-truth labels derived from eigengene expression thresholds. On each row, the genes in the “Removed” column are excluded, leading to the selection of the pair of genes in the “Selected” column. Identification of alternative markers for senescent neurons The relatively low expression of the selected markers, combined with historical challenges in developing antibodies specific to senescence markers, necessitates that in silico studies identify multiple candidate molecules for subsequent in situ validation. We performed a systematic elimination analysis. We pooled all genes from CSP, SIP, and SenMayo lists with DE genes and fitted decision trees on the binary expression values in the discovery dataset, Mathys 2019 (Table [131]3). Our systematic approach involved iteratively removing the top-performing markers to evaluate their impact on model performance and to identify the next best set of markers. Including UQCRHL in decision trees led to poor sensitivity in the validation datasets. Thus, we excluded UQCRHL and found CDKN2D and ETS2 to be the second-best pair. If reagents to detect ETS2 are unavailable or lack specificity, then CDKN2D and RB1 would be the next best pair of markers for neurescence. Our results showed that removing CDKN2D generally led to decreased accuracy, sensitivity, and specificity in validation datasets. This drop in model performance highlights the critical role CDKN2D plays in the classification. However, it is important to note that the currently available antibodies for p19, the protein product of CDKN2D, may lack specificity, which could complicate their use in histological applications. Discussion Identifying senescent neurons in the human brain is challenging due to the inherent heterogeneity of senescence, neuronal subtypes, and the current lack of reliable markers specific to neurescence in the human brain^[132]62. To tackle this challenge, we systematically and unbiasedly analyzed scRNA-seq data from four datasets of the dorsal prefrontal cortex from postmortem human brains. Our eigengene-based method allowed us to focus on three key gene lists: SenMayo, CSP, and SIP, each reflecting distinct aspects of cellular senescence. While the recently published SenMayo list is a valuable resource, it primarily captures the inflammatory aspects of senescence. The majority of genes in this list are associated with senescence-associated secretory phenotype, and other important characteristics of neural senescence may not be represented by SenMayo. Inflammation is a critical component, particularly in the brain, where it has been linked to neurodegeneration^[133]63,[134]64. The accumulation of senescent cells can lead to a persistent pro-inflammatory state, driven by mitochondrial dysfunction^[135]65,[136]66, and concomitant elevated levels of actors like IL-6, IL-1β, and TNF-α, which are characteristic of SASP^[137]67–[138]69. This chronic inflammation can impair tissue homeostasis and contribute to neuronal dysfunction^[139]70. However, inflammation alone does not fully capture the multifaceted nature of cellular senescence, where other mechanisms such as cell cycle arrest, DNA damage, mitochondrial dysfunction, and metabolic alterations also play critical roles^[140]62,[141]71,[142]72. Moreover, inflammation can occur independently of senescence. This inherent limitation of the SenMayo list underscores the need for a more comprehensive approach. Accordingly, in this study, we included CSP and SIP gene panels to capture other important senescence features such as cell cycle arrest and early stress responses. Our approach leveraged eigengenes derived from these three panels to classify neurons into senescent and non-senescent groups. Our integrative approach using multiple senescence-associated gene panels aims to improve specificity by capturing their combined activity rather than relying on any single pathway. Our findings demonstrated that a single marker gene is insufficient for accurately classifying neurescence. Decision tree analysis, incorporating multiple senescence panels and both continuous and binary expression data, revealed that combining markers significantly improves classification performance. For example, using ETS2 with CDKN2D enhanced model sensitivity and specificity significantly (Table [143]3 and Supplementary Fig. [144]2), highlighting the complex signature of neurescence that cannot be captured by a single gene. While finding reliable markers for neurescence remains a difficult task, our paired-marker strategy, particularly using genes like CDKN2D and ETS2, can offer a promising direction for future studies on neurescence. As expected, pathways associated with neurodegeneration are upregulated in neurescence. Interestingly, we identified 16 DE genes that are overexpressed in all of these upregulated pathways. These 16 genes are related to mitochondrial function and cytoskeletal structure. While the association between mitochondrial function and neurodegeneration has been known^[145]59,[146]73–[147]79, our contribution is to show that these pathways are enriched specifically in neurescence. Follow-up studies to elucidate the biological mechanisms linking these genes and mitochondrial function and neurodegeneration could facilitate the discovery of novel therapeutic targets for AD. Our findings can help such studies focus on mitigating the impact of cellular senescence on neurodegeneration. This study provides promising markers for identifying neurescence, albeit some limitations need attention. Using the original counts to fit decision trees led to non-zero thresholds, which are undesirable for histology markers because distinguishing between levels of expression under the microscope is practically challenging. One would prefer markers that are totally absent in negative cells and ideally, have more than one transcript per cell in senescent neurons to ensure the observable signal is above background. Therefore, we used the binary values. Positive signals using antibodies against cyclin-dependent kinase inhibitors associated with senescence have historically been unreliable^[148]80. CDKN2D, cyclin-dependent kinase 4 inhibitor, encodes p19^INK4D, which we found to be elevated in NFT-bearing neurons in a prior study^[149]16. However, that study revealed that not all neurescent cells expressed p19^INK4D, and not all p19^INK4D-positive cells were neurescent. We hypothesize that co-staining with antibodies against p19^INK4D and ETS2 may increase the specificity of neurescence identification, but this remains to be experimentally determined. Another limitation working with snRNA-seq data lies in the inherent variability among the datasets on hand, despite all being derived from the same region of postmortem human brains. For instance, the distribution of total expression of senescence markers per cell varies considerably across the four datasets (Fig. [150]6). In particular, Mathys 2019 and Mathys 2024 display a more even distribution of gene expression across different levels, while Zhou 2020 and Xiong 2023 exhibit a more skewed distribution towards sparser expression. These differences highlight the challenge of comparing datasets generated by different research groups and under different experimental conditions. The observed variability may stem from technical error, different sample handling procedures, instruments, or biological factors such as the heterogeneity of sampled brain regions and variable progression of neurodegenerative diseases in donors. Also, while our study focused on neuronal senescence, it is important to note that other brain cell types, such as astrocytes, microglia, and oligodendrocyte-lineage cells, have also been reported to exhibit senescent-like phenotypes^[151]81–[152]83. Given the likely variation in senescence marker profiles across these cell types, future studies applying similar approaches to characterize non-neuronal senescence will be essential for the development of targeted senolytics in the brain. Addressing these challenges could pave the way for expanding this study to include more diverse datasets, including those from earlier disease stages and other brain regions. Such an approach would help validate the robustness of the markers identified here. Fig. 6. Distribution of total expression of 499 senescence marker genes. Fig. 6 [153]Open in a new tab The plot illustrates the inherent variability in gene expression distributions among datasets derived from postmortem human brain samples (Table [154]1). The x-axis represents the sum of expression values of all marker genes in each cell. The x-axis is grouped into bins of size 1000. While Mathys 2019 (green) and Mathys 2024 (red) exhibit a more even distribution of gene expression, Zhou 2020 (purple) and Xiong 2023 (blue) display skewed distributions towards sparser expression. Methods The snRNA-Seq datasets We used four snRNA-Seq datasets generated by Mathys et al. ^[155]41, Zhou et al. ^[156]42, Xiong et al. ^[157]43, and Mathys et al. ^[158]44, which are accessible through the Accelerating Medicines Partnership–AD (AMP–AD)^[159]84 with synapse IDs syn18485175, syn2112584, syn52293424, and syn52293442, respectively. The former dataset, referred to as Mathys 2019, was used in all analyses in this study as the discovery (i.e., train) dataset. The latter, referred to as Zhou 2020, Xiong 2023, and Mathys 2024, respectively, were used to validate the robustness and sensitivity of the proposed markers and our decision tree models. All samples were originally generated by longitudinal clinical-pathologic cohort studies of aging and Alzheimer’s disease (AD) from the Religious Order Study (ROS) and the Rush Memory and Aging Project (MAP)^[160]85. We employed the Synapser ([161]https://r-docs.synapse.org/articles/synapser.html) R package^[162]86 (Version 0.6.61) and a custom R script (Version 4.4.1) to download the four snRNA-seq datasets^[163]41–[164]44. We downloaded clinical data from the corresponding publication pages. The snRNA-seq datasets included approximately 80,000, 70,000, 400,000, and 1.3 million single nuclei samples, with median postmortem interval (PMI) of 7, 6, 6, and 6 h, respectively. We included only excitatory and inhibitory neurons in our analysis (Table [165]1). The first three datasets used samples from the dorsal prefrontal cortex of 48, 32, and 92 postmortem human brains, respectively. The Mathys 2024 dataset expanded the scope to include samples from the entorhinal cortex (EC), hippocampus (HC), anterior thalamus (TH), angular gyrus (AG), midtemporal cortex (MT), and prefrontal cortex (PFC) regions of the same brain across 48 postmortem human samples. In this study, we used only the PFC samples, which included approximately 255,000 single-nuclei transcriptomes. Eigengene analysis An eigengene for a given set is the first principal component (PCA)^[166]87, which is the weighted average of the expression of all genes in the set. For each of the three SenMayo, CSP, and SIP gene lists, we used the compute.pigengene() function from the Pigengene package (Version 1.30.0)^[167]40 to compute an eigengene. We addressed the challenge of cell type imbalance by implementing weighted PCA using the weight.pca() function from the Pigengene package to ensure that each cell type contributed equitably to the analysis. Specifically, each cell was weighted by dividing the total number of cells by the frequency of the corresponding cell type. This approach assigns higher weights to rarer cell types and lower weights to more abundant ones, thus balancing their influence in the PCA. We used the project.eigen() function from the Pigengene package to infer eigengene values in the validation datasets. This function computes each eigengene in the validation dataset using the same weights learned from the discovery dataset. After the weighted average is computed, the inferred eigengene was normalized to have the same Euclidean norm as the original eigengene. Cell labeling Three eigengenes were computed based on three independent gene sets associated with senescence: (1) Canonical Senescence Pathway (CSP)^[168]16, (2) Senescence Initiating Pathway (SIP)^[169]16, and (3) SenMayo^[170]6, which consisted of 22, 48, and 125 genes, respectively. The actual number of genes contributed to our eigengene analysis was slightly lower because the discovery dataset included only 22, 44, and 116 of these genes, respectively. We calculated the mean and standard deviation of each eigengene based on the empirical distribution in the discovery dataset. Any cell expressing an eigengene more than mean plus three standard deviations (i.e., mean + 3 s.d.) was considered “overexpressing” the eigengene, and was labeled as CSP^+, SIP^+, or SenMayo^+, depending on their respective gene set. In contrast, any cell expressing the eigengene below the mean was labeled as CSP^-, SIP^-, and SenMayo^-, respectively. Cells overexpressing all three eigengenes were labeled as “senescent”, indicating a consensus across the three gene sets. In contrast, cells were labeled “non-senescent” when they expressed all three eigengenes less than the corresponding means (Table [171]1). Other cells that did not meet either of these two criteria were considered borderline and excluded from our analysis. We used the phyper() function from the stats R package (Version. 4.4.1)^[172]86 to perform a hypergeometric test with the null hypothesis that the number of senescent cells observed in a cell type is more than what would be expected at random. Differential expression analysis Our DE analysis was based on the senescent neurons that were overexpressing all three eigengenes (above mean + 3 s.d.) compared to the non-senescent neurons (below mean) (Table [173]1). We also filtered out 7158 genes that did not have the minimum expression of 1 in at least 200 neurons. We normalized the data by multiplying all nuclei counts by the total library size of 1 million and transformed it to logarithmic space in base 2 (log[2]). Given the significant imbalance between the groups, the smaller senescent class was upsampled by repeating each neuron 36 times so that the number of non-senescent and senescent neurons appeared roughly equal for the DE analysis. We performed the differential expression analysis for senescent and non-senescent neurons, using two popular methods developed for scRna-seq, and the overlapped genes were chosen as the final differentially expressed genes. The first method was carried out using the MAST package (Version 1.30.0)^[174]46,[175]88, which implemented a hurdle model^[176]88 for analyzing scRNA-seq data that consists of a two-part generalized linear model. Considering the bimodality characteristic in single-cell expression data, MAST jointly models the positive mean expression (continuous) and the rates of expression (discrete) values. In this method, genes with a false discovery rate (i.e., adjusted p value) less than 0.01 and an absolute value of log[2] fold change above 6, were declared as differentially expressed genes. The second method, SigEMD (Version 0.21.1)^[177]47,[178]89, which is a custom R script that uses the nonparametric Earth Mover’s Distance (EMD)^[179]90. EMD is a special case of the Wasserstein metric^[180]91, and measures the distance between gene expression distributions. Accordingly, predefined adjusted p values under 0.01 and an EMD score more than 30 were set to identify differentially expressed genes in senescent vs non-senescent neurons. A heatmap of differentially expressed genes was generated using the pheatmap.type() function from the Pigengene R package^[181]40,[182]92. Decision trees Our primary objective in fitting decision trees to the discovery dataset was to identify the most predictive markers. We used the C50 package (Version 0.1.8)^[183]93 in R to construct multiple decision trees based on different criteria. These trees differ in two ways: (a) the eigengenes that served as the basis for neuron labeling and (b) the selection of genes used as features in each tree (Table [184]2). To prevent overfitting, we set the parameter minCases of the function C5.0() to relatively large values to control the number of genes that contribute to a decision tree. This parameter specifies the minimum number of samples required in at least two splits of the tree, thus a larger minCases value results in a smaller decision tree. In this way, none of our trees was allowed to select more than two genes from the pool of genes available to it. We assessed whether the marker genes selected by the decision tree trained on the discovery dataset could accurately classify neurons in the validation datasets. The predict() function was applied to the validation datasets to derive performance metrics of the decision trees fitted to the discovery dataset. This function predicted the senescence status of each neuron based on the trees fitted to the discovery dataset and the expression levels in the validation datasets. Particularly, the same cutoff values on gene expression values that were learned from the discovery dataset were used for the validation dataset. The predictions by decision trees, which used a few single genes, were then compared to the eigengene-based senescence labels. This comparison was done in each of the validation sets to compute accuracy, sensitivity, and specificity^[185]94 of the trees (Table [186]3). Specifically, the neurons that were identified as senescent by both a decision tree and eigengene approaches were considered true positives. In contrast, true negatives were the neurons that were not senescent based on both approaches. In each dataset, (a) accuracy was defined as the number of true positives over the total number of neurons in the dataset, (b) sensitivity was defined as the number of true positives over the number of neurescents based on our eigengene approach, and (c) specificity was the ratio of true negatives over the total number of neurons that were not senescent based on our eigengene approach. The trees varied in the labels used to train them and also in the pool of genes they could select from. Initially, for each of the CSP, SIP, and SenMayo gene lists, we calculated their corresponding eigengenes, along with the positive and negative labels. Note that the resulting labelings could differ across gene lists. Using these three eigengenes and their corresponding gene lists, we trained three independent decision trees (Table [187]2, columns 1–3). Each tree specifically represented the unique characteristics derived from one of the gene lists. We trained a fourth decision tree using the senescent and non-senescent neuron labels, which were unanimously defined based on all three eigengenes (Methods, Cell labeling) (Table [188]1). The fourth tree was allowed to use any of the 180 unique genes from the pooled set of all the genes in the CSP, SIP, and SenMayo lists (Table [189]2, columns 4 and 5). To train the fifth tree, we used the same senescent and non-senescent labels utilized in the fourth tree, but expanded the gene pool to include the 324 differentially expressed genes in addition to the previously combined CSP, SIP, and SenMayo gene lists. This led to including a total of 499 unique genes for training the fifth tree (Table [190]2, columns 6 and 7). Furthermore, our methodology involved training the decision trees twice: first using the original counts directly from the expression matrix that quantified expression levels, and secondly by converting the expression levels into binary values, where any expression level above zero was considered 1. Pathway enrichment analysis We performed a pathway enrichment analysis based on the Kyoto Encyclopedia of Genes and Genomes^[191]48 (KEGG) database using the get.enriched.pw() function from the Pigengene R package (Version 1.30.0)^[192]40,[193]95,[194]96. In this way, we identified the pathways that were overrepresented (adjusted p values of the hypergeometric test <0.05) by the genes that were differentially expressed between senescent and non-senescent neurons. Furthermore, we used the supertest() function from the SuperExactTest R package (Version 1.1.0)^[195]97 to assess the statistical significance of overlapping genes among multiple pathways. Supplementary information [196]Supplementary figures^ (1.2MB, pdf) [197]Supplementary table 1^ (19.5KB, xlsx) [198]Supplementary table 2^ (77.2KB, xlsx) Acknowledgements