Abstract
Neuronal senescence (i.e., neurescence) is an important hallmark of
aging and neurodegeneration, but it remains poorly characterized in the
human brain due to the lack of reliable markers. This study aimed to
identify neurescence markers based on single-nucleus transcriptome data
from postmortem human prefrontal cortex. Using an eigengene approach,
we integrated three gene panels: (a) SenMayo, (b) canonical senescence
pathway (CSP), and (c) senescence initiating pathway (SIP), to identify
neurescence signatures. We found that paired markers outperform single
markers; for instance, by combining CDKN2D and ETS2 in a decision tree,
a high accuracy of 99% and perfect specificity (100%) were achieved in
distinguishing senescent neurons (i.e, neurescent). Differential
expression analyses identified 324 genes that are overexpressed in
neurescent. These genes showed significant associations with important
neurodegeneration-related pathways, including Alzheimer’s disease,
Parkinson’s disease, and Huntington’s disease. Interestingly, several
of these overexpressed genes are linked to mitochondrial dysfunction
and cytoskeletal dysregulation. These findings provide valuable
insights into the complexities of neurescence, emphasizing the need for
further exploration of histologically viable markers and validation in
broader datasets.
Subject terms: Senescence, Neurological disorders, Dementia,
Alzheimer's disease, Biomarkers
Introduction
Cellular senescence is a complex, multi-step process characterized by
stable cell cycle arrest^[34]1, changes in cell morphology^[35]2,[36]3,
changes in gene expression^[37]4–[38]8, and a pro-inflammatory
secretory phenotype^[39]9,[40]10. Senescence was first identified in
fibroblasts^[41]11, is heterogeneous in tissues throughout the
body^[42]12–[43]14, and has been studied in the context of the human
brain^[44]15,[45]16. Increasing evidence implicates cellular senescence
in brain aging and links it to neurodegenerative
disorders^[46]17,[47]18, including Alzheimer’s disease
(AD)^[48]19–[49]22. The presence of senescent cells triggers a
pro-inflammatory environment^[50]23 and contributes to neuron loss,
tissue dysfunction, and cognitive impairment in animal models of AD
pathologies, including amyloid plaques^[51]24,[52]25 and tau-containing
neurofibrillary tangles^[53]18,[54]26.
Despite their significance, defining and identifying senescent cells in
the human brain remains challenging due to their heterogeneous
nature^[55]27. Although reliable markers have been utilized to identify
senescent cells in specific tissues such as adipose tissue, retinal
endothelial cells, and fibroblasts^[56]12,[57]28–[58]33, a universally
applicable set of senescence markers across diverse tissues remains
poorly defined. This challenge is especially pronounced in neurons,
where historically defined senescence markers have not been robustly
validated^[59]34. This knowledge gap has limited our understanding of
when senescent cells first appear in the adult brain, inferring their
contribution to neurodegenerative disease pathophysiology and our
ability to develop potential therapies to modulate or remove senescent
cells from the human brain^[60]21,[61]35.
One recent attempt to generate a comprehensive list of senescent
markers resulted in the SenMayo gene list^[62]6, which includes 125
genes that are highly correlated with age and the expression level of
p16 and p21, cyclin-dependent kinase inhibitors that are upregulated in
many senescent cells. The majority of genes in the SenMayo panel are
senescence-associated secretory phenotype (SAPS) factors. However,
cellular senescence has other aspects, including activation of
senescent cell anti-apoptotic pathways (SCAPs)^[63]36, organelle
dysfunction, and morphology changes. To provide a comprehensive and
multidimensional perspective for neuronal senescence (neurescence), we
employed an eigengene approach based on (a) the SenMayo gene list, in
addition to our two previously published lists^[64]16, including (b)
canonical senescence pathway (CSP) with 22 genes, which reflect cell
cycle arrest^[65]1,[66]37,[67]38, and (c) senescence initiating pathway
(SIP) panel with 48 genes, which are upregulated in early senescence
and activate SCAPs.
Technically, an eigengene is computed as a weighted average expression
of all genes in a given list^[68]39,[69]40 derived using principal
component analysis on those particular genes. In this study, we
computed an eigengene^[70]40 for each of the SenMayo, CSP, and SIP gene
lists and used all these eigengenes for three main analyses: (1)
identifying senescent and non-senescent cells, (2) identifying the most
accurate markers for neuronal senescence, referred to as
neurescence^[71]34, and (3) identifying differentially expressed genes
in senescent vs non-senescent neurons.
Results
We used four independent single-nucleus RNA sequencing (snRNA-seq)
datasets of the dorsal prefrontal cortex from postmortem human brains
were used, which are here referred to as Mathys 2019^[72]41
(n = 80,000), Zhou 2020^[73]42 (n = 70,000), Xiong 2023^[74]43
(n = 400,000), and Mathys 2024^[75]44 (n = 255,000), respectively
(Table [76]1). Mathys 2019 served as the discovery dataset to identify
markers and the other three datasets were used to validate the
performance of the identified markers.
Table 1.
Number of identified senescent neurons in the discovery dataset (Mathys
2019), and the three external validation datasets: Zhou 2020, Xiong
2023, and Mathys 2024
Cohorts Mathys 2019 Zhou 2020 Xiong 2023 Mathys 2024
Number of brain samples 48 32 92 48
Number of samples with AD pathology 24 19 44 26
Median ± sd of age 89.5 ± 4.4 89.3 ± 6.2 87.5 ± 3.8 87.5 ± 5
Total number of cells (PFC) 80,000 70,000 400,000 255,000
Senescent neurons (CSP+, SIP+, & SenMayo+) 475 255 124 1178
Non-Senescent neurons (CSP^-, SIP^-, & SenMayo^-) 16,871 4003 95,144
37,061
Total excitatory neurons 34,976 12,590 173,246 112,143
Total inhibitory neurons 9196 3727 61,056 40,290
[77]Open in a new tab
Performing the eigengene approach^[78]16 on the Mathys 2019 discovery
dataset, we identified 475 senescent and 16,871 non-senescent neurons
(Table [79]1). Specifically, the identified 475 senescent neurons
(neurescent cells) expressed each of the three CSP, SIP, and SenMayo
eigengenes more than the corresponding mean plus three times the
standard deviation (mean + 3 s.d.). In contrast, if a neuron expressed
each of these eigengenes less than the corresponding mean, then that
neuron was considered non-senescent (N = 16,871). The remaining 53,288
borderline neurons were excluded from our analysis. We used the
identified neurescent and non-senescent neurons to train decision trees
and to perform differential expression (DE) analysis. Of note, only
excitatory neurons displayed consistent expression across all three
senescence eigengenes, and subsequent differential expression analysis
and marker identification were restricted exclusively to excitatory
neurons.
To further assess the robustness of our method and the reproducibility
of the eigengene-derived senescence markers, we conducted a reciprocal
validation analysis. We used Mathys 2024 as the discovery dataset to
independently derive the eigengene weights for CSP, SIP, and SenMayo,
and then applied these derived weights to Mathys 2019 as a validation
dataset. Comparing the weights from the two independent discovery
datasets (Mathys 2019 vs. Mathys 2024) (Supplementary Table [80]1)
revealed high concordance: specifically, correlations of 94% for CSP
weights, 88% for SIP weights, and 88% for SenMayo weights, respectively
(Supplementary Fig. [81]3). Furthermore, when using the Mathys
2024-derived eigengene weights to identify senescent cells in the
Mathys 2019 dataset, we identified 413 senescent neurons, closely
matching the original 475 neurons that were identified using Mathys
2019 as discovery. Notably, 370 neurons were identified as senescent in
both approaches, representing a highly significant overlap
(hypergeometric test p value
[MATH:
<10−
1714 :MATH]
). Collectively, these results demonstrate the robust consistency and
reproducibility of our eigengene-based senescence identification
framework, regardless of the initial discovery dataset selection.
Identification of specific markers for neurons among genes generally
associated with senescence
To identify potential markers for neurescence, we fitted relatively
small decision trees^[82]45 to the discovery dataset. To ensure the
selected genes were specific, each tree was allowed to use no more than
two genes from the pool of gene sets known to be associated with
cellular senescence^[83]6,[84]16, as indicated on the third row of
Table [85]2.
Table 2.
The top-selected genes for the identification of senescent neurons
based on different criteria
Column number 1 2 3 4 5 6 7
Criteria for positive neurons CSP^+ SIP^+ SenMayo^+ CSP^+ & SIP^+ &
SenMayo^+ CSP^+ & SIP^+ & SenMayo^+
Gene set (number of genes) CSP (22) SIP (44) SenMayo (116) CSP, SIP,
SenMayo (180) CSP, SIP, SenMayo, DE (499)
Excluded gene None None None None Top gene removed*** None Top gene
removed
Number of genes in the tree
Original count data 1 CDKN2D (96%)* SOD1 (98%) HMGB1 (97%) MAP2K1 (99%)
CDKN2D (98%) DPYSL2 (99%) CALM3 (99%)
2 CDKN2, RB1 (99%) SOD1, GSK3B (99%) HMGB1, ETS2 (98%) MAP2K1, CDKN2D
(99%) CDKN2D, HMGB1 (99%) DPYSL2, ATP6V1H (99%) CALM3, SH3GL2 (99%)
Binary data 1 CDKN2D (96%) HRAS (95%) Not found Not found Not found Not
found Not found
2** CDKN2, RB1 (99%) GAAD4, HRAS, MAPK14 (97%) ETS2, CTSB (98%) CDKN2D,
ETS2 (99%) E2F3, RB1 (99%) UQCRHL, ETS2 (99%) CDKN2D, RB1(99%)
[86]Open in a new tab
*The accuracy of the corresponding tree is written in parentheses.
**When no tree with two markers were found, a tree with three markers
is shown.
***If no single gene was found, both selected markers would be removed
in the elimination analysis.
Using the original counts to quantify gene expression levels, the best
marker among the 180 senescence-associated genes was MAP2K1, leading to
a single-gene tree with the highest accuracy of 99%, a sensitivity of
80%, and a specificity of 99% (Table [87]2, column 4). Adding CDKN2D as
the second marker increased sensitivity to 93% with a negligible (i.e.,
<0.1%) effect on accuracy. The corresponding decision tree, which was
based on MAP2K1 and CDKN2D, had the best accuracy among all of our
trees that could use two of the 180 senescence-associated genes. This
suggested that these two genes could serve as effective markers and
complement each other in identifying neurescence. In the corresponding
tree, the thresholds for the expression levels of these genes were 2
and 1, respectively (Fig. [88]1a), suggesting relatively low expression
of these putative marker genes. These thresholds are undesirable for
histology markers because distinguishing between levels of expression
under the microscope is practically challenging. One would prefer
markers that are totally absent in negative cells and ideally, have
more than one transcript per cell in senescent neurons to ensure the
observable signal is above background.
Fig. 1. The decision tree analysis based on CSP, SIP, and SenMayo identified
senescence markers in neurons.
[89]Fig. 1
[90]Open in a new tab
The trees show the classification of neurons into senescent (red) and
non-senescent (blue). Each node represents a decision point based on a
gene expression levels of the original counts and b the binary
transformed expression. This binary tree reached zero false positive
and thus a specificity of 1.
To provide greater confidence in these lowly expressed marker genes, we
repeated our analysis using a binary transformation of the discovery
data, where any expression level above zero was converted to 1. Using
these binary values, no single gene was found to accurately classify
the senescent and non-senescent cells. This suggests that there may not
be a single marker in our gene lists specifically expressed in
neurescence. Interestingly, the best tree with two genes still included
CDKN2D, but MAP2K1 was replaced with ETS2 (Fig. [91]1b, last row of
Table [92]2). The binary transformation did not change the accuracy but
dropped the sensitivity from 93 to 83%, while increasing the
specificity to 100%. After removing CDKN2D and ETS2 from the analysis,
E2F3 and RB1 were selected in the second-best tree (Table [93]2, column
5), leading to an accuracy of 99%, specificity of 100%, but a
relatively low sensitivity of 67%.
Differentially expressed genes
To investigate differentially expressed genes in neurescent compared to
non-senescent, we performed a DE analysis on the Mathys 2019 discovery
dataset. Taking an agnostic approach, we included all 10,768 genes that
had non-negligible expression in neurons of the discovery dataset
(Methods). We employed two state-of-the-art methods for DE analysis of
scRNA-Seq data. The first method, MAST^[94]46, resulted in 375
differentially expressed genes, whereas using the second method,
SigEMD^[95]47, we identified 576 differentially expressed genes
(Supplementary Table [96]2). The two analyses shared 324 differentially
expressed genes (Fig. [97]2 and Supplementary Table [98]2), which
represented a significant overlap (p value
[MATH:
<10−
205 :MATH]
, hypergeometric test). These 324 genes were overexpressed in the 475
neurescent cells compared to other neurons (Fig. [99]2). In this study,
we primarily focused on these 324 genes because they were identified by
both DE analysis methods.
Fig. 2. The heatmap of differentially expressed genes was identified by
comparing senescent versus non-senescent neurons.
Fig. 2
[100]Open in a new tab
The heatmap shows the expression profile of 324 genes (columns) in the
17,346 cells (rows) in the discovery dataset. These genes were
consistently identified as differentially expressed by both MAST and
SigEMD methods.
Pathway analysis
To understand the functional significance of the 324 differentially
expressed genes, we performed a pathway analysis using Kyoto
Encyclopedia of Genes and Genomes (KEGG)^[101]48 and identified 18
significantly enriched pathways (p value <0.05, Fig. [102]3).
Interestingly, the top pathways were related to neurodegeneration
including Parkinson disease^[103]49,[104]50, Amyotrophic lateral
sclerosis^[105]51, Alzheimer’s disease^[106]52,[107]53, and
Huntington’s disease^[108]54, suggesting common underlying biological
mechanisms that might be associated with neuronal senescence.
Fig. 3. Pathways overrepresented by genes that are differentially expressed
in senescent neurons.
[109]Fig. 3
[110]Open in a new tab
The bubble chart displays the significantly enriched pathways
identified from the pathway analysis of 324 differentially expressed
genes. Each bubble represents a pathway, with the size indicating the
number of DE genes in that pathway and the color denoting the adjusted
p value for overlap significance.
The top five enriched pathways shared 16 DE genes, which is a
significant overlap (p value <
[MATH:
10−8
:MATH]
, hypergeometric test, Fig. [111]4). These genes included COX6C,
COX7A2L, COX7C, CYCS, KIF5A, NDUFA4, NDUFS1, PSMA7, TUBA4A, TUBB2A,
TUBB4A, TUBB4B, UQCRB, UQCRC2, UQCRH, and UQCRHL. These 16 genes are
mainly involved in mitochondrial function (COX, NDUF
families)^[112]55,[113]56 and cytoskeletal structure (TUBB
family)^[114]57,[115]58. Mitochondrial dysfunction leads to impaired
energy metabolism and increased production of reactive oxygen
species^[116]59. It is a hallmark of both cellular senescence^[117]60
and AD^[118]61. The deregulation of these genes supports prior work
demonstrating mitochondrial dysfunction in neurescence, and their role
in neurodegeneration^[119]18.
Fig. 4. Number of DE genes in the top enriched pathways.
[120]Fig. 4
[121]Open in a new tab
The number of DE genes in each pathway is listed on the right. The
UpSet^[122]62 plot shows the sizes of the overlaps among DE genes and
the top five significantly enriched pathways, named on rows. In
particular, the last column shows that 16 DE genes are members of all
these five pathways, which is a significant overlap (p value
[MATH:
<10−
8 :MATH]
).
Identification of specific markers for senescent neurons among differentially
expressed genes
We hypothesized that there might be some specific markers for
neurescence beyond the genes generally known to be associated with
senescence (e.g., CSP, SIP, and SenMayo). To test this hypothesis, we
extended our decision tree analysis to include the 324 differentially
expressed genes identified based on the discovery dataset. Using the
original count data, the top gene was DPYSL2 resulting in a relatively
accurate classification (i.e., accuracy: 99%, sensitivity: 95%, and
specificity: 99%) and when the gene ATP6V1H was added to make a
two-gene tree, sensitivity increased to 99% (Table [123]2, column 6).
However, for these trees to be useful, meaning that they identify
molecules that can serve as biomarkers in histological assays, one must
be able to determine whether these genes were expressed at relatively
high levels in a neuron (i.e., above 4 and 3, respectively) (Fig.
[124]5a). To address this practical issue, we used binary expression
levels leading to selection of UQCRHL together with ETS2 (Fig.
[125]5b). The recurrent selection of ETS2 (Table [126]2, columns 4 and
6) highlighted its importance even when additional differentially
expressed genes were added to the analysis. Since the inclusion of
UQCRHL led to very low sensitivity in the validation cohorts (Table
[127]3), we considered removing this gene. Excluding UQCRHL and ETS2
led to the selection of CDKN2D and RB1 again, and increased the
specificity of senescence classification to 100% in the discovery
dataset. Overall, adding differentially expressed genes did not seem to
be helpful, as the only reasonable tree in columns 6 and 7 was based on
CDKN2D and RB1, which were already in our CSP gene lists. This
observation falsified the hypothesis that additional differentially
expressed genes could identify specific markers for senescent neurons.
Fig. 5. Decision tree analysis based on CSP, SIP, SenMayo, and DE genes
identified novel senescence markers in neurons.
[128]Fig. 5
[129]Open in a new tab
The trees show the classification of neurons into senescent (red) and
non-senescent (blue) using a pair of selected markers from a pool of
differentially expressed genes merged with CSP, SIP, and SenMayo gene
lists. Each node represents a decision point based on a gene expression
threshold of original counts and b binary transformed expression.
Table 3.
Validation performance of the top-performing two-gene decision trees
that were trained using the discovery dataset (Mathys 2019)
Removed Selected Accuracy Specificity Sensitivity
Mathys 2019 Zhou 2020 Xiong 2023 Mathys 2024 Mathys 2019 Zhou 2020
Xiong 2023 Mathys 2024 Mathys 2019 Zhou 2020 Xiong 2023 Mathys 2024
- UQCRHL, ETS2 99% 94% 99% 98% 99% 99% 99% 99% 84% 3% 18% 45%
ETS2 UQCRHL, CDKN2D 99% 94% 99% 97% 99% 99% 99% 99% 83% 3% 16% 46%
UQCRHL CDKN2D, ETS2 99% 98% 99% 99% 100% 100% 100% 99% 83% 77% 63% 89%
UQCRHL, ETS2 CDKN2D, RB1 99% 97% 99% 97% 100% 100% 99% 97% 81% 62% 72%
92%
UQCRHL, ETS2, CDKN2D E2F3, RB1 99% 96% 98% 94% 100% 100% 98% 94% 67%
43% 67% 95%
UQCRHL, ETS2, RB1 CDKN2D, CDK6 99% 96% 99% 98% 100% 100% 99% 98% 77%
49% 56% 90%
UQCRHL, ETS2, RB1, CDK6 CDKN2D, ATM 99% 97% 99% 97% 100% 100% 99% 97%
70% 65% 66% 91%
UQCRHL, ETS2, RB1, CDK6, CDKN2D E2F3, RBL2 98% 96% 99% 96% 100% 100%
99% 96% 58% 41% 54% 93%
UQCRHL, ETS2, RB1, CDK6, ATM CDKN2D, RBL2 99% 97% 99% 98% 100% 100% 99%
98% 73% 65% 58% 90%
UQCRHL, ETS2, RB1, CDK6, ATM, RBL2 CDKN2D, ATP6V1D 99% 97% 99% 98% 99%
99% 99% 98% 88% 68% 51% 88%
UQCRHL, ETS2, RB1, CDK6, ATM, RBL2, ATP6V1D CDKN2D, E2F3 99% 96% 99%
98% 100% 100% 100% 99% 66% 44% 54% 88%
UQCRHL, ETS2, RB1, CDK6, ATM, RBL2, ATP6V1D, E2F3 CDKN2D, CES4A 99% 97%
99% 98% 99% 99% 99% 98% 82% 69% 37% 84%
[130]Open in a new tab
Each model was evaluated based on three independent validation
datasets: Zhou 2020, Xiong 2023, and Mathys 2024, using the same
eigengene thresholds identified in the discovery cohort. Accuracy,
sensitivity, and specificity were calculated by comparing the model’s
predictions to ground-truth labels derived from eigengene expression
thresholds. On each row, the genes in the “Removed” column are
excluded, leading to the selection of the pair of genes in the
“Selected” column.
Identification of alternative markers for senescent neurons
The relatively low expression of the selected markers, combined with
historical challenges in developing antibodies specific to senescence
markers, necessitates that in silico studies identify multiple
candidate molecules for subsequent in situ validation. We performed a
systematic elimination analysis. We pooled all genes from CSP, SIP, and
SenMayo lists with DE genes and fitted decision trees on the binary
expression values in the discovery dataset, Mathys 2019 (Table [131]3).
Our systematic approach involved iteratively removing the
top-performing markers to evaluate their impact on model performance
and to identify the next best set of markers.
Including UQCRHL in decision trees led to poor sensitivity in the
validation datasets. Thus, we excluded UQCRHL and found CDKN2D and ETS2
to be the second-best pair. If reagents to detect ETS2 are unavailable
or lack specificity, then CDKN2D and RB1 would be the next best pair of
markers for neurescence. Our results showed that removing CDKN2D
generally led to decreased accuracy, sensitivity, and specificity in
validation datasets. This drop in model performance highlights the
critical role CDKN2D plays in the classification. However, it is
important to note that the currently available antibodies for p19, the
protein product of CDKN2D, may lack specificity, which could complicate
their use in histological applications.
Discussion
Identifying senescent neurons in the human brain is challenging due to
the inherent heterogeneity of senescence, neuronal subtypes, and the
current lack of reliable markers specific to neurescence in the human
brain^[132]62. To tackle this challenge, we systematically and
unbiasedly analyzed scRNA-seq data from four datasets of the dorsal
prefrontal cortex from postmortem human brains. Our eigengene-based
method allowed us to focus on three key gene lists: SenMayo, CSP, and
SIP, each reflecting distinct aspects of cellular senescence.
While the recently published SenMayo list is a valuable resource, it
primarily captures the inflammatory aspects of senescence. The majority
of genes in this list are associated with senescence-associated
secretory phenotype, and other important characteristics of neural
senescence may not be represented by SenMayo. Inflammation is a
critical component, particularly in the brain, where it has been linked
to neurodegeneration^[133]63,[134]64. The accumulation of senescent
cells can lead to a persistent pro-inflammatory state, driven by
mitochondrial dysfunction^[135]65,[136]66, and concomitant elevated
levels of actors like IL-6, IL-1β, and TNF-α, which are characteristic
of SASP^[137]67–[138]69. This chronic inflammation can impair tissue
homeostasis and contribute to neuronal dysfunction^[139]70. However,
inflammation alone does not fully capture the multifaceted nature of
cellular senescence, where other mechanisms such as cell cycle arrest,
DNA damage, mitochondrial dysfunction, and metabolic alterations also
play critical roles^[140]62,[141]71,[142]72. Moreover, inflammation can
occur independently of senescence. This inherent limitation of the
SenMayo list underscores the need for a more comprehensive approach.
Accordingly, in this study, we included CSP and SIP gene panels to
capture other important senescence features such as cell cycle arrest
and early stress responses. Our approach leveraged eigengenes derived
from these three panels to classify neurons into senescent and
non-senescent groups. Our integrative approach using multiple
senescence-associated gene panels aims to improve specificity by
capturing their combined activity rather than relying on any single
pathway.
Our findings demonstrated that a single marker gene is insufficient for
accurately classifying neurescence. Decision tree analysis,
incorporating multiple senescence panels and both continuous and binary
expression data, revealed that combining markers significantly improves
classification performance. For example, using ETS2 with CDKN2D
enhanced model sensitivity and specificity significantly (Table [143]3
and Supplementary Fig. [144]2), highlighting the complex signature of
neurescence that cannot be captured by a single gene. While finding
reliable markers for neurescence remains a difficult task, our
paired-marker strategy, particularly using genes like CDKN2D and ETS2,
can offer a promising direction for future studies on neurescence.
As expected, pathways associated with neurodegeneration are upregulated
in neurescence. Interestingly, we identified 16 DE genes that are
overexpressed in all of these upregulated pathways. These 16 genes are
related to mitochondrial function and cytoskeletal structure. While the
association between mitochondrial function and neurodegeneration has
been known^[145]59,[146]73–[147]79, our contribution is to show that
these pathways are enriched specifically in neurescence. Follow-up
studies to elucidate the biological mechanisms linking these genes and
mitochondrial function and neurodegeneration could facilitate the
discovery of novel therapeutic targets for AD. Our findings can help
such studies focus on mitigating the impact of cellular senescence on
neurodegeneration.
This study provides promising markers for identifying neurescence,
albeit some limitations need attention. Using the original counts to
fit decision trees led to non-zero thresholds, which are undesirable
for histology markers because distinguishing between levels of
expression under the microscope is practically challenging. One would
prefer markers that are totally absent in negative cells and ideally,
have more than one transcript per cell in senescent neurons to ensure
the observable signal is above background. Therefore, we used the
binary values.
Positive signals using antibodies against cyclin-dependent kinase
inhibitors associated with senescence have historically been
unreliable^[148]80. CDKN2D, cyclin-dependent kinase 4 inhibitor,
encodes p19^INK4D, which we found to be elevated in NFT-bearing neurons
in a prior study^[149]16. However, that study revealed that not all
neurescent cells expressed p19^INK4D, and not all p19^INK4D-positive
cells were neurescent. We hypothesize that co-staining with antibodies
against p19^INK4D and ETS2 may increase the specificity of neurescence
identification, but this remains to be experimentally determined.
Another limitation working with snRNA-seq data lies in the inherent
variability among the datasets on hand, despite all being derived from
the same region of postmortem human brains. For instance, the
distribution of total expression of senescence markers per cell varies
considerably across the four datasets (Fig. [150]6). In particular,
Mathys 2019 and Mathys 2024 display a more even distribution of gene
expression across different levels, while Zhou 2020 and Xiong 2023
exhibit a more skewed distribution towards sparser expression. These
differences highlight the challenge of comparing datasets generated by
different research groups and under different experimental conditions.
The observed variability may stem from technical error, different
sample handling procedures, instruments, or biological factors such as
the heterogeneity of sampled brain regions and variable progression of
neurodegenerative diseases in donors. Also, while our study focused on
neuronal senescence, it is important to note that other brain cell
types, such as astrocytes, microglia, and oligodendrocyte-lineage
cells, have also been reported to exhibit senescent-like
phenotypes^[151]81–[152]83. Given the likely variation in senescence
marker profiles across these cell types, future studies applying
similar approaches to characterize non-neuronal senescence will be
essential for the development of targeted senolytics in the brain.
Addressing these challenges could pave the way for expanding this study
to include more diverse datasets, including those from earlier disease
stages and other brain regions. Such an approach would help validate
the robustness of the markers identified here.
Fig. 6. Distribution of total expression of 499 senescence marker genes.
Fig. 6
[153]Open in a new tab
The plot illustrates the inherent variability in gene expression
distributions among datasets derived from postmortem human brain
samples (Table [154]1). The x-axis represents the sum of expression
values of all marker genes in each cell. The x-axis is grouped into
bins of size 1000. While Mathys 2019 (green) and Mathys 2024 (red)
exhibit a more even distribution of gene expression, Zhou 2020 (purple)
and Xiong 2023 (blue) display skewed distributions towards sparser
expression.
Methods
The snRNA-Seq datasets
We used four snRNA-Seq datasets generated by Mathys et al. ^[155]41,
Zhou et al. ^[156]42, Xiong et al. ^[157]43, and Mathys et al.
^[158]44, which are accessible through the Accelerating Medicines
Partnership–AD (AMP–AD)^[159]84 with synapse IDs syn18485175,
syn2112584, syn52293424, and syn52293442, respectively. The former
dataset, referred to as Mathys 2019, was used in all analyses in this
study as the discovery (i.e., train) dataset. The latter, referred to
as Zhou 2020, Xiong 2023, and Mathys 2024, respectively, were used to
validate the robustness and sensitivity of the proposed markers and our
decision tree models. All samples were originally generated by
longitudinal clinical-pathologic cohort studies of aging and
Alzheimer’s disease (AD) from the Religious Order Study (ROS) and the
Rush Memory and Aging Project (MAP)^[160]85. We employed the Synapser
([161]https://r-docs.synapse.org/articles/synapser.html) R
package^[162]86 (Version 0.6.61) and a custom R script (Version 4.4.1)
to download the four snRNA-seq datasets^[163]41–[164]44. We downloaded
clinical data from the corresponding publication pages. The snRNA-seq
datasets included approximately 80,000, 70,000, 400,000, and 1.3
million single nuclei samples, with median postmortem interval (PMI) of
7, 6, 6, and 6 h, respectively. We included only excitatory and
inhibitory neurons in our analysis (Table [165]1). The first three
datasets used samples from the dorsal prefrontal cortex of 48, 32, and
92 postmortem human brains, respectively. The Mathys 2024 dataset
expanded the scope to include samples from the entorhinal cortex (EC),
hippocampus (HC), anterior thalamus (TH), angular gyrus (AG),
midtemporal cortex (MT), and prefrontal cortex (PFC) regions of the
same brain across 48 postmortem human samples. In this study, we used
only the PFC samples, which included approximately 255,000
single-nuclei transcriptomes.
Eigengene analysis
An eigengene for a given set is the first principal component
(PCA)^[166]87, which is the weighted average of the expression of all
genes in the set. For each of the three SenMayo, CSP, and SIP gene
lists, we used the compute.pigengene() function from the Pigengene
package (Version 1.30.0)^[167]40 to compute an eigengene. We addressed
the challenge of cell type imbalance by implementing weighted PCA using
the weight.pca() function from the Pigengene package to ensure that
each cell type contributed equitably to the analysis. Specifically,
each cell was weighted by dividing the total number of cells by the
frequency of the corresponding cell type. This approach assigns higher
weights to rarer cell types and lower weights to more abundant ones,
thus balancing their influence in the PCA. We used the project.eigen()
function from the Pigengene package to infer eigengene values in the
validation datasets. This function computes each eigengene in the
validation dataset using the same weights learned from the discovery
dataset. After the weighted average is computed, the inferred eigengene
was normalized to have the same Euclidean norm as the original
eigengene.
Cell labeling
Three eigengenes were computed based on three independent gene sets
associated with senescence: (1) Canonical Senescence Pathway
(CSP)^[168]16, (2) Senescence Initiating Pathway (SIP)^[169]16, and (3)
SenMayo^[170]6, which consisted of 22, 48, and 125 genes, respectively.
The actual number of genes contributed to our eigengene analysis was
slightly lower because the discovery dataset included only 22, 44, and
116 of these genes, respectively. We calculated the mean and standard
deviation of each eigengene based on the empirical distribution in the
discovery dataset. Any cell expressing an eigengene more than mean plus
three standard deviations (i.e., mean + 3 s.d.) was considered
“overexpressing” the eigengene, and was labeled as CSP^+, SIP^+, or
SenMayo^+, depending on their respective gene set. In contrast, any
cell expressing the eigengene below the mean was labeled as CSP^-,
SIP^-, and SenMayo^-, respectively. Cells overexpressing all three
eigengenes were labeled as “senescent”, indicating a consensus across
the three gene sets. In contrast, cells were labeled “non-senescent”
when they expressed all three eigengenes less than the corresponding
means (Table [171]1). Other cells that did not meet either of these two
criteria were considered borderline and excluded from our analysis. We
used the phyper() function from the stats R package (Version.
4.4.1)^[172]86 to perform a hypergeometric test with the null
hypothesis that the number of senescent cells observed in a cell type
is more than what would be expected at random.
Differential expression analysis
Our DE analysis was based on the senescent neurons that were
overexpressing all three eigengenes (above mean + 3 s.d.) compared to
the non-senescent neurons (below mean) (Table [173]1). We also filtered
out 7158 genes that did not have the minimum expression of 1 in at
least 200 neurons. We normalized the data by multiplying all nuclei
counts by the total library size of 1 million and transformed it to
logarithmic space in base 2 (log[2]). Given the significant imbalance
between the groups, the smaller senescent class was upsampled by
repeating each neuron 36 times so that the number of non-senescent and
senescent neurons appeared roughly equal for the DE analysis. We
performed the differential expression analysis for senescent and
non-senescent neurons, using two popular methods developed for
scRna-seq, and the overlapped genes were chosen as the final
differentially expressed genes. The first method was carried out using
the MAST package (Version 1.30.0)^[174]46,[175]88, which implemented a
hurdle model^[176]88 for analyzing scRNA-seq data that consists of a
two-part generalized linear model. Considering the bimodality
characteristic in single-cell expression data, MAST jointly models the
positive mean expression (continuous) and the rates of expression
(discrete) values. In this method, genes with a false discovery rate
(i.e., adjusted p value) less than 0.01 and an absolute value of log[2]
fold change above 6, were declared as differentially expressed genes.
The second method, SigEMD (Version 0.21.1)^[177]47,[178]89, which is a
custom R script that uses the nonparametric Earth Mover’s Distance
(EMD)^[179]90. EMD is a special case of the Wasserstein metric^[180]91,
and measures the distance between gene expression distributions.
Accordingly, predefined adjusted p values under 0.01 and an EMD score
more than 30 were set to identify differentially expressed genes in
senescent vs non-senescent neurons. A heatmap of differentially
expressed genes was generated using the pheatmap.type() function from
the Pigengene R package^[181]40,[182]92.
Decision trees
Our primary objective in fitting decision trees to the discovery
dataset was to identify the most predictive markers. We used the C50
package (Version 0.1.8)^[183]93 in R to construct multiple decision
trees based on different criteria. These trees differ in two ways: (a)
the eigengenes that served as the basis for neuron labeling and (b) the
selection of genes used as features in each tree (Table [184]2). To
prevent overfitting, we set the parameter minCases of the function
C5.0() to relatively large values to control the number of genes that
contribute to a decision tree. This parameter specifies the minimum
number of samples required in at least two splits of the tree, thus a
larger minCases value results in a smaller decision tree. In this way,
none of our trees was allowed to select more than two genes from the
pool of genes available to it.
We assessed whether the marker genes selected by the decision tree
trained on the discovery dataset could accurately classify neurons in
the validation datasets. The predict() function was applied to the
validation datasets to derive performance metrics of the decision trees
fitted to the discovery dataset. This function predicted the senescence
status of each neuron based on the trees fitted to the discovery
dataset and the expression levels in the validation datasets.
Particularly, the same cutoff values on gene expression values that
were learned from the discovery dataset were used for the validation
dataset. The predictions by decision trees, which used a few single
genes, were then compared to the eigengene-based senescence labels.
This comparison was done in each of the validation sets to compute
accuracy, sensitivity, and specificity^[185]94 of the trees (Table
[186]3). Specifically, the neurons that were identified as senescent by
both a decision tree and eigengene approaches were considered true
positives. In contrast, true negatives were the neurons that were not
senescent based on both approaches. In each dataset, (a) accuracy was
defined as the number of true positives over the total number of
neurons in the dataset, (b) sensitivity was defined as the number of
true positives over the number of neurescents based on our eigengene
approach, and (c) specificity was the ratio of true negatives over the
total number of neurons that were not senescent based on our eigengene
approach.
The trees varied in the labels used to train them and also in the pool
of genes they could select from. Initially, for each of the CSP, SIP,
and SenMayo gene lists, we calculated their corresponding eigengenes,
along with the positive and negative labels. Note that the resulting
labelings could differ across gene lists. Using these three eigengenes
and their corresponding gene lists, we trained three independent
decision trees (Table [187]2, columns 1–3). Each tree specifically
represented the unique characteristics derived from one of the gene
lists.
We trained a fourth decision tree using the senescent and non-senescent
neuron labels, which were unanimously defined based on all three
eigengenes (Methods, Cell labeling) (Table [188]1). The fourth tree was
allowed to use any of the 180 unique genes from the pooled set of all
the genes in the CSP, SIP, and SenMayo lists (Table [189]2, columns 4
and 5). To train the fifth tree, we used the same senescent and
non-senescent labels utilized in the fourth tree, but expanded the gene
pool to include the 324 differentially expressed genes in addition to
the previously combined CSP, SIP, and SenMayo gene lists. This led to
including a total of 499 unique genes for training the fifth tree
(Table [190]2, columns 6 and 7).
Furthermore, our methodology involved training the decision trees
twice: first using the original counts directly from the expression
matrix that quantified expression levels, and secondly by converting
the expression levels into binary values, where any expression level
above zero was considered 1.
Pathway enrichment analysis
We performed a pathway enrichment analysis based on the Kyoto
Encyclopedia of Genes and Genomes^[191]48 (KEGG) database using the
get.enriched.pw() function from the Pigengene R package (Version
1.30.0)^[192]40,[193]95,[194]96. In this way, we identified the
pathways that were overrepresented (adjusted p values of the
hypergeometric test <0.05) by the genes that were differentially
expressed between senescent and non-senescent neurons. Furthermore, we
used the supertest() function from the SuperExactTest R package
(Version 1.1.0)^[195]97 to assess the statistical significance of
overlapping genes among multiple pathways.
Supplementary information
[196]Supplementary figures^ (1.2MB, pdf)
[197]Supplementary table 1^ (19.5KB, xlsx)
[198]Supplementary table 2^ (77.2KB, xlsx)
Acknowledgements