Abstract
Introduction
Epilepsy is a chronic brain disease with a certain degree of
neurodegeneration and is caused by abnormal discharges of neurons. The
mechanism of cell senescence has garnered increasing attention in
neurodegenerative diseases. However, the role of cell senescence in the
onset and progression of epilepsy is unclear. Therefore, this study
constructed a diagnostic model of epilepsy based on cellular
senescence-related genes (CSRGs) to analyze their role in disease
pathogenesis.
Methods
The differentially expressed genes (DEGs) were screened from the
epileptic sample dataset of the gene expression omnibus (GEO) database,
and the cellular senescence-related DEGs (CSRDEGs) related to epilepsy
were identified by CSRGs crossover. The functional enrichment
characteristics of CSRDEGs were analyzed using gene ontology (GO) and
Kyoto encyclopedia of genes and genomes (KEGG) enrichment analyses. The
differences in biological processes between high and low-risk groups
were analyzed using gene set enrichment analysis (GSEA). For model
construction, logistic regression, random forest, and least absolute
shrinkage and selection operator (LASSO) regression were employed to
identify key genes, including ribosomal protein S6 kinase alpha-3
(RPS6KA3), cathepsin D (CTSD), and zinc finger protein 101 (ZNF101).
Subsequently, a multifactor logistic regression model was developed to
evaluate the risk of epilepsy based on these screened genes.
Results
The model exhibited higher area under the curve (AUC) values in the GSE
data sets 143272 and 32534, producing encouraging results. Finally,
mRNA-miRNA and mRNA-transcription factors (TFs) networks revealed the
potential regulatory mechanism of the selected critical genes in the
disease.
Discussion
This study elucidated the possible process of cell senescence in
epileptic diseases through bioinformatics analysis, offering a
potential target for personalized diagnosis and precise treatment.
Keywords: bioinformatics analysis, cell senescence, diagnosis model,
epilepsy, senescence-associated secretory phenotype
1. Introduction
Epilepsy is a common neurological disorder with a high prevalence
worldwide. According to the World Health Organization, there are
approximately 50 million individuals with epilepsy globally, and
millions of new cases emerge annually. The incidence of epilepsy varies
significantly among different regions and populations. Generally,
epilepsy is more common in developing and low-income countries than in
developed ones. This may be due to various factors, including a lack of
medical resources, limited diagnosis, and insufficient disease
management. Furthermore, epilepsy has a relatively higher incidence
rate among children and the elderly. It is one of the primary causes of
neurological disorders in children, significantly affecting the quality
of life and social functioning of patients ([33]1). Epilepsy has a high
incidence rate currently, but the treatment effect is unsatisfactory
([34]2). Therefore, it is crucial to investigate its molecular
mechanisms further to identify novel therapeutic targets.
Cell senescence is regarded as one of the key pathological mechanisms
in neurodegenerative diseases ([35]3, [36]4), such as Alzheimer’s
disease (AD) ([37]5–7) and Parkinson’s disease (PD) ([38]3, [39]8).
Neurons and glial cells in AD and PD often display typical aging
characteristics, including cell cycle arrest, organelle dysfunction,
and a pro-inflammatory phenotype ([40]9). Although the role of cellular
senescence in neurodegenerative diseases has been extensively studied,
its relevance in the pathogenesis of epilepsy has not been fully
explored. Some studies in recent years have initially revealed clues
indicating that cell senescence may play an important role in epilepsy.
Some studies revealed that there are senescent phenotype
characteristics (such as upregulation of p16^INK4a expression and
abnormal activation of cell cycle genes) ([41]10); this implies that
cell senescence may be crucial for developing epileptogenic zones and
the regulation of seizure frequency. Additionally, the association
between epilepsy and neuroinflammation has been widely reported, and
the senescence-associated secretory phenotype (SASP) cellular may be a
major contributor in promoting the formation of a chronic inflammatory
environment ([42]11). As a result, investigating the role of cell
senescence-related genes (CSRGs) in epilepsy may assist in identifying
specific mechanisms behind the onset and progression of epilepsy and
providing a theoretical basis for exploring new therapeutic targets.
In light of this, this study aimed to systematically analyze the
functions of CSRGs and explore their potential role in the pathogenesis
of epilepsy. Considering the critical role of CSRGs in other
neurological diseases, we hypothesized that they may also be crucial in
the pathogenesis of epilepsy. This study will use bioinformatics
analysis to identify differentially expressed CSRGs related to epilepsy
and investigate their regulatory mechanisms in neuronal function and
inflammatory response. This study may offer a theoretical foundation
for further understanding the molecular mechanisms underlying epilepsy
and exploring new therapeutic targets.
2. Materials and methods
2.1. Data collection and preprocessing
This study acquired three epilepsy-related Homo sapiens datasets from
the Gene Expression Omnibus (GEO) database[43]^1 and Gene Set
Enrichment (GSE) [44]GSE143272, [45]GSE32534, and [46]GSE4290, using
the R package GEOquery ([47]12). The chip platform of the [48]GSE143272
dataset was [49]GPL10558, and the chip platform of datasets
[50]GSE32534 and [51]GSE4290 was [52]GPL570. The specific details are
given in [53]Table 1. Whole blood tissue served as the tissue source
for dataset [54]GSE143272, including 34 untreated epilepsy samples and
51 healthy control samples. Peritumoral cortical tissue was the source
of dataset [55]GSE32534, which contains five epilepsy samples and five
control samples. The dataset [56]GSE4290 originated from brain tissue
containing 23 epilepsy samples.
Table 1.
GEO microarray chip information.
Data set [57]GSE143272 [58]GSE32534 [59]GSE4290
Platform [60]GPL10558 [61]GPL570 [62]GPL570
Species Homo sapiens Homo sapiens Homo sapiens
Tissue Whole blood tissues Peritumoral cortex tissues Brain tissues
Samples in epilepsy group 34 5 23
Samples in control group 51 5 /
Reference PMID: 30826443
PMID: 32054883 PMID: 23418513 PMID: 16616334
[63]Open in a new tab
This study aimed to identify genes associated with cellular senescence
by collecting protein-coding CSRGs from the GeneCards database.[64]^2
Protein-coding genes were screened and retained by using “cell
senescence” as the search keyword, identifying 3,575 CSRGs.
Additionally, relevant literature was searched in PubMed[65]^3 using
the same keyword, and 279 genes related to the cellular aging process
in different studies were further collected. After merging and
deduplication, 3,619 CSRGs were obtained. The detailed information is
depicted in [66]Supplementary Table 1.
Datasets [67]GSE143272 and [68]GSE32534 were preprocessed for
standardized and normalized probe annotations using the R package limma
([69]13). A subset of samples from dataset [70]GSE143272 and all
samples from dataset [71]GSE32534 were included in this analysis.
Furthermore, epilepsy-related samples in dataset [72]GSE4290 served as
a validation set for subsequent analyses.
2.2. Differentially expressed genes analysis
The samples were divided into two groups based on their grouping in the
[73]GSE143272 and [74]GSE32534 datasets: an epilepsy group and a
control group. Differential expression between epilepsy and control
groups was analyzed using the R package limma. The threshold for
identifying differentially expressed genes (DEGs) was established at
|logFC| >0.10 and p-value <0.05. Specifically, genes were categorized
as upregulated if their logFC >0.10 and p-value <0.05 and as
downregulated if their |logFC| <−0.10 and p-value <0.05. The results of
differential expression analysis were displayed using volcano plots
generated by the R package ggplot2 ([75]14).
Variance analysis was first used to identify DEGs from the
[76]GSE143272 and [77]GSE32534 datasets in order to identify cellular
senescence-related DEGs (CSRDEGs) associated with epilepsy. The
screening criteria included |logFC| >0.10 and p-value <0.05.
Subsequently, these DEGs were analyzed for intersection with the CSRGs
of all epilepsy samples in the [78]GSE4290 dataset. CSRDEGs were
finally identified by constructing a Venn diagram displaying the
overlap of these genes. Heatmaps were created using the R package
pheatmap ([79]15) to display these CSRDEGs.
2.3. Gene Ontology and Kyoto Encyclopedia of Genes and Genomes enrichment
analysis
Gene Ontology (GO) analysis is a commonly used method for large-scale
functional enrichment studies, including cellular component (CC),
biological process (BP), and molecular function (MF). The Kyoto
Encyclopedia of Genes and Genomes (KEGG) is currently the most widely
used database for storing information on genomes, biological pathways,
diseases, and drugs. Enrichment analysis of CSRDEGs was performed using
GO and KEGG pathway analysis using the R package clusterProfiler
([80]16). The Screening criteria for statistical significance were
defined as p-value <0.05 and false discovery rate (FDR) (q-value)
<0.25.
2.4. Gene set enrichment analysis
Gene set enrichment analysis (GSEA) is used to evaluate the
distribution trends of genes in predefined gene sets ranked based on
their correlation with a specific phenotype to determine their
contribution to a given phenotype. In this study, the genes in the
[81]GSE143272 dataset were first arranged according to their logFC
values. Subsequently, GSEA analysis was conducted on all genes in the
[82]GSE143272 dataset using the R package clusterProfiler. The
parameters of the GSEA analysis were set as follows: the seed value was
2,020, the number of calculations was 1,000, the minimum number of
genes contained in each gene set was 10, and the maximum number of
genes was 500. The c2 gene set (Cp. All. V2022.1. Hs. Symbols) was
obtained through the Molecular Signatures Database. GSEA used gene
matrix transposed (GMT) files containing all canonical pathways (3,050
pathways). The screening criteria for significant enrichment were set
as adjusted p-value <0.05 and FDR value (q-value) <0.25, with p-value
correction using the Benjamini–Hochberg (BH) method ([83]17).
Epilepsy samples in the [84]GSE4290 dataset were divided into high- and
low-risk groups based on the median logistic risk score. Differential
expression analysis was performed using the R package limma, and genes
with |logFC| >0.10 and p-value <0.05 in high- and low-risk groups were
eliminated. This research focuses on developing a diagnostic support
model for epilepsy that differentiates between patients with epilepsy
and healthy individuals based on gene expression profiles linked to
cellular aging. The model produces scores designed to assess the
probability of epilepsy during diagnosis rather than evaluate disease
progression or recurrence risk.
2.5. Screening key genes
An initial single-factor logistic regression analysis was performed on
CSRDEGs, utilizing a screening criterion of p-value <0.05 to identify
key genes and construct a diagnostic model. The CSRDEGs identified
through this screening were subsequently analyzed using a random forest
(RF) approach. RF ([85]18), an ensemble learning algorithm, integrates
multiple decision trees through bootstrap aggregation and is frequently
employed in model construction to generate numerous decision trees,
with the final prediction determined by majority voting. The expression
levels of CSRDEGs in the [86]GSE143272 dataset were filtered using
single-factor logistic regression, and the model was developed using
the RF package with the parameters “set.seed (234)” and “ntree = 500.”
The “MeanDecreaseGini ([87]19)” metric, which assesses the average
reduction in node impurity for a variable across all decision trees,
was employed to evaluate the importance of each variable within the
model. A higher MeanDecreaseGini value indicates greater variable
importance. To determine the optimal number of variables, five 10-fold
cross-validations were conducted, and the resulting cross-validation
curves were combined. Cross-validation is a method of evaluating model
performance by dividing the data into different training and validation
sets multiple times, helping to alleviate the problems of overfitting
and insufficient training data. Cross-validation was performed using
the training set, and the number of variables was selected that
minimized the error. The significant variables were selected for
subsequent analysis based on the MeanDecreaseGini value.
The least absolute shrinkage and selection operator (LASSO) regression
analysis ([88]20) was carried out using the R package glmnet ([89]21).
The parameters were adjusted to “set.seed (500)” and running the number
of times to 200 to avoid overfitting. LASSO regression helps reduce
overfitting and enhances the generalizability of the model by adding a
penalty term (lambda times the absolute coefficient value) to the
linear regression model. The results of the LASSO regression analysis
are visualized through diagnostic model plots and variable trajectory
plots. CSRDEGs included in the final LASSO regression model were
considered key genes for subsequent analyses. The regulatory mechanisms
of key genes were thoroughly explored through these network analyses,
and their potential functions in epilepsy pathology were revealed.
2.6. Constructing mRNA-miRNA and mRNA-transcription factors (TFs) interaction
network
The regulatory mechanisms of key genes and their potential roles in
epilepsy pathology were explored by constructing mRNA-miRNA and mRNA-TF
interaction networks to identify the upstream regulators and downstream
regulatory targets of these key genes. The miRWalk database was
used[90]^4 to predict potential miRNA regulatory molecules of key
genes. A total of 72 miRNAs related to 11 key genes were screened out,
and the mRNA-miRNA interaction network was constructed and visualized
by Cytoscape software. The results demonstrated that ribosomal protein
S6 kinase alpha-3 (RPS6KA3) has a significant regulatory association
with multiple miRNAs, such as hsa-miR-30c-5p and hsa-miR-19a-3p,
indicating that these miRNAs may participate in the pathogenesis of
epilepsy by regulating the expression of key genes. Additionally, the
interactions of key genes with their transcription factors (TFs) were
predicted using the ChIPBase database ([91]22).[92]^5 The 40 TFs that
were identified encompassed JUN, FOS, and STAT3, among others, which
interact with 13 key genes. Cytoscape software was used to construct an
mRNA-TF interaction network, exhibiting the regulatory network
association between RPS6KA3, zinc finger protein 101 (ZNF101), IL7R,
and other genes and these TFs. This result suggested that these TFs may
play an important role in the pathological process of epilepsy by
regulating the expression of key genes.
2.7. Key genes for building diagnostic logistic regression models
Logistic regression models are often used to analyze the relationship
between independent variables and binary dependent variables. In this
study, all key genes were included to construct a multifactor logistic
regression model. The coefficient of each key gene in the model was
multiplied by its corresponding expression level, and the results were
added to calculate the risk score of each sample. The dataset was
divided into high- and low-risk groups based on the median risk score.
The risk score is calculated as follows:
[MATH: Risk score=∑iCoefficient(genei)∗mRNA expression(genei) :MATH]
Nomogram ([93]23) is a graphical tool that uses a set of
non-overlapping line segments to represent the functional relationship
between multiple independent variables in a rectangular coordinate
system. A nomogram was plotted based on the results of the multivariate
logistic regression model using the R package rms ([94]24) to
demonstrate the interrelationships between key genes included in the
model.
Decision curve analysis (DCA) is a method used to evaluate the clinical
utility of predictive models, diagnostic tests, and molecular markers.
DCA was performed on key genes of the [95]GSE143272 dataset using the R
package ggDCA, and decision curve plots were constructed.
Furthermore, ROC curves of the logistic risk scores in [96]GSE143272
and [97]GSE32534 datasets were plotted using the R package pROC
([98]25), and the area under the curve (AUC) values were calculated.
The ROC curve is a tool used to evaluate model performance by analyzing
the trade-off between sensitivity and specificity to select the optimal
model, eliminate suboptimal models, or determine the optimal threshold
within a single model. The ROC curve comprehensively measures the
ability of the test to distinguish different conditions. The AUC can be
used to evaluate the diagnostic performance of the logistic risk score
for epilepsy. It is usually between 0.5 and 1. The diagnostic
performance improves when the AUC is closer to 1. Low accuracy is
indicated by an AUC between 0.5 and 0.7, moderate accuracy by an AUC
between 0.7 and 0.9, and high accuracy by an AUC above 0.9.
Semantic comparison of GO annotations provides a quantitative method
for assessing similarities between genes and genomes, which has become
an essential basis for many bioinformatics analysis methods. This study
used the R package GOSemSim ([99]26) to calculate the functional
correlations (Friends) of key genes and determine their relationships
through functional similarity analysis.
2.8. Key gene differential expression verification
The differences in key gene expression between epilepsy and control
groups in the [100]GSE143272 and [101]GSE32534 datasets were analyzed
using the Mann–Whitney U test ([102]27) (Wilcoxon rank sum test). The
differential analysis results were visualized using comparison plots
between groups, plotted by the R package ggplot2. R package pROC was
used to generate the ROC curves of key genes between epilepsy and
control groups in [103]GSE143272 and [104]GSE32534 datasets. The AUC
values were calculated to evaluate the diagnostic effect of key gene
expression in patients with epilepsy.
2.9. Statistical analyses
The data processing and analyses in this study were performed using R
software (version 4.2.2). Unless otherwise indicated, the independent
samples t-test (Student’s t-test) was used to determine statistical
significance for comparisons of continuous variables between two groups
if the variables were normally distributed. The Mann–Whitney U test
(Wilcoxon rank sum test) was used to analyze variables that did not
follow the normal distribution. Kruskal–Wallis test was used when
comparing three or more groups ([105]28). Correlation coefficients
between different molecules were calculated using Spearman correlation
analysis ([106]29). Unless otherwise stated, all statistical p-values
are two-sided, and p-values <0.05 were considered statistically
significant.
3. Results
3.1. Processing of epilepsy datasets
First, the batch effect was removed from the epilepsy datasets
[107]GSE143272 and [108]GSE32534 using the R package sva ([109]30). The
effectiveness of this process was evaluated by comparing expression
levels before and after batch effects were eliminated via distribution
box plots. Results for dataset [110]GSE143272 are displayed in
[111]Figures 1A,[112]B, whereas the results for dataset [113]GSE32534
are displayed in [114]Figures 1C,[115]D. Boxplot analysis revealed that
the batch effect of the epilepsy dataset had been effectively
eliminated; therefore, the distribution boxplot confirmed the accuracy
and reliability of gene expression data analysis.
Figure 1.
[116]Figure 1
[117]Open in a new tab
Data to batch processing. (A,B) Boxplot of [118]GSE143272 distribution
of epilepsy dataset before (A) and after (B) going to batch. (C,D)
Epilepsy dataset [119]GSE32534 distribution boxplot before (C) and
after (D) debatched. Light blue is the control group, and light red is
the epilepsy group.
3.2. DEGs related to epileptic cell senescence
Differentiated expression analysis was performed to analyze gene
expression differences in [120]GSE143272 and [121]GSE32534 datasets
between epilepsy and control groups using the R package limma. A total
of 3,430 DEGs were identified in the [122]GSE143272 dataset. The
filtering criteria included |logFC| >0.10 and p-value <0.05, including
1,631 upregulated (logFC >0.10 and p-value <0.05) and 1,799
downregulated genes (logFC <−0.10 and p-value <0.05). The results are
displayed by a volcano plot ([123]Figure 2A). Moreover, 930 DEGs were
identified in the [124]GSE32534 dataset with the same screening
criteria, which included 458 upregulated and 472 downregulated genes,
and the results are displayed in the volcano plot ([125]Figure 2B).
Figure 2.
[126]Figure 2
[127]Open in a new tab
Differential gene expression analysis. (A,B) Volcano plot of
differentially expressed genes analysis between epilepsy group and
control group in [128]GSE143272 (A) and [129]GSE32534 dataset (B). (C)
DEGs in [130]GSE143272 and [131]GSE32534 datasets, genes and CSRGs Venn
diagram of all epilepsy samples in [132]GSE4290 dataset. (D,E) Heat map
of CSRDEGs in [133]GSE143272 (D) and [134]GSE32534 datasets (E). DEGs,
differentially expressed genes; CSRGs, cellular senescence-related
genes; CSRDEGs, cellular senescence-related differentially expressed
genes. Light red is the epilepsy group; light blue is the control
group. In the heat map, red represents high expression, and blue
represents low expression.
The intersection analysis of the DEGs of [135]GSE143272 and
[136]GSE32534 and the CSRGs in the [137]GSE4290 dataset was conducted
to draw a Venn diagram ([138]Figure 2C) to identify CSRDEGs related to
cellular senescence. Ultimately, 40 CSRDEGs were identified, including
ZNF101, cathepsin D (CTSD), ribonucleotide reductase subunit M2
(RRM2B), anti-RNA polymerase II subunit H (POLR2H), myristoylated
alanine-rich C kinase substrate like 1 (MARCKSL1), ETS proto-oncogene 1
(ETS1), cellular repressor of E1A-stimulated genes 1 (CREG1),
tripartite motif-containing protein 21 (TRIM21), mitogen-activated
protein kinase kinase 4 (MAP2K4), platelet endothelial cell adhesion
molecule 1 (PECAM1), methionine sulfoxide reductase A (MSRA),
proteasome activator subunit 1 (PSME1), interferon regulatory factor-1
(IRF1), E74-like factor 4 (ELF4), minichromosome maintenance 3 (MCM3),
cathepsin S (CTSS), tumor necrosis factor α receptor 1 (TNFRSF1A),
chloride intracellular channel protein 1 (CLIC1), E-twenty six variant
gene 6 (ETV6), RNA-binding motif protein 25 (RBM25), neutrophil
cytosolic factor 4 (NCF4), heat shock 70-kDa protein 6 (HSPA6),
speckled protein 100 kDa (SP100), RPS6KA3, ras association
domain-containing protein 5 (RASSF5), tenascin-C (TNC),
mitogen-activated protein kinase 1 (MAPK1), pyruvate dehydrogenase E1
subunit beta (PDHB), growth factor progranulin (GRN), interleukin-7
receptor (IL7R), ATPase copper-transporting alpha (ATP7A), Von
Hippel–Lindau (VHL), tumor necrosis factor superfamily factor 10
(TNFSF10), DNA methyltransferase-1 (DNMT1), interferon-induced
transmembrane protein 3 (IFITM3), carboxypeptidase vitellogenic like
(CPVL), TOR signaling pathway regulator-like (TIPRL), serglycin (SRGN),
neural cell adhesion molecule 1 (NCAM1), and S100 calcium binding
protein A6 (S100A6). The expression differences of these CSRDEGs
between different sample groups of the [139]GSE143272 and [140]GSE32534
datasets were further analyzed. Heatmaps were drawn using the R package
heatmap to present the analysis results ([141]Figures 2D,[142]E).
3.3. GO and KEGG enrichment analyses
GO and KEGG enrichment analyses were conducted to explore the
association between 40 CSRDEGs and CCs, MFs, and biological pathways
associated with epilepsy. The detailed results are presented in
[143]Table 2. The analysis indicated that the 40 CSRDEGs were
significantly enriched in specific CCs, including the lumen of
secretory granules, cytoplasmic vesicles, vacuoles, and late endosomes.
Additionally, these genes exhibited enrichment in MFs, notably MAP
kinase activity. Crucially, the CSRDEGs were intricately associated
with several key biological pathways, as identified by KEGG, including
lipid metabolism and atherosclerosis, regulation of apoptosis, the MAPK
signaling pathway, and the TNF signaling pathway. The GO and pathway
enrichment analysis results were visualized by bar graphs and bubble
plots ([144]Figures 3A,[145]B).
Table 2.
Results of GO and KEGG enrichment analysis for CSRDEGs.
Ontology ID Description Gene ratio Bg ratio p-value p. adjust q-value
CC GO:0034774 Secretory granule lumen 6/40 322/19594 4.5061 × 10^−5
0.00170598 0.0010723
CC GO:0060205 Cytoplasmic vesicle lumen 6/40 325/19594 4.7447 × 10^−5
0.00170598 0.0010723
CC GO:0031983 Vesicle lumen 6/40 327/19594 4.9093 × 10^−5 0.00170598
0.0010723
CC GO:0005775 Vacuolar lumen 5/40 174/19594 2.6671 × 10^−5 0.00170598
0.0010723
CC GO:0005770 Late endosome 5/40 283/19594 0.00026391 0.00611396
0.00384293
MF GO:0004708 MAP kinase activity 2/40 18/18410 0.00068893 0.09438274
0.08412138
KEGG hsa05417 Lipid and atherosclerosis 6/30 215/8164 0.00010881
0.01032613 0.00843993
KEGG hsa04210 Apoptosis 5/30 136/8164 0.00012148 0.01032613 0.00843993
KEGG hsa04010 MAPK signaling pathway 5/30 294/8164 0.00397777
0.08501609 0.06948683
KEGG hsa04668 TNF signaling pathway 4/30 112/8164 0.00069861 0.03958799
0.03235674
KEGG hsa05152 Tuberculosis 4/30 180/8164 0.00400076 0.08501609
0.06948683
[146]Open in a new tab
GO, Gene Ontology; CC, cellular component; MF, molecular function;
KEGG, Kyoto Encyclopedia of Genes and Genomes; CSRDEGs, cellular
senescence-related differentially expressed genes.
Figure 3.
[147]Figure 3
[148]Open in a new tab
GO and KEGG enrichment analysis of CSRDEGs. (A,B) GO and pathway
enrichment analysis results of CSRDEGs bar graph (A) and bubble plot
(B) show CC, MF and biological pathway. GO terms and KEGG terms are
shown on the ordinate. (C–E) GO and pathway (KEGG) enrichment analysis
results of CSRDEGs network diagram showing CC (C), MF (D), and KEGG
(E). Pink nodes represent items, blue nodes represent molecules, and
the lines represent the relationship between items and molecules.
CSRDEGs, cellular senescence-related differentially expressed genes;
GO, Gene Ontology; KEGG, Kyoto Encyclopedia of Genes and Genomes; CC,
cellular component; MF, molecular function. The screening criteria for
GO and pathway enrichment analysis were p-value <0.05 and FDR value
(q-value) <0.25.
Meanwhile, GO enrichment analysis was used to construct the network
diagram of CC, MF, and biological pathways ([149]Figures 3C–[150]E).
The lines represent the corresponding molecules and the annotations of
the corresponding entries, and the larger the nodes, the more molecules
the entries contain.
3.4. GSEA
GSEA analysis was used to study the association between the expression
levels of all genes in the [151]GSE143272 dataset and the BPs, CCs, and
MFs in which they were involved ([152]Figure 4A); this helped determine
the effects of expression levels of all genes in the [153]GSE143272
dataset on epilepsy. The specific results are displayed in [154]Table
3. The findings revealed that all genes in the [155]GSE143272 dataset
were significantly enriched in neutrophil degranulation ([156]Figure
4B), IL6 7 pathway ([157]Figure 4C), and NABA extracellular matrix
(ECM) affiliated ([158]Figure 4D). Dectin 2 family ([159]Figure 4E) and
other biologically related functions and signaling pathways.
Figure 4.
[160]Figure 4
[161]Open in a new tab
GSEA analysis of epilepsy. (A) GSEA mountain plot of 4 biological
functions of dataset [162]GSE143272. (B–E) Gene set enrichment analysis
(GSEA) showed that all genes were significantly enriched in neutrophil
degranulation (B), IL6 7 pathway (C), NABA ECM affiliated (D), and
Dectin 2 family (E). GSEA, gene set enrichment analysis. The screening
criteria of GSEA were adj. p-value <0.05 and FDR value (q-value) <0.25,
and the p-value correction method was Benjamini–Hochberg (BH).
Table 3.
Results of GSEA for [163]GSE143272.
ID Set size Enrichment score NES p-value p. adjust q-value
REACTOME_NEUTROPHIL_DEGRANULATION 379 0.51414888 2.53977724 12 × 10^−10
2.1178 × 10^−8 1.7696 × 10^−8
PID_IL6_7_PATHWAY 34 0.58337638 2.00811468 0.00011881 0.00503238
0.004205
NABA_ECM_AFFILIATED 46 0.54504668 1.9943069 9.0572 × 10^−5 0.00411023
0.00343446
REACTOME_DECTIN_2_FAMILY 11 0.7478177 1.88658268 0.00111907 0.0234389
0.01958527
[164]Open in a new tab
GSEA, gene set enrichment analysis.
3.5. Screening key genes
A multi-step analysis was performed to evaluate the value of 40 CSRDEGs
in the diagnosis of reflux. First, single-factor logistic regression
was employed to screen 40 CSRDEGs using a p-value <0.05 as a screening
criterion. The results demonstrated that 39 significant CSRDEGs were
identified ([165]Supplementary Table 2). Subsequently, the expression
levels of these 39 CSRDEGs in the engineering group were analyzed and
led in the [166]GSE143272 dataset using the RF algorithm. The algorithm
was configured with a seed value of 234 and 500 decision trees,
analogous to the decision tree workpiece spindle ([167]Figure 5A). The
results demonstrate that when the number of decision trees is about
300, the artifacts reach a minimum and stabilize.
Figure 5.
[168]Figure 5
[169]Open in a new tab
Screening of key genes. (A) Plot of model training error of RF
algorithm. (B) MeanDecreaseGini scatter plot of CSRDEGs (in descending
MeanDecreaseGini order). (C) Cross-validation error plot. (D)
Diagnostic model plot of LASSO regression model. (E) Variable
trajectory plot of LASSO regression model. (F) Forest plot of key genes
in LASSO regression model. CSRDEGs, cellular senescence-related
differentially expressed genes; LASSO, least absolute shrinkage and
selection operator.
Moreover, a MeanDecreaseGini scatter plot was created for the 39
CSRDEGs to select essential genes further ([170]Figure 5B). The average
value of the variable reducing node impurity across all trees is
represented by MeanDecreaseGini. The significance of the gene in
distinguishing the epilepsy group from the control group increases with
this value, which in turn has a greater impact on the diagnosis of
epilepsy. Besides, a five 10-fold cross-validation analysis determined
the optimal number of genes, and the cross-validation error curve was
plotted ([171]Figure 5C). The curve represented that the model error
was at its lowest when there were 26 genes. Combining these results
with the MeanDecreaseGini value, 26 CSRDEGs were selected for further
analysis. These selected CSRDEGs significantly influence the diagnosis
of epilepsy and are ranked in descending order of importance as
follows: MARCKSL1, CTSD, ZNF101, RRM2B, RPS6KA3, CREG1, ETS1, MAP2K4,
RBM25, PECAM1, IRF1, TNFSF10, PDHB, TNFRSF1A, NCAM1, ATP7A, ELF4, CTSS,
PSME1, MCM3, MSRA, TNC, POLR2H, MAPK1, CPVL, and IL7R ([172]Figures
5B,[173]C).
Then, the 26 CSRDEGs identified by the RF algorithm were subjected to
LASSO regression analysis to construct a LASSO risk model. The analysis
results were displayed through the LASSO regression model plot
([174]Figure 5D) and the LASSO variable trajectory plot ([175]Figure
5E). The LASSO regression model finally identified the following 15
CSRDEGs: RPS6KA3, CTSD, NCAM1, CREG1, CPVL, TNFRSF1A, PECAM1, IL7R,
ZNF101, RRM2B, MARCKSL1, MCM3, ATP7A, MAP2K4, and TNC. A forest plot of
key genes was drawn using these 15 genes, which were analyzed further
([176]Figure 5F).
3.6. Construction of mRNA-miRNA and mRNA-TF interaction network of key genes
An interaction network based on mRNA-miRNA and mRNA-TF was designed to
further understand the regulatory mechanism of the key genes in
epilepsy. Their regulatory patterns and potential molecular mechanisms
were thoroughly examined by visualizing them through Cytoscape
software. First, the miRNA regulatory associations of the 15 key genes
(including RPS6KA3, CTSD, NCAM1, CREG1, CPVL, TNFRSF1A, PECAM1, IL7R,
ZNF101, RRM2B, MARCKSL1, MCM3, ATP7A, MAP2K4, and TNC) were predicted
using the miRWalk database. Furthermore, an mRNA-miRNA interaction
network was built ([177]Figure 6A). The results revealed that 11 key
genes (including ATP7A, CREG1, CTSD, MAP2K4, MARCKSL1, MCM3, NCAM1,
RPS6KA3, RRM2B, TNC, TNFRSF1A) had significant interactions with 72
miRNAs. Detailed information is provided in [178]Supplementary Table 3.
Figure 6.
[179]Figure 6
[180]Open in a new tab
Interaction network analysis of key genes. (A) mRNA-miRNA interaction
network of key genes. (B) mRNA-TF interaction network of key genes. TF,
transcription factor. Orange is mRNA, pink is miRNA, and purple is TF.
Then, the TFs binding to these key genes were predicted using the
ChIPBase database. An mRNA-TF interaction network was constructed
([181]Figure 6B). The results revealed that 13 key genes (such as
RPS6KA3, ATP7A, CPVL, CREG1, CTSD, IL7R, MARCKSL1, MCM3, PECAM1, RRM2B,
TNC, TNFRSF1A, and ZNF101) interacted with 40 transcription factors.
Further details are documented in [182]Supplementary Table 4. These
interaction networks provide new insights into the regulatory
mechanisms of these genes and their potential biological functions in
epilepsy.
The mRNA-miRNA network is illustrated in [183]Figure 6A. The complex
regulatory associations between genes such as RPS6KA3 and MAP2K4 and
multiple miRNAs highlight that these miRNAs may play essential roles in
the pathological process of epilepsy by regulating gene expression.
[184]Figure 6B demonstrates the mRNA-TF network, which exhibited
significant interactions between key genes and multiple TFs (such as
JUN, FOS, and STAT3, among others), suggesting that these factors may
be crucial in the regulatory network of epilepsy by controlling the
expression of key genes.
3.7. Establishment of diagnostic logistic regression model and functional
similarity analysis of key genes
The 15 key genes (RPS6KA3, CTSD, NCAM1, CREG1, CPVL, TNFRSF1A, PECAM1,
IL7R, ZNF101, RRM2B, MARCKSL1, MCM3, ATP7A, MAP2K4, and TNC) were
incorporated into a multivariate logistic regression model to determine
the coefficients of each gene. This analysis facilitated the
development of a diagnostic model for epilepsy. Then, the expression
and coefficient of 15 key genes from the [185]GSE143272 dataset were
interpolated based on the risk score formula. The risk score of each
sample was determined, and then the epilepsy group was divided into
low- and high-risk scores based on their median risk score value. The
following formula was used to calculate the risk score for the
[186]GSE143272 dataset:
[MATH: Risk
score=−1549.336∗RPS6KA3+588.388∗CTSD+446.008∗
mo>NCAM1−94.376∗CREG1+640.352∗CPVL−722.159∗TNFRSF1A+17.809
mn>∗PECAM1+640.925∗IL7R−1789.441
∗ZNF101+852.298∗<
/mo>RRM2B+125.355<
/mn>∗MARCKSL1−826.417∗MCM3−97
9.354∗ATP7A+1776.780∗MAP
2K4+422.015∗TNC :MATH]
The association between the 15 key genes was depicted using a nomogram
([187]Figure 7A), which exhibited that ZNF101 and MAP2K4 expression
were the main contributors to the multivariate logistic model. Then,
the diagnostic performance of the multivariate logistic model for
epilepsy was evaluated by DCA based on the [188]GSE143272 dataset. The
results are illustrated in [189]Figure 7C, which demonstrates that the
line of the model was stable at a higher level than all and none in a
certain range, and the net benefit of the model was higher, indicating
that the diagnostic effect of the model was better. Finally, the ROC
curve was plotted based on the risk scores of the [190]GSE143272 and
[191]GSE32534 datasets using the R package pROC to verify the utility
of the multifactor logistic model in epilepsy diagnosis. According to
ROC analysis, the multifactor logistic model demonstrated high accuracy
in epilepsy diagnosis ([192]Figures 7B,[193]D). The functional
similarity (Friends) analysis scores were used to determine the genes
that play an important role in the biological process of epilepsy
([194]Figure 7E). The results revealed that ZNF101 played an essential
role in epilepsy and was the closest to the cut-off value (cut-off
value = 0.60).
Figure 7.
[195]Figure 7
[196]Open in a new tab
Key genes to construct diagnostic. (A) Logistic regression model
nomogram of key genes in the diagnostic multivariate logistic model
based on dataset [197]GSE143272. (B) Diagnostic ROC curve of risk score
of diagnostic multivariate logistic model in data set [198]GSE143272.
(C) DCA plot of the key genes of the diagnostic multivariate logistic
model based on dataset [199]GSE143272. (D) Diagnostic ROC curve of risk
score of diagnostic multivariate logistic model in dataset
[200]GSE32534. (E) Functional similarity map of key genes. (F)
Chromosomal mapping of key genes. The ordinate of the DCA plot is the
net benefit, and the abscissa is the probability threshold or threshold
probability. DCA, decision curve analysis. ROC, receiver operating
characteristic; AUC, area under the curve. The closer the AUC is to 1
in the ROC curve, the better the diagnostic performance. When AUC was
above 0.9, the accuracy was high.
The location of 15 key genes on the human chromosome was analyzed using
the R package RCircos ([201]31), and a chromosome localization map was
drawn ([202]Figure 7F), which revealed that more genes were located on
chromosomes 1, 11, 17, and X; MARCKSL1 and CREG1 were on chromosome 1,
CTSD and NCAM1 on chromosome 11, MAP2K4 and PECAM1 on chromosome 17,
and ATP7A and RPS6KA3 on chromosome X.
3.8. Differential expression validation analysis of key genes
To elucidate the roles of 15 key genes (RPS6KA3, CTSD, NCAM1, CREG1,
CPVL, TNFRSF1A, PECAM1, IL7R, ZNF101, RRM2B, MARCKSL1, MCM3, ATP7A,
MAP2K4, and TNC) within the [203]GSE143272 dataset, a differential
expression analysis was conducted comparing the epilepsy and control
cohorts. The findings, depicted in [204]Figure 8A, indicate that the
expression levels of five pivotal genes (CTSD, CREG1, ZNF101, RRM2B,
and MARCKSL1) demonstrated statistically significant differences
(p < 0.001) between the epilepsy and control groups. Finally, the R
package pROC was used to draw ROC curves based on the expression levels
of key genes in the [205]GSE143272 dataset. According to the ROC curves
([206]Figures 8B–[207]F), the expression level of key genes including
CTSD, CREG1, PECAM1, ZNF101, RRM2B, MARCKSL1, and MAP2K4 demonstrated a
certain accuracy in classifying epilepsy and control groups (0.7 < AUC
< 0.9). Moreover, the expression levels of RPS6KA3, NCAM1, CPVL,
TNFRSF1A, IL7R, MCM3, ATP7A, and TNC exhibited low accuracy in
classifying epilepsy and control groups (0.5 < AUC < 0.7).
Figure 8.
[208]Figure 8
[209]Open in a new tab
Validation analysis of differential expression of key genes. (A) Group
comparison of key genes in the epilepsy group and the control group of
dataset [210]GSE143272. (B–F) Key genes RPS6KA3, CTSD, and NCAM1 (B),
CREG1, CPVL, and TNFRSF1A (C), PECAM1, IL7R, and ZNF101 (D), RRM2B,
MARCKSL1, and MCM3 (E), ATP7A, ROC curves of MAP2K4 and TNC (F) in
dataset [211]GSE143272. (G) Group comparison diagram of key genes in
dataset [212]GSE32534 epilepsy and control groups. (H–L) Key genes:
RPS6KA3, CTSD, and NCAM1 (H), CREG1, CPVL, and TNFRSF1A (I), PECAM1,
IL7R, and ZNF101 (J), RRM2B, MARCKSL1, and MCM3 (K), ATP7A,
[213]GSE32534 ROC curves of MAP2K4 and TNC (L) in dataset
[214]GSE32534. ^*Represents p-value <0.05, indicating statistical
significance. ^**Represents p-value <0.01, highly statistically
significant. ^***Represents p-value <0.001 and highly statistically
significant. When AUC >0.5, it indicates that the molecule’s expression
is a trend to promote the event’s occurrence, and the closer the AUC is
to 1, the better the diagnostic effect. AUC between 0.5 and 0.7 had low
accuracy, AUC between 0.7 and 0.9 had moderate accuracy, and AUC above
0.9 had high accuracy. DCA, decision curve analysis; ROC, receiver
operating characteristic; AUC, area under the curve; TPR, true positive
rate; FPR, false positive rate. Light blue represents the control
group, and light red represents the epilepsy group.
By applying the same analytical approach, we examined the expression
profiles of key genes (RPS6KA3, CTSD, NCAM1, CREG1, CPVL, TNFRSF1A,
PECAM1, IL7R, ZNF101, RRM2B, MARCKSL1, MCM3, ATP7A, MAP2K4, TNC) in the
[215]GSE32534 dataset. The differential expression patterns of these 15
key genes between the epilepsy group and the control group were
illustrated using a grouped comparison chart ([216]Figure 8G). These
findings demonstrated that the expression levels of two key genes,
namely NCAM1 and PECAM1, were highly statistically significant (p-value
<0.01). Finally, the R package pROC was used to draw the ROC curve
based on the expression levels of the key genes in the [217]GSE32534
dataset. The ROC curves ([218]Figures 8H–[219]L) revealed that the
expression levels of key genes NCAM1, CPVL, PECAM1, and TNC had high
accuracy in classifying epilepsy and control groups (AUC >0.9).
Furthermore, the expression levels of RPS6KA3, CTSD, CREG1, TNFRSF1A,
IL7R, ZNF101, RRM2B, MARCKSL1, MCM3, ATP7A, and MAP2K4 exhibited
moderate accuracy (0.7 < AUC < 0.9).
3.9. GSEA enrichment analysis based on high and low logistic risk score
groups
GSEA was used to investigate the association between the expression of
all 21,655 genes and the BPs, CCs, and MFs involved in different
epilepsy risk score groups (low/high) in the [220]GSE4290 dataset
([221]Figure 9A) in order to determine the effect of all gene
expression levels in the [222]GSE4290 dataset on the difference between
high and low epilepsy risk score groups. The specific results are
demonstrated in [223]Table 4. The results revealed that all the genes
in the [224]GSE4290 dataset were significantly enriched in the neuronal
system ([225]Figure 9B), anti-inflammatory response GABA receptor
signaling ([226]Figure 9C), neurotransmitter receptors and postsynaptic
signal transmission ([227]Figure 9D), neuroactive ligand-receptor
interaction ([228]Figure 9E), and other biologically relevant functions
and signaling pathways.
Figure 9.
[229]Figure 9
[230]Open in a new tab
GSEA analysis of the high and low-risk groups of epilepsy. (A) GSEA
mountain plot of four biological functions of dataset [231]GSE4290.
(B–E) GSEA showed that all genes were significantly enriched in
neuronal system (B), anti inflammatory response GABA receptor signaling
(C), and anti-inflammatory response GABA receptor signaling (C).
Neurotransmitter receptors and postsynaptic signal transmission (D),
neuroactive ligand receptor interaction (E). GSEA, gene set enrichment
analysis. The screening criteria of GSEA were adj. p-value <0.05 and
FDR value (q-value) <0.25, and the p-value correction method was BH.
Table 4.
Results of GSEA (low/high) for [232]GSE4290.
ID Set size Enrichment score NES p-value p. adjust q-value
REACTOME_NEURONAL_SYSTEM 394 0.47660382 1.96345638 1 × 10^−10
2.479 × 10^−7 2.4516 × 10^−7
WP_GABA_RECEPTOR_SIGNALING 30 0.69762082 1.94311111 0.00011952
0.04938266 0.04883643
REACTOME_NEUROTRANSMITTER_RECEPTORS_AND_POSTSYNAPTIC_SIGNAL_TRANSMISSIO
N 196 0.47801477 1.84995884 1.9014 × 10^−6 0.00157115 0.00155377
KEGG_NEUROACTIVE_LIGAND_RECEPTOR_INTERACTION 262 0.3956592 1.58541008
9.7042 × 10^−5 0.04811335 0.04758116
[233]Open in a new tab
GSEA, gene set enrichment analysis.
4. Discussion
This study identified 15 key genes linked to epilepsy, such as RPS6KA3
and TNFRSF1A, which are involved in lipid metabolism, apoptosis, and
inflammation. These genes may influence seizure development and
frequency. Additionally, we explored the role of CSRGs in epilepsy and
developed a diagnostic model. Current epilepsy treatments, primarily
including drugs and surgery ([234]32), demonstrate varying
effectiveness, with about 30% of patients experiencing drug-resistant
epilepsy ([235]33). This resistance may result from gene mutations,
drug metabolism issues, and molecular changes in the brain ([236]34).
Surgical treatment can control seizures by removing the lesion area,
but it requires precise localization of the epileptic focus and has
surgical risks, such as possible neurological damage ([237]35). Current
epilepsy treatments have limitations, making it essential to explore
their molecular mechanisms and identify novel therapeutic targets. The
complex pathology of epilepsy involves genetic mutations,
neurotransmitter imbalances, and synaptic dysfunction. Studies have
identified genes such as SCN1A, GABRA1, and KCNQ2 ([238]36), which
regulate neuronal electrophysiological activity, synaptic signaling,
and neural development ([239]37). Mutations in the SCN1A gene are a
major cause of severe epilepsy in infants ([240]38), while mutations in
KCNQ2 and KCNQ3 genes increase neuronal excitability, affecting seizure
frequency and type ([241]39). Neuroinflammation and oxidative stress
also play crucial roles in epilepsy development ([242]40), with
neuroinflammation triggering abnormal neuronal firing and epileptic
lesions and oxidative stress causing mitochondrial dysfunction and
calcium imbalance ([243]41). Since most studies focused on specific
gene mutations or pathways without exploring their interactions, a
comprehensive understanding of the etiology and pathogenesis of
epilepsy is lacking despite advances in understanding molecular
mechanisms. Most studies focus on specific gene mutations or signaling
pathways without understanding the interactions between molecular
mechanisms. Developing targeted treatments for epilepsy is challenging
due to its diverse and individual causes. The global prevalence and
treatment difficulties of epilepsy highlight the need for a deeper
understanding of its molecular mechanisms for prevention and treatment.
Cellular senescence, an irreversible cell cycle arrest due to stress,
was initially seen as protective against tumors but is now linked to
various diseases, including cardiovascular, metabolic, and
neurodegenerative disorders ([244]42). Recently, research interest has
been drawn to CSRGs, which are crucial in aging-related diseases
through their effects on the cell cycle, metabolism, and inflammation
([245]43). Studying these genes is vital for understanding complex
disease mechanisms. For instance, genes such as p16^INK4a, p21, and p53
are overexpressed in AD patients, which contribute to the disease
through stress response, mitochondrial function, and neuroinflammation
([246]44). Moreover, cellular aging leads to mitochondrial dysfunction
and dopamine neuron damage, causing movement and cognitive issues
([247]45). Notably, the SASP, a hallmark of cellular senescence, plays
a critical role in establishing a chronic inflammatory microenvironment
within the central nervous system. This is mediated by the secretion of
various pro-inflammatory cytokines, including tumor necrosis factor-α
(TNF-α) and interleukin-6 (IL-6). This inflammatory milieu not only
exacerbates glial cell activation and synaptic structural abnormalities
but also disrupts GABAergic inhibitory neurotransmission, thereby
increasing neuronal synchrony and excitability. These collective
alterations contribute to the pathophysiology and recurrence of
epilepsy ([248]46–48). Consequently, targeting senescence-related genes
and pathways represents a promising therapeutic strategy.
The role of cellular senescence in neurodegenerative diseases such as
AD and PD is well-established; its mechanisms in epilepsy remain
unclear. Most studies on epilepsy focus on inflammation and neuronal
apoptosis ([249]49), rarely addressing cell aging-related genes such as
p16^INK4a, p53, and CDKN2A. The expression patterns of CSRGs in
epilepsy patients and their impact on the lesion microenvironment are
not well understood. The precise function of cell senescence in
different forms of epilepsy, including temporal lobe and infantile
epilepsy, also needs further investigation.
This study analyzed gene expression data from epilepsy patients and
healthy controls using the GEO database and various bioinformatics
methods, overcoming traditional research challenges in data integration
and model construction. This study has several limitations. Fifteen
potential key genes in epilepsy were identified, including RPS6KA3,
CTSD, and NCAM1. RPS6KA3, located on the X chromosome, encodes the RSK2
protein, crucial for neuronal growth and survival, and mutations in
this gene are linked to various diseases ([250]50). Mutations in CTSD
gene, which encodes the Cathepsin D protein involved in lysosomal
protein degradation, can lead to protein metabolism disorders and are
associated with neurodegenerative diseases such as AD and PD ([251]51,
[252]52). NCAM1 gene encodes a neural cell adhesion molecule critical
for neuronal interactions and network formation, and its abnormal
expression is linked to various neurological disorders ([253]53). CREG1
gene is a key transcriptional regulator, and its protein influences the
cell cycle by interacting with retinoblastoma protein, impacting
neuronal survival and differentiation ([254]54). CPVL gene encodes a
serine-like carboxypeptidase involved in protein degradation and immune
modulation, affecting inflammatory and metabolic diseases ([255]55).
TNFRSF1A gene encodes TNFR1, a major mediator of TNF-α signaling,
crucial in inflammation, apoptosis, and immune response, and is linked
to various disorders, especially autoinflammatory and neurological
diseases ([256]56). PECAM1 gene encodes a transmembrane protein in
endothelial cells and platelets, which is essential for angiogenesis,
inflammation, and intercellular interactions, potentially influencing
the blood–brain barrier and neurovascular units in epilepsy ([257]57).
IL7R gene encodes the IL-7Rα, which is crucial for immune development
and regulation in T and B cells ([258]58, [259]59). ZNF101 gene
produces a zinc finger protein that regulates gene expression, cell
cycle, and DNA repair ([260]60). RRM2B gene encodes a subunit essential
for DNA replication and repair, maintaining genome stability.
MARCKSL1influences cytoskeletal dynamics and cell movement by
regulating phosphatidylinositol-4,5-bisphosphate (PIP2) distribution
([261]61). MARCKSL1 regulates cytoskeletal dynamics by controlling PIP2
distribution, aiding cell migration and invasion ([262]62). MCM3 gene
encodes a key DNA replication initiation complex component, crucial for
DNA replication accuracy and genome stability ([263]63). ATP7A gene
mutations, affecting copper ion transport, are linked to copper
metabolism disorders like Menkes and neurodegenerative diseases
([264]64). MAP2K4 encodes MKK4, a dual-specificity kinase crucial in
the MAPK pathway, responding to environmental stresses via JNK and p38
MAPK activation ([265]65). Recent studies indicate that MAP2K4 is
crucial in AD and PD by regulating apoptosis and neuroinflammation
([266]66). TNC gene, which encodes an extracellular matrix protein, is
abnormally expressed in neurodegenerative diseases, especially AD and
PD, potentially worsening pathology by affecting cell adhesion and
inflammation ([267]67, [268]68). The column-line graph ([269]Figure 6A)
demonstrates that ZNF101 and MAP2K4 significantly contribute to the
multifactorial logistic model. ZNF101 may influence neuronal survival
by regulating DNA repair or apoptosis, while MAP2K4 is involved in the
pathogenesis of epilepsy by affecting apoptosis and inflammation
through the MAPK signaling pathway.
Among the 15 genes, RPS6KA3, MAP2K4, MARCKSL1, and CREG1 are associated
with cell signaling, neuronal growth, and survival. Their proteins are
involved in several crucial cell signaling pathways, such as MAPK and
phosphatidylinositol signaling pathways. Mutations or abnormal
expressions of these genes may affect neuronal excitability and
plasticity, leading to abnormal neural networks and potentially causing
seizures ([270]66, [271]69). Besides, RRM2B, MCM3, ZNF101, CTSD, and
ATP7A are involved in protein metabolism, cell cycle regulation, and
genome stability, playing roles in DNA replication, repair, and protein
degradation to maintain genome stability and cellular metabolic
balance. Their functions are vital for neuronal survival, and any
mutations or functional defects can result in disrupted protein
metabolism and DNA damage accumulation, leading to neuronal damage and
neurodegenerative diseases ([272]64). NCAM1, TNC, PECAM1, IL7R,
TNFRSF1A, and CPVL are associated with cell adhesion, inflammatory
response, and immune regulation. The proteins encoded by these genes
play essential roles in mediating intercellular interactions,
facilitating the establishment of neural networks, and modulating
immune system functions. Specifically, these proteins are crucial for
neuronal cell adhesion, angiogenesis, and the regulation of immune
responses. Abnormal expression of adhesion molecules and excessive
inflammation can lead to neurodegenerative diseases ([273]70).
Furthermore, mRNA-miRNA and mRNA-TF interaction networks were
constructed to better understand the regulatory mechanisms of key genes
in epilepsy. The mRNA-miRNA network revealed the role of miRNA in gene
regulation, while the mRNA-TF network demonstrated the effect of TFs
linked to neuronal function, stress response, and inflammation on
epilepsy. These analyses addressed the limitations of gene expression
data and offered new targets for personalized epilepsy treatment by
regulating upstream factors of key genes. Significant regulatory
interactions between 11 key genes and 72 miRNAs were identified in an
mRNA-miRNA network, suggesting these miRNAs may impact epilepsy-related
processes by modulating gene expression. For instance, RPS6KA3
interacts with several miRNAs (such as hsa-miR-30c-5p and
hsa-miR-19a-3p), potentially affecting neuron growth and survival via
the MAPK pathway, influencing neural plasticity and seizure frequency
([274]71). Abnormal MAP2K4 expression may worsen neuronal stress and
inflammation through JNK and p38 pathways, highlighting its role in
epilepsy ([275]72).
Additionally, an mRNA-TF network analysis exhibited 13 key genes
significantly interacting with 40 TFs. TFs such as JUN, FOS, and STAT3
have crucial roles in cellular stress response, immune regulation, and
neuronal growth by controlling key genes such as ZNF101 and TNFRSF1A.
Particularly, ZNF101 influences neuronal survival and dysfunction by
managing DNA repair and the cell cycle. This regulatory network is
vital for understanding the gene-TF axis in epilepsy, offering insights
into the pathogenesis and personalized treatment strategies for the
disease. Targeting specific miRNAs or TFs can effectively manage
epilepsy-related gene expression, reducing neuronal damage and seizure
frequency. For instance, miR-21 influences neuronal survival via the
phosphatase and tensin homolog (PTEN)/mammalian target of rapamycin
(mTOR) pathway, and its intervention can mitigate epileptic seizures
([276]72). Anti-inflammatory treatments targeting genes, including
TNFRSF1A and IL7R, are essential for epilepsy patients exhibiting
pronounced inflammatory responses ([277]73, [278]74).
Bioinformatics and machine learning methods are used to develop an
epilepsy diagnostic model using CSRGs, which lays a foundation for
clinical diagnosis and personalized treatment. A multifactor logistic
regression model was created using 15 key genes. The model demonstrated
preliminary evidence of strong diagnostic potential based on the ROC
analysis using [279]GSE143272 and [280]GSE32534 datasets ([281]Figures
8B,[282]D). This study demonstrated the effectiveness of a multifactor
logistic model for diagnosing epilepsy at the genetic level,
particularly using CSRG-based markers to distinguish patients from
healthy individuals. This strategy combines multiple gene expressions
for a more accurate prediction than single biomarker methods. The
multidimensional gene combinations enhance the reliability of the
model, highlighting the diagnostic potential of CSRGs. Notably, ZNF101
and MAP2K4 significantly contribute to epilepsy diagnosis ([283]Figure
8A). ZNF101, a zinc finger transcription factor, may affect neuronal
DNA repair and homeostasis. Recent research links neuronal senescence
and abnormal regulatory factor expression in patients with refractory
epilepsy to neuroinflammation and disease progression ([284]9,
[285]75). RPS6KA3, a key player in the MAPK signaling pathway,
influences neuronal survival and excitability. Dysregulation of this
pathway is linked to neuronal hyperexcitability and epilepsy ([286]9,
[287]11). MAP2K4 regulates the JNK and p38 MAPK pathways, impacting
stress, apoptosis, and inflammation, and plays an important role in
epilepsy-related neuronal damage and chronic inflammation ([288]11,
[289]76). These genes are crucial in epilepsy pathology, affecting
neuroinflammation, synaptic signaling, and gene stability, offering
insights into molecular mechanisms and potential therapies for
epilepsy. The CSRGs-based genetic diagnostic model provides hints for
personalized treatment. Early identification of high-risk epilepsy
patients can help with personalized treatment strategies by detecting
expression levels of key genes, including ZNF101 and MAP2K4. The role
of inflammation-related genes (such as TNFRSF1A and IL7R) suggests that
anti-inflammatory therapy can be a new treatment direction, especially
for patients with significant inflammatory responses, potentially
reducing seizure frequency.
This study also analyzed differences between high- and low-epilepsy
risk groups using GSEA on 21,655 genes from the [290]GSE4290 dataset,
revealing pathways related to epileptic mechanisms ([291]Figures
8A–[292]E). GSEA identified significant enrichment of nervous system
function pathways in the high-risk group ([293]Figure 8B). These
neurological pathways encompass neurotransmitter receptors and
postsynaptic signaling mechanisms, suggesting that these genes may play
a pivotal role in aberrant neuronal signaling. This dysfunction is
associated with the occurrence of seizures, which are closely connected
to disrupted synaptic signaling and increased neuronal excitability in
epilepsy. Furthermore, the connection between epilepsy and
neuroinhibitory pathways was reinforced by the enrichment of the GABA
receptor signaling pathway ([294]Figure 8C). Abnormalities in GABA
receptor signaling, a key inhibitory neurotransmitter, result in
reduced inhibitory neuromodulation, increasing the risk of abnormal
neuronal discharges ([295]48). These findings suggested that targeting
the GABA pathway can be a potential therapeutic approach for
controlling seizures. The findings also indicated that the neuroactive
ligand-receptor interaction pathway ([296]Figure 8E) was enriched in
the high-risk epilepsy group. It also highlighted the enrichment of the
neuroactive ligand-receptor interaction pathway in patients with
high-risk epilepsy, indicating that its dysfunction may increase
neuronal excitability and trigger seizures ([297]77). Targeted
intervention in this pathway can help regulate abnormal neural activity
and reduce seizures. GSEA analysis linked seizures to key
neurobiological pathways, particularly neurotransmitter abnormalities,
GABA signaling, and neuroactive ligand-receptor interactions, offering
potential targets for personalized treatments for epilepsy. Future
multi-omics data research can further explore these pathways to develop
new therapeutic strategies.
A total of 15 critical CSRGs were identified using bioinformatics and
machine learning methodologies, and a diagnostic model was constructed
that exhibited robust performance across both training and validation
datasets. The primary aim of this study was to distinguish patients
with epilepsy from healthy controls by leveraging gene expression
profiles linked to cellular aging, thereby constructing a reliable
diagnostic tool for epilepsy. The model assigned high and low scores to
evaluate the likelihood of epilepsy at the time of diagnosis rather
than predicting disease progression or recurrence risk. Although the
validation dataset ([298]GSE32534) was limited in size (n = 10), it
corroborated these findings and was consistent with existing
literature. Increasing the sample size can enhance the generalizability
of these results. Currently, the model is in an exploratory phase and
has not yet been validated at the protein level or with clinical
samples. Given the constraints of time and resources, we recommend
using additional external datasets or data augmentation in future
research to enhance the stability and applicability of the model to a
broader and more diverse population.
5. Conclusion
In conclusion, the findings of this study indicated a significant
association between cellular senescence and the pathophysiology of
epilepsy, which may present novel targets for early diagnosis and
personalized treatment strategies. Considering the complexity of
epilepsy as a neurological disorder, the clinical applicability of
these markers warrants further investigation. Future research should
focus on a comprehensive exploration of these pathways, integrating
multi-omics data to substantiate their potential. This approach can
yield innovative insights and strategies for the precise diagnosis and
management of epilepsy. Moreover, future studies should aim to collect
larger-scale epilepsy datasets through multi-center collaborations to
enhance the robustness and generalizability of the model.
Acknowledgments