Abstract

Introduction

   Epilepsy is a chronic brain disease with a certain degree of
   neurodegeneration and is caused by abnormal discharges of neurons. The
   mechanism of cell senescence has garnered increasing attention in
   neurodegenerative diseases. However, the role of cell senescence in the
   onset and progression of epilepsy is unclear. Therefore, this study
   constructed a diagnostic model of epilepsy based on cellular
   senescence-related genes (CSRGs) to analyze their role in disease
   pathogenesis.

Methods

   The differentially expressed genes (DEGs) were screened from the
   epileptic sample dataset of the gene expression omnibus (GEO) database,
   and the cellular senescence-related DEGs (CSRDEGs) related to epilepsy
   were identified by CSRGs crossover. The functional enrichment
   characteristics of CSRDEGs were analyzed using gene ontology (GO) and
   Kyoto encyclopedia of genes and genomes (KEGG) enrichment analyses. The
   differences in biological processes between high and low-risk groups
   were analyzed using gene set enrichment analysis (GSEA). For model
   construction, logistic regression, random forest, and least absolute
   shrinkage and selection operator (LASSO) regression were employed to
   identify key genes, including ribosomal protein S6 kinase alpha-3
   (RPS6KA3), cathepsin D (CTSD), and zinc finger protein 101 (ZNF101).
   Subsequently, a multifactor logistic regression model was developed to
   evaluate the risk of epilepsy based on these screened genes.

Results

   The model exhibited higher area under the curve (AUC) values in the GSE
   data sets 143272 and 32534, producing encouraging results. Finally,
   mRNA-miRNA and mRNA-transcription factors (TFs) networks revealed the
   potential regulatory mechanism of the selected critical genes in the
   disease.

Discussion

   This study elucidated the possible process of cell senescence in
   epileptic diseases through bioinformatics analysis, offering a
   potential target for personalized diagnosis and precise treatment.

   Keywords: bioinformatics analysis, cell senescence, diagnosis model,
   epilepsy, senescence-associated secretory phenotype

1. Introduction

   Epilepsy is a common neurological disorder with a high prevalence
   worldwide. According to the World Health Organization, there are
   approximately 50 million individuals with epilepsy globally, and
   millions of new cases emerge annually. The incidence of epilepsy varies
   significantly among different regions and populations. Generally,
   epilepsy is more common in developing and low-income countries than in
   developed ones. This may be due to various factors, including a lack of
   medical resources, limited diagnosis, and insufficient disease
   management. Furthermore, epilepsy has a relatively higher incidence
   rate among children and the elderly. It is one of the primary causes of
   neurological disorders in children, significantly affecting the quality
   of life and social functioning of patients ([33]1). Epilepsy has a high
   incidence rate currently, but the treatment effect is unsatisfactory
   ([34]2). Therefore, it is crucial to investigate its molecular
   mechanisms further to identify novel therapeutic targets.

   Cell senescence is regarded as one of the key pathological mechanisms
   in neurodegenerative diseases ([35]3, [36]4), such as Alzheimer’s
   disease (AD) ([37]5–7) and Parkinson’s disease (PD) ([38]3, [39]8).
   Neurons and glial cells in AD and PD often display typical aging
   characteristics, including cell cycle arrest, organelle dysfunction,
   and a pro-inflammatory phenotype ([40]9). Although the role of cellular
   senescence in neurodegenerative diseases has been extensively studied,
   its relevance in the pathogenesis of epilepsy has not been fully
   explored. Some studies in recent years have initially revealed clues
   indicating that cell senescence may play an important role in epilepsy.
   Some studies revealed that there are senescent phenotype
   characteristics (such as upregulation of p16^INK4a expression and
   abnormal activation of cell cycle genes) ([41]10); this implies that
   cell senescence may be crucial for developing epileptogenic zones and
   the regulation of seizure frequency. Additionally, the association
   between epilepsy and neuroinflammation has been widely reported, and
   the senescence-associated secretory phenotype (SASP) cellular may be a
   major contributor in promoting the formation of a chronic inflammatory
   environment ([42]11). As a result, investigating the role of cell
   senescence-related genes (CSRGs) in epilepsy may assist in identifying
   specific mechanisms behind the onset and progression of epilepsy and
   providing a theoretical basis for exploring new therapeutic targets.

   In light of this, this study aimed to systematically analyze the
   functions of CSRGs and explore their potential role in the pathogenesis
   of epilepsy. Considering the critical role of CSRGs in other
   neurological diseases, we hypothesized that they may also be crucial in
   the pathogenesis of epilepsy. This study will use bioinformatics
   analysis to identify differentially expressed CSRGs related to epilepsy
   and investigate their regulatory mechanisms in neuronal function and
   inflammatory response. This study may offer a theoretical foundation
   for further understanding the molecular mechanisms underlying epilepsy
   and exploring new therapeutic targets.

2. Materials and methods

2.1. Data collection and preprocessing

   This study acquired three epilepsy-related Homo sapiens datasets from
   the Gene Expression Omnibus (GEO) database[43]^1 and Gene Set
   Enrichment (GSE) [44]GSE143272, [45]GSE32534, and [46]GSE4290, using
   the R package GEOquery ([47]12). The chip platform of the [48]GSE143272
   dataset was [49]GPL10558, and the chip platform of datasets
   [50]GSE32534 and [51]GSE4290 was [52]GPL570. The specific details are
   given in [53]Table 1. Whole blood tissue served as the tissue source
   for dataset [54]GSE143272, including 34 untreated epilepsy samples and
   51 healthy control samples. Peritumoral cortical tissue was the source
   of dataset [55]GSE32534, which contains five epilepsy samples and five
   control samples. The dataset [56]GSE4290 originated from brain tissue
   containing 23 epilepsy samples.

Table 1.

   GEO microarray chip information.
   Data set [57]GSE143272 [58]GSE32534 [59]GSE4290
   Platform [60]GPL10558 [61]GPL570 [62]GPL570
   Species Homo sapiens Homo sapiens Homo sapiens
   Tissue Whole blood tissues Peritumoral cortex tissues Brain tissues
   Samples in epilepsy group 34 5 23
   Samples in control group 51 5 /
   Reference PMID: 30826443
   PMID: 32054883 PMID: 23418513 PMID: 16616334
   [63]Open in a new tab

   This study aimed to identify genes associated with cellular senescence
   by collecting protein-coding CSRGs from the GeneCards database.[64]^2
   Protein-coding genes were screened and retained by using “cell
   senescence” as the search keyword, identifying 3,575 CSRGs.
   Additionally, relevant literature was searched in PubMed[65]^3 using
   the same keyword, and 279 genes related to the cellular aging process
   in different studies were further collected. After merging and
   deduplication, 3,619 CSRGs were obtained. The detailed information is
   depicted in [66]Supplementary Table 1.

   Datasets [67]GSE143272 and [68]GSE32534 were preprocessed for
   standardized and normalized probe annotations using the R package limma
   ([69]13). A subset of samples from dataset [70]GSE143272 and all
   samples from dataset [71]GSE32534 were included in this analysis.
   Furthermore, epilepsy-related samples in dataset [72]GSE4290 served as
   a validation set for subsequent analyses.

2.2. Differentially expressed genes analysis

   The samples were divided into two groups based on their grouping in the
   [73]GSE143272 and [74]GSE32534 datasets: an epilepsy group and a
   control group. Differential expression between epilepsy and control
   groups was analyzed using the R package limma. The threshold for
   identifying differentially expressed genes (DEGs) was established at
   |logFC| >0.10 and p-value <0.05. Specifically, genes were categorized
   as upregulated if their logFC >0.10 and p-value <0.05 and as
   downregulated if their |logFC| <−0.10 and p-value <0.05. The results of
   differential expression analysis were displayed using volcano plots
   generated by the R package ggplot2 ([75]14).

   Variance analysis was first used to identify DEGs from the
   [76]GSE143272 and [77]GSE32534 datasets in order to identify cellular
   senescence-related DEGs (CSRDEGs) associated with epilepsy. The
   screening criteria included |logFC| >0.10 and p-value <0.05.
   Subsequently, these DEGs were analyzed for intersection with the CSRGs
   of all epilepsy samples in the [78]GSE4290 dataset. CSRDEGs were
   finally identified by constructing a Venn diagram displaying the
   overlap of these genes. Heatmaps were created using the R package
   pheatmap ([79]15) to display these CSRDEGs.

2.3. Gene Ontology and Kyoto Encyclopedia of Genes and Genomes enrichment
analysis

   Gene Ontology (GO) analysis is a commonly used method for large-scale
   functional enrichment studies, including cellular component (CC),
   biological process (BP), and molecular function (MF). The Kyoto
   Encyclopedia of Genes and Genomes (KEGG) is currently the most widely
   used database for storing information on genomes, biological pathways,
   diseases, and drugs. Enrichment analysis of CSRDEGs was performed using
   GO and KEGG pathway analysis using the R package clusterProfiler
   ([80]16). The Screening criteria for statistical significance were
   defined as p-value <0.05 and false discovery rate (FDR) (q-value)
   <0.25.

2.4. Gene set enrichment analysis

   Gene set enrichment analysis (GSEA) is used to evaluate the
   distribution trends of genes in predefined gene sets ranked based on
   their correlation with a specific phenotype to determine their
   contribution to a given phenotype. In this study, the genes in the
   [81]GSE143272 dataset were first arranged according to their logFC
   values. Subsequently, GSEA analysis was conducted on all genes in the
   [82]GSE143272 dataset using the R package clusterProfiler. The
   parameters of the GSEA analysis were set as follows: the seed value was
   2,020, the number of calculations was 1,000, the minimum number of
   genes contained in each gene set was 10, and the maximum number of
   genes was 500. The c2 gene set (Cp. All. V2022.1. Hs. Symbols) was
   obtained through the Molecular Signatures Database. GSEA used gene
   matrix transposed (GMT) files containing all canonical pathways (3,050
   pathways). The screening criteria for significant enrichment were set
   as adjusted p-value <0.05 and FDR value (q-value) <0.25, with p-value
   correction using the Benjamini–Hochberg (BH) method ([83]17).

   Epilepsy samples in the [84]GSE4290 dataset were divided into high- and
   low-risk groups based on the median logistic risk score. Differential
   expression analysis was performed using the R package limma, and genes
   with |logFC| >0.10 and p-value <0.05 in high- and low-risk groups were
   eliminated. This research focuses on developing a diagnostic support
   model for epilepsy that differentiates between patients with epilepsy
   and healthy individuals based on gene expression profiles linked to
   cellular aging. The model produces scores designed to assess the
   probability of epilepsy during diagnosis rather than evaluate disease
   progression or recurrence risk.

2.5. Screening key genes

   An initial single-factor logistic regression analysis was performed on
   CSRDEGs, utilizing a screening criterion of p-value <0.05 to identify
   key genes and construct a diagnostic model. The CSRDEGs identified
   through this screening were subsequently analyzed using a random forest
   (RF) approach. RF ([85]18), an ensemble learning algorithm, integrates
   multiple decision trees through bootstrap aggregation and is frequently
   employed in model construction to generate numerous decision trees,
   with the final prediction determined by majority voting. The expression
   levels of CSRDEGs in the [86]GSE143272 dataset were filtered using
   single-factor logistic regression, and the model was developed using
   the RF package with the parameters “set.seed (234)” and “ntree = 500.”
   The “MeanDecreaseGini ([87]19)” metric, which assesses the average
   reduction in node impurity for a variable across all decision trees,
   was employed to evaluate the importance of each variable within the
   model. A higher MeanDecreaseGini value indicates greater variable
   importance. To determine the optimal number of variables, five 10-fold
   cross-validations were conducted, and the resulting cross-validation
   curves were combined. Cross-validation is a method of evaluating model
   performance by dividing the data into different training and validation
   sets multiple times, helping to alleviate the problems of overfitting
   and insufficient training data. Cross-validation was performed using
   the training set, and the number of variables was selected that
   minimized the error. The significant variables were selected for
   subsequent analysis based on the MeanDecreaseGini value.

   The least absolute shrinkage and selection operator (LASSO) regression
   analysis ([88]20) was carried out using the R package glmnet ([89]21).
   The parameters were adjusted to “set.seed (500)” and running the number
   of times to 200 to avoid overfitting. LASSO regression helps reduce
   overfitting and enhances the generalizability of the model by adding a
   penalty term (lambda times the absolute coefficient value) to the
   linear regression model. The results of the LASSO regression analysis
   are visualized through diagnostic model plots and variable trajectory
   plots. CSRDEGs included in the final LASSO regression model were
   considered key genes for subsequent analyses. The regulatory mechanisms
   of key genes were thoroughly explored through these network analyses,
   and their potential functions in epilepsy pathology were revealed.

2.6. Constructing mRNA-miRNA and mRNA-transcription factors (TFs) interaction
network

   The regulatory mechanisms of key genes and their potential roles in
   epilepsy pathology were explored by constructing mRNA-miRNA and mRNA-TF
   interaction networks to identify the upstream regulators and downstream
   regulatory targets of these key genes. The miRWalk database was
   used[90]^4 to predict potential miRNA regulatory molecules of key
   genes. A total of 72 miRNAs related to 11 key genes were screened out,
   and the mRNA-miRNA interaction network was constructed and visualized
   by Cytoscape software. The results demonstrated that ribosomal protein
   S6 kinase alpha-3 (RPS6KA3) has a significant regulatory association
   with multiple miRNAs, such as hsa-miR-30c-5p and hsa-miR-19a-3p,
   indicating that these miRNAs may participate in the pathogenesis of
   epilepsy by regulating the expression of key genes. Additionally, the
   interactions of key genes with their transcription factors (TFs) were
   predicted using the ChIPBase database ([91]22).[92]^5 The 40 TFs that
   were identified encompassed JUN, FOS, and STAT3, among others, which
   interact with 13 key genes. Cytoscape software was used to construct an
   mRNA-TF interaction network, exhibiting the regulatory network
   association between RPS6KA3, zinc finger protein 101 (ZNF101), IL7R,
   and other genes and these TFs. This result suggested that these TFs may
   play an important role in the pathological process of epilepsy by
   regulating the expression of key genes.

2.7. Key genes for building diagnostic logistic regression models

   Logistic regression models are often used to analyze the relationship
   between independent variables and binary dependent variables. In this
   study, all key genes were included to construct a multifactor logistic
   regression model. The coefficient of each key gene in the model was
   multiplied by its corresponding expression level, and the results were
   added to calculate the risk score of each sample. The dataset was
   divided into high- and low-risk groups based on the median risk score.
   The risk score is calculated as follows:
   [MATH: <mtext>Risk score</mtext><mo>=</mo><munder><mo
   movablelimits="false">∑</mo><mi>i</mi></munder><mtext>Coefficient</mtex
   t><mspace width="0.25em"></mspace><mo
   stretchy="true">(</mo><msub><mtext>gene</mtext><mi>i</mi></msub><mo
   stretchy="true">)</mo><mo>∗</mo><mtext>mRNA expression</mtext><mspace
   width="0.25em"></mspace><mo
   stretchy="true">(</mo><msub><mtext>gene</mtext><mi>i</mi></msub><mo
   stretchy="true">)</mo> :MATH]

   Nomogram ([93]23) is a graphical tool that uses a set of
   non-overlapping line segments to represent the functional relationship
   between multiple independent variables in a rectangular coordinate
   system. A nomogram was plotted based on the results of the multivariate
   logistic regression model using the R package rms ([94]24) to
   demonstrate the interrelationships between key genes included in the
   model.

   Decision curve analysis (DCA) is a method used to evaluate the clinical
   utility of predictive models, diagnostic tests, and molecular markers.
   DCA was performed on key genes of the [95]GSE143272 dataset using the R
   package ggDCA, and decision curve plots were constructed.

   Furthermore, ROC curves of the logistic risk scores in [96]GSE143272
   and [97]GSE32534 datasets were plotted using the R package pROC
   ([98]25), and the area under the curve (AUC) values were calculated.
   The ROC curve is a tool used to evaluate model performance by analyzing
   the trade-off between sensitivity and specificity to select the optimal
   model, eliminate suboptimal models, or determine the optimal threshold
   within a single model. The ROC curve comprehensively measures the
   ability of the test to distinguish different conditions. The AUC can be
   used to evaluate the diagnostic performance of the logistic risk score
   for epilepsy. It is usually between 0.5 and 1. The diagnostic
   performance improves when the AUC is closer to 1. Low accuracy is
   indicated by an AUC between 0.5 and 0.7, moderate accuracy by an AUC
   between 0.7 and 0.9, and high accuracy by an AUC above 0.9.

   Semantic comparison of GO annotations provides a quantitative method
   for assessing similarities between genes and genomes, which has become
   an essential basis for many bioinformatics analysis methods. This study
   used the R package GOSemSim ([99]26) to calculate the functional
   correlations (Friends) of key genes and determine their relationships
   through functional similarity analysis.

2.8. Key gene differential expression verification

   The differences in key gene expression between epilepsy and control
   groups in the [100]GSE143272 and [101]GSE32534 datasets were analyzed
   using the Mann–Whitney U test ([102]27) (Wilcoxon rank sum test). The
   differential analysis results were visualized using comparison plots
   between groups, plotted by the R package ggplot2. R package pROC was
   used to generate the ROC curves of key genes between epilepsy and
   control groups in [103]GSE143272 and [104]GSE32534 datasets. The AUC
   values were calculated to evaluate the diagnostic effect of key gene
   expression in patients with epilepsy.

2.9. Statistical analyses

   The data processing and analyses in this study were performed using R
   software (version 4.2.2). Unless otherwise indicated, the independent
   samples t-test (Student’s t-test) was used to determine statistical
   significance for comparisons of continuous variables between two groups
   if the variables were normally distributed. The Mann–Whitney U test
   (Wilcoxon rank sum test) was used to analyze variables that did not
   follow the normal distribution. Kruskal–Wallis test was used when
   comparing three or more groups ([105]28). Correlation coefficients
   between different molecules were calculated using Spearman correlation
   analysis ([106]29). Unless otherwise stated, all statistical p-values
   are two-sided, and p-values <0.05 were considered statistically
   significant.

3. Results

3.1. Processing of epilepsy datasets

   First, the batch effect was removed from the epilepsy datasets
   [107]GSE143272 and [108]GSE32534 using the R package sva ([109]30). The
   effectiveness of this process was evaluated by comparing expression
   levels before and after batch effects were eliminated via distribution
   box plots. Results for dataset [110]GSE143272 are displayed in
   [111]Figures 1A,[112]B, whereas the results for dataset [113]GSE32534
   are displayed in [114]Figures 1C,[115]D. Boxplot analysis revealed that
   the batch effect of the epilepsy dataset had been effectively
   eliminated; therefore, the distribution boxplot confirmed the accuracy
   and reliability of gene expression data analysis.

Figure 1.

   [116]Figure 1
   [117]Open in a new tab

   Data to batch processing. (A,B) Boxplot of [118]GSE143272 distribution
   of epilepsy dataset before (A) and after (B) going to batch. (C,D)
   Epilepsy dataset [119]GSE32534 distribution boxplot before (C) and
   after (D) debatched. Light blue is the control group, and light red is
   the epilepsy group.

3.2. DEGs related to epileptic cell senescence

   Differentiated expression analysis was performed to analyze gene
   expression differences in [120]GSE143272 and [121]GSE32534 datasets
   between epilepsy and control groups using the R package limma. A total
   of 3,430 DEGs were identified in the [122]GSE143272 dataset. The
   filtering criteria included |logFC| >0.10 and p-value <0.05, including
   1,631 upregulated (logFC >0.10 and p-value <0.05) and 1,799
   downregulated genes (logFC <−0.10 and p-value <0.05). The results are
   displayed by a volcano plot ([123]Figure 2A). Moreover, 930 DEGs were
   identified in the [124]GSE32534 dataset with the same screening
   criteria, which included 458 upregulated and 472 downregulated genes,
   and the results are displayed in the volcano plot ([125]Figure 2B).

Figure 2.

   [126]Figure 2
   [127]Open in a new tab

   Differential gene expression analysis. (A,B) Volcano plot of
   differentially expressed genes analysis between epilepsy group and
   control group in [128]GSE143272 (A) and [129]GSE32534 dataset (B). (C)
   DEGs in [130]GSE143272 and [131]GSE32534 datasets, genes and CSRGs Venn
   diagram of all epilepsy samples in [132]GSE4290 dataset. (D,E) Heat map
   of CSRDEGs in [133]GSE143272 (D) and [134]GSE32534 datasets (E). DEGs,
   differentially expressed genes; CSRGs, cellular senescence-related
   genes; CSRDEGs, cellular senescence-related differentially expressed
   genes. Light red is the epilepsy group; light blue is the control
   group. In the heat map, red represents high expression, and blue
   represents low expression.

   The intersection analysis of the DEGs of [135]GSE143272 and
   [136]GSE32534 and the CSRGs in the [137]GSE4290 dataset was conducted
   to draw a Venn diagram ([138]Figure 2C) to identify CSRDEGs related to
   cellular senescence. Ultimately, 40 CSRDEGs were identified, including
   ZNF101, cathepsin D (CTSD), ribonucleotide reductase subunit M2
   (RRM2B), anti-RNA polymerase II subunit H (POLR2H), myristoylated
   alanine-rich C kinase substrate like 1 (MARCKSL1), ETS proto-oncogene 1
   (ETS1), cellular repressor of E1A-stimulated genes 1 (CREG1),
   tripartite motif-containing protein 21 (TRIM21), mitogen-activated
   protein kinase kinase 4 (MAP2K4), platelet endothelial cell adhesion
   molecule 1 (PECAM1), methionine sulfoxide reductase A (MSRA),
   proteasome activator subunit 1 (PSME1), interferon regulatory factor-1
   (IRF1), E74-like factor 4 (ELF4), minichromosome maintenance 3 (MCM3),
   cathepsin S (CTSS), tumor necrosis factor α receptor 1 (TNFRSF1A),
   chloride intracellular channel protein 1 (CLIC1), E-twenty six variant
   gene 6 (ETV6), RNA-binding motif protein 25 (RBM25), neutrophil
   cytosolic factor 4 (NCF4), heat shock 70-kDa protein 6 (HSPA6),
   speckled protein 100 kDa (SP100), RPS6KA3, ras association
   domain-containing protein 5 (RASSF5), tenascin-C (TNC),
   mitogen-activated protein kinase 1 (MAPK1), pyruvate dehydrogenase E1
   subunit beta (PDHB), growth factor progranulin (GRN), interleukin-7
   receptor (IL7R), ATPase copper-transporting alpha (ATP7A), Von
   Hippel–Lindau (VHL), tumor necrosis factor superfamily factor 10
   (TNFSF10), DNA methyltransferase-1 (DNMT1), interferon-induced
   transmembrane protein 3 (IFITM3), carboxypeptidase vitellogenic like
   (CPVL), TOR signaling pathway regulator-like (TIPRL), serglycin (SRGN),
   neural cell adhesion molecule 1 (NCAM1), and S100 calcium binding
   protein A6 (S100A6). The expression differences of these CSRDEGs
   between different sample groups of the [139]GSE143272 and [140]GSE32534
   datasets were further analyzed. Heatmaps were drawn using the R package
   heatmap to present the analysis results ([141]Figures 2D,[142]E).

3.3. GO and KEGG enrichment analyses

   GO and KEGG enrichment analyses were conducted to explore the
   association between 40 CSRDEGs and CCs, MFs, and biological pathways
   associated with epilepsy. The detailed results are presented in
   [143]Table 2. The analysis indicated that the 40 CSRDEGs were
   significantly enriched in specific CCs, including the lumen of
   secretory granules, cytoplasmic vesicles, vacuoles, and late endosomes.
   Additionally, these genes exhibited enrichment in MFs, notably MAP
   kinase activity. Crucially, the CSRDEGs were intricately associated
   with several key biological pathways, as identified by KEGG, including
   lipid metabolism and atherosclerosis, regulation of apoptosis, the MAPK
   signaling pathway, and the TNF signaling pathway. The GO and pathway
   enrichment analysis results were visualized by bar graphs and bubble
   plots ([144]Figures 3A,[145]B).

Table 2.

   Results of GO and KEGG enrichment analysis for CSRDEGs.
   Ontology ID Description Gene ratio Bg ratio p-value p. adjust q-value
   CC GO:0034774 Secretory granule lumen 6/40 322/19594 4.5061 × 10^−5
   0.00170598 0.0010723
   CC GO:0060205 Cytoplasmic vesicle lumen 6/40 325/19594 4.7447 × 10^−5
   0.00170598 0.0010723
   CC GO:0031983 Vesicle lumen 6/40 327/19594 4.9093 × 10^−5 0.00170598
   0.0010723
   CC GO:0005775 Vacuolar lumen 5/40 174/19594 2.6671 × 10^−5 0.00170598
   0.0010723
   CC GO:0005770 Late endosome 5/40 283/19594 0.00026391 0.00611396
   0.00384293
   MF GO:0004708 MAP kinase activity 2/40 18/18410 0.00068893 0.09438274
   0.08412138
   KEGG hsa05417 Lipid and atherosclerosis 6/30 215/8164 0.00010881
   0.01032613 0.00843993
   KEGG hsa04210 Apoptosis 5/30 136/8164 0.00012148 0.01032613 0.00843993
   KEGG hsa04010 MAPK signaling pathway 5/30 294/8164 0.00397777
   0.08501609 0.06948683
   KEGG hsa04668 TNF signaling pathway 4/30 112/8164 0.00069861 0.03958799
   0.03235674
   KEGG hsa05152 Tuberculosis 4/30 180/8164 0.00400076 0.08501609
   0.06948683
   [146]Open in a new tab

   GO, Gene Ontology; CC, cellular component; MF, molecular function;
   KEGG, Kyoto Encyclopedia of Genes and Genomes; CSRDEGs, cellular
   senescence-related differentially expressed genes.

Figure 3.

   [147]Figure 3
   [148]Open in a new tab

   GO and KEGG enrichment analysis of CSRDEGs. (A,B) GO and pathway
   enrichment analysis results of CSRDEGs bar graph (A) and bubble plot
   (B) show CC, MF and biological pathway. GO terms and KEGG terms are
   shown on the ordinate. (C–E) GO and pathway (KEGG) enrichment analysis
   results of CSRDEGs network diagram showing CC (C), MF (D), and KEGG
   (E). Pink nodes represent items, blue nodes represent molecules, and
   the lines represent the relationship between items and molecules.
   CSRDEGs, cellular senescence-related differentially expressed genes;
   GO, Gene Ontology; KEGG, Kyoto Encyclopedia of Genes and Genomes; CC,
   cellular component; MF, molecular function. The screening criteria for
   GO and pathway enrichment analysis were p-value <0.05 and FDR value
   (q-value) <0.25.

   Meanwhile, GO enrichment analysis was used to construct the network
   diagram of CC, MF, and biological pathways ([149]Figures 3C–[150]E).
   The lines represent the corresponding molecules and the annotations of
   the corresponding entries, and the larger the nodes, the more molecules
   the entries contain.

3.4. GSEA

   GSEA analysis was used to study the association between the expression
   levels of all genes in the [151]GSE143272 dataset and the BPs, CCs, and
   MFs in which they were involved ([152]Figure 4A); this helped determine
   the effects of expression levels of all genes in the [153]GSE143272
   dataset on epilepsy. The specific results are displayed in [154]Table
   3. The findings revealed that all genes in the [155]GSE143272 dataset
   were significantly enriched in neutrophil degranulation ([156]Figure
   4B), IL6 7 pathway ([157]Figure 4C), and NABA extracellular matrix
   (ECM) affiliated ([158]Figure 4D). Dectin 2 family ([159]Figure 4E) and
   other biologically related functions and signaling pathways.

Figure 4.

   [160]Figure 4
   [161]Open in a new tab

   GSEA analysis of epilepsy. (A) GSEA mountain plot of 4 biological
   functions of dataset [162]GSE143272. (B–E) Gene set enrichment analysis
   (GSEA) showed that all genes were significantly enriched in neutrophil
   degranulation (B), IL6 7 pathway (C), NABA ECM affiliated (D), and
   Dectin 2 family (E). GSEA, gene set enrichment analysis. The screening
   criteria of GSEA were adj. p-value <0.05 and FDR value (q-value) <0.25,
   and the p-value correction method was Benjamini–Hochberg (BH).

Table 3.

   Results of GSEA for [163]GSE143272.
   ID Set size Enrichment score NES p-value p. adjust q-value
   REACTOME_NEUTROPHIL_DEGRANULATION 379 0.51414888 2.53977724 12 × 10^−10
   2.1178 × 10^−8 1.7696 × 10^−8
   PID_IL6_7_PATHWAY 34 0.58337638 2.00811468 0.00011881 0.00503238
   0.004205
   NABA_ECM_AFFILIATED 46 0.54504668 1.9943069 9.0572 × 10^−5 0.00411023
   0.00343446
   REACTOME_DECTIN_2_FAMILY 11 0.7478177 1.88658268 0.00111907 0.0234389
   0.01958527
   [164]Open in a new tab

   GSEA, gene set enrichment analysis.

3.5. Screening key genes

   A multi-step analysis was performed to evaluate the value of 40 CSRDEGs
   in the diagnosis of reflux. First, single-factor logistic regression
   was employed to screen 40 CSRDEGs using a p-value <0.05 as a screening
   criterion. The results demonstrated that 39 significant CSRDEGs were
   identified ([165]Supplementary Table 2). Subsequently, the expression
   levels of these 39 CSRDEGs in the engineering group were analyzed and
   led in the [166]GSE143272 dataset using the RF algorithm. The algorithm
   was configured with a seed value of 234 and 500 decision trees,
   analogous to the decision tree workpiece spindle ([167]Figure 5A). The
   results demonstrate that when the number of decision trees is about
   300, the artifacts reach a minimum and stabilize.

Figure 5.

   [168]Figure 5
   [169]Open in a new tab

   Screening of key genes. (A) Plot of model training error of RF
   algorithm. (B) MeanDecreaseGini scatter plot of CSRDEGs (in descending
   MeanDecreaseGini order). (C) Cross-validation error plot. (D)
   Diagnostic model plot of LASSO regression model. (E) Variable
   trajectory plot of LASSO regression model. (F) Forest plot of key genes
   in LASSO regression model. CSRDEGs, cellular senescence-related
   differentially expressed genes; LASSO, least absolute shrinkage and
   selection operator.

   Moreover, a MeanDecreaseGini scatter plot was created for the 39
   CSRDEGs to select essential genes further ([170]Figure 5B). The average
   value of the variable reducing node impurity across all trees is
   represented by MeanDecreaseGini. The significance of the gene in
   distinguishing the epilepsy group from the control group increases with
   this value, which in turn has a greater impact on the diagnosis of
   epilepsy. Besides, a five 10-fold cross-validation analysis determined
   the optimal number of genes, and the cross-validation error curve was
   plotted ([171]Figure 5C). The curve represented that the model error
   was at its lowest when there were 26 genes. Combining these results
   with the MeanDecreaseGini value, 26 CSRDEGs were selected for further
   analysis. These selected CSRDEGs significantly influence the diagnosis
   of epilepsy and are ranked in descending order of importance as
   follows: MARCKSL1, CTSD, ZNF101, RRM2B, RPS6KA3, CREG1, ETS1, MAP2K4,
   RBM25, PECAM1, IRF1, TNFSF10, PDHB, TNFRSF1A, NCAM1, ATP7A, ELF4, CTSS,
   PSME1, MCM3, MSRA, TNC, POLR2H, MAPK1, CPVL, and IL7R ([172]Figures
   5B,[173]C).

   Then, the 26 CSRDEGs identified by the RF algorithm were subjected to
   LASSO regression analysis to construct a LASSO risk model. The analysis
   results were displayed through the LASSO regression model plot
   ([174]Figure 5D) and the LASSO variable trajectory plot ([175]Figure
   5E). The LASSO regression model finally identified the following 15
   CSRDEGs: RPS6KA3, CTSD, NCAM1, CREG1, CPVL, TNFRSF1A, PECAM1, IL7R,
   ZNF101, RRM2B, MARCKSL1, MCM3, ATP7A, MAP2K4, and TNC. A forest plot of
   key genes was drawn using these 15 genes, which were analyzed further
   ([176]Figure 5F).

3.6. Construction of mRNA-miRNA and mRNA-TF interaction network of key genes

   An interaction network based on mRNA-miRNA and mRNA-TF was designed to
   further understand the regulatory mechanism of the key genes in
   epilepsy. Their regulatory patterns and potential molecular mechanisms
   were thoroughly examined by visualizing them through Cytoscape
   software. First, the miRNA regulatory associations of the 15 key genes
   (including RPS6KA3, CTSD, NCAM1, CREG1, CPVL, TNFRSF1A, PECAM1, IL7R,
   ZNF101, RRM2B, MARCKSL1, MCM3, ATP7A, MAP2K4, and TNC) were predicted
   using the miRWalk database. Furthermore, an mRNA-miRNA interaction
   network was built ([177]Figure 6A). The results revealed that 11 key
   genes (including ATP7A, CREG1, CTSD, MAP2K4, MARCKSL1, MCM3, NCAM1,
   RPS6KA3, RRM2B, TNC, TNFRSF1A) had significant interactions with 72
   miRNAs. Detailed information is provided in [178]Supplementary Table 3.

Figure 6.

   [179]Figure 6
   [180]Open in a new tab

   Interaction network analysis of key genes. (A) mRNA-miRNA interaction
   network of key genes. (B) mRNA-TF interaction network of key genes. TF,
   transcription factor. Orange is mRNA, pink is miRNA, and purple is TF.

   Then, the TFs binding to these key genes were predicted using the
   ChIPBase database. An mRNA-TF interaction network was constructed
   ([181]Figure 6B). The results revealed that 13 key genes (such as
   RPS6KA3, ATP7A, CPVL, CREG1, CTSD, IL7R, MARCKSL1, MCM3, PECAM1, RRM2B,
   TNC, TNFRSF1A, and ZNF101) interacted with 40 transcription factors.
   Further details are documented in [182]Supplementary Table 4. These
   interaction networks provide new insights into the regulatory
   mechanisms of these genes and their potential biological functions in
   epilepsy.

   The mRNA-miRNA network is illustrated in [183]Figure 6A. The complex
   regulatory associations between genes such as RPS6KA3 and MAP2K4 and
   multiple miRNAs highlight that these miRNAs may play essential roles in
   the pathological process of epilepsy by regulating gene expression.
   [184]Figure 6B demonstrates the mRNA-TF network, which exhibited
   significant interactions between key genes and multiple TFs (such as
   JUN, FOS, and STAT3, among others), suggesting that these factors may
   be crucial in the regulatory network of epilepsy by controlling the
   expression of key genes.

3.7. Establishment of diagnostic logistic regression model and functional
similarity analysis of key genes

   The 15 key genes (RPS6KA3, CTSD, NCAM1, CREG1, CPVL, TNFRSF1A, PECAM1,
   IL7R, ZNF101, RRM2B, MARCKSL1, MCM3, ATP7A, MAP2K4, and TNC) were
   incorporated into a multivariate logistic regression model to determine
   the coefficients of each gene. This analysis facilitated the
   development of a diagnostic model for epilepsy. Then, the expression
   and coefficient of 15 key genes from the [185]GSE143272 dataset were
   interpolated based on the risk score formula. The risk score of each
   sample was determined, and then the epilepsy group was divided into
   low- and high-risk scores based on their median risk score value. The
   following formula was used to calculate the risk score for the
   [186]GSE143272 dataset:
   [MATH: <mtable columnalign="left"
   displaystyle="true"><mtr><mtd><mtext>Risk
   score</mtext><mo>=</mo><mo>−</mo><mn>1549.336</mn><mo>∗</mo><mi>RPS</mi
   ><mn>6</mn><mi>KA</mi><mn>3</mn><mo>+</mo><mn>588.388</mn><mo>∗</mo><mt
   ext>CTSD</mtext><mo>+</mo></mtd></mtr><mtr><mtd><mn>446.008</mn><mo>∗</
   mo><mtext>NCAM</mtext><mn>1</mn><mo>−</mo><mn>94.376</mn><mo>∗</mo><mte
   xt>CREG</mtext><mn>1</mn><mo>+</mo></mtd></mtr><mtr><mtd><mn>640.352</m
   n><mo>∗</mo><mtext>CPVL</mtext><mo>−</mo><mn>722.159</mn><mo>∗</mo><mte
   xt>TNFRSF</mtext><mn>1</mn><mi
   mathvariant="normal">A</mi><mo>+</mo></mtd></mtr><mtr><mtd><mn>17.809</
   mn><mo>∗</mo><mtext>PECAM</mtext><mn>1</mn><mo>+</mo><mn>640.925</mn><m
   o>∗</mo><mi>IL</mi><mn>7</mn><mi
   mathvariant="normal">R</mi><mo>−</mo></mtd></mtr><mtr><mtd><mn>1789.441
   </mn><mo>∗</mo><mi>ZNF</mi><mn>101</mn><mo>+</mo><mn>852.298</mn><mo>∗<
   /mo><mi>RRM</mi><mn>2</mn><mi
   mathvariant="normal">B</mi><mo>+</mo></mtd></mtr><mtr><mtd><mn>125.355<
   /mn><mo>∗</mo><mtext>MARCKSL</mtext><mn>1</mn><mo>−</mo><mn>826.417</mn
   ><mo>∗</mo><mi>MCM</mi><mn>3</mn><mo>−</mo></mtd></mtr><mtr><mtd><mn>97
   9.354</mn><mo>∗</mo><mi>ATP</mi><mn>7</mn><mi
   mathvariant="normal">A</mi><mo>+</mo><mn>1776.780</mn><mo>∗</mo><mi>MAP
   </mi><mn>2</mn><mi
   mathvariant="normal">K</mi><mn>4</mn><mo>+</mo></mtd></mtr><mtr><mtd><m
   n>422.015</mn><mo>∗</mo><mi>TNC</mi></mtd></mtr></mtable> :MATH]

   The association between the 15 key genes was depicted using a nomogram
   ([187]Figure 7A), which exhibited that ZNF101 and MAP2K4 expression
   were the main contributors to the multivariate logistic model. Then,
   the diagnostic performance of the multivariate logistic model for
   epilepsy was evaluated by DCA based on the [188]GSE143272 dataset. The
   results are illustrated in [189]Figure 7C, which demonstrates that the
   line of the model was stable at a higher level than all and none in a
   certain range, and the net benefit of the model was higher, indicating
   that the diagnostic effect of the model was better. Finally, the ROC
   curve was plotted based on the risk scores of the [190]GSE143272 and
   [191]GSE32534 datasets using the R package pROC to verify the utility
   of the multifactor logistic model in epilepsy diagnosis. According to
   ROC analysis, the multifactor logistic model demonstrated high accuracy
   in epilepsy diagnosis ([192]Figures 7B,[193]D). The functional
   similarity (Friends) analysis scores were used to determine the genes
   that play an important role in the biological process of epilepsy
   ([194]Figure 7E). The results revealed that ZNF101 played an essential
   role in epilepsy and was the closest to the cut-off value (cut-off
   value = 0.60).

Figure 7.

   [195]Figure 7
   [196]Open in a new tab

   Key genes to construct diagnostic. (A) Logistic regression model
   nomogram of key genes in the diagnostic multivariate logistic model
   based on dataset [197]GSE143272. (B) Diagnostic ROC curve of risk score
   of diagnostic multivariate logistic model in data set [198]GSE143272.
   (C) DCA plot of the key genes of the diagnostic multivariate logistic
   model based on dataset [199]GSE143272. (D) Diagnostic ROC curve of risk
   score of diagnostic multivariate logistic model in dataset
   [200]GSE32534. (E) Functional similarity map of key genes. (F)
   Chromosomal mapping of key genes. The ordinate of the DCA plot is the
   net benefit, and the abscissa is the probability threshold or threshold
   probability. DCA, decision curve analysis. ROC, receiver operating
   characteristic; AUC, area under the curve. The closer the AUC is to 1
   in the ROC curve, the better the diagnostic performance. When AUC was
   above 0.9, the accuracy was high.

   The location of 15 key genes on the human chromosome was analyzed using
   the R package RCircos ([201]31), and a chromosome localization map was
   drawn ([202]Figure 7F), which revealed that more genes were located on
   chromosomes 1, 11, 17, and X; MARCKSL1 and CREG1 were on chromosome 1,
   CTSD and NCAM1 on chromosome 11, MAP2K4 and PECAM1 on chromosome 17,
   and ATP7A and RPS6KA3 on chromosome X.

3.8. Differential expression validation analysis of key genes

   To elucidate the roles of 15 key genes (RPS6KA3, CTSD, NCAM1, CREG1,
   CPVL, TNFRSF1A, PECAM1, IL7R, ZNF101, RRM2B, MARCKSL1, MCM3, ATP7A,
   MAP2K4, and TNC) within the [203]GSE143272 dataset, a differential
   expression analysis was conducted comparing the epilepsy and control
   cohorts. The findings, depicted in [204]Figure 8A, indicate that the
   expression levels of five pivotal genes (CTSD, CREG1, ZNF101, RRM2B,
   and MARCKSL1) demonstrated statistically significant differences
   (p < 0.001) between the epilepsy and control groups. Finally, the R
   package pROC was used to draw ROC curves based on the expression levels
   of key genes in the [205]GSE143272 dataset. According to the ROC curves
   ([206]Figures 8B–[207]F), the expression level of key genes including
   CTSD, CREG1, PECAM1, ZNF101, RRM2B, MARCKSL1, and MAP2K4 demonstrated a
   certain accuracy in classifying epilepsy and control groups (0.7 < AUC
   < 0.9). Moreover, the expression levels of RPS6KA3, NCAM1, CPVL,
   TNFRSF1A, IL7R, MCM3, ATP7A, and TNC exhibited low accuracy in
   classifying epilepsy and control groups (0.5 < AUC < 0.7).

Figure 8.

   [208]Figure 8
   [209]Open in a new tab

   Validation analysis of differential expression of key genes. (A) Group
   comparison of key genes in the epilepsy group and the control group of
   dataset [210]GSE143272. (B–F) Key genes RPS6KA3, CTSD, and NCAM1 (B),
   CREG1, CPVL, and TNFRSF1A (C), PECAM1, IL7R, and ZNF101 (D), RRM2B,
   MARCKSL1, and MCM3 (E), ATP7A, ROC curves of MAP2K4 and TNC (F) in
   dataset [211]GSE143272. (G) Group comparison diagram of key genes in
   dataset [212]GSE32534 epilepsy and control groups. (H–L) Key genes:
   RPS6KA3, CTSD, and NCAM1 (H), CREG1, CPVL, and TNFRSF1A (I), PECAM1,
   IL7R, and ZNF101 (J), RRM2B, MARCKSL1, and MCM3 (K), ATP7A,
   [213]GSE32534 ROC curves of MAP2K4 and TNC (L) in dataset
   [214]GSE32534. ^*Represents p-value <0.05, indicating statistical
   significance. ^**Represents p-value <0.01, highly statistically
   significant. ^***Represents p-value <0.001 and highly statistically
   significant. When AUC >0.5, it indicates that the molecule’s expression
   is a trend to promote the event’s occurrence, and the closer the AUC is
   to 1, the better the diagnostic effect. AUC between 0.5 and 0.7 had low
   accuracy, AUC between 0.7 and 0.9 had moderate accuracy, and AUC above
   0.9 had high accuracy. DCA, decision curve analysis; ROC, receiver
   operating characteristic; AUC, area under the curve; TPR, true positive
   rate; FPR, false positive rate. Light blue represents the control
   group, and light red represents the epilepsy group.

   By applying the same analytical approach, we examined the expression
   profiles of key genes (RPS6KA3, CTSD, NCAM1, CREG1, CPVL, TNFRSF1A,
   PECAM1, IL7R, ZNF101, RRM2B, MARCKSL1, MCM3, ATP7A, MAP2K4, TNC) in the
   [215]GSE32534 dataset. The differential expression patterns of these 15
   key genes between the epilepsy group and the control group were
   illustrated using a grouped comparison chart ([216]Figure 8G). These
   findings demonstrated that the expression levels of two key genes,
   namely NCAM1 and PECAM1, were highly statistically significant (p-value
   <0.01). Finally, the R package pROC was used to draw the ROC curve
   based on the expression levels of the key genes in the [217]GSE32534
   dataset. The ROC curves ([218]Figures 8H–[219]L) revealed that the
   expression levels of key genes NCAM1, CPVL, PECAM1, and TNC had high
   accuracy in classifying epilepsy and control groups (AUC >0.9).
   Furthermore, the expression levels of RPS6KA3, CTSD, CREG1, TNFRSF1A,
   IL7R, ZNF101, RRM2B, MARCKSL1, MCM3, ATP7A, and MAP2K4 exhibited
   moderate accuracy (0.7 < AUC < 0.9).

3.9. GSEA enrichment analysis based on high and low logistic risk score
groups

   GSEA was used to investigate the association between the expression of
   all 21,655 genes and the BPs, CCs, and MFs involved in different
   epilepsy risk score groups (low/high) in the [220]GSE4290 dataset
   ([221]Figure 9A) in order to determine the effect of all gene
   expression levels in the [222]GSE4290 dataset on the difference between
   high and low epilepsy risk score groups. The specific results are
   demonstrated in [223]Table 4. The results revealed that all the genes
   in the [224]GSE4290 dataset were significantly enriched in the neuronal
   system ([225]Figure 9B), anti-inflammatory response GABA receptor
   signaling ([226]Figure 9C), neurotransmitter receptors and postsynaptic
   signal transmission ([227]Figure 9D), neuroactive ligand-receptor
   interaction ([228]Figure 9E), and other biologically relevant functions
   and signaling pathways.

Figure 9.

   [229]Figure 9
   [230]Open in a new tab

   GSEA analysis of the high and low-risk groups of epilepsy. (A) GSEA
   mountain plot of four biological functions of dataset [231]GSE4290.
   (B–E) GSEA showed that all genes were significantly enriched in
   neuronal system (B), anti inflammatory response GABA receptor signaling
   (C), and anti-inflammatory response GABA receptor signaling (C).
   Neurotransmitter receptors and postsynaptic signal transmission (D),
   neuroactive ligand receptor interaction (E). GSEA, gene set enrichment
   analysis. The screening criteria of GSEA were adj. p-value <0.05 and
   FDR value (q-value) <0.25, and the p-value correction method was BH.

Table 4.

   Results of GSEA (low/high) for [232]GSE4290.
   ID Set size Enrichment score NES p-value p. adjust q-value
   REACTOME_NEURONAL_SYSTEM 394 0.47660382 1.96345638 1 × 10^−10
   2.479 × 10^−7 2.4516 × 10^−7
   WP_GABA_RECEPTOR_SIGNALING 30 0.69762082 1.94311111 0.00011952
   0.04938266 0.04883643
   REACTOME_NEUROTRANSMITTER_RECEPTORS_AND_POSTSYNAPTIC_SIGNAL_TRANSMISSIO
   N 196 0.47801477 1.84995884 1.9014 × 10^−6 0.00157115 0.00155377
   KEGG_NEUROACTIVE_LIGAND_RECEPTOR_INTERACTION 262 0.3956592 1.58541008
   9.7042 × 10^−5 0.04811335 0.04758116
   [233]Open in a new tab

   GSEA, gene set enrichment analysis.

4. Discussion

   This study identified 15 key genes linked to epilepsy, such as RPS6KA3
   and TNFRSF1A, which are involved in lipid metabolism, apoptosis, and
   inflammation. These genes may influence seizure development and
   frequency. Additionally, we explored the role of CSRGs in epilepsy and
   developed a diagnostic model. Current epilepsy treatments, primarily
   including drugs and surgery ([234]32), demonstrate varying
   effectiveness, with about 30% of patients experiencing drug-resistant
   epilepsy ([235]33). This resistance may result from gene mutations,
   drug metabolism issues, and molecular changes in the brain ([236]34).

   Surgical treatment can control seizures by removing the lesion area,
   but it requires precise localization of the epileptic focus and has
   surgical risks, such as possible neurological damage ([237]35). Current
   epilepsy treatments have limitations, making it essential to explore
   their molecular mechanisms and identify novel therapeutic targets. The
   complex pathology of epilepsy involves genetic mutations,
   neurotransmitter imbalances, and synaptic dysfunction. Studies have
   identified genes such as SCN1A, GABRA1, and KCNQ2 ([238]36), which
   regulate neuronal electrophysiological activity, synaptic signaling,
   and neural development ([239]37). Mutations in the SCN1A gene are a
   major cause of severe epilepsy in infants ([240]38), while mutations in
   KCNQ2 and KCNQ3 genes increase neuronal excitability, affecting seizure
   frequency and type ([241]39). Neuroinflammation and oxidative stress
   also play crucial roles in epilepsy development ([242]40), with
   neuroinflammation triggering abnormal neuronal firing and epileptic
   lesions and oxidative stress causing mitochondrial dysfunction and
   calcium imbalance ([243]41). Since most studies focused on specific
   gene mutations or pathways without exploring their interactions, a
   comprehensive understanding of the etiology and pathogenesis of
   epilepsy is lacking despite advances in understanding molecular
   mechanisms. Most studies focus on specific gene mutations or signaling
   pathways without understanding the interactions between molecular
   mechanisms. Developing targeted treatments for epilepsy is challenging
   due to its diverse and individual causes. The global prevalence and
   treatment difficulties of epilepsy highlight the need for a deeper
   understanding of its molecular mechanisms for prevention and treatment.

   Cellular senescence, an irreversible cell cycle arrest due to stress,
   was initially seen as protective against tumors but is now linked to
   various diseases, including cardiovascular, metabolic, and
   neurodegenerative disorders ([244]42). Recently, research interest has
   been drawn to CSRGs, which are crucial in aging-related diseases
   through their effects on the cell cycle, metabolism, and inflammation
   ([245]43). Studying these genes is vital for understanding complex
   disease mechanisms. For instance, genes such as p16^INK4a, p21, and p53
   are overexpressed in AD patients, which contribute to the disease
   through stress response, mitochondrial function, and neuroinflammation
   ([246]44). Moreover, cellular aging leads to mitochondrial dysfunction
   and dopamine neuron damage, causing movement and cognitive issues
   ([247]45). Notably, the SASP, a hallmark of cellular senescence, plays
   a critical role in establishing a chronic inflammatory microenvironment
   within the central nervous system. This is mediated by the secretion of
   various pro-inflammatory cytokines, including tumor necrosis factor-α
   (TNF-α) and interleukin-6 (IL-6). This inflammatory milieu not only
   exacerbates glial cell activation and synaptic structural abnormalities
   but also disrupts GABAergic inhibitory neurotransmission, thereby
   increasing neuronal synchrony and excitability. These collective
   alterations contribute to the pathophysiology and recurrence of
   epilepsy ([248]46–48). Consequently, targeting senescence-related genes
   and pathways represents a promising therapeutic strategy.

   The role of cellular senescence in neurodegenerative diseases such as
   AD and PD is well-established; its mechanisms in epilepsy remain
   unclear. Most studies on epilepsy focus on inflammation and neuronal
   apoptosis ([249]49), rarely addressing cell aging-related genes such as
   p16^INK4a, p53, and CDKN2A. The expression patterns of CSRGs in
   epilepsy patients and their impact on the lesion microenvironment are
   not well understood. The precise function of cell senescence in
   different forms of epilepsy, including temporal lobe and infantile
   epilepsy, also needs further investigation.

   This study analyzed gene expression data from epilepsy patients and
   healthy controls using the GEO database and various bioinformatics
   methods, overcoming traditional research challenges in data integration
   and model construction. This study has several limitations. Fifteen
   potential key genes in epilepsy were identified, including RPS6KA3,
   CTSD, and NCAM1. RPS6KA3, located on the X chromosome, encodes the RSK2
   protein, crucial for neuronal growth and survival, and mutations in
   this gene are linked to various diseases ([250]50). Mutations in CTSD
   gene, which encodes the Cathepsin D protein involved in lysosomal
   protein degradation, can lead to protein metabolism disorders and are
   associated with neurodegenerative diseases such as AD and PD ([251]51,
   [252]52). NCAM1 gene encodes a neural cell adhesion molecule critical
   for neuronal interactions and network formation, and its abnormal
   expression is linked to various neurological disorders ([253]53). CREG1
   gene is a key transcriptional regulator, and its protein influences the
   cell cycle by interacting with retinoblastoma protein, impacting
   neuronal survival and differentiation ([254]54). CPVL gene encodes a
   serine-like carboxypeptidase involved in protein degradation and immune
   modulation, affecting inflammatory and metabolic diseases ([255]55).
   TNFRSF1A gene encodes TNFR1, a major mediator of TNF-α signaling,
   crucial in inflammation, apoptosis, and immune response, and is linked
   to various disorders, especially autoinflammatory and neurological
   diseases ([256]56). PECAM1 gene encodes a transmembrane protein in
   endothelial cells and platelets, which is essential for angiogenesis,
   inflammation, and intercellular interactions, potentially influencing
   the blood–brain barrier and neurovascular units in epilepsy ([257]57).
   IL7R gene encodes the IL-7Rα, which is crucial for immune development
   and regulation in T and B cells ([258]58, [259]59). ZNF101 gene
   produces a zinc finger protein that regulates gene expression, cell
   cycle, and DNA repair ([260]60). RRM2B gene encodes a subunit essential
   for DNA replication and repair, maintaining genome stability.
   MARCKSL1influences cytoskeletal dynamics and cell movement by
   regulating phosphatidylinositol-4,5-bisphosphate (PIP2) distribution
   ([261]61). MARCKSL1 regulates cytoskeletal dynamics by controlling PIP2
   distribution, aiding cell migration and invasion ([262]62). MCM3 gene
   encodes a key DNA replication initiation complex component, crucial for
   DNA replication accuracy and genome stability ([263]63). ATP7A gene
   mutations, affecting copper ion transport, are linked to copper
   metabolism disorders like Menkes and neurodegenerative diseases
   ([264]64). MAP2K4 encodes MKK4, a dual-specificity kinase crucial in
   the MAPK pathway, responding to environmental stresses via JNK and p38
   MAPK activation ([265]65). Recent studies indicate that MAP2K4 is
   crucial in AD and PD by regulating apoptosis and neuroinflammation
   ([266]66). TNC gene, which encodes an extracellular matrix protein, is
   abnormally expressed in neurodegenerative diseases, especially AD and
   PD, potentially worsening pathology by affecting cell adhesion and
   inflammation ([267]67, [268]68). The column-line graph ([269]Figure 6A)
   demonstrates that ZNF101 and MAP2K4 significantly contribute to the
   multifactorial logistic model. ZNF101 may influence neuronal survival
   by regulating DNA repair or apoptosis, while MAP2K4 is involved in the
   pathogenesis of epilepsy by affecting apoptosis and inflammation
   through the MAPK signaling pathway.

   Among the 15 genes, RPS6KA3, MAP2K4, MARCKSL1, and CREG1 are associated
   with cell signaling, neuronal growth, and survival. Their proteins are
   involved in several crucial cell signaling pathways, such as MAPK and
   phosphatidylinositol signaling pathways. Mutations or abnormal
   expressions of these genes may affect neuronal excitability and
   plasticity, leading to abnormal neural networks and potentially causing
   seizures ([270]66, [271]69). Besides, RRM2B, MCM3, ZNF101, CTSD, and
   ATP7A are involved in protein metabolism, cell cycle regulation, and
   genome stability, playing roles in DNA replication, repair, and protein
   degradation to maintain genome stability and cellular metabolic
   balance. Their functions are vital for neuronal survival, and any
   mutations or functional defects can result in disrupted protein
   metabolism and DNA damage accumulation, leading to neuronal damage and
   neurodegenerative diseases ([272]64). NCAM1, TNC, PECAM1, IL7R,
   TNFRSF1A, and CPVL are associated with cell adhesion, inflammatory
   response, and immune regulation. The proteins encoded by these genes
   play essential roles in mediating intercellular interactions,
   facilitating the establishment of neural networks, and modulating
   immune system functions. Specifically, these proteins are crucial for
   neuronal cell adhesion, angiogenesis, and the regulation of immune
   responses. Abnormal expression of adhesion molecules and excessive
   inflammation can lead to neurodegenerative diseases ([273]70).

   Furthermore, mRNA-miRNA and mRNA-TF interaction networks were
   constructed to better understand the regulatory mechanisms of key genes
   in epilepsy. The mRNA-miRNA network revealed the role of miRNA in gene
   regulation, while the mRNA-TF network demonstrated the effect of TFs
   linked to neuronal function, stress response, and inflammation on
   epilepsy. These analyses addressed the limitations of gene expression
   data and offered new targets for personalized epilepsy treatment by
   regulating upstream factors of key genes. Significant regulatory
   interactions between 11 key genes and 72 miRNAs were identified in an
   mRNA-miRNA network, suggesting these miRNAs may impact epilepsy-related
   processes by modulating gene expression. For instance, RPS6KA3
   interacts with several miRNAs (such as hsa-miR-30c-5p and
   hsa-miR-19a-3p), potentially affecting neuron growth and survival via
   the MAPK pathway, influencing neural plasticity and seizure frequency
   ([274]71). Abnormal MAP2K4 expression may worsen neuronal stress and
   inflammation through JNK and p38 pathways, highlighting its role in
   epilepsy ([275]72).

   Additionally, an mRNA-TF network analysis exhibited 13 key genes
   significantly interacting with 40 TFs. TFs such as JUN, FOS, and STAT3
   have crucial roles in cellular stress response, immune regulation, and
   neuronal growth by controlling key genes such as ZNF101 and TNFRSF1A.
   Particularly, ZNF101 influences neuronal survival and dysfunction by
   managing DNA repair and the cell cycle. This regulatory network is
   vital for understanding the gene-TF axis in epilepsy, offering insights
   into the pathogenesis and personalized treatment strategies for the
   disease. Targeting specific miRNAs or TFs can effectively manage
   epilepsy-related gene expression, reducing neuronal damage and seizure
   frequency. For instance, miR-21 influences neuronal survival via the
   phosphatase and tensin homolog (PTEN)/mammalian target of rapamycin
   (mTOR) pathway, and its intervention can mitigate epileptic seizures
   ([276]72). Anti-inflammatory treatments targeting genes, including
   TNFRSF1A and IL7R, are essential for epilepsy patients exhibiting
   pronounced inflammatory responses ([277]73, [278]74).

   Bioinformatics and machine learning methods are used to develop an
   epilepsy diagnostic model using CSRGs, which lays a foundation for
   clinical diagnosis and personalized treatment. A multifactor logistic
   regression model was created using 15 key genes. The model demonstrated
   preliminary evidence of strong diagnostic potential based on the ROC
   analysis using [279]GSE143272 and [280]GSE32534 datasets ([281]Figures
   8B,[282]D). This study demonstrated the effectiveness of a multifactor
   logistic model for diagnosing epilepsy at the genetic level,
   particularly using CSRG-based markers to distinguish patients from
   healthy individuals. This strategy combines multiple gene expressions
   for a more accurate prediction than single biomarker methods. The
   multidimensional gene combinations enhance the reliability of the
   model, highlighting the diagnostic potential of CSRGs. Notably, ZNF101
   and MAP2K4 significantly contribute to epilepsy diagnosis ([283]Figure
   8A). ZNF101, a zinc finger transcription factor, may affect neuronal
   DNA repair and homeostasis. Recent research links neuronal senescence
   and abnormal regulatory factor expression in patients with refractory
   epilepsy to neuroinflammation and disease progression ([284]9,
   [285]75). RPS6KA3, a key player in the MAPK signaling pathway,
   influences neuronal survival and excitability. Dysregulation of this
   pathway is linked to neuronal hyperexcitability and epilepsy ([286]9,
   [287]11). MAP2K4 regulates the JNK and p38 MAPK pathways, impacting
   stress, apoptosis, and inflammation, and plays an important role in
   epilepsy-related neuronal damage and chronic inflammation ([288]11,
   [289]76). These genes are crucial in epilepsy pathology, affecting
   neuroinflammation, synaptic signaling, and gene stability, offering
   insights into molecular mechanisms and potential therapies for
   epilepsy. The CSRGs-based genetic diagnostic model provides hints for
   personalized treatment. Early identification of high-risk epilepsy
   patients can help with personalized treatment strategies by detecting
   expression levels of key genes, including ZNF101 and MAP2K4. The role
   of inflammation-related genes (such as TNFRSF1A and IL7R) suggests that
   anti-inflammatory therapy can be a new treatment direction, especially
   for patients with significant inflammatory responses, potentially
   reducing seizure frequency.

   This study also analyzed differences between high- and low-epilepsy
   risk groups using GSEA on 21,655 genes from the [290]GSE4290 dataset,
   revealing pathways related to epileptic mechanisms ([291]Figures
   8A–[292]E). GSEA identified significant enrichment of nervous system
   function pathways in the high-risk group ([293]Figure 8B). These
   neurological pathways encompass neurotransmitter receptors and
   postsynaptic signaling mechanisms, suggesting that these genes may play
   a pivotal role in aberrant neuronal signaling. This dysfunction is
   associated with the occurrence of seizures, which are closely connected
   to disrupted synaptic signaling and increased neuronal excitability in
   epilepsy. Furthermore, the connection between epilepsy and
   neuroinhibitory pathways was reinforced by the enrichment of the GABA
   receptor signaling pathway ([294]Figure 8C). Abnormalities in GABA
   receptor signaling, a key inhibitory neurotransmitter, result in
   reduced inhibitory neuromodulation, increasing the risk of abnormal
   neuronal discharges ([295]48). These findings suggested that targeting
   the GABA pathway can be a potential therapeutic approach for
   controlling seizures. The findings also indicated that the neuroactive
   ligand-receptor interaction pathway ([296]Figure 8E) was enriched in
   the high-risk epilepsy group. It also highlighted the enrichment of the
   neuroactive ligand-receptor interaction pathway in patients with
   high-risk epilepsy, indicating that its dysfunction may increase
   neuronal excitability and trigger seizures ([297]77). Targeted
   intervention in this pathway can help regulate abnormal neural activity
   and reduce seizures. GSEA analysis linked seizures to key
   neurobiological pathways, particularly neurotransmitter abnormalities,
   GABA signaling, and neuroactive ligand-receptor interactions, offering
   potential targets for personalized treatments for epilepsy. Future
   multi-omics data research can further explore these pathways to develop
   new therapeutic strategies.

   A total of 15 critical CSRGs were identified using bioinformatics and
   machine learning methodologies, and a diagnostic model was constructed
   that exhibited robust performance across both training and validation
   datasets. The primary aim of this study was to distinguish patients
   with epilepsy from healthy controls by leveraging gene expression
   profiles linked to cellular aging, thereby constructing a reliable
   diagnostic tool for epilepsy. The model assigned high and low scores to
   evaluate the likelihood of epilepsy at the time of diagnosis rather
   than predicting disease progression or recurrence risk. Although the
   validation dataset ([298]GSE32534) was limited in size (n = 10), it
   corroborated these findings and was consistent with existing
   literature. Increasing the sample size can enhance the generalizability
   of these results. Currently, the model is in an exploratory phase and
   has not yet been validated at the protein level or with clinical
   samples. Given the constraints of time and resources, we recommend
   using additional external datasets or data augmentation in future
   research to enhance the stability and applicability of the model to a
   broader and more diverse population.

5. Conclusion

   In conclusion, the findings of this study indicated a significant
   association between cellular senescence and the pathophysiology of
   epilepsy, which may present novel targets for early diagnosis and
   personalized treatment strategies. Considering the complexity of
   epilepsy as a neurological disorder, the clinical applicability of
   these markers warrants further investigation. Future research should
   focus on a comprehensive exploration of these pathways, integrating
   multi-omics data to substantiate their potential. This approach can
   yield innovative insights and strategies for the precise diagnosis and
   management of epilepsy. Moreover, future studies should aim to collect
   larger-scale epilepsy datasets through multi-center collaborations to
   enhance the robustness and generalizability of the model.

Acknowledgments