Abstract Spinal cord injury (SCI) is a profound affliction of the central nervous system that often remains inadequately addressed. Prior research has indicated that endoplasmic reticulum stress (ERS), associated with apoptotic signaling, plays a part in subsequent injuries post-SCI. However, the exact mechanisms are still unclear. ERS-related genes and SCI-associated datasets were sourced from the Genecard and GEO databases. We identified 68 ERSDEGs and pinpointed 6 marker genes vital for SCI diagnosis (CYBB, PRDX6, PTGS1, GCH1, TLR2 and PIK3CG) which were all upregulated in SCI based on bioinformatics and qRT-PCR. The nomogram exhibited that these genes could effectively predict the occurrence of SCI. Functional analysis revealed the potential roles of these genes was closely related to neuron cells and immune response. Immune infiltration research underscored the substantial roles of macrophage and CD56 dim NK cells in SCI. The ceRNA network analysis further revealed the complex interplay among marker genes, lncRNAs and miRNAs in SCI. We screened six marker genes with great diagnostic value, and found that these genes may affect the occurrence of SCI by affecting the immune response and recovery of neurons. Supplementary Information The online version contains supplementary material available at 10.1038/s41598-024-81844-7. Keywords: Spinal cord injury, Endoplasmic reticulum stress, Single-cell RNA, Immune landscape, Machine learning Subject terms: Diseases of the nervous system, Spinal cord diseases Introduction Spinal cord injury (SCI), results in profound impairment, incurring both financial and emotional burdens^[32]1. Annually, over 12,000 individuals suffer from spinal cord injuries in the U.S., leading to a rough count of 270,000 such incidents across North America^[33]2. Despite global efforts, present surgical and pharmacological treatments yield inadequate results, often leading to suboptimal patient recovery^[34]3. Hence, a comprehensive understanding of SCI’s pathological progression becomes indispensable for devising effective therapeutic strategies. The underlying physiological processes of SCI can be segmented into two stages: an initial injury, mainly due to physical trauma to the spine, followed by a subsequent phase characterized by a series of cellular and biological reactions. The latter encompasses a myriad of events including hemorrhage, ischemia, oxidative stress, inflammation, neuronal death, demyelination, and scar formation. Recent single-cell transcriptome analyses highlight that the most significant changes post-SCI occur between days 3 and 14, marked by microglia activation and cellular transformations^[35]4. The initial injury triggers microglial activity in the vicinity, compromising the stability of the barrier between the blood and spinal cord. This attracts circulating immune cells like neutrophils, lymphocytes, and macrophages into the spine area. As these cells emit agents that are either inflammatory or modulatory in nature, they have a complex impact on immunity, typically hindering the recovery from SCI^[36]5,[37]6. Nevertheless, the precise role of immunity, spanning from damage amplification to regenerative potential, remains a contentious topic. The endoplasmic reticulum (ER), serving as a cellular hub, is actively involved in both protein production and calcium storage, and it participates in diverse cell signaling pathways, including those of lipid biogenesis and calcium metabolism^[38]7,[39]8. Acute intrinsic responses within the spinal cord post-SCI involve ER stress activation^[40]9. Notably, the demise of neuronal cells post-SCI, pivotal to the ensued neurological deficits, is predominantly attributed to secondary biochemical perturbations rather than the immediate mechanical impact^[41]10. These biochemical aberrations, including protein misfolding, instigate the ER stress response. Prolonged and intensified ER stress can perturb ER functions, culminating in cell death^[42]11. However, the detailed mechanisms by which endoplasmic reticulum stress accentuates apoptosis during secondary injuries remain elusive. To bridge this knowledge chasm, we deployed an array of bioinformatic techniques, ranging from differential expression analysis to molecular subtyping, aiming to discern robust biomarkers pertinent to SCI diagnosis. Concurrently, we discerned dual immune regulatory modalities in SCI and constructed a competing endogenous RNA (ceRNA) network. Key gene-immune cell interactions were explored, and validation was done using datasets from single cell RNA (scRNA)-seq for gene expression, with qPCR analysis for core genes. Our findings accentuate the significance of genes associated with ER stress in SCI evolution, proffering enhanced insights into the molecular immunological frameworks orchestrating SCI pathology. Materials and methods Data acquisition and preprocessing The four mRNA datasets of SCI patients ([43]GSE5296, [44]GSE47681, [45]GSE132242) were obtained from GEO database, including 23 SCI and 19 Sham samples. Besides, these SCI samples were all collected on day 3 post-SCI. In instances where multiple probes matched a singular gene, we took the average expression, excluding the minimal values and outliers. To ensure uniformity across the datasets, we employed batch normalization with the assistance of “sva” package^[46]12. The integrity of the downloaded and processed data was corroborated through boxplots and principal component analysis (PCA) using “ggplot2” and “FactoMineR”. Separately, we retrieved 1193 ER-associated genes via the Genecard platform, with the condition that a relevance score exceeding 7. The study workflow is succinctly depicted in Fig. [47]1. Fig. 1. [48]Fig. 1 [49]Open in a new tab The normalization of datasets and identification of DEGs. (A) The boxplot of three datasets before normalization. (B) The boxplot of three datasets after normalization. (C) PCA plot of three datasets before normalization. (D) PCA plot of three datasets after normalization. (E) The volcano diagram showed the differentially expressed genes between SCI group and control group. Differentially expressed genes (DEGs) identification We initiated our analysis by extracting the expression matrix of genes related to ERS from the merged dataset, yielding 409 genes in common. Through the R package “limma”^[50]13, differentially expressed analysis was analyzed. Genes were deemed differentially expressed if they met the criteria of |logFC| > 1, and a significance level of p < 0.05. Visualization of these findings was facilitated using the “ggplot2” package. Functional enrichment and pathway analysis For a deeper understanding of the implications of the identified genes, we conducted Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses with “ClusterProfiler” in R^[51]14. We set the significance thresholds at p < 0.05 and FDR < 0.25. Biomarker identification and risk prediction modeling To refine data dimensionality, we applied the LASSO method with the “glmnet” tool in R^[52]15. Concurrently, we employed the SVM-RFE model using the “SVM” package. The performance of model was evaluated based on mean misclassification rates across 10-fold cross-validations^[53]16. The overlapping genes recognized by both algorithms were considered optimal for SCI diagnosis. Besides, ROC curve was used to evaluate the diagnostic capabilities of our selected markers using “pROC” package. Pathway analysis using single-gene gene set enrichment analysis (GSEA) For clarification of pathways linked to the marker genes, we examined their relationships with the remaining genes using the GSEA algorithm. Subsequently, genes were ranked based on their correlation strengths, from highest to lowest, and this ranked list was utilized for the enrichment analysis. The KEGG pathway database was used as the reference to determine the enrichment levels within the gene set. Assessment of immune cell infiltration in SCI To evaluate the degree of immune cell infiltration in both SCI and control samples, the single sample GSEA (ssGSEA) method was employed. Then, an evaluation of the association among the identified genes and the infiltrated immune cells was performed based on Pearson correlation analysis using reshape2 package. Elucidation of SCI-associated molecular subtypes The consensus cluster plus analysis was used to determine the subtype of SCI using ConsensusClusterPlus package. Parameters for this analysis were set as follows: sample resampling ratio at 80%, number of resampling at 1,000, and a maximum of 9 clusters. By analyzing the cumulative distribution function (CDF), consensus matrix, and variations in the region beneath the CDF curve, the best cluster count was ascertained. Visualization of the molecular typing results was achieved using the pheatmap package. Construction of the ceRNA and TF-mRNA network In our research, a ceRNA framework was built encompassing mRNAs, miRNAs, and lncRNAs. MiRWalk([54]http://mirwalk.umm.uni-heidelberg.de/) was employed to forecast the targeted miRNAs for the chosen 6 mRNAs^[55]17. Subsequent identification of lncRNAs, acting as upstream regulators of miRNAs, was conducted with StarBase v2.0 ^[56]18. Given that both databases include experimentally confirmed miRNA-target interactions, we ensured a robust network construct. The ceRNA framework was depicted with the assistance of Cytoscape (version 3.9.0). Data for forming the TF-mRNA network was acquired through the TRRUST database ([57]https://www.grnpedia.org/trrust/)^[58]19. Using the paired TFs and mRNAs, we juxtaposed the findings from RNA differential analysis, which yielded a unique TF-mRNA network. This network was subsequently visualized using the “ggplot2” and “ggalluvial” packages^[59]20 in R. Single-cell dataset exploration and validation The scRNA-seq count data was retrieved from [60]GSE189070 for our exploration, comprising eight instances: seven from SCI at varying time points and one from an uninjured sample. We specifically chose [61]GSM5694214 (from three days post-SCI) and [62]GSM5694211 (uninjured) for our analysis. The Seurat R package^[63]21 was utilized for Quality Control (QC). Specific criteria led to the exclusion of cells: (a) fewer than 500 RNA counts or (b) mitochondrial gene expression ratios exceeding 5%. The Seurat package’s NormalizeData function was employed for data normalization. Subsequently, cell clusters were pinpointed using the FindClusters tool in Seurat (configured to a 0.5 resolution) and visualized through a 2D tSNE plot^[64]22. Using the SingleR package, we compared cells across clusters to a previously annotated reference dataset^[65]23. Cluster annotations were finalized after analyzing cell markers and assessing comparison outcomes. Establishment of SCI mice model Six male C57BL/6 mice were procured at the age of 6–7 weeks and accommodated in an environment that alternated between 12-hour day and night cycles, maintaining a temperature range of 20–23 degrees Celsius and a humidity level of 50%. All operational procedures adhered rigorously to the standards established by the National Institutes of Health. The mice were assigned randomly to two distinct groups: three underwent spinal cord injury surgery with a deliberate injury at the T10 segment, while the remaining three underwent a sham operation involving a laminectomy without spinal cord injury. The surgical procedure was meticulously executed as follows: the mice were anesthetized with 1% Isoflurane, sterilized using povidone-iodine, and a median dorsal incision was skillfully performed to expose the T10 segment of the spinal cord. Subsequently, the T10 laminectomy was executed to unveil the spinal cord. A spinal cord injury percussion device was then employed to deliver a blow at a height of 1 cm, utilizing a percussion rod weighing 10 g. Following the impact, both lower limbs of the mice exhibited involuntary twitching, and their tails demonstrated reflexive movements. Upon regaining consciousness from anesthesia, the mice experienced complete lower limb paralysis. The presence of dark red blood at the impact site served as an unequivocal indicator of the successful preparation of the injury model. In contrast, the sham operation group experienced immediate closure of the incision after laminectomy. The mice were humanely euthanized three days post-procedure to facilitate tissue sampling. Quantitative real-time polymerase chain reaction (qRT-PCR) Total RNA was meticulously isolated from mice spinal cord tissues utilizing TRIzol reagent (Solarbio, 15596-026, China). Subsequently, the RNA underwent reverse transcription, employing the HiFiScript Reverse Transcription Kit (Cwbio, CW2020M, China). To gauge gene expression levels, we conducted qRT-PCR using the Takara TB Green^® Fast qPCR Mix kit (Takara, RR430A, China) in conjunction with a state-of-the-art LightCycler 7500 System. Each sample was subjected to three sets of replicate experiments to ensure data robustness. To quantify the mRNA expression levels accurately, the 2^−ΔΔCt method was employed. The internal reference of mRNA was glyceraldehyde-3-phosphate dehydrogenase (GAPDH). The primer sequences for the pivotal genes under investigation can be found in Table [66]1 for reference. Table 1. Primer sequences of the core genes. Gene Sequence(5′-3′) CYBB Forward: CCAACTGGGATAACGAGTTCA Reverse: GAGAGTTTCAGCCAAGGCTTC PRDX6 Forward: CAGTAGAGTGTCCCAGGAGGA Reverse: GCTGGGATTTTGGGGTTACG PTGS1 Forward: CCAGAACCAGGGTGTCTGTGT Reverse: GTAGCCCGTGCGAGTACAATC GCH1 Forward: AGCAAGTCCTTGGTCTCAGTAAAC Reverse: ACCGCAATCTGTTTGGTGAGGC TLR2 Forward: CTCCTGAAGCTGTTGCGTTAC Reverse: GCTCCCTTACAGGCTGAGTTC PIK3CG Forward: TGTTTACTCCTGTGCAGGCT Reverse: AAGCTCTAACACAGACATCCCC GAPDH Forward: GTCTTCACCACCATGGAGAA Reverse: TAAGCAGTTGGTGGTGCAG [67]Open in a new tab Statistical analysis All bioinformatic and statistical evaluations were carried out via R software (version 4.2.2). Statistical contrasts were made using the t-test for two groups. Using a two-tailed perspective, we interpreted all p-values, marking a value of p < 0.05 as the cutoff for statistical significance. Results Identification and characterization of DE-ERSRGs It was obvious that the gene expression distribution of samples from different dataset has some deviation (Fig. [68]1A). However, the gene expression level is basically in the same range after the batch removal effect treatment (Fig. [69]1B). Besides, PCA results show that after batch effect is removed, samples from different data sets overlap together, indicating that batch effect is effectively removed (Fig. [70]1C–D). From our results, 68 genes were differentially expressed (63 upregulated genes and 5 downregulated genes) in SCI group compared with sham group (Fig. [71]1E). Functional enrichment analysis of DE-ERSRGs During the development of SCI, the DE-ERSRGs may play essential roles. As shown in Fig. [72]2A, the BP of GO results exhibited that DE-ERSRGs were mainly involved in the response to toxic substance, response to oxidative stress and cellular response to chemical stress. Besides, most DE-ERSRGs were existed in collagen-containing extracellular matrix, endoplasmic reticulum lumen and phagocytic vesicle (Fig. [73]2B). In the term of MF, these genes were involved in peroxidase activity, heme binding and Toll-like receptor binding (Fig. [74]2C). The KEGG results showed that DE-ERSRGs were closely related to the lipid and atherosclerosis, TNF signaling pathway and HIF-1 signaling pathway (Fig. [75]2D). Fig. 2. [76]Fig. 2 [77]Open in a new tab The GO and KEGG functional enrichment of DE-ERSRGs. (A) The GO enrichment of DE-ERSRGs in term of BP. (B) The GO enrichment of DE-ERSRGs in term of CC. (C) The GO enrichment of DE-ERSRGs in term of MF. (D) The KEGG enrichment of DE-ERSRGs. Identification and validation of diagnostic markers for SCI Then, we tried to determine the key genes with diagnosis value for SCI through machine learning. Through tenfold cross-validation in LASSO logistic regression, 9 key features were screened out based on the optimal lambda (Fig. [78]3A–B). Besides, the SVM-RFE algorithm further refined this list, targeting the optimal gene feature combination. Ultimately, 10 genes were pinpointed as the most distinguishing features with the lowest RMSE and the highest accuracy (Fig. [79]3C–D). An intersection of genes derived from both algorithms led to the identification of 6 key marker genes, namely CYBB, PRDX6, PTGS1, GCH1, TLR2, and PIK3CG, for subsequent analyses (Fig. [80]3E). Fig. 3. [81]Fig. 3 [82]Open in a new tab Identification of DE-ERSRGs as SCI diagnostic markers. (A, B) LASSO approach identification. (C, D) SVM-RFE algorithm filtering. (E) Venn diagram showing algorithm intersections. (F) Logistic regression model underwent ROC analysis. (G) ROC evaluation of the 6 markers. To further discern the diagnostic utility of each gene, we illustrated individual ROC curves using the set of markers. Figure [83]3F–G shows that the diagnosis model and all 6 genes rendered AUC values exceeding 0.9. Then, as shown in Fig. [84]4A–F, it can be seen that six hub genes were all significantly upregulated in SCI samples compared with normal samples. Moreover, leveraging the logistic regression findings, we designed a predictive nomogram to estimate the likelihood of SCI occurrence. It was obvious that PTGS1 had the highest point while PRDX6 had the lowest point (Fig. [85]5A). The corresponding calibration curve attested to the nomogram’s proficiency in identifying SCI with remarkable precision (Fig. [86]5B). Besides, the DCA curves showed that the nomogram has great net benefits in diagnosis of SCI (Fig. [87]5C). Fig. 4. [88]Fig. 4 [89]Open in a new tab The expression of six marker genes in SCI and normal group. (A) The expression of GCH1 in SCI and normal group. (B) The expression of CYBB in SCI and normal group. (C) The expression of TLR2 in SCI and normal group. (D) The expression of PTGS1 in SCI and normal group. (E) The expression of PRDX6 in SCI and normal group. (F) The expression of PIK3CG in SCI and normal group. *** (p < 0.001). Fig. 5. [90]Fig. 5 [91]Open in a new tab The construction and validation of nomogram based on six marker genes. (A) The construction of nomogram for SCI diagnosis. (B) The calibration curve for nomogram. (C) Decision curve analysis (DCA) was performed for the Nomogram. Involvement of key genes in SCI-associated pathways In an endeavor to elucidate the implications of the identified marker genes in differentiating SCI from sham samples, we conducted a GSEA-KEGG pathway analysis for individual genes. A multifaceted analysis revealed substantial associations of four highly expressed marker genes, namely GCH1, PIK3CG, PRDX6 and PTGS1, with pathways like focal adhesion and ECM receptor interaction (Fig. [92]6A–D). Furthermore, the downregulated expression of TLR2 were enriched in Nod like receptor signal pathway and Toll like receptor pathway (Fig. [93]6E). In addition, the downregulated expression of CYBB were enriched in Notch signaling pathway (Fig. [94]6F). Fig. 6. [95]Fig. 6 [96]Open in a new tab Single-gene GSEA for six marker genes using KEGG reference dataset. (A) The GSEA pathway enrichment analysis for GCH1. (B) The GSEA pathway enrichment analysis for PIK3CG. (C) The GSEA pathway enrichment analysis for PRDX6. (D) The GSEA pathway enrichment analysis for PTGS1. (E) The GSEA pathway enrichment analysis for TLR2. (F) The GSEA pathway enrichment analysis for CYBB. Molecular subtyping and immune correlations in SCI In our pursuit to determine the immune subtypes inherent in SCI, we employed the cumulative distribution function (CDF) to meticulously categorize SCI patients, leveraging the expression profiles of the 68 DE-ERSRGs. Optimal subtyping was achieved with k = 2, denoting the most suitable count of patterns (Fig. [97]7A–C). Subsequently, the discernibility of these two clusters was substantiated by principal component analysis (PCA) (Fig. [98]7D). Besides, boxplots and heatmaps illustrated that the expression of CYBB, PIK3CG, GCH1 and TLR2 were significantly downregulated in cluster 2 compared with cluster 1, while the difference of PRDX6 and PTGS1 were not significant between cluster 1 and cluster 2 (Fig. [99]7E–F). Fig. 7. [100]Fig. 7 [101]Open in a new tab Molecular subtypes and expression of six marker genes in SCI. (A) Cumulative distribution function (CDF) curve illustrating consensus clustering for SCI-related molecules. (B) Area variation under the CDF curve illustrating two stable molecular types. (C) Cluster heatmap for SCI molecular subtypes. (D) SCI-related molecular subtypes evaluated by Principal component analysis (PCA). (E) Expression difference in 6 markers between Cluster1 and Cluster2. (F) Heatmap of 6 markers across two subtypes. Immune landscape in SCI The interplay between SCI and the immune microenvironment is increasingly recognized. In light of this, the degree of immune infiltration within SCI samples was assessed by us utilizing the ssGSEA approach. Figure [102]8A depicts a noticeable reduction in the proportion of CD56dim natural killer (NK) cells and activated B cells among SCI samples compared to control groups. In contrast, the fractions of gamma delta T cells, MDSC, regulatory T cells, natural killer T cells, plasmacytoid dendritic cells, activated dendritic cells, natural killer cells, macrophages, mast cells, immature B cells, and type 1 T helper cells were noticeably elevated in SCI samples. Fig. 8. [103]Fig. 8 [104]Open in a new tab The calculation of immune landscape between different groups. (A) The infiltration levels of multiple immune cells in SCI and sham were assessed via the ssGSEA (B) The infiltration levels of multiple immune cells in cluster1 and cluster2 were assessed via the ssGSEA. ns (p > 0.05), * (p < 0.05), ** (p < 0.01), *** (p < 0.001). Then, we also explored the disparities in immune landscape between two clusters. Notably, Cluster 1 exhibited a heightened abundance of MDSC, Macrophage, Regulatory T cells (Treg), and Immature B cells when contrasted with Cluster 2. In contrast, Cluster 2 was enriched in Activated CD8 T cells, CD56 bright NK cells, and CD56 dim NK cells (Fig. [105]8B). The correlation between hub genes and Immune cells Next, we further analyzed the correlation between hub genes and infiltration levels of immune cells to explore the potential roles in the development of SCI. From Fig. [106]9A, it can be seen that PRDX6 was only positively related to the activated B cells. PTGS1 was positively and negatively related to the activated B cells and activated CD8T cells, respectively (Fig. [107]9B). Besides, PIK3CG was positively and closely related to the immature B cell, monocyte, T follicular helper cell and macrophage (Fig. [108]9C). In addition, CYBB, TLR2 and GCH1 were all negatively related to CD56 dim NK cells (Fig. [109]9D-F). Fig. 9. [110]Fig. 9 [111]Open in a new tab The correlation between immune cells and six marker genes. (A) The correlation between the infiltration of immune cells and PRDX6 expression. (B) The correlation between the infiltration of immune cells and PTGS1 expression. (C) The correlation between the infiltration of immune cells and PIK3CG expression. (D) The correlation between the infiltration of immune cells and CYBB expression. (E) The correlation between the infiltration of immune cells and TLR2 expression. (F) The correlation between the infiltration of immune cells and GCH1 expression. Constructing ceRNA networks and transcription factor predictions To further understand the post-transcriptional regulations in SCI, we established ceRNA networks, pinpointing the intersections between the target miRNAs of lncRNAs and those of the hub genes. Among the identified miRNAs, several showcased a high frequency of connections with genes (Supplement Fig. [112]1A). Specifically, miRNAs such as hsa-miR-873-5p, hsa-miR-875-3p, hsa-miR-1-3p, hsa-miR-561-3p, and hsa-miR-590-3p exhibited connections with four genes, while others including hsa-miR-4305, hsa-miR-3126-3p, hsa-miR-335-3p, hsa-miR-130b-5p, and hsa-miR-33a-3p were linked to three. In total, 11 miRNAs manifested an extensive interaction network with at least two genes. Of these, hsa-miR-873-5p, hsa-miR-1-3p, and hsa-miR-875-3p displayed the highest reliability by having upstream regulatory lncRNAs, with all three targeting the gene PTGS1. Consequently, through these interactions, we identified eight lncRNAs – CDR1-AS, LINC01043, RP5-894D12.5, RP11-102K13.5, RP13-895J2.3, RP11-96L7.2, RP4-539M6.22, and RP11-333E1.2 – that target these three key miRNAs. Building upon the findings, we harvested a comprehensive 830 pairs involving transcription factors and mRNA by referencing the TRRUST database’s variably expressed mRNAs. The intricate associations between the 318 transcription factors and corresponding mRNA sequences were effectively visualized using a Sankey diagram. Prominently, MYC, HSF1, JMJD1C, and SUMO1 emerged as the top four transcription factors exhibiting the strongest correlations (Supplement Fig. [113]1B). Single-cell dataset validation In this phase of the study, we used a scRNA-seq dataset for a more granular look into the cellular environment of SCI and uninjured samples. From the entire dataset, 22,768 cells passed the quality control (QC) benchmarks. These cells were divided between SCI samples (12,757 cells) and uninjured samples (the remaining 10,011 cells). Through unsupervised clustering, the dataset was further categorized into 19 distinct clusters, as depicted in Fig. [114]10A. When exploring these clusters in depth, seven predominant cell types emerged from both the SCI and uninjured samples. These are: Gametocytes, Neurons, Dendritic Cells (DC), Bone Marrow Cells (BM), Erythroblasts, Embryonic stem cells, and Hematopoietic Stem Cells (HSC CD34+). This distribution is showcased in Fig. [115]10B. Fig. 10. [116]Fig. 10 [117]Open in a new tab The analysis of single-cell sequencing dataset. (A–B) T-Distributed Stochastic Neighbor Embedding (T-SNE) plots showcasing 19 cell clusters and 7 main cell types in pSCI3d and uninjured tissues. (C-H) Violin diagrams displaying expression of 6 marker genes across varied cell types. The study then proceeded to scrutinize the expression patterns of the six previously identified marker genes (CYBB, PRDX6, PTGS1, GCH1, TLR2, and PIK3CG) across these cell types, in both SCI and uninjured conditions. Figure [118]10C-H visually depict how these identified genes are expressed among predominant cellular categories, facilitating a straightforward contrast between SCI and uninjured samples. Notably, Neurons in the SCI samples expressed the majority of these marker genes. Specifically, CYBB, PTGS1, GCH1, TLR2, and PIK3CG were recognized as distinct markers for Neurons. Additionally, CYBB, GCH1, TLR2, and PTGS1 were markedly enriched in Erythroblasts. Furthermore, PTGS1 and TLR2 were pronouncedly expressed in Gametocytes. Validation of marker genes in vivo Finally, we evaluated the expression of six marker genes in mouse spinal cord tissues through qRT-PCR. It was observed that the SCI group exhibited markedly elevated levels of these five key biomarkers, namely CYBB, PRDX6, PTGS1, TLR2, and PIK3CG, in comparison to the sham group, while GCH1 was downregulated in SCI group (Fig. [119]11A–F). Fig. 11. [120]Fig. 11 [121]Open in a new tab The validation of six marker genes in SCI and sham mouse samples calculating by qRT-PCR. (A) The expression of TLR2 in control and SCI groups. (B) The expression of GCH1 in control and SCI groups. (C) The expression of CYBB in control and SCI groups. (D) The expression of PTGS1 in control and SCI groups. (E) The expression of PRDX6 in control and SCI groups. (F) The expression of PIK3CG in control and SCI groups. * (p < 0.05), ** (p < 0.01), **** (p < 0.0001). Discussion SCI represents a grave medical condition, inflicting profound health implications on patients and imposing a significant economic strain^[122]1. Despite the rapid technological advancements, a plethora of basic and clinical research studies on SCI have been undertaken, yet the validation of effective molecularly targeted therapies remains elusive. The activation of ERS has historically been acknowledged as a component of the immediate intrinsic response after a traumatic injury to the spinal cord^[123]9. However, the intricacies of its mechanism continue to be elusive. In light of this, our study aimed to discern potential ERSRGs associated with SCI using bioinformatics analysis and further explore the influence of immune cell infiltration in SCI. In our study, three SCI datasets ([124]GSE5296, [125]GSE47681, [126]GSE132242) were chosen to mitigate the potential error rate and to bolster the reliability of our findings. We identified 68 DE-ERSRGs when comparing SCI with healthy controls: 63 of these were up-regulated, while 5 exhibited down-regulation. Both GO and KEGG analyses further substantiated a deep connection of these DEGs with an array of pathways. These pathways encompassed TNF, HIF-1, p53, NOD-like receptor and IL-17 signaling pathways. The TNF signaling pathway, specifically, plays a crucial role in modulating immune responses and facilitating T-cell activation, which can result in cellular demise^[127]24. Notably, studies have emphasized the advantageous impact of inhibiting the TNF-α signaling pathway on enhancing post-SCI functional recovery^[128]25. Additionally, Chen et al. discovered that activating the HIF-1 signaling pathway could enhance angiogenesis, neurogenesis, and the hypoxic ischemic environment to facilitate the recovery of neural functionality by regulating HIF-1α, VEGF proteins, and M2 macrophage exosomes^[129]26,[130]27. Moreover, the role of p53, NOD-like, and IL17 in the pathogenesis of SCI is intricately linked to the inflammatory response^[131]28–[132]30. Collectively, these findings underscore the pivotal significance of inflammation, immune response, oxidative stress, and neurogenesis in the initiation and progression of SCI. In our analysis, six critical DE-ERSRGs, including CYBB, PRDX6, PTGS1, GCH1, TLR2 and PIK3CG, were identified through machine learning, demonstrating remarkable predictive accuracy for SCI. CYBB, a constituent of cytochrome B, has been implicated in inducing oxidative stress following SCI^[133]31. TLR2, a member of the Toll-like receptor (TLR) family, is essential for pathogen detection and the initiation of innate immune responses^[134]32. Its significant role is further highlighted in the pathogenesis of various autoimmune diseases, with studies indicating that TLR2 deletion may potentially hinder recovery processes post-injury^[135]33,[136]34. GCH1 encodes for GTP cyclohydrolase I, a key enzyme in tetrahydrobiopterin synthesis, crucial for nitric oxide synthase activity^[137]35. The downregulation of GCH1 resulted in reduced microglial activation and pain^[138]36 and was associated with neuroprotection in SCI^[139]37. PIK3CG (PI3Kγ/p110γ) serves as a central regulator in inflammation and oxidation^[140]38,[141]39. Further studies demonstrated that PIK3CG could enhance the recovery of SCI patients by promoting neuronal differentiation^[142]40. PRDX6, or peroxiredoxin-6, is an antioxidant protein. It has been detected in traumatic central nervous system injuries and linked to post-injury inflammation and apoptotic reactions^[143]41–[144]43. PTGS1 encodes for cyclooxygenase-1 (COX-1), which plays a substantial role in neuronal health, with studies showing its protective capabilities post-injury^[145]44,[146]45. In conclusion, our findings underscore the pivotal roles of these six marker genes in the evolution and progression of SCI. The significant impact of immune cell infiltration on the progression and development of SCI is highlighted in recent research^[147]46. For example, Noble et al. demonstrated that functional recovery following SCI could be enhanced by restoring immune homeostasis^[148]47. SCI post-injury leads to reduced activity of NK cells, indicating their potential as therapeutic targets for SCI^[149]48. Furthermore, our study compared immune cells in SCI with sham samples. Notably, certain immunocytes such as macrophages, regulatory T cells, and NK cells exhibited increased infiltration in SCI samples, while the presence of CD8 T cells and CD56dim NK cells was reduced. CD8 T cells are among the most prevalent and crucial effector cells in the immune system^[150]49. Wu et al. discovered that the levels of CD8 T cells increased at 3-, 7-, and 14-day post-SCI treatment in a mouse model^[151]50. Similarly, we also observed an increase in CD8 T cell levels in the SCI group. However, PTGS1 was upregulated in the SCI group and negatively correlated with CD8 T cells, suggesting that CD8 T cells were not primarily affected by PTGS1. Additionally, PTGS1 and PRDX6 were both positively correlated with activated B cells, which showed no significant difference between the SCI and sham groups. According to GSEA results, we found that PTGS1 and PRDX6 were enriched in ECM receptor interaction and focal adhesion, which may be involved in the migration and proliferation of neural cells^[152]51,[153]52. These findings suggest that PTGS1 and PRDX6 may play a role in the development of SCI by influencing neural cells rather than the immune response. Macrophages typically differentiate into two subtypes: M1 macrophage and M2 macrophage. M1 macrophages are primarily associated with inflammatory responses^[154]53,[155]54. In the mouse model of SCI, the microenvironment is predominantly influenced by M1 macrophages^[156]55. Elevated levels of M1 macrophages lead to increased production of pro-inflammatory mediators, exacerbating SCI, impeding recovery post-injury, and inducing neuronal damage^[157]27,[158]56. Additionally, several studies have identified that PIK3CG can enhance the levels of M1 macrophages^[159]57. In our study, we observed that PIK3CG was upregulated in the SCI group and closely associated with macrophages. This suggests that PIK3CG may exacerbate SCI by promoting the polarization and infiltration of M1 macrophages. Furthermore, CD56dim NK cells are the predominant NK cell subsets and exhibit potent cytotoxicity^[160]58. Campagnolo et al. demonstrated that NK cell cytotoxicity was significantly reduced in the SCI group compared to the control group^[161]59. Similarly, we observed a decrease in the levels of CD56dim NK cells in the SCI group in our study. Additionally, we found that the upregulated CYBB, GCH1, and TLR2 exhibited strong negative correlations with CD56dim NK cells. This suggests that CYBB, GCH1, and TLR2 may be involved in the development of SCI by inhibiting CD56dim NK cells, thereby promoting inflammation. Nonetheless, our study has limitations. It predominantly relies on bioinformatics, necessitating further experimental and clinical validation. Data were sourced from public repositories with a limited sample size, and our clinical diagnostic model awaits external validation. In the initial phase of our study, we relied predominantly on animal data for analysis instead of patient specimens. Moving forward, the next stage of our research will involve the use of patient blood samples and other relevant specimens. In addition, our in vitro experiments included only three mice per group. Further studies should increase the number of experimental subjects to enhance the reliability of the results. This progression is anticipated to significantly enhance the clinical applicability and the informative value of our results. Conclusion In summary, our research discerned six pivotal marker genes tied to SCI’s onset and progression using advanced machine learning techniques. By developing a diagnostic model, we can precisely diagnose SCI based on these genes. Additionally, significant alterations in the immune landscape following SCI were uncovered in our findings, emphasizing the nuanced interplay of immune cells during post-injury reactions. Moreover, our understanding of the intricacies of ceRNA interactions and the ability to discern varying immune response patterns in SCI paves the way for personalized therapeutic strategies. Collectively, our findings offer enhanced insights into SCI’s molecular foundations and unveil promising avenues for therapeutic intervention. Electronic supplementary material Below is the link to the electronic supplementary material. [162]Supplementary Material 1^ (1.5MB, tif) Author contributions YPZ and XMT contributed to the conception and design. XMT, YL and JD contributed to the collection and assembly of data. YPZ, XMT and JM analyzed and interpreted the data. All authors wrote and approved the final manuscript. Data availability The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request. Declarations Competing interests The authors declare no competing interests. Ethics approval The Ethics Committee of The Affiliated Huaian No.1 people’s Hospital of Nanjing Medical University deemed that this research is based on open-source data, so the need for ethics approval was waived. Footnotes Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. These authors contributed equally to this work, and should be regarded as co-first author: Yunpeng Zhang, Xiaoming Tang, Jian Dai and Yao Li. References