Abstract

Background

   The glycolytic pathway and immune response play pivotal roles in the
   intervertebral disc degeneration (IDD) progression. This study aimed to
   develop a glycolysis-related diagnostic model and analyze its
   relationship with the immune response to IDD.

Methods

   [31]GSE70362, [32]GSE23130, and [33]GSE15227 datasets were collected
   and merged from the Gene Expression Omnibus, and differential
   expression analysis was performed. Glycolysis-related differentially
   expressed genes (GLRDEGs) were identified, and a machine learning-based
   diagnostic model was constructed and validated, followed by Gene Set
   Enrichment Analysis (GSEA). Gene Ontology functional enrichment and
   Kyoto Encyclopedia of Genes and Genomes pathway enrichment analyses
   were performed, and mRNA-miRNA and mRNA-transcription factor (TF)
   interaction networks were constructed. Immune infiltration was analyzed
   using single-sample GSEA (ssGSEA) and cell-type identification by
   estimating relative subsets of RNA transcripts (CIBERSORT) algorithm
   between high- and low-risk groups.

Results

   In the combined dataset, samples from 31 patients with IDD and 55
   normal controls were analyzed, revealing differential expression of 16
   GLRDEGs between the two groups. Using advanced machine learning
   techniques (LASSO, support vector machine, and random forest
   algorithms), we identified eight common GLRDEGs (PXK, EIF3D, WSB1,
   ZNF185, IGFBP3, CKAP4, RPL15, and, SSR1) and developed a diagnostic
   model, which demonstrated high accuracy in distinguishing IDD from
   control samples (area under the curve, 0.935). We identified
   42 mRNA-miRNA and 33 mRNA-TF interaction pairs. Using the RiskScore
   from the diagnostic model, the combined dataset was stratified into
   high- and low-risk groups. SsGSEA revealed significant differences in
   the infiltration abundances of the four immune cell types between the
   groups. The CIBERSORT algorithm revealed the strongest correlation
   between resting natural killer (NK) cells and ZNF185 in the low-risk
   group and between CD8^+ T cells and SSR1 in the high-risk group.

Conclusions

   Our study reveals a potential interplay between glycolysis-associated
   genes and immune infiltration in IDD pathogenesis. These findings
   contribute to our understanding of IDD and may guide development of
   novel diagnostic markers and therapeutic interventions.

1. Introduction

   Lower back pain (LBP) is the leading cause of productivity loss and
   disability worldwide [[34]1]. Various multifactorial causes and risk
   factors contribute to the pathogenesis of LBP, with intervertebral disc
   degeneration (IDD) emerging as the predominant underlying factor
   [[35]2]. IDD is an increasingly common health problem and poses
   significant economic and social burdens in countries with rapidly aging
   populations [[36]3]. Moreover, the clinical management of IDD that in
   place today is not ideal. The original biology of the disc cannot be
   fundamentally restored by pharmacological or physiological treatments
   for early degenerative changes, or by disc resection, fusion, or
   replacement for advanced degenerative changes [[37]4].

   The intervertebral disc (IVD) is a critical component of the spine and
   is made up of fibrous cartilage. It is composed of the nucleus pulposus
   (NP), annulus fibrosus, and cartilaginous endplates. The IVD plays a
   vital role by providing flexibility, acting as a shock absorber, and
   ensuring vertebral stability [[38]5]. The pathogenesis of IDD is
   complex and overlapping, and is associated with the loss of homeostatic
   balance in the disc environment, leading to a catabolic and hypoxic
   microenvironment, a senescent cell profile, and consequent
   immunometabolic alterations [[39]6]. However, the specific mechanism
   and pathogenesis of disc degeneration remain unclear.

   Glycolysis is a fundamental cellular metabolic pathway that efficiently
   converts glucose into lactate to produce energy [[40]7]. It plays a
   vital role as a major energy source, particularly in cells under
   hypoxic conditions. Increased levels of glycolysis promote the
   proliferation, invasion, and migration of certain cancer cells by
   activating various signaling pathways and increasing drug resistance
   [[41]8]. Researchers have found that by intervening in the glycolytic
   process, the energy production and metabolic activity of chondrocytes
   can be significantly regulated, slowing down the progression of
   osteoarthritis (OA) [[42]9]. Other bone and joint disorders, including
   rheumatoid arthritis [[43]10], osteoporosis [[44]11], and IDD [[45]12],
   have also shown significant alterations in the glycolytic metabolic
   processes.

   The IVD tissues exhibit unique metabolic characteristics. Owing to its
   limited blood supply and lack of neural innervation, the IVD primarily
   receives blood through capillaries in the outer layers of the disc,
   whereas nutrient exchange in the central NP relies on diffusion from
   nearby blood vessels in the endplates, facilitated by the porous
   structure of the cartilaginous endplates [[46]13]. Compared to plasma,
   less oxygen and glucose are available for use. In such a low-oxygen
   environment, the glycolytic metabolic pathway serves as a key energy
   source for the NP [[47]14]. However, disruptions in this pathway can
   result in a progressive decline in the number of IVD cells and
   extracellular matrix, along with structural changes in the fibrous ring
   and accelerated disc degeneration [[48]15].

   The IVD is traditionally considered an immunologically privileged organ
   that shields NP tissue from the host immune system [[49]16]. However,
   when the blood-NP barrier is breached, an immune response is triggered,
   which plays a significant role in IVD degeneration and the subsequent
   pathological processes. Immune cells such as macrophages [[50]17] and T
   cells [[51]18] are involved in the inflammatory response of the disc
   tissue and regulate the inflammatory process and cellular metabolism
   through the release of cytokines such as tumor necrosis factor-α and
   interleukin-1β [[52]19]. However, few studies have reported on the role
   of immune infiltration in the development of disc degeneration, and the
   relationship between disc degeneration and immune infiltration requires
   further research to be fully understood.

   The objective of this study was to identify glycolysis-related
   biomarkers and potential therapeutic targets for IDD management using
   bioinformatics. Initially, three distinct disc degeneration datasets
   retrieved from the Gene Expression Omnibus (GEO) database were
   amalgamated, resulting in a combined dataset. Subsequently, we
   conducted differential expression analysis and Gene Set Enrichment
   Analysis (GSEA) of samples from the combined dataset. This analysis
   intersected with glycolysis-related genes (GLRGs) to yield
   glycolysis-related differentially expressed genes (GLRDEGs). We then
   constructed GLRDEG diagnostic models using Logistic-Least Absolute
   Shrinkage and Selection Operator (Logistic-LASSO), support vector
   machine (SVM), and random forest (RF) algorithms to identify common
   GLRDEGs. Subsequent steps included Gene Ontology (GO) and Kyoto
   Encyclopedia of Genes and Genomes (KEGG) enrichment analyses and
   construction of mRNA-miRNA and mRNA-transcription factor (mRNA-TF)
   interaction networks. Finally, based on the RiskScore derived from the
   diagnostic model, the combined dataset samples were divided into high-
   and low-risk groups. Differential analysis and GSEA enrichment analysis
   were performed, followed by single-sample GSEA (ssGSEA) and CIBERSORT
   immune signature differential analysis. In summary, our study uncovered
   new perspectives on the interplay between glycolysis-associated genes
   and immune cell infiltration in IDD pathogenesis.

2. Materials and methods

2.1. Data download

   Data retrieval was performed from GEO datasets using the keywords
   “intervertebral disc degeneration” and “Homo sapiens”. The following
   selection criteria were used for data retrieval: 1) gene expression
   profile of human intervertebral disc tissue samples; 2) Samples were
   divided into normal controls and IDD group; 3) At least 15 samples were
   included in the dataset. A total of three datasets met the screening
   criteria. We downloaded the gene expression profile datasets
   ([53]GSE70362 [[54]20], [55]GSE23130 [[56]21], and [57]GSE15227
   [[58]22]) of patients with IVD from the GEO database [[59]23] using the
   R package GEOquery [[60]24]. The source species for the three GEO
   datasets was Homo sapiens. The annotations of the probe names in the
   dataset used the chip GPL platform file (see [61]Table 1 for detailed
   information).

Table 1.

   Intervertebral Disc Degeneration Datasets Information list.
   [62]GSE70362 [63]GSE23130 [64]GSE15227
   Platform [65]GPL17810 [66]GPL1352 [67]GPL1352
   Species Homo sapiens Homo sapiens Homo sapiens
   Tissue disc tissue - annulus fibrosus and nucleus pulposus Disc Disc
   Samples in Control group 28 15 12
   Samples in IDD group 20 8 3
   [68]Open in a new tab

   For the analysis, we selected 20 degenerated IVD samples (Grade: IV, V;
   group: IDD) and 28 normal control samples (Grade: I, II, III; group:
   Control) from the [69]GSE70362 dataset; a total of 23 samples from the
   [70]GSE23130 dataset, including 15 normal control samples (Grade: I,
   II, III; grouping: Control) and eight degenerated IVD samples (Grade:
   IV, V; grouping: IDD); and a total of 15 samples from the [71]GSE15227
   dataset, including 12 normal control samples (Grade: I, II, III; group:
   Control) and three degenerated IVD samples (Grade: IV; group: IDD).

   We collected GLRGs from the GeneCards [[72]25] database
   ([73]https://www.genecards.org/), which provides comprehensive
   information on human genes. We used “glycolysis” as the search keyword
   to obtain the GLRGs in the GeneCards database and obtained a total of
   2661 GLRGs. The specific gene names are listed in [74]Supplement Table
   1.

2.2. Data Preprocessing and differential expression analysis

   We merged the three datasets ([75]GSE70362, [76]GSE23130, and
   [77]GSE15227), used the ComBat function of R's sva package [[78]26] to
   batch the data, and then used the ControlizeBetweenArrays function of
   the limma package [[79]27] to standardize, thus obtaining the combined
   dataset (31 IDD group samples, 55 control group samples).

   Subsequently, we used the limma package in R to conduct expression
   analysis of all genes between the IDD group samples and control group
   samples from the combined dataset, identifying differentially expressed
   genes (DEGs) based on the stringent criteria of |logFC| > 0.3 and P.adj
   <0.05 for further investigation. A volcano map was generated using the
   R package ggplot2, and the results are presented. We then intersected
   the GLRGs and DEGs to obtain GLRDEGs.

2.3. GSEA enrichment analysis

   GSEA [[80]28] is commonly used to estimate changes in pathway and
   biological process activities in expression datasets. In this study,
   based on the positive and negative sorting of the logFC values of the
   genes, all genes in the different groups of the combined dataset were
   divided into two groups. The clusterProfiler package was used to
   analyze positive and negative logFC values in the two groups. For GSEA
   of all genes, the parameters are as follows: the seed is 2022, the
   number of calculations is 1000, each gene set contains at least 10
   genes, the maximum number of genes is 500, and the p-value correction
   method is Benjamini-Hochberg (BH). We obtained the
   “c2.cp.all.v2022.1.Hs.symbols.gmt [All Canonical Pathways] (3050)" gene
   set from the Molecular Signatures Database (MSigDB) [[81]29]. The
   screening criteria for significant enrichment were P.adj <0.05 and
   false discovery rate (FDR) value (q.value) < 0.05.

2.4. Construction of diagnostic model

   In this study, we employed GLRDEGs to develop an SVM model. This was
   achieved by applying the SVM algorithm [[82]30] to the expression
   matrix and grouping the data in the combined dataset. The model was
   refined to identify genes with the highest accuracy and lowest error
   rates based on GLRDEGs to ensure optimal gene selection for diagnostic
   purposes.

   The RF [[83]31] is an algorithm that integrates multiple decision trees
   using ensemble learning. The randomForest package was used to construct
   a model based on the expression of GLRDEGs in the expression matrix of
   the combined dataset. The parameters were set.seed(234) and
   ntree = 1000.

   To develop a logistic diagnostic model for the combined dataset, we
   conducted logistic regression analysis of GLRDEGs, differentiating
   between the IDD and control groups. GLRDEGs with a p-value <0.05 were
   selected to construct the model. Molecular expression data for this
   model were visualized using forest plots. Further, we applied the R
   package glmnet [[84]32] (with set.seed set to 500 and family to
   “binomial”) for LASSO [[85]33] regression analysis on the GLRDEGs. This
   process aimed to refine the logistic regression model (termed
   Logistic-LASSO) and mitigate overfitting, with a run period of 500.
   LASSO regression, which is an extension of linear regression with a
   penalty term (lambda times the absolute value of the coefficient),
   reduces overfitting and enhances the generalizability of the model. The
   outcomes of the LASSO regression analysis were illustrated using the
   diagnostic model and variable trajectory diagrams, providing a
   comprehensive view of the model's performance and the variable
   selection process.

   We then intersected the GLRDEGs identified in the logistic
   (Logistic-LASSO) regression, SVM, and RF models. This intersection was
   visualized using a Venn diagram to identify common GLRDEGs.
   Subsequently, we combined the coefficients of common GLRDEGs from the
   Logistic-LASSO regression model with their expression levels in the
   combined dataset. This approach enabled us to develop a comprehensive
   GLRDEGs diagnostic model and calculate corresponding risk scores.

   In our analysis, we utilized nomograms [[86]34], which employ a series
   of disjoint line segments in a Cartesian coordinate system to represent
   the functional relationship between multiple independent variables.
   Specifically, genes identified in the GLRDEGs diagnostic model were
   analyzed using the R package 'rms' based on Logistic-LASSO regression.
   We performed logistic regression on the expression levels of these
   genes within the combined dataset and constructed a nomogram to
   visually represent the results of the GLRDEGs analysis.

   Additionally, to assess the accuracy and resolution of our GLRDEGs
   diagnostic model, we conducted a calibration analysis and generated a
   calibration curve. Moreover, decision curve analysis (DCA) [[87]35], a
   method for evaluating clinical prediction models, diagnostic tests, and
   molecular markers, was employed. We used the R package ‘ggDCA' to
   create DCA diagrams specifically for our GLRDEGs, thereby evaluating
   the accuracy and resolution of the diagnostic model.

2.5. Gene Function enrichment analysis (GO) and pathway enrichment (KEGG)
analysis

   GO [[88]36] analysis is a common method for large-scale functional
   enrichment research, including biological process (BP), molecular
   function (MF), and cellular components (CC). The KEGG [[89]37] is a
   widely used database that stores information on genomes, biological
   pathways, diseases, drugs, etc. We used the R package clusterProfiler
   [[90]38] to perform GO and KEGG annotation analyses of GLRDEGs. The
   item selection criteria were P.adj <0.05 and FDR value (q.value) < 0.05
   is statistically significant, and the BH method was used for p-value
   correction.

2.6. Construction of mRNA-miRNA, mRNA-TF interaction network

   The ENCORI [[91]39] database ([92]https://starbase.sysu.edu.cn/) is
   version 3.0 of the starBase database, and the miRNA-mRNA interactions
   in the ENCORI database are based on CLIP-seq and degradome sequencing
   (for plants). Data mining provides a variety of visualization
   interfaces for exploring miRNA targets. We used the ENCORI database to
   predict miRNAs interacting with GLRDEGs and then constructed an
   mRNA-miRNA interaction network in Cytoscape.

   The CHIPBase database (version 3.0) [[93]40]
   ([94]https://rna.sysu.edu.cn/chipbase/) identified thousands of binding
   motif matrices and their binding sites from the ChIP-seq data of
   DNA-binding proteins and predicted the transcriptional regulatory
   relationship between millions of TFs and genes. The hTFtarget [[95]41]
   database ([96]http://bioinfo.life.hust.edu.cn/hTFtarget) is a
   comprehensive database containing information on human TFs and their
   corresponding regulatory targets. We searched for TFs binding to key
   genes using the CHIPBase (version 3.0) and hTFtarget databases and
   visualized them using Cytoscape software.

2.7. Identification and correlation analysis of immune infiltrating cells

   We used the ssGSEA algorithm to quantify the relative abundance of each
   immune cell infiltrate; mark various infiltrating immune cell types,
   such as CD8^+ T cells, dendritic cells, macrophages, regulatory T
   cells, and other human immune cell subtypes; and use the enrichment
   fraction calculated by ssGSEA analysis to represent the relative
   abundance of each immune cell infiltration in each sample
   [[97]42,[98]43]. We analyzed the ssGSEA algorithm in the R package GSVA
   [[99]44] and calculated the enrichment scores of the high- and low-risk
   grouping samples to represent the infiltration levels of different
   types of immune cells in each sample. The difference in the
   infiltration abundance of immune cells between samples from the high-
   and low-risk groups is displayed in a boxplot. The correlation was
   calculated using the Spearman algorithm and visualized using a
   correlation point diagram.

   CIBERSORT [[100]45] is an immune infiltration analysis algorithm that
   deconvolutes a transcriptome expression matrix based on the principle
   of linear support vector regression, thereby estimating the composition
   and abundance of immune cells in mixed cells. We uploaded the
   expression matrix data of different groups of samples in the combined
   dataset to CIBERSORT, combined with the LM22 characteristic gene
   matrix; screened out data with immune cell enrichment scores greater
   than zero; and finally obtained and displayed the specific results of
   the immune cell infiltration abundance matrix.

   The differences in the infiltration abundance of immune cells in the
   samples between the different groups of the combined dataset are
   displayed in a stacked histogram. The correlation between different
   immune cells in the combined dataset was calculated using the Spearman
   algorithm and visualized using the R package ggplot2. We then combined
   the gene expression matrix of the combined dataset to calculate the
   correlation between immune cells and GLRDEGs and drew a correlation dot
   plot using the R package ggplot2.

2.8. Statistical analysis

   All data processing and analyses in this study were performed using R
   software (Version 4.1.2). For the comparison of two groups of
   continuous variables, the statistical significance of normally
   distributed variables was estimated using the independent Student's
   t-test, and the Mann–Whitney U test was used (i.e., Wilcoxon rank sum
   test) to analyze the differences among non-normally distributed
   variables. If not specified, the results were calculated using Spearman
   correlation analysis to determine the correlation coefficients between
   different molecules; all statistical p-values were two-sided, and
   P < 0.05 is considered statistically significant.

3. Results

3.1. Dataset processing

   The mRNA expression profile of IDD was downloaded from GEO, and the
   data were analyzed as in the roadmap ([101]Fig. 1). We merged three IDD
   datasets ([102]GSE70362, [103]GSE23130, and [104]GSE15227) and applied
   the ‘ComBat’ function from R's ‘sva' package [[105]26] for batch
   normalization, followed by standardization using the
   ‘ControlizeBetweenArrays’ function of the limma package [[106]27]. This
   resulted in a combined dataset comprising 31 IDD and 55 control
   samples.

Fig. 1.

   [107]Fig. 1
   [108]Open in a new tab

   Technology roadmap.

   Boxplots and principal component analysis (PCA) diagrams of the
   combined dataset were generated to illustrate the data distribution
   before ([109]Fig. 2A and B) and after ([110]Fig. 2C and D) processing,
   categorized by sample sources. Post-processing results indicated a more
   uniform expression pattern across samples in the combined dataset,
   suggesting the successful mitigation of batch effects. Subsequent
   analyses were conducted on the batch-effect-corrected dataset.

Fig. 2.

   [111]Fig. 2
   [112]Open in a new tab

   Dataset standardization processing.

   Boxplot (A) and PCA plot (B) of the combined dataset prior to batch
   effect removal and normalization. Combined dataset removes the batch
   effect, normalized boxplot plot (C) and PCA plot (D).

3.2. Identification of glycolysis-related DEGs

   We used the limma package to analyze the differential expression
   between the IDD group samples and the control group samples in the
   combined dataset, with 44 genes that met the threshold of |logFC| >0.30
   and P.adj <0.05. Under this threshold, in the high RiskScore group, 20
   genes showed high expression and 24 genes showed low expression. We
   intersected these 44 DEGs with 2661 GLRGs; as shown in [113]Fig. 3A, a
   total of 16 GLRDEGs were obtained, including CKAP4, IGFBP3, ASPH, RPN1,
   PXK, SSR1, RPL15, FTL, HDGF, EIF3D, ZNF185, RPL22, WSB1, SH3P XD2A,
   GLA, and CALR.

Fig. 3.

   [114]Fig. 3
   [115]Open in a new tab

   Expression difference analysis and correlation analysis of GLRDEGs.

   A. Venn diagram showing the intersection of differentially expressed
   genes and glycolysis-related genes between the disease and control
   groups in the combined dataset. B. Correlation heat map based on the
   expression matrix of 16 GLRDEGs in the combined dataset. C. Volcano map
   drawn according to the difference analysis results between the disease
   and control groups in the combined dataset, and the marked genes are
   GLRDEGs. D. Group comparison chart of 16 GLRDEGs between disease group
   and control group in combined dataset. ns is equal to P ≥ 0.05, which
   is not statistically significant; *P < 0.05, statistically significant;
   **P < 0.01, highly statistically significant; and ***P < 0.001, very
   statistically significant.

   We then generated a correlation heatmap using the expression matrix of
   the 16 GLRDEGs in the combined dataset ([116]Fig. 3B). The results
   revealed two distinct correlation patterns among these genes. The first
   group, comprising ASPH, IGFBP3, SH3PXD2A, GLA, CKAP4, FTL, SSR1, HDGF,
   CALR, and RPN1, exhibited positive correlations. The second group,
   which included EIF3D, PXK, RPL15, RPL22, WSB1, and ZNF185, also showed
   positive correlations within its members. However, the second group
   displayed a negative correlation with the first group of GLRDEGs.

   We also used a volcano map ([117]Fig. 3C) to show the results of the
   difference analysis between the disease and control groups of the
   combined dataset and marked the positions of the 16 GLRDEGs in the
   volcano map. A group comparison chart was generated to illustrate the
   differences in expression of the 16 GLRDEGs between the IDD and control
   groups ([118]Fig. 3D). The chart revealed significant differential
   expression of ASPH, IGFBP3, SH3PXD2A, GLA, CKAP4, FTL, SSR1, HDGF,
   CALR, and RPN1, which were markedly upregulated in the IDD group,
   whereas EIF3D, PXK, RPL15, RPL22, WSB1, and ZNF185 were significantly
   downregulated in the IDD group compared to those in the controls.

3.3. GSEA enrichment analysis between IDD and control groups based on
combined dataset

   To assess the impact of differential gene expression between IDD and
   control groups in the combined dataset, we conducted GSEA. We set the
   significance threshold for enrichment screening with an adjusted
   p-value (P. adj) and FDR, q. value) below 0.05. Our analysis revealed
   that in the combined dataset, genes from the different groups
   (IDD/control) were significantly enriched in pathways such as oxidative
   phosphorylation, hedgehog ligand biogenesis, and Notch4 signaling, as
   detailed in [119]Table 2 ([120]Fig. 4B–G). The GSEA results are
   visually summarized in a mountain plot ([121]Fig. 4A).

Table 2.

   GSEA enrichment analysis results of Combined dataset Control-IDD group
   genes.
   ID enrichmentScore NES pvalue p.adjust qvalue
   KEGG OXIDATIVE PHOSPHORYLATION 0.520952 2.111817 0.001927 0.029484
   0.024768
   REACTOME HEDGEHOG LIGAND BIOGENESIS 0.547656 2.02681 0.002058 0.029484
   0.024768
   REACTOME SIGNALING BY NOTCH4 0.467794 1.831013 0.002024 0.029484
   0.024768
   WP IL9 SIGNALING PATHWAY 0.660696 1.828353 0.002141 0.029484 0.024768
   REACTOME MAPK6 MAPK4 SIGNALING 0.473459 1.861385 0.002016 0.029484
   0.024768
   REACTOME SIGNALING BY NOTCH 0.356717 1.570358 0.001876 0.029484
   0.024768
   [122]Open in a new tab

Fig. 4.

   [123]Fig. 4
   [124]Open in a new tab

   GSEA of combined dataset.

   (A) Six main biological characteristics from the GSEA of genes between
   different groups (IDD/control) in the combined dataset. B-G. Genes in
   the IDD dataset were significantly enriched in Notch signaling (B),
   MAPK6 MAPK4 signaling (C), the IL9 signaling pathway (D), Notch4
   signaling (E), Hedgehog ligand biogenesis (F), and oxidative
   phosphorylation (G). The significant enrichment screening criteria for
   GSEA enrichment analysis were P. adj <0.05 and FDR value (q.
   value) < 0.05.

3.4. Construction of diagnostic model

   To identify 16 GLRDEGs in the combined dataset, we performed logistic
   regression using the expression levels of these genes and grouping
   information (IDD/control). A logistic regression model was constructed,
   selecting genes with P < 0.05 as the inclusion criterion. This model
   encompasses all 16 GLRDEGs. Subsequently, we visualized their
   expression levels using a Forest Plot ([125]Fig. 5A).

Fig. 5.

   [126]Fig. 5
   [127]Open in a new tab

   Construction of diagnostic model.

   A. Forest Plot of logistic regression model of GLRDEGs. B. The number
   of genes with the lowest error rate obtained by the SVM algorithm. C.
   The number of genes with the highest accuracy obtained by the SVM
   algorithm. D. Model training error plot for the RF algorithm. E. RF
   model showing GLRDEGs (in descending order of IncNodePurity). F.
   Diagnostic model diagram of the LASSO regression model. G. Variable
   trajectory plot of the LASSO regression model. H. GLRDEGs and SVM
   models in Logistic-LASSO regression model, GLRDEGs Venn diagram in RF
   model.

   We then constructed the SVM model based on 16 GLRDEGs and the SVM
   algorithm and obtained the number of genes with the lowest error rate
   ([128]Fig. 5B) and the highest accuracy rate ([129]Fig. 5C). The
   results showed that the accuracy of the SVM model was the highest when
   the number of genes was 10.

   We applied the RF algorithm to analyze the expression levels of the 16
   GLRDEGs in the combined dataset ([130]Fig. 5D). 'IncNodePurity' was
   used to indicate the increase in node purity, with higher values
   signifying fewer impurities (i.e., a lower Gini coefficient). Setting
   an IncNodePurity threshold >0.5, we identified 16 diagnostic markers
   from the GLRDEGs using the RF algorithm ([131]Fig. 5E). These markers
   included CKAP4, IGFBP3, ASPH, RPN1, PXK, SSR1, RPL15, FTL, HDGF, EIF3D,
   ZNF185, RPL22, WSB1, SH3PXD2A, GLA, and CALR.

   The outcomes of the LASSO regression analysis are depicted in a LASSO
   regression model diagram ([132]Fig. 5F) and a LASSO variable trajectory
   diagram ([133]Fig. 5G). This process led to the identification of 10
   GLRDEGs in the model: ASPH, CKAP4, EIF3D, IGFBP3, PXK, RPL15, RPN1,
   SSR1, WSB1, and ZNF185.

   To identify commonly diagnosed GLRDEGs (common GLRDEGs), we intersected
   the GLRDEGs from the Logistic-LASSO regression, SVM, and RF models.
   This intersection revealed eight common GLRDEGs, which are visualized
   in a Venn diagram ([134]Fig. 5H). These genes include PXK, EIF3D, WSB1,
   ZNF185, IGFBP3, CKAP4, RPL15, and SSR1.

   Next, we developed a new diagnostic model based on the expression
   levels of eight common GLRDEGs in the combined dataset in conjunction
   with the coefficients of these genes in the diagnostic model
   constructed through LASSO regression analysis.

   To further validate the value of the GLRDEGs diagnostic model, we
   plotted the receiver operating characteristic curve based on the
   RiskScore of the GLRDEGs diagnostic model and the group information
   (IDD/control) of the combined dataset ([135]Fig. 6A). As shown in
   [136]Fig. 6A, the GLRDEGs diagnostic model exhibited high accuracy in
   diagnosing the two groups, with an area under the curve (AUC) of 0.935.

Fig. 6.

   [137]Fig. 6
   [138]Open in a new tab

   Verification of diagnostic model.

   A. The ROC curve of the GLRDEGs diagnostic model in the combined
   dataset. B. Nomogram of eight common GLRDEGs in the GLRDEGs logistic
   regression model. C. Calibration curve of nomogram of GLRDEGs logistic
   regression model. D. Decision curve (DCA) in GLRDEGs logistic
   regression model.

   We also constructed a GLRDEGs logistic regression model and drew a
   nomogram to show the contribution of eight common GLRDEGs to the
   GLRDEGs logistic regression model ([139]Fig. 6B). The results showed
   that the expression levels of EIF3D, PXK, and IGFBP3 had significantly
   higher effects on the GLRDEGs logistic regression model than other
   variables.

   Calibration analysis was performed, and a calibration curve was drawn
   ([140]Fig. 6C). The optimal theoretical probability (solid line) and
   model prediction under different conditions were determined. The fit of
   the probabilities (dashed line) indicated how well the model predicted
   the outcome ([141]Fig. 6C). In addition, we used DCA to evaluate the
   role of the GLRDEGs diagnostic model in terms of clinical utility. The
   results are presented in [142]Fig. 6D. In the DCA diagram, when the
   line of the model is stably higher than all positive and all negative
   values within a certain range, the larger the range, the higher the net
   income and the better the model effect. From the results, we can see
   that the constructed model has a higher diagnostic value for the
   occurrence of IDD.

3.5. Functional enrichment analysis (GO) and pathway enrichment (KEGG)
analysis

   To analyze the relationships of the eight common GLRDEGs, we conducted
   GO functional enrichment analysis and KEGG pathway enrichment analysis.
   The results show that the eight common GLRDEGs were significantly
   enriched in various BPs such as cytoplasmic translation, regulation of
   RNA binding, negative regulation of ATP-dependent activity, and
   positive regulation of myoblast differentiation, including the
   insulin-like growth factor receptor signaling pathway. Additionally,
   these genes were associated with CCs such as the eukaryotic 48S and 43S
   pre-initiation complexes, endoplasmic reticulum lumen, lamellar body,
   and translation pre-initiation complex. In terms of MFs, enrichment was
   noted for insulin-like growth factor I binding, protein phosphatase
   activator activity, fibronectin binding, and actin binding. Moreover,
   these genes were implicated in the KEGG pathway for protein processing
   in the endoplasmic reticulum (see [143]Table 3 for pathway details).

Table 3.

   GO and KEGG enrichment analysis results of GLRDEGs.
   ONTOLOGY ID Description pvalue p.adjust qvalue
   BP GO:0002181 cytoplasmic translation 0.00088 0.039614 0.020386
   BP GO:1905214 regulation of RNA binding 0.003824 0.048303 0.024858
   BP GO:0032780 negative regulation of ATP-dependent activity 0.007002
   0.048303 0.024858
   BP GO:0045663 positive regulation of myoblast differentiation 0.007319
   0.048303 0.024858
   BP GO:0043567 regulation of insulin-like growth factor receptor
   signaling pathway 0.007636 0.048303 0.024858
   CC GO:0033290 eukaryotic 48S preinitiation complex 0.006109 0.035413
   0.019281
   CC GO:0005788 endoplasmic reticulum lumen 0.006601 0.035413 0.019281
   CC GO:0016282 eukaryotic 43S preinitiation complex 0.006921 0.035413
   0.019281
   CC GO:0042599 lamellar body 0.006921 0.035413 0.019281
   CC GO:0070993 translation preinitiation complex 0.007327 0.035413
   0.019281
   MF GO:0031994 insulin-like growth factor I binding 0.005636 0.047455
   0.02389
   MF GO:0072542 protein phosphatase activator activity 0.006933 0.047455
   0.02389
   MF GO:0019211 phosphatase activator activity 0.009953 0.047455 0.02389
   MF GO:0001968 fibronectin binding 0.013394 0.047455 0.02389
   MF GO:0003779 actin binding 0.014443 0.047455 0.02389
   KEGG has04141 Protein processing in endoplasmic reticulum 0.002546
   0.017819 0.013398
   [144]Open in a new tab

   The results of GO functional enrichment analysis and KEGG pathway
   enrichment analysis are displayed using histograms ([145]Fig. 7A). In
   addition, we displayed the BP ([146]Fig. 7B), CC ([147]Fig. 7C), MF
   ([148]Fig. 7D), and KEGG ([149]Fig. 7E) pathway enrichment results of
   the GO gene functional enrichment analysis in the form of a circular
   network diagram.

Fig. 7.

   [150]Fig. 7
   [151]Open in a new tab

   Functional enrichment analysis (GO) and pathway enrichment (KEGG)
   analysis of GLRDEGs.

   A. Histogram display of GO enrichment analysis and KEGG pathway
   enrichment analysis results of GLRDEGs. B-E. The circular network
   diagram display of BP (B), CC (C), MF (D), and KEGG pathways (E) in the
   GO functional enrichment analysis of GLRDEGs and the circular network
   diagram of KEGG enrichment results exhibit (E). The abscissa in the
   histogram (A) is GO terms, and the height of the bar indicates the Padj
   value of GO or KEGG terms. The blue dots in the network diagram (B, C,
   D, E) represent specific genes, and the pink dots represent specific
   pathways.

3.6. GSEA enrichment analysis between high- and low-risk groups based on
combined dataset

   Using the RiskScore from the previously established GLRDEGs diagnostic
   model, we divided the samples into high- and low-risk groups using the
   combined dataset. Differential analysis between these groups was
   conducted using the ‘limma' package. We assessed the expression of all
   genes across these groups, focusing on their involvement in BPs, impact
   on CCs, and exertion of MFs. For significant enrichment, we set
   stringent criteria with P.adj <0.05 and an FDR (q.value) < 0.05. We
   presented the results of the GSEA of genes between different groups
   using a ridge plot ([152]Fig. 8A). The results show that the different
   groups in the combined dataset data (high/low RiskScore group) were
   significantly enriched in Wnt signaling ([153]Fig. 8B), beta catenin
   independent Wnt signaling ([154]Fig. 8C), MAPK6 MAPK4 signaling
   ([155]Fig. 8D), fceri-mediated NF-kB activation ([156]Fig. 8E), NOTCH4
   signaling ([157]Fig. 8F), negative regulation of NOTCH4 signaling
   ([158]Fig. 8G), and other pathways (See [159]Table 4 for details).

Fig. 8.

   [160]Fig. 8
   [161]Open in a new tab

   GSEA enrichment analysis of combined dataset.

   A. Combined dataset of different groups (high/low RiskScore group) GSEA
   of genes among the main six biological characteristics. B-G.
   REACTOME_SIGNALING_BY_WNT (B),
   REACTOME_BETA_CATENIN_INDEPENDENT_WNT_SIGNALING (C),
   REACTOME_MAPK6_MAPK4_SIGNALING (D),
   REACTOME_FCERI_MEDIATED_NF_KB_ACTIVATION (E),
   REACTOME_SIGNALING_BY_NOTCH 4 (F), and
   REACTOME_NEGATIVE_REGULATION_OF_NOTCH4_SIGNALING (G). The significant
   enrichment screening criteria for GSEA were P.adj <0.05 and FDR value
   (p-value) < 0.05.

Table 4.

   GSEA enrichment analysis results of Combined dataset Low-High RiskScore
   group genes.
   ID enrichmentScore NES pvalue p.adjust qvalue
   REACTOME_NEGATIVE_REGULATION_OF_NOTCH4_SIGNALING −0.63081 −2.30816
   0.002075 0.044564 0.040247
   REACTOME_SIGNALING_BY_NOTCH4 −0.49719 −2.02722 0.002179 0.044564
   0.040247
   REACTOME_FCERI_MEDIATED_NF_KB_ACTIVATION −0.55319 −2.23626 0.002128
   0.044564 0.040247
   REACTOME_MAPK6_MAPK4_SIGNALING −0.53807 −2.19342 0.002183 0.044564
   0.040247
   REACTOME_BETA_CATENIN_INDEPENDENT_WNT_SIGNALING −0.41536 −1.85468
   0.002364 0.044564 0.040247
   REACTOME_SIGNALING_BY_WNT −0.30205 −1.46246 0.002506 0.044564 0.040247
   [162]Open in a new tab

3.7. Construction of mRNA-miRNA, mRNA-TF interaction network

   Using the mRNA-miRNA data from the miRDB database, we predicted the
   miRNAs interacting with the eight identified GLRDEGs, retaining only
   those interactions that were reported in at least five different
   sources. Subsequently, we used the Cytoscape software to construct and
   visualize the mRNA-miRNA interaction network ([163]Fig. 9A). Within
   this network, mRNAs are represented by red nodes and miRNAs are
   represented by blue nodes. Analysis of the mRNA-miRNA interaction
   network revealed that our network comprised seven hub genes (PXK, WSB1,
   ZNF185, IGFBP3, CKAP4, RPL15, and SSR1) and 38 miRNA molecules,
   collectively forming 42 mRNA-miRNA interaction pairs. Detailed
   information on these mRNA-miRNA interactions is provided in [164]Table
   5.

Fig. 9.

   [165]Fig. 9
   [166]Open in a new tab

   mRNA-miRNA and mRNA-TF interaction networks and functional similarity
   analysis diagram of GLRDEGs.

   A-D. mRNA-miRNA (A) and mRNA-TF (B) interaction networks of eight
   GLRDEGs. C. Functional similarity analysis between eight GLRDEGs. The
   red circles in the mRNA-miRNA (A) interaction network are mRNAs, and
   the blue circles are miRNAs. (B) The red circles in the mRNA-TF
   interaction network are mRNAs, and the blue circles are the TFs.

Table 5.

   mRNA-miRNA interaction network nodes.
        node1      node2       node1      node2
   hsa-miR-27a-3p  CKAP4  hsa-miR-135a-5p SSR1
   hsa-miR-129-5p  CKAP4  hsa-miR-152-3p  SSR1
   hsa-miR-129-5p  CKAP4  hsa-miR-196b-5p SSR1
   hsa-miR-7-5p    CKAP4  hsa-miR-495-3p  SSR1
   hsa-miR-27b-3p  CKAP4  hsa-miR-455-3p  SSR1
   hsa-miR-133a-3p CKAP4  hsa-miR-1323    SSR1
   hsa-miR-153-3p  CKAP4  hsa-miR-548o-3p SSR1
   hsa-miR-200c-3p CKAP4  hsa-miR-15a-5p  WSB1
   hsa-miR-133b    CKAP4  hsa-miR-16-5p   WSB1
   hsa-miR-429     CKAP4  hsa-miR-27a-3p  WSB1
   hsa-miR-19a-3p  IGFBP3 hsa-miR-15b-5p  WSB1
   hsa-miR-19b-3p  IGFBP3 hsa-miR-27b-3p  WSB1
   hsa-miR-197-3p  IGFBP3 hsa-miR-128-3p  WSB1
   hsa-miR-9-5p    IGFBP3 hsa-miR-190a-5p WSB1
   hsa-miR-24-3p   PXK    hsa-miR-195-5p  WSB1
   hsa-miR-92a-3p  RPL15  hsa-miR-424-5p  WSB1
   hsa-miR-367-3p  RPL15  hsa-miR-497-5p  WSB1
   hsa-miR-1323    RPL15  hsa-miR-503-5p  WSB1
   hsa-miR-196a-5p SSR1   hsa-miR-190b    WSB1
   hsa-miR-148a-3p SSR1   hsa-miR-1321    ZNF185
   hsa-miR-147a    SSR1   hsa-miR-4739    ZNF185
   [167]Open in a new tab

   We identified TFs that bind to eight GLRDEGs (PXK, EIF3D, WSB1, ZNF185,
   IGFBP3, CKAP4, RPL15, and SSR1) using the CHIPBase (version 3.0) and
   hTFtarget databases. We filtered the results with the criteria “Number
   of samples found (upstream) > 0” and “Number of samples found
   (downstream) > 0”. Finally, we obtained data for 33 interaction pairs
   involving four hub genes (PXK, RPL15, WSB1, and ZNF185) and 29 TFs.
   These interactions were visualized using the Cytoscape software
   ([168]Fig. 9B), where red nodes represent mRNAs and blue nodes
   represent TFs. Details of the mRNA-TF interactions are provided in
   [169]Table 6. We also conducted a functional similarity analysis on the
   eight datasets and presented the results using cloud and rain diagrams.
   Using the R package GOSemSim, we calculated the semantic similarity
   between GO terms, sets of GO terms, gene products, and gene clusters.
   During this process, we retained only those genes that were annotated
   to pathways in terms of BPs, MFs, and CCs for similarity analysis.
   Ultimately, we obtained functional similarity analysis results for six
   GLRDEGs (PXK, EIF3D, WSB1, IGFBP3, RPL15, and SSR1) and visualized them
   using a raincloud diagram ([170]Fig. 9C). The results demonstrated that
   WSB1 had the highest functional similarity value with other GLRDEGs
   ([171]Fig. 9C; the x-axis represents the similarity score, where a
   larger value denotes higher functional similarity with other genes).

Table 6.

   mRNA-TF interaction network nodes.
   mRNA  Transcription factor  mRNA  Transcription factor
   PXK   FOXA1                WSB1   HIF1A
   PXK   FOXA2                WSB1   KLF1
   RPL15 EP300                WSB1   MAX
   RPL15 GLIS1                WSB1   MAZ
   RPL15 GRHL2                WSB1   MNT
   RPL15 BRD4                 WSB1   MXI1
   RPL15 SPI1                 WSB1   MYC
   RPL15 TEAD4                WSB1   POLR2A
   RPL15 CREB1                WSB1   SPI1
   WSB1  ATF1                 WSB1   SRF
   WSB1  BHLHE40              WSB1   STAT1
   WSB1  BRD2                 WSB1   TFAP2A
   WSB1  BRD4                 WSB1   TFAP2C
   WSB1  CREB1                WSB1   YY1
   WSB1  E2F1                 WSB1   ZBTB7A
   WSB1  GTF2B                ZNF185 MAX
   WSB1  HDAC1
   [172]Open in a new tab

3.8. Difference analysis of ssGSEA immune signature between high- and
low-risk groups

   We used the median RiskScore of the GLRDEGs diagnostic model to divide
   IDD samples from the combined dataset into low- and high-risk groups.
   To explore the differences in immune infiltration between the high- and
   low-risk groups, we used the ssGSEA algorithm to calculate the
   infiltration abundance of 28 types of immune cells in the high- and
   low-risk group samples. The Mann–Whitney U test was used to analyze the
   degree of difference in the infiltration of 28 types of immune cells
   between the low- and high-risk groups. The results are displayed in the
   group comparison chart ([173]Fig. 10A). The results showed that in the
   combined dataset data, there were statistically significant differences
   (P < 0.05) in the infiltration abundance of four types of immune cells
   between the low- and high-risk groups: activated dendritic cells,
   monocytes, type 2 T helper cells, and activated CD8 T cells.

Fig. 10.

   [174]Fig. 10
   [175]Open in a new tab

   Difference analysis of ssGSEA immune characteristics between different
   groups (low/high RiskScore groups) of combined dataset data.

   A. Combined dataset data: high- and low-risk group (low/high RiskScore
   group). SsGSEA immune infiltration analysis results in group comparison
   chart display. The results of correlation analysis of immune cell
   infiltration abundance in the low RiskScore group (B) and high
   RiskScore group (C) of BC. Combined dataset data are shown. D, E.
   Combined dataset data of low-risk group (D) and high-risk group (E).
   Correlation dot plot of immune cells and 27 GLRDEGs. ns is equal to
   P ≥ 0.05, which is not statistically significant; *P < 0.05,
   statistically significant; **P < 0.01, highly statistically
   significant; and ***P < 0.001, very statistically significant.

   We further calculated the correlation between the infiltration
   abundances of the four types of immune cells with group statistical
   differences in the samples of the low- and high-risk groups and
   displayed the results ([176]Fig. 10B–C). In the low-risk group, the
   infiltration abundances of the four types of immune cells were mostly
   positively correlated, among which the relationship between activated
   CD8 T cells and type 2 T helper cells was the most significant
   ([177]Fig. 10B), whereas in the high-risk group, except type 2 T helper
   cells and monocytes, type 2 T helper cells and activated CD8 T cells
   were negatively correlated, and all the other cells were positively
   correlated. Among all correlations, the strongest were between type 2 T
   helper cells and activated CD8 T cells (negative correlation) and
   activated CD8 T cells and monocytes (positive correlation) ([178]Fig.
   10C).

   At the same time, we also calculated the infiltration abundance of four
   types of immune cells and eight GLRDEGs (correlation between the
   expression levels of PXK, EIF3D, WSB1, ZNF185, IGFBP3, CKAP4, RPL15,
   and SSR1) ([179]Fig. 10D–E). Most of the genes were positively
   correlated, among which the correlation between activated CD8 T cells
   and EIF3D was the strongest (positive correlation). In the high-risk
   group of the combined dataset data ([180]Fig. 10 E), four types of
   cells were positively correlated with eight genes and among the
   negative correlations, monocytes showed the strongest correlation
   (negative correlation) with WSB1.

3.9. 3.9CIBERSORT immune signature difference analysis between high-risk
group and low-risk group of combined dataset

   Subsequently, we employed the CIBERSORT algorithm to calculate the
   infiltration abundance of 22 types of immune cells in the high- and
   low-risk groups. A proportional representation of the immune cell
   infiltration in the dataset samples is depicted in stacked bar charts
   ([181]Fig. 11A). The results showed that of the 22 types of immune
   cells, 21 had a non-zero infiltration abundance in the combined dataset
   samples. These 21 immune cell types are naïve B cells, memory B cells,
   plasma cells, CD8 T cells, CD4 memory resting T cells, CD4 memory
   activated T cells, follicular helper T cells, regulatory T cells
   (Tregs), gamma delta T cells, resting NK cells, activated NK cells,
   monocytes, M0 macrophages, M1 macrophages, M2 macrophages, resting
   dendritic cells, activated dendritic cells, resting mast cells,
   activated mast cells, eosinophils, and neutrophils.

Fig. 11.

   [182]Fig. 11
   [183]Open in a new tab

   Difference analysis of CIBERSORT immune signature among different risk
   groups of combined dataset data.

   A. Stacked histogram display of CIBERSORT immune infiltration analysis
   results between different risk groups of the combined dataset data.
   B–C. Combined dataset data: low-risk group (B) and high-risk group (C).
   Correlation heat map between immune cells with infiltrating abundance
   not 0. D-E. Combined dataset data: low-risk group (D) and high-risk
   group (E). Correlation dot plot of immune cells with infiltration
   abundance not zero and GLRDEGs. ns is equal to P ≥ 0.05, which is not
   statistically significant; *P < 0.05, statistically significant;
   **P < 0.01, highly statistically significant; and ***P < 0.001, very
   statistically significant.

   We then used the “spearman” algorithm to calculate the correlation
   between the infiltration abundance of 21 types of immune cells in the
   samples of the combined dataset data for high- and low-risk groups
   (low/high RiskScore group) whose infiltration abundance is not all 0
   ([184]Fig. 11B and C). The results showed that in the combined dataset
   data, the low-risk group (low RiskScore group, infiltration abundance
   of M2 macrophages in the low-risk group is all 0, leaving 20 types of
   cells) ([185]Fig. 11C) and the high-risk group (high RiskScore group)
   ([186]Fig. 11D), the expression levels of immune cells ([187]Fig. 11D)
   were mostly negatively correlated.

   We also calculated the low-risk group (low RiskScore group) ([188]Fig.
   11D) and high-risk group (high RiskScore group) ([189]Fig. 11E) in the
   patient samples of 21 types of immune cell infiltration abundance and 8
   GLRDEGs (correlation between the expression levels of PXK, EIF3D, WSB1,
   ZNF185, IGFBP3, CKAP4, RPL15, SSR1) ([190]Fig. 11D–E), were screened
   with P < 0.05 as the standard and the results were displayed by the
   correlation dot plot. The results showed that there were significant
   correlations between four types of cells (eosinophils, resting NK
   cells, plasma cells, and focal helper T cells) and five genes (CKAP4,
   IGFBP3, PXK, WSB1, and ZNF185) in the low-risk group of combined
   dataset data ([191]Fig. 11D). Among them, the correlation between
   resting NK cells and ZNF185 was the strongest (positive correlation) in
   the high-risk group of combined dataset data ([192]Fig. 11E). There was
   a significant correlation between CD4 memory activated T cells, CD8 T
   cells, focal helper T cells, regulatory T cells, and six genes (CKAP4,
   EIF3D, PXK, RPL15, SSR1, and ZNF185), among which T cell CD8 and SSR1
   showed the strongest correlation (negative correlation).

4. Discussion

   IDD is a prevalent cause of LBP, resulting in significant social and
   economic burdens [[193]1]. The high prevalence of LBP makes it a common
   reason for seeking medical attention. Considering the increasing
   incidence of IDD in the aging population, there is an urgent need to
   uncover its underlying causes and identify effective therapies.
   Currently, IDD diagnosis primarily relies on symptoms and imaging,
   which hampers early detection and timely intervention [[194]46].
   Therefore, the identification of potential biomarkers for the
   prediction of IDD is crucial. In recent years, advances in machine
   learning techniques and the availability of gene expression data in
   public databases have provided new approaches for identifying
   biomarkers for disease detection [[195]47]. Despite extensive research
   efforts, the precise mechanisms underlying IDD remain elusive, and
   effective treatment options are still lacking. Therefore, our study
   aimed to explore potential glycolysis-associated gene signatures
   associated with IDD and investigate their correlation with immune cell
   infiltration using a comprehensive analytical approach.

   In this study, we comprehensively investigated the association among
   GLRGs, immune cells, and IDD using various machine learning techniques
   and the CIBERSORT algorithm. Our study established a strong link
   between these elements, providing a novel perspective for understanding
   the molecular mechanisms underlying IDD and identifying potential
   therapeutic targets.

   We identified eight GLRDEGs—PXK, EIF3D, WSB1, ZNF185, IGFBP3, CKAP4,
   RPL15, and SSR1—by Logistic-LASSO regression, SVM, and RF models. These
   GLRDEGs not only provide insight into the pathogenesis of IDD but also
   constitute a promising diagnostic model for IDD. Previous studies have
   implicated some of these genes in different physiological and
   pathological contexts, partially confirming our findings.

   For instance, IGFBP3, also known as insulin-like growth factor binding
   protein 3 in yeast, suppresses apoptosis and improves cell survival in
   several cell systems when calorie intake is restricted [[196][48],
   [197][49], [198][50]]. IGFBP3, which is essential for cell
   differentiation, proliferation, apoptosis, and cell aging, deacetylates
   several apoptosis-associated nonhistone proteins [[199]49,[200]51].
   Patients with disc degeneration show higher levels of IGFBP3
   expression, which may protect against NP cell apoptosis [[201]20].
   EIF3D (eukaryotic translation initiation factor 3, subunit D) is an
   mRNA cap-binding protein required for specialized translation
   initiation [[202]52]. EIF3D is associated with tumor development and is
   widely expressed in a variety of tumor tissues, regulating the cycle of
   tumor cells and dysregulating apoptotic and anti-apoptotic signaling by
   increasing apoptosis [[203][52], [204][53], [205][54]]. WSB1 is a
   member of the SOCS box family and plays a key role in mediating the
   degradation of substrate proteins via the ubiquitin-proteasome pathway
   as a central component of the E3 ubiquitin ligase complex of ECS
   (Elongin B/C-Cullin 2/5-SOCS box protein) [[206]55]. Recent research
   has indicated that WSB1 regulates thyroid homeostasis, immunological
   response, glycolysis, and hypoxia, and possibly influences the onset or
   progression of cancer. It has been shown that WSB1 affects cell
   invasion, survival, and proliferation [[207]56]. However, further
   experimental and clinical studies are required to verify whether EIF3D
   and WSB1 regulate the proliferation of NP cells during IDD.

   Enrichment analyses (GO and KEGG) provided deeper insight into the
   functional roles and pathways related to the identified GLRDEGs. These
   analyses revealed that the identified genes are significantly involved
   in the glycolysis pathway, which has been extensively implicated in
   several disorders including IDD [[208][57], [209][58], [210][59]].
   Therefore, it is plausible to assume that modulation of the glycolysis
   pathway may provide new opportunities for IDD therapeutic strategies.

   Our constructed mRNA-miRNA and mRNA-TF interaction networks provide a
   comprehensive map of the potential regulatory relationships among the
   identified genes, miRNAs, and TFs. This regulatory network aids in
   understanding the intricacies of the molecular mechanisms underlying
   IDD and their contributions to the disease pathology.

   Finally, RiskScore based on the diagnostic model divided the samples in
   the combined dataset into two groups of high/low risk for differential
   analysis and GSEA, further ssGSEA, and CIBERSORT immune signature
   differential analysis between the groups. To the best of our knowledge,
   studies on GLRDEGs and immune responses related to IDD are scarce, and
   our study examining the cross-talk between hub GLRGs and immune cells
   may provide new insights into the etiology of IDD.

   Considering the physical and biological barriers, such as an avascular
   microenvironment, high proteoglycan concentration, intense physical
   pressure, notochordal cells, and apoptosis inducers such as Fas ligand,
   the IVD is perceived as an immunologically privileged site [[211]16].
   Nonetheless, during IDD, immune cells within the IVD have unique
   properties. Certain immune cells secrete substances that promote
   angiogenesis and inflammatory responses in the disc, thus playing a
   vital role in the progression of IDD. Macrophages exhibit a higher
   pro-inflammatory profile in degenerate IVDs, as evidenced by the
   elevated expression of pro-inflammatory markers (such as CCR7, IL-6,
   CD86, and iNOS) [[212]60]. A novel lineage of Th17 lymphocytes
   expressing substantial amounts of IL-17 has been discovered in surgical
   tissues obtained from patients with deteriorated IVDs [[213]61]. IL-17
   not only promotes chemokine synthesis, but also orchestrates the
   migration of neutrophils and monocytes to the inflammation site,
   thereby instigating chemokine production and attracting immune cells
   [[214]62].

   However, this study has some limitations. First, the database used had
   a very small sample size, which calls for additional research using
   larger samples. Second, there were insufficient experiments to validate
   these claims. Therefore, extensive biological research on immune cell
   infiltration and glycolysis is necessary.

5. Conclusions

   Our study provides novel insights into the role of glycolysis and the
   immune response in IDD. The identified GLRGs and immune cells can
   potentially serve as biomarkers or therapeutic targets for IDD. Despite
   the need for further experimental validation, these findings pave the
   way for future research on the molecular mechanisms underlying IDD and
   have potential implications for improving diagnostic accuracy and
   developing new therapeutic strategies.

Funding statement

   This work was supported by the grant from General Project of Natural
   Science Research Project of Shanxi Provincial Health Commission (No.
   [215]2020TD13) and Hepcidin expression changes and related effect
   mechanisms in intervertebral disc nucleus pulposus degeneration (No.
   [216]202203021211074).

Data availability statement

   The datasets presented in this study can be found in the GEO database
   including [217]GSE70362, [218]GSE23130 and [219]GSE15227, which is
   global and public online repositories.

CRediT authorship contribution statement

   Jian Gao: Writing – review & editing, Writing – original draft,
   Visualization, Validation, Methodology, Formal analysis. Liming He:
   Funding acquisition. Jianguo Zhang: Formal analysis, Data curation.
   Leimin Xi: Project administration, Methodology. Haoyu Feng:
   Visualization, Validation, Supervision, Funding acquisition.

Declaration of competing interest

   The authors declare that they have no known competing financial
   interests or personal relationships that could have appeared to
   influence the work reported in this paper.

Footnotes

   ^Appendix A

   Supplementary data to this article can be found online at
   [220]https://doi.org/10.1016/j.heliyon.2024.e36158.

Appendix A. Supplementary data

   The following is the Supplementary data to this article.
   Multimedia component 1
   [221]mmc1.docx^ (41.3KB, docx)

References