Abstract The incidence of intervertebral disc (IVD) degeneration disease, caused by changes in the osmotic pressure of nucleus pulposus (NP) cells, increases with age. In general, low back pain is associated with IVD degeneration. However, the mechanism and molecular target of low back pain have not been elucidated, and there are no data suggesting specific biomarkers of low back pain. Therefore, the research aims to identify and verify the significant gene biomarkers of low back pain. The differentially expressed genes (DEGs) were screened in the Gene Expression Omnibus (GEO) database, and the identification and analysis of significant gene biomarkers were also performed with various bioinformatics programs. A total of 120 patients with low back pain were recruited. Before surgery, the degree of pain was measured by the numeric rating scale (NRS), which enables comparison of the pain scores from individuals. After surgery, IVD tissues were obtained, and NP cells were isolated. The NP cells were cultured in two various osmotic media, including iso-osmotic media (293 mOsm/kg H[2]O) to account for the morbid environment of NP cells in IVD degeneration disease and hyper-osmotic media (450 mOsm/kg H[2]O) to account for the normal condition of NP cells in healthy individuals. The relative mRNA expression levels of CCL5, OPRL1, CXCL13, and SST were measured by quantitative real-time PCR in the in vitro analysis of the osmotic pressure experiments. Finally, correlation analysis and a neural network module were employed to explore the linkage between significant gene biomarkers and pain. A total of 371 DEGs were identified, including 128 downregulated genes and 243 upregulated genes. Furthermore, the four genes (CCL5, OPRL1, SST, and CXCL13) were identified as significant gene biomarkers of low back pain (P < 0.001) based on univariate linear regression, and CCL5 (odds ratio, 34.667; P = 0.003) and OPRL1 (odds ratio, 19.875; P < 0.001) were significantly related to low back pain through multivariate logistic regression. The expression of CCL5 and OPRL1 might be correlated with low back pain in patients with IVD degeneration disease caused by changes in the osmotic pressure of NP cells. Subject terms: Biomarkers, Molecular medicine Introduction IVD is fibro-cartilage tissue located between vertebrae and is composed of the nucleus pulposus, annulus fibrosus and endplate^[40]1. NP survives in a special ecological niche containing no distribution of blood vessels and nerves in which the osmotic pressure is significantly higher than that of plasma, enabling proteoglycans to be present in high concentrations^[41]2,[42]3. Increasing with age, the osmotic pressure changes, and the interverbal disc degenerates^[43]4,[44]5, which may be related to low back pain^[45]6,[46]7. Previous studies have shown that the osmotic environment can influence the gene expression of NP cells^[47]7; thus, we hypothesized that there are pain-related genes among these affected genes, which might provide new therapeutic strategies and drug targets for low back pain. In addition, the NRS is ubiquitous in the clinic as a means of evaluating low back pain^[48]8. The scale requires patients to describe pain intensity using the numbers 0–10 or 0–100, while a higher number means more pain (0 = no pain, 100 or 100 = most severe pain)^[49]9. Therefore, the NRS was used to evaluate the degree of low back pain in this study. Through bioinformatic analysis, we screened enrichment pathways, such as the regulation of extracellular matrix organization and the Janus kinase (JAK)/signal transducer and activator of transcription (STAT) signalling pathway, as well as key genes, such as CXCL13, SST, CCL5 and OPRL1, between NP cells cultured in hyperosmotic media and NP cells cultured in iso-osmotic media. In addition, through quantitative real-time PCR and statistical analysis, we found that CCL5 and OPRL1 might be potential biomarkers of low back pain. Results Identification of DEGs between NP cells cultured in hyperosmotic media and NP cells cultured in iso-osmotic media After analysis of the datasets ([50]GSE1648) with the limma package, the difference between NP cells cultured in hyperosmotic media and NP cells cultured in iso-osmotic media could be presented in a volcano plot. After setting the threshold value, a total of 371 DEGs were reserved, including 128 downregulated DEGs and 243 upregulated DEGs (Fig. [51]1). Figure 1. Figure 1 [52]Open in a new tab Volcano plot presents the DEGs between IVD cells cultured in hyperosmotic media and IVD cells cultured in iso-osmotic media. The downregulated DEGs are marked in green, and upregulated DEGs are marked in red. Functional annotation for DEGs via DAVID and Metascape Through DAVID analysis, the results of the Gene Ontology analysis showed that variations in DEGs linked with biological processes were mainly enriched in sensory perception of pain and inflammatory response (Fig. [53]2A). Variations in DEGs linked with cellular components were significantly enriched in the postsynaptic membrane, synapse, neuromuscular junction, acetylcholine−gated channel complex, voltage−gated calcium channel complex, and L−type voltage−gated calcium channel complex (Fig. [54]2B). With regard to molecular function, DEGs were significantly enriched in chemokine activity, monooxygenase activity, and voltage−gated calcium channel activity (Fig. [55]2C). Analysis of KEGG pathways indicated that the top canonical pathways associated with DEGs were glutamatergic synapses and neuroactive ligand−receptor interactions (Fig. [56]2D). Figure 2. [57]Figure 2 [58]Open in a new tab Enrichment analysis of DEGs by DAVID and Metascape. Detailed information relating to changes in the (A) cellular component, (B) biological process, (C) molecular function, and (D) KEGG analysis for hub genes. (E) Heatmap of enriched terms across input differentially expressed gene lists, coloured by p-values, via Metascape. (F) Network of enriched terms coloured by clust er identity, where nodes that share the same cluster identity are typically close to each other. (G) Network of enriched terms coloured by p-value, where terms containing more genes tend to have a more significant p-value. Furthermore, the functional enrichment analysis with Metascape showed that the DEGs between NP cells cultured in hyperosmotic media and NP cells cultured in iso-osmotic media were significantly enriched in synaptic signalling, synapse organization, neuroactive ligand-receptor interaction, and glutamatergic synapse (P < 0.05, Fig. [59]2E–G). Gene ontology and KEGG pathway enrichment analysis of DEGs in NP cells using GSEA The results of Gene Ontology enrichment analysis by GSEA indicated that 1274/4081 genes were upregulated in hypertonic NP cells. Table [60]1 lists the most significant enrichments in the upregulated and downregulated gene sets according to the normalized enrichment score by order (Table [61]1). Figure [62]3 shows the six most significant plots in the upregulated and downregulated gene sets. KEGG enrichment analysis by GSEA indicated that 70/168 gene sets were upregulated in hypertonic NP cells compared to isotonic NP cells, while 98/168 gene sets were downregulated. Table [63]2 lists the most significant enrichments in the upregulated and downregulated gene sets according to the normalized enrichment score by order (Table [64]2). Figure [65]4 shows the six most significant plots in the upregulated and downregulated gene sets. Table 2. Pathway enrichment analysis of DEGs in IVD cells using GSEA. Gene Set Name SIZE ES NES p-val Upregulated KEGG_VIBRIO_CHOLERAE_INFECTION 46 0.386 1.418 0.068 KEGG_DORSO_VENTRAL_AXIS_FORMATION 19 0.494 1.375 0.128 KEGG_EPITHELIAL_CELL_SIGNALING_IN_HELICOBACTER_PYLORI_INFECTION 58 0.348 1.338 0.025 KEGG_ERBB_SIGNALING_PATHWAY 82 0.325 1.323 0.019 KEGG_FRUCTOSE_AND_MANNOSE_METABOLISM 32 0.388 1.312 0.058 KEGG_NEUROTROPHIN_SIGNALING_PATHWAY 117 0.312 1.312 0.048 Downregulated KEGG_REGULATION_OF_AUTOPHAGY 31 0.472 1.752 0.000 KEGG_CITRATE_CYCLE_TCA_CYCLE 29 0.401 1.515 0.000 KEGG_JAK_STAT_SIGNALING_PATHWAY 140 0.358 1.456 0.059 KEGG_MISMATCH_REPAIR 22 0.445 1.445 0.060 KEGG_HOMOLOGOUS_RECOMBINATION 25 0.458 1.416 0.053 KEGG_CYTOSOLIC_DNA_SENSING_PATHWAY 47 0.464 1.394 0.086 [66]Open in a new tab IVD: Intervertebral disc; ES: Enrichment Score; NES: Normalized Enrichment Score. Figure 3. [67]Figure 3 [68]Open in a new tab Six significant enrichment plots of functional enrichment analysis of DEGs between hyper and iso samples in [69]GSE1648 using GSEA. (A) Enrichment plot: Gene Ontology_REGULATION_OF_EXTRACELLULAR_MATRIX_ORGANIZATION. Profile of the Running ES Score & Positions of GeneSet Members on the Rank Ordered List. (B) Enrichment plot: Gene Ontology_ATP_GENERATION_FROM_ADP. Profile of the Running ES Score & Positions of GeneSet Members on the Rank Ordered List. (C) Enrichment plot: Gene Ontology_RESPONSE_TO_TOPOLOGICALLY_INCORRECT_PROTEIN. Profile of the Running ES Score & Positions of GeneSet Members on the Rank Ordered List. (D) Enrichment plot: Gene Ontology_LYSOSOME_LOCALIZATION. Profile of the Running ES Score & Positions of GeneSet Members on the Rank Ordered List. (E) Enrichment plot: Gene Ontology_MAST_CELL_MEDIATED_IMMUNITY. Profile of the Running ES Score & Positions of GeneSet Members on the Rank Ordered List. (F) Enrichment plot: Gene Ontology_MAST_CELL_ACTIVATION. Profile of the Running ES Score & Positions of GeneSet Members on the Rank Ordered List. Table 1. Functional enrichment analysis of DEGs in IVD cells using GSEA. Gene Set Name SIZE ES NES P-value Upregulated REGULATION_OF_EXTRACELLULAR_MATRIX_ORGANIZATION 25 0.647 2.294 0.000 ATP_GENERATION_FROM_ADP 35 0.571 2.175 0.000 RESPONSE_TO_TOPOLOGICALLY_INCORRECT_PROTEIN 134 0.425 2.127 0.000 RIBONUCLEOSIDE_DIPHOSPHATE_METABOLIC_PROCESS 54 0.495 2.121 0.000 ADP_METABOLIC_PROCESS 41 0.519 2.098 0.000 GLUCOSE_CATABOLIC_PROCESS 28 0.574 2.060 0.000 Downregulated LYSOSOME_LOCALIZATION 18 0.676 1.855 0.000 MAST_CELL_MEDIATED_IMMUNITY 16 0.693 1.842 0.004 MAST_CELL_ACTIVATION 19 0.662 1.810 0.003 NEGATIVE_REGULATION_OF_LIPID_CATABOLIC_PROCESS 17 0.677 1.802 0.001 RNA_PHOSPHODIESTER_BOND_HYDROLYSIS_ENDONUCLEOLYTIC 41 0.559 1.782 0.001 NEUROPEPTIDE_RECEPTOR_ACTIVITY 30 0.584 1.773 0.000 [70]Open in a new tab IVD: Intervertebral disc; ES: Enrichment Score; NES: Normalized Enrichment Score. Figure 4. [71]Figure 4 [72]Open in a new tab Six significant enrichment plots of pathway enrichment analysis of DEGs between hyper and iso samples in [73]GSE1648 using GSEA. (A) Enrichment plot: KEGG_VIBRIO_CHOLERAE_INFECTION. Profile of the Running ES Score & Positions of GeneSet Members on the Rank Ordered List. (B) Enrichment plot: KEGG_DORSO_VENTRAL_AXIS_FORMATION. Profile of the Running ES Score & Positions of GeneSet Members on the Rank Ordered List. (C) Enrichment plot: KEGG_EPITHELIAL_CELL_SIGNALING_IN_HELICOBACTER_PYLORI_INFECTION. Profile of the Running ES Score & Positions of GeneSet Members on the Rank Ordered List. (D) Enrichment plot: KEGG_REGULATION_OF_AUTOPHAGY. Profile of the Running ES Score & Positions of GeneSet Members on the Rank Ordered List. (E) Enrichment plot: KEGG_CITRATE_CYCLE_TCA_CYCLE. Profile of the Running ES Score & Positions of GeneSet Members on the Rank Ordered List. (F) Enrichment plot: KEGG_JAK_STAT_SIGNALING_PATHWAY. Profile of the Running ES Score & Positions of GeneSet Members on the Rank Ordered List. PPI construction and module analysis Through Metascape analysis, a protein-protein interaction network of the DEGs was constructed (Fig. [74]5A). Three MCODE modules were identified from the PPI network. First, MCODE1 consisted of 10 genes, including GALR2, CCR8, NPY1R, CXCL13, OPRL1, CCL23, CCL5, DRD3, SST, and CHRM4 (Fig. [75]5B). Second, MCODE2 consisted of 5 genes, including PSMB2, ACTA2, SLC25A4, PPP2R1B, and HMGCS2 (Fig. [76]5C). Third, MCODE3 consisted of 4 genes, including GRM1, NTS, NMBR, and MLNR (Fig. [77]5D). Figure 5. [78]Figure 5 [79]Open in a new tab PPI network and significant module constructed by Metascape. (A) Through Metascape analysis, a protein-protein interaction network of the DEGs was constructed. (B) MCODE1 consists of 10 genes, including GALR2, CCR8, NPY1R, CXCL13, OPRL1, CCL23, CCL5, DRD3, SST, and CHRM4. (C) MCODE2 consists of 5 genes, including PSMB2, ACTA2, SLC25A4, PPP2R1B, and HMGCS2. (D) MCODE3 consists of 4 genes, including GRM1, NTS, NMBR, and MLNR. Furthermore, construction of the PPI network was also performed by the STRING tool, and there were 539 edges and 225 nodes in the PPI network (PPI enrichment p-value < 0.05) (Fig. [80]6A). Six MCODE modules were also identified from the PPI network by the Molecular Complex Detection tool (MCODE) (version 1.5.1), one plug-in of Cytoscape (Fig. [81]6B–G). Degrees >10 were considered the criterion of judgement. A total of 10 genes were identified as hub genes with Cytoscape: OPRL1, CCL5, IL10, IGF1, CCL4, SST, GRM1, PVALB, CXCL13, and NTS (Fig. [82]6H). Figure 6. [83]Figure 6 [84]Open in a new tab PPI network, significant module, and hub gene network constructed by STRING and Cytoscape. (A) There were 539 edges and 225 nodes in the PPI network. (B) The first module consists of 45 edges and 10 nodes. (C) The second module consists of 21 edges and 7 nodes. (D) The third module consists of 13 edges and 6 nodes. (E) The fourth module consists of 10 edges and 5 nodes. (F) The fifth module consists of 6 edges and 4 nodes. (G) The sixth module consists of 20 edges and 12 nodes. (H) A total of 10 genes were identified as hub genes with Cytoscape: OPRL1, CCL5, IL10, IGF1, CCL4, SST, GRM1, PVALB, CXCL13, and NTS. Identification and analysis of significant genes The VENN diagram showed that there were four significant common genes among “Metascape_MCODE”, “Cytoscape_MCODE”, and “Cytoscape_cytoHubba”, including CCL5, OPRL1, SST, and CXCL13 (Fig. [85]7A). Summaries of the functions of the four significant genes are shown in Table [86]3. Hierarchical clustering showed that the significant genes could largely differentiate the NP cells cultured in hyperosmotic media from the NP cells cultured in iso-osmotic media. Compared with NP cells cultured in hyperosmotic media, the expression of CXCL13 was downregulated, and the expression of OPRL1, CCL5, and SST was upregulated, in NP cells cultured in iso-osmotic media (Fig. [87]7B). The Pearson correlation analysis showed that there was a positive correlation between CCL5 and OPRL1 (Fig. [88]7C). Identification of significant genes associated with pain and intervertebral disc degeneration was performed on the comparative toxicogenomics database, which is shown in Fig. [89]7D–G. Figure 7. [90]Figure 7 [91]Open in a new tab Identification and analysis of significant genes. (A) The VENN diagram shows that there were four significant common genes among “Metascape_MCODE”, “Cytoscape_MCODE”, and “Cytoscape_cytoHubba”, including CCL5, OPRL1, SST, and CXCL13. (B) Hierarchical clustering showed that the significant genes could basically differentiate the IVD cells cultured in hyperosmotic media from the IVD cells cultured in iso-osmotic media. (C) The Pearson correlation analysis showed positive correlations among CCL5, OPRL1, and SST. However, the expression of CXCL13 was negatively related to the expression of CCL5, OPRL1, and SST. (D) Relationship to pain and intervertebral disc degeneration related to CCL5 based on the CTD database. (E) Relationship to pain and intervertebral disc degeneration related to OPRL1 based on the CTD database. (F) Relationship to pain and intervertebral disc degeneration related to SST based on the CTD database. (G) Relationship to pain and intervertebral disc degeneration related to CXCL13 based on the CTD database. Table 3. Summaries for the function of 4 significant genes. No. Gene symbol Full name Function 1 CCL5 C-C Motif Chemokine Ligand 5 Chemoattractant for blood monocytes, memory T-helper cells and eosinophils. Causes the release of histamine from basophils and activates eosinophils. 2 OPRL1 Opioid Related Nociceptin Receptor 1 G-protein coupled opioid receptor that functions as receptor for the endogenous neuropeptide nociceptin. 3 SST Somatostatin Somatostatin (somatotropin release inhibiting factor, SRIF) is an endogenous cyclic polypeptide with two biologically active forms. It is an abundant neuropeptide and has a wide range of physiological effects on neurotransmission, secretion and cell proliferation. 4 CXCL13 C-X-C Motif Chemokine Ligand 13 Chemotactic for B-lymphocytes but not for T-lymphocytes, monocytes and neutrophils. Does not induce calcium release in B-lymphocytes. [92]Open in a new tab Results of quantitative real-time PCR analysis The relative expression levels of CCL5 (Fig. [93]8A), OPRL1 (Fig. [94]8B), and SST (Fig. [95]8C) were significantly higher in the NP cells cultured in iso-osmotic media than in the NP cells cultured in hyper-osmotic media. However, the relative expression levels of CXCL13 were reversed (Fig. [96]8D). Figure 8. [97]Figure 8 [98]Open in a new tab Expression level of significant genes through quantitative real-time PCR analysis and strong associations between NRS, osmotic pressure, CCL5, and OPRL1. (A) The relative expression level of CCL5. (B) The relative expression level of OPRL1. (C) The relative expression level of SST. (D) The relative expression level of CXCL13. (E) The NRS was negatively associated with osmotic pressure. (F) There is a positive association between NRS and the relative expression of CCL5. (G) The NRS was positively associated with the relative expression of OPRL1. (H) There was also a positive association between the relative expression of OPRL1 and the relative expression of CCL5. (I) Osmotic pressure was negatively associated with the relative expression of CCL5. (J) Osmotic pressure was negatively associated with the relative expression of OPRL1. (K) The heatmap shows the strong associations between the NRS, osmotic pressure, CCL5, OPRL1, SST, and CXCL13 through Spearman correlation analysis. (L) The receiver operator characteristic curve indicates that the expression level of CCL5 could predict NRS sensitively and specifically. (M) The receiver operator characteristic curve indicates that the expression level of OPRL1 could also predict NRS sensitively and specifically. miRNA of the four significant genes prediction The miRNAs that regulate the four significant genes were screened out with TargetScan (Table [99]4). Table 4. Summary of miRNAs that regulate hub genes. Gene Predicted MiR 1 CCL5 hsa-miR-5000-5p hsa-miR-148a-5p hsa-miR-5010-3p 2 OPRL1 hsa-miR-31-5p 3 SST hsa-miR-101-3p.2 4 CXCL13 hsa-miR-2116-5p hsa-miR-4677-5p hsa-miR-22-5p [100]Open in a new tab Strong associations between the NRS, osmotic pressure, CCL5, and OPRL1 The NRS was negatively associated with osmotic pressure (Pearson Rho = −0.612, P < 0.001) (Fig. [101]8E). The NRS was positively associated with the relative expression of CCL5 (Pearson Rho = 0.951, P < 0.001) (Fig. [102]8F) and OPRL1 (Pearson Rho = 0.946, P < 0.001) (Fig. [103]8G). There was also a positive association between OPRL1 and CCL5 (Pearson Rho = 0.946, P < 0.001) (Fig. [104]8H). Osmotic pressure was negatively associated with CCL5 (Pearson Rho = −0.633, P < 0.001) (Fig. [105]8I) and OPRL1 (Pearson Rho = −0.581, P < 0.001) (Fig. [106]8J). The heatmap showed strong associations between the NRS, osmotic pressure, CCL5, OPRL1, SST, and CXCL13. There was a positive correlation between CCL5 and OPRL1 (Fig. [107]8K). Further associations between NRS and relevant gene expression based on univariate and multiple linear regression Univariate linear regression showed that the NRS was significantly correlated with CCL5 (β = 0.951, p < 0.001), OPRL1 (β = 0.946, p < 0.001), SST (β = 0.902, p < 0.001), and CXCL13 (β = −0.958, p < 0.001). Furthermore, the NRS remained related to CCL5 (β = 0.273, p = 0.024), OPRL1 (β = 0.224, p = 0.036), and CXCL13 (β = −0.389, p = 0.007) in the multivariate linear regression model (Table [108]5). Table 5. Linear regression analysis between NRS and relevant gene expression. Gene symbol NRS Univariate linear regression Multiple linear regression β^a P-value β^b P-value CCL5 0.951 <0.001* 0.273 0.024* OPRL1 0.946 <0.001* 0.224 0.036* SST 0.902 <0.001* 0.094 0.167 CXCL13 −0.958 <0.001* −0.389 0.007* [109]Open in a new tab ^aUnivariate linear regression; ^bMultiple linear regression analysis; β: parameter estimate; NRS: numeric rating scale. *Significant variables: P<0.05. Univariate logistic regression for the proportional hazard analysis of relevant gene expression for NRS Table [110]6 presents the univariate odds ratio (OR) and 95% confidence intervals (95% CI) for the NRS of significant genes. The OR for NRS was 296.400 (95% CI, 36.672–2395.665, p < 0.001) in the group with high expression of CCL5 compared with the low-expression group. For the NRS, subjects with high expression of OPRL1 had a higher OR of 153.700 (95% CI, 39.193–602.760, p < 0.001) than subjects with low expression. However, there was no significant detrimental impact of SST and CXCL13 on the NRS (Table [111]6). Table 6. Correlative parameters’ effect on NRS based on univariate logistic proportional regression analysis. Parameters NRS OR 95% CI P CCL5 Low 1 <0.001* High 296.400 36.672–2395.665 OPRL1 Low 1 <0.001* High 153.700 39.193–602.760 SST Low 1 0.997 High 9.1×10^9 0.000–0.000 CXCL13 Low 1 0.996 High 0.000 0.000–0.000 [112]Open in a new tab OR, odds ratio; 95% CI, 95% confidence interval; NRS: numeric rating scale. *P < 0.05. Independent risk factors for NRS based on multivariate logistic regression The result of multivariate logistic proportional regression analysis showed that higher expression of CCL5 in individuals was associated with significantly greater risk, and the OR of high CCL5 was 34.667 (95% CI, 3.311–362.993; p = 0.003). Additionally, the OR of high OPRL1 was 19.875 (95% CI, 3.922–100.711; p < 0.001) (Table [113]7). Table 7. Correlative genes’ effect on NRS based on multiple logistic proportional regression analysis. Genes NRS OR 95% CI P CCL5 34.667 3.311–362.993 0.003* OPRL1 19.875 3.922–100.711 <0.001* [114]Open in a new tab NRS: numeric rating scale; OR, odds ratio; 95% CI, 95% confidence interval. *P < 0.05. Expression of CCL5 and OPRL1 can sensitively and specifically predict NRS through the receiver operating characteristic curve The receiver operator characteristic curve indicated that the expression level of CCL5 could predict NRS sensitively and specifically (area under the curve for NRS, 0.985; p < 0.001) (Fig. [115]8L), and the expression level of OPRL1 could also predict NRS sensitively and specifically (area under the curve for NRS, 0.985; p < 0.001) (Fig. [116]8M) (Table [117]8). Table 8. Receiver operator characteristic curve analysis of relative gene expression for NRS. Gene symbol NRS AUC P-value 95% CI ODT CCL5 0.985 <0.001* 0.970–1.000 8.035 OPRL1 0.985 <0.001* 0.970–1.000 3.840 [118]Open in a new tab AUC: area under curve; max the maximum of AUC; *Significant variables; ODT: Optimal diagnostic threshold; NRS: numeric rating scale. Neural network prediction model and high-risk warning range of NRS After training, the neural network prediction model reached the best effect, in which the mean squared error was 0.0076566 at epoch 2000 (Fig. [119]9A), and the relativity was 0.98987 (Fig. [120]9B). By verifying the predicted value of the data against the actual value, we found that there are only small differences (Fig. [121]9C,D). Figure 9. [122]Figure 9 [123]Open in a new tab Neural network prediction model and high-risk warning range of NRS. (A) The neural network prediction model reached the best effect, in which the mean squared error was 0.0076566 at epoch 2000. (B) The final training model of the neural network prediction model, and the relativity is 0.98987. (C) Verify the predicted value of the data against the actual value. (D) Verify the data error percentage curve. (E) The high-risk warning range of NRS at the level of the planform. (F) The high-risk warning range of NRS at the level of the three-dimensional stereogram. Through the cubic spline interpolation algorithm, we found the high-risk warning indicator of NRS: CCL5 <7 or> 8.5, and 3.5 8.5 and 3.5 1 or log (fold change) <−1 as the cut-off criteria. The volcano plot was drawn by R language. Functional annotation for DEGs with database for annotation, visualization and integrated discovery (DAVID) The DAVID ([199]https://david.ncifcrf.gov/home.jsp) (version 6.8)^[200]71 was one online analysis tool suite with the functional annotation for Gene Ontology^[201]72 and Kyoto Encyclopedia of Genes and Genomes (KEGG)^[202]73. To perform the Gene Ontology and KEGG analysis of DEGs, the DAVID online tool was implemented. First, we clicked on the “Functional Annotation” on the website ([203]https://david.ncifcrf.gov/home.jsp). Second, we entered the gene list, selected identifiers as official gene symbols, selected list types as gene lists, and submitted lists. Third, we selected to limit annotations and background by Homo species. Finally, the enrichment results of Gene Ontology and KEGG are presented. P-values < 0.05 indicated significance. Pathway and process enrichment analysis with metascape analysis Furthermore, pathway and process enrichment analyses were performed by Metascape ([204]http://metascape.org/gp/index.html#/main/step1)^[205]74. For given DEGs identified between the isotonic NP cells and hypertonic NP cells by the above analysis (see the “Identification of DEGs” section), function and pathway enrichment analysis was carried out with the following ontology sources: Gene Ontology and KEGG Pathway. Terms with a P < 0.01, a minimum count of 3, and an enrichment factor >1.5 were collected and grouped into clusters. Enrichment analysis of gene ontology and KEGG by gene set enrichment analysis (GSEA) The gene sequences were obtained from isotonic and hypertonic NP cells. GSEA was able to analyse all gene sequences of the samples from different groups and export them into two groups in the form of a gene expression matrix, and all genes were sequenced first and then used to indicate the trend of gene expression level between the two groups^[206]75. In this study, we performed GESA analysis on gene sequences of nucleus pulposus under isotonic and hypertonic NP cells as follows. GESA software analysed and sorted genes according to the algorithm after importing gene annotation files, reference function sets and gene data of nucleus pulposus under isotonic and hypertonic conditions, and then we obtained a gene sequence table. After that, GESA software analysed the positions of all genes and accumulated them to obtain enrichment scores. Construction and analysis of the protein-protein interaction (PPI) network, significant module, and hub gene network First, Metascape ([207]http://metascape.org/gp/index.html#/main/step1)^[208]74 was used to construct the PPI network and screen the significant module. Moreover, the Search Tool for the Retrieval of Interacting Genes (STRING, [209]http://string.embl.de/) was also applied to construct the PPI network, and Cytoscape was used to present the network^[210]73. Cytoscape (version 3.6.1) was a free visualization software^[211]76. The Molecular Complex Detection tool (MCODE) (version 1.5.1)^[212]77 could screen and identify the most significant module in the PPI network. When the degrees were set (degrees >10), the hub genes were excavated. Identification and analysis of significant genes A Venn diagram was delineated to identify significant common genes among “Metascape_MCODE”, “Cytoscape_MCODE”, and “Cytoscape_cytoHubba” by FunRich software ([213]http://www.funrich.org). Summaries for the function of 4 significant genes were obtained via GeneCards ([214]https://www.genecards.org/). The R language was used to perform the clustering analysis of significant genes based on the gene expression level. Pearson’s correlation test was performed to complete the correlation analysis among the hub genes. Identification of significant genes associated with pain and intervertebral disc degeneration The comparative toxicogenomics database ([215]http://ctdbase.org/) is a web-based tool that provides information about interactions between chemicals and gene products and their relationships to diseases^[216]78. The relationships between the significant genes and “pain and intervertebral disc degeneration” were analysed via the tool. Cell isolation and culture Human IVD tissue was obtained from patients (60±5 years old) undergoing discectomy prior to surgery for lumbar disc herniation (n = 50) or lumbar interbody fusion (n = 70). Before the surgery, NRS was obtained from all the patients. The research conformed to the Declaration of Helsinki and was authorized by the Human Ethics and Research Ethics Committees of Beijing Ditan Hospital. Informed consent was obtained from all patients. All tissue was harvested from the central NP region. Sequential pronase–collagenase digestion was used to isolate NP cells. The osmotic pressures between inside and outside of the NP cells were measured by an osmometer [VAPRO 5600 (5520), New York, US]. The NP cells were cultured in RPMI-1640 (GIBCO, Gran Island, NY, USA) containing 1% penicillin-streptomycin and 10% foetal bovine serum (FBS; HyClone, Logan, UT, USA) at 37 °C with 5% CO[2]. Cell culture was performed under normoxia. Cell culture medium was exchanged for either hyperosmotic (450 mOsm/kg H[2]O) or iso-osmotic media (293 mOsm/kg H[2]O) after 24 h^[217]69. The osmolarity was altered by using both sucrose and NaCl^[218]17. Quantitative real-time PCR The relative mRNA expression levels of CCL5, OPRL1, CXCL13, and SST were measured by quantitative real-time PCR. Total RNA extraction, cDNA synthesis, and qPCR were performed. Total mRNA was extracted from NP cells with TRIzol reagent (Invitrogen, Beijing, China) according to the manufacturer’s protocol. One microgram of total RNA was used to generate first strand cDNA using random primers and SuperScript II reverse transcriptase (Invitrogen). Real-time PCR was performed in triplicate using the SYBR PrimeScript RT-PCR Kit (Takara, Dalian). The expression of GAPDH was measured as an internal control. Thermocycler conditions included an initial hold at 50 °C for 2 minutes and then 95 °C for 10 minutes followed by a two-step PCR programme of 95 °C for 15 seconds and 60 °C for 60 seconds repeated for 40 cycles on an Mx4000 system (Applied Biosystems, Foster City, CA), on which data were collected and quantitatively analysed^[219]79. The expression level of mRNA was demonstrated as the fold change relative to the control group^[220]80. The primer sequences used in the qPCR are shown in Table [221]9. Table 9. Primers and their sequences for PCR analysis. Primer Sequence (5′–3′) CCL5-hF GGAGAATCGCTTGAACCC CCL5-hR GGGTTCAAGCGATTCTCC OPRL1-hF GTTCTGTGAGTCCCTGTCTGTG OPRL1-hR CCAGCCTACCTGAGGATGAC SST-hF CCGTATGTTTGAGATTGTG SST-hR GACCGTTCTGGTAAGATAAA CXCL13-hF TAGGGCTAAAGGGTTGTT CXCL13-hR TGATGAGTTGATGGGTGC [222]Open in a new tab Identification of the miRNA-gene pairs of the significant genes TargetScan ([223]www.targetscan.org) is a web-based database that can predict biological targets of miRNAs. In our study, the miRNA-gene pairs of the significant genes were screened out with TargetScan. Statistical analysis The data are expressed as the percentage of total and mean ± SD. When two groups were compared, Student’s t-test was used to determine statistical significance. Double Delta Ct Analysis was used for the PCR statistics. By using Pearson’s correlation test, associations between the NRS, osmotic pressure, and the expression of CCL5 and OPRL1 were analysed. We used linear regression analysis to explore the linear correlations between NRS, osmotic pressure, and the expression of CCL5 and OPRL1. The Spearman-rho test was executed to compare NRS, osmotic pressure, and the expression of CCL5, SST, CXCL13, and OPRL1 for the correlation analysis. Univariate linear regression analysis between NRS and relevant gene expression was performed. When any analytic results reached a liberal statistical threshold of p < 0.2 for each comparison, the risk factors were forced into the multivariable linear regression model to confirm independent risk factors for the NRS. Variance inflation factors were calculated to quantify the severity of multicollinearity. To identify the residual distribution, a histogram and Shapiro-Wilk test were conducted, and we concluded that residuals accurately modelled the normal distribution. Univariate and multivariate logistic regression analysis was used to calculate the odds ratios (ORs) of each variable for NRS. A receiver operating characteristic curve analysis was performed to determine the ability of the expression of CCL5 and OPRL1 to predict the NRS. All statistical analyses were conducted using SPSS software, version 23.0 (IBM Corp., Armonk, NY, USA). A p-value < 0.05 was considered to be significant^[224]81. Construction of neural network modelling and cubic spline interpolation between the expression of CCL5 and OPRL1 and NRS The training group was divided into training data and calibration data randomly according to the proportion of 7:3. There were 77 individuals in the training data and 33 individuals in the calibration data. Furthermore, a total of 10 individuals were used as the validation data. MATLAB (version: 8.3, NY, US) was used to accomplish the normalization processing of variable values, network initialization, network training, and network simulation. The number of input neurons in the input layer was the same as the number of input variables, and the number was two. The hidden layer was designed as 1 layer, and the output layer was also designed for 1 layer. One output variable was NRS. Then, the forecast model was established with a hidden unit number of 6. When training to 2000 steps after repeated training, the falling gradient is 0, and the training speed is uniform. At the same time, the training error was 0.0076566, and the R (relativity) value reached 0.98987. Cubic spline interpolation was performed to predict the high-risk warning range of NRS by using the expression of CCL5 and OPRL1^[225]82. Ethics approval and consent to participate The data of this research were downloaded from the GEO database public website. The research conformed to the Declaration of Helsinki and was authorized by the Human Ethics and Research Ethics Committees of Beijing Ditan Hospital. All institutional and national guidelines for the care and use of participants were followed. Acknowledgements