Abstract Aims While MRI serves as a tool for assessing the severity of lumbar disc herniation (LDH), it has been observed that imaging diagnoses do not always align with clinical symptoms in nearly half of patients. The absence of dependable prognostic biomarkers impedes the early and accurate diagnosis of LDH, which is critical for the development of further treatment approaches. Thus, the aim of this study was to elucidate the molecular mechanisms that determine pain and LDH severity. Methods We conducted a pilot study with 55 patients, employing transcriptomic and metabolomic analyses on blood samples to identify potential biomarkers. A gene-metabolite interaction approach helped in identifying the pivotal pathway linked to disease severity. Moreover, a machine-learning model was designed to differentiate between patients based on the intensity of pain. Results Cholinergic-related glycerophospholipid metabolism emerged as the predominant enriched pathway in the severe symptom group via gene-metabolite interaction network analysis. Among various models, the gradient boosting machines (GBM) model stood out, achieving a commendable area under the curve (AUC) of 0.875 in distinguishing between the severe and mild symptom groups using combined RNA and metabolomics data. Conclusion Integrated molecular profiling of blood biomarkers has highlighted a novel determining pathway for LDH severity. This machine-learning approach can serve as a valuable predictive tool when MRI findings are inconclusive. Future research will focus on validating these biomarkers and exploring their potential for personalized medicine approaches. Cite this article: Bone Joint Res 2025;14(5):434–447. Keywords: Lumbar disc herniation diagnosis, Integrated molecular profiling, Glycerophospholipid metabolism, Lumbar disc herniation, MRI, biomarkers, RNA, blood, blood biomarkers, Western blot, t-test, gene expressions, straight leg raising test Article focus * MRI as a tool for assessing the severity of lumbar disc herniation (LDH) has limitations, with nearly half of patients having imaging diagnoses that are inconsistent with clinical symptoms. The lack of reliable prognostic biomarkers hinders the early and accurate diagnosis of LDH, affecting the development of treatment methods. * Our pilot study explores the clinical value of blood biomarkers in assessing the severity of LDH using bioinformatics methods. Key messages * This study conducted preliminary research on 55 LDH patients and performed transcriptomic and metabolomic analyses on blood samples. * The comprehensive molecular profile of blood biomarkers provides new data support for the assessment of LDH severity. Strengths and limitations * Combining transcriptomics and metabolomics analyses provides a comprehensive perspective on important biomarkers of LDH severity. * The research results have the potential to improve the early diagnosis and personalized treatment strategies of LDH, filling the gap between MRI diagnosis and clinical symptoms. * The study only included 55 patients and had a relatively small sample size, which limited the generalizability and extrapolation of the results. * Although biomarkers have been confirmed, there is insufficient in-depth exploration of the relevant mechanisms, and their role in disease development has not yet been fully elucidated. Introduction Lumbar disc herniation (LDH) is a prevalent orthopaedic condition, affecting approximately 2% to 3% of the global population, and the lifetime risk for symptomatic LDH is 1% to 3% with a higher incidence in males.^[42]1 It commonly manifests as a recurring clinical condition, characterized by intervertebral disc (IVD) degeneration, annulus rupture, and inward shifting of the nucleus pulposus due to shear stress. This culminates in the compression of the spinal nerve root.^[43]2 LDH is a primary cause of lower back and leg pain, typically occurring at the L4-5 or L5-S1 intervertebral levels, and is frequently linked to behaviours such as prolonged sitting or bending.^[44]3 Early intervention and treatment are essential, as untreated LDH can lead to abnormal gait, which may disrupt the mechanical equilibrium of the back and lower limb muscles. This disruption can greatly affect patients’ daily activities and work performance, and can even lead to disability.^[45]4 Prior research sought to associate MRI findings with discography outcomes related to pain responses in patients suffering from low back pain.^[46]5,[47]6 While earlier studies did demonstrate notable correlations between pain and various MRI and discographic markers, one investigation revealed that half of the patients displayed a discordance between imaging diagnoses and clinical symptoms.^[48]5 This suggests that lumbar IVDs can be classified into two distinct categories: those displaying notable imaging anomalies but presenting with mild symptoms, and those with minor imaging discrepancies but experiencing intense symptoms. These findings underscore the importance of pinpointing specific biomarkers to distinguish or forecast these patient categories; doing so could substantially aid in tailoring therapeutic interventions for these individuals.^[49]7 Previous omics-based studies have delved into LDH, encompassing extensive gene expression profiling in the nucleus pulposus of human ruptured LDH^[50]8 as well as a thorough analysis of molecular pathways and key genes implicated in LDH.^[51]9 Pro-inflammatory chemokines CCL5 and CXCL6 have also been identified to be associated with LDH.^[52]10 However, there has not yet been an integrative analysis combining both metabolomics and transcriptomics concerning LDH. By leveraging these two methodologies, in this study we aimed to discern the pathway-level distinctions between the two variants of LDH. In this research, we employed an integrative approach of transcriptomic and metabolomic analyses to identify distinctive biomarkers and molecular pathways associated with diverse manifestations of LDH. Subsequently, we leveraged a suite of sophisticated machine-learning algorithms to refine the selection of potential biomarkers and construct a robust predictive model. This model is designed to delineate different patient profiles. Our study introduces a novel predictive tool, capable of precisely assessing the severity of LDH and thereby facilitating improved patient prognoses. Methods Participant description In this study, participants were consecutively recruited from a single institution between April and November 2022. They were divided into three groups: Group A (mild MRI findings but severe symptoms) (n = 22); Group B (severe MRI findings but mild symptoms) (n = 26); and Group C (healthy individuals) (n = 7). The study protocol was approved by the Ethics Committee of the Wuxi TCM Hospital Affiliated to Nanjing University of Chinese Medicine (No. SSB2020122502), and all participants provided written informed consent. We employed the Michigan State University (MSU) classification system to assess the severity of LDH, stratifying the extent of disc protrusions into Grades 1, 2, and 3. Utilizing the intra-facet line as a reference, we classified the protrusions as follows: Grade 1, where the lumbar disc protrusion is less than or equal to 50% of the distance from the disc’s posterior edge to the reference line; Grade 2, where the lumbar disc protrusion exceeds 50% of the distance from the disc’s posterior edge to the reference line; and Grade 3, where the disc protrusion surpasses the reference line.^[53]11-[54]13 Pain intensity was evaluated utilizing the visual analogue scale (VAS), a continuous and intuitive measurement tool, with the scoring criteria defined as: 0 points indicate the absence of pain; scores of 1 to 3 correspond to mild pain; scores ranging from 4 to 6 denote moderate pain; 7 to 9 points reflect severe pain; and a score of 10 points signifies unbearable, intense pain.^[55]14-[56]16 The inclusion criteria for Group A were: 1) meeting the diagnostic criteria for LDH; 2) lumbar spine MRI indicating disc herniation with a MSU grade greater than 1; 3) subjective severe pain experienced by the patient, intolerable, with a VAS score of fewer than 5 points; 4) clinical manifestations encompassing the following: lower limb radiating pain, with painful sites including but not limited to the anterior thigh, posterior thigh, and areas corresponding to the distribution of the affected nerve; abnormal sensations in the lower limbs, with diminished light touch sensation in the skin areas innervated by the corresponding affected nerve; and positive straight leg raising test, straight leg raising test with augmentation, contralateral straight leg raising test, or femoral nerve stretch test; 5) persistent for more than three months. The inclusion criteria for Group B were: 1) meeting the diagnostic criteria for LDH; 2) MRI indicating disc herniation with an MSU grade less than 1; 3) subjective severe pain experienced by the patient, intolerable, with a VAS score exceeding 5 points; 4) clinical manifestations encompassing the following: lower limb radiating pain, with painful sites including but not limited to the anterior thigh, posterior thigh, and areas corresponding to the distribution of the affected nerve; abnormal sensations in the lower limbs, with diminished light touch sensation in the skin areas innervated by the corresponding affected nerve; and positive straight leg raising test, straight leg raising test with augmentation, contralateral straight leg raising test, or femoral nerve stretch test; 5) persistent for more than three months. The inclusion criteria for Group C were: 1) history of LDH; 2) lumbar spine MRI showing no substantial abnormalities; 3) without symptoms of nerve root compression corresponding to LDH, such as accompanying low back pain for one week or no considerable low back pain, with lower limb numbness and soreness as the main symptoms and sensory abnormalities in one or more regions of the lumbosacral nerve root distribution, local paralysis, limited lumbar mobility, and exacerbation of lower limb symptoms during lumbar effort, coughing, and sneezing.^[57]17 Peripheral blood mononuclear cells and serum extraction Blood samples were obtained within the initial week following hospitalization. To discern the critical pathways that determine LDH severity, especially in cases undetectable through MRI, we conducted transcriptomics and metabolomics analyses. Transcriptomics samples: 2 ml of blood from patients meeting the inclusion criteria was collected using medical vacuum blood collection tubes (ethylenediaminetetraacetic acid (EDTA) anticoagulant tubes). Subsequently, peripheral blood mononuclear cells (PBMCs) were isolated following the relevant protocol.^[58]18 The cells were then mixed thoroughly with 1 ml of TRIzol reagent (Thermo Fisher Scientific, USA), left to stand at room temperature for ten minutes, and subsequently stored at -80°C for future testing. Metabolomics samples: 1) whole blood samples were collected in medical vacuum blood collection tubes (coagulant gel tubes), gently inverted to mix, and then immediately placed upright in an icebox to stand for 30 to 60 minutes; 2) after natural sedimentation and layering, the upper layer of serum was extracted and transferred to a 1.5 ml screw-cap conical centrifuge tube and spun at 13,000 rpm for two minutes; 3) the supernatant was then transferred into a new 1.5 ml screw-cap conical centrifuge tube, with 200 µl of serum aliquoted into each tube, discarding the cell pellet; 4) the aliquoted serum samples were stored long-term at -80°C for future analysis. RNA extraction Total RNA was extracted from the PBMCs of LDH patients using TRIzol reagent, following the manufacturer’s instructions. The purity of the RNA samples was assessed based on the A260/A280 absorbance ratio using a Nanodrop ND-2000 system (Thermo Fisher Scientific). Additionally, the RNA integrity number (RIN) was determined with an Agilent Bioanalyzer 4150 system (Agilent Technologies, USA). Only those samples meeting the quality criteria were selected for subsequent library construction. Library preparation and mRNA sequencing experiments Paired-end libraries were generated using the ABclonal mRNA-seq Lib Prep Kit (ABclonal, China), as per the manufacturer’s guidelines. From 1 μg of total RNA, messenger RNA (mRNA) was isolated using oligo (dT) magnetic beads. These mRNA samples were then fragmented using divalent cations under elevated temperatures. First-strand complementary DNA (cDNA) was produced from these fragments using random hexamer primers and reverse transcriptase (RNase H), and this was followed by the synthesis of second-strand cDNA involving DNA polymerase I and other reagents. Double-stranded cDNA fragments underwent adapter ligation to prepare the library. After PCR amplification of the adaptor-ligated cDNA, the products were purified using the AMPure XP system (Thermo Fisher Scientific). The quality of the library was verified using the Agilent Bioanalyzer 4150 system. Sequencing was conducted on an Illumina NovaSeq 6000 or MGISEQ-T7 (Illumina, USA), yielding 150 bp paired-end reads. Transcriptomic analysis Raw reads in FASTQ format were initially processed using in-house Perl scripts. This step involved removing adapter sequences and filtering out reads with low quality (where lines with a quality score of 25 or lower constitute more than 60% of the entire read) or an ‘N’ ratio exceeding 5%. ‘N’ denotes bases with undetermined information. This processing yielded clean reads suitable for subsequent analyses. These clean reads were then mapped to a reference genome using the kallisto software.^[59]19 Differential expression was analyzed with DESeq2.^[60]20,[61]21 Genes with an absolute log2 fold change greater than 1 and an adjusted p-value below 0.05 were classified as significantly differentially expressed. The differentially regulated genes were then subjected to hierarchical clustering, an unsupervised machine-learning technique that groups samples based on their similarity in gene expression patterns.^[62]22 To understand the functional implications of these differentially expressed genes and discern differences at the gene functional level between samples, Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses were conducted. The clusterProfiler R package was employed for both GO function and KEGG pathway enrichment analyses.^[63]23 A p-value below 0.05 indicated significant enrichment in either GO or KEGG functions. To explore the influence of LDH severity on immune cell composition, we used CibersortX to deconvolute the transcriptome data.^[64]24,[65]25 Liquid chromatography–mass spectrometry experiments For polar metabolite untargeted metabolomics, we used a Sciex TripleTOF 6600 quadrupole time-of-flight mass spectrometer (Sciex, Canada) combined with hydrophilic interaction chromatography and electrospray ionization. The analysis was performed at Shanghai Applied Protein Technology (China). Liquid chromatography (LC) separation was conducted on an ACQUIY UPLC BEH Amide column with a specified solvent gradient (Waters Corporation, Ireland). The flow rate was set at 0.4 ml/minute, the column temperature at 25°C, auto sampler temperature at 5°C, and the injection volume at 2 μl. The mass spectrometer operated in both negative and positive ion modes with specific Electrospray Ionization (ESI) source conditions. Mass spectrometry (MS) acquisition ranged from 60 m/z to 1,000Da, while auto MS/MS acquisition was from 25 m/z to 1,000Da, using information dependent acquisition (IDA) in high sensitivity mode. Specific parameters, including collision energy and declustering potential, were set accordingly. LC-MS analysis Raw MS data, in the form of wiff.scan files, were transitioned to MzXML format using ProteoWizard MSConvert (ProteoWizard, USA) before being introduced to the XCMS software (Scripps Research, USA). Peak identification used parameters such as centWave m/z at 25 ppm, peakwidth between 10 and 60, and prefilter from 10 to 100. For grouping peaks, settings included bw at 5, mzwid at 0.025, and minfrac at 0.5. Only ion features with over 50% non-zero measurements in at least one group were retained. The in-house database, composed of authentic standards, facilitated compound identification. After normalization based on peak intensity, the data were imported into SIMCA-P (version 14.1; Umetrics, Sweden) for advanced analysis, which encompassed both Pareto-scaled principal component analysis (PCA) and orthogonal partial least squares discriminant analysis (OPLS-DA). Model robustness was gauged through seven-fold cross-validation and response permutation testing. Each variable’s contribution to classification in the OPLS-DA model was denoted by its variable importance in projection value (VIP). The standard for significance was an independent-samples t-test, with VIP values exceeding 1 and p-values less than 0.05 deemed statistically significant. Bioinformatics analysis: for KEGG pathway annotation, metabolites were matched with the KEGG database to determine their pathways. We then conducted an enrichment analysis of these pathways using Fisher’s exact test, with a significance threshold of p < 0.05. For hierarchical clustering, we used Cluster 3.0 and Java Treeview software (YuLab, Southern Medical University (SMU), China). We selected the Euclidean distance for similarity and average linkage clustering for grouping. Alongside the dendrogram, a heatmap was provided for visual clarity. Gene-metabolite interaction network analysis The MetaboAnalyst5.0^[66]26 offers a robust platform for exploring and illustrating the gene-metabolite interaction network between functionally associated genes and metabolites.^[67]27,[68]28 Using this platform, we sought to identify the joint-pathways most influenced by LDH severity. We leveraged high-confidence interactions from the STITCH database – a comprehensive resource that catalogues interactions between metabolites and target genes – to extract associations between chemicals and human genes.^[69]29 Western blot PBMCs were extracted and protease and phosphatase inhibitors were added in radioimmunoprecipitation assay buffer (RIPA buffer; Solarbio, China), and the samples were denatured at 100°C for 15 minutes. Subsequent to the separation on a 10% sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE), proteins were transferred to polyvinylidene fluoride (PVDF) membranes. These membranes were then blocked using a 5% skim milk powder solution for one hour. Primary antibodies, including anti-lysophospholipid acyltransferase 3 (LPCAT3) antibody (1:500, 67882-1-Ig; Proteintech, USA), anti-phosphate cytidylyltransferase 1, choline, alpha (PCYT1A) antibody (1:500, NBP1-84366; NOVUS, USA), and anti-glyceraldehyde-3-phosphate dehydrogenase (GAPDH) antibody (1:1,000, 10494-1-AP; Proteintech), were incubated overnight. Secondary antibodies (1:2,000, SA00001-2 and SA00001-1; Proteintech) were then applied for two hours at room temperature, followed by detection using an Enhanced Chemiluminescence (ECL) kit (Millipore, USA). Finally, the protein bands were visualized with the Chemidoc detection system (Bio-Rad, USA). Quantitative real-time polymerase chain reaction Total RNA was isolated from PBMCs using TRIzol reagent. cDNA synthesis was performed with 500 ng of extracted RNA using the HiScript II SuperMix for quantitative real-time polymerase chain reaction (qRT-PCR) (Vazyme, China). The qRT-PCR reactions were conducted using SYBR Green Master Mix on the ABI 7500 System (Thermo Fisher Scientific). The amplification protocol consisted of an initial denaturation at 94°C for ten minutes, followed by 45 cycles of denaturation at 94°C for ten seconds, and annealing/extension at 60°C for 45 seconds. GAPDH served as the housekeeping gene for normalization. The primer sequences of the target genes are shown in [70]Table I. Table I. Primer sequences of the target genes. Gene Forward primer sequence (5’-3’) Reverse primer sequence (5’-3’) LPCAT3 CAGGATACCTGGTCTGCTTCCA TGAAGAGCCAGTGGATGGTCTG PCYT1A GTTCCTTCCAAAGTGCAGCGCT AGGAGTTCCTCTGCTGGCTTCT GAPDH AATGGGCAGCCGTTAGGAAA GCCCAATACGACCAAATCAGAG [71]Open in a new tab GAPDH, glyceraldehyde-3-phosphate dehydrogenase; LPCAT3, lysophospholipid acyltransferase 3. Enzyme-linked immunosorbent assay (ELISA) Levels of LPCAT3 (JL21306; Jianglai Biotechnology, China) and PCYT1A (HZ-11306R; HZB Science, Germany) expression in the serum were quantified using an enzyme-linked immunosorbent assay (ELISA) kit following the manufacturer’s protocol (Jianglai Biotechnology). The absorbance of the samples was measured at 450 nm using a microplate reader (Biotek, USA). The concentration of each sample was determined using a standard curve. Machine learning The efficacy of current methodologies, including subjective patient reports and MRI evaluations, has proven to be inadequate in accurately discerning symptom severity among patients. To address this problem, we developed a machine learning-based model integrating metabolite and gene expression data. We normalized RNA and metabolomics datasets and merged them into a comprehensive dataset featuring 48 cases, with 22 in Group A and 26 in Group B, encompassing a total of 33,488 features. We chose gradient boosting machines (GBM), random forests (RF), deep learning (DL), and support vector machines (SVM) for their suitability for high-dimensional data and classification tasks.^[72]30,[73]31 Each algorithm was rigorously trained and cross-validated using the combined dataset. We fine-tuned the hyperparameters for each model to optimize performance, employing k-fold cross-validation to assess the generalizability of the models. We subsequently analyzed the feature importance scores derived from the models to identify the most influential metabolites and gene expressions. The receiver operating characteristic (ROC) curve was used to evaluate the models’ performance, focusing on the area under the curve (AUC) as a measure of the trade-off between sensitivity and specificity. Statistical analysis In this study, different statistical methods were used to analyze the data and determine significant differences between groups. We used one-way analysis of variance (ANOVA) to evaluate differences between two or more groups. Metabolomics differential foreign body screening was performed using the Mann-Whitney U test combined with the Benjamini-Hochberg method for multiple comparison correction. Independent-samples t-test was performed to calculate p-values for biomarker expression analysis (western blot, qRT-PCR, ELISA). Mann-Whitney U test, Kruskal-Wallis test, and Fisher’s exact test were used for metabolomics and transcriptomics data comparison to ensure the robustness of the data. Fisher’s exact test was applied in KEGG pathway enrichment analysis to assess statistical significance, and machine learning models were used to assess the classification performance by ROC curve analysis and permutation test. Pearson’s correlation coefficient (PCC) or Spearman’s rank correlation coefficient was used to analyze the relationship between biomarkers and pain severity. Statistical significance was set at p < 0.05. Results Transcriptomic analysis of peripheral blood mononuclear cell samples in severe and mild LDH patients, and controls We conducted a comprehensive profiling of 55 paired blood samples through transcriptomic and untargeted metabolomic analyses. The cohort included: 22 samples from Group A (mild MRI findings but severe symptoms); 26 samples from Group B (severe MRI findings but mild symptoms); and seven samples from Group C (healthy individuals). The details of patient characteristics are presented in [74]Table II. Group A exhibits a Grade A lumbar disc protrusion, which is relatively mild, but with more severe symptoms ([75]Figure 1a); Group B has a Grade 2 lumbar disc protrusion, which is more pronounced, but with milder symptoms ([76]Figure 1b); while Group C consists of normal IVDs with no protrusion and no clinical symptoms ([77]Figure 1c). Table II. Clinical characteristics of patients with different severities of lumbar disc herniation (Group A and Group B) and healthy controls (Group C). Characteristic Group A Group B Group C p-value Total, n 22 26 7 Mean age, yrs (SD) 60.00 (12.29) 60.08 (11.73) 62.57 (9.57) 0.870[78]^* Sex, n Male 12 9 3 Female 8 11 4 Mean BMI, kg/m^2 (SD) 24.08 (1.30) 24.76 (1.52) 24.69 (1.62) 0.260[79]^* Types, n Bulging 12 4 Prolapsing 1 6 Protruding 7 10 Anatomical location of LDH, n N/A Central type 8 10 Central lateral type 14 15 Lateral type 0 1 Extreme lateral type (level), n N/A L3/L4 8 8 L4/L5 9 16 L5/S1 5 2 Mean VAS (SD) 7.36 (0.66) 3.85 (0.97) < 0.001 [80]Open in a new tab ^* One-way analysis of variance (ANOVA). ^† Independent-samples t-test. LDH, lumbar disc herniation; N/A, not applicable; VAS, visual analogue scale. Fig. 1. [81]Fig. 1 [82]Open in a new tab Clinical profiling of the patient cohorts. a) Group A, with Grade 1 protrusion, shows relatively mild disc displacement yet presents with more severe symptoms. b) Group B, classified as Grade 2, exhibits a more noticeable protrusion but milder symptoms. c) Group C is characterized by normal intervertebral discs without any protrusion or associated clinical symptoms. Although there is a discernible distinction between Group A and Group B, considerable overlap is evident in the hierarchical clustering analysis ([83]Figure 2a). This implies that relying solely on transcriptomic data from PBMC may not adequately differentiate between Groups A and B. [84]Figures 2b and 2c present the differentially regulated genes in comparisons between mild and severe groups and the control group. Subsequently, immune infiltration analysis showed that in comparison to the severe group (Group A), CD4 memory T cells exhibited a significant increase in the severe group, even though the overall proportion of this cell type remains low ([85]Figure 2d). However, when incorporating the healthy control group into the analysis, no significant disparities in cell types were identified among the groups ([86]Figure 2e). In conclusion, these findings indicate that the overall transcriptome is moderately altered between the severe and mild LDH groups. Fig. 2. [87]Fig. 2 [88]Open in a new tab Transcriptomic analysis of peripheral blood mononuclear cells (PBMCs) from three groups. a) Heatmap clustering for all samples. b) Volcano plots displaying differentially expressed genes between the mild and severe groups. c) Volcano plots illustrating differentially expressed genes between the mild and control groups. d) CibersortX analysis was used to deconvolve the PBMC transcriptome data and compare the severe and mild groups. e) CibersortX analysis was employed to compare all three groups. *p < 0.05, Mann-Whitney U test. NK cell, natural killer cell; ns, non-significant. Metabolomic analysis of serum samples in patients with severe and mild LDH and controls The untargeted metabolomic profiling showed that in contrast to the transcriptomics data, the PCA depicted a clearer distinction among groups A, B, and C ([89]Figure 3a). Hierarchical clustering analysis also revealed that the majority of samples from groups A and B were well-clustered, as illustrated in [90]Figure 3b. A Venn diagram identified 38 metabolites that were significantly different (adjusted p-value (p-adj) < 0.05, Mann-Whitney U test combined with Benjamini-Hochberg method, for multiple comparison correction) in group A compared to group B ([91]Figure 3c). In contrast, only 13 metabolites were significantly different when comparing group B to group C. This suggests that the metabolic differences between the two LDH groups (A and B) are more pronounced than between any LDH group and the healthy control group. Among the differentially regulated metabolites, ‘Linolenic acid’, ‘arginine’, and ‘choline’-related metabolites were the most significant ([92]Figure 3d). Intriguingly, the KEGG enrichment analysis highlighted the ‘cholinergic synapse’ as one of the most enriched pathways ([93]Figure 3e). Collectively, these metabolomic findings suggest that the cholinergic pathway might be pivotal in influencing symptom severity to a certain degree. Fig. 3. [94]Fig. 3 [95]Open in a new tab Metabolomic profiling of serum from three groups. a) Principal component analysis (PCA) across all groups. b) Hierarchical clustering analysis for all groups. c) Venn diagram depicting the overlap of differentially regulated metabolites. d) Bar plot detailing the classes and names of metabolites differentially regulated between the severe and mild groups. e) Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis for the differentially regulated compounds. Glycerophospholipid metabolism is the main enrichment pathway in the severe group The joint-pathway analysis revealed that ‘glycerophospholipid metabolism’ emerged as the predominant enriched pathway, characterized by 5 hits out of a total of 86 elements ([96]Figures 4a and 4b). Another significantly enriched pathway was ‘thiamine metabolism’, albeit with only 2 hits out of 14 elements. [97]Figure 4c showcases the comprehensive gene-metabolite network pertaining to the glycerophospholipid metabolism pathway. A granular network, detailing the variations in gene expression and metabolites, can be found in Supplementary Figure a. Fig. 4. [98]Fig. 4 [99]Open in a new tab Gene-metabolite network analysis for severe versus mild group. a) Bubble plot highlighting the most enriched metabolic networks when comparing the severe group to the mild group. b) Table presenting the significantly enriched metabolic pathways. c) Overview diagram illustrating the glycerophospholipid metabolism pathway. We subsequently aimed to corroborate the involvement of the two pivotal enzymes, LPCAT3 and PCYT1A, which were found to be prominent within the gene-metabolite network exclusively in the severe group. We assessed and juxtaposed the expression levels of LPCAT3 and PCYT1A in the PBMCs of both patient cohorts. Western blot analysis revealed that the expression levels of LPCAT3 and PCYT1A were augmented with increasing disease severity ([100]Figures 5a to 5c). These findings were echoed by qRT-PCR outcomes, which aligned with the western blot data, thereby corroborating the mRNA sequencing (mRNA-seq) results ([101]Figures 5d and 5e). Furthermore, ELISA assays indicated that serum concentrations of LPCAT3 and PCYT1A were elevated in patients with severe symptoms compared to those in the mild group ([102]Figures 5f and 5g). Collectively, these findings lend additional support to the gene-metabolite network observations, suggesting that the upregulation of these key enzymes also occurs at the protein level. Fig. 5. Fig. 5 [103]Open in a new tab The protein and messenger RNA (mRNA) expression level of lysophospholipid acyltransferase 3 (LPCAT3) and PCYT1A in the severe and mild groups. a) to c) Representative western blot strips of LPCAT3 and PCYT1A in the severe and mild groups with quantitative analysis (n = 3). d) and e) Quantitative real-time polymerase chain reaction (qRT-PCR) results of LPCAT3 and PCYT1A in the lumbar disc herniation (LDH) patients with mild and severe symptoms with relative quantitative analysis (n = 3). f) and g) Enzyme-linked immunosorbent assay (ELISA) results of LPCAT3 and PCYT1A with quantitative analysis (n = 3). Independent-samples t-test was used for comparison: *p < 0.05, **p < 0.01, ***p < 0.001. GAPDH, glyceraldehyde-3-phosphate dehydrogenase. Utilizing machine-learning algorithms for distinguishing between severe and mild groups in diagnosis The machine learning models’ performances were exceptional. Among the models we tested, the GBM model outperformed the others with an impressive AUC of 0.875 in distinguishing between the two groups based on RNA and metabolomics data ([104]Figure 6a). As the number of features increases, the area under the receiver operating characteristic curve (AUROC) also rises, hitting a plateau when the feature count reaches 10 ([105]Figure 6b). Notably, the GBM algorithm demonstrated exceptional performance with a mean accuracy of 0.873 (SD 0.089) and a mean AUC of 0.875 (SD 0.083) ([106]Figure 6c). The expression levels of the ten features in both Group A and Group B are also depicted in [107]Figure 6d. In summary, the model’s exceptional performance underscores its ability to differentiate between the severe and mild LDH groups effectively, even in cases where MRI has failed to make this distinction. Fig. 6. [108]Fig. 6 [109]Open in a new tab Diagnostic machine learning algorithms to differentiate severe from mild groups. a) Receiver operating characteristic (ROC) curves for group classification using RNA and metabolomics data via gradient boosting machine (GBM), random forest (RF), deep learning (DL), and support vector machine (SVM) algorithms. b) Graph depicting the relationship between area under the receiver operating characteristic curve (AUROC) and the number of features used. c) Table detailing the mean performance metrics of various machine-learning models for group classification. d) Box plots illustrating the variance in the top ten features between the severe and mild groups. AUC, area under the curve. Discussion For the first time, we have integrated both transcriptomic and metabolomic analyses to assist in diagnosing or distinguishing the severity of LDH. Our focus was on patients for whom MRI failed to provide clear differentiation. We aimed to pinpoint key pathways at the transcriptomic and metabolomic levels, shedding light on potential therapeutic targets for this disease. While several studies have documented significant gene expression changes in the nucleus pulposus tissue level between disease and control groups,^[110]7 our study found only moderate gene expression differences when comparing severe and mild LDH PBMC samples. This suggests that relying solely on blood gene expression markers for differentiation remains a challenge. In terms of metabolomic changes, prior research has highlighted that Lactobacillus paracasei S16 can alleviate LDH by modulating both the inflammatory response and gut microbiota.^[111]32 Another study, utilizing gas chromatography coupled with MS, sought to identify biomarker metabolites distinguishing LDH patients from healthy controls. Echoing our findings, they observed that approximately 30 metabolites in patient urine samples were significantly altered.^[112]33 This reinforces the idea that relying solely on metabolomic analysis is insufficient for diagnosing or determining the severity of the disease. By integrating metabolomic and transcriptomic analyses, we can capture changes at both the enzyme and metabolite levels in the disease group. This offers a comprehensive view of pathway alterations. Such an approach has been employed in studies on cancer research,^[113]34 diabetes,^[114]35 birth defects,^[115]36 and various other diseases. In this study, a combined analysis of metabolomics and transcriptomics pinpointed ‘glycerophospholipid metabolism’ as a crucial pathway in determining severity. While metabolomic analysis on its own did highlight several associated metabolites – such as choline and other glycerophospholipid-related compounds – that were significantly altered in the severe group, it was the gene-metabolite network that distinctly indicated this pathway. This was due to observed changes in the expression levels of both upstream and downstream enzymes. Prior research has associated glycerophospholipid metabolism with chronic pain.^[116]37 Another study identified significant effects on glycerophospholipid metabolism in the complete Freund’s adjuvant (CFA)-induced inflammatory pain model, suggesting that it may amplify inflammatory reactions and trigger persistent pain.^[117]38 In our analysis, phosphate cytidylyltransferase 1 (PCYT1A), which catalyzes a critical rate-limiting step in the cytidine diphosphate (CDP)-choline pathway for phosphatidylcholine biosynthesis, was markedly elevated in the severe group ([118]Figure 4c). Concurrently, choline ([119]C00114), a pivotal compound in this pathway, also showed a significant increase. In a related pathway, both LPCAT3 and its precursor metabolite, 1-Acyl-sn-glycero-3-phosphocholine ([120]C04230), were elevated. These pathways suggest that the downstream metabolite, phosphatidylcholine ([121]C00157), should have been upregulated. However, this anticipated result was not observed, suggesting that phosphatidylcholine might be rapidly consumed. Given that phosphatidylcholine is a foundational component for biological membranes and is also involved in acetylcholine synthesis,^[122]39 our findings underline the notable role glycerophospholipid metabolism plays in determining LDH severity. Indeed, a previous study indicated that the loss of Pcyt1a affects macrophages, leading to accelerated phosphatidylcholine turnover in these cells, which in turn promotes adipose tissue inflammation in obesity.^[123]40 Another study highlighted that the epigenetic suppression of LPCAT1 and PCYT1A correlates with lipidomic changes in polycystic ovary syndrome. Moving forward, research should investigate the accelerated consumption of phosphatidylcholine and the mechanisms behind it. Employing lipidomics will be essential in elucidating the shifts in glycerophospholipid metabolism in subsequent studies. The primary objective of this integrated analysis is to offer clinicians an alternative method for diagnosing the severity of LDH without relying solely on MRI. To this end, we crafted a machine-learning model utilizing the GBM method, achieving a classification performance with an AUC of 0.875. Once validated in a more extensive clinical study, this model has the potential for practical clinical application. The confluence of these two extensive datasets enabled us to harness a vast array of features for model training. Our study offers valuable contributions, and the innovation in predictive medicine as showcased in the study of LDH marks a substantial advancement in the field. By integrating transcriptomic and metabolomic analyses, this research has enhanced the understanding of the molecular characteristics and underlying mechanisms that distinguish varying symptoms of LDH. Utilizing machine-learning algorithms, this model analyzes a complex array of biomarkers and metabolic pathways, including the notable glycerophospholipid metabolism, to accurately predict the severity of LDH. This predictive approach paves the way for the development of a predictive model that surpasses traditional diagnostic methods such as MRI, and enables clinicians to identify the risk and severity of LDH before it becomes clinically evident, which is of immense assistance for the diagnosis and treatment of LDH. However, our study also has some limitations. Firstly, the sample size is relatively small, necessitating further validation of our conclusions in larger cohorts. Secondly, while we experimentally verified the biomarkers identified, our investigation lacked a deeper exploration of the underlying mechanisms. Future research should delve into these mechanisms, potentially through animal models, to enhance our understanding of the pathogenesis and refine therapeutic strategies. In conclusion, our pilot study integrates transcriptomic and metabolomic profiling to reveal the glycerophospholipid metabolism pathway as a potential biomarker for LDH severity, offering a machine-learning-based diagnostic approach that complements MRI. Further research with larger cohorts and mechanistic exploration is warranted to validate these findings and explore the therapeutic potential. Author contributions Q. Deng: Formal analysis, Methodology, Validation, Visualization, Writing – original draft S. Ren: Formal analysis, Methodology, Validation, Visualization, Writing – original draft N. Zhang: Validation, Visualization G. Li: Validation, Visualization Z. Yu: Validation, Visualization X. Li: Validation, Visualization H. Cui: Validation, Visualization Y. Zhang: Validation, Visualization Y. Zhang: Conceptualization, Supervision, Writing – review & editing J. Chen: Conceptualization, Supervision, Writing – review & editing Funding statement The authors disclose receipt of the following financial or material support for the research, authorship, and/or publication of this article: this study was supported by the Foundation of Jiangsu CM Clinical Innovation Center of Degenerative Bone & Joint Disease (Jiangsu science and education of traditional Chinese medicine (2021) No. 4) and the Foundation of Top Talent Support Program for young and middle-aged people of Wuxi Health Committee (grant number HB2020068), as reported by J. Chen. ICMJE COI statement J. Chen reports funding from the Foundation of Jiangsu CM Clinical Innovation Center of Degenerative Bone & Joint Disease (Jiangsu science and education of traditional Chinese medicine (2021) No. 4) and the Foundation of Top Talent Support Program for young and middle-aged people of Wuxi Health Committee (grant number HB2020068), all related to this study. Data sharing The data that support the findings for this study are available to other researchers from the corresponding author upon reasonable request. Acknowledgements