Abstract

Aims

   While MRI serves as a tool for assessing the severity of lumbar disc
   herniation (LDH), it has been observed that imaging diagnoses do not
   always align with clinical symptoms in nearly half of patients. The
   absence of dependable prognostic biomarkers impedes the early and
   accurate diagnosis of LDH, which is critical for the development of
   further treatment approaches. Thus, the aim of this study was to
   elucidate the molecular mechanisms that determine pain and LDH
   severity.

Methods

   We conducted a pilot study with 55 patients, employing transcriptomic
   and metabolomic analyses on blood samples to identify potential
   biomarkers. A gene-metabolite interaction approach helped in
   identifying the pivotal pathway linked to disease severity. Moreover, a
   machine-learning model was designed to differentiate between patients
   based on the intensity of pain.

Results

   Cholinergic-related glycerophospholipid metabolism emerged as the
   predominant enriched pathway in the severe symptom group via
   gene-metabolite interaction network analysis. Among various models, the
   gradient boosting machines (GBM) model stood out, achieving a
   commendable area under the curve (AUC) of 0.875 in distinguishing
   between the severe and mild symptom groups using combined RNA and
   metabolomics data.

Conclusion

   Integrated molecular profiling of blood biomarkers has highlighted a
   novel determining pathway for LDH severity. This machine-learning
   approach can serve as a valuable predictive tool when MRI findings are
   inconclusive. Future research will focus on validating these biomarkers
   and exploring their potential for personalized medicine approaches.

   Cite this article: Bone Joint Res 2025;14(5):434–447.

   Keywords: Lumbar disc herniation diagnosis, Integrated molecular
   profiling, Glycerophospholipid metabolism, Lumbar disc herniation, MRI,
   biomarkers, RNA, blood, blood biomarkers, Western blot, t-test, gene
   expressions, straight leg raising test

Article focus

     * MRI as a tool for assessing the severity of lumbar disc herniation
       (LDH) has limitations, with nearly half of patients having imaging
       diagnoses that are inconsistent with clinical symptoms. The lack of
       reliable prognostic biomarkers hinders the early and accurate
       diagnosis of LDH, affecting the development of treatment methods.
     * Our pilot study explores the clinical value of blood biomarkers in
       assessing the severity of LDH using bioinformatics methods.

Key messages

     * This study conducted preliminary research on 55 LDH patients and
       performed transcriptomic and metabolomic analyses on blood samples.
     * The comprehensive molecular profile of blood biomarkers provides
       new data support for the assessment of LDH severity.

Strengths and limitations

     * Combining transcriptomics and metabolomics analyses provides a
       comprehensive perspective on important biomarkers of LDH severity.
     * The research results have the potential to improve the early
       diagnosis and personalized treatment strategies of LDH, filling the
       gap between MRI diagnosis and clinical symptoms.
     * The study only included 55 patients and had a relatively small
       sample size, which limited the generalizability and extrapolation
       of the results.
     * Although biomarkers have been confirmed, there is insufficient
       in-depth exploration of the relevant mechanisms, and their role in
       disease development has not yet been fully elucidated.

Introduction

   Lumbar disc herniation (LDH) is a prevalent orthopaedic condition,
   affecting approximately 2% to 3% of the global population, and the
   lifetime risk for symptomatic LDH is 1% to 3% with a higher incidence
   in males.^[42]1 It commonly manifests as a recurring clinical
   condition, characterized by intervertebral disc (IVD) degeneration,
   annulus rupture, and inward shifting of the nucleus pulposus due to
   shear stress. This culminates in the compression of the spinal nerve
   root.^[43]2 LDH is a primary cause of lower back and leg pain,
   typically occurring at the L4-5 or L5-S1 intervertebral levels, and is
   frequently linked to behaviours such as prolonged sitting or
   bending.^[44]3 Early intervention and treatment are essential, as
   untreated LDH can lead to abnormal gait, which may disrupt the
   mechanical equilibrium of the back and lower limb muscles. This
   disruption can greatly affect patients’ daily activities and work
   performance, and can even lead to disability.^[45]4

   Prior research sought to associate MRI findings with discography
   outcomes related to pain responses in patients suffering from low back
   pain.^[46]5,[47]6 While earlier studies did demonstrate notable
   correlations between pain and various MRI and discographic markers, one
   investigation revealed that half of the patients displayed a
   discordance between imaging diagnoses and clinical symptoms.^[48]5 This
   suggests that lumbar IVDs can be classified into two distinct
   categories: those displaying notable imaging anomalies but presenting
   with mild symptoms, and those with minor imaging discrepancies but
   experiencing intense symptoms. These findings underscore the importance
   of pinpointing specific biomarkers to distinguish or forecast these
   patient categories; doing so could substantially aid in tailoring
   therapeutic interventions for these individuals.^[49]7

   Previous omics-based studies have delved into LDH, encompassing
   extensive gene expression profiling in the nucleus pulposus of human
   ruptured LDH^[50]8 as well as a thorough analysis of molecular pathways
   and key genes implicated in LDH.^[51]9 Pro-inflammatory chemokines CCL5
   and CXCL6 have also been identified to be associated with LDH.^[52]10
   However, there has not yet been an integrative analysis combining both
   metabolomics and transcriptomics concerning LDH. By leveraging these
   two methodologies, in this study we aimed to discern the pathway-level
   distinctions between the two variants of LDH.

   In this research, we employed an integrative approach of transcriptomic
   and metabolomic analyses to identify distinctive biomarkers and
   molecular pathways associated with diverse manifestations of LDH.
   Subsequently, we leveraged a suite of sophisticated machine-learning
   algorithms to refine the selection of potential biomarkers and
   construct a robust predictive model. This model is designed to
   delineate different patient profiles. Our study introduces a novel
   predictive tool, capable of precisely assessing the severity of LDH and
   thereby facilitating improved patient prognoses.

Methods

Participant description

   In this study, participants were consecutively recruited from a single
   institution between April and November 2022. They were divided into
   three groups: Group A (mild MRI findings but severe symptoms) (n = 22);
   Group B (severe MRI findings but mild symptoms) (n = 26); and Group C
   (healthy individuals) (n = 7). The study protocol was approved by the
   Ethics Committee of the Wuxi TCM Hospital Affiliated to Nanjing
   University of Chinese Medicine (No. SSB2020122502), and all
   participants provided written informed consent.

   We employed the Michigan State University (MSU) classification system
   to assess the severity of LDH, stratifying the extent of disc
   protrusions into Grades 1, 2, and 3. Utilizing the intra-facet line as
   a reference, we classified the protrusions as follows: Grade 1, where
   the lumbar disc protrusion is less than or equal to 50% of the distance
   from the disc’s posterior edge to the reference line; Grade 2, where
   the lumbar disc protrusion exceeds 50% of the distance from the disc’s
   posterior edge to the reference line; and Grade 3, where the disc
   protrusion surpasses the reference line.^[53]11-[54]13

   Pain intensity was evaluated utilizing the visual analogue scale (VAS),
   a continuous and intuitive measurement tool, with the scoring criteria
   defined as: 0 points indicate the absence of pain; scores of 1 to 3
   correspond to mild pain; scores ranging from 4 to 6 denote moderate
   pain; 7 to 9 points reflect severe pain; and a score of 10 points
   signifies unbearable, intense pain.^[55]14-[56]16

   The inclusion criteria for Group A were: 1) meeting the diagnostic
   criteria for LDH; 2) lumbar spine MRI indicating disc herniation with a
   MSU grade greater than 1; 3) subjective severe pain experienced by the
   patient, intolerable, with a VAS score of fewer than 5 points; 4)
   clinical manifestations encompassing the following: lower limb
   radiating pain, with painful sites including but not limited to the
   anterior thigh, posterior thigh, and areas corresponding to the
   distribution of the affected nerve; abnormal sensations in the lower
   limbs, with diminished light touch sensation in the skin areas
   innervated by the corresponding affected nerve; and positive straight
   leg raising test, straight leg raising test with augmentation,
   contralateral straight leg raising test, or femoral nerve stretch test;
   5) persistent for more than three months.

   The inclusion criteria for Group B were: 1) meeting the diagnostic
   criteria for LDH; 2) MRI indicating disc herniation with an MSU grade
   less than 1; 3) subjective severe pain experienced by the patient,
   intolerable, with a VAS score exceeding 5 points; 4) clinical
   manifestations encompassing the following: lower limb radiating pain,
   with painful sites including but not limited to the anterior thigh,
   posterior thigh, and areas corresponding to the distribution of the
   affected nerve; abnormal sensations in the lower limbs, with diminished
   light touch sensation in the skin areas innervated by the corresponding
   affected nerve; and positive straight leg raising test, straight leg
   raising test with augmentation, contralateral straight leg raising
   test, or femoral nerve stretch test; 5) persistent for more than three
   months.

   The inclusion criteria for Group C were: 1) history of LDH; 2) lumbar
   spine MRI showing no substantial abnormalities; 3) without symptoms of
   nerve root compression corresponding to LDH, such as accompanying low
   back pain for one week or no considerable low back pain, with lower
   limb numbness and soreness as the main symptoms and sensory
   abnormalities in one or more regions of the lumbosacral nerve root
   distribution, local paralysis, limited lumbar mobility, and
   exacerbation of lower limb symptoms during lumbar effort, coughing, and
   sneezing.^[57]17

Peripheral blood mononuclear cells and serum extraction

   Blood samples were obtained within the initial week following
   hospitalization. To discern the critical pathways that determine LDH
   severity, especially in cases undetectable through MRI, we conducted
   transcriptomics and metabolomics analyses.

   Transcriptomics samples: 2 ml of blood from patients meeting the
   inclusion criteria was collected using medical vacuum blood collection
   tubes (ethylenediaminetetraacetic acid (EDTA) anticoagulant tubes).
   Subsequently, peripheral blood mononuclear cells (PBMCs) were isolated
   following the relevant protocol.^[58]18 The cells were then mixed
   thoroughly with 1 ml of TRIzol reagent (Thermo Fisher Scientific, USA),
   left to stand at room temperature for ten minutes, and subsequently
   stored at -80°C for future testing.

   Metabolomics samples: 1) whole blood samples were collected in medical
   vacuum blood collection tubes (coagulant gel tubes), gently inverted to
   mix, and then immediately placed upright in an icebox to stand for 30
   to 60 minutes; 2) after natural sedimentation and layering, the upper
   layer of serum was extracted and transferred to a 1.5 ml screw-cap
   conical centrifuge tube and spun at 13,000 rpm for two minutes; 3) the
   supernatant was then transferred into a new 1.5 ml screw-cap conical
   centrifuge tube, with 200 µl of serum aliquoted into each tube,
   discarding the cell pellet; 4) the aliquoted serum samples were stored
   long-term at -80°C for future analysis.

RNA extraction

   Total RNA was extracted from the PBMCs of LDH patients using TRIzol
   reagent, following the manufacturer’s instructions. The purity of the
   RNA samples was assessed based on the A260/A280 absorbance ratio using
   a Nanodrop ND-2000 system (Thermo Fisher Scientific). Additionally, the
   RNA integrity number (RIN) was determined with an Agilent Bioanalyzer
   4150 system (Agilent Technologies, USA). Only those samples meeting the
   quality criteria were selected for subsequent library construction.

Library preparation and mRNA sequencing experiments

   Paired-end libraries were generated using the ABclonal mRNA-seq Lib
   Prep Kit (ABclonal, China), as per the manufacturer’s guidelines. From
   1 μg of total RNA, messenger RNA (mRNA) was isolated using oligo (dT)
   magnetic beads. These mRNA samples were then fragmented using divalent
   cations under elevated temperatures. First-strand complementary DNA
   (cDNA) was produced from these fragments using random hexamer primers
   and reverse transcriptase (RNase H), and this was followed by the
   synthesis of second-strand cDNA involving DNA polymerase I and other
   reagents. Double-stranded cDNA fragments underwent adapter ligation to
   prepare the library. After PCR amplification of the adaptor-ligated
   cDNA, the products were purified using the AMPure XP system (Thermo
   Fisher Scientific). The quality of the library was verified using the
   Agilent Bioanalyzer 4150 system. Sequencing was conducted on an
   Illumina NovaSeq 6000 or MGISEQ-T7 (Illumina, USA), yielding 150 bp
   paired-end reads.

Transcriptomic analysis

   Raw reads in FASTQ format were initially processed using in-house Perl
   scripts. This step involved removing adapter sequences and filtering
   out reads with low quality (where lines with a quality score of 25 or
   lower constitute more than 60% of the entire read) or an ‘N’ ratio
   exceeding 5%. ‘N’ denotes bases with undetermined information. This
   processing yielded clean reads suitable for subsequent analyses. These
   clean reads were then mapped to a reference genome using the kallisto
   software.^[59]19 Differential expression was analyzed with
   DESeq2.^[60]20,[61]21 Genes with an absolute log2 fold change greater
   than 1 and an adjusted p-value below 0.05 were classified as
   significantly differentially expressed. The differentially regulated
   genes were then subjected to hierarchical clustering, an unsupervised
   machine-learning technique that groups samples based on their
   similarity in gene expression patterns.^[62]22 To understand the
   functional implications of these differentially expressed genes and
   discern differences at the gene functional level between samples, Gene
   Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG)
   enrichment analyses were conducted. The clusterProfiler R package was
   employed for both GO function and KEGG pathway enrichment
   analyses.^[63]23 A p-value below 0.05 indicated significant enrichment
   in either GO or KEGG functions. To explore the influence of LDH
   severity on immune cell composition, we used CibersortX to deconvolute
   the transcriptome data.^[64]24,[65]25

Liquid chromatography–mass spectrometry experiments

   For polar metabolite untargeted metabolomics, we used a Sciex TripleTOF
   6600 quadrupole time-of-flight mass spectrometer (Sciex, Canada)
   combined with hydrophilic interaction chromatography and electrospray
   ionization. The analysis was performed at Shanghai Applied Protein
   Technology (China). Liquid chromatography (LC) separation was conducted
   on an ACQUIY UPLC BEH Amide column with a specified solvent gradient
   (Waters Corporation, Ireland). The flow rate was set at 0.4 ml/minute,
   the column temperature at 25°C, auto sampler temperature at 5°C, and
   the injection volume at 2 μl. The mass spectrometer operated in both
   negative and positive ion modes with specific Electrospray Ionization
   (ESI) source conditions. Mass spectrometry (MS) acquisition ranged from
   60 m/z to 1,000Da, while auto MS/MS acquisition was from 25 m/z to
   1,000Da, using information dependent acquisition (IDA) in high
   sensitivity mode. Specific parameters, including collision energy and
   declustering potential, were set accordingly.

LC-MS analysis

   Raw MS data, in the form of wiff.scan files, were transitioned to MzXML
   format using ProteoWizard MSConvert (ProteoWizard, USA) before being
   introduced to the XCMS software (Scripps Research, USA). Peak
   identification used parameters such as centWave m/z at 25 ppm,
   peakwidth between 10 and 60, and prefilter from 10 to 100. For grouping
   peaks, settings included bw at 5, mzwid at 0.025, and minfrac at 0.5.
   Only ion features with over 50% non-zero measurements in at least one
   group were retained. The in-house database, composed of authentic
   standards, facilitated compound identification. After normalization
   based on peak intensity, the data were imported into SIMCA-P (version
   14.1; Umetrics, Sweden) for advanced analysis, which encompassed both
   Pareto-scaled principal component analysis (PCA) and orthogonal partial
   least squares discriminant analysis (OPLS-DA). Model robustness was
   gauged through seven-fold cross-validation and response permutation
   testing. Each variable’s contribution to classification in the OPLS-DA
   model was denoted by its variable importance in projection value (VIP).
   The standard for significance was an independent-samples t-test, with
   VIP values exceeding 1 and p-values less than 0.05 deemed statistically
   significant.

   Bioinformatics analysis: for KEGG pathway annotation, metabolites were
   matched with the KEGG database to determine their pathways. We then
   conducted an enrichment analysis of these pathways using Fisher’s exact
   test, with a significance threshold of p < 0.05. For hierarchical
   clustering, we used Cluster 3.0 and Java Treeview software (YuLab,
   Southern Medical University (SMU), China). We selected the Euclidean
   distance for similarity and average linkage clustering for grouping.
   Alongside the dendrogram, a heatmap was provided for visual clarity.

Gene-metabolite interaction network analysis

   The MetaboAnalyst5.0^[66]26 offers a robust platform for exploring and
   illustrating the gene-metabolite interaction network between
   functionally associated genes and metabolites.^[67]27,[68]28 Using this
   platform, we sought to identify the joint-pathways most influenced by
   LDH severity. We leveraged high-confidence interactions from the STITCH
   database – a comprehensive resource that catalogues interactions
   between metabolites and target genes – to extract associations between
   chemicals and human genes.^[69]29

Western blot

   PBMCs were extracted and protease and phosphatase inhibitors were added
   in radioimmunoprecipitation assay buffer (RIPA buffer; Solarbio,
   China), and the samples were denatured at 100°C for 15 minutes.
   Subsequent to the separation on a 10% sodium dodecyl
   sulfate-polyacrylamide gel electrophoresis (SDS-PAGE), proteins were
   transferred to polyvinylidene fluoride (PVDF) membranes. These
   membranes were then blocked using a 5% skim milk powder solution for
   one hour. Primary antibodies, including anti-lysophospholipid
   acyltransferase 3 (LPCAT3) antibody (1:500, 67882-1-Ig; Proteintech,
   USA), anti-phosphate cytidylyltransferase 1, choline, alpha (PCYT1A)
   antibody (1:500, NBP1-84366; NOVUS, USA), and
   anti-glyceraldehyde-3-phosphate dehydrogenase (GAPDH) antibody
   (1:1,000, 10494-1-AP; Proteintech), were incubated overnight. Secondary
   antibodies (1:2,000, SA00001-2 and SA00001-1; Proteintech) were then
   applied for two hours at room temperature, followed by detection using
   an Enhanced Chemiluminescence (ECL) kit (Millipore, USA). Finally, the
   protein bands were visualized with the Chemidoc detection system
   (Bio-Rad, USA).

Quantitative real-time polymerase chain reaction

   Total RNA was isolated from PBMCs using TRIzol reagent. cDNA synthesis
   was performed with 500 ng of extracted RNA using the HiScript II
   SuperMix for quantitative real-time polymerase chain reaction (qRT-PCR)
   (Vazyme, China). The qRT-PCR reactions were conducted using SYBR Green
   Master Mix on the ABI 7500 System (Thermo Fisher Scientific). The
   amplification protocol consisted of an initial denaturation at 94°C for
   ten minutes, followed by 45 cycles of denaturation at 94°C for ten
   seconds, and annealing/extension at 60°C for 45 seconds. GAPDH served
   as the housekeeping gene for normalization. The primer sequences of the
   target genes are shown in [70]Table I.

Table I.

   Primer sequences of the target genes.
   Gene   Forward primer sequence (5’-3’) Reverse primer sequence (5’-3’)
   LPCAT3 CAGGATACCTGGTCTGCTTCCA          TGAAGAGCCAGTGGATGGTCTG
   PCYT1A GTTCCTTCCAAAGTGCAGCGCT          AGGAGTTCCTCTGCTGGCTTCT
   GAPDH  AATGGGCAGCCGTTAGGAAA            GCCCAATACGACCAAATCAGAG
   [71]Open in a new tab

   GAPDH, glyceraldehyde-3-phosphate dehydrogenase; LPCAT3,
   lysophospholipid acyltransferase 3.

Enzyme-linked immunosorbent assay (ELISA)

   Levels of LPCAT3 (JL21306; Jianglai Biotechnology, China) and PCYT1A
   (HZ-11306R; HZB Science, Germany) expression in the serum were
   quantified using an enzyme-linked immunosorbent assay (ELISA) kit
   following the manufacturer’s protocol (Jianglai Biotechnology). The
   absorbance of the samples was measured at 450 nm using a microplate
   reader (Biotek, USA). The concentration of each sample was determined
   using a standard curve.

Machine learning

   The efficacy of current methodologies, including subjective patient
   reports and MRI evaluations, has proven to be inadequate in accurately
   discerning symptom severity among patients. To address this problem, we
   developed a machine learning-based model integrating metabolite and
   gene expression data. We normalized RNA and metabolomics datasets and
   merged them into a comprehensive dataset featuring 48 cases, with 22 in
   Group A and 26 in Group B, encompassing a total of 33,488 features. We
   chose gradient boosting machines (GBM), random forests (RF), deep
   learning (DL), and support vector machines (SVM) for their suitability
   for high-dimensional data and classification tasks.^[72]30,[73]31 Each
   algorithm was rigorously trained and cross-validated using the combined
   dataset. We fine-tuned the hyperparameters for each model to optimize
   performance, employing k-fold cross-validation to assess the
   generalizability of the models. We subsequently analyzed the feature
   importance scores derived from the models to identify the most
   influential metabolites and gene expressions. The receiver operating
   characteristic (ROC) curve was used to evaluate the models’
   performance, focusing on the area under the curve (AUC) as a measure of
   the trade-off between sensitivity and specificity.

Statistical analysis

   In this study, different statistical methods were used to analyze the
   data and determine significant differences between groups. We used
   one-way analysis of variance (ANOVA) to evaluate differences between
   two or more groups. Metabolomics differential foreign body screening
   was performed using the Mann-Whitney U test combined with the
   Benjamini-Hochberg method for multiple comparison correction.
   Independent-samples t-test was performed to calculate p-values for
   biomarker expression analysis (western blot, qRT-PCR, ELISA).
   Mann-Whitney U test, Kruskal-Wallis test, and Fisher’s exact test were
   used for metabolomics and transcriptomics data comparison to ensure the
   robustness of the data. Fisher’s exact test was applied in KEGG pathway
   enrichment analysis to assess statistical significance, and machine
   learning models were used to assess the classification performance by
   ROC curve analysis and permutation test. Pearson’s correlation
   coefficient (PCC) or Spearman’s rank correlation coefficient was used
   to analyze the relationship between biomarkers and pain severity.
   Statistical significance was set at p < 0.05.

Results

Transcriptomic analysis of peripheral blood mononuclear cell samples in
severe and mild LDH patients, and controls

   We conducted a comprehensive profiling of 55 paired blood samples
   through transcriptomic and untargeted metabolomic analyses. The cohort
   included: 22 samples from Group A (mild MRI findings but severe
   symptoms); 26 samples from Group B (severe MRI findings but mild
   symptoms); and seven samples from Group C (healthy individuals). The
   details of patient characteristics are presented in [74]Table II. Group
   A exhibits a Grade A lumbar disc protrusion, which is relatively mild,
   but with more severe symptoms ([75]Figure 1a); Group B has a Grade 2
   lumbar disc protrusion, which is more pronounced, but with milder
   symptoms ([76]Figure 1b); while Group C consists of normal IVDs with no
   protrusion and no clinical symptoms ([77]Figure 1c).

Table II.

   Clinical characteristics of patients with different severities of
   lumbar disc herniation (Group A and Group B) and healthy controls
   (Group C).
   Characteristic Group A Group B Group C p-value
   Total, n 22 26 7
   Mean age, yrs (SD) 60.00 (12.29) 60.08 (11.73) 62.57 (9.57) 0.870[78]^*
   Sex, n
   Male 12 9 3
   Female 8 11 4
   Mean BMI, kg/m^2 (SD) 24.08 (1.30) 24.76 (1.52) 24.69 (1.62)
   0.260[79]^*
   Types, n
   Bulging 12 4
   Prolapsing 1 6
   Protruding 7 10
   Anatomical location of LDH, n N/A
   Central type 8 10
   Central lateral type 14 15
   Lateral type 0 1
   Extreme lateral type (level), n N/A
   L3/L4 8 8
   L4/L5 9 16
   L5/S1 5 2
   Mean VAS (SD) 7.36 (0.66) 3.85 (0.97) < 0.001
   [80]Open in a new tab
   ^*

   One-way analysis of variance (ANOVA).
   ^†

   Independent-samples t-test.

   LDH, lumbar disc herniation; N/A, not applicable; VAS, visual analogue
   scale.

Fig. 1.

   [81]Fig. 1
   [82]Open in a new tab

   Clinical profiling of the patient cohorts. a) Group A, with Grade 1
   protrusion, shows relatively mild disc displacement yet presents with
   more severe symptoms. b) Group B, classified as Grade 2, exhibits a
   more noticeable protrusion but milder symptoms. c) Group C is
   characterized by normal intervertebral discs without any protrusion or
   associated clinical symptoms.

   Although there is a discernible distinction between Group A and Group
   B, considerable overlap is evident in the hierarchical clustering
   analysis ([83]Figure 2a). This implies that relying solely on
   transcriptomic data from PBMC may not adequately differentiate between
   Groups A and B. [84]Figures 2b and 2c present the differentially
   regulated genes in comparisons between mild and severe groups and the
   control group. Subsequently, immune infiltration analysis showed that
   in comparison to the severe group (Group A), CD4 memory T cells
   exhibited a significant increase in the severe group, even though the
   overall proportion of this cell type remains low ([85]Figure 2d).
   However, when incorporating the healthy control group into the
   analysis, no significant disparities in cell types were identified
   among the groups ([86]Figure 2e). In conclusion, these findings
   indicate that the overall transcriptome is moderately altered between
   the severe and mild LDH groups.

Fig. 2.

   [87]Fig. 2
   [88]Open in a new tab

   Transcriptomic analysis of peripheral blood mononuclear cells (PBMCs)
   from three groups. a) Heatmap clustering for all samples. b) Volcano
   plots displaying differentially expressed genes between the mild and
   severe groups. c) Volcano plots illustrating differentially expressed
   genes between the mild and control groups. d) CibersortX analysis was
   used to deconvolve the PBMC transcriptome data and compare the severe
   and mild groups. e) CibersortX analysis was employed to compare all
   three groups. *p < 0.05, Mann-Whitney U test. NK cell, natural killer
   cell; ns, non-significant.

Metabolomic analysis of serum samples in patients with severe and mild LDH
and controls

   The untargeted metabolomic profiling showed that in contrast to the
   transcriptomics data, the PCA depicted a clearer distinction among
   groups A, B, and C ([89]Figure 3a). Hierarchical clustering analysis
   also revealed that the majority of samples from groups A and B were
   well-clustered, as illustrated in [90]Figure 3b. A Venn diagram
   identified 38 metabolites that were significantly different (adjusted
   p-value (p-adj) < 0.05, Mann-Whitney U test combined with
   Benjamini-Hochberg method, for multiple comparison correction) in group
   A compared to group B ([91]Figure 3c). In contrast, only 13 metabolites
   were significantly different when comparing group B to group C. This
   suggests that the metabolic differences between the two LDH groups (A
   and B) are more pronounced than between any LDH group and the healthy
   control group. Among the differentially regulated metabolites,
   ‘Linolenic acid’, ‘arginine’, and ‘choline’-related metabolites were
   the most significant ([92]Figure 3d). Intriguingly, the KEGG enrichment
   analysis highlighted the ‘cholinergic synapse’ as one of the most
   enriched pathways ([93]Figure 3e). Collectively, these metabolomic
   findings suggest that the cholinergic pathway might be pivotal in
   influencing symptom severity to a certain degree.

Fig. 3.

   [94]Fig. 3
   [95]Open in a new tab

   Metabolomic profiling of serum from three groups. a) Principal
   component analysis (PCA) across all groups. b) Hierarchical clustering
   analysis for all groups. c) Venn diagram depicting the overlap of
   differentially regulated metabolites. d) Bar plot detailing the classes
   and names of metabolites differentially regulated between the severe
   and mild groups. e) Kyoto Encyclopedia of Genes and Genomes (KEGG)
   pathway enrichment analysis for the differentially regulated compounds.

Glycerophospholipid metabolism is the main enrichment pathway in the severe
group

   The joint-pathway analysis revealed that ‘glycerophospholipid
   metabolism’ emerged as the predominant enriched pathway, characterized
   by 5 hits out of a total of 86 elements ([96]Figures 4a and 4b).
   Another significantly enriched pathway was ‘thiamine metabolism’,
   albeit with only 2 hits out of 14 elements. [97]Figure 4c showcases the
   comprehensive gene-metabolite network pertaining to the
   glycerophospholipid metabolism pathway. A granular network, detailing
   the variations in gene expression and metabolites, can be found in
   Supplementary Figure a.

Fig. 4.

   [98]Fig. 4
   [99]Open in a new tab

   Gene-metabolite network analysis for severe versus mild group. a)
   Bubble plot highlighting the most enriched metabolic networks when
   comparing the severe group to the mild group. b) Table presenting the
   significantly enriched metabolic pathways. c) Overview diagram
   illustrating the glycerophospholipid metabolism pathway.

   We subsequently aimed to corroborate the involvement of the two pivotal
   enzymes, LPCAT3 and PCYT1A, which were found to be prominent within the
   gene-metabolite network exclusively in the severe group. We assessed
   and juxtaposed the expression levels of LPCAT3 and PCYT1A in the PBMCs
   of both patient cohorts. Western blot analysis revealed that the
   expression levels of LPCAT3 and PCYT1A were augmented with increasing
   disease severity ([100]Figures 5a to 5c). These findings were echoed by
   qRT-PCR outcomes, which aligned with the western blot data, thereby
   corroborating the mRNA sequencing (mRNA-seq) results ([101]Figures 5d
   and 5e). Furthermore, ELISA assays indicated that serum concentrations
   of LPCAT3 and PCYT1A were elevated in patients with severe symptoms
   compared to those in the mild group ([102]Figures 5f and 5g).
   Collectively, these findings lend additional support to the
   gene-metabolite network observations, suggesting that the upregulation
   of these key enzymes also occurs at the protein level.

Fig. 5.

   Fig. 5
   [103]Open in a new tab

   The protein and messenger RNA (mRNA) expression level of
   lysophospholipid acyltransferase 3 (LPCAT3) and PCYT1A in the severe
   and mild groups. a) to c) Representative western blot strips of LPCAT3
   and PCYT1A in the severe and mild groups with quantitative analysis (n
   = 3). d) and e) Quantitative real-time polymerase chain reaction
   (qRT-PCR) results of LPCAT3 and PCYT1A in the lumbar disc herniation
   (LDH) patients with mild and severe symptoms with relative quantitative
   analysis (n = 3). f) and g) Enzyme-linked immunosorbent assay (ELISA)
   results of LPCAT3 and PCYT1A with quantitative analysis (n = 3).
   Independent-samples t-test was used for comparison: *p < 0.05, **p <
   0.01, ***p < 0.001. GAPDH, glyceraldehyde-3-phosphate dehydrogenase.

Utilizing machine-learning algorithms for distinguishing between severe and
mild groups in diagnosis

   The machine learning models’ performances were exceptional. Among the
   models we tested, the GBM model outperformed the others with an
   impressive AUC of 0.875 in distinguishing between the two groups based
   on RNA and metabolomics data ([104]Figure 6a). As the number of
   features increases, the area under the receiver operating
   characteristic curve (AUROC) also rises, hitting a plateau when the
   feature count reaches 10 ([105]Figure 6b). Notably, the GBM algorithm
   demonstrated exceptional performance with a mean accuracy of 0.873 (SD
   0.089) and a mean AUC of 0.875 (SD 0.083) ([106]Figure 6c). The
   expression levels of the ten features in both Group A and Group B are
   also depicted in [107]Figure 6d. In summary, the model’s exceptional
   performance underscores its ability to differentiate between the severe
   and mild LDH groups effectively, even in cases where MRI has failed to
   make this distinction.

Fig. 6.

   [108]Fig. 6
   [109]Open in a new tab

   Diagnostic machine learning algorithms to differentiate severe from
   mild groups. a) Receiver operating characteristic (ROC) curves for
   group classification using RNA and metabolomics data via gradient
   boosting machine (GBM), random forest (RF), deep learning (DL), and
   support vector machine (SVM) algorithms. b) Graph depicting the
   relationship between area under the receiver operating characteristic
   curve (AUROC) and the number of features used. c) Table detailing the
   mean performance metrics of various machine-learning models for group
   classification. d) Box plots illustrating the variance in the top ten
   features between the severe and mild groups. AUC, area under the curve.

Discussion

   For the first time, we have integrated both transcriptomic and
   metabolomic analyses to assist in diagnosing or distinguishing the
   severity of LDH. Our focus was on patients for whom MRI failed to
   provide clear differentiation. We aimed to pinpoint key pathways at the
   transcriptomic and metabolomic levels, shedding light on potential
   therapeutic targets for this disease. While several studies have
   documented significant gene expression changes in the nucleus pulposus
   tissue level between disease and control groups,^[110]7 our study found
   only moderate gene expression differences when comparing severe and
   mild LDH PBMC samples. This suggests that relying solely on blood gene
   expression markers for differentiation remains a challenge.

   In terms of metabolomic changes, prior research has highlighted that
   Lactobacillus paracasei S16 can alleviate LDH by modulating both the
   inflammatory response and gut microbiota.^[111]32 Another study,
   utilizing gas chromatography coupled with MS, sought to identify
   biomarker metabolites distinguishing LDH patients from healthy
   controls. Echoing our findings, they observed that approximately 30
   metabolites in patient urine samples were significantly
   altered.^[112]33 This reinforces the idea that relying solely on
   metabolomic analysis is insufficient for diagnosing or determining the
   severity of the disease.

   By integrating metabolomic and transcriptomic analyses, we can capture
   changes at both the enzyme and metabolite levels in the disease group.
   This offers a comprehensive view of pathway alterations. Such an
   approach has been employed in studies on cancer research,^[113]34
   diabetes,^[114]35 birth defects,^[115]36 and various other diseases. In
   this study, a combined analysis of metabolomics and transcriptomics
   pinpointed ‘glycerophospholipid metabolism’ as a crucial pathway in
   determining severity. While metabolomic analysis on its own did
   highlight several associated metabolites – such as choline and other
   glycerophospholipid-related compounds – that were significantly altered
   in the severe group, it was the gene-metabolite network that distinctly
   indicated this pathway. This was due to observed changes in the
   expression levels of both upstream and downstream enzymes.

   Prior research has associated glycerophospholipid metabolism with
   chronic pain.^[116]37 Another study identified significant effects on
   glycerophospholipid metabolism in the complete Freund’s adjuvant
   (CFA)-induced inflammatory pain model, suggesting that it may amplify
   inflammatory reactions and trigger persistent pain.^[117]38 In our
   analysis, phosphate cytidylyltransferase 1 (PCYT1A), which catalyzes a
   critical rate-limiting step in the cytidine diphosphate (CDP)-choline
   pathway for phosphatidylcholine biosynthesis, was markedly elevated in
   the severe group ([118]Figure 4c). Concurrently, choline ([119]C00114),
   a pivotal compound in this pathway, also showed a significant increase.
   In a related pathway, both LPCAT3 and its precursor metabolite,
   1-Acyl-sn-glycero-3-phosphocholine ([120]C04230), were elevated. These
   pathways suggest that the downstream metabolite, phosphatidylcholine
   ([121]C00157), should have been upregulated. However, this anticipated
   result was not observed, suggesting that phosphatidylcholine might be
   rapidly consumed. Given that phosphatidylcholine is a foundational
   component for biological membranes and is also involved in
   acetylcholine synthesis,^[122]39 our findings underline the notable
   role glycerophospholipid metabolism plays in determining LDH severity.
   Indeed, a previous study indicated that the loss of Pcyt1a affects
   macrophages, leading to accelerated phosphatidylcholine turnover in
   these cells, which in turn promotes adipose tissue inflammation in
   obesity.^[123]40 Another study highlighted that the epigenetic
   suppression of LPCAT1 and PCYT1A correlates with lipidomic changes in
   polycystic ovary syndrome. Moving forward, research should investigate
   the accelerated consumption of phosphatidylcholine and the mechanisms
   behind it. Employing lipidomics will be essential in elucidating the
   shifts in glycerophospholipid metabolism in subsequent studies.

   The primary objective of this integrated analysis is to offer
   clinicians an alternative method for diagnosing the severity of LDH
   without relying solely on MRI. To this end, we crafted a
   machine-learning model utilizing the GBM method, achieving a
   classification performance with an AUC of 0.875. Once validated in a
   more extensive clinical study, this model has the potential for
   practical clinical application. The confluence of these two extensive
   datasets enabled us to harness a vast array of features for model
   training.

   Our study offers valuable contributions, and the innovation in
   predictive medicine as showcased in the study of LDH marks a
   substantial advancement in the field. By integrating transcriptomic and
   metabolomic analyses, this research has enhanced the understanding of
   the molecular characteristics and underlying mechanisms that
   distinguish varying symptoms of LDH. Utilizing machine-learning
   algorithms, this model analyzes a complex array of biomarkers and
   metabolic pathways, including the notable glycerophospholipid
   metabolism, to accurately predict the severity of LDH. This predictive
   approach paves the way for the development of a predictive model that
   surpasses traditional diagnostic methods such as MRI, and enables
   clinicians to identify the risk and severity of LDH before it becomes
   clinically evident, which is of immense assistance for the diagnosis
   and treatment of LDH.

   However, our study also has some limitations. Firstly, the sample size
   is relatively small, necessitating further validation of our
   conclusions in larger cohorts. Secondly, while we experimentally
   verified the biomarkers identified, our investigation lacked a deeper
   exploration of the underlying mechanisms. Future research should delve
   into these mechanisms, potentially through animal models, to enhance
   our understanding of the pathogenesis and refine therapeutic
   strategies.

   In conclusion, our pilot study integrates transcriptomic and
   metabolomic profiling to reveal the glycerophospholipid metabolism
   pathway as a potential biomarker for LDH severity, offering a
   machine-learning-based diagnostic approach that complements MRI.
   Further research with larger cohorts and mechanistic exploration is
   warranted to validate these findings and explore the therapeutic
   potential.

   Author contributions

   Q. Deng: Formal analysis, Methodology, Validation, Visualization,
   Writing – original draft

   S. Ren: Formal analysis, Methodology, Validation, Visualization,
   Writing – original draft

   N. Zhang: Validation, Visualization

   G. Li: Validation, Visualization

   Z. Yu: Validation, Visualization

   X. Li: Validation, Visualization

   H. Cui: Validation, Visualization

   Y. Zhang: Validation, Visualization

   Y. Zhang: Conceptualization, Supervision, Writing – review & editing

   J. Chen: Conceptualization, Supervision, Writing – review & editing

   Funding statement

   The authors disclose receipt of the following financial or material
   support for the research, authorship, and/or publication of this
   article: this study was supported by the Foundation of Jiangsu CM
   Clinical Innovation Center of Degenerative Bone & Joint Disease
   (Jiangsu science and education of traditional Chinese medicine (2021)
   No. 4) and the Foundation of Top Talent Support Program for young and
   middle-aged people of Wuxi Health Committee (grant number HB2020068),
   as reported by J. Chen.

   ICMJE COI statement

   J. Chen reports funding from the Foundation of Jiangsu CM Clinical
   Innovation Center of Degenerative Bone & Joint Disease (Jiangsu science
   and education of traditional Chinese medicine (2021) No. 4) and the
   Foundation of Top Talent Support Program for young and middle-aged
   people of Wuxi Health Committee (grant number HB2020068), all related
   to this study.

   Data sharing

   The data that support the findings for this study are available to
   other researchers from the corresponding author upon reasonable
   request.

   Acknowledgements