Abstract

Background

   Existing biomarkers for epithelial ovarian cancer (EOC) have
   demonstrated limited sensitivity and specificity. This study aimed to
   investigate plasma protein and metabolite characteristics of EOC and
   identify novel biomarker candidates for noninvasive diagnosis and
   differential diagnosis.

Methods

   In this prospective diagnostic cohort study, plasma was preoperatively
   collected from 536 consecutive patients presenting with
   imaging-suspected adnexal masses, uterine fibroids, or pelvic organ
   prolapse. After exclusions, the final cohort comprised 251
   participants: EOC (n = 97), borderline ovarian tumors (n = 38), benign
   ovarian tumors (n = 54), and healthy controls (n = 62). Proteomic and
   metabolomic profiling was performed. A machine learning model was
   trained on a training cohort (34 EOC patients and 62 non-OC individuals
   [borderline, benign, and healthy controls]) to distinguish EOC from
   other groups. The model was validated in two independent cohorts:
   validation cohort 1 (n = 25) and validation cohort 2 (n = 130) using
   targeted proteomics and untargeted metabolomics. External
   transcriptomic datasets (TCGA-OV, GTEx bulk RNA-seq; GSE180661
   scRNA-seq) were leveraged to validate TDO2 upregulation in ovarian
   cancer tissues, particularly in fibroblasts. This TDO2 upregulation
   was experimentally confirmed through quantitative PCR,
   immunohistochemistry, and immunofluorescence using clinical specimens.

Results

   We identified significant protein alterations in EOC patients’ plasma,
   implicating dysregulated metabolic and PI3K-Akt signaling pathways.
   Metabolite analysis further revealed aberrant sphingolipid metabolism,
   steroid hormone biosynthesis, and tryptophan metabolism in EOC
   patients’ plasma. A diagnostic panel comprising 4 proteins (LRG1,
   ITIH3, PDIA4, and PON1) and 3 metabolites (kynurenine, indole, and
   3-hydroxybutyrate) achieved an AUC of 0.975 (95% CI 0.943–0.997) with
   95.2% sensitivity and 91.2% specificity in the training cohort.
   Critically, the model demonstrated robust generalizability in two
   independent validation cohorts: validation cohort 1 (AUC = 0.962, 95%
   CI 0.878–1.000) and validation cohort 2 (AUC = 0.965, 95% CI
   0.921–0.995). Furthermore, fibroblasts with high expression of
   tryptophan 2,3-dioxygenase (TDO2) contributed to the elevated levels
   of kynurenine.

Conclusions

   Our findings provide novel insights into the EOC metabolic and protein
   landscape. We developed and validated a plasma classifier demonstrating
   high sensitivity and specificity, which effectively distinguishes EOC
   patients from non-OC individuals. This classifier could enhance
   preoperative diagnostic accuracy and aid in differential diagnosis.

Supplementary Information

   The online version contains supplementary material available at
   10.1186/s12916-025-04341-2.

   Keywords: Proteome, Metabolome, EOC, Diagnosis, Differential diagnosis

Background

   Ovarian cancer (OC) remains the most lethal gynecological malignancy,
   accounting for over 200,000 annual deaths globally [1].
   Histologically, epithelial ovarian cancer (EOC) constitutes > 90% of
   cases, classified into five major subtypes: high-grade serous (up to
   75%), low-grade serous (< 5%), endometrioid (10%), clear cell (6%), and
   mucinous carcinomas (< 5%). Borderline serous/mucinous tumors,
   characterized by moderate cellular proliferation and atypia, represent
   an intermediate pathological category [2, 3]. Approximately 75%
   of EOC patients present with advanced-stage disease at diagnosis due to
   the disease’s asymptomatic nature, correlating with stark survival
   disparities: 92% 5-year survival for early-stage disease vs. 29% for
   late-stage disease [4–6].

   Pelvic masses affect approximately one in six women during their
   lifetime, with the vast majority being benign and safely manageable by
   general gynecologists [7, 8]. Nevertheless, preoperative risk
   stratification remains critical, as numerous studies have consistently
   demonstrated that EOC patients achieve significantly better outcomes
   when their surgeries are performed by experienced gynecologic
   oncologists at specialized cancer centers [9, 10]. Effective
   symptom-triggered testing is crucial for timely referral to gynecologic
   oncologists, which may improve prognosis by increasing the likelihood
   of complete cytoreduction [11]. However, 50–70% of patients
   with adnexal masses (later pathologically confirmed as EOC) are not
   appropriately referred under current triage protocols [12].

   Despite decades of biomarker research, clinical challenges persist.
   CA-125, though widely used, exemplifies these limitations: while
   elevated in 90% of advanced EOC cases, it detects only 50% of
   early-stage disease and yields frequent false positives in
   premenopausal women (e.g., endometriosis) [8, 13, 14]. Two
   landmark screening trials (UKCTOCS and PLCO) demonstrated that CA-125
   combined with transvaginal ultrasound increased early-stage detection
   rates but failed to reduce mortality [15–18]. This paradox
   underscores fundamental biological constraints: while screening
   improves stage shift, inherent tumor aggressiveness and limited
   therapeutic advances attenuate survival benefits [19]. Multi-marker
   strategies show incremental progress. Combinatorial algorithms like
   ROMA (Risk of Ovarian Malignancy Algorithm) integrate HE4, CA-125, and
   menopausal status, achieving superior diagnostic accuracy
   [20–23]. Multivariate assays such as OVA1® (measuring
   β2-microglobulin, apolipoprotein A1, transthyretin, CA-125, and
   transferrin) and Overa® (incorporating HE4, CA-125, apolipoprotein A1,
   follicle-stimulating hormone, and TIMP-1) further refine preoperative
   risk assessment [24–27]. However, real-world data reveal
   persistent gaps: only 30–50% of confirmed EOC cases receive appropriate
   referrals [12].

   Emerging multi-omics technologies offer transformative potential.
   Notably, proteomic and metabolomic approaches have proven useful in
   biomarker discovery [28–32]. Proteomic analyses have identified
   a novel biomarker panel (EEF1G, MSLN, BCAM, and TAGLN2) for high-grade
   serous ovarian cancer (HGSOC) [33]. Additionally, studies suggest
   that the combination of TTR, Hb, ApoAI, and TF with CA-125 could
   significantly improve the detection of early-stage ovarian cancer
   [34]. Concurrently, metabolomic profiling reveals that metabolic
   alterations may differentiate serous ovarian cancer from benign ovarian
   tumors, highlighting the potential of plasma metabolites as diagnostic
   biomarkers [35]. The evident transition from single-marker to
   multi-marker strategies in ovarian cancer screening underscores the
   enhanced diagnostic reliability of multimodal approaches. Building on
   this paradigm shift, our study aimed to develop an integrative
   algorithm combining proteomic profiles with metabolomic signatures,
   thereby capturing systemic pathophysiological alterations across
   biological hierarchies.

   Our study identified differentially abundant proteins and metabolites
   in the plasma of EOC patients, offering valuable insights for
   subsequent mechanistic exploration and novel tumor biomarker
   development. The innovation of this work lies in integrating downstream
   cellular products—proteins and metabolites—to comprehensively
   characterize the EOC plasma profile. Theoretically, these two molecular
   hierarchies exhibit the most significant pathological alterations, and
   our diagnostic approach synergizes the dynamic sensitivity of
   metabolites with the stability of proteins. Our preoperative diagnostic
   model demonstrated robust performance, constructed and validated using
   proteomic and metabolomic data from a real-world, unselected cohort of
   EOC patients.

Methods

Patients and samples

   This prospective diagnostic cohort study was approved by the
   Independent Ethics Committee for Clinical Research and Animal Trials of
   the First Affiliated Hospital of Sun Yat-sen University (approval
   number: 2022–510). Written informed consent was obtained from all
   patients prior to any study procedures. Between April 2023 and April
   2024, preoperative plasma samples were consecutively collected from 536
   participants recruited from the Department of Obstetrics and
   Gynecology, the First Affiliated Hospital of Sun Yat-sen University:
   (1) patients with imaging-suspected adnexal masses (all ages); (2)
   patients predominantly aged around 50 years with uterine fibroids or
   pelvic organ prolapse. Histopathological diagnosis for all enrolled
   patients with treatment-naïve EOC, borderline ovarian tumors (BOT), or
   benign ovarian tumors was independently confirmed by two
   board-certified pathologists. Exclusion criteria were as follows: (1)
   active infections, (2) autoimmune diseases (e.g., systemic lupus
   erythematosus, rheumatoid arthritis), (3) chronic inflammatory
   disorders (e.g., inflammatory bowel disease), and (4) history of other
   malignant tumors. These conditions were excluded due to their potential
   confounding effects on plasma protein and metabolite profiles. The
   final cohort comprised: 97 women with EOC, 38 women with BOT, 54 women
   with benign ovarian tumors, and 62 women with uterine fibroids or
   pelvic organ prolapse as healthy controls (Additional file 1: Fig. S1).
   The training cohort (n = 96) was used for biomarker discovery and model
   construction. Validation cohort 1 (n = 25) and validation cohort 2
   (n = 130) were used to validate model performance. Plasma samples from
   patients in the training cohort and validation cohort 1 were collected
   between April 2023 and September 2023. Validation cohort 2 was enrolled
   as an independent cohort from September 2023 to April 2024. Demographic
   and clinical information, including age, body mass index (BMI), CA-125,
   HE4, family history of cancer, parity history, FIGO stage, and
   histological subtype, was recorded (Table 1).

Table 1.

   Description of demographic and clinical characteristics of patients

   Training set and validation cohort 1 (n = 121)

   | Characteristic | Epithelial ovarian cancer (n = 46) | Borderline ovarian tumor (n = 16) | Benign ovarian tumor (n = 26) | Healthy control (n = 33) |
   |---|---|---|---|---|
   | Age (year), mean (SD) | 53.17 (10.23) | 38.69 (9.82) | 44.19 (13.52) | 53.79 (6.98) |
   | Height (cm), mean (SD) | 157.55 (5.20) | 161.13 (5.58) | 157.50 (5.23) | 157.82 (5.03) |
   | Weight (kg), mean (SD) | 55.65 (7.17) | 57.57 (7.80) | 57.93 (10.28) | 56.76 (7.31) |
   | BMI (kg/m²), mean (SD) | 22.43 (2.69) | 22.16 (2.76) | 23.30 (3.73) | 22.79 (2.72) |
   | CA125 (units/ml), median [IQR] | 722.80 [197.58, 1785.90] | 33.15 [20.30, 377.25] | 15.80 [12.10, 24.53] | NA |
   | HE4 (pmol/l), median [IQR] | 257.70 [137.10, 430.10] | 45.05 [34.05, 49.65] | 36.60 [27.30, 42.10] | NA |
   | Family history, No (%) | 31 (67.39) | 14 (87.50) | 24 (92.31) | 28 (84.85) |
   | Family history, Yes (%) | 15 (32.61) | 2 (12.50) | 2 (7.69) | 5 (15.15) |
   | Parity, No (%) | 6 (13.04) | 6 (37.50) | 5 (19.23) | 2 (6.06) |
   | Parity, Yes (%) | 40 (86.96) | 10 (62.50) | 21 (80.77) | 31 (93.94) |
   | Menopause, No (%) | 21 (45.65) | 12 (75.00) | 18 (69.23) | 12 (36.36) |
   | Menopause, Yes (%) | 25 (54.35) | 4 (25.00) | 8 (30.77) | 21 (63.64) |
   | Tumor size ≤ 10 cm (%) | 20 (43.48) | 9 (56.25) | 21 (80.77) | NA |
   | Tumor size > 10 cm (%) | 26 (56.52) | 7 (43.75) | 5 (19.23) | NA |
   | FIGO stage I-II (%) | 11 (23.91) | NA | NA | NA |
   | FIGO stage III-IV (%) | 35 (76.09) | NA | NA | NA |
   | Histological type, Serous (%) | 32 (69.57) | 8 (50.00) | 10 (38.46) | NA |
   | Histological type, Mucinous (%) | 3 (6.52) | 8 (50.00) | 12 (46.15) | NA |
   | Histological type, Endometrioid (%) | 7 (15.22) | 0 (0.00) | 0 (0.00) | NA |
   | Histological type, Clear cell (%) | 3 (6.52) | 0 (0.00) | 0 (0.00) | NA |
   | Histological type, Other (%) | 1 (2.17) | 0 (0.00) | 4 (15.38) | NA |

   Validation cohort 2 (n = 130)

   | Characteristic | Epithelial ovarian cancer (n = 51) | Borderline ovarian tumor (n = 22) | Benign ovarian tumor (n = 28) | Healthy control (n = 29) |
   |---|---|---|---|---|
   | Age (year), mean (SD) | 54.24 (12.65) | 38.00 (13.49) | 47.00 (16.41) | 53.00 (10.62) |
   | Height (cm), mean (SD) | 158.05 (5.02) | 158.00 (6.06) | 158.86 (5.84) | 158.07 (4.94) |
   | Weight (kg), mean (SD) | 58.55 (8.99) | 56.08 (7.60) | 61.70 (10.41) | 59.20 (8.96) |
   | BMI (kg/m²), mean (SD) | 23.39 (2.97) | 22.49 (3.01) | 24.46 (4.04) | 23.69 (3.35) |
   | CA125 (units/ml), median [IQR] | 803.25 [132.02, 1554.95] | 26.75 [14.90, 104.28] | 15.85 [12.88, 17.88] | NA |
   | HE4 (pmol/l), median [IQR] | 155.30 [68.90, 401.80] | 41.35 [31.12, 59.70] | 34.60 [27.90, 40.00] | NA |
   | Family history, No (%) | 36 (70.59) | 17 (77.27) | 23 (82.14) | 26 (89.66) |
   | Family history, Yes (%) | 15 (29.41) | 5 (22.73) | 5 (17.86) | 3 (10.34) |
   | Parity, No (%) | 7 (13.73) | 9 (40.91) | 4 (14.29) | 1 (3.45) |
   | Parity, Yes (%) | 44 (86.27) | 13 (59.09) | 24 (85.71) | 28 (96.55) |
   | Menopause, No (%) | 20 (39.22) | 19 (86.36) | 14 (50.00) | 16 (55.17) |
   | Menopause, Yes (%) | 31 (60.78) | 3 (13.64) | 14 (50.00) | 13 (44.83) |
   | Tumor size ≤ 10 cm (%) | 27 (52.94) | 12 (54.55) | 18 (64.29) | NA |
   | Tumor size > 10 cm (%) | 24 (47.06) | 10 (45.45) | 10 (35.71) | NA |
   | FIGO stage I-II (%) | 16 (31.37) | NA | NA | NA |
   | FIGO stage III-IV (%) | 35 (68.63) | NA | NA | NA |
   | Histological type, Serous (%) | 34 (66.67) | 8 (36.36) | 13 (46.43) | NA |
   | Histological type, Mucinous (%) | 5 (9.80) | 10 (45.45) | 14 (50.00) | NA |
   | Histological type, Endometrioid (%) | 2 (3.92) | 3 (13.64) | 0 (0.00) | NA |
   | Histological type, Clear cell (%) | 7 (13.73) | 0 (0.00) | 0 (0.00) | NA |
   | Histological type, Other (%) | 3 (5.88) | 1 (4.55) | 1 (3.57) | NA |

   Age, age at diagnosis; Height, height at diagnosis; Weight, weight at
   diagnosis; BMI, BMI at diagnosis; CA125, CA125 blood test performed at
   diagnosis; HE4, HE4 test of blood collected at diagnosis; Family
   history, family history of malignant tumors; Parity, parity before
   diagnosis; Menopause, whether menopause occurred before diagnosis;
   Tumor size, tumor size described by ultrasound; FIGO stage I-II, tumor
   confined to the ovaries or fallopian tube(s), or tumor involving one or
   both ovaries or fallopian tubes with pelvic extension (below the pelvic
   brim) or primary peritoneal cancer; FIGO stage III-IV, tumor involving
   one or both ovaries or fallopian tubes, or primary peritoneal cancer,
   with cytologically or histologically confirmed spread to the
   peritoneum outside the pelvis and/or metastasis to the retroperitoneal
   lymph nodes, or distant metastasis excluding peritoneal metastases;
   Histological type, histological type based on the patient’s pathology
   report

   All blood samples were collected after an 8–12-h fasting period using
   K₂-EDTA Blood Collection Tubes (Becton, Dickinson and Company, USA).
   After collection, tubes were immediately placed on ice and processed
   within 4 h. Plasma separation was performed by centrifugation at
   1750 × g for 10 min at 4 °C. The supernatant plasma was aliquoted into
   cryovials and stored at − 80 °C until analysis.

Proteome analysis

Sample preparation

   High-abundance proteins were removed using the ProteoMiner™ Protein
   Enrichment Small-Capacity Kit (Bio-Rad Laboratories, Hercules, CA, USA;
   Cat# 1633006) according to the manufacturer’s instructions. Protein
   concentration was
   measured via BCA assay, and equal amounts were digested. Proteins were
   reduced with 10 mM DTT (37 °C, 45 min), alkylated with 50 mM
   iodoacetamide (dark, 15 min), and precipitated with acetone (−20 °C,
   2 h). The pellet was resuspended in 25 mM ammonium bicarbonate,
   digested overnight with trypsin (37 °C), and desalted using C18
   cartridges.

Fractionation

   To generate a comprehensive library for the diaPASEF experiments,
   peptides were fractionated on a reversed-phase column (XBridge™ BEH300
   C18 column, 4.6 mm × 250 mm, 3.5 µm, 100 Å (Waters, Milford, MA, USA)) at
   pH 10 with a Vanquish Core system (Thermo Fisher Scientific, Waltham, MA,
   USA). Fifty-two fractions were collected, concatenated into 10, dried,
   and reconstituted in 0.1% formic acid.

Liquid chromatography-mass spectrometry (LC–MS) analysis

   The sample was separated using the NanoElute UHPLC system with
   nanoliter flow rates. Mobile phase A consisted of 0.1% formic acid in
   water, and mobile phase B consisted of 0.1% formic acid in acetonitrile
   (100% acetonitrile). The sample was loaded by an autosampler onto the
   analytical column (IonOpticks, Australia, 25 cm × 75 μm, C18 packing,
   1.6 μm particle size) for separation. The column temperature was
   controlled at 50 °C using an integrated column oven. The sample load
   was 200 ng, and the flow rate was set to 300 nL/min with a gradient
   duration of 40 min. The liquid chromatography gradient was as follows:
   0 to 25 min: mobile phase B increased from 2 to 22%; 25 to 30 min:
   mobile phase B increased linearly from 22 to 35%; 30 to 35 min: mobile
   phase B increased linearly from 35 to 80%; 35 to 40 min: mobile phase B
   was maintained at 80%.
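
   The gradient program above can be read as a piecewise-linear function
   of time. The following is a minimal sketch, assuming every ramp
   (including the first, which the text does not explicitly call linear)
   is linear between the stated breakpoints; the names GRADIENT and
   percent_b are hypothetical and chosen only for illustration.

```python
from bisect import bisect_right

# (time in min, % mobile phase B) breakpoints taken from the gradient text.
# Assumption: each segment between breakpoints is a straight line.
GRADIENT = [(0.0, 2.0), (25.0, 22.0), (30.0, 35.0), (35.0, 80.0), (40.0, 80.0)]


def percent_b(t_min, gradient=GRADIENT):
    """Return % mobile phase B at time t_min by linear interpolation."""
    times = [t for t, _ in gradient]
    if not times[0] <= t_min <= times[-1]:
        raise ValueError("time lies outside the programmed gradient")
    i = bisect_right(times, t_min) - 1
    if i == len(gradient) - 1:  # exactly at the final breakpoint
        return gradient[-1][1]
    (t0, b0), (t1, b1) = gradient[i], gradient[i + 1]
    return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
```

   For example, percent_b(27.5) gives the composition halfway through the
   22 to 35% ramp, and any time between 35 and 40 min returns the 80% hold.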

   The mixed samples were first subjected to mass spectrometry data
   acquisition using the ddaPASEF mode of the timsTOF Pro2 (Bruker
   Daltonics, Germany) to establish appropriate acquisition windows for
   the subsequent diaPASEF method. The effective gradient for the analysis
   was 40 min, and the detection mode was set to positive ions. The
   precursor ion scan range was 100–1700 m/z, with an ion mobility range
   (1/K₀) of 0.85 to 1.3 V·s/cm². The ion accumulation and release
   time was set to 100 ms, with near 100% ion utilization efficiency. The
   capillary voltage was 1500 V, the drying gas flow rate was 3 L/min, and
   the drying temperature was 180 °C. For ddaPASEF mode, the acquisition
   parameters were as follows: 4 MS/MS scans per cycle (with a total cycle
   time of 0.53 s), a charge range of 0–5, a dynamic exclusion time of
   0.4 min, an ion target intensity of 10,000, and an ion intensity
   threshold of 1500. The collision energy increased linearly with ion
   mobility, ranging from 27 eV at 1/K₀ = 0.85 V·s/cm² to 45 eV at
   1/K₀ = 1.3 V·s/cm². The quadrupole isolation width was set to 2 Th
   when m/z < 700 and 3 Th when m/z > 800. In diaPASEF mode, the
   acquisition parameters were as follows: a mass range of approximately
   400–1200 m/z, a mobility range of 0.85–1.3 V·s/cm², a mass width of
   25 Da, a mass overlap of 0.1, 24 mass steps per cycle, and 2 mobility
   windows, resulting in a total of 48 acquisition windows. The average
   acquisition cycle time was 1.17 s.
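
   Two of the stated acquisition parameters follow from simple
   arithmetic: the collision energy is a straight line between two
   anchor points, and the diaPASEF window count is the product of mass
   steps and mobility windows. The sketch below illustrates both; the
   function names are hypothetical, and it does not reproduce the
   instrument's actual method file.

```python
def collision_energy(inv_k0, low=(0.85, 27.0), high=(1.3, 45.0)):
    """Collision energy (eV) for a given 1/K0 (V·s/cm²) on a linear ramp.

    The anchor points (27 eV at 0.85, 45 eV at 1.3) come from the text.
    """
    (x0, e0), (x1, e1) = low, high
    if not x0 <= inv_k0 <= x1:
        raise ValueError("1/K0 lies outside the stated mobility range")
    return e0 + (e1 - e0) * (inv_k0 - x0) / (x1 - x0)


def dia_window_count(mass_steps=24, mobility_windows=2):
    """Total diaPASEF windows = mass steps per cycle x mobility windows."""
    return mass_steps * mobility_windows
```

   Under these anchor points the ramp has a slope of 40 eV per V·s/cm²,
   so an ion at the midpoint of the mobility range (1/K₀ = 1.075) would be
   fragmented at about 36 eV.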

Data processing

   The search engines used in this project were DIA-NN (v1.8.0) for DIA
   data and MSFragger (v3.8) for DDA data. For the DDA mass spectrometry
   data, the key search parameters were as follows: the database used was
   the uniprot_proteome_UP000005640_human_20230504.fasta database
   (containing 82,492 sequences). The mass tolerances for precursor ions
   and fragment ions were both set to 20 ppm. Carbamidomethylation of
   cysteine (Carbamidomethyl [C]) was selected as a fixed modification,
   while oxidation of methionine (Oxidation [M]) and acetylation of the
   protein N-terminus (Acetyl [Protein N-term]) were set as variable
   modifications. The enzyme used was strict trypsin with a maximum of two
   missed cleavages allowed. For DIA mass spectrometry data, the spectral
   library search method was used. The parameters included importing the
   qualitative results from the DDA search of this project to build a
   spectral library. The Match Between Runs (MBR) option was selected to
   generate a spectral library from the DIA data, and this library was
   used to reanalyze the DIA data for protein quantification. The false
   discovery rate (FDR) at both the precursor ion and protein levels was
   filtered to 1%. The filtered data were used for subsequent
   bioinformatics analysis.

Quality control of proteome data

   The quality of proteomic data was ensured at multiple levels. First, a
   HeLa cell digest was used for instrument performance evaluation. Water
   blanks were also run every 10 injections to check for carry-over. A
   pooled sample, comprising a mixture of all peptide samples, was
   analyzed as a quality control mix (QC-mix) every 10 injections. Plasma
   samples from the four participant groups in both the training and
   validation cohorts were randomly distributed.
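
   The run-order scheme described above (randomized samples with a blank
   and a QC-mix at regular intervals) can be sketched as follows. This is
   an illustration only: the function name, the labels "BLANK" and
   "QC-mix", the fixed seed, and the choice to place the blank before the
   QC-mix after each block of 10 samples are all assumptions, since the
   text does not specify the exact ordering.

```python
import random


def build_run_order(sample_ids, every=10, seed=0):
    """Shuffle samples, then add a blank and a QC-mix after each block.

    Assumption: "every 10 injections" is interpreted as "after every
    `every` study samples"; the source does not define this precisely.
    """
    rng = random.Random(seed)  # fixed seed so the order is reproducible
    order = list(sample_ids)
    rng.shuffle(order)
    runs = []
    for i, sample in enumerate(order, start=1):
        runs.append(sample)
        if i % every == 0:
            runs.extend(["BLANK", "QC-mix"])
    return runs
```

   Shuffling before inserting the control injections keeps the four
   participant groups from clustering in one part of the sequence, which
   is the usual motivation for randomizing run order.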

Metabolome analysis

Sample preparation

   Metabolites were extracted from 50 µL of plasma using ACN:methanol (1:4)
   with internal standards. After vortexing, centrifugation (12,000 rpm,
   10 min), and freezing (− 20 °C, 30 min), supernatants were analyzed.

LC–MS analysis

   All samples were analyzed using two LC–MS methods. One aliquot was
   analyzed under positive ion conditions and eluted from a T3 column
   (Waters ACQUITY Premier HSS T3 Column, 1.8 µm, 2.1 mm × 100 mm) using
   0.1% formic acid in water as solvent A and 0.1% formic acid in
   acetonitrile as solvent B with the following gradient: solvent B was
   increased from 5 to 20% in 2 min, to 60% over the following 3 min, and
   to 99% in 1 min, held for 1.5 min, then returned to 5% within 0.1 min
   and held for 2.4 min. The analytical conditions were as follows:
   column temperature, 40 °C; flow rate, 0.4 mL/min; injection volume,
   4 μL. The other aliquot was analyzed under negative ion conditions
   using the same elution gradient as in positive mode.

   Data were acquired in information-dependent acquisition (IDA) mode
   using Analyst TF 1.7.1 Software (Sciex, Concord,
   ON, Canada). The source parameters were set as follows: ion source gas
   1 (GAS1), 50 psi; ion source gas 2 (GAS2), 50 psi; curtain gas (CUR),
   25 psi; temperature (TEM), 550 °C; declustering potential (DP), 60 V,
   or − 60 V in positive or negative modes, respectively; and ion spray
   voltage floating (ISVF), 5000 V or − 4000 V in positive or negative
   modes, respectively. The TOF MS scan parameters were set as follows:
   mass range, 50–1000 Da; accumulation time, 200 ms; and dynamic
   background subtract, on. The product ion scan parameters were set as
   follows: mass range, 25–1000 Da; accumulation time, 40 ms; collision
   energy, 30 or − 30 V in positive or negative modes, respectively;
   collision energy spread, 15; resolution, UNIT; charge state, 1 to 1;
   intensity, 100 cps; exclude isotopes within 4 Da; mass tolerance,
   50 ppm; maximum number of candidate ions to monitor per cycle, 18.

Data processing

   The raw data files were converted into mzXML format using ProteoWizard
   software. Peak extraction, peak alignment, and retention time
   correction were performed using the XCMS program. Peak areas were
   corrected using the “SVR” method. Peaks with a detection rate lower
   than 50% in each group of samples were discarded, and the remaining
   missing values were then imputed using the KNN method. After that,
   metabolite identification information was obtained by searching the
   laboratory’s self-built database, integrated public database, AI
   database, and metDNA. Substances with an identification score above
   0.5 and a coefficient of variation (CV) below 0.3 in quality control
   (QC) samples were retained; the positive and negative ionization modes
   were subsequently merged, retaining for each substance the entry with
   the highest qualitative grade and smallest CV value.
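
   The filtering rules above can be expressed compactly. The sketch below
   is an interpretation, not the authors' pipeline: it assumes a peak is
   discarded only when its detection rate is below 50% in every group,
   represents undetected values as None, and computes the CV as sample
   standard deviation divided by mean. All function names are
   hypothetical.

```python
from statistics import mean, stdev


def detection_rate(values):
    """Fraction of samples in which the peak was detected (None = missing)."""
    return sum(v is not None for v in values) / len(values)


def keep_peak(intensities_by_group, min_rate=0.5):
    """Keep a peak unless its detection rate is below min_rate in every group."""
    return any(detection_rate(v) >= min_rate
               for v in intensities_by_group.values())


def qc_cv(qc_values):
    """Coefficient of variation (SD / mean) across pooled QC injections."""
    return stdev(qc_values) / mean(qc_values)


def keep_feature(score, qc_values, min_score=0.5, max_cv=0.3):
    """Retain a substance with identification score > 0.5 and QC CV < 0.3."""
    return score > min_score and qc_cv(qc_values) < max_cv
```

   A peak seen in three of four EOC samples but only one of four controls
   would be kept under this reading, because one group reaches the 50%
   detection rate.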

Quality control of metabolome analysis

   Several types of QC were included in the experiment. Mixed standards
   for targeted detection were used to monitor instrument variation
   before the project analysis. QC samples were prepared by mixing sample
   extracts and were used to assess the reproducibility of samples under
   the same processing methods. During instrumental analysis, a QC sample
   was inserted after every 10 analytical samples to monitor the
   reproducibility of the analysis process. Solvent served as the blank:
   one blank was inserted after every 10 samples, and the chromatogram of
   the internal standard in the blank was extracted to examine the impact
   of residues.

Targeted protein analysis

Sample preparation

   Proteins were extracted, quantified (BCA), and digested similarly to
   proteome analysis.

LC–MS/MS

   Liquid chromatography (LC) was performed on a nanoElute UHPLC (Bruker
   Daltonics, Germany). About 200 ng peptides were separated within 40 min
   at a flow rate of 0.3 µL/min on a commercially available reverse-phase
   C18 column with an integrated CaptiveSpray Emitter (25 cm × 75 μm ID,
   1.6 μm, Aurora Series with CSI, IonOpticks, Australia). Mobile phases A
   and B were produced with 0.1% formic acid in water and 0.1% formic acid
   in ACN. Mobile phase B was increased from 2 to 22% over the first
   25 min, increased to 35% over the next 5 min, further increased to 80%
   over the next 5 min, and then held at 80% for 5 min. The LC was coupled
   online to a hybrid timsTOF Pro2 via a CaptiveSpray nano-electrospray
   ion source (CSI).

   The timsTOF Pro2 was operated in Data-Dependent Parallel
   Accumulation-Serial Fragmentation (PASEF) mode with 4 PASEF MS/MS
   frames in 1 complete frame. The capillary voltage was set to 1500 V,
   and the MS and MS/MS spectra were acquired from 100 to 1700 m/z. The
   ion mobility range (1/K₀) was 0.85 to 1.3 V·s/cm². The TIMS
   accumulation and ramp times were both set to 100 ms, which enables
   operation at duty cycles close to 100%. The “target value” of 10,000
   was applied to a repeated schedule, and the intensity threshold was set
   at 1500. The collision energy was ramped linearly as a function of
   mobility from 45 eV at 1/K₀ = 1.3 V·s/cm² to 27 eV at 1/K₀ = 0.85
   V·s/cm². The quadrupole isolation width was set to 2 Th for m/z < 700
   and 3 Th for m/z > 800.

Spectral library and quantification

   The software used for the preliminary experiment to construct the
   spectral library is FragPipe (v21.0). The database parameters are as
   follows: the database is
   uniprotkb_proteome_UP000005640_human_82493_20240528.fasta (containing
   82,493 sequences), 700, and the iRT2.fasta database (containing 1
   sequence). A decoy database and a contaminant database were added to
   control the FDR caused by random matches and to eliminate the impact of
   contaminating proteins. The mass tolerance for precursor ions and
   fragment ions is set to 20 ppm. The digestion enzyme is strict trypsin,
   allowing for up to two missed cleavages. The fixed modification is
   Carbamidomethyl (C), and the variable modifications are Oxidation (M)
   and Acetyl (Protein N-term), allowing up to three variable
   modifications. For the identification results, the FDR at both the
   peptide and protein levels is set to 1%, and decoys are not included in
   the output. The characteristic peptide filtering criteria are as
   follows: variable modification peptides are excluded, missed cleavage
   peptides are excluded, non-unique peptides are excluded, and peptides
   with low intensity (generally below 3000) are excluded.

   The analysis of the mass spectrometry data was performed using Skyline
   (v23.1). The Peptide Settings are as follows: Trypsin [KR/P] was used
   as the protease, with the maximum number of missed cleavages set to 0;
   peptide length is set to 6–25 amino acid residues; the fixed
   modification is Carbamidomethyl (C), and the variable modifications are
   Oxidation (M) and Acetyl (Protein N-term); the maximum number of
   modifications is set to 3. For the transition settings, the charge of
   the precursor ion is set to 2, 3, or 4, and the charge of the fragment
   ion is set to 1 or 2. The ion types are set to b, y, and p. The
   fragment ion selection includes ions with a mass-to-charge ratio (m/z)
   greater than that of the precursor ion, up to the second-to-last ion.
   The mass error tolerance for ion matching is set to 0.05 Da.

   Key reagents and instruments for LC–MS analysis are listed in
   Additional file 2: Table S1–S2.

External transcriptomic data analysis

Bulk RNA-seq data processing

   Transcriptomic profiles of ovarian cancer tissues and normal controls
   were analyzed using public datasets: TCGA-OV (The Cancer Genome Atlas
   Ovarian Cancer, n = 426 tumors) and GTEx (Genotype-Tissue Expression
   project, n = 88 normal ovarian tissues). Differential expression of
   IDO1 and TDO2 was assessed through GEPIA [36] (Gene Expression
   Profiling Interactive Analysis; http://gepia.cancer-pku.cn/).
   Significance thresholds were set at |log₂FC| > 1 and p-value < 0.01. Box
   plots were generated directly via the GEPIA interface.

Single-cell RNA-seq analysis

   Processed scRNA-seq data (GEO: GSE180661; Synapse: syn33521743)
   were obtained with pre-defined cell clusters from the original
   publication [37]. UMAP embeddings and cell-type annotations were
   directly utilized without re-clustering. Expression patterns of IDO1
   and TDO2 were visualized on the original UMAP projections.

Survival analysis

   Prognostic significance of TDO2 expression was evaluated using the
   online survival analysis tool [38]. Kaplan-Meier curves for 5-year
   OS/PFS were generated with optimal cutoff values determined by the
   tool’s auto-selection algorithm (p < 0.05 deemed significant).

RNA extraction and quantitative real-time PCR

   Total RNA was extracted from tissues using the TRIzol reagent (Thermo
   Fisher Scientific, Waltham, MA, USA; Cat# 15596026) and quantified with
   a NanoDrop 2000 spectrophotometer (Thermo Fisher Scientific, Waltham,
   MA, USA). cDNA synthesis was performed using the Evo M-MLV RT Master
   Mix (Accurate Biotechnology Co., Ltd., Hunan, China; Cat# AG11706)
   according to the manufacturer’s protocol. Quantitative real-time PCR
   (qRT-PCR) was subsequently carried out as previously described, with
   the ACTB gene (β-actin) serving as an internal reference for data
   normalization. Results were expressed as relative expression levels
   compared to the control group. All primers, synthesized by GENEWIZ
   (Suzhou, China), included the following sequences: TDO2 forward
   5′-CAAATCCTCTGGGAGTTGGA-3′ and reverse 5′-GTCCAAGGCTGTCATCGTCT-3′, and
   ACTB forward 5′-CATGTACGTTGCTATCCAGGC-3′ and reverse
   5′-CTCCTTAATGTCACGCACGAT-3′.

Immunohistochemistry (IHC)

   Tissues were dewaxed in xylene and rehydrated through a graded ethanol
   series. After PBS rinsing, sections were incubated in 0.01 mol L⁻¹
   sodium citrate buffer (pH 6.0) for high-temperature antigen
   retrieval. Endogenous peroxidases were blocked with 3% H₂O₂ at room
   temperature for 15 min. Tissues were then blocked with 5% goat serum in
   PBST and incubated overnight at 4 °C with the primary antibody:
   Anti-TDO2 polyclonal antibody (TDO2; Proteintech Group, Wuhan, China;
   Cat# 15880-1-AP; dilution 1:200). After PBS washes, sections were
   incubated with horseradish peroxidase (HRP)-conjugated goat anti-rabbit
   secondary antibodies at room temperature for 60 min. Positive
   signals were visualized using 3,3’-diaminobenzidine (DAB) staining
   solution (ZSGB-BIO, Beijing, China; Cat# ZLI-9018).

Immunofluorescence staining

   Tissues underwent identical dewaxing and rehydration steps as described
   for IHC. Antigen retrieval was performed in sodium citrate buffer (pH
   6.0) followed by PBS washes. Sections were blocked with 5% goat serum
   and incubated overnight at 4 °C with the following primary antibodies:
   α-Smooth Muscle Actin antibody (α-SMA; Proteintech Group, Wuhan, China;
   Cat# 67735-1-Ig; dilution 1:400) and TDO2 antibody (Proteintech, Cat#
   15880-1-AP; dilution 1:200). After multiple washes, tissues were
   incubated with species-matched fluorophore-conjugated secondary
   antibodies: Alexa Fluor 488 donkey anti-mouse (Thermo Fisher
   Scientific, Waltham, MA, USA; Cat# A-21202) and Alexa Fluor 594 donkey
   anti-rabbit (Cat# A-21207) at room temperature for 60 min. Nuclei were
   stained with 4’,6-diamidino-2-phenylindole (DAPI; Thermo Fisher
   Scientific, Waltham, MA, USA; Cat# D1306). Immunofluorescent images
   were acquired using an Olympus BX63 upright microscope (Evident
   Corporation, Tokyo, Japan).

Statistical analysis and machine learning

   Log[2] fold change (log[2]FC) was calculated as the ratio of the mean
   expression levels between compared groups. A two-sided unpaired Welch’s
   t test was performed for each comparison, and p-values were adjusted
   using the Benjamini–Hochberg correction. Proteins or metabolites with
   an adjusted p-value < 0.05 and an absolute log[2]FC > 0.25 were
   considered statistically significant. In the random forest analysis, a
   model with 1000 trees was constructed using the R package randomForest
   (version 4.7-1.1), with node size set to 5. To assess model
   generalizability, 100 iterations of tenfold stratified cross-validation
   were performed. Within each iteration, models were trained on 90% of
   the training cohort samples and predicted on the remaining 10%. After
   100 repetitions, each training cohort sample had accumulated multiple
   predictions; the final predicted probability was computed as the mean
   probability across all iterations, and the class with the larger mean
   probability was assigned as the predicted label for binary
   classification. Model performance
   metrics were subsequently derived from these aggregated probabilities.
   The decision threshold was optimized by maximizing the Youden index on
   the training cohort. For clinical deployment, a definitive random
   forest model (ntree = 1000, node size = 5) was retrained on the
   complete training cohort and validated on validation cohort 1. For
   validation in the independent validation cohort 2, generated by
   targeted proteomics and untargeted metabolomics, z-score normalization
   was applied before model validation. The same selected features were
   used in the random forest analysis on the independent validation
   cohort. Additionally, random forest analysis was performed on the omics
   features after z-score normalization, yielding identical classification
   results.
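
   The differential-abundance criteria described above can be illustrated
   with a minimal sketch. This is not the authors' code (their analysis
   was run in R); it is a standard-library Python illustration that
   assumes the Welch's t test p-values have already been computed
   elsewhere. The function names (log2_fold_change, benjamini_hochberg,
   significant_features) are hypothetical and chosen only for this example.

```python
import math


def log2_fold_change(group_a, group_b):
    """log2 of the ratio of group means (a over b); inputs are positive abundances."""
    mean_a = sum(group_a) / len(group_a)
    mean_b = sum(group_b) / len(group_b)
    return math.log2(mean_a / mean_b)


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted
    # values are forced to be monotone non-decreasing in rank.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def significant_features(names, log2fc, p_values, fdr=0.05, min_abs_log2fc=0.25):
    """Keep features with adjusted p < fdr and |log2FC| > min_abs_log2fc."""
    adjusted = benjamini_hochberg(p_values)
    return [
        name
        for name, fc, q in zip(names, log2fc, adjusted)
        if q < fdr and abs(fc) > min_abs_log2fc
    ]
```

   Both thresholds are applied as strict inequalities, matching the
   wording of the criteria (adjusted p-value < 0.05 and absolute
   log[2]FC > 0.25).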

Results

Study design and patient characteristics

   To characterize the metabolomic and proteomic profiles of plasma in
   EOC, we analyzed samples from the final cohort of 251 participants: 97
   women with EOC, 38 with BOT, 54 with benign ovarian tumors, and 62
   healthy controls (Additional file 1: Fig. S1). The training cohort and
   validation cohort 1 (combined n = 121) were recruited between April
   2023 and
   September 2023, while validation cohort 2 (n = 130) was enrolled as an
   independent cohort from September 2023 to April 2024. Demographic and
   clinical characteristics of all participants are summarized in
   Table [79]1. BMI was balanced across groups. Age did not differ
   significantly between EOC patients and healthy controls.

   Plasma samples from the training cohort and validation cohort 1 were
   analyzed using 4D data-independent acquisition (DIA) quantitative
   proteomics and liquid chromatography-mass spectrometry (LC–MS)-based
   untargeted metabolomics. For validation cohort 2, targeted proteomics
   (4D parallel reaction monitoring, PRM) and LC–MS-based untargeted
   metabolomics were employed to validate the findings (Fig. [80]1).

Fig. 1.

   Study design for machine-learning-based classifier development for EOC
   patients. We first procured samples in a training cohort for proteomic
   and metabolomic analysis. The classifier was then validated in an
   independent validation cohort 1, followed by a validation cohort 2

Plasma metabolomic profiling of EOC, BOT, benign ovarian tumor and healthy
controls

   Our data demonstrated high consistency and reproducibility. In QC
   analysis, the median CV values for metabolomic and proteomic data were
   5.6% and 18.7%, respectively (Fig. [82]2A). Reproducibility of
   metabolite extraction and detection was further confirmed by
   overlapping total ion chromatograms (TICs) from mass spectrometry
   analysis of QC samples (Additional file 1: Fig. S2A–S2B). The high
   stability of the instrument ensured data reliability. Principal
   component analysis (PCA) was performed to assess overall protein
   differences between groups and intra-group variability. Tight
   clustering of QC samples in PCA plots indicated robust system
   performance and high data quality (Additional file 1: Fig. S2C–S2D). A
   total of 4362 metabolites were identified and quantified, with details
   provided in Additional file 2: Table S3. These metabolites spanned
   multiple categories, including benzene and substituted derivatives (the
   most abundant), amino acids and their metabolites, heterocyclic
   compounds, and organic acids and their derivatives (Additional file 1:
   Fig. S2E). Untargeted metabolomics analysis revealed clear separation
   between EOC patients and healthy controls or benign ovarian tumor
   groups, while BOT exhibited partial overlap (Fig. [83]2B). Notably, the
   metabolomic profiles of benign ovarian tumors and healthy controls were
   highly similar, suggesting shared biological features. To identify
   altered plasma metabolites in EOC, pairwise comparisons were performed
   using a two-sided unpaired Welch’s t test with an FDR < 0.05 and
   |log₂FC| > 0.25. Significant differential metabolites were identified:
   1238 in EOC vs. healthy controls, 860 in EOC vs. benign ovarian tumors,
   and 256 in EOC vs. BOT (Fig. [84]2C–E). Hierarchical clustering showed
   distinct separation between EOC and healthy control samples, while
   benign and BOT samples exhibited higher similarity to EOC (Additional
   file 1: Fig. S2F). Benign and BOT groups were subsequently combined
   into a non-malignant ovarian tumor group for further analysis. K-means
   clustering grouped the 4362 metabolites into discrete clusters with
   distinct abundance patterns across the three groups (Fig. [85]2F). We
   observed an apparent discrimination between EOC and other groups under
   Orthogonal Partial Least Squares Discriminant Analysis (OPLS-DA)
   (p < 0.05; Fig. [86]2G and [87]H). A permutation test (200
   permutations) supported the goodness of fit and predictive ability of
   the OPLS-DA model, with R^2Y (cumulative [cum]) = 0.978 and Q^2
   (cum) = 0.786.
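
   For readers who want to reproduce a QC summary of the kind reported at
   the start of this section (median CV across features), a minimal
   sketch follows. It assumes that CV is the sample standard deviation
   divided by the mean, computed per feature across repeated injections
   of the pooled QC sample; the text does not state whether raw or
   transformed intensities were used, and the function names are
   hypothetical.

```python
import statistics


def coefficient_of_variation(values):
    """CV in percent: sample standard deviation divided by the mean, times 100."""
    mean = statistics.fmean(values)
    return statistics.stdev(values) / mean * 100.0


def median_cv(qc_intensities):
    """Median CV over all features.

    qc_intensities maps a feature name to its intensities across
    repeated injections of the pooled QC sample (assumed layout).
    """
    return statistics.median(
        coefficient_of_variation(values) for values in qc_intensities.values()
    )
```

   A lower median CV indicates more reproducible quantification across
   QC injections.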

Fig. 2.

   Plasma metabolomic profiling of EOC, BOT, benign ovarian tumor, and
   healthy controls. A Coefficients of variation (CV) for metabolomic and
   proteomic data were calculated using pooled quality control (QC)
   samples derived from the training cohort and validation cohort 1. B
   Principal component analysis (PCA) of plasma samples in the training
   cohort was performed based on 4362 quantified metabolites. C–E Volcano
   plots were generated to compare EOC with BOT, EOC with benign ovarian
   tumors, and EOC with healthy controls. Metabolites with a log[2]FC
   greater than 0.25 or less than − 0.25, and an adjusted p-value lower
   than 0.05, were considered significantly differentially abundant. The
   number of significantly depleted (green) and enriched (red) metabolites
   is indicated at the top of each plot. F The 4362 metabolites were
   clustered using K-means into significant discrete clusters to
   illustrate the relative expression changes in the metabolomics data.
   The groups in the metabolomics data include EOC, Non-Malignant (BOT and
   benign ovarian tumors), and healthy controls. G Orthogonal partial
   least squares-discriminant analysis (OPLS-DA) was applied to plasma
   metabolomics data to distinguish between EOC patients and Non-OC (BOT,
   benign ovarian tumors, and healthy controls) in the training set. EOC
   samples are shown in green, and Non-OC samples are shown in orange. H
   The validity of OPLS-DA model in G was confirmed by a permutation test
   (permutation times n = 200), showing no overfitting. R^2Y measures the
   goodness of fit, while Q^2 measures the predictive ability of the model

Plasma proteomic profiling of EOC, BOT, benign ovarian tumor, and healthy
controls

   Altogether, 2753 proteins were identified and quantified (Fig. [90]3A).
   Detailed protein information is provided in Additional file 2: Table
   S4. Unsupervised PCA of proteomics data (without prior feature
   selection) revealed partial separation between plasma samples from EOC
   patients and other groups (Fig. [91]3B). To characterize altered plasma
   proteins in EOC, we performed pairwise comparisons to identify
   significantly differentially abundant proteins. A total of 561, 385,
   and 208 proteins showed significant abundance differences (two-sided
   unpaired Welch’s t test, FDR < 0.05, |log2FC|> 0.25) in EOC vs. healthy
   controls, EOC vs. benign ovarian tumors, and EOC vs. BOT, respectively
   (Fig. [92]3C–E). Benign ovarian tumors and BOT were combined into a
   non-malignant group for analysis. K-means clustering analysis
   categorized the 2753 proteins into distinct expression patterns across
   the three groups: EOC, non-malignant ovarian tumors, and healthy
   controls (Fig. [93]3F).

Fig. 3.

   Plasma proteomic profiling of EOC, BOT, benign ovarian tumors and
   healthy controls. A Bar chart shows the following: Database: total
   protein sequences in the reference database; Peptides: total identified
   peptides; Identified Proteins: total identified proteins;
   Quantification Proteins: total quantified proteins. B PCA of plasma
   samples (training cohort) based on 2753 quantified proteins. EOC
   samples show partial separation from other groups. C–E Volcano plots
   were generated to compare EOC with BOT, EOC with benign ovarian tumors,
   and EOC with healthy controls. Proteins with a log[2]FC greater than
   0.25 or less than −0.25 and an adjusted p-value lower than 0.05 were
   considered
   significantly differentially abundant. The number of significantly
   down- (green) and up- (red) regulated proteins is indicated at the top
   of each plot. F K-means clustering of 2753 proteins across three
   groups: EOC, non-malignant (BOT and benign ovarian tumors), and healthy
   controls. Sub class 1–8 represent distinct co-expression patterns

Metabolite biomarker screening for EOC diagnosis and differential diagnosis

   We identified 355 metabolites with increased abundance and 121
   metabolites with decreased abundance in the plasma of EOC patients
   compared to both non-malignant ovarian tumor patients and healthy
   controls (Fig. [96]4A). Among these, 244 metabolites were detected in
   the validation cohort 2. Detailed metabolite information is provided in
   Additional file 2: Table S5. To trace the potential biological origins
   of these metabolites, we utilized MetOrigin [[97]39], a freely
   available bioinformatics tool. The analysis revealed that the metabolites
   originated from multiple sources: host metabolism, gut microbiota
   activity, host-microbiota co-metabolism, and other exogenous factors
   (e.g., diet, drugs, environmental exposures, or unknown sources)
   (Fig. [98]4B). From this pool, 75 metabolites derived from host,
   microbiota, or co-metabolism were selected as candidate diagnostic
   biomarkers (Fig. [99]4C–D). Notably, kynurenine (Kyn), indole, and
   3-hydroxybutyrate demonstrated the highest reproducibility in abundance
   levels (Fig. [100]4E) and diagnostic performance (Fig. [101]4F). To
   confirm metabolite identity, we generated mirror plots comparing
   experimental MS/MS spectra with reference spectral libraries, showing
   high similarity (Additional file 1: Fig. S3A–S3C).

Fig. 4.

   Metabolite biomarker screening for EOC diagnosis and differential
   diagnosis. A Venn diagrams illustrating overlaps between significantly
   enriched (increased abundance) and depleted (decreased abundance)
   metabolites in EOC vs. healthy controls and EOC vs. non-malignant
   ovarian tumors. Metabolites labeled in red represent consistently
   dysregulated candidates across both comparisons (FDR < 0.05,
   |log[2]FC|> 0.25). B Biological origin of red-labeled metabolites in A
   predicted by MetOrigin, integrating annotations from KEGG (Kyoto
   Encyclopedia of Genes and Genomes) and HMDB (Human Metabolome
   Database). Origins were categorized as host-derived,
   microbiota-derived, host-microbiota co-metabolism, or exogenous sources
   (e.g., diet, drugs). C Venn diagram showing overlaps between host- and
   microbiota-associated metabolites. D Heatmap of host- and
   microbiota-derived metabolites in B. Relative metabolite abundance is
   color-coded (blue: low; red: high; z-score normalized). E Validation of
   three candidate metabolites (kynurenine, indole, 3-hydroxybutyrate)
   showing consistent abundance differences between EOC and other groups
   in both the training cohort and validation cohort 2. Asterisks indicate
   statistical significance based on unpaired two-sided Welch’s t test. p
   value: *, < 0.05; **, < 0.01; ***, < 0.001. F Receiver operating
   characteristic (ROC) curves demonstrating diagnostic performance (AUC
   values) of the three metabolites in the training cohort and validation
   cohort 2

Protein biomarker screening for EOC diagnosis and differential diagnosis

   We identified 89 proteins with significantly increased abundance and 71
   proteins with decreased abundance in plasma from EOC patients compared
   to both non-malignant ovarian tumor patients and healthy controls
   (Fig. [104]5A). Seventeen proteins exhibiting > 50% missing values
   across samples were excluded to ensure data robustness (Fig. [105]5B).
   Using random forest analysis, we ranked proteins by mean decrease
   accuracy and selected the top 25 candidate features in the training
   cohort (Fig. [106]5C). Subsequent validation via 4D-PRM (Parallel
   Reaction Monitoring) targeted proteomics confirmed 12 proteins with
   reproducible quantification (Fig. [107]5D). The details of proteins are
   shown in Additional file 2: Table S6. Four proteins demonstrated the
   most pronounced differential abundance between EOC and other groups:
   Leucine-rich alpha-2-glycoprotein (LRG1), Inter-alpha-trypsin inhibitor
   heavy chain H3 (ITIH3), Protein disulfide-isomerase A4 (PDIA4), and
   Serum paraoxonase/arylesterase 1 (PON1) (Fig. [108]5E). Receiver
   operating characteristic (ROC) analysis revealed excellent diagnostic
   performance for these proteins in both training cohort and validation
   cohort 2 (Fig. [109]5F).

Fig. 5.

   Protein biomarker screening for EOC diagnosis and differential
   diagnosis. A Venn diagrams showing overlaps between significantly
   upregulated (red) and downregulated (blue) proteins in EOC vs. healthy
   controls and EOC vs. non-malignant ovarian tumors. Proteins labeled in
   red are consistently dysregulated across both comparisons. B Heatmap of
   z-score normalized abundance for differential proteins identified in A.
   Columns represent sample groups; rows represent proteins. C Top 25
   proteins ranked by mean decrease accuracy from random forest analysis.
   D Heatmap of 12 proteins validated by 4D-PRM targeted proteomics. E
   Abundance levels of four candidate proteins (LRG1, ITIH3, PDIA4, PON1)
   in EOC, non-malignant ovarian tumors, and healthy controls. Asterisks
   indicate statistical significance based on unpaired two-sided Welch’s t
   test. p value: *, < 0.05; **, < 0.01; ***, < 0.001. F ROC curves
   showing diagnostic performance of the four proteins in the training
   cohort and validation cohort 2

Identification of EOC patients using machine learning

   We developed a random forest classifier to discriminate EOC patients
   from non-malignant ovarian tumors (benign tumors and BOT) and healthy
   controls, using 4 proteins (LRG1, ITIH3, PDIA4, PON1) and 3 metabolites
   (kynurenine, indole, 3-hydroxybutyrate) in the training cohort (n = 34
   EOC vs. 62 non-OC). The 7 selected biomarkers exhibited no missing
   values across all cohorts. The model achieved excellent performance in
   the training cohort, with AUC = 0.975 (95% CI 0.943–0.997),
   sensitivity = 95.2%, and specificity = 91.2% (Fig. [112]6A). The
   optimal threshold of 0.423 (annotated in Fig. [113]6A) was identified
   by maximizing the Youden index on the training cohort. Feature
   importance was ranked by mean decrease in accuracy (Fig. [114]6B). We
   then tested the classifier on an independent validation cohort 1 of 25
   patients. This classifier reached an AUC of 0.962 (95% CI 0.878–1.000)
   in the validation cohort 1 (Fig. [115]6C–D). To further validate this
   classifier, we applied it to validation cohort 2, which included 51 EOC
   patients, and achieved an AUC of 0.965 (95% CI 0.921–0.995)
   (Fig. [116]6E–F). Precision-Recall (PR) curves demonstrating the
   trade-off between positive predictive value and sensitivity across the
   training cohort, validation cohort 1, and validation cohort 2 are
   comprehensively presented in Additional file 1: Fig. S4A–S4C. Confusion
   matrix and all performance metrics under the optimal threshold were
   evaluated on the training cohort, validation cohort 1, and validation
   cohort 2 (Additional file 1: Fig. S4D–S4G). Decision Curve Analysis
   (DCA) and Clinical Impact Curve (CIC) were employed to quantify the
   clinical utility of our model (Additional file 1: Fig. S4H–S4I). DCA
   demonstrated superior net benefit of our model (red curve) over
   “treat-all” and “treat-none” strategies within the threshold range
   0.1–1.0. CIC demonstrated exceptional concordance between predicted and
   actual cases at thresholds above 0.4. Furthermore, the machine learning
   model demonstrated higher discriminative power than CA-125 and HE4 in
   distinguishing EOC from non-malignant ovarian tumors (Additional
   file 1: Fig. S4J–S4K). To address clinical utility across disease
   spectra, we evaluated the model in key subgroups defined by FIGO stage
   and histology (cohort details in Table [117]1). In the stratified
   assessment of validation cohort 2 (Additional file 1: Fig. S5), the
   machine learning model demonstrated consistently strong performance
   across key subgroups, albeit with expected variation reflective of
   disease biology. For early-stage EOC, the model achieved an AUC of
   0.899 (95% CI 0.772–0.986). For late-stage EOC, the model achieved an
   AUC of 0.996 (95% CI 0.987–1.000), which may support treatment
   stratification decisions. In HGSOC patients, discriminative
   power remained excellent with an AUC of 0.967 (95% CI 0.915–0.997).
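
   The way a single operating threshold can be derived from repeated
   cross-validation is sketched below. This is an illustration under
   stated assumptions, not the authors' implementation: each repetition
   is assumed to yield one out-of-fold EOC probability per training
   sample, a sample is called positive when its mean probability is at or
   above the threshold, and the function names are hypothetical.

```python
def mean_probabilities(repeated_predictions):
    """Average out-of-fold probabilities per sample.

    repeated_predictions is a list with one dict per repetition,
    each mapping a sample ID to its predicted probability of EOC.
    """
    totals, counts = {}, {}
    for repetition in repeated_predictions:
        for sample_id, prob in repetition.items():
            totals[sample_id] = totals.get(sample_id, 0.0) + prob
            counts[sample_id] = counts.get(sample_id, 0) + 1
    return {sid: totals[sid] / counts[sid] for sid in totals}


def youden_threshold(probabilities, labels):
    """Return (threshold, sensitivity, specificity) maximizing Youden's J.

    J = sensitivity + specificity - 1. labels use 1 for EOC and 0 for
    non-OC. Candidate thresholds are the observed probabilities; ties in
    J are resolved in favor of the lowest threshold.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best, best_j = (None, 0.0, 0.0), -1.0
    for threshold in sorted(set(probabilities)):
        tp = sum(1 for p, y in zip(probabilities, labels) if y == 1 and p >= threshold)
        tn = sum(1 for p, y in zip(probabilities, labels) if y == 0 and p < threshold)
        sensitivity, specificity = tp / n_pos, tn / n_neg
        j = sensitivity + specificity - 1
        if j > best_j:
            best_j, best = j, (threshold, sensitivity, specificity)
    return best
```

   Fixing the threshold on the training cohort and then applying it
   unchanged to the validation cohorts avoids re-tuning on the data used
   for evaluation.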

Fig. 6.

   Identification of EOC patients using machine learning. A ROC curve of
   the random forest model in the training cohort (n = 34 EOC vs. 62
   non-OC). The model achieved an AUC of 0.975. B Feature importance
   ranking based on mean decrease in accuracy for the 7 biomarkers. C ROC
   curve of the model in independent validation cohort 1 (n = 25; 12 EOC
   vs. 13 non-OC) without batch correction, showing an AUC of 0.962. D
   Performance of the classifier in the validation cohort 1. E ROC curve
   of the classifier in independent validation cohort 2 (n = 51 EOC vs. 79
   non-OC) after batch correction, with an AUC of 0.965. F Performance of
   the classifier in the validation cohort 2

Proteomic and metabolomic alterations in EOC plasma

   We identified 561 differentially abundant proteins (|log2FC|> 0.25,
   FDR < 0.05) in plasma from EOC patients vs. healthy controls
   (Fig. [120]3C). Subcellular localization predicted by WoLF PSORT
   software showed predominant extracellular (31.55%) and cytoplasmic
   (29.59%) localization of these proteins (Additional file 1: Fig. S6A).
   Further, Gene Ontology (GO), euKaryotic Orthologous Groups (KOG), and
   Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses were employed
   to characterize dysregulated mechanisms in EOC. Biological processes
   indicated that differentially abundant proteins were enriched in cell
   adhesion, innate immune response, intracellular protein transport, and
   signal transduction. Cellular components analysis showed that
   differentially abundant proteins were localized in both the
   extracellular space and nucleus. Molecular function of GO analysis
   revealed differentially abundant proteins related to calcium ion
   binding and identical protein binding (Fig. [121]7A). Additionally,
   KEGG pathway analysis indicated that differentially abundant proteins
   were enriched in metabolic pathway and PI3K-Akt signaling pathway
   (Fig. [122]7B–C). KOG analysis was conducted using the KOG database to
   classify proteins based on their functional categories. KOG analysis
   indicated that differentially abundant proteins were enriched in signal
   transduction mechanisms and posttranslational modification, protein
   turnover, and chaperones (Fig. [123]7D and Additional file 1: Fig.
   S6B). Protein–protein interaction (PPI) analysis was performed using
   the STRING database [[124]40]. Differentially abundant proteins with a
   confidence score > 400 were used to construct the PPI network. Hub
   proteins with |log[2]FC|> 0.45 and a node degree > 18 were identified
   and visualized using Cytoscape (version 3.10.0) (Additional file 1:
   Fig. S7A) [[125]41]. The top three highest-scoring hub modules were
   identified using the MCODE plugin [[126]42] in Cytoscape (Fig. [127]7E,
   Additional file 1: Fig. S7B–S7C). Of particular interest, ITIH3 and
   LRG1 were present in the top-scoring hub module, whereas PDIA4 and PON1
   were identified in the third top-scoring hub module, suggesting their
   pivotal roles in EOC pathogenesis.
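
   The hub-protein filter described above can be sketched as follows.
   The sketch assumes that interaction scores are on the 0 to 1000 scale,
   that node degree is counted after low-confidence edges are removed,
   and that the fold-change filter is applied last; the text does not
   specify the order of these steps, and the function name hub_proteins
   is hypothetical.

```python
from collections import defaultdict


def hub_proteins(edges, log2fc, min_score=400, min_abs_log2fc=0.45, min_degree=18):
    """Select hub proteins from a scored interaction list.

    edges is an iterable of (protein_a, protein_b, combined_score).
    log2fc maps a protein to its log2 fold change (EOC vs. control).
    """
    neighbors = defaultdict(set)
    for a, b, score in edges:
        # Keep only interactions above the confidence cutoff; ignore self-loops.
        if score > min_score and a != b:
            neighbors[a].add(b)
            neighbors[b].add(a)
    return sorted(
        protein
        for protein, partners in neighbors.items()
        if len(partners) > min_degree
        and abs(log2fc.get(protein, 0.0)) > min_abs_log2fc
    )
```

   Using sets of neighbors means that duplicate edges between the same
   pair of proteins are counted once when computing node degree.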

Fig. 7.

   Proteomic alterations in EOC plasma compared to healthy controls. A
   Gene Ontology enrichment analysis of differentially abundant proteins
   showing the top five enriched terms in biological process, cellular
   component, and molecular function. B KEGG pathway enrichment analysis
   of differentially abundant proteins highlighting the top five pathways.
   C Heatmap of differentially abundant proteins associated with metabolic
   and PI3K-Akt signaling pathways. D Top five KOG classifications of
   differentially abundant proteins. E PPI of differentially abundant
   proteins revealed the hub module with the highest score, which was
   identified using the MCODE plugin in Cytoscape. Node size corresponds
   to |log[2]FC|, edge width represents the combined score, and node color
   denotes subcellular localization

   Untargeted metabolomics analysis identified 964 metabolites with
   increased abundance and 274 metabolites with decreased abundance in EOC
   plasma compared to healthy controls (Fig. [130]2C). Among these, 326
   metabolites originating from host metabolism, gut microbiota activity,
   or host-microbiota co-metabolism were selected for pathway analysis
   (Additional file 1: Fig. S8).

   Pathway enrichment analysis revealed the top 20 dysregulated pathways
   (Fig. [131]8A), including sphingolipid metabolism, steroid hormone
   biosynthesis, and tryptophan metabolism (Fig. [132]8B). Pearson
   correlation analysis (r > 0.5, p < 0.01) was performed between
   differentially abundant proteins (proteins with > 50% missing values
   excluded) and metabolites. We identified 241 significant
   protein-metabolite pairs involving 52 metabolites and 87 proteins. Our
   analysis revealed that the majority of EOC-enriched plasma metabolites
   were positively associated with EOC-upregulated proteins (Fig. [133]9).
   Conversely, negative correlations were observed between most
   EOC-enriched metabolites and EOC-downregulated proteins.

Fig. 8.

   Metabolomic alterations in EOC plasma compared to healthy controls. A
   Pathway enrichment analysis of differentially abundant metabolites in
   EOC vs. healthy controls. P values are displayed as base-0.05
   logarithmic transformations [log₀.₀₅(Pvalue)]. Smaller P values
   correspond to larger logarithmic magnitudes, with log₀.₀₅(Pvalue) > 1
   indicating statistical significance (P < 0.05). B Heatmap of
   metabolites annotated to the top three pathways in A

Fig. 9.

   Pearson correlation analysis between differentially abundant
   metabolites and proteins in EOC vs. combined control groups
   (non-malignant tumors and healthy controls). Metabolites or proteins
   increased or decreased in EOC are labeled in red and blue,
   respectively. *p < 0.05, **p < 0.01

TDO2^+ fibroblasts are a contributing factor to enriched kynurenine

   Our study reveals that Kyn, a tryptophan (Trp) catabolite, is
   significantly enriched in the plasma of EOC patients (Fig. [138]4E),
   particularly those with HGSOC. Trp metabolism via the Kyn pathway is
   primarily mediated by the rate-limiting enzymes indoleamine
   2,3-dioxygenase (IDO1/2) and tryptophan 2,3-dioxygenase (TDO2)
   [[139]43]. Analysis of the TCGA-OV and GTEx datasets demonstrated
   significant upregulation of both IDO1 and TDO2 in ovarian tumor tissues
   vs. normal tissues (Fig. [140]10A and [141]B) [[142]36]. We evaluated
   the expression of IDO1 and TDO2 in scRNA-seq data from 160 tumor
   sites of 42 treatment-naive patients with HGSOC (GEO: [143]GSE180661;
   also available as processed objects from Synapse, syn33521743)
   [[144]37]. UMAP plots showed that IDO1 was lowly expressed and TDO2 was
   exclusively or highly expressed in fibroblasts rather than in other
   cell types (e.g., ovarian cancer cells, epithelial cells, immune cells
   and endothelial cells) (Fig. [145]10C and [146]D). Kaplan–Meier curves
   showed that the 5-year OS and PFS of the TDO2^high OC were
   significantly worse than those of the TDO2^low OC (P < 0.05,
   Fig. [147]10E and [148]F) [[149]38]. Experimental validation confirmed
   elevated TDO2 mRNA levels in HGSOC tissues vs. normal fallopian tubes
   (p < 0.0001, Fig. [150]10G). Representative IHC images of normal
   fallopian tubes, TDO2^high and TDO2^low HGSOC, are shown in
   Fig. [151]10H. TDO2^+ fibroblasts play a crucial role in promoting
   immune evasion and metastasis. In lung metastasis, TDO2^+ matrix
   fibroblasts facilitate the immune evasion of disseminated tumor cells
   and promote metastatic progression through the production of Kyn,
   suggesting that targeting stromal cell metabolism could be a
   therapeutic strategy for breast cancer patients with lung metastasis
   [[152]44]. In oral squamous cell carcinoma, TDO2^+ myofibroblasts
   attract T cells and induce the transformation of CD4^+ T cells into
   Tregs and cause CD8^+ T cell dysfunction, highlighting TDO2^+
   myofibroblasts as potential targets for immunotherapy [[153]45].
   Consistent with our observations, these two studies highlight that TDO2
   is predominantly expressed in fibroblasts. Immunofluorescence staining
   demonstrated the presence of TDO2^+ fibroblasts within the tumor
   microenvironment of HGSOC (Fig. [154]10I).

Fig. 10.

   TDO2^+ fibroblasts are a contributing factor to enriched kynurenine. A–B
   Box plots showing relative mRNA expression of IDO1 and TDO2 in OC
   (n = 426) vs. normal ovarian tissues (n = 88) in the TCGA dataset of OV
   and the GTEx projects. C–D UMAP plot of cells profiled by scRNA-seq
   colored by the expression of IDO1 and TDO2. Cell types are highlighted
   with grey outlines. E–F Kaplan–Meier survival analysis of OS and PFS in
   OC patients stratified by TDO2 expression (microarray data). G qRT-PCR
   validation of TDO2 upregulation in HGSOC vs. normal fallopian tube
   tissues. H Representative IHC images of normal fallopian tubes,
   TDO2^high and TDO2^low HGSOC. Scale bars: 200 μm (left) and 100 μm
   (right). I Immunofluorescence co-staining of α-SMA (fibroblast marker,
   green) and TDO2 (red) in HGSOC tissues. Nuclei counterstained with DAPI
   (blue)

Discussion

   This study performed comprehensive proteomic and metabolomic profiling
   of plasma from 251 women, including patients with EOC (n = 97), BOT
   (n = 38), benign ovarian tumors (n = 54), and age-matched healthy
   controls (n = 62). Using LC–MS, we identified 2753 proteins and 4362
   metabolites across these groups. We established a protein-metabolite
   machine learning model incorporating LRG1, ITIH3, PDIA4, PON1, Kyn,
   indole, and 3-hydroxybutyrate. This model was validated in an
   independent validation cohort (n = 130), achieving an AUC of 0.965. To
   our knowledge, this represents the largest plasma-based LC–MS proteomic
   and metabolomic analysis for EOC to date. Existing algorithms like ROMA
   and OVERA prioritize sensitivity (e.g., ROMA: 94% sensitivity at 75%
   specificity) [[157]14, [158]27], but their limited specificity often
   results in high false-positive rates [[159]46, [160]47], highlighting
   the need for more robust diagnostic tools. Our model demonstrated high
   accuracy in validation, underscoring the potential of multi-omics
   integration for preoperative risk stratification.

   Pathway analysis provided critical insights into EOC pathophysiology.
   After removing high-abundance proteins to enhance detection of
   low-abundance biomarkers, we quantified over 2000 proteins, revealing
   significant dysregulation in EOC patients compared to healthy controls.
   Key pathways included metabolic reprogramming—a hallmark of cancer
   [[161]48]—and PI3K-Akt signaling, frequently implicated in ovarian
   carcinogenesis. Notably, sphingolipid metabolism emerged as the most
   perturbed pathway, consistent with its established role in cancer
   progression. Sphingosine-1-phosphate (S1P), a pro-tumorigenic
   metabolite enriched in EOC plasma, promotes chemoresistance,
   metastasis, and immune evasion via the sphingosine kinase 1–S1P
   receptor 1 axis [[162]49, [163]50], reinforcing sphingolipid signaling
   as a therapeutic target.

   We observed enrichment of Kyn in EOC plasma, and IF confirmed the
   presence of TDO2^+ fibroblasts within the HGSOC tumor microenvironment.
   Trp, an essential amino acid obtained solely from dietary sources, is
   metabolized primarily through three pathways: Kyn, 5-hydroxytryptamine,
   and indole [[164]51]. Increased levels of Trp catabolites—particularly
   Kyn, Indolepyruvate, N-Acetylserotonin, 5-Methoxyindoleacetate,
   5-Hydroxyindoleacetic acid, and Formyl-5-hydroxykynurenamine—indicate
   enhanced Trp metabolism via the Kyn and 5-hydroxytryptamine pathways.
   Both Kyn and indole are key Trp catabolites. Kyn derivatives can
   potently activate the aryl hydrocarbon receptor, leading to
   immunosuppression [[165]52]. Previous studies show Kyn progressively
   enriched along the colorectal adenoma-carcinoma sequence, with
   significantly higher plasma levels in advanced adenomas compared to
   non-advanced adenomas, accompanied by indole depletion. Furthermore,
   lower Kyn levels were associated with better chemotherapy response
   [[166]53].

   It is also important to consider the potential influence of
   pre-analytical factors (such as diet, fasting status, and medications)
   on the observed metabolomic profiles. While we implemented standardized
   sample collection protocols (e.g., fasting blood draws, controlled
   sample processing times) to mitigate these variations, residual
   confounding effects cannot be entirely ruled out. Moreover, although
   clear exclusion criteria were established, patients’ potential
   medication history was not specifically accounted for in this study.
   Future validation efforts should incorporate detailed documentation and
   analysis of these variables.

   Among the key biomarkers, LRG1 (an acute-phase protein), ITIH3 (a
   component of extracellular matrix stabilization complexes), and PDIA4
   (a disulfide isomerase linked to endoplasmic reticulum stress) emerged
   as robust diagnostic candidates. LRG1 and ITIH3 demonstrate pan-cancer
   utility in multi-marker panels for gastrointestinal and reproductive
   malignancies (e.g., colorectal, pancreatic, gastric, ovarian cancer)
   [[167]54–[168]59], highlighting the superiority of combinatorial
   strategies over single-marker approaches. PDIA4 was significantly
   elevated in EOC plasma, mirroring its tissue-specific overexpression in
   HGSOC [[169]60]. These proteins collectively reflect the systemic
   impact of EOC on host physiology, linking tumor-specific alterations
   with broader metabolic perturbations.

   Plasma levels of 3-hydroxybutyric acid (β-hydroxybutyrate, βHB), a
   ketone body derived from branched-chain amino acid catabolism, were
   elevated in EOC patients.
   Associations between plasma βHB levels and cancer risk vary: elevated
   levels correlate with reduced hepatocellular carcinoma and lymphoma
   risk [[170]61] but increased melanoma susceptibility [[171]62]. While
   evidence on βHB in cancer plasma is limited, its roles are
   context-dependent: it suppresses colorectal cancer progression by
   inhibiting HIF-1α/VEGFA signaling [[172]63] and inhibits glioma growth
   by activating pro-inflammatory astrocytes [[173]64]. Conversely, βHB
   promotes pancreatic cancer metastasis through HMGCL-driven ketogenesis
   [[174]65], exacerbates CRC proliferation via ACAT1-mediated IDH1
   acetylation [[175]66], and contributes to chemoresistance in bladder
   cancer through OXCT1-dependent metabolic reprogramming [[176]67]. This
   dualistic behavior underscores the tissue-specific metabolic
   adaptations in cancer and positions βHB as a potential therapeutic
   target in EOC, warranting further mechanistic investigation.

   Currently, OC screening is not recommended for low- or high-risk
   populations. In low-risk groups, screening fails to reduce mortality
   while increasing morbidity through false positives, unnecessary
   surgeries, and associated complications. Even in high-risk populations
   (e.g., BRCA mutation carriers), screening does not lower mortality
   despite enabling earlier detection and improved surgical outcomes;
   annual TVUS screening in BRCA1 carriers was associated with a fourfold
   higher 10-year mortality risk compared to risk-reducing bilateral
   salpingo-oophorectomy (rrBSO) [[177]68]. Abandoning screening
   necessitates alternative risk mitigation strategies. Population-based
   multigene testing represents a cost-effective approach for breast and
   ovarian cancer prevention, surpassing family history-based testing in
   economic analyses [[178]69]. This shift from reactive screening to
   proactive genetic risk stratification aligns with precision oncology,
   enabling personalized interventions (e.g., rrBSO in high-risk carriers)
   while avoiding universal screening pitfalls.

   Our findings have multifaceted clinical implications. The early
   detection of OC, particularly HGSOC, remains a formidable challenge due
   to minimal systemic perturbations and subtle biofluid alterations
   during initial stages. These characteristics likely contribute to the
   limited sensitivity of peripheral non-invasive detection methods, as
   early-stage tumors may evade recognition by conventional single-analyte
   assays. No universally accepted gold-standard screening approach exists
   for early ovarian cancer, underscoring the critical need for innovative
   strategies to address this unmet clinical demand given the high
   lethality of epithelial ovarian malignancies.

   Emerging evidence suggests multi-modal diagnostic frameworks,
   particularly those integrating artificial intelligence (AI)-driven
   predictive models, can overcome single-marker limitations. For example,
   a multi-criteria classification fusion (MCF) model using 52 parameters
   (51 laboratory tests and age) achieved an AUC of 0.949 in internal
   validation, with robust generalizability in external cohorts (AUC
   0.882–0.884) [[179]70]. Strikingly, this model outperformed traditional
   biomarkers (CA-125, HE4) in early-stage detection and retained
   diagnostic efficacy even after excluding tumor-specific markers,
   highlighting the power of multi-parameter systems. Although CA-125 and
   HE4 are widely utilized for ovarian cancer diagnosis, their sensitivity
   and specificity are limited by disease heterogeneity and interference
   from inflammatory or benign gynecological conditions (e.g.,
   endometriosis) [[180]71]. In the present study, given that healthy
   controls did not undergo CA-125/HE4 testing, we exclusively compared
   our diagnostic model with CA-125/HE4 in distinguishing EOC from
   non-malignant ovarian tumors. Our model achieved a higher AUC than
   either biomarker alone. Proteomics captures dynamic functional
   proteins within the tumor microenvironment, whereas metabolomics
   reflects downstream metabolic reprogramming signatures. Integrating
   these approaches provides a more comprehensive profile of tumor
   biology and mitigates bias from single-marker reliance. Recent
   colorectal cancer work [[181]72] likewise showed that a combined
   protein-metabolite diagnostic model built with logistic regression
   effectively distinguished CRC patients from healthy individuals.
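
   As a minimal illustration of why a combined score can outperform a
   single marker, the sketch below computes the AUC as a rank statistic
   (the probability that a randomly chosen case scores higher than a
   randomly chosen non-case). All values are synthetic and are not study
   data. The function name auc, the marker names, and the unweighted sum
   used as the combined score are assumptions made for illustration; the
   sum merely stands in for a fitted model such as logistic regression.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Rank-based AUC: P(score of a case > score of a non-case); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Synthetic toy data (NOT from the study): 1 = case, 0 = non-case.
labels = [1, 1, 1, 0, 0, 0]
marker_a = [0.9, 0.4, 0.8, 0.5, 0.2, 0.3]  # hypothetical protein level
marker_b = [0.3, 0.9, 0.7, 0.2, 0.6, 0.1]  # hypothetical metabolite level

# Assumption: an unweighted sum replaces a fitted multi-marker model.
combined = [a + b for a, b in zip(marker_a, marker_b)]

auc_a = auc(marker_a, labels)          # 8/9, about 0.889
auc_b = auc(marker_b, labels)          # 8/9, about 0.889
auc_combined = auc(combined, labels)   # 1.0 on this toy data
```

   In this toy example each marker misranks one case, but the two errors
   fall on different samples, so the combined score separates the groups
   completely. Real data will rarely behave this cleanly.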

   Building on this paradigm, we employed LC–MS-based proteomic and
   metabolomic profiling to characterize plasma alterations across dual
   omics layers in OC. The limited availability of early-stage HGSOC
   samples, a recognized bottleneck in OC research, constrained our
   analysis; nevertheless, integrative multi-omics data fusion
   significantly enhanced detection accuracy compared to single-modality
   approaches. Although the sample size precluded developing a dedicated
   early-stage diagnostic model, our findings robustly support the
   hypothesis that multi-analyte integration improves diagnostic
   precision.

   These results align with prior reports emphasizing the power of
   combinatorial biomarkers. The MCF model’s sustained performance without
   key tumor markers parallels our observation that non-tumor-derived
   proteomic/metabolomic features contribute significantly to
   classification accuracy. This suggests that systemic metabolic
   dysregulation and microenvironmental remodeling, beyond tumor-secreted
   factors alone, may underpin detectable biofluid signatures even in
   early disease.

   Our study provides a reliable non-invasive preoperative assessment tool
   for ovarian tumors, addressing critical limitations in current
   diagnostics. The high positive predictive value (PPV) of our
   multi-analyte panel minimizes false positives, reducing unnecessary
   surgeries, mitigating risks from aggressive procedures, and optimizing
   preoperative resource allocation—particularly for elderly populations
   unsuitable for extensive surgery. Its liquid biopsy-based methodology
   enhances patient compliance compared to invasive tissue sampling. For
   patients identified as high-risk for EOC, immediate referral to
   specialized gynecologic oncology centers enables timely implementation
   of standardized preoperative protocols, including advanced imaging,
   multidisciplinary planning, and risk-adapted surgical strategies. This
   triage system improves surgical outcomes and reduces delays in
   initiating adjuvant therapy for confirmed malignancies.
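
   Because PPV depends on disease prevalence in the tested population as
   well as on sensitivity and specificity, the following sketch applies
   Bayes' rule to show that dependence. The function name ppv and the
   numeric inputs are hypothetical illustrations, not values estimated in
   this study.

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value from Bayes' rule.

    PPV = TP / (TP + FP), with TP and FP expressed as population fractions.
    """
    for name, value in (("sensitivity", sensitivity),
                        ("specificity", specificity),
                        ("prevalence", prevalence)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1]")
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    if true_pos + false_pos == 0.0:
        raise ValueError("no positive test results are possible")
    return true_pos / (true_pos + false_pos)


# Hypothetical inputs (NOT study estimates): same test, two settings.
ppv_enriched = ppv(0.90, 0.90, 0.50)   # high-prevalence referral cohort
ppv_screening = ppv(0.90, 0.90, 0.01)  # low-prevalence general population
```

   With identical sensitivity and specificity, PPV is 0.90 at 50%
   prevalence but falls below 0.10 at 1% prevalence, so a PPV measured in
   a preoperative cohort with suspected adnexal masses should not be
   assumed to carry over to population screening.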

   The primary limitations include the monocentric design and relatively
   small sample size derived exclusively from an Asian population,
   potentially limiting generalizability to other ethnic groups and
   statistical power to detect subtle biomarker differences. Future
   multi-center studies involving diverse cohorts are needed to validate
   and refine the model. Leveraging AI-driven platforms for large-scale
   multi-omics data integration could enhance algorithmic robustness.
   Additionally, exploring dynamic biomarker panels adaptable to disease
   progression stages may further optimize clinical utility.

Conclusions

   Through integrated proteomic and metabolomic profiling of plasma
   samples, we identified disease-specific molecular signatures that
   effectively discriminate EOC from borderline and benign ovarian tumors,
   and healthy controls. Differential protein expression highlighted
   dysregulation of metabolic pathways and the PI3K-Akt signaling pathway
   in EOC patients. Metabolic pathway enrichment analysis revealed
   perturbations in sphingolipid metabolism, steroid hormone biosynthesis,
   and Trp metabolism. Notably, Kyn—a Trp catabolite—was specifically
   enriched in EOC plasma. TDO2+ fibroblasts in the tumor
   microenvironment are considered contributing factors to Kyn enrichment.
   A machine-learning model incorporating four proteins and three
   metabolites demonstrated robust diagnostic performance (training
   cohort: n = 96, AUC = 0.975), validated in independent cohorts
   (validation cohort 1: n = 25, AUC = 0.962; validation cohort 2:
   n = 130, AUC = 0.965). This noninvasive tool shows significant
   potential to improve preoperative EOC diagnosis. Before clinical
   implementation, further validation through multicenter studies
   involving larger, more heterogeneous patient cohorts is essential.

Supplementary Information

   [182]12916_2025_4341_MOESM1_ESM.docx (5.2MB, docx)

   Additional File 1: Figures S1–S8. Fig. S1 Flow diagram of study
   participants. Fig. S2 Quality control of metabolome analysis. A, Total
   ion chromatograms (TICs) of pooled QC samples in negative ion mode. B,
   TICs of pooled QC samples in positive ion mode. C, PCA of plasma
   samples and QC samples (training cohort) in negative ion mode. QC
   clusters tightly, indicating analytical stability. D, PCA of plasma
   samples and QC samples (training cohort) in positive ion mode. E, Donut
   chart illustrating the distribution of metabolites across categories.
   F, Hierarchical clustering analysis showing the similarity between
   sample groups based on metabolite expression patterns. Fig. S3 Mirror
   plots comparing experimental MS/MS spectra (top) with reference
   spectral library matches (bottom) for: A, Kynurenine; B, Indole; C,
   3-Hydroxybutyrate. Fig. S4 Performance assessment of the machine
   learning model. A, Precision-Recall Curves of Training Cohort. B,
   Precision-Recall Curves of Validation Cohort 1. C, Precision-Recall
   Curves of Validation Cohort 2. D, Confusion Matrix of Training Cohort.
   E, Confusion Matrix of Validation Cohort 1. F, Confusion Matrix of
   Validation Cohort 2. G, Performance metrics of Training Cohort,
   Validation Cohort 1 and Validation Cohort 2. H, Decision Curve Analysis
   (DCA). The blue curve plots net benefit of our model across threshold
   probabilities. Green line: "treat-none" strategy; red line:
   "treat-all" strategy. Analyses derived from validation cohort 2. I,
   Clinical Impact Curve (CIC). Blue curve: number of patients classified
   as high-risk at each threshold; red line: actual EOC cases among
   high-risk patients. Analyses derived from validation cohort 2. J,
   Performance of the machine learning model (blue line) versus CA-125
   (red line) in validation cohort 2. K, Performance of the machine
   learning model (blue line) versus HE4 (red line) in validation cohort
   2. Fig. S5 Stratified Performance of the Machine Learning Model in
   Validation Cohort 2. A, ROC curve for early-stage EOC classification
   (FIGO I-II, n = 16) versus non-OC (n = 79). B, ROC curve for late-stage
   EOC classification (FIGO III-IV, n = 35) versus non-OC (n = 79). C, ROC
   curve for HGSOC subtype classification (n = 34) versus non-OC (n = 79).
   Fig. S6 Proteomic alterations in EOC plasma. A, Subcellular
   localization of differentially abundant proteins predicted by WoLF
   PSORT. B, Heatmap of differentially abundant proteins enriched in KOG
   functional categories: Signal transduction mechanisms (T)
   and Post-translational modification, protein turnover, chaperones (O).
   Fig. S7 PPI Network Analysis of Hub Proteins (confidence score > 400).
   A, PPI network constructed using differentially abundant
   proteins (|log2FC| > 0.45, FDR < 0.05, node degree > 18). B–C, Second
   and third top-scoring modules identified by the MCODE plugin in
   Cytoscape. Circles and squares represent up-regulated and
   down-regulated proteins, respectively. Node size corresponds to
   |log2FC|, edge width
   represents the combined score, and node color denotes subcellular
   localization. Fig. S8 Heatmap of 326 metabolites derived from the host,
   microbiome, or potential co-metabolism.

   [183]12916_2025_4341_MOESM2_ESM.xlsx (13MB, xlsx)

   Additional File 2: Tables S1–S6. Table S1. Key reagents for LC–MS
   analysis. Table S2. Key instruments for LC–MS analysis. Table S3. The
   metabolite matrix (4,362 features) for 121 patients in the training
   cohort and validation cohort 1. Table S4. The protein matrix (2,753
   features) for 121 patients in the training cohort and validation cohort
   1. Table S5. The metabolite matrix (4,386 features) for 130 patients in
   the validation cohort 2. Table S6. The protein matrix (12 features) for
   130 patients in the validation cohort 2.

   [184]Additional file 3 (100.5KB, doc)

Acknowledgements