Abstract

Background

Existing biomarkers for epithelial ovarian cancer (EOC) have demonstrated limited sensitivity and specificity. This study aimed to investigate plasma protein and metabolite characteristics of EOC and identify novel biomarker candidates for noninvasive diagnosis and differential diagnosis.

Methods

In this prospective diagnostic cohort study, plasma was preoperatively collected from 536 consecutive patients presenting with imaging-suspected adnexal masses, uterine fibroids, or pelvic organ prolapse. After exclusions, the final cohort comprised 251 participants: EOC (n = 97), borderline ovarian tumors (n = 38), benign ovarian tumors (n = 54), and healthy controls (n = 62). Proteomic and metabolomic profiling was performed. A machine learning model was trained on a training cohort (34 EOC patients and 62 non-OC individuals [borderline, benign, and healthy controls]) to distinguish EOC from other groups. The model was validated in two independent cohorts: validation cohort 1 (n = 25) and validation cohort 2 (n = 130) using targeted proteomics and untargeted metabolomics. External transcriptomic datasets (TCGA-OV, GTEx bulk RNA-seq; GSE180661 scRNA-seq) were leveraged to validate TDO2 upregulation in ovarian cancer tissues, particularly in fibroblasts. This TDO2 upregulation was experimentally confirmed through quantitative PCR, immunohistochemistry, and immunofluorescence using clinical specimens.

Results

We identified significant protein alterations in EOC patients’ plasma, implicating dysregulated metabolic and PI3K-Akt signaling pathways. Metabolite analysis further revealed aberrant sphingolipid metabolism, steroid hormone biosynthesis, and tryptophan metabolism in EOC patients’ plasma. A diagnostic panel comprising 4 proteins (LRG1, ITIH3, PDIA4, and PON1) and 3 metabolites (kynurenine, indole, and 3-hydroxybutyrate) achieved an AUC of 0.975 (95% CI 0.943–0.997) with 95.2% sensitivity and 91.2% specificity in the training cohort.
Critically, the model demonstrated robust generalizability in two independent validation cohorts: validation cohort 1 (AUC = 0.962, 95% CI 0.878–1.000) and validation cohort 2 (AUC = 0.965, 95% CI 0.921–0.995). Furthermore, fibroblasts with high expression of tryptophan 2,3-dioxygenase (TDO2) contribute to the elevated kynurenine levels.

Conclusions

Our findings provide novel insights into the EOC metabolic and protein landscape. We developed and validated a plasma classifier demonstrating high sensitivity and specificity, which effectively distinguishes EOC patients from non-OC individuals. This classifier could enhance preoperative diagnostic accuracy and aid in differential diagnosis.

Supplementary Information

The online version contains supplementary material available at 10.1186/s12916-025-04341-2.

Keywords: Proteome, Metabolome, EOC, Diagnosis, Differential diagnosis

Background

Ovarian cancer (OC) remains the most lethal gynecological malignancy, accounting for over 200,000 annual deaths globally [1]. Histologically, epithelial ovarian cancer (EOC) constitutes > 90% of cases, classified into five major subtypes: high-grade serous (up to 75%), low-grade serous (< 5%), endometrioid (10%), clear cell (6%), and mucinous carcinomas (< 5%). Borderline serous/mucinous tumors, characterized by moderate cellular proliferation and atypia, represent an intermediate pathological category [2, 3]. Approximately 75% of EOC patients present with advanced-stage disease at diagnosis due to the disease’s asymptomatic nature, correlating with stark survival disparities: 92% 5-year survival for early-stage disease vs. 29% for late-stage disease [4–6]. Pelvic masses affect approximately one in six women during their lifetime, with the vast majority being benign and safely manageable by general gynecologists [7, 8].
Nevertheless, preoperative risk stratification remains critical, as numerous studies have consistently demonstrated that EOC patients achieve significantly better outcomes when their surgeries are performed by experienced gynecologic oncologists at specialized cancer centers [9, 10]. Effective symptom-triggered testing is crucial for timely referral to gynecologic oncologists, which may improve prognosis by increasing the likelihood of complete cytoreduction [11]. Even so, 50–70% of patients with adnexal masses (later pathologically confirmed as EOC) are not appropriately referred under current triage protocols [12].

Despite decades of biomarker research, clinical challenges persist. CA-125, though widely used, exemplifies these limitations: while elevated in 90% of advanced EOC cases, it detects only 50% of early-stage disease and yields frequent false positives in premenopausal women (e.g., in endometriosis) [8, 13, 14]. Two landmark screening trials (UKCTOCS and PLCO) demonstrated that CA-125 combined with transvaginal ultrasound increased early-stage detection rates but failed to reduce mortality [15–18]. This paradox underscores fundamental biological constraints: while screening improves stage shift, inherent tumor aggressiveness and limited therapeutic advances attenuate survival benefits [19].

Multi-marker strategies show incremental progress. Combinatorial algorithms like ROMA (Risk of Ovarian Malignancy Algorithm) integrate HE4, CA-125, and menopausal status, achieving superior diagnostic accuracy [20–23]. Multivariate assays such as OVA1® (measuring β2-microglobulin, apolipoprotein A1, transthyretin, CA-125, and transferrin) and Overa® (incorporating HE4, CA-125, apolipoprotein A1, follicle-stimulating hormone, and TIMP-1) further refine preoperative risk assessment [24–27]. However, real-world data reveal persistent gaps: only 30–50% of confirmed EOC cases receive appropriate referrals [12].
Emerging multi-omics technologies offer transformative potential. Notably, proteomic and metabolomic approaches have proven useful in biomarker discovery [28–32]. Proteomic analyses have identified a novel biomarker panel (EEF1G, MSLN, BCAM, and TAGLN2) for high-grade serous ovarian cancer (HGSOC) [33]. Additionally, studies suggest that the combination of TTR, Hb, ApoAI, and TF with CA-125 could significantly improve the detection of early-stage ovarian cancer [34]. Concurrently, metabolomic profiling reveals that metabolic alterations may differentiate serous ovarian cancer from benign ovarian tumors, highlighting the potential of plasma metabolites as diagnostic biomarkers [35].

The evident transition from single-marker to multi-marker strategies in ovarian cancer screening underscores the enhanced diagnostic reliability of multimodal approaches. Building on this paradigm shift, our study aimed to develop an integrative algorithm combining proteomic profiles with metabolomic signatures, thereby capturing systemic pathophysiological alterations across biological hierarchies. Our study identified differentially abundant proteins and metabolites in the plasma of EOC patients, offering valuable insights for subsequent mechanistic exploration and novel tumor biomarker development. The innovation of this work lies in integrating downstream cellular products (proteins and metabolites) to comprehensively characterize the EOC plasma profile. Theoretically, these two molecular hierarchies exhibit the most significant pathological alterations, and our diagnostic approach synergizes the dynamic sensitivity of metabolites with the stability of proteins. Our preoperative diagnostic model demonstrated robust performance, constructed and validated using proteomic and metabolomic data from a real-world, unselected cohort of EOC patients.
Methods

Patients and samples

This prospective diagnostic cohort study was approved by the Independent Ethics Committee for Clinical Research and Animal Trials of the First Affiliated Hospital of Sun Yat-sen University (approval number: 2022–510). Written informed consent was obtained from all patients prior to any study procedures. Between April 2023 and April 2024, preoperative plasma samples were consecutively collected from 536 participants recruited from the Department of Obstetrics and Gynecology, the First Affiliated Hospital of Sun Yat-sen University: (1) patients with imaging-suspected adnexal masses (all ages); (2) patients predominantly aged around 50 years with uterine fibroids or pelvic organ prolapse. Histopathological diagnosis for all enrolled patients with treatment-naïve EOC, borderline ovarian tumors (BOT), or benign ovarian tumors was independently confirmed by two board-certified pathologists. Exclusion criteria were as follows: (1) active infections, (2) autoimmune diseases (e.g., systemic lupus erythematosus, rheumatoid arthritis), (3) chronic inflammatory disorders (e.g., inflammatory bowel disease), and (4) history of other malignant tumors. Patients with these conditions were excluded because of their potential confounding effects on plasma protein and metabolite profiles. The final cohort comprised 97 women with EOC, 38 women with BOT, 54 women with benign ovarian tumors, and 62 women with uterine fibroids or pelvic organ prolapse as healthy controls (Additional file 1: Fig. S1). The training cohort (n = 96) was used for biomarker discovery and model construction. Validation cohort 1 (n = 25) and validation cohort 2 (n = 130) were used to validate model performance. Plasma samples from patients in the training cohort and validation cohort 1 were collected between April 2023 and September 2023. Validation cohort 2 was enrolled as an independent cohort from September 2023 to April 2024.
Demographic and clinical information including age, body mass index (BMI), CA-125, HE4, family history of cancer, parity history, FIGO stage, and histological subtype was recorded (Table 1).

Table 1. Description of demographic and clinical characteristics of patients

The first four data columns refer to the training set and validation cohort 1 (n = 121); the last four refer to validation cohort 2 (n = 130). Column groups: EOC, epithelial ovarian cancer; BOT, borderline ovarian tumor; Benign, benign ovarian tumor; HC, healthy control.

| Characteristic | EOC (n = 46) | BOT (n = 16) | Benign (n = 26) | HC (n = 33) | EOC (n = 51) | BOT (n = 22) | Benign (n = 28) | HC (n = 29) |
|---|---|---|---|---|---|---|---|---|
| Age (year), mean (SD) | 53.17 (10.23) | 38.69 (9.82) | 44.19 (13.52) | 53.79 (6.98) | 54.24 (12.65) | 38.00 (13.49) | 47.00 (16.41) | 53.00 (10.62) |
| Height (cm), mean (SD) | 157.55 (5.20) | 161.13 (5.58) | 157.50 (5.23) | 157.82 (5.03) | 158.05 (5.02) | 158.00 (6.06) | 158.86 (5.84) | 158.07 (4.94) |
| Weight (kg), mean (SD) | 55.65 (7.17) | 57.57 (7.80) | 57.93 (10.28) | 56.76 (7.31) | 58.55 (8.99) | 56.08 (7.60) | 61.70 (10.41) | 59.20 (8.96) |
| BMI (kg/m^2), mean (SD) | 22.43 (2.69) | 22.16 (2.76) | 23.30 (3.73) | 22.79 (2.72) | 23.39 (2.97) | 22.49 (3.01) | 24.46 (4.04) | 23.69 (3.35) |
| CA125 (units/ml), median [IQR] | 722.80 [197.58, 1785.90] | 33.15 [20.30, 377.25] | 15.80 [12.10, 24.53] | NA | 803.25 [132.02, 1554.95] | 26.75 [14.90, 104.28] | 15.85 [12.88, 17.88] | NA |
| HE4 (pmol/l), median [IQR] | 257.70 [137.10, 430.10] | 45.05 [34.05, 49.65] | 36.60 [27.30, 42.10] | NA | 155.30 [68.90, 401.80] | 41.35 [31.12, 59.70] | 34.60 [27.90, 40.00] | NA |
| Family history, No, n (%) | 31 (67.39) | 14 (87.50) | 24 (92.31) | 28 (84.85) | 36 (70.59) | 17 (77.27) | 23 (82.14) | 26 (89.66) |
| Family history, Yes, n (%) | 15 (32.61) | 2 (12.50) | 2 (7.69) | 5 (15.15) | 15 (29.41) | 5 (22.73) | 5 (17.86) | 3 (10.34) |
| Parity, No, n (%) | 6 (13.04) | 6 (37.50) | 5 (19.23) | 2 (6.06) | 7 (13.73) | 9 (40.91) | 4 (14.29) | 1 (3.45) |
| Parity, Yes, n (%) | 40 (86.96) | 10 (62.50) | 21 (80.77) | 31 (93.94) | 44 (86.27) | 13 (59.09) | 24 (85.71) | 28 (96.55) |
| Menopause, No, n (%) | 21 (45.65) | 12 (75.00) | 18 (69.23) | 12 (36.36) | 20 (39.22) | 19 (86.36) | 14 (50.00) | 16 (55.17) |
| Menopause, Yes, n (%) | 25 (54.35) | 4 (25.00) | 8 (30.77) | 21 (63.64) | 31 (60.78) | 3 (13.64) | 14 (50.00) | 13 (44.83) |
| Tumor size ≤ 10 cm, n (%) | 20 (43.48) | 9 (56.25) | 21 (80.77) | NA | 27 (52.94) | 12 (54.55) | 18 (64.29) | NA |
| Tumor size > 10 cm, n (%) | 26 (56.52) | 7 (43.75) | 5 (19.23) | NA | 24 (47.06) | 10 (45.45) | 10 (35.71) | NA |
| FIGO stage I-II, n (%) | 11 (23.91) | NA | NA | NA | 16 (31.37) | NA | NA | NA |
| FIGO stage III-IV, n (%) | 35 (76.09) | NA | NA | NA | 35 (68.63) | NA | NA | NA |
| Histological type: Serous, n (%) | 32 (69.57) | 8 (50.00) | 10 (38.46) | NA | 34 (66.67) | 8 (36.36) | 13 (46.43) | NA |
| Histological type: Mucinous, n (%) | 3 (6.52) | 8 (50.00) | 12 (46.15) | NA | 5 (9.80) | 10 (45.45) | 14 (50.00) | NA |
| Histological type: Endometrioid, n (%) | 7 (15.22) | 0 (0.00) | 0 (0.00) | NA | 2 (3.92) | 3 (13.64) | 0 (0.00) | NA |
| Histological type: Clear cell, n (%) | 3 (6.52) | 0 (0.00) | 0 (0.00) | NA | 7 (13.73) | 0 (0.00) | 0 (0.00) | NA |
| Histological type: Other, n (%) | 1 (2.17) | 0 (0.00) | 4 (15.38) | NA | 3 (5.88) | 1 (4.55) | 1 (3.57) | NA |

Age, age at diagnosis; Height, height at diagnosis; Weight, weight at diagnosis; BMI, BMI at diagnosis; CA125, CA125 blood test performed at diagnosis; HE4, HE4 test of blood collected at diagnosis; Family history, family history of malignant tumors; Parity, parity before diagnosis; Menopause, menopause before diagnosis or not; Tumor size, tumor size described by ultrasound; FIGO Stage I-II, tumor confined to the ovaries or fallopian tube(s), or tumor involving one or both ovaries or fallopian tubes with pelvic extension (below the pelvic brim) or primary peritoneal cancer; FIGO Stage III-IV, tumor involving one or both ovaries or fallopian tubes, or primary peritoneal cancer, with cytologically or histologically confirmed spread to the peritoneum outside the pelvis and/or metastasis to the retroperitoneal lymph nodes, or distant metastasis excluding peritoneal metastases; Histological type, histological type based on the patient’s pathology report.

All blood samples were collected after an 8–12-h fasting period using K2-EDTA Blood Collection Tubes (Becton, Dickinson and Company, USA). After collection, tubes were immediately placed on ice and processed within 4 h.
Plasma separation was performed by centrifugation at 1750 × g for 10 min at 4 °C. The supernatant plasma was aliquoted into cryovials and stored at − 80 °C until analysis.

Proteome analysis

Sample preparation

High-abundance proteins were removed using the ProteoMiner™ Protein Enrichment Small-Capacity Kit (Bio-Rad Laboratories, Hercules, CA, USA; Cat# 1633006) according to the kit’s instructions. Protein concentration was measured via BCA assay, and equal amounts were digested. Proteins were reduced with 10 mM DTT (37 °C, 45 min), alkylated with 50 mM iodoacetamide (dark, 15 min), and precipitated with acetone (−20 °C, 2 h). The pellet was resuspended in 25 mM ammonium bicarbonate, digested overnight with trypsin (37 °C), and desalted using C18 cartridges.

Fractionation

To generate a comprehensive library for the diaPASEF experiments, peptides were fractionated on a reversed-phase column (XBridge™ BEH300 C18 column, 4.6 mm × 250 mm, 3.5 µm, 100 Å; Waters, Milford, MA, USA) at pH 10 with a Vanquish Core system (Thermo Fisher Scientific, Waltham, MA, USA). Fifty-two fractions were collected, concatenated into 10, dried, and reconstituted in 0.1% formic acid.

Liquid chromatography-mass spectrometry (LC–MS) analysis

Samples were separated using the NanoElute UHPLC system at nanoliter flow rates. Mobile phase A consisted of 0.1% formic acid in water, and mobile phase B consisted of 0.1% formic acid in 100% acetonitrile. Each sample was loaded by an autosampler onto the analytical column (IonOpticks, Australia, 25 cm × 75 μm, C18 packing, 1.6 μm particle size) for separation. The column temperature was controlled at 50 °C using an integrated column oven. The sample load was 200 ng, and the flow rate was set to 300 nL/min with a gradient duration of 40 min.
The liquid chromatography gradient was as follows: 0 to 25 min: mobile phase B increased from 2 to 22%; 25 to 30 min: mobile phase B increased linearly from 22 to 35%; 30 to 35 min: mobile phase B increased linearly from 35 to 80%; 35 to 40 min: mobile phase B was maintained at 80%.

The mixed samples were first subjected to mass spectrometry data acquisition using the ddaPASEF mode of the timsTOF Pro2 (Bruker Daltonics, Germany) to establish appropriate acquisition windows for the subsequent diaPASEF method. The effective gradient for the analysis was 40 min, and the detection mode was set to positive ions. The precursor ion scan range was 100–1700 m/z, with an ion mobility range of 1/K0 from 0.85 to 1.3 V·s/cm^2. The ion accumulation and release time was set to 100 ms, with near 100% ion utilization efficiency. The capillary voltage was 1500 V, the drying gas flow rate was 3 L/min, and the drying temperature was 180 °C. For DDA-PASEF mode, the acquisition parameters were as follows: 4 MS/MS scans per cycle (with a total cycle time of 0.53 s), a charge range of 0–5, a dynamic exclusion time of 0.4 min, an ion target intensity of 10,000, and an ion intensity threshold of 1500. The collision energy increased linearly with ion mobility, ranging from 27 eV at 1/K0 = 0.85 V·s/cm^2 to 45 eV at 1/K0 = 1.3 V·s/cm^2. The quadrupole isolation width was set to 2 Th when m/z < 700 and 3 Th when m/z > 800. In diaPASEF mode, the acquisition parameters were as follows: a mass range of approximately 400–1200, a mobility range of 0.85–1.3 V·s/cm^2, a mass width of 25 Da, a mass overlap of 0.1, 24 mass steps per cycle, and 2 mobility windows, resulting in a total of 48 acquisition windows. The average acquisition cycle time was 1.17 s.

Data processing

The search engines used in this project were DIA-NN (v1.8.0) for DIA data and MSFragger (v3.8) for DDA data.
For the DDA mass spectrometry data, the key search parameters were as follows: the database used was uniprot_proteome_UP000005640_human_20230504.fasta (containing 82,492 sequences). The mass tolerances for precursor ions and fragment ions were both set to 20 ppm. Carbamidomethylation of cysteine (Carbamidomethyl [C]) was selected as a fixed modification, while oxidation of methionine (Oxidation [M]) and acetylation of the protein N-terminus (Acetyl [Protein N-term]) were set as variable modifications. The enzyme used was strict trypsin with a maximum of two missed cleavages allowed. For DIA mass spectrometry data, the spectral library search method was used. The qualitative results from the DDA search of this project were imported to build a spectral library. The Match Between Runs (MBR) option was selected to generate a spectral library from the DIA data, and this library was used to reanalyze the DIA data for protein quantification. The false discovery rate (FDR) at both the precursor ion and protein levels was filtered to 1%. The filtered data were used for subsequent bioinformatics analysis.

Quality control of proteome data

The quality of proteomic data was ensured at multiple levels. First, a HeLa cell digest was used for instrument performance evaluation. We also ran water samples as blanks every 10 injections to check for carry-over. A pooled sample, comprising a mixture of all peptide samples, was analyzed as a quality control mix every 10 injections. Plasma samples of the four patient groups from both the training and validation cohorts were randomly distributed across the run order.

Metabolome analysis

Sample preparation

Metabolites were extracted from 50 µL plasma using acetonitrile:methanol (1:4) with internal standards. After vortexing, centrifugation (12,000 rpm, 10 min), and freezing (− 20 °C, 30 min), supernatants were analyzed.

LC–MS analysis

All samples were analyzed using two LC–MS methods.
One aliquot was analyzed under positive ion conditions and was eluted from a T3 column (Waters ACQUITY Premier HSS T3 Column, 1.8 µm, 2.1 mm × 100 mm) using 0.1% formic acid in water as solvent A and 0.1% formic acid in acetonitrile as solvent B with the following gradient: solvent B increased from 5 to 20% in 2 min, to 60% over the following 3 min, and to 99% in 1 min, was held for 1.5 min, then returned to 5% within 0.1 min and was held for 2.4 min. The analytical conditions were as follows: column temperature, 40 °C; flow rate, 0.4 mL/min; injection volume, 4 μL. The other aliquot was analyzed under negative ion conditions using the same elution gradient as in positive mode.

Data acquisition was performed in information-dependent acquisition (IDA) mode using Analyst TF 1.7.1 Software (Sciex, Concord, ON, Canada). The source parameters were set as follows: ion source gas 1 (GAS1), 50 psi; ion source gas 2 (GAS2), 50 psi; curtain gas (CUR), 25 psi; temperature (TEM), 550 °C; declustering potential (DP), 60 V or − 60 V in positive or negative modes, respectively; and ion spray voltage floating (ISVF), 5000 V or − 4000 V in positive or negative modes, respectively. The TOF MS scan parameters were set as follows: mass range, 50–1000 Da; accumulation time, 200 ms; and dynamic background subtract, on. The product ion scan parameters were set as follows: mass range, 25–1000 Da; accumulation time, 40 ms; collision energy, 30 or − 30 V in positive or negative modes, respectively; collision energy spread, 15; resolution, UNIT; charge state, 1 to 1; intensity, 100 cps; exclude isotopes within 4 Da; mass tolerance, 50 ppm; maximum number of candidate ions to monitor per cycle, 18.

Data processing

The original data files were converted into mzXML format using ProteoWizard software. Peak extraction, peak alignment, and retention time correction were performed with the XCMS program. The “SVR” method was used to correct the peak area.
Peaks with a detection rate lower than 50% in each group of samples were discarded. Missing values were then imputed using the k-nearest neighbors (KNN) method. After that, metabolite identification information was obtained by searching the laboratory’s self-built database, integrated public database, AI database, and metDNA. Substances with an identification score above 0.5 and a coefficient of variation (CV) below 0.3 in quality control (QC) samples were retained; positive and negative ionization modes were subsequently merged, retaining the substance with the highest qualitative grade and smallest CV value.

Quality control of metabolome analysis

Several types of QC were included in the experiment. Mixed standards for targeted detection were used to monitor instrument variation before the project analysis. QC samples were prepared by mixing sample extracts and were used to assess the reproducibility of samples under the same processing methods. During instrumental analysis, a QC sample was inserted after every 10 analytical samples to monitor the reproducibility of the analysis process. Solvent served as the blank: one blank was inserted after every 10 samples, and the chromatogram of the internal standard in the blank solvent was extracted to examine the impact of residues.

Targeted protein analysis

Sample preparation

Proteins were extracted, quantified (BCA), and digested as described for proteome analysis.

LC–MS/MS

Liquid chromatography (LC) was performed on a nanoElute UHPLC (Bruker Daltonics, Germany). About 200 ng of peptides were separated within 40 min at a flow rate of 0.3 µL/min on a commercially available reverse-phase C18 column with an integrated CaptiveSpray Emitter (25 cm × 75 μm ID, 1.6 μm, Aurora Series with CSI, IonOpticks, Australia). Mobile phases A and B were 0.1% formic acid in water and 0.1% formic acid in ACN, respectively.
Mobile phase B was increased from 2 to 22% over the first 25 min, increased to 35% over the next 5 min, further increased to 80% over the next 5 min, and then held at 80% for 5 min. The LC was coupled online to a hybrid timsTOF Pro2 via a CaptiveSpray nano-electrospray ion source (CSI). The timsTOF Pro2 was operated in Data-Dependent Parallel Accumulation-Serial Fragmentation (PASEF) mode with 4 PASEF MS/MS frames in 1 complete frame. The capillary voltage was set to 1500 V, and the MS and MS/MS spectra were acquired from 100 to 1700 m/z. The ion mobility range (1/K0) was 0.85 to 1.3 V·s/cm^2. The TIMS accumulation and ramp times were both set to 100 ms, which enables operation at duty cycles close to 100%. The “target value” of 10,000 was applied to a repeated schedule, and the intensity threshold was set at 1500. The collision energy was ramped linearly as a function of mobility from 45 eV at 1/K0 = 1.3 V·s/cm^2 to 27 eV at 1/K0 = 0.85 V·s/cm^2. The quadrupole isolation width was set to 2 Th for m/z < 700 and 3 Th for m/z > 800.

Spectral library and quantification

The software used in the preliminary experiment to construct the spectral library was FragPipe (v21.0). The database parameters were as follows: the database was uniprotkb_proteome_UP000005640_human_82493_20240528.fasta (containing 82,493 sequences), 700, and the iRT2.fasta database (containing 1 sequence). A decoy database and a contaminant database were added to control the FDR caused by random matches and to eliminate the impact of contaminating proteins. The mass tolerance for precursor ions and fragment ions was set to 20 ppm. The digestion enzyme was strict trypsin, allowing for up to two missed cleavages. The fixed modification was Carbamidomethyl (C), and the variable modifications were Oxidation (M) and Acetyl (Protein N-term), allowing up to three variable modifications.
For the identification results, the FDR at both the peptide and protein levels was set to 1%, and decoys were not included in the output. The characteristic peptide filtering criteria were as follows: peptides with variable modifications, peptides with missed cleavages, non-unique peptides, and peptides with low intensity (generally below 3000) were excluded. The mass spectrometry data were analyzed using Skyline (v23.1). The peptide settings were as follows: Trypsin [KR/P] was used as the protease, with the maximum number of missed cleavages set to 0; peptide length was set to 6–25 amino acid residues; the fixed modification was Carbamidomethyl (C), and the variable modifications were Oxidation (M) and Acetyl (Protein N-term); the maximum number of modifications was set to 3. For the transition settings, the charge of the precursor ion was set to 2, 3, or 4, and the charge of the fragment ion was set to 1 or 2. The ion types were set to b, y, and p. The fragment ion selection included ions with a mass-to-charge ratio (m/z) greater than that of the precursor ion, up to the second-to-last ion. The mass error tolerance for ion matching was set to 0.05 Da. Key reagents and instruments for LC–MS analysis are listed in Additional file 2: Table S1–S2.

External transcriptomic data analysis

Bulk RNA-seq data processing

Transcriptomic profiles of ovarian cancer tissues and normal controls were analyzed using public datasets: TCGA-OV (The Cancer Genome Atlas Ovarian Cancer, n = 426 tumors) and GTEx (Genotype-Tissue Expression project, n = 88 normal ovarian tissues). Differential expression of IDO1 and TDO2 was assessed through GEPIA [36] (Gene Expression Profiling Interactive Analysis; http://gepia.cancer-pku.cn/). Significance thresholds were set at |log₂FC| > 1 and p-value < 0.01. Box plots were generated directly via the GEPIA interface.

Single-cell RNA-seq analysis

Processed scRNA-seq data (GEO: GSE180661; Synapse: syn33521743) were obtained with pre-defined cell clusters from the original publication [37]. UMAP embeddings and cell-type annotations were directly utilized without re-clustering. Expression patterns of IDO1 and TDO2 were visualized on the original UMAP projections.

Survival analysis

The prognostic significance of TDO2 expression was evaluated using the online survival analysis tool [38]. Kaplan–Meier curves for 5-year OS/PFS were generated with optimal cutoff values determined by the tool’s auto-selection algorithm (p < 0.05 deemed significant).

RNA extraction and quantitative real-time PCR

Total RNA was extracted from tissues using the TRIzol reagent (Thermo Fisher Scientific, Waltham, MA, USA; Cat# 15596026) and quantified with a NanoDrop 2000 spectrophotometer (Thermo Fisher Scientific, Waltham, MA, USA). cDNA synthesis was performed using the Evo M-MLV RT Master Mix (Accurate Biotechnology Co., Ltd., Hunan, China; Cat# AG11706) according to the manufacturer’s protocol. Quantitative real-time PCR (qRT-PCR) was subsequently carried out as previously described, with the ACTB gene (β-actin) serving as an internal reference for data normalization. Results were expressed as relative expression levels compared to the control group. All primers were synthesized by GENEWIZ (Suzhou, China), with the following sequences: TDO2 forward 5′-CAAATCCTCTGGGAGTTGGA-3′ and reverse 5′-GTCCAAGGCTGTCATCGTCT-3′, and ACTB forward 5′-CATGTACGTTGCTATCCAGGC-3′ and reverse 5′-CTCCTTAATGTCACGCACGAT-3′.

Immunohistochemistry (IHC)

Tissues were dewaxed in xylene and rehydrated through a graded ethanol series. After PBS rinsing, sections were incubated in 0.01 mol/L sodium citrate buffer (pH 6.0) for high-temperature antigen retrieval. Endogenous peroxidases were blocked with 3% H₂O₂ at room temperature for 15 min.
Tissues were then blocked with 5% goat serum in PBST and incubated overnight at 4 °C with the primary antibody: Anti-TDO2 polyclonal antibody (TDO2; Proteintech Group, Wuhan, China; Cat# 15880-1-AP; dilution 1:200). After PBS washes, sections were incubated with horseradish peroxidase (HRP)-conjugated goat anti-rabbit secondary antibodies at room temperature for 60 min. Positive staining was visualized using 3,3’-diaminobenzidine (DAB) staining solution (ZSGB-BIO, Beijing, China; Cat# ZLI-9018).

Immunofluorescence staining

Tissues underwent the same dewaxing and rehydration steps as described for IHC. Antigen retrieval was performed in sodium citrate buffer (pH 6.0) followed by PBS washes. Sections were blocked with 5% goat serum and incubated overnight at 4 °C with the following primary antibodies: α-Smooth Muscle Actin antibody (α-SMA; Proteintech Group, Wuhan, China; Cat# 67735-1-Ig; dilution 1:400) and TDO2 antibody (Proteintech, Cat# 15880-1-AP; dilution 1:200). After multiple washes, tissues were incubated with species-matched fluorophore-conjugated secondary antibodies: Alexa Fluor 488 donkey anti-mouse (Thermo Fisher Scientific, Waltham, MA, USA; Cat# A-21202) and Alexa Fluor 594 donkey anti-rabbit (Cat# A-21207) at room temperature for 60 min. Nuclei were stained with 4’,6-diamidino-2-phenylindole (DAPI; Thermo Fisher Scientific, Waltham, MA, USA; Cat# D1306). Immunofluorescent images were acquired using an Olympus BX63 upright microscope (Evident Corporation, Tokyo, Japan).

Statistical analysis and machine learning

Log2 fold change (log2FC) was calculated as the log2-transformed ratio of the mean expression levels between compared groups. A two-sided unpaired Welch’s t test was performed for each comparison, and p-values were adjusted using the Benjamini–Hochberg correction. Proteins or metabolites with an adjusted p-value < 0.05 and an absolute log2FC > 0.25 were considered statistically significant.
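
The differential-abundance criteria above can be illustrated with a short sketch. This is not the authors' pipeline: it assumes that raw p-values have already been obtained from a two-sided Welch's t test (for example with `scipy.stats.ttest_ind(..., equal_var=False)`), that intensities are on a linear (non-log) scale, and the function names and toy values are hypothetical.

```python
import math
from statistics import mean


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def log2_fold_change(group_a, group_b):
    """log2 of the ratio of group means (assumes positive, linear-scale values)."""
    return math.log2(mean(group_a) / mean(group_b))


def is_significant(adjusted_p, log2fc, alpha=0.05, min_abs_log2fc=0.25):
    """Apply the thresholds stated in the text: adjusted p < 0.05, |log2FC| > 0.25."""
    return adjusted_p < alpha and abs(log2fc) > min_abs_log2fc


if __name__ == "__main__":
    # Hypothetical raw p-values for four features (assumed to come from Welch's t test).
    raw_p = [0.01, 0.04, 0.03, 0.005]
    print(benjamini_hochberg(raw_p))
    print(log2_fold_change([4.0, 4.0], [1.0, 1.0]))
```

The adjustment is applied across all features tested in one comparison, so the adjusted value of a feature depends on how many features were tested alongside it.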
In the random forest analysis, a model with 1000 trees was constructed using the R package randomForest (version 4.7-1.1), with node size set to 5. To assess model generalizability, 100 iterations of tenfold stratified cross-validation were performed. Within each fold, models were trained on 90% of the training cohort samples and used to predict the remaining 10%. After 100 repetitions, each training cohort sample had accumulated multiple predictions; the final predicted probability was computed as the mean probability across all iterations, and the class with the larger mean probability was assigned as the predicted label for binary classification. Model performance metrics were subsequently derived from these aggregated probabilities. The decision threshold was optimized by maximizing the Youden index on the training cohort. For clinical deployment, a definitive random forest model (ntree = 1000, node size = 5) was retrained on the complete training cohort and validated on validation cohort 1. For validation cohort 2, whose data were generated by targeted proteomics and untargeted metabolomics, z-score normalization was applied before model validation. The same selected features were used in the random forest analysis on the independent validation cohort. Additionally, random forest analysis was performed on the omics features after z-score normalization, yielding identical classification results.

Results

Study design and patient characteristics

To characterize the metabolomic and proteomic profiles of plasma in EOC, we analyzed samples from the final cohort of 251 participants: 97 women with EOC, 38 with BOT, 54 with benign ovarian tumors, and 62 healthy controls (Additional file 1: Fig. S1). The training cohort and validation cohort 1 (n = 121) were recruited between April 2023 and September 2023, while validation cohort 2 (n = 130) was enrolled as an independent cohort from September 2023 to April 2024.
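
For readers who want to see the two aggregation steps named under "Statistical analysis and machine learning" in concrete form (averaging each sample's predicted probability over repeated cross-validation, then choosing a cutoff by the Youden index), the following is a minimal sketch. It is an illustration under stated assumptions rather than the authors' R code: the function names, the dictionary layout of the predictions, and the toy numbers are hypothetical, and the random forest itself is not reproduced here.

```python
from statistics import fmean


def mean_probabilities(repeated_predictions):
    """Average per-sample probabilities over repetitions.

    repeated_predictions: list with one dict per repetition,
    mapping sample id -> predicted probability of the positive class.
    """
    collected = {}
    for repetition in repeated_predictions:
        for sample_id, probability in repetition.items():
            collected.setdefault(sample_id, []).append(probability)
    return {sample_id: fmean(values) for sample_id, values in collected.items()}


def youden_threshold(labels, probabilities):
    """Return (threshold, J) maximising J = sensitivity + specificity - 1.

    labels are 1 (positive) or 0 (negative); a sample is called positive
    when its probability is >= threshold. Ties keep the lowest threshold.
    """
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")
    best_threshold, best_j = None, -1.0
    for threshold in sorted(set(probabilities)):
        true_pos = sum(1 for y, p in zip(labels, probabilities)
                       if y == 1 and p >= threshold)
        true_neg = sum(1 for y, p in zip(labels, probabilities)
                       if y == 0 and p < threshold)
        j = true_pos / positives + true_neg / negatives - 1.0
        if j > best_j:
            best_threshold, best_j = threshold, j
    return best_threshold, best_j


if __name__ == "__main__":
    # Two hypothetical repetitions of out-of-fold predictions.
    averaged = mean_probabilities([{"s1": 0.8, "s2": 0.2}, {"s1": 0.6, "s2": 0.4}])
    print(averaged)
    print(youden_threshold([1, 1, 1, 0, 0, 0], [0.9, 0.8, 0.6, 0.5, 0.2, 0.1]))
```

Because the threshold is selected on the same cohort used for training, performance at that cutoff is best judged on the independent validation cohorts, as the study does.
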
Demographic and clinical characteristics of all participants are summarized in Table 1. BMI was balanced across groups. Age did not differ significantly between EOC patients and healthy controls. Plasma samples from the training cohort and validation cohort 1 were analyzed using 4D data-independent acquisition (DIA) quantitative proteomics and liquid chromatography-mass spectrometry (LC–MS)-based untargeted metabolomics. For validation cohort 2, targeted proteomics (4D parallel reaction monitoring, PRM) and LC–MS-based untargeted metabolomics were employed to validate the findings (Fig. 1). Fig. 1. Study design for machine-learning-based classifier development for EOC patients. We first procured samples in a training cohort for proteomic and metabolomic analysis. The classifier was then validated in an independent validation cohort 1, followed by a validation cohort 2. Plasma metabolomic profiling of EOC, BOT, benign ovarian tumor and healthy controls Our data demonstrated high consistency and reproducibility. In QC analysis, the median CV values for metabolomic and proteomic data were 5.6% and 18.7%, respectively (Fig. 2A). Reproducibility of metabolite extraction and detection was further confirmed by overlapping total ion chromatograms (TICs) from mass spectrometry analysis of QC samples (Additional file 1: Fig. S2A–S2B). The high stability of the instrument ensured data reliability. Principal component analysis (PCA) was performed to assess overall protein differences between groups and intra-group variability. Tight clustering of QC samples in PCA plots indicated robust system performance and high data quality (Additional file 1: Fig. S2C–S2D). A total of 4362 metabolites were identified and quantified, with details provided in Additional file 2: Table S3.
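As a brief illustration of the QC metric reported in this section, the median coefficient of variation across features can be computed as sketched below. The sketch assumes each feature has been measured repeatedly in pooled QC samples; the feature names and intensities are hypothetical and are not taken from the study data.

```python
from statistics import mean, median, stdev


def coefficient_of_variation(values):
    """CV in percent: sample standard deviation divided by the mean."""
    return 100.0 * stdev(values) / mean(values)


def median_cv(feature_by_qc):
    """Median CV across features; each entry holds one feature's QC intensities."""
    return median(coefficient_of_variation(v) for v in feature_by_qc.values())


# Hypothetical intensities of two features across three pooled QC injections.
qc_intensities = {
    "feature_a": [90.0, 100.0, 110.0],
    "feature_b": [95.0, 100.0, 105.0],
}
summary = median_cv(qc_intensities)
```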
These metabolites spanned multiple categories, including benzene and substituted derivatives (the most abundant), amino acids and their metabolites, heterocyclic compounds, and organic acids and their derivatives (Additional file 1: Fig. S2E). Untargeted metabolomics analysis revealed clear separation between EOC patients and healthy controls or benign ovarian tumor groups, while BOT exhibited partial overlap (Fig. 2B). Notably, the metabolomic profiles of benign ovarian tumors and healthy controls were highly similar, suggesting shared biological features. To identify altered plasma metabolites in EOC, pairwise comparisons were performed using a two-sided unpaired Welch’s t test with an FDR < 0.05 and |log₂FC| > 0.25. Significant differential metabolites were identified: 1238 in EOC vs. healthy controls, 860 in EOC vs. benign ovarian tumors, and 256 in EOC vs. BOT (Fig. 2C–E). Hierarchical clustering showed distinct separation between EOC and healthy control samples, while benign and BOT samples exhibited higher similarity to EOC (Additional file 1: Fig. S2F). Benign and BOT groups were subsequently combined into a non-malignant ovarian tumor group for further analysis. K-means clustering grouped the 4362 metabolites into discrete clusters with distinct abundance patterns among the three groups (Fig. 2F). We observed clear discrimination between EOC and the other groups by orthogonal partial least squares discriminant analysis (OPLS-DA) (p < 0.05; Fig. 2G and H). A permutation test (200 permutations) supported the fit and predictive ability of the OPLS-DA model, with R^2Y (cumulative) = 0.978 and Q^2 (cumulative) = 0.786. Fig. 2. Plasma metabolomic profiling of EOC, BOT, benign ovarian tumor, and healthy controls. A Coefficients of variation (CV) for metabolomic and proteomic data were calculated using pooled quality control (QC) samples derived from the training cohort and validation cohort 1.
B Principal component analysis (PCA) of plasma samples in the training cohort was performed based on 4362 quantified metabolites. C–E Volcano plots were generated to compare EOC with BOT, EOC with benign ovarian tumors, and EOC with healthy controls. Metabolites with a log₂FC greater than 0.25 or less than −0.25, and an adjusted p-value lower than 0.05, were considered significantly differentially abundant. The number of significantly depleted (green) and enriched (red) metabolites is indicated at the top of each plot. F The 4362 metabolites were clustered using K-means into discrete clusters to illustrate the relative expression changes in the metabolomics data. The groups in the metabolomics data include EOC, Non-Malignant (BOT and benign ovarian tumors), and healthy controls. G Orthogonal partial least squares-discriminant analysis (OPLS-DA) was applied to plasma metabolomics data to distinguish between EOC patients and Non-OC (BOT, benign ovarian tumors, and healthy controls) in the training set. EOC samples are shown in green, and Non-OC samples are shown in orange. H The validity of the OPLS-DA model in G was confirmed by a permutation test (permutation times n = 200), showing no overfitting. R^2Y measures the goodness of fit, while Q^2 measures the predictive ability of the model. Plasma proteomic profiling of EOC, BOT, benign ovarian tumor, and healthy controls Altogether, 2753 proteins were identified and quantified (Fig. 3A). Detailed protein information is provided in Additional file 2: Table S4. Unsupervised PCA of proteomics data (without prior feature selection) revealed partial separation between plasma samples from EOC patients and other groups (Fig. 3B). To characterize altered plasma proteins in EOC, we performed pairwise comparisons to identify significantly differentially abundant proteins.
A total of 561, 385, and 208 proteins showed significant abundance differences (two-sided unpaired Welch’s t test, FDR < 0.05, |log₂FC| > 0.25) in EOC vs. healthy controls, EOC vs. benign ovarian tumors, and EOC vs. BOT, respectively (Fig. 3C–E). Benign ovarian tumors and BOT were combined into a non-malignant group for analysis. K-means clustering analysis categorized the 2753 proteins into distinct expression patterns across the three groups: EOC, non-malignant ovarian tumors, and healthy controls (Fig. 3F). Fig. 3. Plasma proteomic profiling of EOC, BOT, benign ovarian tumors and healthy controls. A Bar chart shows the following: Database: total protein sequences in the reference database; Peptides: total identified peptides; Identified Proteins: total identified proteins; Quantification Proteins: total quantified proteins. B PCA of plasma samples (training cohort) based on 2753 quantified proteins. EOC samples show partial separation from other groups. C–E Volcano plots were generated to compare EOC with BOT, EOC with benign ovarian tumors, and EOC with healthy controls. Proteins with a log₂FC above 0.25 or below −0.25 and an adjusted p-value lower than 0.05 were considered significantly differentially abundant. The number of significantly down- (green) and up- (red) regulated proteins is indicated at the top of each plot. F K-means clustering of 2753 proteins across three groups: EOC, non-malignant (BOT and benign ovarian tumors), and healthy controls. Subclasses 1–8 represent distinct co-expression patterns. Metabolite biomarker screening for EOC diagnosis and differential diagnosis We identified 355 metabolites with increased abundance and 121 metabolites with decreased abundance in the plasma of EOC patients compared to both non-malignant ovarian tumor patients and healthy controls (Fig. 4A). Among these, 244 metabolites were detected in validation cohort 2.
Detailed metabolite information is provided in Additional file 2: Table S5. To trace the potential biological origins of these metabolites, we used MetOrigin [39], a freely available bioinformatics tool. The analysis revealed that the metabolites originated from multiple sources: host metabolism, gut microbiota activity, host-microbiota co-metabolism, and other exogenous factors (e.g., diet, drugs, environmental exposures, or unknown sources) (Fig. 4B). From this pool, 75 metabolites derived from host, microbiota, or co-metabolism were selected as candidate diagnostic biomarkers (Fig. 4C–D). Notably, kynurenine (Kyn), indole, and 3-hydroxybutyrate demonstrated the highest reproducibility in abundance levels (Fig. 4E) and diagnostic performance (Fig. 4F). To confirm metabolite identity, we generated mirror plots comparing experimental MS/MS spectra with reference spectral libraries, showing high similarity (Additional file 1: Fig. S3A–S3C). Fig. 4. Metabolite biomarker screening for EOC diagnosis and differential diagnosis. A Venn diagrams illustrating overlaps between significantly enriched (increased abundance) and depleted (decreased abundance) metabolites in EOC vs. healthy controls and EOC vs. non-malignant ovarian tumors. Metabolites labeled in red represent consistently dysregulated candidates across both comparisons (FDR < 0.05, |log₂FC| > 0.25). B Biological origin of red-labeled metabolites in A predicted by MetOrigin, integrating annotations from KEGG (Kyoto Encyclopedia of Genes and Genomes) and HMDB (Human Metabolome Database). Origins were categorized as host-derived, microbiota-derived, host-microbiota co-metabolism, or exogenous sources (e.g., diet, drugs). C Venn diagram showing overlaps between host- and microbiota-associated metabolites. D Heatmap of host- and microbiota-derived metabolites in B. Relative metabolite abundance is color-coded (blue: low; red: high; z-score normalized).
E Validation of three candidate metabolites (kynurenine, indole, 3-hydroxybutyrate) showing consistent abundance differences between EOC and other groups in both the training cohort and validation cohort 2. Asterisks indicate statistical significance based on unpaired two-sided Welch’s t test. p value: *, < 0.05; **, < 0.01; ***, < 0.001. F Receiver operating characteristic (ROC) curves demonstrating diagnostic performance (AUC values) of the three metabolites in the training cohort and validation cohort 2. Protein biomarker screening for EOC diagnosis and differential diagnosis We identified 89 proteins with significantly increased abundance and 71 proteins with decreased abundance in plasma from EOC patients compared to both non-malignant ovarian tumor patients and healthy controls (Fig. 5A). Seventeen proteins exhibiting > 50% missing values across samples were excluded to ensure data robustness (Fig. 5B). Using random forest analysis, we ranked proteins by mean decrease in accuracy and selected the top 25 candidate features in the training cohort (Fig. 5C). Subsequent validation via 4D-PRM (parallel reaction monitoring) targeted proteomics confirmed 12 proteins with reproducible quantification (Fig. 5D). The details of these proteins are shown in Additional file 2: Table S6. Four proteins demonstrated the most pronounced differential abundance between EOC and other groups: leucine-rich alpha-2-glycoprotein (LRG1), inter-alpha-trypsin inhibitor heavy chain H3 (ITIH3), protein disulfide-isomerase A4 (PDIA4), and serum paraoxonase/arylesterase 1 (PON1) (Fig. 5E). Receiver operating characteristic (ROC) analysis revealed excellent diagnostic performance for these proteins in both the training cohort and validation cohort 2 (Fig. 5F). Fig. 5. Protein biomarker screening for EOC diagnosis and differential diagnosis.
A Venn diagrams showing overlaps between significantly upregulated (red) and downregulated (blue) proteins in EOC vs. healthy controls and EOC vs. non-malignant ovarian tumors. Proteins labeled in red are consistently dysregulated across both comparisons. B Heatmap of z-score normalized abundance for differential proteins identified in A. Columns represent sample groups; rows represent proteins. C Top 25 proteins ranked by mean decrease in accuracy from random forest analysis. D Heatmap of 12 proteins validated by 4D-PRM targeted proteomics. E Abundance levels of four candidate proteins (LRG1, ITIH3, PDIA4, PON1) in EOC, non-malignant ovarian tumors, and healthy controls. Asterisks indicate statistical significance based on unpaired two-sided Welch’s t test. p value: *, < 0.05; **, < 0.01; ***, < 0.001. F ROC curves showing diagnostic performance of the four proteins in the training cohort and validation cohort 2. Identification of EOC patients using machine learning We developed a random forest classifier to discriminate EOC patients from non-malignant ovarian tumors (benign tumors and BOT) and healthy controls, using 4 proteins (LRG1, ITIH3, PDIA4, PON1) and 3 metabolites (kynurenine, indole, 3-hydroxybutyrate) in the training cohort (n = 34 EOC vs. 62 non-OC). The 7 selected biomarkers exhibited no missing values across all cohorts. The model achieved excellent performance in the training cohort, with AUC = 0.975 (95% CI 0.943–0.997), sensitivity = 95.2%, and specificity = 91.2% (Fig. 6A). The optimal threshold of 0.423 (annotated in Fig. 6A) was identified by maximizing the Youden index on the training cohort. Feature importance was ranked by mean decrease in accuracy (Fig. 6B). We then tested the classifier on an independent validation cohort 1 of 25 patients. This classifier reached an AUC of 0.962 (95% CI 0.878–1.000) in validation cohort 1 (Fig. 6C–D).
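For readers less familiar with the metrics used here, the following sketch shows how a Youden-optimal threshold and an AUC can be derived from predicted probabilities. It is an illustration only, not the authors' code: the labels and scores are hypothetical, and the convention that a score at or above the threshold counts as a positive call is an assumption.

```python
def roc_points(labels, scores):
    """Return (threshold, sensitivity, specificity) for each distinct score."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = []
    for thr in sorted(set(scores)):
        # Assumed convention: score >= threshold is a positive call.
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= thr)
        tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < thr)
        points.append((thr, tp / pos, tn / neg))
    return points


def youden_threshold(labels, scores):
    """Pick the point maximizing J = sensitivity + specificity - 1."""
    return max(roc_points(labels, scores), key=lambda p: p[1] + p[2] - 1)


def auc(labels, scores):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = EOC, 0 = non-OC) and predicted probabilities.
labels = [0, 0, 1, 0, 1, 1, 1]
scores = [0.1, 0.2, 0.4, 0.5, 0.6, 0.7, 0.9]
thr, sens, spec = youden_threshold(labels, scores)
area = auc(labels, scores)
```

Because the threshold is chosen on the same data used to report sensitivity and specificity, those two values are optimistic for that cohort; applying the fixed threshold to independent cohorts, as described in the text, avoids that bias.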
To further validate this classifier, we applied it to validation cohort 2, which included 51 EOC patients, and achieved an AUC of 0.965 (95% CI 0.921–0.995) (Fig. 6E–F). Precision-recall (PR) curves showing the trade-off between positive predictive value and sensitivity in the training cohort, validation cohort 1, and validation cohort 2 are presented in Additional file 1: Fig. S4A–S4C. Confusion matrices and all performance metrics under the optimal threshold were evaluated on the training cohort, validation cohort 1, and validation cohort 2 (Additional file 1: Fig. S4D–S4G). Decision curve analysis (DCA) and clinical impact curves (CIC) were employed to quantify the clinical utility of our model (Additional file 1: Fig. S4H–S4I). DCA demonstrated superior net benefit of our model (red curve) over “treat-all” and “treat-none” strategies within the threshold range 0.1–1.0. CIC demonstrated close concordance between predicted and actual cases at thresholds above 0.4. Furthermore, the machine learning model demonstrated higher discriminative power than CA-125 and HE4 in distinguishing EOC from non-malignant ovarian tumors (Additional file 1: Fig. S4J–S4K). To address clinical utility across disease spectra, we evaluated the model in key subgroups defined by FIGO stage and histology (cohort details in Table 1). In the stratified assessment of validation cohort 2 (Additional file 1: Fig. S5), the machine learning model demonstrated consistently strong performance across key subgroups, albeit with expected variation reflective of disease biology. For early-stage EOC, the model achieved an AUC of 0.899 (95% CI 0.772–0.986). For late-stage EOC, the model achieved an AUC of 0.996 (95% CI 0.987–1.000), providing high-stakes decision support for treatment stratification. In HGSOC patients, discriminative power remained excellent with an AUC of 0.967 (95% CI 0.915–0.997).
Fig. 6. Identification of EOC patients using machine learning. A ROC curve of the random forest model in the training cohort (n = 34 EOC vs. 62 non-OC). The model achieved an AUC of 0.975. B Feature importance ranking based on mean decrease in accuracy for the 7 biomarkers. C ROC curve of the model in independent validation cohort 1 (n = 25; 12 EOC vs. 13 non-OC) without batch correction, showing an AUC of 0.962. D Performance of the classifier in validation cohort 1. E ROC curve of the classifier in independent validation cohort 2 (n = 51 EOC vs. 79 non-OC) after batch correction, with an AUC of 0.965. F Performance of the classifier in validation cohort 2. Proteomic and metabolomic alterations in EOC plasma We identified 561 differentially abundant proteins (|log₂FC| > 0.25, FDR < 0.05) in plasma from EOC patients vs. healthy controls (Fig. 3C). Subcellular localization predicted by WoLF PSORT software showed predominant extracellular (31.55%) and cytoplasmic (29.59%) localization of these proteins (Additional file 1: Fig. S6A). Further, Gene Ontology (GO), Eukaryotic Orthologous Groups (KOG), and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses were employed to characterize dysregulated mechanisms in EOC. Biological process analysis indicated that differentially abundant proteins were enriched in cell adhesion, innate immune response, intracellular protein transport, and signal transduction. Cellular component analysis showed that differentially abundant proteins were localized in both the extracellular space and the nucleus. Molecular function analysis revealed differentially abundant proteins related to calcium ion binding and identical protein binding (Fig. 7A). Additionally, KEGG pathway analysis indicated that differentially abundant proteins were enriched in metabolic pathways and the PI3K-Akt signaling pathway (Fig. 7B–C).
KOG analysis was conducted using the KOG database to classify proteins based on their functional categories. It indicated that differentially abundant proteins were enriched in signal transduction mechanisms and in posttranslational modification, protein turnover, and chaperones (Fig. 7D and Additional file 1: Fig. S6B). Protein–protein interaction (PPI) analysis was performed using the STRING database [40]. Differentially abundant proteins with a confidence score > 400 were used to construct the PPI network. Hub proteins with |log₂FC| > 0.45 and a node degree > 18 were identified and visualized using Cytoscape (version 3.10.0) (Additional file 1: Fig. S7A) [41]. The three highest-scoring hub modules were identified using the MCODE plugin [42] in Cytoscape (Fig. 7E, Additional file 1: Fig. S7B–S7C). Of particular interest, ITIH3 and LRG1 were present in the top-scoring hub module, whereas PDIA4 and PON1 were identified in the third-highest-scoring hub module, suggesting that they may play important roles in EOC pathogenesis. Fig. 7. Proteomic alterations in EOC plasma compared to healthy controls. A Gene Ontology enrichment analysis of differentially abundant proteins showing the top five enriched terms in biological process, cellular component, and molecular function. B KEGG pathway enrichment analysis of differentially abundant proteins highlighting the top five pathways. C Heatmap of differentially abundant proteins associated with metabolic and PI3K-Akt signaling pathways. D Top five KOG classifications of differentially abundant proteins. E PPI of differentially abundant proteins revealed the hub module with the highest score, which was identified using the MCODE plugin in Cytoscape.
Node size corresponds to |log₂FC|, edge width represents the combined score, and node color denotes subcellular localization. Untargeted metabolomics analysis identified 964 metabolites with increased abundance and 274 metabolites with decreased abundance in EOC plasma compared to healthy controls (Fig. 2C). Among these, 326 metabolites originating from host metabolism, gut microbiota activity, or host-microbiota co-metabolism were selected for pathway analysis (Additional file 1: Fig. S8). Pathway enrichment analysis revealed the top 20 dysregulated pathways (Fig. 8A), including sphingolipid metabolism, steroid hormone biosynthesis, and tryptophan metabolism (Fig. 8B). Pearson correlation analysis (r > 0.5, p < 0.01) was performed between differentially abundant proteins (proteins with > 50% missing values excluded) and metabolites. We identified 241 significant protein-metabolite pairs involving 52 metabolites and 87 proteins. Our analysis revealed that the majority of EOC-enriched plasma metabolites were positively associated with EOC-upregulated proteins (Fig. 9). Conversely, negative correlations were observed between most EOC-enriched metabolites and EOC-downregulated proteins. Fig. 8. Metabolomic alterations in EOC plasma compared to healthy controls. A Pathway enrichment analysis of differentially abundant metabolites in EOC vs. healthy controls. P values are displayed as base-0.05 logarithmic transformations [log₀.₀₅(Pvalue)]. Smaller P values correspond to larger logarithmic magnitudes, with log₀.₀₅(Pvalue) > 1 indicating statistical significance (P < 0.05). B Heatmap of metabolites annotated to the top three pathways in A. Fig. 9. Pearson correlation analysis between differentially abundant metabolites and proteins in EOC vs. combined control groups (non-malignant tumors and healthy controls).
Metabolites or proteins increased or decreased in EOC are labeled in red and blue, respectively. *p < 0.05, **p < 0.01. TDO2^+ fibroblasts are a contributing factor to enriched kynurenine Our study reveals that Kyn, a tryptophan (Trp) catabolite, is significantly enriched in the plasma of EOC patients (Fig. 4E), particularly those with HGSOC. Trp metabolism via the Kyn pathway is primarily mediated by the rate-limiting enzymes indoleamine 2,3-dioxygenase (IDO1/2) and tryptophan 2,3-dioxygenase (TDO2) [43]. Analysis of the TCGA-OV and GTEx datasets demonstrated significant upregulation of both IDO1 and TDO2 in ovarian tumor tissues vs. normal tissues (Fig. 10A and B) [36]. We evaluated the expression of IDO1 and TDO2 in scRNA-seq data from 160 tumor sites of 42 treatment-naive patients with HGSOC (GEO: GSE180661; also available as processed objects from Synapse, syn33521743) [37]. UMAP plots showed that IDO1 was expressed at low levels, whereas TDO2 was expressed predominantly in fibroblasts rather than in other cell types (e.g., ovarian cancer cells, epithelial cells, immune cells, and endothelial cells) (Fig. 10C and D). Kaplan–Meier curves showed that the 5-year OS and PFS of patients with TDO2^high OC were significantly worse than those of patients with TDO2^low OC (P < 0.05, Fig. 10E and F) [38]. Experimental validation confirmed elevated TDO2 mRNA levels in HGSOC tissues vs. normal fallopian tubes (p < 0.0001, Fig. 10G). Representative IHC images of normal fallopian tubes, TDO2^high HGSOC, and TDO2^low HGSOC are shown in Fig. 10H. TDO2^+ fibroblasts play a crucial role in promoting immune evasion and metastasis. In lung metastasis, TDO2^+ matrix fibroblasts facilitate the immune evasion of disseminated tumor cells and promote metastatic progression through the production of Kyn, suggesting that targeting stromal cell metabolism could be a therapeutic strategy for breast cancer patients with lung metastasis [44].
In oral squamous cell carcinoma, TDO2^+ myofibroblasts attract T cells, induce the transformation of CD4^+ T cells into Tregs, and cause CD8^+ T cell dysfunction, highlighting TDO2^+ myofibroblasts as potential targets for immunotherapy [45]. Consistent with our observations, these two studies highlight that TDO2 is predominantly expressed in fibroblasts. Immunofluorescence staining demonstrated the presence of TDO2^+ fibroblasts within the tumor microenvironment of HGSOC (Fig. 10I). Fig. 10. TDO2^+ fibroblasts are a contributing factor to enriched kynurenine. A–B Box plots showing relative mRNA expression of IDO1 and TDO2 in OC (n = 426) vs. normal ovarian tissues (n = 88) in the TCGA-OV dataset and the GTEx project. C–D UMAP plot of cells profiled by scRNA-seq colored by the expression of IDO1 and TDO2. Cell types are highlighted with grey outlines. E–F Kaplan–Meier survival analysis of OS and PFS in OC patients stratified by TDO2 expression (microarray data). G qRT-PCR validation of TDO2 upregulation in HGSOC vs. normal fallopian tube tissues. H Representative IHC images of normal fallopian tubes, TDO2^high HGSOC, and TDO2^low HGSOC. Scale bars: 200 μm (left) and 100 μm (right). I Immunofluorescence co-staining of α-SMA (fibroblast marker, green) and TDO2 (red) in HGSOC tissues. Nuclei counterstained with DAPI (blue). Discussion This study performed comprehensive proteomic and metabolomic profiling of plasma from 251 women, including patients with EOC (n = 97), BOT (n = 38), benign ovarian tumors (n = 54), and age-matched healthy controls (n = 62). Using LC–MS, we identified 2753 proteins and 4362 metabolites across these groups. We established a protein-metabolite machine learning model incorporating LRG1, ITIH3, PDIA4, PON1, Kyn, indole, and 3-hydroxybutyrate. This model was validated in an independent validation cohort (n = 130), achieving an AUC of 0.965.
To our knowledge, this represents the largest plasma-based LC–MS proteomic and metabolomic analysis for EOC to date. Existing algorithms like ROMA and OVERA prioritize sensitivity (e.g., ROMA: 94% sensitivity at 75% specificity) [14, 27], but their limited specificity often results in high false-positive rates [46, 47], highlighting the need for more robust diagnostic tools. Our model demonstrated high accuracy in validation, underscoring the potential of multi-omics integration for preoperative risk stratification. Pathway analysis provided critical insights into EOC pathophysiology. After removing high-abundance proteins to enhance detection of low-abundance biomarkers, we quantified over 2000 proteins, revealing significant dysregulation in EOC patients compared to healthy controls. Key pathways included metabolic reprogramming (a hallmark of cancer [48]) and PI3K-Akt signaling, which is frequently implicated in ovarian carcinogenesis. Notably, sphingolipid metabolism emerged as the most perturbed pathway, consistent with its established role in cancer progression. Sphingosine-1-phosphate (S1P), a pro-tumorigenic metabolite enriched in EOC plasma, promotes chemoresistance, metastasis, and immune evasion via the sphingosine kinase 1–S1P receptor 1 axis [49, 50], reinforcing sphingolipid signaling as a therapeutic target. We observed enrichment of Kyn in EOC plasma, and IF confirmed the presence of TDO2^+ fibroblasts within the HGSOC tumor microenvironment. Trp, an essential amino acid obtained solely from dietary sources, is metabolized primarily through three pathways: Kyn, 5-hydroxytryptamine, and indole [51]. Increased levels of Trp catabolites (particularly Kyn, indolepyruvate, N-acetylserotonin, 5-methoxyindoleacetate, 5-hydroxyindoleacetic acid, and formyl-5-hydroxykynurenamine) indicate enhanced Trp metabolism via the Kyn and 5-hydroxytryptamine pathways. Both Kyn and indole are key Trp catabolites.
Kyn derivatives can potently activate the aryl hydrocarbon receptor, leading to immunosuppression [52]. Previous studies showed that Kyn is progressively enriched along the colorectal adenoma-carcinoma sequence, with significantly higher plasma levels in advanced adenomas compared to non-advanced adenomas, accompanied by indole depletion. Furthermore, lower Kyn levels were associated with better chemotherapy response [53]. It is also important to consider the potential influence of pre-analytical factors (such as diet, fasting status, and medications) on the observed metabolomic profiles. While we implemented standardized sample collection protocols (e.g., fasting blood draws, controlled sample processing times) to mitigate these variations, residual confounding effects cannot be entirely ruled out. Moreover, although clear exclusion criteria were established, patients’ potential medication history was not specifically accounted for in this study. Future validation efforts should incorporate detailed documentation and analysis of these variables. Among the key biomarkers, LRG1 (an acute-phase protein), ITIH3 (a component of extracellular matrix stabilization complexes), and PDIA4 (a disulfide isomerase linked to endoplasmic reticulum stress) emerged as robust diagnostic candidates. LRG1 and ITIH3 demonstrate pan-cancer utility in multi-marker panels for gastrointestinal and reproductive malignancies (e.g., colorectal, pancreatic, gastric, ovarian cancer) [54–59], highlighting the superiority of combinatorial strategies over single-marker approaches. PDIA4 was significantly elevated in EOC plasma, mirroring its tissue-specific overexpression in HGSOC [60]. These proteins collectively reflect the systemic impact of EOC on host physiology, linking tumor-specific alterations with broader metabolic perturbations.
Plasma levels of 3-hydroxybutyric acid (β-hydroxybutyrate, βHB), a ketone body derived from branched-chain amino acid catabolism, were elevated in EOC patients. Associations between plasma βHB levels and cancer risk vary: elevated levels correlate with reduced hepatocellular carcinoma and lymphoma risk [61] but increased melanoma susceptibility [62]. While evidence on βHB in cancer plasma is limited, its roles are context-dependent: it suppresses colorectal cancer progression by inhibiting HIF-1α/VEGFA signaling [63] and inhibits glioma growth by activating pro-inflammatory astrocytes [64]. Conversely, βHB promotes pancreatic cancer metastasis through HMGCL-driven ketogenesis [65], exacerbates CRC proliferation via ACAT1-mediated IDH1 acetylation [66], and contributes to chemoresistance in bladder cancer through OXCT1-dependent metabolic reprogramming [67]. This dualistic behavior underscores the tissue-specific metabolic adaptations in cancer and positions βHB as a potential therapeutic target in EOC, warranting further mechanistic investigation. Currently, OC screening is not recommended for low- or high-risk populations. In low-risk groups, screening fails to reduce mortality while increasing morbidity through false positives, unnecessary surgeries, and associated complications. Even in high-risk populations (e.g., BRCA mutation carriers), screening does not lower mortality despite enabling earlier detection and improved surgical outcomes; annual TVUS screening in BRCA1 carriers was associated with a fourfold higher 10-year mortality risk compared to risk-reducing bilateral salpingo-oophorectomy (rrBSO) [68]. Abandoning screening necessitates alternative risk mitigation strategies.
Population-based multigene testing represents a cost-effective approach for breast and ovarian cancer prevention, surpassing family history-based testing in economic analyses [69]. This shift from reactive screening to proactive genetic risk stratification aligns with precision oncology, enabling personalized interventions (e.g., rrBSO in high-risk carriers) while avoiding universal screening pitfalls. Our findings have multifaceted clinical implications. The early detection of OC, particularly HGSOC, remains a formidable challenge due to minimal systemic perturbations and subtle biofluid alterations during initial stages. These characteristics likely contribute to the limited sensitivity of peripheral non-invasive detection methods, as early-stage tumors may evade recognition by conventional single-analyte assays. No universally accepted gold-standard screening approach exists for early ovarian cancer, underscoring the critical need for innovative strategies to address this unmet clinical demand given the high lethality of epithelial ovarian malignancies. Emerging evidence suggests multi-modal diagnostic frameworks, particularly those integrating artificial intelligence (AI)-driven predictive models, can overcome single-marker limitations. For example, a multi-criteria classification fusion (MCF) model using 52 parameters (51 laboratory tests and age) achieved an AUC of 0.949 in internal validation, with robust generalizability in external cohorts (AUC 0.882–0.884) [70]. Strikingly, this model outperformed traditional biomarkers (CA-125, HE4) in early-stage detection and retained diagnostic efficacy even after excluding tumor-specific markers, highlighting the power of multi-parameter systems. Although CA-125 and HE4 are widely utilized for ovarian cancer diagnosis, their sensitivity and specificity are limited by disease heterogeneity and interference from inflammatory or benign gynecological conditions (e.g., endometriosis) [71].
In the present study, because healthy controls did not undergo CA-125/HE4 testing, we compared our diagnostic model with CA-125 and HE4 only for distinguishing EOC from non-malignant ovarian tumors. Our model achieved higher AUC values than either biomarker alone. Proteomics captures dynamic functional proteins within the tumor microenvironment, whereas metabolomics reflects downstream metabolic reprogramming signatures. Integrating these approaches provides more comprehensive profiling of tumor biology and mitigates bias from single-marker reliance. Recent colorectal cancer research [[181]72] showed that a combined protein-metabolite diagnostic model built with logistic regression effectively distinguished CRC patients from healthy individuals. Building on this paradigm, we employed LC–MS-based proteomic and metabolomic profiling to characterize plasma alterations across dual omics layers in OC. Although the limited availability of early-stage HGSOC samples, a recognized bottleneck in OC research, constrained our analysis, integrative multi-omics data fusion significantly enhanced detection accuracy compared to single-modality approaches. Although the sample size precluded developing a dedicated early-stage diagnostic model, our findings support the hypothesis that multi-analyte integration improves diagnostic precision. These results align with prior reports emphasizing the power of combinatorial biomarkers. The MCF model’s sustained performance without key tumor markers parallels our observation that non-tumor-derived proteomic/metabolomic features contribute significantly to classification accuracy. This suggests that systemic metabolic dysregulation and microenvironmental remodeling, beyond tumor-secreted factors alone, may underpin detectable biofluid signatures even in early disease.
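To make the AUC-based comparison above concrete, the following is a minimal, purely illustrative sketch and not the pipeline used in this study. It computes the AUC as the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, and attaches a percentile bootstrap confidence interval. The function names (`auc`, `bootstrap_ci`, `zscores`), the simulated markers, and the z-score-sum combination rule are all assumptions introduced for illustration; the data are synthetic and carry no clinical meaning.

```python
import random
import statistics


def auc(pos, neg):
    """AUC as P(score_pos > score_neg), counting ties as 0.5 (Mann-Whitney form)."""
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_ci(pos, neg, n_boot=200, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC, resampling each group with replacement."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        boot_pos = [rng.choice(pos) for _ in pos]
        boot_neg = [rng.choice(neg) for _ in neg]
        stats.append(auc(boot_pos, boot_neg))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


def zscores(values):
    """Standardize a list of values to zero mean and unit variance."""
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mu) / sd for v in values]


if __name__ == "__main__":
    # Synthetic data only: three hypothetical markers, each weakly informative.
    rng = random.Random(42)
    labels = [1] * 100 + [0] * 100
    markers = [
        [rng.gauss(1.0 if y == 1 else 0.0, 1.0) for y in labels]
        for _ in range(3)
    ]
    # Assumed combination rule: sum of per-marker z-scores (not a fitted model).
    combined = [sum(row) for row in zip(*(zscores(m) for m in markers))]

    for name, score in [("single marker", markers[0]), ("combined score", combined)]:
        pos = [s for s, y in zip(score, labels) if y == 1]
        neg = [s for s, y in zip(score, labels) if y == 0]
        lower, upper = bootstrap_ci(pos, neg)
        print(f"{name}: AUC = {auc(pos, neg):.3f} (95% CI {lower:.3f}-{upper:.3f})")
```

In this toy setting, independent weak markers tend to yield a higher AUC when pooled than any one marker alone, which is the intuition behind multi-analyte panels. A fitted classifier evaluated on held-out cohorts, as in the study itself, would be required for any real comparison.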
Our study provides a reliable non-invasive preoperative assessment tool for ovarian tumors, addressing critical limitations in current diagnostics. The high positive predictive value (PPV) of our multi-analyte panel minimizes false positives, thereby reducing unnecessary surgeries, mitigating risks from aggressive procedures, and optimizing preoperative resource allocation, particularly for elderly patients unsuitable for extensive surgery. Its liquid biopsy-based methodology enhances patient compliance compared to invasive tissue sampling. For patients identified as high-risk for EOC, immediate referral to specialized gynecologic oncology centers enables timely implementation of standardized preoperative protocols, including advanced imaging, multidisciplinary planning, and risk-adapted surgical strategies. This triage system improves surgical outcomes and reduces delays in initiating adjuvant therapy for confirmed malignancies. The primary limitations include the monocentric design and the relatively small sample size derived exclusively from an Asian population, which potentially limit generalizability to other ethnic groups and the statistical power to detect subtle biomarker differences. Future multi-center studies involving diverse cohorts are needed to validate and refine the model. Leveraging AI-driven platforms for large-scale multi-omics data integration could enhance algorithmic robustness. Additionally, exploring dynamic biomarker panels adaptable to disease progression stages may further optimize clinical utility.

Conclusions

Through integrated proteomic and metabolomic profiling of plasma samples, we identified disease-specific molecular signatures that effectively discriminate EOC from borderline and benign ovarian tumors and from healthy controls. Differential protein expression highlighted dysregulation of metabolic pathways and the PI3K-Akt signaling pathway in EOC patients.
Metabolic pathway enrichment analysis revealed perturbations in sphingolipid metabolism, steroid hormone biosynthesis, and Trp metabolism. Notably, Kyn, a Trp catabolite, was specifically enriched in EOC plasma. TDO2^+ fibroblasts in the tumor microenvironment are considered contributing factors to Kyn enrichment. A machine-learning model incorporating four proteins and three metabolites demonstrated robust diagnostic performance (training cohort: n = 96, AUC = 0.975), validated in independent cohorts (validation cohort 1: n = 25, AUC = 0.962; validation cohort 2: n = 130, AUC = 0.965). This noninvasive tool shows significant potential to improve preoperative EOC diagnosis. Before clinical implementation, further validation through multicenter studies involving larger, more heterogeneous patient cohorts is essential.

Supplementary Information

12916_2025_4341_MOESM1_ESM.docx (5.2 MB, docx) Additional file 1: Figures S1–S8.
Fig. S1 Flow diagram of study participants.
Fig. S2 Quality control of metabolome analysis. A, Total ion chromatograms (TICs) of pooled QC samples in negative ion mode. B, TICs of pooled QC samples in positive ion mode. C, PCA of plasma samples and QC samples (training cohort) in negative ion mode. QC samples cluster tightly, indicating analytical stability. D, PCA of plasma samples and QC samples (training cohort) in positive ion mode. E, Donut chart illustrating the distribution of metabolites across categories. F, Hierarchical clustering analysis showing the similarity between sample groups based on metabolite expression patterns.
Fig. S3 Mirror plots comparing experimental MS/MS spectra (top) with reference spectral library matches (bottom) for: A, Kynurenine; B, Indole; C, 3-Hydroxybutyrate.
Fig. S4 Performance assessment of the machine learning model. A, Precision-recall curves of the training cohort. B, Precision-recall curves of validation cohort 1. C, Precision-recall curves of validation cohort 2. D, Confusion matrix of the training cohort.
E, Confusion matrix of validation cohort 1. F, Confusion matrix of validation cohort 2. G, Performance metrics of the training cohort, validation cohort 1, and validation cohort 2. H, Decision curve analysis (DCA). The blue curve plots the net benefit of our model across threshold probabilities. Green line: "treat-none" strategy; red line: "treat-all" strategy. Analyses derived from validation cohort 2. I, Clinical impact curve (CIC). Blue curve: number of patients classified as high-risk at each threshold; red line: actual EOC cases among high-risk patients. Analyses derived from validation cohort 2. J, Performance of the machine learning model (blue line) versus CA-125 (red line) in validation cohort 2. K, Performance of the machine learning model (blue line) versus HE4 (red line) in validation cohort 2.
Fig. S5 Stratified performance of the machine learning model in validation cohort 2. A, ROC curve for early-stage EOC classification (FIGO I-II, n = 16) versus non-OC (n = 79). B, ROC curve for late-stage EOC classification (FIGO III-IV, n = 35) versus non-OC (n = 79). C, ROC curve for HGSOC subtype classification (n = 34) versus non-OC (n = 79).
Fig. S6 Proteomic alterations in EOC plasma. A, Subcellular localization of differentially abundant proteins predicted by WoLF PSORT. B, Heatmap of differentially abundant proteins enriched in KOG functional categories: Signal transduction mechanisms (T) and Post-translational modification, protein turnover, chaperones (O).
Fig. S7 PPI network analysis of hub proteins (confidence score > 400). A, PPI network constructed using differentially abundant proteins (|log2FC| > 0.45, FDR < 0.05, node degree > 18). B–C, Second and third top-scoring modules identified by the MCODE plugin in Cytoscape. Circles and squares represent up-regulated and down-regulated proteins, respectively. Node size corresponds to |log2FC|, edge width represents the combined score, and node color denotes subcellular localization.
Fig. S8 Heatmap of 326 metabolites derived from the host, microbiome, or potential co-metabolism.

12916_2025_4341_MOESM2_ESM.xlsx (13 MB, xlsx) Additional file 2: Tables S1–S6.
Table S1. Key reagents for LC–MS analysis.
Table S2. Key instruments for LC–MS analysis.
Table S3. The metabolite matrix (4,362 features) for 121 patients in the training cohort and validation cohort 1.
Table S4. The protein matrix (2,753 features) for 121 patients in the training cohort and validation cohort 1.
Table S5. The metabolite matrix (4,386 features) for 130 patients in validation cohort 2.
Table S6. The protein matrix (12 features) for 130 patients in validation cohort 2.

Additional file 3 (100.5 KB, doc)

Acknowledgements