Graphical abstract

   [Graphical abstract image]

Highlights

     * Harvesting uterine fluid for early detection of ovarian cancer
     * Extracting metabolites from uterine fluid
     * Running LC-MS for untargeted metabolomics
     * Bioinformatics analysis and visualization strategies
     __________________________________________________________________

   Publisher’s note: Undertaking any experimental protocol requires
   adherence to local institutional guidelines for laboratory safety and
   ethics.
     __________________________________________________________________

   High mortality of ovarian cancer (OC) is primarily attributed to the
   lack of effective early detection methods. Uterine fluid, pooling
   molecules from neighboring ovaries, presents an organ-specific
   advantage over conventional blood samples. Here, we present a protocol
   for identifying metabolite biomarkers in uterine fluid for early OC
   detection. We describe steps for uterine fluid collection from
   patients, metabolite extraction, metabolomics experiments, and
   candidate metabolite biomarker screening. This standardized workflow
   holds the potential to achieve early OC diagnosis in clinical practice.

Before you begin

Overview

   OC has the highest mortality rate among gynecological tumors, primarily
   because of the absence of effective biomarkers for early diagnosis.^2
   The 5-year survival rate for localized OC exceeds 90%, whereas it drops
   to less than 30% for patients with distant spread, a pattern also
   observed across many other cancer types.^3

   Although clinical serum biomarkers such as cancer antigen 125 (CA125)
   and human epididymis protein 4 (HE4) are useful for monitoring OC
   progression, these circulating biomarkers perform poorly in the early
   detection of OC. Unlike blood, which lacks organ-specific
   characteristics, uterine fluid has unique features owing to its
   proximity to the ovaries, reflecting the closely interconnected
   anatomy of the ovaries, fallopian tubes, and uterine cavity. Molecules
   shed from the ovaries can be transported to the uterine cavity, where
   they can be retrieved by uterine flushing.^4 This provides the
   conceptual foundation for using uterine fluid as a sample source for
   early detection of OC. Liquid chromatography-mass spectrometry
   (LC-MS)-based untargeted metabolomics has shown substantial potential
   in disease research, as illustrated in earlier STAR Protocols.^5,6,7,8
   This untargeted approach generates large datasets covering a broad
   range of the metabolites present in biological samples.^9

   Notably, this metabolomics workflow can be adapted to other body
   fluids beyond uterine fluid, such as nasal fluid, bronchoalveolar
   lavage fluid, cerebrospinal fluid, pleural fluid, and ascitic fluid.
   Nasopharyngeal cancer (NPC) is highly curable when detected early,^10
   but current diagnosis relies largely on MRI and biopsy,^11 and
   efficient, noninvasive methods for early detection are lacking. In this
   context, nasal fluid analysis is a promising route toward earlier
   diagnosis of NPC. For bronchoalveolar lavage fluid, the same
   methodology could help clinicians better understand the metabolic
   changes associated with pulmonary diseases. For cerebrospinal fluid,
   metabolomic analysis provides insights into the early diagnosis and
   pathological mechanisms of neurological diseases such as Alzheimer’s
   disease and multiple sclerosis.

   The protocol below describes in detail how to identify metabolite
   biomarkers in uterine fluid to improve early-stage OC detection. The
   workflow comprises uterine fluid collection, metabolite extraction,
   LC-MS-based untargeted metabolomics analysis, and biomarker screening.
   Multi-level bioinformatics analyses are used to interpret the
   high-dimensional, metabolome-wide data. Beyond OC, this protocol
   provides a strategy for biomarker discovery that can support
   diagnostics, therapeutics, and disease stratification across a broad
   spectrum of severe diseases.

Institutional permissions

   The collection of uterine fluid samples from patients in this study was
   approved by the Peking University Third Hospital Medical Science
   Research Ethics Committee (IRB00006761-M2019471). All participants
   signed informed consent, and uterine flushing was performed by
   professional gynecological surgeons in accordance with ethical
   guidelines. Note that the use of clinical specimens from patients
   requires permission from your local institution.

Establish the inclusion and exclusion criteria of patients

     Timing: variable

   This section outlines the enrollment criteria and cohort division for
   participants.
     * 1.
       Include patients clinically diagnosed with early-stage OC,
       late-stage OC, endometrial cancer, and benign gynecological
       diseases.
     * 2.
       Exclude patients with bilateral oophorectomy, metabolic disorders,
       an abnormal uterine cavity, or malignancies other than primary OC
       or primary endometrial cancer.
     * 3.
       Collect clinical records of patients.

     Note: The clinical information includes general physiological data
     (age, weight, body mass index); serum indicators such as CA125, HE4,
     and the Risk of Ovarian Malignancy Algorithm (ROMA);^12,13 family
     history, parity history, and oral contraceptive use; and the
     International Federation of Gynecology and Obstetrics (FIGO)
     stage^14 and histological subtype of tumors.

     Note: The FIGO staging system categorizes OC into stages I to IV
     based on the extent of tumor spread. Stages I and II are early
     stages, in which the lesions are confined to the pelvis. Stages III
     and IV are late stages, in which the tumor has spread to the abdomen
     (stage III) or to more distant organs (stage IV). This classification
     helps guide the personalized management of OC patients.

     * 4.
       Randomly divide the enrolled patients into two separate cohorts,
       one designated as a training cohort for constructing the diagnostic
       model and the other as a validation cohort used for independent
       validation of the model.

     Note: The split between training and validation cohorts (e.g., 80/20
     or 50/50) is flexible and depends largely on the principal focus of
     the research, i.e., model establishment or model application.

     CRITICAL: Enrolled patients must not have received any prior
     treatment, as previous therapies could confound the interpretation
     of metabolic status.
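   Step 4 leaves the split ratio open. A minimal R sketch of a
   reproducible random split is shown below. It is not the authors'
   code: stratifying by diagnosis (so both cohorts keep similar group
   proportions), the column names, and the seed are all assumptions made
   for illustration.

```r
# Hypothetical sketch: reproducible, stratified random split of enrolled
# patients into training and validation cohorts (step 4).
# The 'diagnosis' column name and the seed are illustrative assumptions.
split_cohorts <- function(patients, group_col = "diagnosis",
                          train_frac = 0.8, seed = 2023) {
  set.seed(seed)  # fixed seed so the split can be reproduced
  # Draw the training fraction separately within each diagnostic group
  idx_by_group <- split(seq_len(nrow(patients)), patients[[group_col]])
  train_idx <- unlist(lapply(idx_by_group, function(idx) {
    idx[sample.int(length(idx), round(train_frac * length(idx)))]
  }))
  patients$cohort <- "validation"
  patients$cohort[train_idx] <- "training"
  patients
}

# Toy table of 20 patients in two diagnostic groups (not study data)
toy <- data.frame(id = sprintf("P%02d", 1:20),
                  diagnosis = rep(c("early_OC", "benign"), each = 10))
res <- split_cohorts(toy, train_frac = 0.8)
table(res$diagnosis, res$cohort)
```

   With an 80/20 split, each group of 10 contributes 8 patients to the
   training cohort and 2 to the validation cohort.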

Key resources table

   REAGENT or RESOURCE                                       SOURCE                                IDENTIFIER
   Biological samples
     __________________________________________________________________

   Uterine fluid specimens                                   This paper                            N/A
     __________________________________________________________________

   Chemicals, peptides, and recombinant proteins
     __________________________________________________________________

   LC-MS grade methanol                                      Thermo Fisher Scientific              Cat#047192
   LC-MS grade acetonitrile                                  Thermo Fisher Scientific              Cat#51101
   LC-MS grade formic acid                                   Thermo Fisher Scientific              Cat#28905
   Ammonium formate                                          Fisher Scientific                     Cat#A11550
   Pierce positive ion calibration solution                  Thermo Fisher Scientific              Cat#88323
   Pierce negative ion calibration solution                  Thermo Fisher Scientific              Cat#88324
     __________________________________________________________________

   Software and algorithms
     __________________________________________________________________

   Xcalibur 2.2 SP1.48 software                              Thermo Fisher Scientific              Cat#OPTON-30965
   Progenesis QI (Nonlinear Dynamics, version 2.1)           Nonlinear Dynamics                    http://www.nonlinear.com/progenesis/qi/download/
   HMDB                                                      HMDB                                  https://hmdb.ca/
   MetaboAnalyst                                             MetaboAnalyst                         https://www.metaboanalyst.ca/
   SIMCA software                                            Sartorius Stedim Biotech, Umetrics    https://www.sartorius.com/en/products/process-analytical-technology/data-analytics-software/mvda-software/simca
   KEGG                                                      KEGG                                  https://www.genome.jp/kegg/
   Cytoscape software                                        Cytoscape                             https://cytoscape.org/
   MetScape plugin                                           Cytoscape                             https://apps.cytoscape.org/apps/metscape
   R (v4.3.0)                                                R                                     https://www.r-project.org/
   ropls package                                             Bioconductor                          https://bioconductor.org/packages/release/bioc/html/ropls.html
   ggplot2 package                                           R                                     https://cran.r-project.org/web/packages/ggplot2/index.html
   ggrepel package                                           R                                     https://cran.r-project.org/web/packages/ggrepel/index.html
   ggpubr package                                            R                                     https://cran.r-project.org/web/packages/ggpubr/index.html
   ggalluvial package                                        R                                     https://cran.r-project.org/web/packages/ggalluvial/
   pROC package                                              R                                     https://cran.r-project.org/web/packages/pROC/index.html
     __________________________________________________________________

   Other
     __________________________________________________________________

   ACQUITY UPLC HSS T3 column (1.8 μm, 2.1 mm × 100 mm)      Waters                                Cat#186003539
   ACQUITY UPLC BEH amide column (1.7 μm, 2.1 mm × 100 mm)   Waters                                Cat#186004801
   Thermomixer R                                             Eppendorf                             Cat#05-400-205
   MIKRO 220 R high-speed refrigerated microcentrifuge       Hettich Lab Technology                Cat#2200|2205
   Savant SPD131DDA SpeedVac concentrator                    Thermo Fisher Scientific              Cat#197-3003-00
   Ultrasonic water bath                                     Kunshan Shumei Ultrasonic Instrument  KQ3200DE
   ACQUITY UPLC I-Class PLUS system                          Waters                                Cat#720003920en
   Q Exactive Hybrid Quadrupole-Orbitrap mass spectrometer   Thermo Fisher Scientific              Cat#IQLAAEGAAPFALGMAZR

Materials and equipment

     CRITICAL: Use HPLC-grade solvents for all mobile phases and needle
     wash solutions. Most organic solvents are toxic, flammable, and
     volatile; handle them in a fume hood with personal protective
     equipment such as protective gloves, eyewear, and a lab coat.

     CRITICAL: All solutions should be degassed by ultrasonication for
     30 min.

   Mobile phase A for reversed phase liquid chromatography (RPLC)
     Reagent   Final concentration Amount
   Water       99.9%               999 mL
   Formic acid 0.1%                1 mL
   Total       N/A                 1 L

   Mobile phase B for RPLC
   Reagent  Final concentration Amount
   Methanol 100%                1 L
   Total    N/A                 1 L

     Note: The LC separation program is the same in both positive and
     negative modes of RPLC.

   Mobile phase A for Hydrophilic interaction liquid chromatography
   (HILIC)-positive mode
       Reagent      Final concentration  Amount
   Acetonitrile     95%                 950 mL
   Water            4.9%                49 mL
   Formic acid      0.1%                1 mL
   Ammonium formate 5 mM                315.3 mg
   Total            N/A                 1 L

   Mobile phase B for HILIC-positive mode
       Reagent      Final concentration  Amount
   Water            99.9%               999 mL
   Formic acid      0.1%                1 mL
   Ammonium formate 5 mM                315.3 mg
   Total            N/A                 1 L
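   The 315.3 mg of ammonium formate in the HILIC mobile phase tables
   follows from m = c × V × M, using the molar mass of ammonium formate
   (HCOONH4, approximately 63.06 g/mol). Scaling V gives the amount for
   other batch sizes, for example about 157.7 mg for 500 mL.

```latex
m = c \cdot V \cdot M
  = 5\ \mathrm{mmol\,L^{-1}} \times 1\ \mathrm{L} \times 63.06\ \mathrm{g\,mol^{-1}}
  = 315.3\ \mathrm{mg}
```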

   Needle wash solvent 1 (strong)
     Reagent    Final concentration Amount
   Acetonitrile 50%                 500 mL
   Water        50%                 500 mL
   Total        N/A                 1 L

   Needle wash solvent 2 (weak)
     Reagent    Final concentration Amount
   Acetonitrile 10%                 100 mL
   Water        90%                 900 mL
   Total        N/A                 1 L

Step-by-step method details

Uterine flushing

     Timing: 20 min per patient

   This section explains how to collect uterine fluid from each patient
   (Figure 1).
     * 1.
       Obtain informed consent from all women participating in the study.

     Note: This involves a thorough explanation of the collection
     procedure, any potential side effects, and the research purposes for
     which the collected samples will be used.

     * 2.
       Schedule an appropriate time for uterine flushing operation.

     Note: Uterine flushing can be conducted as a stand-alone procedure or
     during surgery and is routinely performed during the follicular
     phase, shortly after menstruation. The endometrial lining is thinnest
     at that time, which minimizes the risk of trauma and bleeding during
     the procedure.

     CRITICAL: Uterine flushing should only be performed by qualified
     gynecologists to prevent complications such as infection and tissue
     damage. The procedure itself does not require anesthesia.

     * 3.
       Place the patient in the lithotomy position and disinfect the vulva
       and vagina.
     * 4.
       Expose the cervix with a sterile speculum and rinse it with normal
       saline before placing a Foley catheter for uterine lavage.
     * 5.
       Insert the Foley catheter through the cervix into the uterine
       cavity using surgical forceps.
     * 6.
       Inflate the balloon of the catheter and gently pull on the catheter
       to ensure it is correctly positioned.
     * 7.
       Inject 2 mL of normal saline into the uterine cavity, allow it to
       settle for 30 s, and then withdraw it into the collection tube.
     * 8.
       Repeat the previous step five times for a complete and thorough
       uterine flushing.

     Note: Multiple flushes ensure more comprehensive sampling.

     * 9.
       Pool the fluid collected from all flushes and divide it into
       separate 1.5 mL microcentrifuge tubes.
     * 10.
       Snap-freeze the aliquots in liquid nitrogen and store them at −80°C
       until analysis to prevent metabolite degradation.

Figure 1.


   An overview of uterine fluid collection and metabolite extraction

   (A) The general steps involve uterine flushing, sample aliquoting, QC
   sample preparation, and metabolite extraction for subsequent untargeted
   metabolomics experiments.

   (B) Dry samples in a vacuum concentrator.

   (C) A dried sample in the Eppendorf tube.

   (D) Ultrasonicate samples to redissolve.

Sample preparation

     Timing: 5 h

   This section describes the preparation process for metabolite
   extraction from uterine fluid samples (Figure 1A).
     * 11.
       Thaw the uterine fluid samples on ice.
     * 12.
       Centrifuge the samples at 17,000 × g and 4°C for 10 min to remove
       cells and tissue fragments.
     * 13.
       Transfer the supernatant to a new tube and mix well by pipetting
       and vortexing.
     * 14.
       Generate a pooled quality control (QC) sample by combining 20 μL
       from each sample.

     Note: The volume taken from each sample for the QC pool can be
     adjusted according to the number of QC samples required and the
     total volume available for each experimental sample.

     * 15.
       Divide the pooled QC into 100 μL aliquots; each aliquot serves as a
       single QC sample and undergoes the same metabolite extraction steps
       as the experimental samples (steps 16–21).

     CRITICAL: QC samples are crucial for monitoring the stability of the
     LC-MS system and for filtering out false-positive signals.

     * 16.
       Combine 100 μL of each sample with 300 μL of ice-cold methanol and
       then shake the mixture for 20 min at 4°C in a Thermomixer R
       (Eppendorf).
     * 17.
       Centrifuge the mixture at 17,000 × g and 4°C for 10 min to
       precipitate proteins.
     * 18.
       Gently transfer 300 μL of the supernatant to a new tube.
     * 19.
       Evaporate the supernatant in a vacuum concentrator at 4°C until
       completely dry (Figures 1B and 1C).
     * 20.
       Redissolve the dried sample in 100 μL of methanol/water (50:50)
       with the aid of ultrasonication in an ice-cold water bath for
       20 min (Figure 1D).

     Note: Ice-cold solvent prevents metabolic turnover by suppressing
     enzymatic activity.

     * 21.
       Transfer the redissolved solution to an autosampler vial insert for
       analysis.

     Note: Samples prepared for HILIC skip steps 19 and 20; the
     supernatant from step 18 is transferred directly to the autosampler
     vial insert (step 21).
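   Steps 12 and 17 specify relative centrifugal force (× g). If a
   centrifuge only accepts speed in rpm, the standard relation
   RCF = 1.118 × 10^-5 × r × rpm^2 (r = rotor radius in cm) converts
   between the two. The R sketch below assumes a hypothetical rotor
   radius of 8.5 cm; use the radius from your own rotor's specification.

```r
# Convert between relative centrifugal force (RCF, x g) and rotor speed
# (rpm) using RCF = 1.118e-5 * r * rpm^2, with r the rotor radius in cm.
rcf_to_rpm <- function(rcf, radius_cm) sqrt(rcf / (1.118e-5 * radius_cm))
rpm_to_rcf <- function(rpm, radius_cm) 1.118e-5 * radius_cm * rpm^2

# Hypothetical rotor radius of 8.5 cm (check your rotor's specification)
speed <- rcf_to_rpm(17000, radius_cm = 8.5)
round(speed)                       # about 13,375 rpm
rpm_to_rcf(speed, radius_cm = 8.5) # back to 17,000 x g
```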

Untargeted metabolomics

     Timing: 16 min per sample for RPLC, 12 min per sample for HILIC, and
     1 h for database searching

   This section describes the LC-MS operation for untargeted metabolomics,
   covering system equilibration, sample runs, data acquisition, and data
   analysis (Figure 2).

     CRITICAL: Pre-checks and maintenance steps, such as system
     equilibration and calibration, are integral to the optimal operation
     of an LC-MS system. These preliminary steps ensure the quality and
     reliability of the analytical results as well as the functionality
     and longevity of the LC-MS instruments.

     * 22.
       Prepare the mobile phase solvents and the needle wash (NW)
       solutions.

     Note: Ensure that both the mobile phases and NW solutions are freshly
     prepared and degassed before each experiment and kept at 20°C–22°C
     until use.

     Note: Two NW solutions are used sequentially before each run,
     solvent 1 (acetonitrile/water, 50:50) and solvent 2
     (acetonitrile/water, 10:90), to remove residues from the injection
     needle and avoid carryover between samples.

     * 23.
       Place the solvent lines of the LC pump and the autosampler NW into
       the containers holding the corresponding mobile phases and NW
       solutions.
     * 24.
       Purge the LC pump for at least 5 min to remove any air from the
       system and fill solvent lines with the mobile phase.
     * 25.
       Conduct a purge with approximately 50 mL of NW solutions through
       the NW solvent lines.
     * 26.
       Equilibrate the system by running the mobile phases for 20 min at
       starting conditions and monitor the operational backpressure.

     Note: A Waters ACQUITY UPLC HSS T3 column (2.1 mm × 100 mm, 1.8 μm)
     is used for RPLC separation, and a Waters ACQUITY UPLC BEH Amide
     column (2.1 mm × 100 mm, 1.7 μm) is used for HILIC separation.
     Maintain both columns at 40°C. Excessively high or low backpressure
     usually indicates column blockage or a system leak, respectively,
     and requires thorough inspection and corrective measures for optimal
     system performance.^15

     * 27.
       Clean the ionization source and monitor spray stability.
     * 28.
       Perform external mass calibration using commercial calibration
       solutions (Pierce, Thermo Fisher Scientific) (Figure 2B) to
       correct the mass axis of the mass spectrometer (MS).

     Note: The positive-ion calibration includes masses at m/z 74.09643,
     83.06037, 195.08765, 262.63612, 524.26496, and 1022.00341; the
     negative-ion calibration includes masses at m/z 91.00368, 96.96010,
     112.98559, 265.14790, 514.28440, and 1080.00999. The mass error
     should be within 5 parts per million (ppm).
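   The 5 ppm criterion refers to the relative mass error,
   (measured m/z − theoretical m/z) / theoretical m/z × 10^6. The R
   sketch below applies it to two of the reference masses listed above;
   the "measured" values are hypothetical readings, not instrument output.

```r
# Relative mass error in parts per million (ppm)
ppm_error <- function(measured_mz, theoretical_mz) {
  (measured_mz - theoretical_mz) / theoretical_mz * 1e6
}

theoretical <- c(524.26496, 1022.00341)  # reference masses from the list above
measured    <- c(524.26550, 1022.00200)  # hypothetical measured values
err <- ppm_error(measured, theoretical)
round(err, 2)        # approximately 1.03 and -1.38 ppm
all(abs(err) < 5)    # TRUE: both within the 5 ppm tolerance
```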

     * 29.
       Start with ten blank sample injections followed by ten QC samples
       for LC-MS system equilibration.

     Note: Blank samples consist only of solvents without biological
     material: methanol/water (50:50) for RPLC and methanol/water (75:25)
     for HILIC.

     Note: The injection volume is 4 μL.

     * 30.
       Next, run two QC samples (QC ddMS^2) specifically for MS/MS
       spectra acquisition.
     * 31.
       Acquire full-scan MS1 data for all subsequent samples, inserting
       one QC after every 6–8 experimental samples (Figure 2A).

     Note: Experimental samples from all groups are injected in a
     randomized order within and between groups to minimize the effect of
     run orders on metabolomics data.

     Note: Please make sure to prepare sufficient QC samples in step 14
     according to the total number of experimental samples.

     CRITICAL: It is strongly recommended to perform metabolomic
     experiments in a single batch to maintain consistency. If multiple
     batches are unavoidable, use the same pooled QC samples for all
     batches.
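   The run order in steps 29–31 can be generated programmatically, which
   also shows how many QC injections (and how much instrument time) to
   plan for. The R sketch below is one possible implementation, not the
   authors' software: the function name, the fixed QC interval of 7, and
   the closing QC after the last sample are assumptions.

```r
# Hypothetical sketch of an injection sequence following steps 29-31:
# 10 blanks, 10 conditioning QCs, 2 QC ddMS2 runs, then experimental
# samples in randomized order with one QC after every `qc_every` samples.
build_sequence <- function(sample_ids, qc_every = 7, seed = 1) {
  set.seed(seed)
  samples <- sample(sample_ids)              # randomize the run order
  run <- character(0)
  for (i in seq_along(samples)) {
    run <- c(run, samples[i])
    if (i %% qc_every == 0) run <- c(run, "QC")   # interleaved QC
  }
  # Assumption: close the sequence with a QC if the last block lacks one
  if (length(samples) %% qc_every != 0) run <- c(run, "QC")
  c(rep("Blank", 10), rep("QC", 10), rep("QC_ddMS2", 2), run)
}

ids <- sprintf("S%02d", 1:20)                # 20 hypothetical samples
seq_rplc <- build_sequence(ids, qc_every = 7)
length(seq_rplc)             # total injections: 45
sum(seq_rplc == "QC")        # QC injections: 13
length(seq_rplc) * 16 / 60   # RPLC instrument time at 16 min/run: 12 h
```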

     * 32.
       Detect metabolites eluted from the columns under the
       chromatographic programs in Tables 1 and 2 using a Q Exactive
       Hybrid Quadrupole-Orbitrap MS (Thermo Fisher Scientific)
       (Figure 2C; Tables 3 and 4).
     * 33.
       Acquire raw LC-MS data using Xcalibur 2.2 SP1.48 software (Thermo
       Fisher Scientific).

     Note: The total ion current (TIC) chromatogram provides a quick
     visual representation of all ions detected over the course of a
     chromatographic run (Figure 2D).

Figure 2.


   An overview of LC-MS-based untargeted metabolomics

   (A) The run orders in the LC-MS-based untargeted metabolomics
   experiments.

   (B) Pierce ESI positive ion calibration solution spectra.

   (C) Physical representations of UPLC-MS equipment.

   (D) Example total ion current (TIC) chromatogram of uterine fluid in
   positive mode of RPLC-MS.

   (E) Illustrative exported tables from Progenesis QI.

   (F) Distribution of ion signals. Each box represents total signals in
   one sample, and y-axis shows the normalized intensities.
   Figure reprinted and adapted with permission from Wang et al., 2023.

   (G) MS/MS spectra and molecular structure of L-Phenylalanine in uterine
   fluid.

Table 1.

   RPLC separation program
   Time (min) Flow rate (mL/min) Mobile phase A (%) Mobile phase B (%)
   0          0.3                98                 2
   1          0.3                98                 2
   7          0.3                0                  100
   14.5       0.3                0                  100
   14.6       0.3                98                 2
   16         0.3                98                 2

   Mobile Phase A: 0.1% formic acid in HPLC-grade water.

   Mobile Phase B: HPLC-grade methanol.

Table 2.

   HILIC separation program for the positive mode
   Time (min) Flow rate (mL/min) Mobile phase A (%) Mobile phase B (%)
   0          0.3                95                 5
   1          0.3                95                 5
   7          0.3                50                 50
   8          0.3                50                 50
   8.1        0.3                95                 5
   12         0.3                95                 5

   Mobile Phase A: 0.1% formic acid in HPLC-grade acetonitrile/water
   (95:5) with 5 mM ammonium formate.

   Mobile Phase B: 0.1% formic acid in HPLC-grade water with 5 mM ammonium
   formate.

Table 3.

   MS source parameter settings
   Source parameter                          Value
   Ionization type                           Electrospray ionization (ESI)
   Capillary temperature                     320°C
   Spray voltage                             +3.7 kV (positive mode); −3.5 kV (negative mode)
   Sheath gas; auxiliary gas; collision gas  Nitrogen
   Sheath gas pressure                       30 psi
   Auxiliary gas pressure                    10 psi
   Desolvation temperature                   300°C
   Collision gas pressure                    1.5 mTorr

Table 4.

   MS acquisition parameter settings
   Acquisition parameter                  Value
   MS1
     __________________________________________________________________

   Scanning mode                          Full scan
   Resolution                             70,000
   Scan range (m/z)                       80–1200
   Automatic gain control (AGC) target    1 × 10^6
   Maximum injection time (IT)            50 ms
     __________________________________________________________________

   MS/MS
     __________________________________________________________________

   Scanning mode                          Data-dependent acquisition
   Resolution                             17,500
   AGC target                             1 × 10^5
   Maximum IT                             50 ms
   Normalized collision energy (NCE)      Stepped NCE: 15, 30, 55
   Intensity threshold                    1 × 10^5

   Subsequently, Progenesis QI software (Nonlinear Dynamics, v2.1) is used
   for peak picking, alignment, deconvolution, and database searching of
   features (m/z–retention time pairs), as detailed in steps 34–42.

     Note: Open-source platforms such as XCMS Online,^16 MZmine 2,^17 and
     MS-DIAL^18 offer free alternatives to Progenesis QI for processing
     raw metabolomics data.

     * 34.
       Click on “New” and define a name for the new project.
     * 35.
       Set analysis parameters including the MS machine, data format and
       ionization mode.
     * 36.
       Select the possible adducts.
     * 37.
       Import the raw data of all run files and let the software
       automatically select a QC run as the alignment reference.
     * 38.
       Set the minimum peak width as 0.05 min and retention time limits
       from 0.7 min to the end of the elution duration.

     Note: Peaks eluting before the hold-up time of the LC system
     (0.7 min) are excluded to remove unretained compounds and reduce
     noise.

     * 39.
       Start peak picking and alignment automatically.
     * 40.
       Create an experiment design and set comparison groups.
     * 41.
       Conduct deconvolution and database searching by choosing the Human
       Metabolome Database (HMDB, http://www.hmdb.ca/) in the MetaScope
       plugin.

     Note: HMDB is a comprehensive repository encompassing thousands of
     metabolites identified in the human body, each provided with
     in-depth information.

     * 42.
       Export two comma-separated value (.csv) files, as illustrated in
       Figure 2E, for each LC-MS method.

     Note: One file contains the quantitative results (peak intensities
     for each sample), and the other lists the preliminary identification
     results obtained by matching features against the HMDB.

     CRITICAL: Metabolite identification is very time-consuming, as it
     requires manual scrutiny and verification of each metabolic feature.
     It is therefore recommended to first narrow down the number of
     features through differential screening (steps 43–47), which makes
     identification (step 48) more manageable.

Bioinformatics analysis and data visualization

     Timing: variable

   This section details the workflow for bioinformatics and visualization,
   focusing on identifying differential metabolites, conducting functional
   enrichment analysis, and evaluating candidate biomarker performance.
     * 43.
       First, filter out metabolic features whose coefficient of
       variation (CV) exceeds 30% in the QC samples.
     * 44.
       Normalize the raw peak intensities of the metabolomics data
       (Figure 2F) on the MetaboAnalyst platform
       (https://www.metaboanalyst.ca/).

     Note: Choose these parameters: normalization by sum, log
     transformation (base 10), and Pareto scaling for the normalization
     procedure.
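   For readers who want to reproduce steps 43–44 outside MetaboAnalyst,
   the R sketch below approximates them in base R: a QC-based CV filter
   followed by sum normalization, log10 transformation, and Pareto
   scaling. It is an approximation, not MetaboAnalyst's implementation;
   in particular, the zero-handling offset before the log transform is an
   assumption, and MetaboAnalyst may treat zeros and missing values
   differently.

```r
# Step 43 (sketch): drop features whose CV in the QC samples exceeds 30%.
# x: samples in rows, features in columns; is_qc: logical, marks QC rows.
filter_cv <- function(x, is_qc, max_cv = 0.30) {
  qc <- x[is_qc, , drop = FALSE]
  cv <- apply(qc, 2, sd) / colMeans(qc)
  x[, !is.na(cv) & cv <= max_cv, drop = FALSE]
}

# Step 44 (sketch): sum normalization, log10 transform, Pareto scaling.
normalize_sketch <- function(x) {
  x <- as.matrix(x)
  x <- x / rowSums(x)                    # each sample now sums to 1
  x <- log10(x + min(x[x > 0]) / 2)      # assumption: offset for zeros
  centered <- sweep(x, 2, colMeans(x))   # mean-center each feature
  sweep(centered, 2, sqrt(apply(x, 2, sd)), "/")  # divide by sqrt(SD)
}

# Toy intensities: F4 is deliberately unstable across the QC injections
raw <- rbind(
  QC1 = c(F1 = 100, F2 = 200, F3 = 50, F4 = 10),
  QC2 = c(F1 = 105, F2 = 190, F3 = 52, F4 = 40),
  QC3 = c(F1 = 98,  F2 = 210, F3 = 49, F4 = 5),
  S1  = c(F1 = 120, F2 = 150, F3 = 80, F4 = 20),
  S2  = c(F1 = 90,  F2 = 260, F3 = 30, F4 = 15)
)
is_qc <- grepl("^QC", rownames(raw))
kept <- filter_cv(raw, is_qc)
colnames(kept)                 # "F1" "F2" "F3" (F4 has CV > 30%)
scaled <- normalize_sketch(kept)
```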

     * 45.
       Perform principal component analysis (PCA) on all samples
       (including QC samples).

     Note: PCA is essential for quality control in untargeted
     metabolomics, providing a visualization of the dataset that reveals
     outliers, batch effects, and instrument drift. Tight clustering of
     QC samples in the PCA plot indicates high quality and reliability of
     the data, whereas dispersion of the QC cluster can highlight
     potential problems in the analytical workflow. PCA also verifies the
     efficacy of data preprocessing, setting the stage for reliable
     downstream analysis and interpretation.

     CRITICAL: Always start with PCA rather than supervised analyses such
     as PLS-DA and OPLS-DA to reduce the risk of overfitting and spurious
     group separation. PCA and PLS-DA have distinct advantages: PCA is
     typically used first to obtain an unbiased view of the data and check
     data quality, whereas PLS-DA is used subsequently to focus on the
     variation specifically related to the experimental conditions or
     classes. This combination offers a balanced and thorough approach to
     analyzing complex metabolomics data.
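   Visual inspection of QC clustering can be complemented by a simple
   number. The R sketch below runs PCA with base R's prcomp() and
   compares the spread of QC scores with that of study samples on the
   first two components. This ratio is a hypothetical heuristic, not a
   published criterion: values well below 1 indicate that QCs cluster
   more tightly than biological samples, and no fixed cutoff is implied.

```r
# Hypothetical QC-dispersion check on PCA scores (x: normalized data,
# samples in rows; is_qc: logical vector marking QC rows).
qc_dispersion <- function(x, is_qc, n_pc = 2) {
  pca <- prcomp(x, center = TRUE, scale. = FALSE)  # data already scaled
  scores <- pca$x[, seq_len(n_pc), drop = FALSE]
  # Median Euclidean distance of points to their own centroid
  spread <- function(s) median(sqrt(rowSums(sweep(s, 2, colMeans(s))^2)))
  list(scores = scores,
       ratio = spread(scores[is_qc, , drop = FALSE]) /
               spread(scores[!is_qc, , drop = FALSE]))
}

# Simulated data: QCs vary little around a common profile, samples more
set.seed(7)
profile <- rnorm(50)
qc  <- t(replicate(6,  profile + rnorm(50, sd = 0.05)))
smp <- t(replicate(20, profile + rnorm(50, sd = 1)))
x <- rbind(qc, smp)
is_qc <- rep(c(TRUE, FALSE), c(6, 20))
pca_check <- qc_dispersion(x, is_qc)
pca_check$ratio   # well below 1 for this simulated data
```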

     * 46.
       Apply partial least squares discriminant analysis (PLS-DA) and
       orthogonal partial least squares discriminant analysis (OPLS-DA)
       to discriminate between experimental groups (Figures 3A–3E).

     Note: To ensure the robustness and reliability of the OPLS-DA model,
     validate it with a permutation test (in the R package ropls, the
     number of permutations is set by the permI argument of opls).

     CRITICAL: The analysis of PCA and (O)PLS-DA requires the normalized
     data.

     * 47.
       Screen for differential metabolic features using the criteria of
       variable importance in projection (VIP) > 1 and false discovery
       rate (FDR) < 0.05.

     Note: VIP values can be computed with the R package ropls or SIMCA
     software (Umetrics, Sartorius Stedim Biotech, Umeå, Sweden); FDR
     values are obtained by Benjamini-Hochberg adjustment of P values in
     R, as in the code below.

   >if (!require("BiocManager", quietly = TRUE))

   > install.packages("BiocManager")

   >BiocManager::install(version = "3.17")

   >BiocManager::install('ropls')

   >install.packages('ggplot2')

   >install.packages('ggrepel')

   >install.packages('ggpubr')

   >library(ropls)

   >library(ggplot2)

   >library(ggrepel)

   >library(ggpubr)

   ># First, prepare and input the data as ‘data’. Each row contains a
   sample with the sample name as the row name, each column contains a
   metabolic feature with the feature name as the column name.

   >data<-read.csv("data.csv",header=T,check.names=F,row.names = 1)

   ># Next, create a dataframe ‘datagroup’ with a single column that
   indicates the group information. Each row name represents the group to
   which each sample belongs. The column name is ‘Group’.

   > datagroup<-read.csv("datagroup.csv",header=T,check.names=F,
   row.names = 1)

   > data.pca <- opls(x = data)

   > scoreMN <- data.pca@scoreMN

   > scoreMN <- cbind(scoreMN,datagroup)

   > scoreMN$samples <- rownames(scoreMN)

   > ggplot(scoreMN, aes(p1, p2, color = Group)) +

    geom_point() +stat_ellipse(show.legend = FALSE)+

    geom_text_repel(data = scoreMN, aes(label = samples),

    size = 3, segment.color = "black", show.legend = FALSE )

   > data.plsda <- opls(x = data, y = datagroup[, 'Group'],

   orthoI. = NA)

   ># orthoI. = 0 for PLS-DA, orthoI = NA for OPLS-DA.

   > scoreMNp <- data.plsda@scoreMN

   > scoreMNp <- as.data.frame(scoreMNp)

   > scoreMNp$samples <- rownames(scoreMNp)

   > scoreMNp$Group <- scoreMN$Group

   > ggplot(scoreMNp, aes(p1,p2,color = Group)) +

   geom_point() +stat_ellipse(show.legend = FALSE) +

   geom_text_repel(data = scoreMNp, aes(label = samples),

   size = 3,segment.color = "black", show.legend = FALSE)

   > library(ropls)
   > data.plsda <- opls(x = data, y = factor(datagroup[, "Group"]), orthoI = NA)
   > VIP <- as.data.frame(getVipVn(data.plsda))
   > # P values are computed by the t-test or the Wilcoxon rank-sum test,
   > # depending on the data distribution.
   > # For the t-test
   > v <- names(data)
   > pval <- sapply(v, function(i) t.test(data[, i] ~ datagroup$Group)$p.value)
   > t_test_res <- data.frame(v, pval)
   > # For the Wilcoxon rank-sum test
   > pval <- sapply(v, function(i) wilcox.test(data[, i] ~ datagroup$Group)$p.value)
   > wilcox_test_res <- data.frame(v, pval)
   > # FDR computation (Benjamini-Hochberg), applied to the P values of the
   > # test chosen for the data
   > t_test_res$FDR <- p.adjust(t_test_res$pval, method = "BH")
   > wilcox_test_res$FDR <- p.adjust(wilcox_test_res$pval, method = "BH")
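     The comments above leave the choice between the t-test and the Wilcoxon rank-sum test to the data distribution. The sketch below shows one way to make that choice explicit, under our own assumptions. It runs a Shapiro-Wilk check within each group for every feature and then applies Benjamini-Hochberg adjustment. Finally, it applies the VIP > 1 and FDR < 0.05 thresholds shown in Figure 3E. The helper `select_differential()`, the 0.05 normality cut-off, the toy data, and the VIP values are all hypothetical; in practice the VIP vector would come from `getVipVn()`.

```r
# Hypothetical helper: per-feature choice between Welch's t-test (the R
# default for t.test) and the Wilcoxon rank-sum test, then BH-adjusted FDR
# and the VIP/FDR selection thresholds.
select_differential <- function(data, group, vip, vip_cut = 1, fdr_cut = 0.05,
                                alpha_normal = 0.05) {
  group <- factor(group)
  stopifnot(nlevels(group) == 2, length(vip) == ncol(data))
  res <- do.call(rbind, lapply(names(data), function(f) {
    x <- data[[f]]
    # Shapiro-Wilk normality check within each group (assumed decision rule)
    normal <- all(tapply(x, group, function(g) shapiro.test(g)$p.value) > alpha_normal)
    p <- if (normal) {
      t.test(x ~ group)$p.value
    } else {
      wilcox.test(x ~ group, exact = FALSE)$p.value
    }
    data.frame(feature = f, test = if (normal) "t-test" else "Wilcoxon", pval = p)
  }))
  res$FDR <- p.adjust(res$pval, method = "BH")
  res$VIP <- vip
  res$selected <- res$VIP > vip_cut & res$FDR < fdr_cut
  res
}

# Toy data (hypothetical): one shifted feature and one without a group effect
set.seed(1)
group <- rep(c("Control", "OC"), each = 10)
toy <- data.frame(
  up   = c(rnorm(10, mean = 10), rnorm(10, mean = 14)),
  flat = rnorm(20, mean = 10)
)
vip <- c(2.1, 0.4)  # hypothetical VIP values
res <- select_differential(toy, group, vip)
print(res)
```

     Applying a single test to all features, as in the code above, is an equally defensible choice; a per-feature choice simply makes the "depending on data distribution" rule reproducible.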
     * 48.
       Identify the selected metabolic features.
          + a.
            Match the differential features in the identification table
            with the quantification table mentioned in step 42.

     Note: Features with an MS1 mass error < 5 ppm are acceptable. In
     addition, only features with a score ≥ 44 are kept for further
     identification.
          + b.
            Match the MS/MS spectra (Figure 2G) with reference spectra
            in the HMDB.

     Note: The identification table exported from Progenesis QI contains
     a score for each metabolic feature, calculated as the mean of mass
     similarity, isotope similarity, and fragmentation score. The maximum
     score is 60. The score serves as a quality parameter for assessing
     identification reliability.

     CRITICAL: The Metabolomics Standards Initiative (MSI) has established
     four confidence levels for metabolite identification.^19 Level 1
     represents the highest confidence, in which metabolite structures
     have been confirmed with reference standards. Level 2 denotes the
     probable identification of exact structures based on matching
     experimental spectra with those from established databases or the
     literature. Levels 3 and 4 indicate progressively lower certainty in
     identification. Reporting the confidence level of identified
     metabolites in metabolomics studies is highly recommended. Typically,
     identified metabolites should reach at least MSI Level 2.

     Note: The matched metabolites in step 48 can be reported as MSI
     Level 2. If reference standards are available, the confidence level
     of these metabolites can be elevated to Level 1 with the
     confirmation of reference standards.
     * 49.
       Categorize the identified metabolites and visualize the
       classification in a Sankey plot using the R package ggalluvial
       (Figure 4A).

     Note: These classifications are available on various platforms, such
     as MetaboAnalyst and RefMet
     (https://www.metabolomicsworkbench.org/databases/refmet/index.php),^20
     by importing a metabolite list.
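     ggalluvial expects one row per class path together with a count, so the classification table usually has to be aggregated first. The sketch below assumes a hypothetical two-level classification. It uses super class and main class columns, of the kind RefMet provides, with placeholder labels. The code counts metabolites per path and draws the plot only if ggplot2 and ggalluvial are installed.

```r
# Hypothetical classification table with placeholder class labels
classes <- data.frame(
  metabolite  = c("M1", "M2", "M3", "M4", "M5"),
  super_class = c("Super class A", "Super class A", "Super class B",
                  "Super class B", "Super class A"),
  main_class  = c("Main class A1", "Main class A2", "Main class B1",
                  "Main class B1", "Main class A1")
)

# Count metabolites along each super class -> main class path
flows <- aggregate(metabolite ~ super_class + main_class, data = classes, FUN = length)
names(flows)[names(flows) == "metabolite"] <- "n"
print(flows)

# Draw the Sankey (alluvial) plot only when both packages are available
if (requireNamespace("ggplot2", quietly = TRUE) &&
    requireNamespace("ggalluvial", quietly = TRUE)) {
  library(ggplot2)
  library(ggalluvial)
  p <- ggplot(flows, aes(axis1 = super_class, axis2 = main_class, y = n)) +
    geom_alluvium(aes(fill = super_class)) +
    geom_stratum() +
    geom_text(stat = "stratum", aes(label = after_stat(stratum))) +
    theme_void()
  print(p)
}
```

     A third level (for example, sub class) can be added as `axis3` in the same way.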

     * 50.
       Conduct pathway enrichment analysis of metabolites based on the Kyoto
       Encyclopedia of Genes and Genomes (KEGG) database
       (https://www.genome.jp/kegg/) (Figure 4B).
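     The protocol does not specify the enrichment statistic. Over-representation analysis with a one-sided hypergeometric test is a common choice. It yields both quantities plotted in Figure 4B: -log10(P) and the enrichment ratio, defined as observed hits divided by expected hits. The sketch assumes that metabolites have already been mapped to KEGG compound IDs. The pathway memberships and IDs are hypothetical placeholders, not real KEGG annotations.

```r
# Over-representation analysis with a one-sided hypergeometric test
ora <- function(hits, pathways, universe) {
  hits <- intersect(hits, universe)
  k <- length(hits)      # number of input metabolites (draws)
  N <- length(universe)  # size of the background
  res <- do.call(rbind, lapply(names(pathways), function(pw) {
    members <- intersect(pathways[[pw]], universe)
    m <- length(members)                    # pathway size within the background
    q <- length(intersect(hits, members))   # observed hits in the pathway
    data.frame(
      pathway = pw, hits = q, size = m,
      enrichment_ratio = q / (k * m / N),                 # observed / expected
      p = phyper(q - 1, m, N - m, k, lower.tail = FALSE)  # P(X >= q)
    )
  }))
  res$neg_log10_p <- -log10(res$p)
  res[order(res$p), ]
}

# Hypothetical background, pathways, and input metabolites
universe <- paste0("cpd", 1:100)
pathways <- list(PW1 = paste0("cpd", 1:10), PW2 = paste0("cpd", 11:40))
hits <- c(paste0("cpd", 1:6), "cpd11")
res <- ora(hits, pathways, universe)
print(res)
```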
     * 51.
       Trace the connections between metabolites and genes in metabolic
       pathways and visualize them with Cytoscape software^21 (Figure 4C).

     Note: Open the Cytoscape App Manager to search for and install the
     MetScape plugin. This tool constructs metabolite-gene networks from
     input metabolites of interest and, optionally, genes of interest.
     Typical settings are: build a pathway-based network, select the
     organism, enter the metabolites of interest with their corresponding
     KEGG IDs, and choose the compound-gene network type. Compounds and
     genes associated with the input metabolites are then identified
     computationally using databases such as KEGG or HMDB.

     * 52.
       Select candidate metabolite biomarkers using multifaceted
       bioinformatics methods such as pathway-based enrichment analysis
       and abundance-based differential analysis.
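     Step 52 combines two criteria. One way to do this, sketched here with hypothetical inputs, is to keep metabolites that pass the differential-analysis thresholds (VIP > 1, FDR < 0.05) and also belong to an enriched pathway. The candidates are then ranked by absolute log2 fold change. The fold-change ranking and the table layout are our assumptions, not part of the original protocol.

```r
# Hypothetical differential-analysis results and enriched-pathway members
diff <- data.frame(
  metabolite = c("M1", "M2", "M3", "M4"),
  VIP        = c(2.3, 1.4, 1.8, 0.6),
  FDR        = c(0.001, 0.03, 0.20, 0.01),
  log2FC     = c(1.6, -0.9, 1.1, 0.4)
)
enriched <- list(PW1 = c("M1", "M3"), PW2 = c("M2", "M5"))

# Abundance-based criterion and pathway-based criterion
is_diff    <- diff$VIP > 1 & diff$FDR < 0.05
in_pathway <- diff$metabolite %in% unlist(enriched)

# Candidates satisfy both, ranked by effect size (assumed ordering)
candidates <- diff[is_diff & in_pathway, ]
candidates <- candidates[order(-abs(candidates$log2FC)), ]
print(candidates)
```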
     * 53.
       Conduct ROC analysis of the selected metabolites. Perform
       single-metabolite ROC analysis in R using the pROC package, and
       display the area under the curve (AUC) on the plot (Figure 5B).

   > install.packages("pROC")
   > library(pROC)
   > # Samples are in the columns. The first column holds row labels; the
   > # first row holds sample names, the second row (labeled 'outcome') holds
   > # the outcome of each sample, and each following row holds the level of
   > # one metabolite in all samples.
   > data <- read.csv("data.csv", sep = ",", header = FALSE, check.names = FALSE)
   > data <- as.data.frame(t(data))
   > colnames(data) <- unlist(data[1, ])
   > data <- data[-1, ]
   > # Convert metabolite levels to numeric to avoid the roc() error
   > # "Predictor must be numeric or ordered"
   > set2 <- data[, c(1, 2)]
   > set1 <- dplyr::mutate_all(data[, 3:ncol(data)], as.numeric)
   > data <- cbind(set2, set1)
   > colnames(data)[3:ncol(data)] <- paste0("metabolite", 1:(ncol(data) - 2))
   > data[, "outcome"] <- as.factor(data[, "outcome"])
   > roc1 <- roc(data$outcome, data$metabolite1, smooth = TRUE)
   > plot(roc1, print.auc = TRUE, col = "#ef7a6d")
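     It can help to compute the empirical AUC directly to see what it measures. The AUC equals the Mann-Whitney U statistic divided by the number of case-control pairs. In other words, it is the probability that a randomly chosen case has a higher level than a randomly chosen control, with ties counting as 1/2. The base-R sketch below assumes that higher levels indicate OC and uses hypothetical toy values. It also lists the empirical ROC points and the Youden-index cut-off.

```r
# Empirical AUC from the Mann-Whitney U statistic (average ranks handle ties)
auc_mw <- function(predictor, outcome, case) {
  is_case <- outcome == case
  r <- rank(predictor)
  n1 <- sum(is_case)
  n0 <- sum(!is_case)
  (sum(r[is_case]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Empirical ROC points: a sample is called positive when predictor >= threshold
roc_points <- function(predictor, outcome, case) {
  is_case <- outcome == case
  thr <- sort(unique(predictor), decreasing = TRUE)
  data.frame(
    threshold   = thr,
    sensitivity = sapply(thr, function(th) mean(predictor[is_case] >= th)),
    specificity = sapply(thr, function(th) mean(predictor[!is_case] < th))
  )
}

# Hypothetical toy data: five samples, higher levels in OC
level   <- c(0.10, 0.40, 0.35, 0.80, 0.60)
outcome <- c("Control", "Control", "OC", "OC", "OC")
auc  <- auc_mw(level, outcome, case = "OC")
pts  <- roc_points(level, outcome, case = "OC")
best <- pts[which.max(pts$sensitivity + pts$specificity - 1), ]  # Youden index
print(auc)
print(best)
```

     Because `roc1` above is fitted with `smooth = TRUE`, its printed AUC comes from the smoothed curve and can differ slightly from this empirical value. By default, pROC also chooses the direction of comparison automatically.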
     * 54.
       For ROC curves that combine multiple metabolites, use the
       MetaboAnalyst module in which both the classification method and the
       feature ranking method can be chosen (Figures 5A and 5C).

     Note: Candidate biomarkers with AUC > 0.7 are usually considered
     acceptable for further validation.

     CRITICAL: An independent validation cohort is necessary to assess the
     performance of candidate biomarkers. This involves evaluating
     statistical differences in metabolite levels between groups and
     conducting ROC analysis.
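     The validation requirement means that a multi-metabolite panel must be built on one cohort and evaluated, unchanged, on an independent one. The sketch below illustrates this with simulated data. It substitutes logistic regression (base R `glm()`) for the classifiers offered by MetaboAnalyst, such as the SVM shown in Figure 5C, so it is not the MetaboAnalyst method. The cohort sizes, effect sizes, and metabolite names are hypothetical.

```r
# Empirical AUC via the Mann-Whitney U statistic
auc_mw <- function(predictor, outcome, case) {
  is_case <- outcome == case
  r <- rank(predictor)
  n1 <- sum(is_case)
  n0 <- sum(!is_case)
  (sum(r[is_case]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Simulated cohort (hypothetical): m1 and m2 are higher in OC, m3 is noise
simulate_cohort <- function(n_per_group) {
  y <- rep(c(0, 1), each = n_per_group)  # 0 = control, 1 = OC
  data.frame(
    y  = y,
    m1 = rnorm(2 * n_per_group, mean = 1.5 * y),
    m2 = rnorm(2 * n_per_group, mean = 1.0 * y),
    m3 = rnorm(2 * n_per_group)
  )
}

set.seed(42)
discovery  <- simulate_cohort(40)
validation <- simulate_cohort(30)

# 1) Build the panel on the discovery cohort only
fit <- glm(y ~ m1 + m2 + m3, data = discovery, family = binomial)

# 2) Apply the frozen model to the independent validation cohort
score_val <- predict(fit, newdata = validation, type = "response")

# 3) Compare the panel with the strongest single metabolite (m1 by design)
auc_panel <- auc_mw(score_val, validation$y, case = 1)
auc_m1    <- auc_mw(validation$m1, validation$y, case = 1)
round(c(panel = auc_panel, m1_alone = auc_m1), 3)
```

     The key design choice is that nothing estimated on the validation cohort, including model coefficients or cut-offs, feeds back into the model; otherwise the validation AUC is optimistically biased.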

Figure 3.


   Multivariate analysis

   (A) Structure of the datasets used in the PCA and (O)PLS-DA code. The
   row and column names of the ‘data’ data frame are shown as ‘Sample 1,
   Sample 2, …’ and ‘Metabolite 1, Metabolite 2, …’, respectively. The row
   names of the ‘datagroup’ data frame are shown as ‘Sample 1, Sample 2,
   …’.

   (B) Example PCA plot. Each dot represents a sample.

   (C) Example OPLS-DA plot.

   (D) Example of a lollipop plot displaying the top 20 metabolites ranked
   by VIP value. Each dot represents a metabolite. VIP > 1 is used as the
   cutoff for significant metabolites.

   (E) Example of a scatter plot showing VIP and FDR values for all
   metabolites. Blue lines indicate the thresholds for differential
   metabolites, VIP > 1 and FDR < 0.05.

Figure 4.


   Metabolite enrichment analysis

   (A) Illustrative Sankey plot depicting the main classes of individual
   metabolites.

   (B) Example pathway analysis of metabolites. Each bar represents a
   pathway term. The upper x-axis indicates -log10(P) and the lower x-axis
   indicates the enrichment ratio.

   (C) Illustrative network depicting interactions between metabolites and
   genes using MetScape plugin in Cytoscape software. Each hexagon
   symbolizes a metabolite, and each dot represents an enzyme responsible
   for the conversion of the adjacent metabolites.

Figure 5.


   ROC analysis

   (A) Multivariate ROC analysis module on MetaboAnalyst platform
   interface.

   (B) Example ROC curves for individual variables. Each curve is the ROC
   curve of one variable, and its AUC value is shown. Figure reprinted and
   adapted with permission from Wang et al., 2023.

   (C) Example multivariate ROC curve based on the support vector machine
   (SVM) algorithm. Figure reprinted and adapted with permission from Wang
   et al., 2023.

Expected outcomes

   The implementation of this protocol has revealed the omics-scale
   metabolic profile of uterine fluid in various disease conditions among
   women. Notably, an effective biomarker panel has been developed and
   validated, substantiating the reliability of this sensitive and
   non-invasive approach for early OC detection.

Limitations

   First, matrix effects in biological samples are inherent to LC-MS and
   impair accurate quantification in untargeted metabolomics. However,
   optimizing sample preparation and using internal standards that closely
   resemble the targeted metabolites can substantially mitigate these
   effects. Second, current limitations of reference databases and
   annotation methods prevent complete and accurate coverage of the
   uterine fluid metabolome. Advances in algorithms and expanded databases
   could address this issue.

Troubleshooting

Problem 1

   A large amount of fluid leakage happens during the process of flushing
   (step 7).

Potential solution

   Massive leakage during uterine fluid collection can result in
   incomplete sample harvest. This reduces the accuracy and reliability
   of the analysis and can cause low-abundance metabolites to be missed.
   Leakage is probably caused by infusing a large volume of normal saline
   into the uterus at one time. It is advisable to decrease the volume of
   each lavage and to flush slowly to avoid strong turbulence.

Problem 2

   Residual cells and tissue remain in the uterine fluid (step 12).

Potential solution

   Centrifuge the sample a second time under the same conditions and
   transfer the supernatant into a new tube.

Problem 3

   Retention time drifts (step 32).

Potential solution

   This issue may be caused by insufficient system equilibration, column
   contamination, or leakage in the chromatographic system, and the
   specific cause should be determined. If leakage is detected, the
   experiment must be repeated. For other causes such as column
   contamination, minor retention time drifts (< 0.1 min) are generally
   acceptable and manageable because peaks can be aligned with Progenesis
   QI software. For drifts that exceed 0.1 min, check the column
   efficiency, adjust the elution gradient to adequately remove
   contaminants, and then repeat the experiment.
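     The 0.1-min rule above can be checked by comparing the retention times of the same features across QC injections. In this sketch, the feature table, retention times, and action labels are hypothetical; only the 0.1-min tolerance comes from the protocol.

```r
# Hypothetical retention times (min) of the same features in a reference QC
# injection and in a later QC injection
rt <- data.frame(
  feature = c("F1", "F2", "F3", "F4"),
  rt_ref  = c(1.52, 3.87, 6.10, 9.45),
  rt_qc   = c(1.55, 3.90, 6.26, 9.47)
)
tolerance <- 0.1  # min, threshold stated in the protocol

rt$drift  <- rt$rt_qc - rt$rt_ref
rt$action <- ifelse(abs(rt$drift) < tolerance,
                    "acceptable: align in Progenesis QI",
                    "check column efficiency and gradient")
print(rt)
```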

Problem 4

   QC samples are not clustered tightly in PCA for large-scale
   metabolomics data (step 45).

Potential solution

   This issue might arise from acquiring data on a single LC-MS system
   over an extended period, which can introduce unavoidable systematic
   variation. The Systematic Error Removal using Random Forest (SERRF)
   normalization method,^22 which leverages QC samples, could serve as a
   potential solution for correcting and minimizing these systematic
   variations.
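     SERRF itself fits random-forest models to the QC samples. The base-R sketch below illustrates the same principle more simply, and it is not SERRF. It models signal drift along the injection order with LOESS fitted to QC injections only, in the spirit of QC-based robust LOESS signal correction. It then compares the QC relative standard deviation (RSD) before and after correction. The injection design, drift, and intensities are simulated assumptions.

```r
# QC-based LOESS drift correction for one feature (not SERRF)
qc_loess_correct <- function(x, order_inj, is_qc, span = 0.75) {
  fit <- loess(x ~ order_inj, subset = is_qc, span = span)  # fit on QCs only
  trend <- predict(fit, newdata = data.frame(order_inj = order_inj))
  x / trend * median(x[is_qc])  # remove the trend, rescale to the QC level
}

rsd <- function(v) 100 * sd(v) / mean(v)  # relative standard deviation, %

set.seed(7)
n <- 60
order_inj <- seq_len(n)
# Assumed design: a QC every 5 injections plus one at the end, so the QC
# trend spans all samples (loess does not extrapolate by default)
is_qc <- order_inj %% 5 == 1 | order_inj == n
true_level <- ifelse(is_qc, 1000, 1000 * rlnorm(n, sdlog = 0.3))
drift <- 1 + 0.01 * order_inj                       # simulated signal drift
x <- true_level * drift * rlnorm(n, sdlog = 0.03)   # measured intensity

corrected <- qc_loess_correct(x, order_inj, is_qc)
round(c(before = rsd(x[is_qc]), after = rsd(corrected[is_qc])), 1)
```

     In real data the correction is applied feature by feature. A drop in QC RSD, together with tighter QC clustering in PCA, indicates that the drift has been reduced.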

Problem 5

   Multiple features match the same metabolite (step 48).

Potential solution

   First, it is advised to order these features by their precursor and
   fragment scores in descending order. Subsequently, check the MS/MS
   spectra starting from the highest scoring feature and moving downward,
   as a higher score suggests a greater likelihood of it being the correct
   metabolite match.

   Notably, if the reference standard is available for this metabolite,
   the most accurate method is to run the reference standard under the
   identical LC-MS conditions to verify the alignment of structural data
   with the initial experimental sample.
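     The ordering rule in this solution can be applied per metabolite with base R. The sketch assumes a hypothetical table with one row per feature-metabolite match and two score columns, `precursor_score` and `fragment_score`. The feature ranked first within each metabolite is the first one whose MS/MS spectrum should be checked.

```r
# Hypothetical matches: several features annotated as the same metabolite
matches <- data.frame(
  feature         = c("F7", "F12", "F31", "F40", "F52"),
  metabolite      = c("Met A", "Met A", "Met A", "Met B", "Met B"),
  precursor_score = c(48.2, 55.9, 44.6, 50.1, 46.3),
  fragment_score  = c(62.0, 88.5, 40.3, 71.2, 70.9)
)

# Within each metabolite, sort by precursor score, then fragment score,
# both in descending order
matches <- matches[order(matches$metabolite,
                         -matches$precursor_score,
                         -matches$fragment_score), ]

# Rank 1 = first feature whose MS/MS spectrum should be checked
matches$rank <- ave(seq_along(matches$feature), matches$metabolite, FUN = seq_along)
print(matches)
```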

Problem 6

   When running the roc() function, the error "Predictor must be numeric
   or ordered." may occur (step 53).

Potential solution

   According to the pROC package reference manual, the roc() function
   requires two vectors (response and predictor). The predictor vector
   must be numeric or ordered rather than another type such as character.
   To address this, convert character values to numeric values with
   dplyr::mutate_all(<predictor columns>, as.numeric) from the dplyr
   package.
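     The error usually arises because transposing a table that mixes text (sample names, outcome) with numbers turns every column into character. The base-R sketch below reproduces this with a hypothetical three-sample table in the step 53 layout. It converts the metabolite columns with `lapply()`, which has the same effect as `dplyr::mutate_all(..., as.numeric)`. Calling `as.character()` first also protects against columns stored as factors, for which `as.numeric()` would return level codes.

```r
# Hypothetical input in the step 53 layout: samples in columns, first row
# sample names, second row outcome, following rows metabolite levels
raw <- data.frame(
  label = c("sample", "outcome", "metabolite A", "metabolite B"),
  S1    = c("S1", "OC", "1.20", "0.35"),
  S2    = c("S2", "Control", "0.80", "0.50"),
  S3    = c("S3", "OC", "1.50", "0.30")
)
tdat <- as.data.frame(t(raw[, -1]))
colnames(tdat) <- raw$label
print(sapply(tdat, class))  # every column is character after t()

# Convert only the metabolite columns (3 onwards) to numeric
metab_cols <- 3:ncol(tdat)
tdat[metab_cols] <- lapply(tdat[metab_cols], function(x) as.numeric(as.character(x)))
print(sapply(tdat, class))  # metabolite columns are now numeric
```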

Resource availability

Lead contact

   Further information and requests for resources and reagents should be
   directed to and will be fulfilled by the lead contact, Mo Li
   (limo@hsc.pku.edu.cn).

Technical contact

   Questions about the technical specifics of performing the protocol
   should be directed to and will be answered by the technical contact,
   Yuening Jiang (2111110435@stu.pku.edu.cn).

Materials availability

   This study did not generate new unique reagents.

Data and code availability

   The accession number for the untargeted metabolomics data reported in
   this paper is MetaboLights: MTBLS4861.

Acknowledgments