Highlights
* Harvesting uterine fluid for early detection of ovarian cancer
* Extracting metabolites from uterine fluid
* Running LC-MS for untargeted metabolomics
* Bioinformatics analysis and visualization strategies

Publisher’s note: Undertaking any experimental protocol requires adherence to local institutional guidelines for laboratory safety and ethics.

The high mortality of ovarian cancer (OC) is primarily attributed to the lack of effective early detection methods. Uterine fluid, which pools molecules from the neighboring ovaries, offers an organ-specific advantage over conventional blood samples. Here, we present a protocol for identifying metabolite biomarkers in uterine fluid for early OC detection. We describe steps for uterine fluid collection from patients, metabolite extraction, metabolomics experiments, and candidate metabolite biomarker screening. This standardized workflow holds the potential to achieve early OC diagnosis in clinical practice.

Before you begin

Overview

OC has the highest mortality rate among gynecological tumors, primarily because of the absence of effective biomarkers for early diagnosis.^2 The 5-year survival rate for localized OC exceeds 90%, whereas it drops to less than 30% for patients with distant spread, a pattern observed across various types of cancers as well.^3 Although clinical serum biomarkers such as cancer antigen 125 (CA125) and human epididymis protein 4 (HE4) are useful in monitoring OC progression, these circulation-based biomarkers perform poorly in the early detection of OC.
Unlike blood, which lacks organ-specific characteristics, uterine fluid displays unique features owing to its proximity to the ovaries, supported by the closely interconnected structure of the ovaries, fallopian tubes, and uterine cavity. Molecular substances shed from the ovaries can be transported to the uterine cavity, where they can be retrieved through uterine flushing.^4 This phenomenon establishes the conceptual foundation for using uterine fluid as a valuable sample source for early detection of OC.

Liquid chromatography-mass spectrometry (LC-MS)-based untargeted metabolomics has demonstrated substantial potential in disease exploration, as illustrated earlier in STAR Protocols.^5,6,7,8 This untargeted approach produces massive amounts of data covering the entire metabolite profile within biological samples.^9 Notably, this metabolomics workflow is applicable to various bodily fluids, extending from uterine fluid to nasal fluid, bronchoalveolar fluid, cerebrospinal fluid, pleural fluid, and ascitic fluid. For nasopharyngeal cancer (NPC), which is highly curable when detected early,^10 the existing diagnostic approaches largely hinge on MRI and biopsy,^11 and efficient, noninvasive methods for early detection are lacking. In this context, the investigation of nasal fluid emerges as a promising pathway for advancing the early diagnosis of NPC. Moreover, for bronchoalveolar lavage fluid, our methodology could aid clinicians in better understanding the metabolic changes associated with pulmonary diseases. As for cerebrospinal fluid, metabolomic analysis provides insights into early diagnosis and the pathological mechanisms of neurological diseases such as Alzheimer’s disease and multiple sclerosis.

The protocol below describes the detailed steps of identifying metabolite biomarkers sourced from uterine fluid, aimed at improving early-stage OC detection.
This workflow includes uterine fluid collection, metabolite extraction, LC-MS-based untargeted metabolomics analysis, and biomarker screening analysis. Multi-level bioinformatics analysis tools are used to interpret the high-dimensional metabolome-wide data. In summary, this protocol serves as a strategy for biomarker discovery applicable to diagnostics, therapeutics, and disease stratification, targeting a broad spectrum of severe diseases that extend beyond OC.

Institutional permissions

The collection of uterine fluid samples from patients in the study was approved by the Peking University Third Hospital Medical Science Research Ethics Committee (IRB00006761-M2019471). All participants in our study signed informed consent, and the uterine flushing procedures were performed by professional gynecological surgeons in accordance with the ethical guidelines. Please note that the use of clinical specimens from patients requires permission from your local institution.

Establish the inclusion and exclusion criteria of patients

Timing: variable

This section outlines the enrollment criteria and cohort division for participants.
* 1. Include patients clinically diagnosed with early-stage OC, late-stage OC, endometrial cancer, and benign gynecological diseases.
* 2. Exclude patients presenting with bilateral oophorectomy, metabolic disorders, abnormal uterine cavity, and other malignancies unrelated to primary OC or primary endometrial cancer.
* 3. Collect clinical records of patients.
Note: The clinical information includes general physiological data (age, weight, body mass index), serum indicators such as CA125, HE4, and the Risk of Ovarian Malignancy Algorithm (ROMA),^12,13 along with family history, parity history, oral contraceptive usage, and details about FIGO (The International Federation of Gynecology and Obstetrics) stage^14 and histological subtype of tumors.
Note: The FIGO OC staging system categorizes OC into stages I to IV based on tumor size and extent of metastasis. Specifically, stages I and II refer to the early stages, when the lesions are confined to the pelvis. In contrast, stages III and IV indicate late stages, when the tumors have metastasized to the abdominal area and more distant organs. This classification can help guide the personalized management of OC patients.
* 4. Randomly divide the enrolled patients into two separate cohorts, one designated as a training cohort for constructing the diagnostic model and the other as a validation cohort used for independent validation of the model.
Note: The sample size distribution between the training and validation cohorts (such as 80/20 or 50/50) is flexible and depends largely on the principal focus of the research, i.e., model establishment or model application.
CRITICAL: The enrolled patients must not have received any prior treatments, as previous therapies could complicate the interpretation of metabolic status.
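The random division in step 4 can be sketched in R as follows. This is only a minimal illustration under stated assumptions, not the procedure used in the study: the `patients` table, its `id` and `group` columns, the 80/20 ratio, the seed, and the choice to sample within each diagnostic group (so that both cohorts keep similar group proportions) are all hypothetical.

```r
# Hypothetical patient table; IDs and group labels are illustrative only.
patients <- data.frame(
  id    = sprintf("P%03d", 1:40),
  group = rep(c("early_OC", "late_OC", "endometrial_cancer", "benign"), each = 10)
)

# Split a data frame into training and validation cohorts.
# Sampling is done within each group (an assumption of this sketch).
split_cohorts <- function(df, group_col, train_frac = 0.8, seed = 1) {
  set.seed(seed)  # fixed seed so the split is reproducible
  idx_by_group <- split(seq_len(nrow(df)), df[[group_col]])
  train_idx <- unlist(lapply(idx_by_group, function(idx) {
    n_train <- round(length(idx) * train_frac)
    idx[sample.int(length(idx), n_train)]
  }), use.names = FALSE)
  valid_idx <- setdiff(seq_len(nrow(df)), train_idx)
  list(training = df[sort(train_idx), ], validation = df[valid_idx, ])
}

cohorts <- split_cohorts(patients, "group", train_frac = 0.8)
table(cohorts$training$group)    # group counts in the training cohort
table(cohorts$validation$group)  # group counts in the validation cohort
```

Recording the seed together with the resulting cohort assignment makes the split traceable when the validation cohort is analyzed later.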
Key resources table

REAGENT or RESOURCE | SOURCE | IDENTIFIER

Biological samples
Uterine fluid specimens | This paper | N/A

Chemicals, peptides, and recombinant proteins
LC-MS grade methanol | Thermo Fisher Scientific | Cat#047192
LC-MS grade acetonitrile | Thermo Fisher Scientific | Cat#51101
LC-MS grade formic acid | Thermo Fisher Scientific | Cat#28905
Ammonium formate | Fisher Scientific | Cat#A11550
Pierce positive ion calibration solution | Thermo Fisher Scientific | Cat#88323
Pierce negative ion calibration solution | Thermo Fisher Scientific | Cat#88324

Software and algorithms
Xcalibur 2.2 SP1.48 software | Thermo Fisher Scientific | Cat#OPTON-30965
Progenesis QI (Nonlinear Dynamics, version 2.1) | Nonlinear Dynamics | http://www.nonlinear.com/progenesis/qi/download/
HMDB | HMDB | https://hmdb.ca/
MetaboAnalyst | MetaboAnalyst | https://www.metaboanalyst.ca/
SIMCA software | Sartorius Stedim Biotech, Umetrics | https://www.sartorius.com/en/products/process-analytical-technology/data-analytics-software/mvda-software/simca
KEGG | KEGG | https://www.genome.jp/kegg/
Cytoscape software | Cytoscape | https://cytoscape.org/
MetScape plugin | Cytoscape | https://apps.cytoscape.org/apps/metscape
R (v4.3.0) | R | https://www.r-project.org/
ropls package | Bioconductor | https://bioconductor.org/packages/release/bioc/html/ropls.html
ggplot2 package | R | https://cran.r-project.org/web/packages/ggplot2/index.html
ggrepel package | R | https://cran.r-project.org/web/packages/ggrepel/index.html
ggpubr package | R | https://cran.r-project.org/web/packages/ggpubr/index.html
ggalluvial | R | https://cran.r-project.org/web/packages/ggalluvial/
pROC package | R | https://cran.r-project.org/web/packages/pROC/index.html

Other
ACQUITY UPLC HSS T3 column (1.8 μm, 2.1 mm × 100 mm) | Waters | Cat#186003539
ACQUITY UPLC BEH amide column (1.7 μm, 2.1 mm × 100 mm) | Waters | Cat#186004801
Thermomixer R | Eppendorf | Cat#05-400-205
MIKRO 220 R high-speed refrigerated microcentrifuge | Hettich Lab Technology | Cat#2200|2205
Savant SPD131DDA SpeedVac concentrator | Thermo Fisher Scientific | Cat#197-3003-00
Ultrasonic water bath | Kunshan Shumei Ultrasonic Instrument | KQ3200DE
ACQUITY UPLC I-Class PLUS system | Waters | Cat#720003920en
Q Exactive Hybrid Quadrupole-Orbitrap mass spectrometer | Thermo Fisher Scientific | Cat#IQLAAEGAAPFALGMAZR

Materials and equipment

CRITICAL: HPLC-grade solvents should be used for all mobile phases and needle wash. Most organic reagents are toxic, flammable, and volatile; handle them in a fume hood with personal protective equipment such as protective gloves, eyewear, and a lab coat.
CRITICAL: All solutions should be degassed by ultrasonication for 30 min.

Mobile phase A for reversed phase liquid chromatography (RPLC)
Reagent | Final concentration | Amount
Water | 99.9% | 999 mL
Formic acid | 0.1% | 1 mL
Total | N/A | 1 L

Mobile phase B for RPLC
Reagent | Final concentration | Amount
Methanol | 100% | 1 L
Total | N/A | 1 L

Note: The LC separation program is the same in both positive and negative modes of RPLC.
Mobile phase A for hydrophilic interaction liquid chromatography (HILIC)-positive mode
Reagent | Final concentration | Amount
Acetonitrile | 95% | 950 mL
Water | 4.9% | 49 mL
Formic acid | 0.1% | 1 mL
Ammonium formate | 5 mM | 315.3 mg
Total | N/A | 1 L

Mobile phase B for HILIC-positive mode
Reagent | Final concentration | Amount
Water | 99.9% | 999 mL
Formic acid | 0.1% | 1 mL
Ammonium formate | 5 mM | 315.3 mg
Total | N/A | 1 L

Needle wash solvent 1 (strong)
Reagent | Final concentration | Amount
Acetonitrile | 50% | 500 mL
Water | 50% | 500 mL
Total | N/A | 1 L

Needle wash solvent 2 (weak)
Reagent | Final concentration | Amount
Acetonitrile | 10% | 100 mL
Water | 90% | 900 mL
Total | N/A | 1 L

Step-by-step method details

Uterine flushing

Timing: 20 min per patient

This section explains how to collect uterine fluid from each patient (Figure 1).
* 1. Obtain informed consent from all women participating in the study.
Note: This involves a thorough explanation of the collection procedure, any potential side effects, and the scientific research purpose for which the collected samples will be used.
* 2. Schedule an appropriate time for the uterine flushing operation.
Note: Uterine flushing can be conducted as a single operation or during a surgery. It is routinely performed during the follicular phase, the time right after the woman’s period. The endometrial lining is at its thinnest at that time, which helps minimize the risk of trauma and bleeding during the operation.
CRITICAL: Uterine flushing should only be performed by qualified gynecologists to prevent complications such as infection and tissue damage. The procedure itself does not require anesthesia.
* 3. Place the patient in a lithotomy position and conduct vulvar and vaginal disinfection.
* 4. Use a sterile speculum to make the cervix visible, and then rinse it with normal saline before placing a Foley catheter for uterine lavage harvest.
* 5. Insert a Foley catheter into the uterine cavity through the cervix using surgical forceps.
* 6. Inflate the balloon of the catheter and gently pull on the catheter to ensure it is correctly positioned.
* 7. Inject 2 mL of normal saline into the uterine cavity and allow it to settle for 30 s before retrieving it back into the collection tube.
* 8. Repeat the previous step five times for a complete and thorough uterine flushing.
Note: Multiple flushes ensure a more comprehensive sampling.
* 9. Merge the fluid collected from each flushing and subsequently divide the pooled fluid into separate 1.5 mL microcentrifuge tubes.
* 10. Snap freeze the samples in liquid nitrogen and store at −80°C to prevent metabolite degradation before future analysis.

Figure 1. An overview of uterine fluid collection and metabolite extraction
(A) The general steps involve uterine flushing, sample aliquoting, QC sample preparation, and metabolite extraction for subsequent untargeted metabolomics experiments.
(B) Dry samples in a vacuum concentrator.
(C) The post-dry sample within the Eppendorf tube.
(D) Ultrasonicate samples to redissolve.

Sample preparation

Timing: 5 h

This section describes the preparation process for metabolite extraction from uterine fluid samples (Figure 1A).
* 11. Thaw the uterine fluid sample on ice.
* 12. Centrifuge the sample at 17,000 × g and 4°C for 10 min to remove cell or tissue fragments.
* 13. Transfer the supernatant into a new tube and mix well by pipetting and vortexing.
* 14. Generate a pooled Quality Control (QC) sample by aliquoting 20 μL from each sample.
Note: The QC volume taken from each sample can vary depending on the required number of QCs and the total volume of each experimental sample.
* 15. Divide the pooled QC into 100 μL aliquots, one per tube, each serving as a single QC sample that undergoes the same metabolite extraction steps described below.
CRITICAL: QC samples are crucial for assuring the stability of the LC-MS system and for filtering false positive signals.
* 16. Combine 100 μL of each sample with 300 μL of ice-cold methanol and then shake the mixture for 20 min at 4°C in a Thermomixer R (Eppendorf).
* 17. Centrifuge the mixture at 17,000 × g and 4°C for 10 min to precipitate proteins.
* 18. Gently collect 300 μL of the supernatant into a new tube.
* 19. Evaporate the supernatant in a vacuum concentrator at 4°C until completely dried (Figures 1B and 1C).
* 20. Add 100 μL methanol/water (50:50) to the dried sample and dissolve it by ultrasonication in an ice-cold water bath for 20 min (Figure 1D).
Note: Ice-cold solvent prevents metabolic turnover by suppressing enzymatic activity.
* 21. Transfer the redissolved solution to an autosampler vial insert for analysis.
Note: Samples prepared for HILIC do not require step 19 and step 20.

Untargeted metabolomics

Timing: 16 min per sample for RPLC, 12 min per sample for HILIC, and 1 h for database searching

This section describes the LC-MS operation for untargeted metabolomics, covering system equilibration, sample runs, data acquisition, and data analysis steps (Figure 2).
CRITICAL: Pre-checks and maintenance steps, such as system equilibration and calibration, are integral to the optimal operation of an LC-MS system. These preliminary steps ensure the quality and reliability of the analytical results as well as the functionality and longevity of the LC-MS instruments.
* 22. Prepare the mobile phase solvents and the needle wash (NW) solutions.
Note: Ensure that both mobile phases and NW solutions are freshly prepared and degassed before each experiment and stored at 20°C–22°C immediately before use.
Note: NW employs two solutions sequentially before each run, solvent 1 (acetonitrile/water, 50:50) and solvent 2 (acetonitrile/water, 10:90), which remove residues from the injection needle to avoid contamination between samples.
* 23. Position the solvent lines for the LC pump and autosampler NW into the containers with mobile phases and NW solutions.
* 24. Purge the LC pump for at least 5 min to remove any air from the system and fill the solvent lines with the mobile phase.
* 25. Conduct a purge with approximately 50 mL of NW solutions through the NW solvent lines.
* 26. Equilibrate the system by running the mobile phases for 20 min at starting conditions and monitor the operational backpressure.
Note: Waters ACQUITY UPLC HSS T3 (2.1 mm × 100 mm, 1.8 μm) is used for RPLC separation, and Waters UPLC BEH Amide (2.1 mm × 100 mm, 1.7 μm) is used for HILIC separation. Maintain both HILIC and RPLC columns at 40°C. Excessive backpressure usually points to column blockage and insufficient backpressure to system leaks; both call for thorough inspection and corrective measures.^15
* 27. Clean the ionization source and monitor spray stability.
* 28. Perform external mass calibration using commercial calibration solutions (Pierce, Thermo Fisher Scientific) (Figure 2B) to correct the mass axis of the mass spectrometer (MS).
Note: The calibration for positive ions includes masses at 74.09643, 83.06037, 195.08465, 262.63612, 524.26496, and 1022.00341; for negative ions, the calibration includes masses at 91.00368, 96.96010, 112.98559, 265.14790, 514.28440, and 1080.00999. The mass error should be within 5 parts per million (ppm).
* 29. Start with ten blank sample injections followed by ten QC samples for LC-MS system equilibration.
Note: Blank samples consist solely of the solvents without biological substances, specifically, methanol/water (50:50) for RPLC and methanol/water (75:25) for HILIC.
Note: The injection volume is 4 μL.
* 30. Then run two QCs (QC ddMS^2) specifically for MS/MS spectra acquisition.
* 31. Subject all subsequent samples to full-scan MS1 data acquisition, and introduce one QC after every 6–8 experimental samples (Figure 2A).
Note: Experimental samples from all groups are injected in a randomized order within and between groups to minimize the effect of run order on metabolomics data.
Note: Prepare sufficient QC samples in step 14 according to the total number of experimental samples.
CRITICAL: It is strongly recommended to perform metabolomic experiments in a single batch to maintain consistency. If multiple batches are unavoidable, use the same QC samples for all batches.
* 32. Detect metabolites eluted from the columns following the chromatographic programs (Tables 1 and 2) with a Q Exactive Hybrid Quadrupole-Orbitrap MS (Thermo Fisher Scientific) (Figure 2C; Tables 3 and 4).
* 33. Acquire raw LC-MS data using Xcalibur 2.2 SP1.48 software (Thermo Fisher Scientific).
Note: A total ion current (TIC) chromatogram provides a quick visual representation of all the ions detected over the course of a chromatographic run (Figure 2D).

Figure 2. An overview of LC-MS-based untargeted metabolomics
(A) The run orders in the LC-MS-based untargeted metabolomics experiments.
(B) Pierce ESI positive ion calibration solution spectra.
(C) Physical representations of UPLC-MS equipment.
(D) Example total ion current (TIC) chromatogram of uterine fluid in positive mode of RPLC-MS.
(E) Illustrative exported tables from Progenesis QI.
(F) Distribution of ion signals. Each box represents total signals in one sample, and the y-axis shows the normalized intensities. Figure reprinted and adapted with permission from Wang et al., 2023.
(G) MS/MS spectra and molecular structure of L-Phenylalanine in uterine fluid.

Table 1. RPLC separation program
Time (min) | Flow rate (mL/min) | Mobile phase A (%) | Mobile phase B (%)
0 | 0.3 | 98 | 2
1 | 0.3 | 98 | 2
7 | 0.3 | 0 | 100
14.5 | 0.3 | 0 | 100
14.6 | 0.3 | 98 | 2
16 | 0.3 | 98 | 2
Mobile Phase A: 0.1% formic acid in HPLC-grade water. Mobile Phase B: HPLC-grade methanol.

Table 2. HILIC separation program for the positive mode
Time (min) | Flow rate (mL/min) | Mobile phase A (%) | Mobile phase B (%)
0 | 0.3 | 95 | 5
1 | 0.3 | 95 | 5
7 | 0.3 | 50 | 50
8 | 0.3 | 50 | 50
8.1 | 0.3 | 95 | 5
12 | 0.3 | 95 | 5
Mobile Phase A: 0.1% formic acid in HPLC-grade acetonitrile/water (95:5) with 5 mM ammonium formate. Mobile Phase B: 0.1% formic acid in HPLC-grade water with 5 mM ammonium formate.

Table 3. MS source parameter settings
Source parameter | Value
Ionization type | Electrospray ionization (ESI)
Capillary temperature | 320°C
Spray voltage | +3.7 kV for positive mode; −3.5 kV for negative mode
Sheath gas; auxiliary gas; collision gas | Nitrogen
Sheath gas pressure | 30 psi
Auxiliary gas pressure | 10 psi
Desolvation temperature | 300°C
Collision gas pressure | 1.5 mTorr

Table 4. MS acquisition parameter settings
Acquisition parameter | Value
MS1
Scanning mode | Full-scan
Resolution | 70,000
Scan range (m/z) | 80–1200
Automatic gain control (AGC) target | 1 × 10^6
Maximum injection time (IT) | 50 ms
MS/MS
Scanning mode | Data-dependent acquisition
Resolution | 17,500
AGC target | 1 × 10^5
Maximum IT | 50 ms
Normalized collision energy (NCE) | Stepped NCE: 15 V, 30 V, 55 V
Intensity threshold | 1 × 10^5

Subsequently, Progenesis QI software (Nonlinear Dynamics, v 2.1) is used for peak picking and alignment, deconvolution, and database searching of features (pairs of m/z and retention time), as detailed in the following steps 34–42.
Note: Open-source platforms such as XCMS online,^16 MZmine 2,^17 and MS-DIAL^18 offer free alternatives to Progenesis QI for processing raw metabolomics data.
* 34. Click on “New” and define a name for the new project.
* 35. Set analysis parameters including the MS machine, data format, and ionization mode.
* 36. Select the possible adducts.
* 37. Import raw data of all run files and automatically select a QC as an alignment reference.
* 38. Set the minimum peak width to 0.05 min and the retention time limits from 0.7 min to the end of the elution duration.
Note: Peaks before the hold-up time of the LC system (0.7 min) are excluded to remove unretained compounds and reduce noise.
* 39. Start peak picking and alignment automatically.
* 40. Create an experiment design and set comparison groups.
* 41. Conduct deconvolution and database searching by choosing the Human Metabolome Database (HMDB, http://www.hmdb.ca/) in the MetaScope plugin.
Note: HMDB is a comprehensive repository encompassing thousands of metabolites identified in the human body, each provided with in-depth information.
* 42. Export two comma-separated value (.csv) files, as illustrated in Figure 2E, for each LC-MS method.
Note: One file contains a table of quantitative results with the peak intensities for each sample, and the other is a list of preliminary identification results from matching the features to the HMDB.
CRITICAL: Metabolite identification is very time-consuming, as it requires manual scrutiny and verification of each metabolic feature. It is thus recommended to first narrow down the number of features through differential screening, which makes the process more manageable (see step 48).
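As a complement to the acquisition steps in this section, the injection sequence of steps 29–31 (ten blanks, ten equilibration QCs, two QC ddMS^2 runs, then randomized samples with a QC after every 6–8 samples) can be sketched in R. This is an illustrative helper only, not part of the published protocol: the function name `build_run_order`, the labels "Blank", "QC", and "QC_ddMS2", the seed, and the fixed interval of 7 samples between QCs are assumptions.

```r
# Build a hypothetical LC-MS injection sequence following steps 29-31.
build_run_order <- function(samples, qc_every = 7, n_blank = 10,
                            n_qc_equil = 10, n_qc_msms = 2, seed = 1) {
  set.seed(seed)               # reproducible randomization
  shuffled <- sample(samples)  # randomize samples across all groups
  # Cut the shuffled samples into blocks of qc_every and append one QC to each block
  blocks <- split(shuffled, ceiling(seq_along(shuffled) / qc_every))
  body <- unlist(lapply(blocks, function(b) c(b, "QC")), use.names = FALSE)
  c(rep("Blank", n_blank),        # step 29: blank injections
    rep("QC", n_qc_equil),        # step 29: QC injections for equilibration
    rep("QC_ddMS2", n_qc_msms),   # step 30: QCs for MS/MS acquisition
    body)                         # step 31: samples interleaved with QCs
}

samples <- paste0("S", 1:21)      # hypothetical sample IDs
run_order <- build_run_order(samples, qc_every = 7)
print(run_order)
```

Counting the "QC" entries in the resulting sequence gives the number of single QC aliquots that need to be prepared in step 15.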
Bioinformatics analysis and data visualization

Timing: variable

This section details the workflow for bioinformatics and visualization, focusing on identifying differential metabolites, conducting functional enrichment analysis, and evaluating candidate biomarker performance.
* 43. First, filter out metabolic features whose coefficient of variation exceeds 30% in QC samples.
* 44. Normalize the raw peak intensities (Figure 2F) on the MetaboAnalyst platform (https://www.metaboanalyst.ca/).
Note: Choose these parameters for the normalization procedure: normalization by sum, log transformation (base 10), and Pareto scaling.
* 45. Apply principal component analysis (PCA) to all samples (including QC samples).
Note: PCA is essential for quality control in untargeted metabolomics, providing dataset visualization to identify outliers, batch effects, and instrument drift. Tight clustering of QC samples in the PCA plot indicates high quality and reliability of the analyzed data, and discrepancies in this clustering can highlight potential problems in the analytical workflow. Additionally, PCA verifies the efficacy of data preprocessing, setting the stage for reliable further analysis and interpretation.
CRITICAL: Always start with PCA rather than supervised analyses like PLS-DA and OPLS-DA to reduce the risk of overfitting and random group separation. PCA and PLS-DA each have distinct advantages. PCA is typically used as a first step to get an unbiased view of the data and check for quality, while PLS-DA is used subsequently to focus on the variations that are specifically related to the experimental conditions or classifications. This combination offers a balanced and thorough approach to analyzing complex metabolomics data.
* 46. Apply partial least squares discriminant analysis (PLS-DA) and orthogonal partial least squares discriminant analysis (OPLS-DA) to discriminate between experimental groups (Figures 3A–3E).
Note: To ensure the robustness and reliability of the OPLS-DA model, it is essential to employ a permutation test.
CRITICAL: PCA and (O)PLS-DA must be run on the normalized data.
* 47. Screen for differential metabolic features with the criteria of variable importance in projection (VIP) > 1 and false discovery rate (FDR) < 0.05.
Note: The VIP and FDR values were computed using the R package ropls or SIMCA software (Umetrics, Sartorius Stedim Biotech, Umea, Sweden).
>if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
>BiocManager::install(version = "3.17")
>BiocManager::install("ropls")
>install.packages("ggplot2")
>install.packages("ggrepel")
>install.packages("ggpubr")
>library(ropls)
>library(ggplot2)
>library(ggrepel)
>library(ggpubr)
># First, prepare and input the data as 'data'. Each row contains a sample with the sample name as the row name, each column contains a metabolic feature with the feature name as the column name.
>data <- read.csv("data.csv", header = TRUE, check.names = FALSE, row.names = 1)
># Next, create a dataframe 'datagroup' with a single column named 'Group' that indicates the group of each sample. The row names are the sample names, in the same order as in 'data'.
>datagroup <- read.csv("datagroup.csv", header = TRUE, check.names = FALSE, row.names = 1)
># PCA and score plot
>data.pca <- opls(x = data)
>scoreMN <- as.data.frame(data.pca@scoreMN)
>scoreMN <- cbind(scoreMN, datagroup)
>scoreMN$samples <- rownames(scoreMN)
>ggplot(scoreMN, aes(p1, p2, color = Group)) + geom_point() + stat_ellipse(show.legend = FALSE) + geom_text_repel(data = scoreMN, aes(label = samples), size = 3, segment.color = "black", show.legend = FALSE)
># orthoI = 0 for PLS-DA, orthoI = NA for OPLS-DA. In ropls, OPLS-DA requires exactly two groups and uses a single predictive component (p1).
>data.plsda <- opls(x = data, y = datagroup[, "Group"], orthoI = NA)
># For OPLS-DA, combine the predictive scores (p1) with the orthogonal scores (o1, ...). For PLS-DA, use data.plsda@scoreMN alone and plot aes(p1, p2).
>scoreMNp <- as.data.frame(cbind(data.plsda@scoreMN, data.plsda@orthoScoreMN))
>scoreMNp$samples <- rownames(scoreMNp)
>scoreMNp$Group <- datagroup$Group
>ggplot(scoreMNp, aes(p1, o1, color = Group)) + geom_point() + stat_ellipse(show.legend = FALSE) + geom_text_repel(data = scoreMNp, aes(label = samples), size = 3, segment.color = "black", show.legend = FALSE)
># VIP values from the fitted (O)PLS-DA model
>VIP <- as.data.frame(getVipVn(data.plsda))
># P values were computed by t-test or Wilcoxon rank-sum test depending on data distribution (two groups)
># For t-test
>v <- names(data)
>pval <- c()
>for (i in v){
  p <- t.test(data[, i] ~ datagroup$Group)$p.value
  pval <- c(pval, p)
}
>t_test_res <- data.frame(v, pval)
># For Wilcoxon rank-sum test
>v <- names(data)
>pval <- c()
>for (i in v){
  p <- wilcox.test(data[, i] ~ datagroup$Group)$p.value
  pval <- c(pval, p)
}
>wilcox_test_res <- data.frame(v, pval)
># FDR computation (Benjamini-Hochberg) on the p values of the chosen test
>p.adj <- p.adjust(pval, method = "BH")
* 48. Identify the selected metabolic features.
+ a. Match the differential features in the identification table with the quantification table mentioned in step 42.
Note: Features with mass error < 5 ppm in MS1 are acceptable. Features with a score ≥ 44 are kept for further identification.
+ b. Match the MS/MS spectra (Figure 2G) with reference spectra in the HMDB.
Note: The identification table exported from Progenesis QI contains the score for each metabolic feature, which is calculated as the mean of mass similarity, isotope similarity, and fragmentation score. The maximum value for the score is 60. The score acts as a quality parameter to assess the identification reliability.
CRITICAL: The Metabolomics Standards Initiative (MSI) has established four confidence levels for metabolite identification.^19 Level 1 represents the highest confidence, where metabolite structures have been confirmed using reference standards.
Level 2 denotes the probable identification of exact structures based on matching experimental spectral data with those from established databases or the literature. Levels 3 and 4 indicate progressively less certainty in identification. It is highly recommended to report the confidence level of the identified metabolites in metabolomics studies. Typically, identified metabolites should reach at least MSI Level 2.
Note: The matched metabolites in step 48 can be reported as MSI Level 2. If reference standards are available, the confidence level of these metabolites can be elevated to Level 1 after confirmation with the reference standards.
* 49. Categorize the identified metabolites and visualize the classification in a Sankey plot using the R package ggalluvial (Figure 4A).
Note: These classifications are available across various platforms such as MetaboAnalyst and RefMet (https://www.metabolomicsworkbench.org/databases/refmet/index.php)^20 by importing a metabolite list.
* 50. Conduct pathway enrichment analysis of metabolites based on the Kyoto Encyclopedia of Genes and Genomes (KEGG) database (https://www.genome.jp/kegg/) (Figure 4B).
* 51. Trace the connections between metabolites and genes in metabolic pathways and visualize them with Cytoscape software^21 (Figure 4C).
Note: Access the Cytoscape App Manager to search for and install the MetScape plugin. This tool enables the construction of metabolite-gene networks by inputting metabolites of interest and, optionally, genes of interest. Typically, the required settings include constructing a pathway-based network, selecting a specific organism, entering the desired metabolites with their corresponding KEGG IDs, and opting for a compound-gene network type. Compounds and genes associated with the input metabolites are then identified through computational analysis using databases such as KEGG or HMDB.
* 52. Select candidate metabolite biomarkers using multifaceted bioinformatics methods such as pathway-based enrichment analysis and abundance-based differential analysis.
* 53. Conduct ROC analysis of the selected metabolites. Perform single-metabolite ROC analysis in R using the pROC package and plot the area under the curve (AUC) at the same time (Figure 5B).
>install.packages("pROC")
>install.packages("dplyr")
>library(pROC)
># Samples are in the columns. The first row holds the sample names, the second row the outcome of each sample (its label must be 'outcome'), and each following row the level of one metabolite in all samples.
>data <- read.csv("data.csv", sep = ",", header = FALSE, check.names = FALSE)
>data <- as.data.frame(t(data))
>colnames(data) <- as.character(unlist(data[1, ]))
>data <- data[-1, ]
># Convert the metabolite columns to numeric, because roc() requires a numeric or ordered predictor
>set2 <- data[, c(1, 2)]
>set1 <- dplyr::mutate_all(data[, 3:ncol(data)], as.numeric)
>data <- cbind(set2, set1)
>for (i in 3:dim(data)[2]){
  colnames(data)[i] <- paste("metabolite", i - 2, sep = "")
}
>data[, "outcome"] <- as.factor(data[, "outcome"])
>roc1 <- roc(data$outcome, data$metabolite1, smooth = TRUE)
>plot(roc1, print.auc = TRUE, col = "#ef7a6d")
* 54. For multiple metabolite-combined ROC curves, locate the module on MetaboAnalyst where both the classification method and the feature ranking method can be chosen (Figures 5A and 5C).
Note: Candidate biomarkers with AUC > 0.7 are usually considered acceptable for further validation.
CRITICAL: It is necessary to use an independent validation cohort to assess the performance of candidate biomarkers, which involves evaluating statistical differences in metabolite levels between groups and conducting ROC analysis.

Figure 3. Multivariate analysis
(A) The structure of the datasets used in the PCA and (O)PLS-DA code. The row and column names of the ‘data’ dataframe are shown respectively as ‘Sample 1, Sample 2, …’ and ‘Metabolite 1, Metabolite 2, …’.
The row names of the ‘datagroup’ dataframe are displayed as ‘Sample 1, Sample 2, …’.
(B) Example PCA plot. Each dot represents a sample.
(C) Example OPLS-DA plot.
(D) Example of a lollipop plot displaying the top 20 metabolites ranked by VIP value. Each dot represents a metabolite. VIP >1 is used as the cutoff for significant metabolites.
(E) Example of a scatter plot exhibiting VIP and FDR values for all metabolites. Blue lines indicate the threshold criteria for differential metabolites, set as VIP >1 and FDR <0.05.
Figure 4. Metabolite enrichment analysis
(A) Illustrative Sankey plot depicting the main classes of individual metabolites.
(B) Example pathway analysis of metabolites. Each bar represents a pathway term. The upper x-axis indicates -log10(P) and the lower x-axis indicates the enrichment ratio.
(C) Illustrative network depicting interactions between metabolites and genes using the MetScape plugin in Cytoscape software. Each hexagon symbolizes a metabolite, and each dot represents an enzyme responsible for the conversion of the adjacent metabolites.
Figure 5. ROC analysis
(A) Multivariate ROC analysis module on the MetaboAnalyst platform interface.
(B) Example ROC curves for individual variables. Each curve corresponds to one variable, and the AUC value of each variable is shown. Figure reprinted and adapted with permission from Wang et al., 2023.
(C) Example multivariate ROC curve based on the SVM algorithm. Figure reprinted and adapted with permission from Wang et al., 2023.
Expected outcomes
The implementation of this protocol has revealed the omics-scale metabolic profile of uterine fluid in various disease conditions among women. Notably, an effective biomarker panel has been developed and validated, substantiating the reliability of this sensitive and non-invasive approach for early OC detection.
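The AUC-based screening described in steps 53 and 54 can be illustrated without any add-on package. The sketch below is not the pROC implementation used in this protocol: it computes the AUC from the Mann-Whitney rank-sum identity in base R and applies the AUC > 0.7 cutoff mentioned in the note to step 54. The toy data frame, its values, the 0/1 outcome coding, and the helper name auc_rank are all hypothetical and serve only as an illustration.

```r
# Hypothetical toy data: 6 samples, outcome coded 0 (control) / 1 (case),
# and two made-up metabolite abundance columns.
toy <- data.frame(
  outcome     = c(0, 0, 0, 1, 1, 1),
  metabolite1 = c(1.2, 0.8, 1.5, 2.9, 3.1, 2.4),
  metabolite2 = c(2.0, 3.5, 1.0, 2.5, 1.8, 3.0)
)

# AUC via the Mann-Whitney rank-sum identity (ties receive average ranks).
# This is an illustrative helper, not a function from pROC.
auc_rank <- function(response, predictor) {
  stopifnot(is.numeric(predictor))   # same requirement as in Problem 6
  pos   <- response == 1
  n_pos <- sum(pos)
  n_neg <- sum(!pos)
  r     <- rank(predictor)
  (sum(r[pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

metabolite_cols <- setdiff(names(toy), "outcome")
auc <- sapply(metabolite_cols, function(m) auc_rank(toy$outcome, toy[[m]]))

# Make the AUC direction-agnostic so that metabolites that are lower
# in cases than in controls are not overlooked.
auc <- pmax(auc, 1 - auc)

# Keep metabolites passing the AUC > 0.7 cutoff for further validation.
candidates <- names(auc)[auc > 0.7]
print(round(auc, 3))
print(candidates)
```

With these made-up values, only metabolite1 passes the cutoff. As stated in the CRITICAL note, any candidate selected this way still needs to be assessed in an independent validation cohort.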
Limitations
First, the matrix effect in biological samples is inherent to LC-MS and impacts accurate quantification in untargeted metabolomics. However, optimizing sample preparation and using internal standards closely resembling the targeted metabolites can significantly mitigate this effect. Second, current constraints in reference databases and annotation methods hinder complete and accurate coverage of the uterine fluid metabolome. Advancing algorithms and expanding databases could address this issue.
Troubleshooting
Problem 1
A large amount of fluid leaks during flushing (step 7).
Potential solution
Massive leakage during uterine fluid collection can result in incomplete sample harvest, which impairs the accuracy and reliability of the analysis and may cause low-abundance metabolites to be missed. The leakage is probably due to the large quantity of normal saline infused into the uterus at one time. It is advisable to decrease the volume for each lavage and to flush at a slow pace to avoid strong turbulence.
Problem 2
Cell and tissue residues remain in the uterine fluid (step 12).
Potential solution
Perform a second centrifugation under the same conditions and transfer the supernatant into a new tube.
Problem 3
Retention time drifts (step 32).
Potential solution
This issue may be due to insufficient system equilibration, column contamination, or leakage in the chromatographic system, and the specific cause should be determined. If leakage is detected, the experiment needs to be redone. For other causes such as column contamination, minor retention time drifts (< 0.1 min) are generally acceptable and manageable, as peaks can be aligned with Progenesis QI software. However, for drifts that exceed 0.1 min, check the column efficiency and adjust the elution gradient to adequately remove contaminants, then repeat the experiment.
Problem 4
QC samples are not clustered tightly in PCA for large-scale metabolomics data (step 45).
Potential solution
This issue might arise from data acquisition on a single LC-MS system over an extended period, which can lead to various unavoidable systematic discrepancies. The Systematic Error Removal using Random Forest (SERRF) normalization method,[116]^22 which leverages QC samples, could serve as a potential solution for correcting and minimizing these systematic variations.
Problem 5
Multiple features match the same metabolite (step 48).
Potential solution
First, sort these features by their precursor and fragment scores in descending order. Then check the MS/MS spectra starting from the highest-scoring feature and moving downward, as a higher score suggests a greater likelihood of a correct metabolite match. Notably, if a reference standard is available for this metabolite, the most accurate method is to run the reference standard under identical LC-MS conditions and verify that its structural data align with those of the initial experimental sample.
Problem 6
When running the roc() function, the error "Predictor must be numeric or ordered." might occur (step 53).
Potential solution
The pROC package reference manual states that the roc() function takes two vectors (response, predictor). The predictor vector must be numeric or ordered, not of another type such as character. To address this, character columns can be converted to numeric values using dplyr::mutate_all(predictor columns, as.numeric) from the dplyr package, as done in the code of step 53.
Resource availability
Lead contact
Further information and requests for resources and reagents should be directed to and will be fulfilled by the lead contact, Mo Li (limo@hsc.pku.edu.cn).
Technical contact
Questions about the technical specifics of performing the protocol should be directed to and will be answered by the technical contact, Yuening Jiang (2111110435@stu.pku.edu.cn).
Materials availability
This study did not generate new unique reagents.
Data and code availability
The accession number for the untargeted metabolomics data reported in this paper is MetaboLights: MTBLS4861.
Acknowledgments