Abstract

Rationale: Gastric cancer (GC) is preceded by a stepwise progression of precancerous gastric lesions. Identifying individuals whose precancerous gastric lesions have the potential to progress to GC is an important unmet need. Perturbed lipid metabolism, particularly the dysregulation of de novo lipogenesis, is involved in gastric carcinogenesis. We conducted the first prospective lipidomics study exploring lipidomic signatures for the risk of gastric lesion progression and early GC.

Methods: Our two-stage targeted lipidomics study enrolled 400 subjects from the National Upper Gastrointestinal Cancer Early Detection Program in China, with 200 subjects with GC or gastric lesions of different stages in each of the discovery and validation stages. In the validation stage, 152 cases with gastric lesions were prospectively followed for the progression of gastric lesions for a median of 580 days (interquartile range 390-806 days). We examined the lipidomic signatures associated with the risk of advanced gastric lesions and their progression to GC. We also drew on our published tissue proteomic data to investigate the highlighted lipids together with the expression of their biologically related proteins in gastric mucosa.

Results: We identified 11 plasma lipids significantly inversely associated with the risk of gastric lesion progression and GC occurrence. These lipids were integrated as latent profiles to identify 5 clusters of lipid expression that had distinct risks of gastric lesion progression. The latent profiles significantly improved the ability to predict the progression potential of gastric lesions (AUC: 0.82 vs 0.68, DeLong's P = 4.6×10^-4) and the risk of early GC (AUC: 0.81 vs 0.55, P = 6.3×10^-5). Significant associations were found between the highlighted lipids, their biologically correlated proteins, and the risk of GC, supporting a role in GC for pathways involving monocarboxylic acid metabolism and lipid transport and catabolic process.
Conclusions: Our study revealed lipidomic signatures associated with the risk of gastric lesion progression and GC occurrence, with translational implications for GC prevention.

Keywords: Gastric cancer, Lipidomics, Precancerous gastric lesion, Biomarker

Introduction

Gastric cancer (GC) is a major public health threat with high morbidity and mortality worldwide [1]. GC of the intestinal type predominates in high-risk geographic areas [2], and it arises through a multistep cascade of gastric lesions, which evolve from superficial gastritis (SG), chronic atrophic gastritis (CAG), intestinal metaplasia (IM), and low-grade intraepithelial neoplasia (LGIN) to high-grade intraepithelial neoplasia (HGIN) and invasive GC [3,4]. Studies of TCGA and other data have examined the molecular subtypes of GC, aiming to provide a roadmap for patient stratification and targeted therapies [5,6]. However, because most GCs are diagnosed at locally advanced or advanced stages with unfavorable prognosis [7], efforts are warranted to identify populations at particularly high risk for progression of gastric lesions and development of GC, which is essential for improving the primary prevention and early detection of GC. Efficient biomarkers are therefore highly needed.

Lipids play essential roles in cellular functions related to carcinogenesis [8]. Perturbed lipid metabolism, including increased lipid uptake, endogenous de novo fatty acid synthesis, fatty acid oxidation, and cholesterol accumulation, has been reported to promote tumor growth and progression [9-11]. In addition, the lipid content of phospholipids could compromise membrane fluidity and signal transduction, which may in turn affect GC tumorigenesis and progression [12,13].
In our recent study based on untargeted metabolomics covering carbohydrates, amino acids, nucleotides, polar lipids, and other metabolites, six lipids, including α-linolenic acid, linoleic acid, palmitic acid, arachidonic acid, sn-1 lysophosphatidylcholine (LysoPC) 18:3, and sn-2 LysoPC 20:3, stood out as having the most robust associations with the risk of early GC, with the first three also significantly associated with the risk of gastric lesion progression in a prospective analysis [14]. These findings highlight the potential importance of the overall lipidomic profile underlying gastric carcinogenesis [15]. However, previous metabolomics studies of GC were restricted to water-soluble compounds and volatile metabolites [16] and lacked coverage and in-depth investigation of a wide range of lipids with potentially pivotal functions, leaving a knowledge gap on the full spectrum of lipidomic signatures associated with the development of GC.

Based on a total of 400 subjects from Linqu County, a well-recognized high-risk area in eastern China [4,17], we conducted the first comprehensive lipidomics study for GC and delineated a plasma lipidomics profile for a sequence of gastric lesions and GC in two stages. We took advantage of our prospectively followed participants and longitudinally investigated the lipidomic signatures underlying the progression of gastric lesions and development of GC.

Methods

Study participants

Our study involved a total of 400 subjects in two stages from Linqu County, Shandong Province, China, an established high-risk area for GC where most GCs are of the intestinal type [4,17]. All subjects were enrolled from those attending the National Upper Gastrointestinal Cancer Early Detection (UGCED) Program for rural areas, in which residents aged 40 to 69 years received upper gastroendoscopy examinations free of charge.
Individuals with cardiovascular, liver, or spleen disorders or other major chronic diseases are ineligible for gastroendoscopy and were therefore excluded from the program. Gastroendoscopy was performed by two experienced gastroenterologists using video endoscopes (Olympus). For each individual, biopsies were taken at five standardized sites and at any other sites with suspicious lesions detected by endoscopy [18]. Formalin-fixed, paraffin-embedded biopsy samples were reviewed blindly by two pathologists. Each subject was given a global diagnosis of normal, SG, CAG, IM, LGIN, HGIN, or invasive GC, defined as the most severe gastric histology among all biopsies, following the criteria of the Updated Sydney System [18] and the Chinese Association of Gastric Cancer [19]. Subjects were surveyed using standard questionnaires and had a 5 mL blood sample collected following a standardized collection process. H. pylori infection status was determined by enzyme-linked immunosorbent assay for plasma IgG [20].

The study consisted of two independent stages involving a total of 400 subjects. The discovery set included 200 subjects with gastric lesions of different stages (n = 169) or GC (n = 31, including 22 HGINs and 9 invasive GCs) diagnosed in 2018. The validation set independently enrolled a further 200 subjects, including 48 cases of GC and 152 cases with different gastric lesions, diagnosed in 2017. We did not include any subjects with normal gastric mucosa because few of the adult residents had completely normal histology [17,19]. We prospectively followed the subjects with gastric lesions in the validation stage (n = 152, “prospective cohort”) until May 31, 2021, for a median follow-up of 580 days (interquartile range 390 to 806 days), with an endoscopic examination conducted at the endpoint for each individual.
Among them, a multi-time point longitudinal sub-cohort of 76 participants underwent additional gastroendoscopy examinations in the middle of follow-up and thus had three or more measurements of gastric lesions during the follow-up. The progression of gastric lesions during the follow-up for the prospective cohort, or during a time window for the multi-time point longitudinal sub-cohort, was assessed based on the global diagnosis of gastric lesions, defined as the most severe gastric histology among all biopsies (SG, CAG, IM, LGIN, HGIN, or invasive GC). Subjects were considered to have progression of gastric lesions if the severity of the gastric lesion at the follow-up endpoint was higher than that at baseline. Details of the participants in each cohort are presented in Figure 1 and Table S1.

Figure 1. General workflow of the study. Targeted lipidomics analysis involved 200 subjects in each of the two stages. In the validation stage, 152 non-GC subjects were prospectively followed for the progression of gastric lesions (“prospective follow-up cohort”). For the 11 validated lipids significantly associated with the risk of gastric lesion progression and GC occurrence, latent profiles were extracted using VAEN, representing the refined molecular pattern of lipids. Latent profiles of lipids were used to define lipidomics-based clusters of the prospective cohort subjects, and the time-varying trajectories of gastric lesion progression were delineated by cluster. XGBoost models were constructed to predict the risk of gastric lesion progression and GC occurrence.
CAG, chronic atrophic gastritis; FDR, false discovery rate; GC, gastric cancer; HGIN, high-grade intraepithelial neoplasia; IM, intestinal metaplasia; LGIN, low-grade intraepithelial neoplasia; ROC, receiver operating characteristic; SG, superficial gastritis; VAEN, variational auto-encoder followed by the elastic net regression model; VIP, variable importance in projection; XGBoost, extreme gradient boosting.

The study was approved by the Institutional Review Board of Peking University Cancer Hospital. Informed consent was waived as study subjects were selected within the framework of the National UGCED Program.

Targeted lipidomics profiling

Targeted lipidomics profiling was performed on plasma samples using ultra-high-performance liquid chromatography-mass spectrometry (LC-MS) [21]. Sample preparation and LC-MS assay methods are detailed in the Supplementary Methods. Quality control (QC) samples were prepared using mixed plasma samples, with 1 QC sample inserted between every 20 tested samples. A total of 10 and 11 QC samples were inserted during the lipidomics profiling of plasma samples in the discovery and validation stages, respectively. Ionization signals were monitored in QC samples based on the intensities of internal standards for individual lipid classes to ensure no significant drop in intensity (within 20%) and no drift in retention time (within 0.05 min) throughout the run. Lipids were identified based on structure-specific multiple reaction monitoring (MRM) transitions, which comprise MRMs specific to both the head groups distinct to individual lipid classes and the fatty acyl compositions, as well as correct retention times compared against authentic lipid reference compounds from a human lipid ID inventory constructed in-house. Lipid levels were expressed in moles per L (mol/L) of plasma for statistical analyses.
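The QC acceptance criteria above (intensity drop within 20%, retention-time drift within 0.05 min) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the names QCInjection and qc_run_passes are hypothetical, and the use of the first QC injection as the reference point is an assumption, since the text does not state which baseline was used.

```python
from dataclasses import dataclass
from typing import List

MAX_INTENSITY_DROP = 0.20  # from the text: intensity drop within 20%
MAX_RT_DRIFT_MIN = 0.05    # from the text: retention-time drift within 0.05 min


@dataclass
class QCInjection:
    """Internal-standard readout for one lipid class in one QC injection."""
    intensity: float           # peak intensity of the internal standard
    retention_time_min: float  # retention time in minutes


def qc_run_passes(injections: List[QCInjection]) -> bool:
    """Return True if every QC injection stays within both tolerances.

    Assumption: the first QC injection serves as the reference; the
    source does not specify the baseline.
    """
    if not injections:
        raise ValueError("at least one QC injection is required")
    ref = injections[0]
    if ref.intensity <= 0:
        raise ValueError("reference intensity must be positive")
    for inj in injections[1:]:
        # Fractional loss of signal relative to the reference injection.
        drop = (ref.intensity - inj.intensity) / ref.intensity
        # Absolute shift in retention time relative to the reference.
        drift = abs(inj.retention_time_min - ref.retention_time_min)
        if drop > MAX_INTENSITY_DROP or drift > MAX_RT_DRIFT_MIN:
            return False
    return True
```

In practice such a check would be applied per internal standard, one for each lipid class, across all QC injections in a run.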
Bioinformatics and statistical analysis

We conducted bioinformatics and statistical analyses for the lipid signatures associated with the risk of GC compared with the well-recognized mild gastric lesion group (SG/CAG) or advanced gastric lesion group (IM/LGIN) as references, based on the discovery and