Abstract
Objective
To identify key cerebrospinal fluid (CSF) metabolomic biomarkers associated with Parkinson’s disease (PD) and prodromal PD, providing insights for the development of intervention strategies.
Methods
Six hundred and thirty-nine participants from the Parkinson’s Progression Markers Initiative (PPMI) cohort were included: 300 PD patients, 112 healthy controls (HC), and 227 individuals with prodromal PD. Regression analysis and orthogonal partial least squares discriminant analysis (OPLS-DA) were used to identify metabolic biomarkers, and pathway analysis examined their relationship to clinical features. An XGBoost-based early prediction model was developed to distinguish among PD, prodromal PD, and HC. A two-sample bidirectional Mendelian randomization (MR) analysis was performed to examine the association between differential metabolites and PD.
Results
Sixty-four metabolites were significantly altered in PD patients compared with HC, with 58 elevated and 6 reduced. In prodromal PD, 19 metabolites were increased and 34 were decreased. Key metabolic pathways involved glutathione and amino acid metabolism. Dopamine 3-O-sulfate correlated with PD progression, levodopa-equivalent dose, and non-motor symptoms. The XGBoost model demonstrated high specificity in predicting the onset of PD. In the forward MR analysis, genetically predicted higher dopamine 3-O-sulfate levels were positively associated with the risk of PD. In the reverse MR analysis, genetic susceptibility to PD was associated with significantly increased dopamine 3-O-sulfate levels.
Conclusion
The differential expression of CSF metabolites reveals early cellular metabolic changes, providing insights for early diagnosis and for monitoring PD progression. The MR results suggest a bidirectional causal relationship between genetically determined PD susceptibility and dopamine 3-O-sulfate levels.
Keywords: Parkinson’s disease, metabolomic biomarkers, early prediction model, bidirectional Mendelian randomization, PD susceptibility
Introduction
Parkinson’s disease (PD) is a motor disorder characterized by the progressive loss of dopaminergic neurons in the substantia nigra and the abnormal aggregation of α-synuclein. The primary motor symptoms of PD are tremor, bradykinesia, and rigidity. In addition to these motor deficits, PD encompasses a range of non-motor and prodromal symptoms, such as sleep disturbances, olfactory dysfunction, psychiatric and mood changes, cognitive impairment, and autonomic dysfunction. These non-motor symptoms may appear before or concurrently with the onset of motor symptoms (Marinus et al., 2018). Although numerous hypotheses regarding the pathogenesis of PD have been proposed, there is currently no effective method to slow the progression of the disease. This difficulty reflects the involvement of multiple brain regions and neurotransmitter systems, in which dopamine is co-released with other excitatory or inhibitory neurotransmitters, contributing to clinical heterogeneity (Barcomb and Ford, 2023). While the accuracy of clinical diagnosis has improved over the past decade, especially in the early stages, reaching up to 90.3% in some studies (Virameteekul et al., 2023), predicting disease progression in the early stages remains challenging. This is primarily due to the overlap of early clinical features, the complexity of disease subtypes, and the limitations of diagnostic criteria (Tolosa et al., 2021). The prodromal phase is considered a critical window for intervention, making early and accurate diagnosis essential. Predicting disease progression from reliable and sensitive early biomarkers and quantifying the different pathological states of PD remain key research focuses (Theis et al., 2024).
Early diagnosis of PD primarily involves clinical symptom assessment, biochemical testing, imaging techniques, and genetic analysis (Parab et al., 2023; Mitchell et al., 2021). Cerebrospinal fluid (CSF), which is in direct contact with the brain, offers an accurate reflection of the underlying molecular mechanisms of PD. While the α-synuclein seed amplification assay in CSF demonstrates high sensitivity and specificity, it reflects only part of the disease pathology, highlighting the need for additional biomarkers to fully characterize PD (Postuma and Berg, 2016). CSF metabolomics, through the mapping and quantification of small-molecule metabolites, provides comprehensive insight into cellular metabolism and neurotransmitter alterations (Stoessel et al., 2018). With recent advances in liquid chromatography–tandem mass spectrometry (LC-MS/MS), key biomarkers related to lipid metabolism, polyamines, amino acids, and purine metabolism have attracted increasing attention (Trezzi et al., 2017; Kremer et al., 2021). This study uses CSF metabolomics as a data-driven source to explore differences in metabolites, particularly lipid metabolites, across clinical stages of PD (healthy controls, prodromal PD, and clinically diagnosed PD). The study further aims to predict the risk of PD progression.
The objectives of this study are as follows: (1) to identify CSF metabolic biomarkers at different stages of PD progression; (2) to assess the reliability of predictive models by developing a clinical risk model for PD; (3) to link lipid metabolism biomarkers with clinical manifestations to provide clinical utility; (4) to uncover potential mechanisms underlying PD progression through metabolic biomarkers and associated molecular pathways; and (5) to evaluate the causal relationship between differential metabolites and PD through MR analysis of publicly available genome-wide association data.
Materials and methods
Study participants
This study utilized data from the Parkinson’s Progression Markers Initiative (PPMI) database, a large-scale clinical observational study aimed at identifying biomarkers of PD progression from the prodromal phase through to disease onset. A total of 639 participants were included in the analysis, with data collection completed by January 2020. The sample size was determined based on previous studies (Huntwork-Rodriguez et al., 2023). Participants were classified into three groups based on predefined inclusion criteria: (1) PD patients: individuals diagnosed with PD who were receiving levodopa treatment. (2) Healthy controls: individuals with no history of neurological disorders, no first-degree family history of PD, and normal dopamine transporter (DAT) single-photon emission computed tomography (SPECT) imaging. (3) Prodromal participants: individuals who had not been clinically diagnosed with PD but exhibited one or more of the following risk factors: rapid eye movement sleep behavior disorder (RBD), olfactory dysfunction, DAT deficiency, or genetic variants associated with an increased risk of PD. Inclusion in the prodromal cohort additionally required an age above 60 years, except for carriers of SNCA and other rare mutations.
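As an illustration only, the three group definitions above can be written as a small decision function. This is a minimal sketch: the field names, the order in which the checks are applied, and the "excluded" label are assumptions made for the example and are not taken from the PPMI data dictionary or from the study's enrollment procedure.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    # All field names are hypothetical; they are not PPMI variable names.
    age: float
    pd_diagnosis: bool
    on_levodopa: bool
    neuro_history: bool
    first_degree_family_pd: bool
    dat_spect_normal: bool
    has_rbd: bool = False
    has_olfactory_dysfunction: bool = False
    has_risk_variant: bool = False
    has_rare_variant: bool = False  # e.g. SNCA; exempt from the age threshold


def assign_group(p: Participant) -> str:
    """Return 'PD', 'prodromal', 'HC', or 'excluded' (assumed precedence)."""
    # Group 1: diagnosed with PD and receiving levodopa.
    if p.pd_diagnosis:
        return "PD" if p.on_levodopa else "excluded"

    # Group 3: no diagnosis, but at least one listed risk factor.
    has_risk_factor = (
        p.has_rbd
        or p.has_olfactory_dysfunction
        or not p.dat_spect_normal  # DAT deficiency
        or p.has_risk_variant
        or p.has_rare_variant
    )
    if has_risk_factor:
        # Age above 60 is required unless a rare mutation is carried.
        if p.age > 60 or p.has_rare_variant:
            return "prodromal"
        return "excluded"

    # Group 2: healthy controls.
    if (
        not p.neuro_history
        and not p.first_degree_family_pd
        and p.dat_spect_normal
    ):
        return "HC"
    return "excluded"
```

For example, under these assumptions a 67-year-old without a PD diagnosis who has RBD would be assigned to the prodromal group, while a 50-year-old with RBD and no rare mutation would not meet the age criterion.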
Baseline demographic information, motor and non-motor assessments, and biochemical test results were collected for all participants. All participants underwent lumbar puncture for CSF collection, followed by metabolite and lipid analysis. Data for this study were accessed via the PPMI online database.^1 The PPMI study received ethical approval from the institutional review boards of over 50 research centers globally. Detailed information regarding the ethics committees of the clinical centers can be found in Supplementary Table 1. All participants provided written informed consent before inclusion in the study, in accordance with ethical guidelines (Brumm et al., 2023). The methodology of this study complies with the relevant guidelines of the PPMI Data and Publications Committee (DPC), and the manuscript was submitted to the DPC for review. Summary-level genetic association data were obtained from genome-wide association studies (GWAS). Dopamine 3-O-sulfate GWAS data were sourced from the Wisconsin Alzheimer’s Disease Research Center (WADRC) and the Wisconsin Registry for Alzheimer’s Prevention (WRAP) cohorts, both of European ancestry (Panyard et al., 2021). This dataset covered 412 CSF metabolites measured in 291 samples. PD GWAS data were sourced from the International Parkinson’s Disease Genomics Consortium (Nalls et al., 2019), comprising 33,674 PD cases and 449,056 controls.
Cerebrospinal fluid metabolite and lipid analysis
CSF samples collected from participants were analyzed using liquid chromatography–tandem mass spectrometry (LC-MS/MS), employing both targeted metabolomics/lipidomics and untargeted metabolomics approaches. A total of 348 compounds were identified, including sphingolipids, polyamines, cholesterol, gangliosides, ceramides, amino acids, caffeine, and purine metabolites. To minimize batch effects, internal references were established separately for PD patients and