Abstract

Background
This study used Mendelian randomization (MR) to investigate the causal relationship between circulating plasma proteins and lung adenocarcinoma.

Methods
We obtained genome-wide association study (GWAS) data for 734 circulating plasma proteins as exposures and extracted single nucleotide polymorphisms (SNPs) as instrumental variables. Lung adenocarcinoma data (11,245 cases and 54,619 controls) were obtained from the IEU Open GWAS database as the outcome. The inverse-variance weighted (IVW) method or the Wald ratio was used as the main analysis to assess the causal relationship between circulating plasma protein levels and lung adenocarcinoma. Sensitivity analyses (leave-one-out method, heterogeneity and pleiotropy tests), external validation, and post-MR meta-analysis were used to evaluate the reliability of the MR results. Finally, Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses were performed on the plasma proteins retained after screening.

Results
Across the preliminary and external validation stages, ICAM5 (OR = 0.92, 95%CI: 0.89–0.95, P = 2.31 × 10^–6), PCYOX1 (OR = 0.89, 95%CI: 0.85–0.93, P = 5.31 × 10^–8), and TYMP (OR = 0.76, 95%CI: 0.66–0.87, P = 5.79 × 10^–5) were negatively associated with lung adenocarcinoma. Sensitivity analyses, external validation, and post-MR meta-analysis indicated that the MR results were robust. GO and KEGG pathway enrichment analyses showed that these plasma proteins were primarily enriched in pathways such as "pyrimidine deoxyribonucleoside monophosphate metabolic process", "deoxyribonucleoside monophosphate catabolic process", "mitochondrial genome maintenance", and "Pyrimidine metabolism".

Conclusions
ICAM5, PCYOX1 and TYMP are associated with a decreased risk of lung adenocarcinoma. Plasma proteins may become new biological markers for lung adenocarcinoma, providing new insights into the prevention and treatment of this disease.
Keywords: Mendelian randomization, Plasma proteins, Lung adenocarcinoma

Introduction

The global incidence of lung cancer is second only to that of breast cancer, but lung cancer has the highest mortality rate, accounting for approximately 18% of all cancer deaths [1]. It has become a major global health burden [2]. By histopathology, lung cancer is classified as non-small cell lung cancer (NSCLC) or small cell lung cancer (SCLC), which account for approximately 85% and 15% of cases, respectively [3]. Adenocarcinoma is the most common type of NSCLC, followed by squamous cell carcinoma. Since the 1990s, with the implementation of early lung cancer screening and continuous optimization of treatment regimens, the 5-year survival rate of lung cancer patients has increased from 13% to 22% [4]. However, prognosis differs markedly by clinical stage, with a 5-year survival rate of over 90% for stage IA and less than 10% for stage IV [5]. Therefore, early diagnosis and treatment of lung cancer are of great significance for patient prognosis.

Despite the rapid development of medical technology, the complex pathogenesis of cancer poses many challenges for its diagnosis and treatment. First, current screening methods such as imaging and pathological examination still have many limitations in early screening [5]. Second, for patients who cannot be cured by surgery, the efficacy of existing treatments such as chemotherapy, radiotherapy, and immunotherapy remains unsatisfactory [6]. Therefore, further exploration of biomarkers for the early diagnosis of lung cancer may help improve the survival of lung cancer patients.

Circulating plasma proteins are important components of the blood and play a crucial role in various biological processes in the human body, such as signal transduction, transport, growth, repair, and anti-infection defense [7].
In addition, plasma proteins play an important role in the occurrence and development of cancer, participating in the growth, migration, and invasion of cancer cells and in the formation of the tumor microenvironment [8]. Moreover, plasma proteins can serve as valuable biomarkers in cancer. Multiple studies have shown that circulating proteins have important value in the early screening, treatment response monitoring, and prognosis prediction of lung cancer [9]. This may open up a new avenue for the precise treatment of cancer. Plasma proteins are closely related to lung cancer, but many protein markers have yet to be identified.

Mendelian randomization (MR) analysis is a commonly used genetic epidemiological method that uses single nucleotide polymorphisms (SNPs) strongly associated with the exposure as instrumental variables (IVs) to infer the causal relationship between the exposure and the study outcome [10]. Compared with traditional observational epidemiology, MR analysis is less susceptible to confounding and reverse causality [11]. On this basis, the present study applies MR analysis to investigate the association between plasma proteins and lung adenocarcinoma, aiming to provide new insights for the early prevention of lung adenocarcinoma.

Materials and methods

Data sources

First, genetic variants associated with protein expression levels are referred to as "protein quantitative trait loci (pQTL)". The circulating plasma protein pQTL data used in this study were obtained from the study of Lin J et al. [12], which integrated five previously published GWAS datasets [7, 13–16]. Second, the lung adenocarcinoma GWAS summary data were obtained from the IEU Open GWAS database (https://gwas.mrcieu.ac.uk/) and comprised 65,864 individuals of European ancestry (11,245 cases and 54,619 controls) [17–21].
All of the study data mentioned above were derived from publicly available GWAS databases or previously published studies and did not require separate ethical approval or consent. The data details are shown in Table 1.

Table 1. GWAS data information for the preliminary analysis and external validation analysis

| Validation phase | Research | Exposure and outcome | Source of participants (sample size) | Number of plasma proteins | Measurement platform |
|---|---|---|---|---|---|
| Preliminary analysis | Sun et al. [7] | Plasma proteins | England (3301) | 3622 | SOMAscan |
| Preliminary analysis | Emilsson et al. [13] | Plasma proteins | Iceland (5457) | 4137 | SOMAscan |
| Preliminary analysis | Folkersen et al. [14] | Plasma proteins | Europe (3394) | 83 | Olink ProSeek |
| Preliminary analysis | Suhre et al. [15] | Plasma proteins | Southern Germany (1000) | 1124 | SOMAscan |
| Preliminary analysis | Yao et al. [16] | Plasma proteins | America (7333) | 71 | Multiplex immunoassays |
| Preliminary analysis | TRICL [17–21] | Lung adenocarcinoma | European (11,245 cases, 54,619 controls) | - | - |
| External validation analysis | UKB-PPP [22] | Plasma proteins | European (33,469) | 2923 | - |
| External validation analysis | deCODE [23] | Plasma proteins | Icelander (35,559) | 4907 | - |

Study design

As a statistical method, MR analysis must satisfy three core assumptions [24]:

1. The instrumental variables are significantly associated with the exposure;
2. The instrumental variables are unrelated to confounders of the exposure and the outcome;
3. The instrumental variables have no direct relationship with the outcome and influence the outcome only through the exposure (Fig. 1).

Based on these core assumptions, when two or more instrumental variables are available for an exposure, the inverse variance weighted (IVW) method is used as the primary analysis, with MR-Egger regression and the weighted median (WM) as secondary methods, to evaluate the causal effect of circulating plasma proteins on lung adenocarcinoma.
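The IVW estimator named above can be illustrated with a minimal sketch. It assumes a fixed-effect model with first-order weights (the uncertainty in the SNP-exposure effects is ignored), which is one common formulation and not necessarily the exact configuration used in this study. The function name `ivw_estimate` and all input values are hypothetical and are not taken from the study data.

```python
import math

def ivw_estimate(beta_exp, beta_out, se_out):
    """Fixed-effect IVW estimate from per-SNP summary statistics.

    beta_exp: SNP-exposure effects; beta_out: SNP-outcome effects;
    se_out: standard errors of the SNP-outcome effects.
    """
    if not (len(beta_exp) == len(beta_out) == len(se_out)) or len(beta_exp) < 2:
        raise ValueError("IVW needs two or more SNPs with matching inputs")
    # Per-SNP Wald ratios (outcome effect divided by exposure effect).
    ratios = [bo / be for be, bo in zip(beta_exp, beta_out)]
    # First-order inverse-variance weights: 1 / (se_out / beta_exp)^2.
    weights = [(be / so) ** 2 for be, so in zip(beta_exp, se_out)]
    total = sum(weights)
    beta = sum(w * r for w, r in zip(weights, ratios)) / total
    se = math.sqrt(1.0 / total)
    return beta, se

# Hypothetical summary statistics for three independent SNPs (not study data).
beta, se = ivw_estimate([0.5, 0.4, 0.6], [-0.05, -0.04, -0.06], [0.01, 0.01, 0.01])
print(f"beta={beta:.4f}, se={se:.4f}, OR={math.exp(beta):.3f}")
```

Exponentiating the pooled log-odds estimate gives an odds ratio per unit increase in the exposure, which is the scale on which the MR results are reported.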
In addition, when only one instrumental variable is included in the MR model, tests for invalid instruments cannot be performed, so the Wald ratio method is used to assess the causal effect of circulating plasma proteins on lung adenocarcinoma. Sensitivity analysis is crucial for detecting pleiotropy: the Cochran Q test is applied to assess heterogeneity, MR-PRESSO to detect pleiotropic outliers, and the leave-one-out test to assess whether the MR estimate is driven or biased by a single SNP. For the main analysis, Bonferroni correction is used to adjust the significance threshold for multiple testing, giving a corrected threshold of P < 6.81 × 10^–5 (0.05/number of tests, i.e., 0.05/734).

Fig. 1. Schematic diagram of the core instrumental variable assumptions in two-sample Mendelian randomization. Assumption 1: IVs are significantly associated with the exposure; Assumption 2: IVs are unrelated to confounders of the exposure and the outcome; Assumption 3: IVs affect the outcome only through the exposure. Abbreviations: SNPs, single nucleotide polymorphisms; IVW, inverse variance weighted

Finally, we used the UK Biobank Pharma Proteomics Project (UKB-PPP) [22] and deCODE [23] datasets to externally validate the preliminary analysis results. Details of the UKB-PPP and deCODE datasets can be found in Table 1. A statistical significance threshold of 0.05 was applied for the validation.

Instrumental variables

To screen for SNPs significantly associated with the exposure and to ensure the accuracy and reliability of the study conclusions, this study followed the circulating plasma protein pQTL screening criteria proposed by Lin J et al. [12].
The inclusion criteria for pQTLs in the study were as follows: (1) pQTLs reaching the genome-wide significance threshold (P < 5 × 10^–8); (2) pQTLs located outside the major histocompatibility complex (MHC) region (chr6, 26–34 Mb); (3) no significant linkage disequilibrium (LD) between pQTLs (LD clumping, r^2 < 0.001); and (4) cis-acting pQTLs (defined as genetic variants located within a 500 kb window upstream or downstream of the gene encoding the protein). Ultimately, 738 cis-acting SNPs representing 734 circulating plasma proteins were included.

Additionally, to minimize the impact of weak instrument bias on the results and to ensure a robust relationship between the instrumental variable and the exposure, the strength of the association between SNPs and the exposure was evaluated using the F-statistic. An F-statistic greater than 10 indicates that weak instrument bias is unlikely. The F-statistic is calculated as

$$F = \frac{N - K - 1}{K} \times \frac{R^2}{1 - R^2},$$

where N is the sample size of the exposure, K is the number of SNPs, and R^2 is the proportion of exposure variance explained by the SNPs, given by

$$R^2 = 2 \times (1 - \mathrm{MAF}) \times \mathrm{MAF} \times \left(\frac{\beta}{SD}\right)^2.$$

Here, MAF denotes the minor allele frequency, which can be treated as equivalent to the effect allele frequency in the calculation; $\beta$ is the effect size of the allele; $SD = SE \times \sqrt{N}$ is the standard deviation; and SE is the standard error of $\beta$. Following these steps, instrumental variables for MR analysis were strictly screened. The F-statistics of the IVs screened in the preliminary and external validation phases were all greater than 10, indicating that these IVs are robust and satisfy assumption 1.
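The instrument-strength calculation described above can be sketched as follows. This is a minimal illustration, assuming the variance-explained formula in its squared form, R^2 = 2 × (1 − MAF) × MAF × (β/SD)^2, with SD = SE × √N. The function names and the input values are hypothetical and do not reproduce any row of the study's tables.

```python
import math

def variance_explained(maf, beta, se, n):
    """Proportion of exposure variance explained by one SNP (R^2).

    Assumes SD = SE * sqrt(N) and the squared form of the formula.
    """
    sd = se * math.sqrt(n)
    return 2.0 * (1.0 - maf) * maf * (beta / sd) ** 2

def f_statistic(r2, n, k=1):
    """F = (N - K - 1) / K * R^2 / (1 - R^2) for K instruments."""
    if not 0.0 <= r2 < 1.0:
        raise ValueError("r2 must lie in [0, 1)")
    return (n - k - 1) / k * r2 / (1.0 - r2)

# Hypothetical single cis-pQTL (values chosen for illustration only).
r2 = variance_explained(maf=0.30, beta=0.40, se=0.03, n=1000)
f = f_statistic(r2, n=1000, k=1)
print(f"R2={r2:.4f}, F={f:.1f}, weak instrument: {f <= 10}")
```

With a single instrument (K = 1), as for every protein that passed the Bonferroni threshold in this study, the F-statistic reduces to (N − 2) × R^2 / (1 − R^2).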
Additionally, to satisfy MR assumptions 2 and 3, we used the NHGRI-EBI GWAS Catalog (https://www.ebi.ac.uk/gwas/) to examine the IVs identified in the preliminary analysis, with the aim of removing IVs associated with lung adenocarcinoma or with confounding factors (such as smoking [25], alcohol consumption [26], BMI [27], hypertension [28], diabetes [29], and high cholesterol [30]).

GO and KEGG pathway enrichment analyses

To further reveal the function of plasma proteins in lung adenocarcinoma, we used an online analysis platform to perform GO and KEGG pathway enrichment analyses on the significantly associated plasma proteins, with the threshold set at P < 0.05. The MR analysis was implemented using the ‘TwoSampleMR’ (https://github.com/MRCIEU/TwoSampleMR) package, the ‘MR-PRESSO’ (https://github.com/rondolab/MR-PRESSO) package and the ‘MendelianRandomization’ package in R (version 4.2.2).

Results

The MR analysis results of the relationship between circulating plasma proteins and lung adenocarcinoma

This study used MR to analyze the causal relationship between 734 plasma proteins and lung adenocarcinoma. At the Bonferroni significance level (P < 6.81 × 10^–5), MR analysis suggested a causal relationship between 9 circulating plasma proteins and lung adenocarcinoma, namely CTSH, FLRT3, SFTPB, ICAM5, PCYOX1, C3, BCAN, TYMP, and UNC5D, as detailed in Table 2 and Fig. 2. Each of these nine proteins had only one IV, so the Wald ratio method was used to estimate the causal effect of the plasma protein on lung adenocarcinoma. We evaluated the IVs for these nine plasma proteins using the NHGRI-EBI GWAS Catalog and found that rs34593439 and rs1130866 were associated with lung adenocarcinoma and confounding factors. Consequently, we excluded CTSH and SFTPB, leaving seven plasma proteins.
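The single-instrument Wald ratio and the Bonferroni threshold applied above can be sketched as follows. This is an illustration under stated assumptions: the standard error uses a first-order delta-method approximation that ignores uncertainty in the SNP-exposure effect, and the P value comes from a two-sided normal test. The function names and the summary statistics are hypothetical and are not the study's data.

```python
import math

def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests

def wald_ratio(beta_exp, beta_out, se_out):
    """Wald ratio estimate for one SNP, returned on the odds ratio scale."""
    beta = beta_out / beta_exp
    # First-order approximation: uncertainty in beta_exp is ignored.
    se = se_out / abs(beta_exp)
    z = beta / se
    # Two-sided P value under a standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return {
        "or": math.exp(beta),
        "ci_low": math.exp(beta - 1.96 * se),
        "ci_high": math.exp(beta + 1.96 * se),
        "p": p,
    }

threshold = bonferroni_threshold(0.05, 734)  # 734 proteins tested
# Hypothetical SNP-exposure and SNP-outcome effects (not study data).
result = wald_ratio(beta_exp=0.5, beta_out=-0.05, se_out=0.01)
print(f"threshold={threshold:.2e}")
print(f"OR={result['or']:.2f} "
      f"({result['ci_low']:.2f}, {result['ci_high']:.2f}), "
      f"P={result['p']:.2e}, significant={result['p'] < threshold}")
```

A protein is carried forward only if its P value falls below the corrected threshold; the odds ratio is interpreted per SD increase in the genetically predicted plasma protein level.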
No heterogeneity was detected in the main analysis.

Table 2. MR results of plasma proteins significantly associated with lung adenocarcinoma

| Plasma protein | UniProt ID | SNP | Effect allele | OR (95%CI) | P value | PVE (%) | F statistic |
|---|---|---|---|---|---|---|---|
| CTSH | P09668 | rs34593439 | A | 1.10 (1.06, 1.13) | 1.46 × 10^–7 | 49.95 | 1098.94 |
| FLRT3 | Q9NZU0 | rs11908097 | C | 1.15 (1.09, 1.23) | 3.81 × 10^–6 | 14.37 | 255.54 |
| SFTPB | P07988 | rs1130866 | A | 0.90 (0.87, 0.93) | 1.94 × 10^–9 | 46.61 | 972.31 |
| ICAM5 | Q8N612;Q9UMF0 | rs281439 | C | 0.92 (0.89, 0.95) | 2.31 × 10^–6 | 53.15 | 1194.80 |
| PCYOX1 | Q9UHG3 | rs2706762 | C | 0.89 (0.85, 0.93) | 5.31 × 10^–8 | 32.00 | 609.47 |
| C3 | P01024 | rs163494 | C | 0.78 (0.69, 0.88) | 2.43 × 10^–5 | 4.43 | 72.52 |
| BCAN | Q96GW7 | rs7541549 | T | 0.76 (0.67, 0.87) | 4.49 × 10^–5 | 2.96 | 48.84 |
| TYMP | P19971 | rs131798 | G | 0.76 (0.66, 0.87) | 5.79 × 10^–5 | 2.85 | 46.24 |
| UNC5D | Q6UXZ4 | rs6468316 | T | 0.70 (0.60, 0.82) | 1.62 × 10^–5 | 2.13 | 34.52 |

SNPs, single-nucleotide polymorphisms; PVE, proportion of variance explained

Fig. 2. Volcano plot of the MR results for the relationship between 734 plasma proteins and the risk of lung adenocarcinoma. ORs for lung adenocarcinoma are expressed per SD increase in plasma protein level. The dashed horizontal black line corresponds to P = 6.81 × 10^–5 (0.05/734). Abbreviations: ln, natural logarithm; PVE, proportion of variance explained

Sensitivity analysis

Because each protein corresponds to only one SNP, horizontal pleiotropy and leave-one-out tests could not be conducted. Therefore, this study assessed the stability and reliability of the results through external validation.

External validation

To further clarify the relationships identified in the preliminary analysis, we selected the seven plasma proteins (FLRT3, ICAM5, PCYOX1, C3, BCAN, TYMP, and UNC5D) from the UKB-PPP [22] and deCODE datasets and analyzed them in the external validation phase.
We applied the IV screening criteria from the preliminary analysis phase to identify IVs for these proteins in the UKB-PPP and deCODE datasets. After obtaining IVs from the deCODE dataset and performing MR and sensitivity analyses, we found that only ICAM5 had statistically significant results. In the UKB-PPP dataset, IV data for the plasma proteins FLRT3 and PCYOX1 could not be obtained, so we obtained IVs only for the remaining five plasma proteins and performed MR and sensitivity analyses on them. The results supported causal relationships of ICAM5, TYMP and UNC5D with lung adenocarcinoma. To further enhance the reliability of the results, we conducted a post-MR meta-analysis of the preliminary analysis results and the external validation results. Notably, although PCYOX1 did not reach statistical significance in the external validation with the deCODE dataset (P = 0.70), the post-MR meta-analysis still showed a statistically significant combined effect for PCYOX1 because of the high weight of the preliminary analysis (92.6% in the common effect model and 90.7% in the random effects model). Finally, the findings indicated a causal relationship of ICAM5, PCYOX1 and TYMP with lung adenocarcinoma (Table 3 and Fig. 3).

Table 3. External validation MR results of plasma proteins and lung adenocarcinoma

| Research | Exposure (plasma protein) | Outcome | OR (95%CI) | P value | PVE (%) |
|---|---|---|---|---|---|
| UKB-PPP | ICAM5 | Lung adenocarcinoma | 0.90 (0.83–0.97) | 5.13 × 10^–3 | 14.24 |
| UKB-PPP | TYMP | Lung adenocarcinoma | 0.77 (0.65–0.92) | 3.48 × 10^–3 | 1.76 |
| UKB-PPP | UNC5D | Lung adenocarcinoma | 0.48 (0.28–0.83) | 7.95 × 10^–3 | 0.20 |
| deCODE | ICAM5 | Lung adenocarcinoma | 0.91 (0.85–0.98) | 1.47 × 10^–2 | 0.24 |

SNPs, single-nucleotide polymorphisms; PVE, proportion of variance explained

Fig. 3. Forest plot of the post-MR meta-analysis

GO and KEGG pathway enrichment analysis results

To better understand the four plasma proteins, we performed GO and KEGG pathway enrichment analyses using an online analysis platform. In terms of biological processes (BP), the plasma proteins were primarily enriched in "pyrimidine deoxyribonucleoside monophosphate metabolic process", "deoxyribonucleoside monophosphate catabolic process", "pyrimidine nucleobase metabolic process", and "mitochondrial genome maintenance". In terms of cellular components (CC), they were significantly associated with "very-low-density lipoprotein particle", "triglyceride-rich plasma lipoprotein particle", and "plasma lipoprotein particle". In terms of molecular functions (MF), the proteins were mainly involved in "oxidoreductase activity, acting on a sulfur group of donors, oxygen as acceptor", "pentosyltransferase activity", and "growth factor activity". The top 10 significantly enriched GO terms (P < 0.05) are detailed in Fig. 4.

Fig. 4. Functional enrichment analysis of four plasma proteins. a GO analysis results. The x-axis represents the negative logarithm (base 10) of the enrichment score; the larger this value, the more significant the enrichment of the function. The y-axis represents the GO term. Orange bars: BP; green bars: CC; blue bars: MF. b KEGG pathway enrichment analysis results. The x-axis represents the negative logarithm (base 10) of the enrichment score, and the y-axis represents the name of the KEGG pathway. The size of the bubbles represents the number of enriched plasma proteins in each pathway, while the color of the bubbles represents the significance of the enrichment

Discussion

This study investigated the causal relationship between 734 plasma proteins and lung adenocarcinoma.
Using MR analysis and Bonferroni correction, 9 proteins (CTSH, FLRT3, SFTPB, ICAM5, PCYOX1, C3, BCAN, TYMP, and UNC5D) were found to be associated with lung adenocarcinoma. Subsequently, through analyses of confounding factors, external validation, and post-MR meta-analysis, only 3 of the 9 proteins (ICAM5, PCYOX1, and TYMP) were validated, further supporting the reliability of this study.

Intercellular adhesion molecule 5 (ICAM5) is a unique adhesion molecule with binding ability, flexibility, and multiple potential signaling capabilities, and it plays an important role in the growth and maturation of neuronal synapses [31]. In recent years, research has found that ICAM5 is closely associated with glioblastoma (GBM) and that its expression level, which shows a negative correlation, can be used to predict survival risk in GBM [32]. In addition, ICAM5 is involved in signaling pathways that promote the development of bladder cancer [33]. Although research on its relevance to lung adenocarcinoma is limited, a significant association between ICAM5 and a reduced risk of lung adenocarcinoma has been reported [34], consistent with the findings of this study. In another MR study, ICAM5 and FLRT3 were identified as potential targets for lung cancer and categorized as tier 4 targets, with potential practicality as cancer drug targets [8]. These studies suggest that ICAM5 may play different roles in different types of cancer and may be a potential protective factor in lung adenocarcinoma. Further exploration of its protective mechanism may establish ICAM5 as a potential therapeutic target and suggest a new direction for drug research in lung adenocarcinoma.

Prenylcysteine oxidase 1 (PCYOX1) is an enzyme involved in the degradation of prenylated proteins [35]. It is expressed in various embryonic and adult tissues, predominantly in the liver.
PCYOX1 interacts with multiple proteins and plays crucial roles in the respiratory chain, cell death, cell signaling, movement and transport, metabolism, and protein degradation [36]. Dysregulation of PCYOX1 may disturb the balance of apoptosis, leading to excessive cell proliferation or cell death, which may contribute to the development of cancer. Little is known about the role of PCYOX1 in cancer. Studies have shown that PCYOX1 exhibits tumor type-specific expression patterns across malignancies [36]. Our research found that PCYOX1 is negatively associated with lung adenocarcinoma. Based on these findings, further studies are needed to determine the mechanism of action of this protein in tumors.

Thymidine phosphorylase (TYMP) is the rate-limiting enzyme in thymidine catabolism and promotes the invasion and metastasis of cancer cells [37]. This may be related to the strong pro-angiogenic and anti-apoptotic effects of TYMP [38]. Accordingly, in many studies, TYMP expression has been associated with poor prognosis [39]. In addition, TYMP is involved in the metabolism of chemotherapeutic drugs such as 5-FU and capecitabine. A study by Tarar A et al. [40] showed that using human mesenchymal stem cells (hMSCs) as cell carriers to deliver TYMP to lung cancer cells can enhance the cytotoxic effect of chemotherapy drugs on cancer cells. Another study found that 5’-DFCR, an active metabolite of capecitabine, can target chemotherapy-resistant NSCLC, which exhibits higher CAD and TYMP expression [41]. Furthermore, TYMP is itself a therapeutic target for cancer, and its inhibitor has been used in cancer treatment [38]. TYMP therefore plays a variety of roles in the development and treatment of tumors. This study found a negative association between TYMP and lung adenocarcinoma, suggesting that TYMP may be a potential protective factor, but further investigation is needed.
As health standards improve, tumor screening has become increasingly common. In tumor screening, serum biomarker tests offer significant advantages in convenience, speed, and repeatability. This study explores the causal relationship between four plasma proteins and lung adenocarcinoma, which may help in the early diagnosis and treatment of lung cancer.

This study also has some limitations. First, because of the limited number of SNPs, sensitivity analyses could not be performed for the identified plasma proteins. Second, all individuals in this study are of European descent, which may make it difficult to generalize our conclusions to other populations. Third, plasma protein levels may be influenced by various factors, whereas this study focuses only on genetics; therefore, larger-scale GWAS and basic research are needed to further validate the reliability and credibility of the results of this study.

Conclusion

In conclusion, this study used two-sample MR analysis to explore the causal relationship between plasma proteins and lung adenocarcinoma and identified 3 plasma proteins (ICAM5, PCYOX1 and TYMP) significantly associated with the risk of lung adenocarcinoma, providing new insights for the early diagnosis, treatment, and prognosis prediction of lung adenocarcinoma.

Acknowledgements