Abstract
Background
This study utilized Mendelian randomization (MR) to investigate the
causal relationship between circulating plasma proteins and lung
adenocarcinoma.
Methods
We obtained data on 734 circulating plasma proteins from genome-wide
association studies (GWAS) as exposures and extracted single nucleotide
polymorphisms (SNPs) as instrumental variables. Lung adenocarcinoma
data (11,245 cases and 54,619 controls) were obtained from the IEU Open
GWAS database as the outcome. The inverse-variance weighted (IVW)
method or the Wald ratio was used as the main analysis to assess the
causal relationship between circulating plasma protein levels and lung
adenocarcinoma. Sensitivity analyses (leave-one-out, heterogeneity, and
pleiotropy tests), external validation, and post-MR meta-analysis were
used to evaluate the reliability of the MR results. Finally, Gene
Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG)
pathway enrichment analyses were performed on the final screened plasma
proteins.
Results
In the preliminary and external validation stages, ICAM5
(OR = 0.92, 95% CI: 0.89–0.95, P = 2.31 × 10^–6), PCYOX1 (OR = 0.89,
95% CI: 0.85–0.93, P = 5.31 × 10^–8), and TYMP (OR = 0.76, 95% CI:
0.66–0.87, P = 5.79 × 10^–5) were negatively associated with lung
adenocarcinoma. Sensitivity analyses, external validation, and post-MR
meta-analysis indicated that the MR results were robust. GO and KEGG
pathway enrichment analyses demonstrated that these plasma proteins
were primarily enriched in pathways such as "pyrimidine
deoxyribonucleoside monophosphate metabolic process",
"deoxyribonucleoside monophosphate catabolic process", "mitochondrial
genome maintenance", and "Pyrimidine metabolism".
Conclusions
ICAM5, PCYOX1 and TYMP are associated with a decreased risk of lung
adenocarcinoma. Plasma proteins may become new biological markers for
lung adenocarcinoma, providing new insights into the prevention and
treatment of this disease.
Keywords: Mendelian randomization, Plasma proteins, Lung adenocarcinoma
Introduction
The global incidence of lung cancer is second only to that of breast
cancer, but lung cancer has the highest mortality rate, accounting for
approximately 18% of all cancer deaths [1]. It has become a major
global health burden [2]. Lung cancer is classified histopathologically
as non-small cell lung cancer (NSCLC) or small cell lung cancer (SCLC),
which account for approximately 85% and 15% of cases, respectively [3].
Adenocarcinoma is the most common type of NSCLC, followed by squamous
cell carcinoma. Since the 1990s, with the implementation of early lung
cancer screening and the continuous optimization of treatment regimens,
the 5-year survival rate of lung cancer patients has increased from 13%
to 22% [4]. However, prognosis differs markedly by clinical stage, with
a 5-year survival rate of over 90% for stage IA and less than 10% for
stage IV [5]. Early diagnosis and treatment of lung cancer are
therefore of great significance for patient prognosis. Despite the
rapid development of medical technology, the complex pathogenesis of
cancer poses many challenges for its diagnosis and treatment. First,
current screening methods such as imaging and pathological examination
still have many limitations in early screening [5]. Second, for
patients who cannot be cured by surgery, the efficacy of existing
treatments such as chemotherapy, radiotherapy, and immunotherapy
remains unsatisfactory [6]. Further exploration of biomarkers for the
early diagnosis of lung cancer may therefore help improve the survival
rate of lung cancer patients.
Circulating plasma proteins are important components of the blood and
play a crucial role in various biological processes in the human body,
such as signal transduction, transport, growth, repair, and defense
against infection [7]. Plasma proteins also play an important role in
the occurrence and development of cancer, participating in the growth,
migration, and invasion of cancer cells and in the formation of the
tumor microenvironment [8]. Moreover, plasma proteins can serve as
valuable biomarkers in cancer. Multiple studies have shown that
circulating proteins are valuable for early screening, treatment
response monitoring, and prognosis prediction in lung cancer [9], which
may open up new avenues for the precise treatment of cancer. Although
plasma proteins are closely related to lung cancer, many protein
markers remain unidentified.
Mendelian randomization (MR) analysis is a commonly used genetic
epidemiological method that uses single nucleotide polymorphisms (SNPs)
strongly associated with an exposure as instrumental variables (IVs) to
infer the causal relationship between the exposure and the study
outcome [10]. Compared with traditional observational epidemiology, MR
analysis is less susceptible to confounding and reverse causality [11].
The present study therefore applies MR analysis to investigate the
association between plasma proteins and lung adenocarcinoma, aiming to
provide new insights for the early prevention of lung adenocarcinoma.
Materials and methods
Data sources
Genetic variants associated with protein expression levels are referred
to as protein quantitative trait loci (pQTLs). The circulating plasma
protein pQTL data for this study were obtained from the study of Lin J
et al. [12], which integrated five previously published GWAS datasets
[7, 13–16]. The lung adenocarcinoma GWAS summary data were obtained
from the IEU Open GWAS database (https://gwas.mrcieu.ac.uk/) and
comprised 65,864 individuals of European ancestry (11,245 cases and
54,619 controls) [17–21]. All study data were derived from publicly
available GWAS databases or previously published studies and did not
require separate ethical approval or consent. The data details are
shown in Table 1.
Table 1. GWAS data used in the preliminary and external validation
analyses

| Validation phase | Research | Exposure / outcome | Source of participants (sample size) | Number of plasma proteins | Assay platform |
| --- | --- | --- | --- | --- | --- |
| Preliminary analysis | Sun et al. [7] | Plasma proteins | England (3301) | 3622 | SOMAscan |
| Preliminary analysis | Emilsson et al. [13] | Plasma proteins | Iceland (5457) | 4137 | SOMAscan |
| Preliminary analysis | Folkersen et al. [14] | Plasma proteins | Europe (3394) | 83 | Olink ProSeek |
| Preliminary analysis | Suhre et al. [15] | Plasma proteins | Southern Germany (1000) | 1124 | SOMAscan |
| Preliminary analysis | Yao et al. [16] | Plasma proteins | America (7333) | 71 | Multiplex immunoassays |
| Preliminary analysis | TRICL [17–21] | Lung adenocarcinoma | European (11,245 cases, 54,619 controls) | - | - |
| External validation analysis | UKB-PPP [22] | Plasma proteins | European (33,469) | 2923 | - |
| External validation analysis | deCODE [23] | Plasma proteins | Icelander (35,559) | 4907 | - |
Study design
MR analysis must satisfy three core assumptions [24]: (1) the
instrumental variables are significantly associated with the exposure;
(2) the instrumental variables are unrelated to confounding factors
between the exposure and the outcome; and (3) the instrumental
variables have no direct relationship with the outcome and influence it
only through the exposure (Fig. 1). Under these assumptions, when two
or more instrumental variables were available, the inverse variance
weighted (IVW) method was used as the primary analysis, with MR-Egger
regression and the weighted median (WM) as secondary methods, to
evaluate the causal effect of circulating plasma proteins on lung
adenocarcinoma. When only one instrumental variable was available,
methods that test for invalid instruments could not be applied, so the
Wald ratio method was used to assess the causal effect of circulating
plasma proteins on lung adenocarcinoma. Sensitivity analyses comprised
Cochran's Q test to assess heterogeneity, MR-PRESSO to detect
horizontal pleiotropy and outliers, and the leave-one-out test to
assess whether the MR estimate was driven or biased by a single SNP.
For the main analysis, Bonferroni correction was used to adjust the
significance threshold for multiple testing, giving a corrected
threshold of P < 6.81 × 10^–5 (0.05/number of tests, i.e., 0.05/734).
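The estimators named above can be sketched in a few lines. This is a
minimal illustration rather than the study's actual pipeline (which
used R packages): the function names are hypothetical, the Wald ratio
standard error uses a first-order approximation that ignores
uncertainty in the SNP-exposure estimate, and the IVW estimator is the
fixed-effect form.

```python
import math


def wald_ratio(beta_exposure, beta_outcome, se_outcome):
    """Wald ratio for a single SNP: SNP-outcome effect / SNP-exposure effect.

    The SE is a first-order approximation (assumption for this sketch).
    """
    estimate = beta_outcome / beta_exposure
    se = se_outcome / abs(beta_exposure)
    return estimate, se


def ivw(betas_exposure, betas_outcome, ses_outcome):
    """Fixed-effect IVW: inverse-variance weighted mean of per-SNP Wald ratios."""
    weights = [bx ** 2 / so ** 2 for bx, so in zip(betas_exposure, ses_outcome)]
    ratios = [by / bx for bx, by in zip(betas_exposure, betas_outcome)]
    estimate = sum(w * r for w, r in zip(weights, ratios)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    return estimate, se


def two_sided_p(estimate, se):
    """Two-sided P value from a normal approximation."""
    z = abs(estimate / se)
    return math.erfc(z / math.sqrt(2))


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


# Threshold used in the main analysis: 0.05 / 734 proteins.
threshold = bonferroni_threshold(0.05, 734)

# Hypothetical single-SNP summary statistics, for illustration only.
estimate, se = wald_ratio(beta_exposure=0.5, beta_outcome=-0.1, se_outcome=0.02)
odds_ratio = math.exp(estimate)
significant = two_sided_p(estimate, se) < threshold
```

With a single instrument the IVW estimate reduces to the Wald ratio,
which is why the two methods can share one reporting format.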
Fig. 1. Schematic diagram of the instrumental variable core assumptions
in two-sample Mendelian randomization analysis. Assumption 1: IVs are
significantly associated with the exposure; Assumption 2: IVs are
unrelated to confounding factors between the exposure and the outcome;
Assumption 3: IVs affect the outcome only through the exposure.
Abbreviations: SNPs, single nucleotide polymorphisms; IVW, inverse
variance weighted
Finally, we used the UK Biobank Pharma Proteomics Project (UKB-PPP)
[22] and deCODE [23] datasets to externally validate the results of the
preliminary analysis. Details of the UKB-PPP and deCODE datasets can be
found in Table 1. A statistical significance threshold of 0.05 was
applied for the validation.
Instrumental variables
To screen for SNPs significantly associated with the exposure and to
ensure the accuracy and reliability of the study conclusions, this
study followed the plasma circulating protein pQTL screening criteria
proposed by Lin J et al. [12]. The inclusion criteria for pQTLs were as
follows: (1) the pQTL reached the genome-wide significance threshold
(P < 5 × 10^–8); (2) the pQTL was located outside the major
histocompatibility complex (MHC) region (chr6, 26–34 Mb); (3) there was
no significant linkage disequilibrium (LD) between pQTLs (LD clumping,
r^2 < 0.001); and (4) the pQTL was cis-acting (defined as a genetic
variant located within a 500 kb window upstream or downstream of the
gene encoding the protein). Ultimately, 738 cis-acting SNPs
representing 734 circulating plasma proteins were included.
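As a rough illustration of criteria (1), (2), and (4), the filter below
applies them to one variant record. It is a sketch under stated
assumptions: the `PQTL` record and its field names are hypothetical,
the cis window is measured from the gene start and end coordinates, and
LD clumping (criterion 3) is omitted because it requires a reference
panel.

```python
from dataclasses import dataclass

GENOME_WIDE_P = 5e-8                      # criterion (1)
MHC_CHROM = "6"                           # criterion (2): chr6, 26-34 Mb
MHC_START, MHC_END = 26_000_000, 34_000_000
CIS_WINDOW = 500_000                      # criterion (4): 500 kb either side


@dataclass
class PQTL:
    """Hypothetical record for one SNP-protein association."""
    rsid: str
    chrom: str
    pos: int          # SNP position in base pairs
    p_value: float    # SNP-protein association P value
    gene_chrom: str   # chromosome of the protein-coding gene
    gene_start: int
    gene_end: int


def in_mhc(v: PQTL) -> bool:
    """True if the variant lies inside the excluded MHC region."""
    return v.chrom == MHC_CHROM and MHC_START <= v.pos <= MHC_END


def is_cis(v: PQTL) -> bool:
    """True if the variant is within the cis window around its gene."""
    return (
        v.chrom == v.gene_chrom
        and v.gene_start - CIS_WINDOW <= v.pos <= v.gene_end + CIS_WINDOW
    )


def passes_screen(v: PQTL) -> bool:
    """Apply criteria (1), (2), and (4); LD clumping is not handled here."""
    return v.p_value < GENOME_WIDE_P and not in_mhc(v) and is_cis(v)


# Made-up variants, for illustration only.
candidates = [
    PQTL("rsA", "19", 10_300_000, 1e-20, "19", 10_280_000, 10_295_000),
    PQTL("rsB", "6", 30_000_000, 1e-30, "6", 29_900_000, 29_950_000),
    PQTL("rsC", "2", 5_000_000, 1e-12, "19", 10_280_000, 10_295_000),
    PQTL("rsD", "19", 10_300_000, 1e-5, "19", 10_280_000, 10_295_000),
]
kept = [v.rsid for v in candidates if passes_screen(v)]
```

Here only the first record survives: the second falls in the MHC
region, the third is trans-acting, and the fourth misses genome-wide
significance.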
In addition, to minimize weak instrument bias and ensure a robust
relationship between the instrumental variables and the exposure, the
strength of the association between SNPs and exposure was evaluated
using the F-statistic; F > 10 indicates that weak instrument bias is
unlikely. The F-statistic is calculated as
$F = \frac{N - K - 1}{K} \times \frac{R^2}{1 - R^2}$,
where N is the sample size of the exposure, K is the number of SNPs,
and $R^2$ is the proportion of exposure variance explained by the SNPs,
calculated as
$R^2 = 2 \times (1 - \mathrm{MAF}) \times \mathrm{MAF} \times (\beta / \mathrm{SD})^2$.
Here, MAF denotes the minor allele frequency, which can be treated as
equivalent to the effect allele frequency in the calculation, $\beta$
is the effect size of the allele,
$\mathrm{SD} = \mathrm{SE} \times \sqrt{N}$
is the standard deviation, and SE is the standard error of $\beta$.
Following these steps, the instrumental variables for MR analysis were
strictly screened. The F-statistics of the IVs screened in the
preliminary and external validation phases were all greater than 10,
indicating that these IVs are robust and satisfy assumption 1.
Additionally, to satisfy MR assumptions 2 and 3, we used the NHGRI-EBI
GWAS Catalog (https://www.ebi.ac.uk/gwas/) to examine the IVs screened
in the preliminary analysis, with the aim of removing IVs associated
with lung adenocarcinoma or with confounding factors such as smoking
[25], alcohol consumption [26], BMI [27], hypertension [28], diabetes
[29], and high cholesterol [30].
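The instrument-strength check described in this section can be written
as a short calculation. The sketch assumes the standard forms of the
formulas, in which the standardized effect (beta/SD) is squared and
SD = SE × sqrt(N); the function names and the input values are
hypothetical and are not taken from the study data.

```python
import math


def r_squared(maf, beta, se, n):
    """Proportion of exposure variance explained by one SNP.

    Assumes R^2 = 2 * (1 - MAF) * MAF * (beta / SD)^2 with SD = SE * sqrt(N).
    """
    sd = se * math.sqrt(n)
    return 2 * (1 - maf) * maf * (beta / sd) ** 2


def f_statistic(r2, n, k=1):
    """F = (N - K - 1) / K * R^2 / (1 - R^2)."""
    return (n - k - 1) / k * r2 / (1 - r2)


def is_strong_instrument(f_value, cutoff=10.0):
    """Conventional rule of thumb: F > 10 suggests weak-instrument bias is unlikely."""
    return f_value > cutoff


# Hypothetical single-SNP inputs, for illustration only.
r2 = r_squared(maf=0.5, beta=0.2, se=0.01, n=10_000)
f_value = f_statistic(r2, n=10_000, k=1)
strong = is_strong_instrument(f_value)
```

For a single instrument (K = 1) the F-statistic is driven almost
entirely by the sample size and the variance explained, so SNPs with
small PVE can still pass the F > 10 rule in large cohorts.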
GO and KEGG pathway enrichment analyses
To further explore the function of the plasma proteins in lung
adenocarcinoma, we used an online analysis tool to perform GO and KEGG
pathway enrichment analyses on the significantly associated plasma
proteins, with a threshold of P < 0.05. The MR analysis was implemented
using the ‘TwoSampleMR’ (https://github.com/MRCIEU/TwoSampleMR)
package, the ‘MR-PRESSO’ (https://github.com/rondolab/MR-PRESSO)
package, and the ‘MendelianRandomization’ package in R (version 4.2.2).
Results
The MR analysis results of the relationship between circulating plasma
proteins and lung adenocarcinoma
This study used MR to analyze the causal relationship between 734
plasma proteins and lung adenocarcinoma. At the Bonferroni-corrected
significance level (P < 6.81 × 10^–5), MR analysis suggested a causal
relationship between 9 circulating plasma proteins and lung
adenocarcinoma, namely CTSH, FLRT3, SFTPB, ICAM5, PCYOX1, C3, BCAN,
TYMP, and UNC5D, as detailed in Table 2 and Fig. 2. Each of these nine
proteins had only one IV, so the Wald ratio method was used to estimate
the causal effect of the plasma protein on lung adenocarcinoma. We
evaluated the IVs for the nine plasma proteins using the NHGRI-EBI GWAS
Catalog and found that rs34593439 and rs1130866 were associated with
lung adenocarcinoma and confounding factors. Consequently, we excluded
CTSH and SFTPB, ultimately confirming seven plasma proteins. No
heterogeneity was detected in the main analysis.
Table 2. MR results of plasma proteins significantly associated with
lung adenocarcinoma

| Plasma protein | UniProt ID | SNP | Effect allele | OR (95% CI) | P value | PVE (%) | F statistic |
| --- | --- | --- | --- | --- | --- | --- | --- |
| CTSH | P09668 | rs34593439 | A | 1.10 (1.06, 1.13) | 1.46 × 10^–7 | 49.95 | 1098.94 |
| FLRT3 | Q9NZU0 | rs11908097 | C | 1.15 (1.09, 1.23) | 3.81 × 10^–6 | 14.37 | 255.54 |
| SFTPB | P07988 | rs1130866 | A | 0.90 (0.87, 0.93) | 1.94 × 10^–9 | 46.61 | 972.31 |
| ICAM5 | Q8N612; Q9UMF0 | rs281439 | C | 0.92 (0.89, 0.95) | 2.31 × 10^–6 | 53.15 | 1194.80 |
| PCYOX1 | Q9UHG3 | rs2706762 | C | 0.89 (0.85, 0.93) | 5.31 × 10^–8 | 32.00 | 609.47 |
| C3 | P01024 | rs163494 | C | 0.78 (0.69, 0.88) | 2.43 × 10^–5 | 4.43 | 72.52 |
| BCAN | Q96GW7 | rs7541549 | T | 0.76 (0.67, 0.87) | 4.49 × 10^–5 | 2.96 | 48.84 |
| TYMP | P19971 | rs131798 | G | 0.76 (0.66, 0.87) | 5.79 × 10^–5 | 2.85 | 46.24 |
| UNC5D | Q6UXZ4 | rs6468316 | T | 0.70 (0.60, 0.82) | 1.62 × 10^–5 | 2.13 | 34.52 |

SNPs, single-nucleotide polymorphisms; PVE, proportion of variance
explained
Fig. 2. Volcano plot of the MR analysis results, showing the
relationship between 734 plasma proteins and the risk of lung
adenocarcinoma. ORs for lung adenocarcinoma risk are expressed per SD
increase in plasma protein levels. The dashed horizontal black line
corresponds to P = 6.81 × 10^–5 (0.05/734). Abbreviations: ln, natural
logarithm; PVE, proportion of variance explained
Sensitivity analysis
Because each protein corresponded to only one SNP, horizontal
pleiotropy and leave-one-out tests could not be conducted. This study
therefore relied on external validation to assess the stability and
reliability of the results.
External validation
To further clarify the relationships identified in the preliminary
analysis, we selected the plasma proteins (FLRT3, ICAM5, PCYOX1, C3,
BCAN, TYMP, and UNC5D) from the UKB-PPP [22] and deCODE datasets and
analyzed them in the external validation phase. The IV screening
criteria from the preliminary analysis were used to identify IVs for
these proteins in the UKB-PPP and deCODE datasets. After obtaining IVs
from the deCODE dataset and performing MR and sensitivity analyses, we
found that only ICAM5 had statistically significant results. IV data
for FLRT3 and PCYOX1 could not be obtained from the UKB-PPP dataset, so
MR and sensitivity analyses were performed on the remaining five plasma
proteins. The results supported causal relationships of ICAM5, TYMP,
and UNC5D with lung adenocarcinoma. To further enhance the reliability
of the results, we conducted a post-MR meta-analysis of the preliminary
and external validation results. Notably, although PCYOX1 did not reach
statistical significance in the external validation using the deCODE
dataset (P = 0.70), the post-MR meta-analysis still showed a
statistically significant combined effect for PCYOX1 because of the
high weight of the preliminary analysis (92.6% in the common effect
model and 90.7% in the random effects model). Finally, the findings
indicated a causal relationship of ICAM5, PCYOX1, and TYMP with lung
adenocarcinoma (Table 3 and Fig. 3).
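The weighting behind a common effect meta-analysis can be illustrated
with inverse-variance pooling of log odds ratios. The sketch below
takes the rounded ICAM5 estimates reported in Tables 2 and 3 as inputs
and recovers standard errors from the 95% confidence intervals; the
function names are hypothetical, and because the inputs are rounded the
output is not expected to reproduce the published forest plot exactly.

```python
import math

Z95 = 1.959963984540054  # standard normal quantile for a 95% CI


def log_or_and_se(odds_ratio, ci_lower, ci_upper):
    """Convert an OR and its 95% CI to a log OR and its standard error."""
    log_or = math.log(odds_ratio)
    se = (math.log(ci_upper) - math.log(ci_lower)) / (2 * Z95)
    return log_or, se


def common_effect_meta(estimates):
    """Inverse-variance pooling; returns pooled log OR, its SE, and weight shares."""
    weights = [1.0 / se ** 2 for _, se in estimates]
    total = sum(weights)
    pooled = sum(w * b for w, (b, _) in zip(weights, estimates)) / total
    pooled_se = math.sqrt(1.0 / total)
    shares = [w / total for w in weights]
    return pooled, pooled_se, shares


# ICAM5: preliminary analysis, UKB-PPP, deCODE (rounded values from Tables 2 and 3).
icam5 = [
    log_or_and_se(0.92, 0.89, 0.95),
    log_or_and_se(0.90, 0.83, 0.97),
    log_or_and_se(0.91, 0.85, 0.98),
]
pooled, pooled_se, shares = common_effect_meta(icam5)
pooled_or = math.exp(pooled)
ci = (math.exp(pooled - Z95 * pooled_se), math.exp(pooled + Z95 * pooled_se))
```

Because weights are proportional to 1/SE^2, the study with the
narrowest confidence interval dominates the pooled estimate, which is
the mechanism behind the high weight of the preliminary analysis noted
for PCYOX1.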
Table 3. External validation MR results of plasma proteins and lung
adenocarcinoma

| Research | Exposure (plasma protein) | Outcome | OR (95% CI) | P value | PVE (%) |
| --- | --- | --- | --- | --- | --- |
| UKB-PPP | ICAM5 | Lung adenocarcinoma | 0.90 (0.83–0.97) | 5.13 × 10^–3 | 14.24 |
| UKB-PPP | TYMP | Lung adenocarcinoma | 0.77 (0.65–0.92) | 3.48 × 10^–3 | 1.76 |
| UKB-PPP | UNC5D | Lung adenocarcinoma | 0.48 (0.28–0.83) | 7.95 × 10^–3 | 0.20 |
| deCODE | ICAM5 | Lung adenocarcinoma | 0.91 (0.85–0.98) | 1.47 × 10^–2 | 0.24 |

SNPs, single-nucleotide polymorphisms; PVE, proportion of variance
explained
Fig. 3. Forest plot of the post-MR meta-analysis
GO and KEGG pathway enrichment analysis results
To better understand the four plasma proteins, we performed GO and KEGG
pathway enrichment analyses using an online analysis tool. In terms of
biological processes (BP), the plasma proteins were primarily enriched
in "pyrimidine deoxyribonucleoside monophosphate metabolic process",
"deoxyribonucleoside monophosphate catabolic process", "pyrimidine
nucleobase metabolic process", and "mitochondrial genome maintenance".
In terms of cellular components (CC), they were significantly
associated with "very-low-density lipoprotein particle",
"triglyceride-rich plasma lipoprotein particle", and "plasma
lipoprotein particle". In terms of molecular functions (MF), the
proteins were mainly involved in "oxidoreductase activity, acting on a
sulfur group of donors, oxygen as acceptor", "pentosyltransferase
activity", and "growth factor activity". The top 10 significantly
enriched GO terms (P < 0.05) are detailed in Fig. 4.
Fig. 4. Functional enrichment analysis of four plasma proteins. (a) GO
analysis results. The x-axis represents the negative logarithm
(base 10) of the enrichment score; the larger this value, the more
significant the enrichment of the function. The y-axis represents the
GO term. Orange bars: BP; green bars: CC; blue bars: MF. (b) KEGG
pathway enrichment analysis results. The x-axis represents the negative
logarithm (base 10) of the enrichment score, and the y-axis represents
the name of the KEGG pathway. The size of the bubbles represents the
number of enriched plasma proteins in each pathway, while the color of
the bubbles represents the significance of the enrichment
Discussion
This study investigated the causal relationship between 734 plasma
proteins and lung adenocarcinoma. Using MR analysis and Bonferroni
correction, 9 proteins (CTSH, FLRT3, SFTPB, ICAM5, PCYOX1, C3, BCAN,
TYMP, and UNC5D) were found to be associated with lung adenocarcinoma.
Subsequently, through analyses of confounding factors, external
validation, and post-MR meta-analysis, only 3 of the 9 proteins (ICAM5,
PCYOX1, and TYMP) were validated, further supporting the reliability of
this study.
Intercellular adhesion molecule 5 (ICAM5) is a unique adhesion molecule
with binding ability, flexibility, and multiple potential signaling
capabilities, and it plays an important role in the growth and
maturation of neuronal synapses [31]. In recent years, ICAM5 has been
found to be closely associated with glioblastoma (GBM), and its
expression level, which shows a negative correlation with survival
risk, can be used to predict the survival risk of GBM [32]. In
addition, ICAM5 is involved in signaling pathways that promote the
development of bladder cancer [33]. Although research on its relevance
to lung adenocarcinoma is limited, a significant association between
ICAM5 and a reduced risk of lung adenocarcinoma has been reported [34],
consistent with the findings of this study. In another MR study, ICAM5
and FLRT3 were identified as potential targets for lung cancer and
categorized as tier 4 targets, with potential practicality as cancer
drug targets [8]. These studies suggest that ICAM5 may play different
roles in different types of cancer and may be a protective factor in
lung adenocarcinoma. Further exploration of its protective mechanism
may establish ICAM5 as a potential therapeutic target and suggest a new
direction for drug research in lung adenocarcinoma.
Prenylcysteine oxidase 1 (PCYOX1) is an enzyme involved in the
degradation of prenylated proteins [35]. It is expressed in various
embryonic and adult tissues, predominantly in the liver. PCYOX1
interacts with multiple proteins and plays crucial roles in the
respiratory chain, cell death, cell signaling, movement and transport,
metabolism, and protein degradation [36]. Dysregulation of PCYOX1 may
disturb the balance of apoptosis, leading to excessive cell
proliferation or cell death, which may contribute to the development of
cancer. Little is known about the role of PCYOX1 in cancer. Studies
have shown that PCYOX1 exhibits tumor type-specific expression patterns
across malignancies [36]. Our study found that PCYOX1 is negatively
associated with lung adenocarcinoma. Further studies are needed to
determine the mechanism of action of this protein in tumors.
Thymidine phosphorylase (TYMP) is the rate-limiting enzyme in thymidine
catabolism and promotes the invasion and metastasis of cancer cells
[37]. This may be related to the strong pro-angiogenic and
anti-apoptotic effects of TYMP [38]. Accordingly, in many studies, TYMP
expression has been associated with poor prognosis [39]. TYMP is also
involved in the metabolism of chemotherapeutic drugs such as 5-FU and
capecitabine. A study by Tarar A et al. [40] showed that using human
mesenchymal stem cells (hMSCs) as cell carriers to deliver TYMP to lung
cancer cells can enhance the cytotoxic effect of chemotherapy drugs on
cancer cells. Another study found that 5’-DFCR, an active metabolite of
capecitabine, can be used to target chemotherapy-resistant NSCLC, which
exhibits higher CAD and TYMP expression [41]. Furthermore, TYMP is
itself a therapeutic target for cancer, and its inhibitor has been used
in cancer treatment [38]. TYMP therefore plays a variety of roles in
the development and treatment of tumors. This study found a negative
association between TYMP and lung adenocarcinoma, suggesting that TYMP
may be a protective factor, but further investigation is needed.
As health standards improve, tumor screening has become increasingly
common. In tumor screening, serum biomarker tests offer significant
advantages in convenience, speed, and repeatability. This study
explores the causal relationship between four plasma proteins and lung
adenocarcinoma, which may be helpful for the early diagnosis and
treatment of lung cancer.
This study also has some limitations. First, because of the limited
number of SNPs, sensitivity analyses of the identified plasma proteins
could not be performed. Second, all individuals in this study are of
European descent, which may make it difficult to generalize our
conclusions to other populations. Third, plasma protein levels may be
influenced by various factors, whereas this study focuses only on
genetics. Larger-scale GWAS and basic research are therefore needed to
further validate the reliability and credibility of the results of this
study.
Conclusion
In conclusion, this study used two-sample MR analysis to explore the
causal relationship between plasma proteins and lung adenocarcinoma,
and identified 3 plasma proteins (ICAM5, PCYOX1 and TYMP) significantly
associated with the risk of lung adenocarcinoma, providing new insights
for the early diagnosis, treatment, and prognosis prediction of lung
adenocarcinoma.
Acknowledgements