Abstract
To address the pressing need to improve breast cancer outcomes, we
identify 9 plasma proteins with significant associations to breast
cancer: ULK3, CSK, ASIP, and TLR1 in overall breast cancer; ADH5, SARS2,
ULK3, and UBE2N in the Luminal A subtype; and PEX14 in the Luminal B
subtype. Tumor
immune cell infiltration analysis and mutation phenotypes in mice
further demonstrate a complex pattern of interaction between these
genes and immune responses. Compared to normal tissues, tumor tissues
exhibit reduced expression of ULK3 and CSK. Notably, elevated ULK3
expression in both breast cancer and the Luminal A subtype is
significantly associated with prolonged recurrence-free survival.
Overexpression of CSK and ULK3 is confirmed to significantly inhibit
the proliferation and migratory ability of MCF-7 cells. Additionally,
three drug candidates (TG100801, Hydrochlorothiazide, and Imatinib) show
promise in targeting these proteins, contributing valuable insights for
prioritizing drug development in breast cancer.
Subject terms: Molecular medicine, Breast cancer
__________________________________________________________________
Genetic association study identifies 9 plasma proteins linked to breast
cancer survival. ULK3 and CSK show potential as therapeutic targets,
with TG100801, Hydrochlorothiazide, and Imatinib as promising drug
candidates.
Introduction
Breast cancer (BC) is a malignant neoplasm that originates from cells
within the breast tissue. It is the most prevalent cancer in women and
also constitutes the primary cause of cancer-related mortality in women
globally. BC encompasses four distinct molecular subtypes: human
epidermal growth factor receptor 2 positive (HER2), luminal A, luminal
B and triple-negative breast cancer (TNBC). Significantly, the clinical
attributes and biological behaviors of these distinct subtypes exhibit
substantial variations. The HER2 and TNBC subtypes manifest heightened
invasiveness, increased metastatic potential, susceptibility to
relapse, and an unfavorable prognosis. It is noteworthy that
approximately 46% of TNBC patients present with distant metastasis at
the point of diagnosis^[42]1.
In 2020, the International Agency for Research on Cancer (IARC,
[43]https://www.iarc.who.int/) reported approximately
2.3 million new cases of BC worldwide, accounting for 24.5% of all
malignancies in women. In addition, breast cancer caused approximately
685,000 deaths worldwide in 2020, constituting 15.5% of female
cancer-related deaths. The widespread occurrence of BC has resulted in
substantial physical and economic burdens on individuals and nations
globally. Therefore, there is a compelling need to explore targeted
drugs for BC.
Plasma proteins encompass both actively secreted and passively leaked
proteins originating from diverse organs and cells. They play pivotal
roles in numerous biological processes and are commonly dysregulated or
upregulated in a myriad of diseases, including cardiovascular disease
and inflammatory bowel disease^[44]2,[45]3. These proteins function as
biomarkers, facilitating the diagnosis of diverse diseases.
Consequently, they have held considerable significance in clinical
practice and drug development over the past decade. Circulating
proteins, being
readily targetable using small molecules or biologics (such as
monoclonal antibodies), emerge as compelling candidates for potential
drug targets.
Mendelian randomization (MR) analysis was employed to evaluate
potential causal relationships between plasma proteins and BC. MR
is a valuable tool for inferring causal relationships between
exposure and outcome and has been widely used in prominent
studies in recent years^[46]4,[47]5. Its fundamental principle relies
on the random allocation of alleles during meiosis, which enables
the effect of a biological factor on disease to be inferred from
genetic variants that influence that factor^[48]6,[49]7. This approach
mimics the randomized controlled trial in limiting confounding and
reverse causality, while avoiding the financial and human cost of
conducting a trial. Protein quantitative trait loci (pQTLs) are
genetic variants associated with plasma protein levels at the
genome-wide association study (GWAS) level^[50]2,[51]8,[52]9, making it
possible to examine the relationship between plasma proteins and
BC^[53]10. We performed a systematic exploration of the potential
causality between plasma proteins and breast cancer, including its four
subtypes. Finally, plasma proteins involved in prognosis were
prioritized as potential therapeutic targets for drug repurposing or
the development of new agents.
Colocalization analysis, a statistical method employed in genetics and
genomics, aims to discern whether there exists a common genetic
foundation for the observed associations of distinct traits. It assists
in ascertaining whether a specific genetic locus influences both traits
or if the observed associations are, in fact, independent^[54]11. It
has been applied to enhance resolution and validate the robustness of
protein instrumental variables across a diverse spectrum of diseases,
including hematological disorders and neurological
diseases^[55]12,[56]13.
Previous studies have used MR to explore the relationship between
plasma proteins and breast cancer, laying important groundwork but
leaving gaps in understanding^[57]14. Mälarstig et al. measured plasma
protein levels using Olink PEA Explore in a cohort of 598 patients with
a follow-up of two years. While informative, the study was limited by
its small sample size and short duration^[58]14. Similarly, Papier et
al. assessed 1463 plasma proteins across 19 cancer types in the UK
Biobank, though their findings on breast cancer were
inconclusive^[59]15. Smith-Byrne et al. further explored potential
causal links between plasma proteins and breast cancer using two-sample
Mendelian randomization (TSMR) and colocalization analysis^[60]16.
However, these studies did not pursue deeper validation or explore the
mechanisms behind the causal relationships.
In this study, we employed an integrated approach that connected
genetic variations with breast cancer, plasma proteins, biological
pathways, and drug discovery. The analysis identified 62 circulating
proteins, of which eight (ULK3, TLR1, CSK, ASIP, ADH5, SARS2, UBE2N, and
PEX14) were strongly supported by colocalization analysis.
Through experimental studies and validation in clinical cohorts, CSK
and ULK3 were confirmed as critical targets closely linked to Luminal A
breast cancer. Additionally, three drug candidates (TG100801,
Hydrochlorothiazide, and Imatinib) demonstrated potential for targeting
these proteins in breast cancer treatment.
Results
Proteome-wide MR analysis
The deCODE dataset contains data on 4907 plasma proteins. Among
them, 1815 proteins met the four established criteria and
were included in the analysis (Supplementary Data [61]1).
The cohort included 133,384 breast cancer patients and 113,789
controls. Among the patients, 45,253 were classified as effective
luminal A-like cases, while 8602 were diagnosed with triple-negative
breast cancer. We conducted TSMR and summary-data-based Mendelian
randomization (SMR) analyses of the 1815 plasma proteins and their
association with breast cancer and its subtypes. After excluding
associations that failed the heterogeneity in dependent instruments
(HEIDI) test and applying multiple testing correction, we
identified a final set of 62 circulating proteins. Among these, 33 were
associated with overall breast cancer, 24 with Luminal A, 3 with
Luminal B, and 2 with triple-negative breast cancer (Table 1,
Supplementary Data [62]2-[63]11). The comprehensive results of the MR
analysis are depicted in Fig. [64]1a and Supplementary
Figs. [65]1-[66]2. The proteins identified differ between subtypes,
indicating heterogeneity among the subtypes of breast cancer. The
genomic positions of the genes encoding the 62 genetically
predicted circulating proteins are shown in Fig. [67]1b. More
detailed characteristics of the 62 proteins are shown in Table [68]1.
Based on the MR power analysis, our study had sufficient statistical
power to detect associations between plasma proteins and breast cancer
(Supplementary Data [69]12).
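The multiple testing correction mentioned above can be illustrated with a short sketch. The text does not state which procedure produced the FDR values in Table 1, so the Benjamini-Hochberg procedure is assumed here purely for illustration; the function name and the toy p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Assumption: the paper's FDR column may come from a different
    procedure; this is only a generic illustration.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy p-values (made up, not taken from the study).
toy_p = [0.001, 0.008, 0.039, 0.041, 0.6]
toy_fdr = benjamini_hochberg(toy_p)
```

Each adjusted value is the smallest FDR level at which the corresponding association would still be declared significant.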
Fig. 1. Mendelian randomization and colocalization analysis results for
plasma proteins and breast cancer.
a Proteins identified by colocalization and Mendelian randomization
analysis. * and # denote different probes in the deCODE
database: *13094_75, #8427_118. PH4, posterior probability of H4. The
diagram depicts concentric circles, colored by group. The innermost
circle denotes the groups, the middle circle the
colocalization result, and the outermost circle the TSMR
result. Color gradients, ranging from red to blue, represent β values
spanning from −0.05 to 0.5. In this gradient, red signifies protective
factors, whereas blue indicates risk factors. b Manhattan plots for
the 62 circulating proteins identified at the Mendelian randomization
stage. Results are plotted by gene start position. Each point
represents an individual
association test between a gene and breast cancer, arranged by genomic
position on the x-axis and association strength on the y-axis,
represented as the −log10(P) of a z-score test.
Table 1.
Mendelian randomization and colocalization analysis of causal
associations between plasma proteins and breast cancer
Gene Outcome Method nsnp FDR PH4 Tier OR 95% CI
KPNA2 Breast cancer Wald ratio 1 0.023 0.545 2 0.634 (0.489,0.822)
CSK Breast cancer Wald ratio 1 0.001 0.849 1 0.639 (0.527,0.775)
DNAJC27 Breast cancer Wald ratio 1 0.097 0.0002 3 0.659 (0.493,0.883)
ACADVL Breast cancer Wald ratio 1 0.04 0.204 3 0.663 (0.516,0.853)
FAM49B Breast cancer Wald ratio 1 0.033 0.039 3 0.669 (0.526,0.85)
ULK3 Breast cancer Wald ratio 1 0.0001 0.964 1 0.68 (0.586,0.79)
CP Breast cancer Wald ratio 1 0.097 0.12 3 0.709 (0.556,0.903)
CDKN1B Breast cancer Wald ratio 1 0.08 0.282 3 0.792 (0.678,0.925)
OMG Breast cancer Wald ratio 1 0.066 0.0002 3 0.795 (0.685,0.923)
SNUPN Breast cancer IVW 2 0.001 0.722 2 0.802 (0.732,0.878)
TNFRSF19 Breast cancer IVW 2 0.079 0.077 3 0.832 (0.746,0.928)
UBE2C Breast cancer Wald ratio 1 0.087 0.053 3 0.88 (0.807,0.961)
RSPO3^# Breast cancer IVW 6 0.061 0.583 2 0.895 (0.84,0.953)
RSPO3* Breast cancer IVW 6 0.061 0.631 2 0.905 (0.854,0.958)
GSTT2B Breast cancer IVW 13 0.023 0.257 3 0.928 (0.893,0.964)
FLRT3 Breast cancer IVW 26 0.047 0.124 3 0.969 (0.953,0.986)
SWAP70 Breast cancer IVW 14 0.074 0.644 2 1.074 (1.03,1.12)
CCBL2 Breast cancer IVW 10 0.009 0.473 3 1.088 (1.045,1.133)
ASIP Breast cancer IVW 6 0.009 0.896 1 1.097 (1.049,1.148)
PEAR1 Breast cancer IVW 8 0.002 0.01 3 1.115 (1.064,1.169)
ADH5 Breast cancer IVW 5 0.061 0.549 2 1.128 (1.053,1.21)
RALB Breast cancer IVW 2 0.001 <0.0001 3 1.142 (1.085,1.202)
CNPY3 Breast cancer Wald ratio 1 0.087 0.029 3 1.153 (1.046,1.272)
COL6A3 Breast cancer IVW 2 0.046 0.794 2 1.200 (1.086,1.326)
SARS2 Breast cancer Wald ratio 1 0.033 0.523 2 1.209 (1.08,1.353)
CETN3 Breast cancer Wald ratio 1 0.089 <0.0001 3 1.281 (1.08,1.519)
NPPA Breast cancer Wald ratio 1 0.085 0.29 3 1.339 (1.1,1.63)
TLR1 Breast cancer IVW 2 0.009 0.987 1 1.375 (1.179,1.604)
SERPIND1 Breast cancer Wald ratio 1 0.029 0.034 3 1.576 (1.208,2.056)
LAYN Breast cancer Wald ratio 1 0.002 0.609 2 1.65 (1.312,2.076)
TFPI2 Breast cancer Wald ratio 1 0.051 <0.0001 3 1.657 (1.206,2.278)
PELO Breast cancer Wald ratio 1 0.005 0.064 3 1.712 (1.308,2.241)
DOK2 Breast cancer Wald ratio 1 0.085 0.028 3 1.722 (1.192,2.486)
UBE2N Luminal_A Wald ratio 1 0.005 0.934 1 0.499 (0.356,0.698)
CSK Luminal_A Wald ratio 1 <0.0001 0.728 2 0.509 (0.395,0.657)
KPNA2 Luminal_A Wald ratio 1 0.096 0.096 3 0.575 (0.408,0.811)
ULK3 Luminal_A Wald ratio 1 <0.0001 0.806 1 0.597 (0.489,0.727)
SYT11 Luminal_A Wald ratio 1 0.03 <0.0001 3 0.64 (0.499,0.821)
SNUPN Luminal_A IVW 2 <0.0001 0.669 2 0.721 (0.639,0.813)
OMG Luminal_A Wald ratio 1 0.098 0.091 3 0.735 (0.604,0.894)
SRA1 Luminal_A IVW 3 0.06 0.295 3 0.804 (0.715,0.903)
FBLN1 Luminal_A IVW 5 0.073 0.032 3 0.862 (0.79,0.941)
NID1 Luminal_A IVW 9 0.09 0.072 3 0.882 (0.817,0.952)
EPHB1 Luminal_A IVW 15 0.073 0.298 3 0.93 (0.891,0.971)
SERPING1 Luminal_A IVW 20 0.044 0.111 3 0.943 (0.915,0.972)
FGF2 Luminal_A IVW 23 0.073 0.386 3 0.95 (0.923,0.978)
LCT Luminal_A IVW 13 0.073 0.021 3 1.044 (1.019,1.07)
IDUA Luminal_A IVW 26 0.073 0.029 3 1.048 (1.021,1.077)
CYB5R2 Luminal_A IVW 16 0.073 0.017 3 1.082 (1.035,1.131)
CCBL2 Luminal_A IVW 9 0.073 0.091 3 1.098 (1.041,1.159)
SWAP70 Luminal_A IVW 14 0.073 0.235 3 1.099 (1.04,1.162)
RALB Luminal_A IVW 2 0.073 <0.0001 3 1.123 (1.049,1.201)
ANXA4 Luminal_A IVW 8 0.09 0.023 3 1.136 (1.051,1.228)
ADH5 Luminal_A IVW 5 0.073 0.863 1 1.200 (1.079,1.335)
CTSF Luminal_A IVW 3 0.005 <0.0001 3 1.214 (1.113,1.325)
SARS2 Luminal_A Wald ratio 1 0.015 0.876 1 1.323 (1.142,1.532)
LAYN Luminal_A Wald ratio 1 0.005 0.707 2 1.866 (1.379,2.524)
UBASH3B Luminal_B IVW 2 0.027 0.754 2 0.596 (0.465,0.762)
CDH17 Luminal_B IVW 15 0.027 0.314 3 1.172 (1.086,1.264)
PEX14 Luminal_B Wald ratio 1 0.028 0.873 1 3.268 (1.834,5.822)
KLRB1 Triple_Negative IVW 2 0.047 0.014 3 0.316 (0.176,0.566)
IL1RL1 Triple_Negative IVW 34 0.042 0.3 3 0.92 (0.883,0.958)
* and # denote different probes in the deCODE database: *13094_75,
#8427_118.
FDR false discovery rate, PH4 posterior probability of H4, nsnp number
of single nucleotide polymorphisms, IVW inverse-variance weighted, OR
odds ratio, 95% CI 95% confidence interval.
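As a reading aid for the Method, OR, and 95% CI columns of Table 1, the following minimal sketch shows how a Wald ratio (one instrument) and a fixed-effect inverse-variance weighted estimate (several instruments) are commonly computed and converted to an odds ratio. It assumes first-order standard errors and fixed-effect weights, which may differ from the software settings used in the study; all function names and any input values are hypothetical.

```python
import math


def wald_ratio(beta_exp, beta_out, se_out):
    """Single-instrument estimate; SE from the first-order delta method."""
    beta = beta_out / beta_exp
    se = se_out / abs(beta_exp)
    return beta, se


def ivw(beta_exp, beta_out, se_out):
    """Fixed-effect inverse-variance weighted estimate across instruments."""
    # Weight of each per-SNP ratio: beta_exp^2 / se_out^2.
    weights = [bx ** 2 / s ** 2 for bx, s in zip(beta_exp, se_out)]
    ratios = [by / bx for bx, by in zip(beta_exp, beta_out)]
    beta = sum(w * r for w, r in zip(weights, ratios)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    return beta, se


def odds_ratio_ci(beta, se, z=1.96):
    """Convert a log-odds estimate to (OR, lower, upper) for a 95% CI."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)
```

With a single instrument the two estimators coincide, which is why proteins with nsnp = 1 are reported with the Wald ratio and the rest with IVW.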
Colocalization analysis results
To assess the robustness of the 62 circulating proteins identified
above, we performed colocalization analysis. We
identified 9 circulating proteins with high support (posterior
probability of H4, PH4 > 0.8, Tier 1), 13 with medium support
(0.5 ≤ PH4 ≤ 0.8, Tier 2), and 40 with limited support (PH4 < 0.5,
Tier 3), as shown in Fig. [73]1a.
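The tier rule above can be written as a small function. This is only a sketch of the thresholds stated in the text; the function name is hypothetical, and the example PH4 values are taken from Table 1.

```python
def colocalization_tier(ph4):
    """Map a posterior probability of H4 to the support tier used here."""
    if ph4 > 0.8:
        return 1  # high support
    if ph4 >= 0.5:
        return 2  # medium support
    return 3      # limited support


# PH4 values for overall breast cancer, copied from Table 1.
examples = {"ULK3": 0.964, "CSK": 0.849, "KPNA2": 0.545, "CCBL2": 0.473}
tiers = {gene: colocalization_tier(p) for gene, p in examples.items()}
```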
We identified four proteins with high support in breast cancer:
Unc-51 Like Kinase 3 (ULK3), Toll Like Receptor 1 (TLR1), C-terminal
Src kinase (CSK), and Agouti Signaling Protein (ASIP). Notably, ULK3
(odds ratio [OR] 0.680, 95% confidence interval [CI] 0.586–0.790) and
CSK (OR 0.639, 95%CI 0.527–0.775) exhibited protective effects, whereas
ASIP (OR 1.097, 95%CI 1.049–1.148) and TLR1 (OR 1.375, 95%CI
1.179–1.604) were associated with an increased risk of breast cancer.
In luminal A breast cancer, four plasma proteins
exhibited high support: Alcohol dehydrogenase 5 (ADH5), Seryl-tRNA
Synthetase 2 (SARS2), Ubiquitin-conjugating enzyme E2 N (UBE2N), and
ULK3.
ULK3 (OR 0.597, 95%CI 0.489–0.727) and UBE2N (OR 0.499, 95%CI
0.356–0.698) were identified as protective factors, while ADH5 (OR
1.200, 95%CI 1.079–1.335) and SARS2 (OR 1.323, 95%CI 1.142–1.532) were
deemed risk factors. Additionally, one high-support protein in luminal
B breast cancer, namely peroxisomal biogenesis factor 14 (PEX14), was
identified as a risk factor (OR 3.268, 95%CI 1.834–5.822).
Nine proteins showed medium support in overall breast cancer:
collagen type VI alpha 3 chain (COL6A3,
OR 1.200, 95%CI 1.086–1.326), ADH5 (OR 1.128, 95%CI 1.053–1.210), SARS2
(OR 1.209, 95%CI 1.08–1.353), R-spondin 3 (RSPO3, probe 8427_118, OR
0.895, 95%CI 0.84–0.953), R-spondin 3 (RSPO3, probe 13094_75, OR 0.905,
95%CI 0.854–0.958),
Switching B Cell Complex Subunit SWAP70 (SWAP70, OR 1.074, 95%CI
1.03–1.12), Snurportin 1 (SNUPN, OR 0.802, 95%CI 0.732–0.878),
Layilin (LAYN, OR 1.650, 95%CI 1.312–2.076), and Karyopherin
Alpha 2 (KPNA2, OR 0.634, 95%CI 0.489–0.822). RSPO3 appears twice
because it was measured with two distinct probes, 13094_75 and
8427_118, in the deCODE
database. Meanwhile, Luminal A breast cancer had three proteins
(SNUPN, LAYN, CSK) with medium support (Supplementary Data [74]13), and
Luminal B had one (Ubiquitin-associated and SH3
domain-containing protein B, UBASH3B) (Supplementary Data [75]14).
Potential targets identified for the HER-2 enriched subtype and
triple-negative breast cancer in MR analysis were all classified as
Tier 3 in the colocalization stage. Additional details can be
found in Table [76]1 and Supplementary Data [77]15 and [78]16.
Protein-protein interaction network analysis results
In the protein-protein interaction networks, CSK, ULK3, LAYN, ASIP,
SWAP70, ADH5, SARS2, and SNUPN played noteworthy roles in breast
cancer. Of these, CSK, ULK3, LAYN, SWAP70, ADH5, and SNUPN were of
particular importance in Luminal A breast cancer. Within TNBC, the
proteins Interleukin 33, killer cell lectin like receptor B1, and
interleukin 1 receptor like 1 emerged as core nodes, underscoring their
pivotal role in this particular subtype of breast cancer. Similarly, in
Luminal B breast cancer, proteins Peroxisomal biogenesis factor 13,
PEX14, Cadherin 17, UBASH3B, and ELAV like RNA binding protein 3 served
as core nodes, indicating their significance within this subtype.
Further details can be seen in Supplementary Figs. [79]3-[80]6 and
Supplementary Data [81]17. In Supplementary
Figs. [82]3-[83]6, nodes represent proteins, and the colored
lines between nodes represent distinct types of interaction,
including physical (protein-protein) interactions, co-expression,
colocalization, genetic interactions, shared pathways, and
predicted interactions.
Pathway enrichment results
We also conducted pathway enrichment analysis to further
explore the pathways involved in breast cancer and its subtypes.
Because too few genes were identified for the other subtypes, we
restricted pathway analysis to overall breast cancer and Luminal A
breast cancer.
In the context of breast cancer, Gene Ontology (GO) analysis revealed
significant enrichment in the following biological processes (BP):
nuclear transport (p = 0.002), nucleocytoplasmic transport (p = 0.002),
and blood coagulation (p = 0.005). In the cellular components (CC)
category, the endoplasmic reticulum lumen (p = 0.001) was notably
enriched, with 4 genes involved. For molecular functions (MF),
serine-type endopeptidase inhibitor activity was significantly
represented (p = 0.0005). Disease Ontology (DO) analysis identified
several diseases with substantial enrichment, including nephritis
(p = 0.026), glomerulonephritis (p = 0.014), lipid metabolism disorders
(p = 0.007), and cervical squamous cell carcinoma (p = 0.017).
Furthermore, Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis
highlighted enriched pathways such as fatty acid degradation
(p = 0.004), drug metabolism via cytochrome P450 (p = 0.011),
metabolism of xenobiotics by cytochrome P450 (p = 0.013),
nucleocytoplasmic transport (p = 0.023), and the HIF-1 signaling
pathway (p = 0.024) (Fig. [84]2 and Supplementary Data [85]18).
Fig. 2. Pathway analysis of identified proteins associated with breast
cancer.
a KEGG pathway enrichment analysis of identified proteins associated
with breast cancer. b DO disease categorization analysis of identified
proteins associated with breast cancer. c GO biological process
analysis of identified proteins associated with breast cancer. BP
biological processes, CC cellular components, MF molecular functions.
For luminal A breast cancer, the GO analysis identified enriched
biological processes, including negative regulation of phagocytosis,
disaccharide metabolism, and skeletal muscle cell proliferation (all
p < 0.05). The CC analysis revealed significant enrichment of the
collagen-containing extracellular matrix (p = 0.0001). Notable
molecular functions included phosphatidylserine binding (p = 0.003) and
nucleocytoplasmic carrier activity (p = 0.0007). KEGG analysis
highlighted nucleocytoplasmic transport (p = 0.015) as a key
pathway (Supplementary Fig. [88]7 and Supplementary Data [89]19).
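Enrichment p-values of the kind reported in this section are typically obtained from an over-representation test. The sketch below assumes a one-sided hypergeometric test, which this section does not confirm as the method used; the function name and its parameters are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_pathway, n_selected, n_overlap):
    """P(X >= n_overlap) for a one-sided over-representation test.

    n_background : genes in the background universe
    n_pathway    : background genes annotated to the pathway
    n_selected   : genes in the query list
    n_overlap    : query genes annotated to the pathway
    """
    upper = min(n_pathway, n_selected)
    total = comb(n_background, n_selected)
    # Sum the hypergeometric probabilities of every overlap >= n_overlap.
    tail = sum(
        comb(n_pathway, k) * comb(n_background - n_pathway, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total
```

For example, `hypergeom_enrichment_p(10, 4, 3, 2)` gives the probability of drawing at least two pathway genes when three genes are picked from a universe of ten that contains four pathway genes.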
Mouse knock-out model results
Eight plasma proteins were identified as highly relevant through
colocalization analysis: ULK3, TLR1, CSK, ASIP, ADH5, SARS2, UBE2N, and
PEX14. Except for ASIP and SARS2, which were not available in the
database, mutations related to the remaining six proteins were
documented.
Mutations in the ULK3 gene in mice were associated with diverse
phenotypes affecting the endocrine/exocrine, reproductive,
hematopoietic, and immune systems, underscoring the gene’s critical
role in these areas. Similarly, TLR1 was implicated in the
hematopoietic and immune systems; CSK influenced the
endocrine/exocrine, hematopoietic, and immune systems, as well as aging
and mortality; ADH5 was involved in hematopoiesis and immunity, in
addition to aging and mortality; UBE2N played a significant role in the
endocrine/exocrine, hematopoietic, reproductive and immune systems,
alongside aging and mortality; and PEX14 was associated with aging
and mortality. Collectively, five of the six proteins
engaged the hematopoietic and immune systems. Further exploration of
the mutation phenotypes primarily revealed involvement in
immunoglobulin levels, cytokines, immune cells, and immune organs
(thymus, spleen, lymph nodes), with specific details available in
Table [90]2 and Supplementary Data [91]20.
Table 2.
Mouse knock-out phenotype overview of identified genes
Phenotype                  ULK3  TLR1  CSK  ADH5  UBE2N  PEX14
Endocrine/exocrine glands  √     -     √    -     √      -
Hematopoietic system       √     √     √    √     √      -
Immune system              √     √     √    √     √      -
Mortality/aging            -     -     √    √     √      √
Reproductive system        √     -     -    -     √      -
ASIP is not available in the database. Mutation data for SARS2 are not
available in the database.
Tumor immune cell infiltration
In murine knockout models, we identified specific plasma proteins
associated with the immune system that may play a significant role in
the pathogenesis of breast cancer. Consequently, we further examined
the relationship between the expression of these genes and immune
cell infiltration in a breast cancer cohort. The results indicated a
complex interaction pattern between gene expression and immune
responses in breast cancer. Specifically, ULK3 expression was
positively correlated with CD4+ T cell infiltration (r = 0.13,
p = 7.08E−05) but negatively correlated with CD8+ T cell (r = −0.10,
p = 1.41E−03) and macrophage infiltration (r = −0.10, p = 1.12E−03).
TLR1 exhibited a positive correlation with all six studied immune cell
types (r ranging from 0.42 to 0.72, p < 0.05). CSK showed positive
correlations with B cells, CD8+ T cells, CD4+ T cells, neutrophils,
and dendritic cells (r = 0.34, 0.10, 0.41, 0.33, and 0.37,
respectively). In hormone receptor-positive breast cancer, ADH5 was
positively correlated with CD8+ T cells, CD4+ T cells, macrophages,
neutrophils, and dendritic cells, with r values of 0.28, 0.15, 0.21,
0.20, and 0.14, respectively. Similarly, UBE2N displayed positive
correlations with B cells, CD8+ T cells, macrophages, neutrophils, and
dendritic cells (all P < 0.05). Further details are available in
Supplementary Data [93]21.
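For readers who want to see how an r value and its p-value relate to sample size, the following sketch computes a Pearson correlation and an approximate two-sided p-value from the Fisher z transformation. The text does not specify the type of correlation or any covariate adjustment behind the reported r values, so both choices here are assumptions, and the function names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_z_p_value(r, n):
    """Approximate two-sided p-value for r (requires n > 3 and |r| < 1)."""
    z = math.atanh(r) * math.sqrt(n - 3)
    # Two-sided tail area of the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2))
```

The approximation makes clear why a weak correlation such as r = 0.13 can still be highly significant in a cohort of around a thousand samples while being unremarkable in a cohort of fifty.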
Survival analysis results
Survival analyses were performed on a platform integrating data from
Gene Expression Omnibus (GEO), European Genome-Phenome Archive (EGA),
The Cancer Genome Atlas (TCGA), and Molecular Taxonomy of Breast Cancer
International Consortium (Metabric) databases. Recurrence-free survival
(RFS) and overall survival (OS) were the primary outcome
measures, with a follow-up period spanning 150 to 300 months. As shown
in Figs. [94]3–[95]4 and Supplementary Figs. [96]8-[97]9, high
expression of ULK3 (hazard ratio [HR] 0.67, 95% CI 0.58–0.78,
P = 1.7e−07), SNUPN (HR 0.82, 95% CI 0.74–0.91, P < 0.001), and ASIP
(HR 0.82, 95% CI 0.74–0.91, P < 0.001) was associated with
significantly longer RFS. Additionally, breast cancer
patients with high RSPO3 expression (HR 0.7, 95% CI 0.54–0.92,
P = 0.010) or high ASIP expression (HR 0.76, 95% CI 0.63–0.92,
P = 0.004) had significantly longer OS than those with low expression.
Fig. 3.
Overall and recurrence free survival analysis of identified plasma
proteins with high support in breast cancer.
Fig. 4.
Overall and recurrence free survival analysis of identified plasma
proteins with high support in Luminal breast cancer.
In Luminal A breast cancer, elevated expression levels of ULK3 (HR
0.65, 95% CI 0.51–0.83, P < 0.001) and SNUPN (HR 0.77, 95% CI
0.66–0.91, P = 0.002) were significantly linked to prolonged RFS.
Meanwhile, RFS did not differ notably between the high and low
expression groups for CSK and ADH5. Similarly, no significant
differences in OS were observed between high and low expression
groups for CSK, ULK3, and ADH5 (Figs. [102]3–[103]4 and
Supplementary Fig. [104]10).
In Luminal B breast cancer, there were no statistically significant
differences in RFS or OS between high and low expression groups for
PEX14 (Figs. [105]3–[106]4 and Supplementary Fig. [107]11).
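Comparisons between high and low expression groups of this kind are usually visualized with survival curves. As a minimal sketch, assuming Kaplan-Meier estimation (which the text does not state explicitly), the product-limit estimator for one group can be computed as follows; the function name and the input encoding are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct event time.

    times  : follow-up in months for each patient
    events : 1 if recurrence or death was observed, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        observed = 0
        removed = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            observed += data[i][1]
            removed += 1
            i += 1
        if observed > 0:
            survival *= 1.0 - observed / at_risk
            curve.append((t, survival))
        # Both events and censored patients leave the risk set.
        at_risk -= removed
    return curve
```

Running the function separately on the high and low expression groups yields the two curves that a hazard ratio summarizes.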
Druggable protein
Because ULK3 was identified in both overall breast cancer and the
Luminal A subtype, the nine high-support associations correspond to
eight unique proteins. We searched for these eight proteins in the
drug database: three were found, namely ULK3,
CSK, and ADH5, as shown in Fig. [108]5. No information was available for
the remaining high-support proteins (ASIP, TLR1, PEX14, SARS2, UBE2N).
Fig. 5.
Drugs targeting the identified plasma proteins with high support in
breast cancer.
CSK functions as a protective factor in breast cancer by inhibiting
SRC. TG100801, an SRC inhibitor, emerges as a potential drug candidate
for breast cancer and is currently investigational.
Hydrochlorothiazide was reported to be associated with CSK, yet the
direction of this relationship remains unknown, warranting
further exploration in breast cancer^[111]17. ULK3 serves as a
protective factor in breast cancer. Previous literature suggested a
connection between Imatinib and the ULK3 protein, but the
nature of this relationship was unclear and requires additional
investigation^[112]18. Hydrochlorothiazide and Imatinib are both
approved drugs. In contrast, ADH5 was identified as a risk factor
for breast cancer. Inhibitors of ADH5, namely Nitrefazole, N6022, and
Cavosostat, have been reported previously. Nitrefazole has
been approved, whereas the latter two have not yet been approved.
Immunohistochemistry
We collected tumor and adjacent non-tumor tissues from 25 patients with
Luminal A subtype breast cancer. All cases were pathologically
classified as invasive ductal carcinoma, with a mean patient age of 40
years and clinical stages ranging from I to III. Detailed information
is provided in Supplementary Data [113]22. Because only three of the
high-confidence proteins (ULK3, CSK, and ADH5) had corresponding drugs
in the database, we performed immunohistochemical analysis of these
three proteins in both tumor and adjacent normal tissues. Qualitative
comparisons and
quantitative assessments of the immunohistochemistry images revealed
that the expression levels of CSK and ULK3 were significantly lower in
tumor tissues compared to adjacent normal tissues, with markedly
reduced staining intensity (P < 0.001 and 0.005, respectively). In
contrast, ADH5 expression was significantly downregulated in tumor
tissues relative to normal tissues (P = 0.006). All relevant
immunohistochemical results and their quantitative analyses are
presented in Fig. [114]6 and Supplementary Data [115]23.
Fig. 6. Immunohistochemical analysis of CSK, ULK3, and ADH5 expression in
Luminal A breast cancer tissues.
a Representative images showing CSK, ULK3, and ADH5 expression in
Luminal A breast cancer tissues and adjacent normal tissues. Positive
staining is indicated by brown coloration. b Quantitative analysis of
protein expression in tumor and adjacent tissues in Luminal A breast
cancer. Quantification of CSK, ULK3, and ADH5 expression levels in
cancer tissues compared to normal tissues. AOD average optical density,
T tumor, N normal. *P < 0.05, **P < 0.01, ***P < 0.001. Data are
presented as mean ± standard deviation. (N = 25 biologically
independent samples).
Cell proliferation and migration results
To elucidate the functional roles of potential drug targets in breast
cancer, MCF-7 cells were transfected with lentiviral vectors carrying
the CSK and ULK3 genes. Real-time quantitative polymerase chain
reaction (qPCR) analysis confirmed a significant upregulation of CSK
and ULK3 expression in transfected MCF-7 cells (P < 0.001)
(Fig. [118]7a, [119]b and Supplementary Data [120]24). Cell Counting
Kit-8 (CCK-8) assay results demonstrated that overexpression of CSK and
ULK3 significantly inhibited the proliferation of MCF-7 cells
(P < 0.001) (Fig. [121]7c and Supplementary Data [122]25).
Fig. 7. Effects of CSK and ULK3 overexpression on the proliferation and
migration of MCF-7 cells.
a, b CSK and ULK3 were significantly overexpressed in MCF-7 cells
24 hours after lentiviral transfection. c Cell proliferation was
assessed using the CCK-8 assay at 24, 48, 72, and 96 hours in MCF-7
cells overexpressing CSK and ULK3 (N = 5 per group). d–g Cell
migration was evaluated using a wound healing assay in MCF-7 cells
overexpressing CSK and ULK3 at 24 hours. *P < 0.05, **P < 0.01,
***P < 0.001.
In the wound healing assay conducted 24 hours post-scratch, the cell
migration rate was 35.56% in the ULK3 overexpression group compared to
81.66% in the corresponding control group. Similarly, the migration
rate was 82.56% in the CSK overexpression group, compared to 95.90% in
its control group. These findings further confirmed that overexpression
of CSK and ULK3 markedly reduced the migratory ability of MCF-7 cells
(P < 0.001) (Supplementary Data [125]26), consistent with the results
of the CCK-8 assay. In conclusion, overexpression of CSK and ULK3
inhibits both the proliferation and migration of MCF-7 breast cancer
cells (Fig. [126]7d–g).
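The size of the migration effect can be made explicit by expressing each overexpression group relative to its own control. In the sketch below, the migration rates are the values reported above; the definition of the migration rate as percentage wound closure is an assumption, since this section does not give the formula, and both function names are hypothetical.

```python
def wound_closure_percent(initial_width, final_width):
    """Percentage of the scratch closed (assumed definition of the rate)."""
    return (initial_width - final_width) / initial_width * 100.0


def relative_reduction(control_rate, treated_rate):
    """Percent drop in migration rate relative to the control group."""
    return (control_rate - treated_rate) / control_rate * 100.0


# Migration rates (%) at 24 hours, as reported in the text.
ulk3_reduction = relative_reduction(81.66, 35.56)
csk_reduction = relative_reduction(95.90, 82.56)
```

Under this definition, ULK3 overexpression lowers the migration rate by roughly 56% relative to control, and CSK overexpression by roughly 14%.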
Discussion
This study utilized MR analysis to investigate the potential causality
between circulating plasma proteins and the risk of breast cancer and
its subtypes. The results revealed 62 circulating proteins, with 33
associated with breast cancer, 24 with Luminal A, 3 with Luminal B, and
2 with triple-negative breast cancer following MR analysis. Eight
circulating proteins (ULK3, TLR1, CSK, ASIP, ADH5, SARS2, UBE2N, and
PEX14) were highly supported by colocalization
analysis. Among them, we identified three plasma proteins with
potential druggability, each already associated with specific targeted
therapies. Through experimental studies and validation in clinical
cohorts, CSK and ULK3 were confirmed as critical targets closely linked
to Luminal A breast cancer.
In the pathway enrichment analysis of breast cancer, we identified that
these genes were primarily involved in nucleocytoplasmic transport and
the HIF-1 signaling pathway. In the context of Luminal A breast cancer,
these genes were also enriched in pathways related to nucleocytoplasmic
transport. HIF-1 is a critical regulator of cellular adaptation under
low-oxygen conditions, driving breast cancer cells to adapt to hypoxic
environments and promoting tumor progression. HIF-1 consists of two
subunits, HIF-1α and HIF-1β. Under normal oxygen conditions, HIF-1α is
rapidly degraded; however, under hypoxic conditions, HIF-1α stabilizes
and dimerizes with HIF-1β, thereby activating the transcription of
downstream target genes^[127]19. The activation of HIF-1α promotes
metabolic reprogramming in breast cancer cells, facilitating aerobic
glycolysis, increasing glucose uptake, and enhancing lactate
production. Moreover, the HIF-1 signaling pathway regulates the
expression of genes related to angiogenesis, cell proliferation,
invasion, and metastasis, such as vascular endothelial growth factor
and glycolytic enzymes. Thus, targeting the HIF-1 signaling pathway
could provide new therapeutic opportunities for breast cancer
treatment^[128]20,[129]21.
Many of the plasma proteins identified in this study were found to play
a significant role in regulating hematopoietic and immune system
functions. Specifically, these proteins were associated with
immunoglobulin levels, cytokine production, immune cell activation, and
the functionality of immune organs. Additionally, the observed immune
cell infiltration further reinforced the interaction between these
plasma proteins and immune cells, suggesting their potential
involvement in modulating immune responses. While current research
predominantly focuses on the immune landscape of triple-negative breast
cancer, our findings indicate that the immune microenvironment in
Luminal breast cancer also warrants attention. Future investigations
into the potential for immunotherapy in the treatment of Luminal-type
breast cancer, particularly in later lines of therapy, would be
valuable.
CSK participates in environmental information processing, including the
epidermal growth factor receptor (EGFR) signaling pathway, a classic
pathway in breast cancer. EGFR is a conserved
carboxy-terminal tyrosine kinase, encompassing Src homology domains 2
and 3, along with the Src family catalytic domain SH1^[130]22. EGFR and
Src tyrosine kinase cooperatively regulate EGFR-mediated cell signaling
under pathological conditions, facilitating cellular transformation and
tumor development. CSK inhibited Src activation, playing a crucial role
in downregulating ErbB-2/neu-activated Src kinase and suppressing the
tumor invasion of breast cancer^[131]23. Src kinase plays a crucial
role in promoting hypoxia-induced signaling, leading to the activation
of the HIF-1 pathway^[132]24. As an inhibitor of Src, CSK may
indirectly suppress HIF-1 activation, thereby reducing angiogenesis and
invasiveness in breast cancer cells. Additionally, Src collaborates
with the Rap1 signaling pathway in regulating cell adhesion and
migration^[133]25. By negatively regulating Src activity, CSK could
indirectly influence integrin-mediated functions within the Rap1
signaling pathway.
Our findings indicated that the CSK protein acts as a protective
factor in breast cancer, in line with previous research.
CHK, a homologous protein of CSK, has been shown to significantly
reduce tumor size in nude mice implanted with MCF-7 cells transfected
with wt-CHK^[134]26. These findings support CSK as a novel drug target
in breast cancer. Currently, drugs targeting CSK include TG100801 and
hydrochlorothiazide. TG100801 is a multi-target inhibitor of vascular
endothelial growth factor receptors and Src kinases, designed to induce
apoptosis in proliferating endothelial cells involved in
neovascularization and to suppress inflammation-mediated processes. It
is currently being developed for use in treating macular
degeneration^[135]27.
ULK3 demonstrated a protective effect in overall breast cancer and
luminal A subtype. Furthermore, ULK3 may serve as a potential favorable
prognostic marker in RFS in breast cancer, particularly within the
luminal A subtype. Our findings reinforce the previous research by
Zhang et al., which suggested a significant role for ULK3 in breast
cancer^[136]28.
ULK3 belongs to the ULK family, and its homologous proteins, ULK1 and
ULK2, have garnered extensive attention in the pathogenesis of breast
cancer recently^[137]28–[138]30. The ULK3 protein activates
glioma-associated oncogene homolog 1 in a dependent manner,
upregulating transcription of the DNMT3A gene during autophagy
induction, thus stimulating autophagy^[139]31. It is also suggested
that ULK3 can catalyze the phosphorylation of ESCRT-III proteins
CHMP1A/B, CHMP2A, and IST1 in vitro and in situ, leading to subcellular
localization changes in ESCRT-III assembly and preventing their
aggregation. As a result, catalytically active ULK3 delays the final
step of cell division^[140]32. Previous literature suggested a
connection between Imatinib and the ULK3 protein. Imatinib mesylate is
a protein tyrosine kinase inhibitor that selectively inhibits BCR-ABL
tyrosine kinase, thereby affecting related downstream pathways such as
the Ras/MapK, Src/Pax/Fak/Rac, and PI/PI3K/AKT/BCL-2
pathways^[141]33,[142]34. It has been approved for use in various
conditions including leukemias, myelodysplastic/myeloproliferative
diseases, hypereosinophilic syndrome, and gastrointestinal stromal
tumors. In vitro studies have demonstrated that combining imatinib
with vinorelbine or other chemotherapy significantly enhances growth
inhibition in breast cancer models^[143]35,[144]36. Phase II clinical
trials have indicated that imatinib alone shows no clinical activity
against metastatic breast cancer (MBC) or MBC overexpressing
platelet-derived growth factor receptor-beta (PDGFR-β)^[145]37,[146]38.
In hormone receptor-positive MBC expressing c-kit and/or PDGFR-β, the
combination of imatinib with letrozole resulted in a median RFS of
8.7 months and an OS of 44.3 months^[147]39. Another phase I/II trial
involving patients with MBC (expressing PDGFR-α and/or -β and/or KIT)
demonstrated fairly low clinical benefit from combining imatinib with
vinorelbine^[148]40. Future clinical studies should refine the target
population for more precise treatment and enroll larger samples to
support robust conclusions.
MR results revealed significant genetic associations between plasma
proteins and breast cancer incidence risk. While these targets may play
a role in the initiation of cancer, they may not be suitable for
treating cancer progression. The key distinction lies in that proteins
associated with cancer risk are more relevant for prevention and early
intervention, whereas those linked to cancer progression are better
suited for tumor therapies. Experimental validation showed that
overexpression of CSK and ULK3 significantly inhibited the
proliferation and migration of MCF-7 cells, supporting their potential
role in breast cancer progression. These findings suggest that CSK and
ULK3 could serve as therapeutic targets for advanced breast cancer. In
conclusion, while our study provides insights into the genetic
underpinnings of both breast cancer incidence and progression, it is
crucial to differentiate between these two mechanisms when evaluating
therapeutic targets. Future research should further explore the
functional role of these targets in cancer progression and assess their
potential in the development of therapeutic agents.
Some plasma proteins were classified as having moderate support in the
colocalization analysis, indicating a moderate probability that the
associations between plasma proteins and breast cancer are driven by
the same causal variant. Pleiotropy may also contribute, albeit with
low probability, because certain traits or diseases are affected by
multiple genetic loci or complex mechanisms, making it challenging to
establish a clear colocalization signal. Such complexity
is particularly common in multifactorial diseases^[149]41.
There are three main methods available on the market for proteomics
analysis: Olink, SomaScan, and direct measurement techniques. Direct
measurement methods, being untargeted approaches, offer lower
throughput but are applicable to all disease
types^[150]14–[151]16,[152]42. Although both the Olink and SomaScan
platforms are affinity-based, they differ in nature: one is
antibody-based, while the other is aptamer-based. A recent study
comparing the Olink Explore 3072 platform with the SomaScan platform
found that the median coefficient of variation for Olink was 16.5%,
compared to 9.9% for SomaScan, suggesting that SomaScan provides more
precise measurements^[153]43. Therefore, the technology employed in
this study, SomaScan, offered greater precision than Olink in that
comparison.
This study demonstrates significant innovation in methodology, data
application, and validation. Unlike Anders Mälarstig et al. and Karl
Smith-Byrne et al., who utilized a single MR method to analyze
associations between breast cancer and related
proteins^[154]14,[155]16, this research combines TSMR and SMR methods.
This integration of intersectional and colocalization analyses enhances
the reliability of the results. While earlier studies by Anders
Mälarstig, Keren Papier, and Karl Smith-Byrne have explored breast
cancer associations with plasma proteins, these works primarily use MR
and colocalization analysis to assess causality and lack functional
validation in animal models and substantial investigation of drug
effects^[156]14–[157]16. To bridge this gap, we conducted functional
experiments in cell lines and mouse models to investigate the
biological mechanisms of breast cancer-related proteins, addressing a
key gap in mechanistic validation. Moreover, unlike the study by Keren
Papier et al., which was limited by the Olink Explore II panel, our
research covers a broader spectrum of proteins, providing a more
comprehensive view for the exploration of breast cancer-related
biomarkers. In addition to confirming associations with previously
identified proteins, such as ULK3, TLR1, and
LAYN^[158]14,[159]16,[160]28, our cross-analyses identified additional
potential biomarkers for breast cancer, including CSK, ASIP, ADH5,
SARS2, UBE2N, and PEX14, thus expanding upon findings in the
literature. Through validation in functional experiments, this study
substantiates the associations between identified proteins and breast
cancer, offering crucial support for future research on breast cancer
biomarkers.
Several limitations should be noted in this study. Firstly, the breast
cancer GWAS data and the deCODE, GEO, EGA, and TCGA databases
predominantly consist of individuals of European ancestry, constraining
the generalizability of our conclusions. However, we incorporated a
substantial amount of data from non-European populations during the
validation phase. Secondly, we could not examine the relationship
between breast cancer and proteins absent from the deCODE database,
because the analysis was limited to the circulating plasma proteins it
includes. Thirdly, despite our efforts to comprehensively search all
available drug databases, the exclusion of drugs in the developmental
or clinical trial stages may have prevented us from identifying some
drugs targeting the identified proteins. Another limitation of this
study is
the absence of power calculations specific to breast cancer subtypes.
While power analysis was conducted based on the total sample size, the
cohort’s limited availability of subtype-specific sample sizes
prevented detailed analyses by subtype. However, given the large
overall sample size, we believe the statistical power is sufficient to
detect meaningful effects across subtypes.
In conclusion, this study identified 9 robust and 13 medium-support
plasma proteins whose genes are genetically associated with breast
cancer and its subtypes. These genes were predominantly enriched
in nucleocytoplasmic transport and HIF-1 signaling pathways.
Furthermore, our findings underscore the critical role of plasma
proteins in regulating hematopoietic and immune functions. Compared to
normal tissue, tumor tissue exhibited reduced expression of ULK3 and
CSK. Notably, ULK3 expression in both breast cancer and the Luminal A
subtype was significantly associated with prolonged recurrence-free
survival. Overexpression of CSK and ULK3 was confirmed to significantly
inhibit the proliferation and migratory ability of MCF-7 cells, as
demonstrated by the CCK-8 assay and wound healing assay, respectively.
Moreover, a comprehensive search of drug databases led to the
identification of three potential drugs: TG100801, Hydrochlorothiazide,
and Imatinib. These findings offer genetic evidence that supports the
identification of potential therapeutic drug targets for breast cancer,
thereby contributing valuable insights for prioritizing drug
development in the realm of breast cancer.
Method
Study design
The framework of this study design is illustrated in Fig. [161]8.
Initially, MR analyses, namely SMR and TSMR analysis, were performed on
the data of BC and circulating proteins. Subsequently, pathway
enrichment analysis and protein-protein interaction (PPI) network
analysis were conducted based on the results of SMR and TSMR analyses
to further delineate potential mechanisms underlying breast cancer and
its subtypes.
Fig. 8. Schematic diagram of this study.
GWAS genome-wide association study, pQTL protein quantitative trait
loci, cis-pQTL cis-acting protein quantitative trait loci, MHC major
histocompatibility complex, LD linkage disequilibrium, MR Mendelian
randomization, SMR summary-data-based MR test, HEIDI heterogeneity in
dependent instrument, PH4 posterior probability of H4, GEO Gene
Expression Omnibus, EGA European Genome-Phenome Archive, TCGA The
Cancer Genome Atlas, Metabric Molecular Taxonomy of Breast Cancer
International Consortium. (Created in BioRender. quan, l. (2025)
[164]https://BioRender.com/ cdz72yh).
To ensure the robustness of the findings, additional colocalization
analysis was performed on the results obtained from MR to identify the
target circulating proteins. As a result, the plasma proteins were
categorized into three tiers: tier 1 for high-support, tier 2 for
medium-support, and tier 3 for all others.
The tissue specimens from patients with breast cancer were collected to
perform immunohistochemistry (IHC) to assess the expression of the
target protein in tumor tissues and adjacent normal tissues. The
relationship between target protein expression and immune cell
infiltration in the tumor microenvironment was subsequently analyzed.
In addition, survival analysis was performed to investigate the
prognostic significance of the target protein in breast cancer. Mouse
knock-out phenotype data were examined to validate the results of the
colocalization analysis. Additionally, functional validation of the
target protein was conducted using in vitro experiments in a cell line.
Ultimately, drug libraries were explored to identify drugs targeting
circulating proteins with high support.
Datasets
The genotyping data for BC and its subtypes were obtained from a recent
publicly available comprehensive study of breast cancer, including
133,384 cases and 113,789 controls from over 20 European countries
across 82 Breast Cancer Association Consortium studies^[165]44.
Notably, it is the first study to report GWAS results for BC as well as
its four distinct subtypes: HER-2, Luminal A, Luminal B and TNBC.
The deCODE repository ([166]https://www.decode.com/summarydata/) is a
comprehensive genome-wide blood proteome database that contains summary
data for 4907 plasma proteins. These data were derived from a large-scale
pQTL study involving 35,559 individuals^[167]8. These individuals were
enrolled in two major initiatives: the Icelandic Cancer Project and
various genetic programs at deCODE genetics from 2000 to 2019. Egil
Ferkingstad et al. employed SomaScan, a multiplexed aptamer assay that
measures plasma proteins using 5284 aptamers to assess the relative
binding rates of plasma samples to each aptamer. The study measured
27.2 million sequence variants and the levels of 4907 plasma proteins
among these participants^[168]8. The deCODE data are widely used and
have proven
instrumental in identifying pivotal genetic risk factors across various
prevalent diseases, such as cardiovascular disease and cancer.
MR analysis
For inclusion, pQTLs were required to satisfy the following criteria:
(i) a genome-wide significant association (P < 5 × 10^−8); (ii) a
location outside the major histocompatibility complex (MHC) region
(chr6, 25.5–34.0 Mb); (iii) independence, defined by linkage
disequilibrium (LD) clumping at r^2 < 0.01 within a clumping window of
10,000 kb; and (iv) cis-acting status^[169]8. A pQTL association was
classified as cis when the variant lay within 1 Mb of the transcription
start site of the gene encoding the relevant protein; otherwise it was
classified as trans^[170]8.
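The selection criteria above can be sketched in a few lines of Python.
This is a minimal illustration, not the pipeline used in this study:
the PQTL record, its field names, and the example variants are
hypothetical, and LD clumping (criterion iii) is omitted because it
requires a reference panel.

```python
from dataclasses import dataclass

P_THRESHOLD = 5e-8                                  # genome-wide significance
MHC_CHR, MHC_START, MHC_END = "6", 25_500_000, 34_000_000
CIS_WINDOW = 1_000_000                              # 1 Mb around the TSS


@dataclass
class PQTL:
    """Hypothetical record for one variant-protein association."""
    rsid: str
    chrom: str
    pos: int          # base-pair position of the variant
    pval: float       # association P value with the protein level
    gene_chrom: str   # chromosome of the protein-coding gene
    gene_tss: int     # transcription start site of that gene


def in_mhc(v: PQTL) -> bool:
    return v.chrom == MHC_CHR and MHC_START <= v.pos <= MHC_END


def is_cis(v: PQTL) -> bool:
    return v.chrom == v.gene_chrom and abs(v.pos - v.gene_tss) <= CIS_WINDOW


def select_instruments(variants):
    """Apply criteria (i), (ii), and (iv); LD clumping (iii) is not shown."""
    return [v for v in variants
            if v.pval < P_THRESHOLD and not in_mhc(v) and is_cis(v)]


# Made-up variants, one per failure mode plus one that passes.
candidates = [
    PQTL("rs_a", "15", 74_900_000, 1e-20, "15", 74_800_000),  # cis, significant
    PQTL("rs_b", "6", 30_000_000, 1e-30, "6", 30_100_000),    # inside the MHC
    PQTL("rs_c", "2", 10_000_000, 1e-12, "15", 74_800_000),   # trans
    PQTL("rs_d", "15", 74_850_000, 1e-5, "15", 74_800_000),   # not significant
]
kept = [v.rsid for v in select_instruments(candidates)]
print(kept)
```

Only the first variant satisfies all three implemented criteria.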
MR analyses were performed in R with the “TwoSampleMR” package, using
circulating proteins as exposures and breast cancer types as outcomes.
When only a single pQTL was available, the Wald ratio was used. With
multiple pQTLs, we employed the inverse variance weighted (IVW) method
and evaluated heterogeneity. Breast cancer risk was expressed as the
odds ratio (OR) per standard deviation (SD) increase in plasma protein
levels. In addition, the SMR
of multi-single nucleotide polymorphisms (SNPs) for every circulating
protein was conducted as a sensitivity analysis to corroborate the
primary findings. The HEIDI test was utilized to differentiate between
two types of genetic effects: pleiotropic models, where a single
genetic variant influences multiple traits or outcomes, and linkage
models, where a genetic variant is associated with a specific trait due
to its proximity to a gene that directly affects that trait. This
approach offers advantages over many GWAS and molecular
QTL methods^[171]45,[172]46. Associations with HEIDI test P values
below 0.05, indicative of linkage rather than a single shared variant,
were excluded from subsequent analysis. For the SMR analysis, we used
the SMR software and corrected for multiple testing with the
Benjamini-Hochberg method, setting the false discovery rate (FDR)
threshold at α = 0.1. We performed an MR power analysis to assess the
statistical power of our study to detect causal effects^[173]47. MR
analysis was performed following the MR
Statement guidelines to ensure robust and transparent
methodology^[174]48.
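For readers unfamiliar with the estimators named above, the following
Python sketch shows the standard textbook forms of the Wald ratio, the
fixed-effect IVW estimate, the conversion to an OR per SD, and the
Benjamini-Hochberg adjustment. It is an illustration under stated
assumptions, not the TwoSampleMR or SMR implementation: the function
names and the numeric inputs are hypothetical, and the standard errors
use the usual first-order approximation.

```python
import math


def wald_ratio(beta_exp, beta_out, se_out):
    """Single-instrument estimate: outcome effect divided by exposure effect."""
    return beta_out / beta_exp, se_out / abs(beta_exp)


def ivw(instruments):
    """Fixed-effect IVW estimate from (beta_exp, beta_out, se_out) tuples."""
    weights = [bx ** 2 / so ** 2 for bx, _, so in instruments]
    ratios = [by / bx for bx, by, _ in instruments]
    beta = sum(w * r for w, r in zip(weights, ratios)) / sum(weights)
    return beta, math.sqrt(1.0 / sum(weights))


def odds_ratio(beta, se):
    """OR per SD increase in protein level with a 95% confidence interval."""
    return (math.exp(beta),
            math.exp(beta - 1.96 * se),
            math.exp(beta + 1.96 * se))


def benjamini_hochberg(pvals):
    """Return BH-adjusted P values in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):          # walk from the largest P downwards
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Illustrative numbers only.
beta, se = ivw([(0.50, -0.10, 0.02), (0.30, -0.05, 0.03)])
print(odds_ratio(beta, se))
fdr = benjamini_hochberg([0.01, 0.04, 0.03, 0.5])
print([q < 0.1 for q in fdr])             # which tests pass FDR < 0.1
```

With a single instrument the IVW estimate reduces to the Wald ratio,
which is why the two methods can be reported side by side.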
Colocalization analysis
Colocalization analysis through the “coloc” R package was performed to
determine if the associations between plasma proteins and BC, along
with its subtypes, were influenced by LD^[175]49. We calculated
posterior probabilities by Approximate Bayes Factor computations for
the following five hypotheses. H0: No significant association exists
between plasma proteins and breast cancer. H1: Only plasma proteins are
significantly associated with SNP loci in a specific genomic region.
H2: Only breast cancer is significantly associated with SNP loci in a
specific genomic region. H3: Both plasma proteins and breast cancer are
significantly associated with SNP loci in a specific genomic region,
but these associations are driven by different causal variants. H4:
Both plasma proteins and breast cancer are significantly associated
with SNP loci in a specific genomic region, and these associations are
driven by the same causal variant^[176]50. The formula for calculating
PH4 is as follows:
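$$
\mathrm{PH4} = \frac{L_4 \cdot \mathrm{Prior}(H_4)}{L_0 \cdot \mathrm{Prior}(H_0) + L_1 \cdot \mathrm{Prior}(H_1) + L_2 \cdot \mathrm{Prior}(H_2) + L_3 \cdot \mathrm{Prior}(H_3) + L_4 \cdot \mathrm{Prior}(H_4)}
$$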
L0, L1, L2, L3, L4 represent the likelihoods under each corresponding
hypothesis. Prior(H0), Prior(H1), Prior(H2), Prior(H3), Prior(H4)
represent the prior probabilities for each hypothesis^[177]50. The
prior probabilities were set as follows: SNP association with only the
first trait (p1) at 1 × 10^−4, with only the second trait (p2) at 1 ×
10^−4, and with both traits (p12) at 1 × 10^−5. Strong colocalization
was inferred when PH4 was ≥ 0.8, while a PH4 between 0.5 and 0.8
indicated medium colocalization^[178]3,[179]50. This analysis was only
applied to proteins with FDR-corrected P values < 0.1 in TSMR and HEIDI
test P values > 0.05 in SMR. Based on the colocalization evidence, we
categorized the MR results into three tiers: tier 1 for high-support
(PH4 ≥ 0.8), tier 2 for medium-support (0.5 ≤ PH4 < 0.8), and tier 3
for all others.
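The posterior probability and the tiering rule can be written out
directly. The sketch below is a hypothetical illustration of the
formula as stated, not the “coloc” implementation: the likelihoods and
hypothesis-level priors are made-up numbers, and a PH4 of exactly 0.8
is assigned to tier 1.

```python
def posterior_h4(likelihoods, priors):
    """PH4 = L4 * Prior(H4) / sum over k of Lk * Prior(Hk), for k = 0..4."""
    if len(likelihoods) != 5 or len(priors) != 5:
        raise ValueError("expected five likelihoods and five priors (H0..H4)")
    evidence = sum(l * p for l, p in zip(likelihoods, priors))
    return likelihoods[4] * priors[4] / evidence


def tier(ph4):
    """Tier 1: PH4 >= 0.8; tier 2: 0.5 <= PH4 < 0.8; tier 3: otherwise."""
    if ph4 >= 0.8:
        return 1
    if ph4 >= 0.5:
        return 2
    return 3


# Illustrative values only; real inputs come from the colocalization run.
example_likelihoods = [1.0, 10.0, 10.0, 50.0, 5000.0]
example_priors = [0.97, 0.01, 0.01, 0.005, 0.005]
ph4 = posterior_h4(example_likelihoods, example_priors)
print(round(ph4, 3), tier(ph4))
```

Because the denominator sums over all five hypotheses, the five
posteriors always add up to one, so a high PH4 necessarily means low
support for the distinct-variant hypothesis H3.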
Protein-Protein interaction network analysis
The protein-protein interactions play a pivotal role in biological
functions and processes. It is notable that numerous proteins seem to
carry out their functions through these interactions^[180]51,[181]52.
To further elucidate the interactions among the 62 circulating proteins
identified by Mendelian randomization of drug targets in the initial
step, gene interaction data were obtained from
the GeneMANIA website and imported into the R environment^[182]53. The
graph package was then employed to construct a network graph, with the
layout function used to optimize node arrangements for clearer
representation of interaction relationships. Finally, the network was
visualized using the “ggplot2” package. The interactions were
classified into the following categories: co-expression, physical
interaction, genetic interaction, predicted interaction, shared protein
domain, and co-localization.
Pathway analysis
In this study, we conducted analyses of GO, KEGG, and DO to
comprehensively explore the biological functions and relevance of the
target genes^[183]54–[184]56. Initially, we performed GO and KEGG
enrichment analyses using the “clusterProfiler” package. Target genes
were extracted from the input data, and their corresponding Entrez IDs
were obtained from the “org.Hs.eg.db” database. Subsequently, we
executed GO and KEGG analyses using the “enrichGO” and “enrichKEGG”
functions, respectively, while the “enrichDO” function was employed to
identify biological pathways associated with specific diseases. During
this process, the gene ratio was calculated to evaluate the enrichment
of target genes within specific pathways. A P value < 0.05 was
considered significant. The dynamic enrichment bubble plot of the KEGG
analysis and the signaling pathways were depicted.
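As a rough guide to what the gene ratio and enrichment P value
represent, the sketch below computes both for a single pathway using a
one-sided hypergeometric test, the usual basis of over-representation
analysis. It is not the “clusterProfiler” code, and the background
size, pathway size, and overlap are hypothetical.

```python
from math import comb


def gene_ratio(k, n):
    """Fraction of the n target genes that fall in the pathway."""
    return k / n


def enrichment_p(k, n, K, N):
    """P(X >= k) when n genes are drawn from N, of which K are in the pathway."""
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


# Hypothetical example: 4 of 62 target genes fall in a 100-gene pathway,
# against a background of 20,000 annotated genes.
print(gene_ratio(4, 62))
print(enrichment_p(4, 62, 100, 20_000))
```

A small P value indicates that the overlap is larger than expected if
the target genes were drawn at random from the background.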
Mouse knock-out models
The Mouse Genome Informatics (MGI,
[185]http://www.informatics.jax.org/) is a comprehensive database
providing gene function, expression, and phenotypes, as well as genetic
maps and molecular markers information of the laboratory
mouse^[186]57–[187]59. Thus, it can serve as a valuable platform for
studying the genes encoding the identified plasma proteins by comparing
phenotypes across their mutations and alleles. The genes encoding the
identified proteins were searched on the website, where the section
“Mutations, Alleles, and Phenotypes” displays phenotype and mutation
information for each gene.
Tumor immune cell infiltration
Tumor Immune Estimation Resource (TIMER,
[188]https://cistrome.shinyapps.io/timer/), developed by Taiwen Li and
colleagues, is an online tool that employs a deconvolution algorithm to
estimate the abundance of tumor-infiltrating immune cells based on gene
expression data^[189]60,[190]61. The expression of selected genes and
the infiltration of immune cells in breast cancer were studied using
the TIMER. The immune cells consisted of B cells, CD8+ T cells,
CD4+ T cells, macrophages, neutrophils, and dendritic cells. The
correlation coefficients between the genes and immune cells were
denoted by r, which ranged from -1 to 1. A correlation was considered
statistically significant if P < 0.05.
Survival analysis
GEO ([191]http://www.ncbi.nlm.nih.gov/geo/), TCGA
([192]https://portal.gdc.cancer.gov/), [193]EGA, and Metabric
([194]http://molonc.bccrc.ca/aparicio-lab/research/metabric/) are
well-known databases holding a large number of breast cancer cohorts
and patient data. KM-plotter ([195]http://kmplot.com/analysis/) is a
widely used survival analysis tool based on the integrative data from
Metabric, GEO, TCGA, and EGA, encompassing survival information from 7,830
breast cancer patients, with RFS data available for 5,268 patients and
OS data for 5,165 patients^[196]62,[197]63. Patient classification
follows the St. Gallen criteria. To further analyze the prognostic
value of the genes encoding the target proteins, patients were
assigned to a low expression group or a high expression group. OS and
RFS were used as primary outcome endpoints in the two cohorts, and a
P value < 0.05 indicated a significant difference.
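The comparison of low and high expression groups can be illustrated
with a Kaplan-Meier estimate and a log-rank test. The sketch below is
hypothetical and is not the KM-plotter implementation: the median
cutoff is an assumption (the cutoff used is not stated here), and the
expression values and follow-up times are made up.

```python
import math
import statistics


def kaplan_meier(times, events):
    """Product-limit estimate; returns [(time, survival)] at each event time."""
    surv, curve = 1.0, []
    for t in sorted(set(times)):
        at_risk = sum(1 for x in times if x >= t)
        deaths = sum(1 for x, e in zip(times, events) if x == t and e)
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
    return curve


def logrank_p(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; events are 1 (event) or 0 (censored)."""
    pooled = ([(t, e, 0) for t, e in zip(times_a, events_a)]
              + [(t, e, 1) for t, e in zip(times_b, events_b)])
    o_minus_e, var = 0.0, 0.0
    for t in sorted({t for t, e, _ in pooled if e}):
        n = sum(1 for x, _, _ in pooled if x >= t)              # at risk, both
        n_a = sum(1 for x, _, g in pooled if x >= t and g == 0)  # at risk, A
        d = sum(1 for x, e, _ in pooled if x == t and e)         # events, both
        d_a = sum(1 for x, e, g in pooled if x == t and e and g == 0)
        o_minus_e += d_a - d * n_a / n
        if n > 1:
            var += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if var == 0:
        return 1.0
    chi2 = o_minus_e ** 2 / var
    return math.erfc(math.sqrt(chi2 / 2))   # chi-square tail, 1 degree of freedom


def split_by_median(expression, times, events):
    """Return (low, high) groups as (times, events); high means above median."""
    cutoff = statistics.median(expression)
    low, high = ([], []), ([], [])
    for x, t, e in zip(expression, times, events):
        group = high if x > cutoff else low
        group[0].append(t)
        group[1].append(e)
    return low, high


# Made-up cohort: expression level, months of follow-up, event indicator.
expression = [0.2, 0.4, 0.5, 0.9, 1.3, 1.8, 2.1, 2.6]
months = [5, 8, 12, 9, 30, 42, 36, 50]
relapsed = [1, 1, 1, 1, 0, 1, 0, 0]
low, high = split_by_median(expression, months, relapsed)
print(kaplan_meier(*high))
print(logrank_p(*low, *high))
```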
Druggable protein identification
To evaluate the translational potential of these risk-associated
targets, we further prioritized candidates with plausible links to
disease progression. While not all identified targets demonstrated
significant associations with breast cancer progression, our MR
analysis revealed their robust genetic links to breast cancer incidence
risk. These risk-associated targets may represent candidates for
primary prevention strategies or early-stage therapeutic interventions.
Compared to drugs targeting unsupported targets, medications targeting
human genetic-supported targets exhibit a higher probability of
therapeutic success. The following databases were searched for
potential drugs targeting the identified proteins associated with
breast cancer: DrugBank ([198]https://go.drugbank.com/)^[199]64, DGIdb
([200]https://dgidb.org/)^[201]64, ChEMBL
([202]https://www.ebi.ac.uk/chembl/)^[203]65,[204]66, and the
Therapeutic Target Database
([205]https://db.idrblab.net/ttd/)^[206]67. The drug database search
prioritized proteins with substantial evidence (PH4 > 0.8).
We obtained pertinent information for each drug, including details on
the protein target, indications, and developmental status.
Immunohistochemistry
We collected tumor and adjacent non-tumor tissue samples from 25
patients with Luminal A subtype breast cancer at the Cancer Hospital,
Chinese Academy of Medical Sciences and Peking Union Medical College,
and analyzed them using IHC.
Paraffin-embedded sections were dewaxed, rehydrated, and subjected
to antigen retrieval in EDTA buffer (pH 9.0) using a microwave.
Endogenous peroxidase activity was blocked with 3% hydrogen peroxide,
followed by blocking non-specific binding with 3% BSA. Sections were
incubated overnight at 4°C with rabbit anti-ADH5 primary antibody
(1:200, D225407, Sangon), then with HRP-conjugated secondary antibody
(GB23303, Servicebio). DAB was used for chromogenic detection, and
nuclei were counterstained with hematoxylin. The slides were
dehydrated, mounted, and analyzed under a microscope, with blue nuclei
and brown DAB-positive staining indicating expression. Quantification
of staining intensity was performed using ImageJ software. The same
protocol was applied for detecting ULK3 (D291625, Sangon) and CSK
(D151197, Sangon).
Cell culture and transfection
The MCF-7 cell line was provided by the State Key Laboratory of
Biotherapy at Sichuan University (Chengdu, China). MCF-7 cells are
derived from human breast cancer and are of female origin. The MCF-7
cells are not listed in the ICLAC register, and therefore,
authentication was not performed. MCF-7 cells were cultured in DMEM
medium supplemented with 10% fetal bovine serum under sterile
conditions at 37°C with 5% CO2. Lentiviral particles for CSK and ULK3
were purchased from GeneChem Co., Ltd (Shanghai, China). Transfection
was performed at a multiplicity of infection of 500, with cells
incubated with the viral particles for 24 hours. The experiment
included four groups: the CSK overexpression group, the ULK3
overexpression group, the empty vector group (Vector), and the blank
control group (Control).
qPCR verification
qPCR was performed to verify the expression levels of CSK and ULK3
after transfection. Total RNA was extracted from cells using TRIzol
reagent (Invitrogen, CA, USA). The RNA was reverse-transcribed into
cDNA using the Prime Script RT reagent kit (TaKaRa, Shanghai, China).
qPCR was conducted on a fluorescence-based qPCR system (TaKaRa,
Shanghai, China) using the SYBR Premix Ex Taq II reagent kit. The
primer sequences used for qPCR were as follows: For the hGAPDH gene,
the amplicon size was 101 bp, with the forward primer sequence
5’-ACAACTTTGGTATCGTGGAAGG-3’ and the reverse primer sequence
5’-GCCATCACGCCACAGTTTC-3’. For the hCSK gene, the amplicon size was
126 bp, with the forward primer sequence 5’-AGGACCCCAACTGGTACAAAG-3’
and the reverse primer sequence 5’-CGTGGAACCAAGGCATGAG-3’. For the
hULK3 gene, the amplicon size was 102 bp, with the forward primer
sequence 5’-GAACGGAATATCTCTCACCTGGA-3’ and the reverse primer sequence
5’-GTGTTGTGCGAAACCAAAGTC-3’.
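The quantification method is not stated above. One common convention
for expressing target gene levels relative to a reference gene such as
GAPDH is the 2^−ΔΔCt calculation, sketched below as an assumption
rather than as the method used in this study. The function name and
the Ct values are hypothetical.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Fold change of the target gene by 2^-ddCt (assumed convention).

    Each Ct is normalized to the reference gene (e.g. GAPDH) within its
    own group, then the sample is compared with the control.
    """
    delta_sample = ct_target_sample - ct_ref_sample
    delta_control = ct_target_control - ct_ref_control
    return 2 ** -(delta_sample - delta_control)


# Hypothetical Ct values: overexpression group versus empty-vector group.
fold = relative_expression(22.0, 18.0, 25.0, 18.0)
print(fold)
```

A fold change above 1 indicates higher target expression in the sample
than in the control after normalization to the reference gene.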
Cell proliferation and migration analysis
Transfected cells were seeded into 96-well plates, and cell viability
was assessed at 24, 48, 72, and 96 hours using the CCK-8 assay
(MedChemExpress, USA). For the wound healing assay, transfected cells
were seeded into 6-well plates. After 24 hours of culture, a uniform
scratch was made, and the cells’ migratory ability was evaluated
24 hours post-scratch.
Inclusion and ethics
In this study, local researchers were actively involved throughout the
entire research process, including study design, implementation, data
ownership, and authorship, to ensure the research addressed the
specific needs and context of the community. Roles and responsibilities
were clearly defined at the beginning of the study, and
capacity-building plans were made for local researchers to ensure
equitable involvement. This study was approved by the Ethics Committee
of the National Cancer Center/Cancer Hospital, Chinese Academy of
Medical Sciences and Peking Union Medical College (Clinical Ethics
Approval No. 23/139-3884). Written informed consent was obtained
from all participants. All ethical regulations relevant to human
research participants were followed.
Statistics and reproducibility
IBM SPSS Statistics (version 21) and GraphPad Prism (version 8.0) were
used for statistical analysis and plotting. A paired sample t-test was
conducted to assess differences in the expression levels of proteins
between tumor tissues and adjacent normal tissues. Group comparisons in
qPCR validation, CCK-8 assays, and cell scratch assays were performed
using t-tests. Independent sample t-tests or Mann-Whitney U tests were
applied based on the distribution of the data, with normality assessed
by the Shapiro-Wilk test. Quantification of staining intensity and cell
scratch analysis were performed using ImageJ software (version 1.53a).
Statistical significance was defined as a two-sided P-value < 0.05. In
vitro experiments were conducted with a minimum of three independent
biological replicates.
Reporting summary
Further information on research design is available in the [207]Nature
Portfolio Reporting Summary linked to this article.
Supplementary information
[208]Supplementary Material^ (2MB, pdf)
[209]42003_2025_8046_MOESM2_ESM.docx^ (16.8KB, docx)
Description of Additional Supplementary Files
[210]Supplementary Data 1^ (1.4MB, zip)
[211]Supplementary Data 2-26^ (3.3MB, xlsx)
[212]Reporting Summary^ (2.3MB, pdf)
Acknowledgements