Abstract Background Breast cancer is the most prevalent malignancy and the leading cause of cancer-related deaths among women worldwide. Several case reports have shown that some breast cancer patients subsequently develop acute myeloid leukemia (AML) within a short period. However, the causal relationship and pathogenic mechanisms between breast cancer and AML remain incompletely understood. Methods Mendelian randomization (MR) analyses were conducted to explore the bidirectional causal relationships between breast cancer and AML. Additionally, we applied the Bayesian Weighted Mendelian Randomization (BWMR) approach to validate the results of the MR analysis. Subsequently, we utilized RNA-seq data from various sources to explore the potential molecular signaling pathways between breast cancer and AML. Results Both IVW method and BWMR approach demonstrated that data from three distinct sources consistently indicated breast cancer as a risk factor for AML, with all sources showing statistically significant results (all P < 0.05, Odds Ratios [ORs] > 1). Bioinformatic analyses suggested that extracellular vesicle functions and p53 signaling pathway may mediate molecular links between breast cancer and AML. Using machine learning, we identified 8 genes with high diagnostic efficacy for predicting the occurrence of AML in breast cancer patients. Conclusions MR analyses indicated a causal relationship between breast cancer and AML. Additionally, transcriptome analysis offered a theoretical basis for understanding the potential mechanisms and therapeutic targets of AML in breast cancer patients. Supplementary Information The online version contains supplementary material available at 10.1007/s12672-025-02288-7. Keywords: Breast cancer, Acute myeloid leukemia, Mendelian randomization, Transcriptome overlap analysis Introduction Breast cancer is the most prevalent malignancy globally, representing a significant threat to the health and longevity of women [[30]1]. 
As of 2020, female breast cancer has become the most frequently diagnosed malignancy worldwide, surpassing lung cancer. It accounts for approximately 2.3 million new cases, representing 11.7% of all cancer diagnoses [[31]2]. The increasing incidence of secondary malignant neoplasms, including hematologic malignancies, occurring months or years after the initial tumor diagnosis, is drawing increased scrutiny as the breast cancer survivor population continues to grow significantly [[32]3, [33]4]. Breast cancer can be classified into ER-positive or ER-negative subtypes based on the presence or absence of estrogen receptor (ER) expression on tumor cells. Approximately 70% of breast cancer cases exhibit ER expression, positioning it as a key target for therapeutic intervention [[34]5]. Acute myeloid leukemia (AML) is a malignancy affecting hematopoietic stem and progenitor cells, characterized by the abnormal proliferation of undifferentiated cells in the bone marrow and peripheral blood [[35]6]. Previous research has shown that AML frequently develops after Hodgkin lymphoma, breast cancer, ovarian cancer, and testicular cancer, particularly in patients who have undergone cytotoxic chemotherapy and/or radiotherapy [[36]7, [37]8]. Furthermore, AML is also observed to develop after pre-existing myeloid malignancies, notably Myelodysplastic Syndrome (MDS) or the co-occurrence of MDS with Myeloproliferative Neoplasms (MPN) [[38]9]. The prognosis for women with early-stage breast cancer has significantly improved with the adoption of neoadjuvant chemotherapy protocols incorporating anthracyclines and alkylating agents [[39]10]. However, studies indicate cytotoxic drugs significantly increase the risk of secondary AML, which is acknowledged in the WHO classification as "therapy-related myeloid neoplasms" [[40]11, [41]12]. 
Although this association is a late complication of chemotherapy and dose-dependent, there have been cases where patients developed AML shortly after a breast cancer diagnosis [[42]13–[43]15]. Additionally, a nationwide cohort study in France found that among breast cancer survivors included in the study, the incidence rate of AML rose sharply after breast cancer diagnosis, with a standardized incidence ratio approximately 3.2 times higher than that of the general population [[44]16]. Drawing on the findings of the aforementioned research, we propose that a causal relationship may exist between breast cancer and AML, with breast cancer potentially serving as a risk factor for the development of AML. Given the inherent limitations of conventional statistical approaches, observational studies often encounter challenges due to confounding variables and reverse causality, which obscure the causal relationship between breast cancer and AML. Mendelian randomization (MR), in contrast to traditional randomized clinical trials (RCTs), utilizes genetic variants to assess the potential causal influences of risk factors on outcomes. This approach not only offers significant savings in cost and time but also enables the exploration of a broader range of clinical questions [[45]17, [46]18]. Bayesian weighted Mendelian randomization (BWMR) approach is a statistical method designed to address uncertainty from weak effects due to polygenicity and violations of the IV hypothesis caused by multidirectional influences. The method utilizes Bayesian weighted outlier detection to offer a robust and reliable approach for identifying these effects [[47]19]. The purpose of this study was to use MR methods to explore whether there is a causal relationship between breast cancer and AML. 
Furthermore, RNA-Seq is an advanced technique for transcriptome profiling that leverages deep sequencing technologies, providing a powerful tool for comprehensive analysis of differential gene expression and mRNA splicing across the entire transcriptome. It has become an indispensable method in molecular biology, significantly enhancing our understanding of genomic functions [[48]20, [49]21]. We utilized transcriptomic data of breast cancer and AML to explore potential molecular interactions and shared diagnostic markers, offering insights into the underlying pathological mechanisms that may connect the two diseases. Materials and methods Data source The summary Genome-Wide Association Study (GWAS) data for breast cancer were obtained from three sources: Lee Lab UK Biobank (UKBB) ([50]https://www.leelabsg.org/resources), Neale Lab UK Biobank (UKBB) ([51]https://www.nealelab.is/) and the 200 K custom array (iCOGS) [[52]22]. The UK Biobank is a cohort of 502,000 participants, aged 40 to 69, recruited across the United Kingdom between 2006 and 2010. Additionally, the iCOGS array comprises a total of 89,677 samples. Outcome data for AML were sourced from FinnGen's R10 dataset, which covers 412,181 samples ([53]https://r10.finngen.fi/), with additional details also available in Table [54]1. FinnGen is a research initiative established in Finland to explore genetic variation linked to disease progression. Table [55]1 provides detailed information on both exposure and outcome data. Both exposure and outcome data were sourced from European populations. Ethical approval was not required, as the original GWAS had obtained the necessary clearances. Table 1. 
Details of selected instrumental variables for the GWAS data

Data source      Gender      Traits                 SNP (n)      Cases    Controls
Lee Lab UKBB     Both sexes  Breast cancer          27,946,810   12,898   388,549
Lee Lab UKBB     Female      Breast cancer          27,944,595   12,671   388,549
Neale Lab UKBB   Both sexes  Breast cancer          13,778,263   8,304    352,890
Neale Lab UKBB   Female      Breast cancer          13,778,263   8,246    185,928
iCOGS            Both sexes  Overall breast cancer  14,061,818   46,785   42,892
FinnGen R10      Both sexes  AML                    20,191,309   244      314,192

SNP Single-nucleotide polymorphisms, UKBB UK Biobank, AML Acute myeloid leukemia

In the transcriptome overlap analysis, breast cancer transcriptome data were sourced from The Cancer Genome Atlas (TCGA) ([57]http://xena.ucsc.edu/). From the TCGA-BRCA cohort, we selected 113 normal tissue samples and 1,104 breast cancer samples. The gene expression data for AML were obtained from the Beat 2.0 AML dataset ([58]http://www.vizome.org), which includes 671 AML samples and 36 non-AML samples [[59]23].

Study design

The study methods were compliant with the STROBE-MR checklist [[60]24]. Each MR approach relied on three principal assumptions aimed at reducing the impact of biases on the estimates obtained from MR studies [[61]17]. Firstly, the relevance criterion was met by identifying genetic variants strongly linked to breast cancer, achieving genome-wide significance (P < 5e-8). Secondly, the independence assumption was validated by ensuring that the selected genetic variants had no associations with other potential confounders related to breast cancer and AML. Finally, the instrumental variables (IVs) did not influence the outcomes through any pathways other than the exposure determinants under consideration. Figure [62]1 presents a study framework diagram that illustrates the design of our research.

Fig. 1. Overview of study design.
BRCA breast cancer, AML Acute myeloid leukemia, IVW inverse variance weighted, MR-PRESSO Mendelian Randomization Pleiotropy Residual Sum and Outlier, BWMR Bayesian weighted Mendelian randomization, TCGA The Cancer Genome Atlas, GEO Gene Expression Omnibus, DEGs differentially expressed genes, GO Gene Ontology, KEGG Kyoto Encyclopedia of Genes and Genomes, PPI protein–protein interaction, LASSO Least Absolute Shrinkage and Selection Operator

Selection of genetic instruments

To ensure the independence of the instrumental variables and to limit the effects of linkage disequilibrium (LD), we first selected Single Nucleotide Polymorphisms (SNPs) associated with the exposure at a stringent genome-wide threshold (P < 5e-8). Next, the SNPs were clumped using the LD parameters r^2 = 0.001 and kb = 10,000, and the exposure and outcome data were harmonized during preprocessing so that effect estimates for exposure and outcome referred to the same alleles. In addition, the F-statistic was calculated to assess instrument strength and detect weak instrument bias; SNPs with F < 10 were considered weak instruments and excluded. Finally, we searched PhenoScanner ([65]http://www.phenoscanner.medschl.cam.ac.uk/) and removed SNPs potentially correlated with AML through other phenotypes.

MR analysis

After removing SNPs associated with potential confounders of the outcome, the final set of genetic instruments for the exposures was determined. The inverse variance weighted (IVW) method was employed as the primary analysis to assess the causal link between breast cancer and AML [[66]25]. In addition, MR Egger, weighted median, simple mode and weighted mode were implemented as supplementary analytical techniques.
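The instrument-strength filter and the IVW estimate described above can be illustrated with a small sketch. This is not the TwoSampleMR implementation used in the study. It assumes the common approximation F ≈ (beta/se)^2 for a single SNP and a fixed-effect IVW estimator with first-order weights, and every number in it is hypothetical.

```python
import math

def f_statistic(beta_exposure, se_exposure):
    """Approximate per-SNP F-statistic as (beta / se)^2 (an assumed formula)."""
    return (beta_exposure / se_exposure) ** 2

def ivw_estimate(snps):
    """Fixed-effect IVW estimate from per-SNP summary statistics.

    Each SNP is a tuple (beta_exposure, se_exposure, beta_outcome, se_outcome).
    Returns the pooled log-odds estimate and its standard error.
    """
    numerator = 0.0
    denominator = 0.0
    for beta_x, _se_x, beta_y, se_y in snps:
        weight = beta_x ** 2 / se_y ** 2          # first-order inverse-variance weight
        numerator += weight * (beta_y / beta_x)   # weighted Wald ratio
        denominator += weight
    return numerator / denominator, math.sqrt(1.0 / denominator)

# Hypothetical summary statistics, for illustration only.
snps = [
    (0.10, 0.01, 0.04, 0.02),
    (0.08, 0.01, 0.03, 0.02),
    (0.02, 0.01, 0.05, 0.02),  # F = 4, so this SNP is dropped as a weak instrument
]

strong = [s for s in snps if f_statistic(s[0], s[1]) >= 10]
beta_ivw, se_ivw = ivw_estimate(strong)
print(len(strong), round(beta_ivw, 4), round(se_ivw, 4))
```

Each SNP contributes its Wald ratio (outcome effect divided by exposure effect), weighted by the precision of that ratio, so strongly associated, precisely measured SNPs dominate the pooled estimate.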
Subsequently, we employed the BWMR approach to complement the IVW method, considering results significant when the P-values from both methods were below 0.05. Integrating these two methods enables a more precise evaluation of the influence of risk factors on complex traits or diseases. To verify the robustness of the causal association, we conducted a range of sensitivity analyses, including MR-Egger regression test, Mendelian Randomization Pleiotropy Residual Sum and Outlier (MR-PRESSO) test, leave-one-out sensitivity analysis, and Cochran’s Q-test [[67]26]. The MR-Egger regression test was performed to assess directional pleiotropy. The MR-PRESSO test was utilized to assess potential bias caused by horizontal pleiotropy by identifying and excluding pleiotropic SNPs with a P < 0.05, thus enabling a reevaluation of the causal relationships. The leave-one-out method was applied to evaluate the robustness of the MR findings by sequentially excluding individual SNPs. Cochran’s Q-test was used to assess the degree of heterogeneity influencing the MR findings, based on the IVW and the MR-Egger estimates. Additionally, a funnel plot was used to evaluate the potential direction of pleiotropy. All MR analyses were carried out using the “TwoSampleMR” and “MRPRESSO” packages in R (version 4.2.3). Identification of DEGs, weighted gene co-expression network analysis, and functional enrichment analysis Employing the limma software package, we identified differentially expressed genes (DEGs) in breast cancer, applying a stringent threshold of |Log2 Fold Change (FC)|> 1 and a significance level of adjusted p < 0.05 [[68]27]. To elucidate gene co-expression networks in AML, we conducted Weighted Gene Co-Expression Network Analysis (WGCNA) using the R package "WGCNA" (version 1.71) [[69]28]. Next, we identified the overlapping genes between breast cancer and AML. 
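The DEG thresholds and the overlap step can be sketched as follows. This is only an illustration, not the limma or WGCNA workflow itself. The gene names, values, and module contents are hypothetical, and the pairing of upregulated genes with positively correlated AML modules (and downregulated genes with negatively correlated ones) is an assumption taken from the Venn diagram descriptions in Fig. 4.

```python
def select_degs(results, lfc_cutoff=1.0, padj_cutoff=0.05):
    """Split genes into upregulated and downregulated sets.

    `results` maps gene name -> (log2 fold change, adjusted p-value).
    A gene is a DEG when |log2FC| > lfc_cutoff and adjusted p < padj_cutoff.
    """
    up = {g for g, (lfc, padj) in results.items()
          if padj < padj_cutoff and lfc > lfc_cutoff}
    down = {g for g, (lfc, padj) in results.items()
            if padj < padj_cutoff and lfc < -lfc_cutoff}
    return up, down

# Hypothetical differential expression results for breast cancer vs. normal tissue.
brca_results = {
    "GENE_A": (2.3, 0.001),
    "GENE_B": (-1.8, 0.010),
    "GENE_C": (0.4, 0.001),   # fold change too small
    "GENE_D": (1.5, 0.200),   # not significant after adjustment
    "GENE_E": (-2.1, 0.030),
}

# Hypothetical gene sets from AML modules (positively / negatively correlated).
aml_positive_modules = {"GENE_A", "GENE_D", "GENE_X"}
aml_negative_modules = {"GENE_B", "GENE_Y"}

up, down = select_degs(brca_results)
shared_up = up & aml_positive_modules      # upregulated DEGs in positive modules
shared_down = down & aml_negative_modules  # downregulated DEGs in negative modules
print(sorted(shared_up), sorted(shared_down))
```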
Subsequently, we used the "ClusterProfiler" R package to perform Gene Ontology (GO) analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway annotation to explore the biological foundations associated with these overlapping genes [[70]29].

Construction of PPI network and identification of hub genes

A protein–protein interaction (PPI) network was constructed using the "STRING" database ([71]https://string-db.org). Subsequently, the PPI network was visualized using the Cytohubba plugin (version 3.10.1) within Cytoscape software, which was also used to calculate the degree of each protein node. We designated the top 15 genes ranked by degree as the hub genes in our study.

Construction of the AML-related diagnostic and predictive model

Using the 15 hub genes mentioned above, we applied 8 machine learning algorithms alongside the Least Absolute Shrinkage and Selection Operator (LASSO) dimensionality reduction technique to predict the risk of AML. Following this, we used the "ROC" package to construct the receiver operating characteristic (ROC) curves to validate the diagnostic efficacy of the candidate biomarkers. The area under the ROC curve (AUC) was used as a measure of accuracy, with a criterion of 0.9 ≤ AUC < 1 applied to identify outstanding accuracy.

Results

Selection of SNPs

Based on a genome-wide significance threshold of P < 5e-8 and after removing SNPs in LD, 126 SNPs were identified as instrumental variables for breast cancer, each surpassing this threshold. Furthermore, we screened these SNPs in PhenoScanner against significant risk factors for AML, including aging, gender, smoking, alcohol consumption, obesity, diabetes, exposure to benzene and formaldehyde, and previous radiochemotherapy [[72]30–[73]36], and found that no SNPs required removal due to associations with these confounding variables. Consequently, 126 SNPs were selected to evaluate the genetic risk of AML in patients with breast cancer.
Notably, all F-statistics in our analysis exceeded 10, indicating that these IVs are strong instruments for breast cancer. Additional details on the SNPs are provided in Table S1.

Causal effect of breast cancer on AML via forward MR

In our study, we used IVW as the primary analytical method to assess the relationship between breast cancer and AML. Using breast cancer data from the Lee Lab UKBB cohort as the exposure and AML as the outcome, the IVW method revealed that breast cancer was associated with a significantly increased risk of AML [breast cancer of both sexes: Odds Ratio (OR) = 1.555, 95% Confidence Interval (CI) 1.074–2.250, P = 0.019; breast cancer of female: OR = 1.575, 95% CI 1.091–2.272, P = 0.015] (Fig. [74]2: Forest plots showing causal estimates between breast cancer and AML in forward MR, Table S2: MR results of the associations between breast cancer and AML risk under five methods). Similarly, analyses using breast cancer data from the Neale Lab UKBB dataset (breast cancer of both sexes: OR = 8.209, 95% CI 1.389–15.030, P = 0.018; breast cancer of female: OR = 4.436, 95% CI 0.677–8.196, P = 0.021) and the iCOGS array (overall breast cancer: OR = 1.384, 95% CI 1.043–1.835, P = 0.024) indicated a significant causal relationship between breast cancer and AML (Fig. [75]2, Table S2). In addition, the MR Egger, weighted median, simple mode and weighted mode methods also supported the finding that breast cancer was significantly associated with AML (Fig. [76]2, Fig. [77]3: Scatter plots showing causal estimates between breast cancer and AML in forward MR, and Table S2).
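For readers less familiar with how MR effect sizes are reported, the sketch below shows the standard conversion from a log-odds estimate and its standard error to an odds ratio with a 95% confidence interval. The study's own code is not given in the text, so this is an assumption about the usual procedure rather than a description of it, and the input values are hypothetical rather than taken from Table S2.

```python
import math

def odds_ratio_with_ci(beta, se, z=1.96):
    """Convert a log-odds estimate and standard error to (OR, lower, upper).

    The interval is computed on the log scale and then exponentiated,
    so it is asymmetric around the OR on the natural scale.
    """
    return (
        math.exp(beta),
        math.exp(beta - z * se),
        math.exp(beta + z * se),
    )

# Hypothetical causal estimate on the log-odds scale.
or_value, ci_lower, ci_upper = odds_ratio_with_ci(beta=0.325, se=0.14)
print(round(or_value, 3), round(ci_lower, 3), round(ci_upper, 3))
```

An interval whose lower bound lies above 1 corresponds to a two-sided P-value below 0.05 under the same normal approximation.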
BWMR analysis of the breast cancer data from Lee Lab UKBB and AML also yielded significant results (breast cancer of both sexes: P = 0.018; breast cancer of female: P = 0.014). We also applied the BWMR approach to breast cancer data from Neale Lab UKBB (breast cancer of both sexes: P = 0.022; breast cancer of female: P = 0.028) and the iCOGS array (overall breast cancer: P = 0.014) with AML, and the results were similarly significant. These findings further support the notion that breast cancer is a risk factor for AML; the specific results of the BWMR approach are provided in Fig. [78]2.

Fig. 2. Forest plots showing causal estimates between breast cancer and AML in forward MR. A The causal association results for breast cancer from Neale Lab UKBB and AML. B The causal association results for breast cancer from Lee Lab UKBB, iCOGS array and AML. AML acute myeloid leukemia, UKBB UK Biobank, BWMR Bayesian Weighted Mendelian Randomization, MR Mendelian randomization, CI confidence interval, OR odds ratio, pval p-value

Fig. 3. Scatter plots showing causal estimates between breast cancer and AML in forward MR. A The causal association results for breast cancer of both sexes from Lee Lab UKBB and AML. B The causal association results for breast cancer of female from Lee Lab UKBB and AML. C The causal association results for breast cancer of both sexes from Neale Lab UKBB and AML. D The causal association results for breast cancer of female from Neale Lab UKBB and AML. E The causal association results for overall breast cancer from iCOGS array and AML. AML acute myeloid leukemia, UKBB UK Biobank, SNP Single Nucleotide Polymorphisms, MR Mendelian randomization, IVW Inverse variance weighted

Moreover, the sensitivity analyses conducted in this study confirm the robustness of the observed causal estimates.
The MR-Egger regression test showed no evidence of directional pleiotropy, with all intercept P-values greater than 0.05 (Table S3: The heterogeneity and horizontal pleiotropy of individual SNPs). Similarly, the MR-PRESSO test results indicated the absence of horizontal pleiotropy (P > 0.05) (Table S3: The heterogeneity and horizontal pleiotropy of individual SNPs). Additionally, the symmetry of funnel plots showed no obvious horizontal pleiotropy (Fig.S1: Funnel plots illustrating the causal effect of breast cancer on AML in forward MR). Furthermore, the leave-one-out analysis discovered no single SNP that significantly influenced the causal relationship between breast cancer and AML (Fig.S2: The leave-one-out analysis of the estimations for breast cancer and AML in forward MR). The P-values for the Cochran's Q tests were all above 0.05, indicating no heterogeneity among the SNPs (IVW, all P > 0.05; MR-Egger, all P > 0.05) (Table S3: The heterogeneity and horizontal pleiotropy of individual SNPs). Causal association of AML with breast cancer via reverse MR We conducted a reverse MR study to explore potential bidirectional relationships between AML and breast cancer. Using the IVW model, we did not observe any causal relationship between AML and breast cancer derived from the Lee Lab UKBB (breast cancer of both sexes: OR = 0.986; 95% CI 0.960–1.012; P = 0.288; breast cancer of female: OR = 0.987; 95% CI 0.960–1.014; P = 0.326) (Fig.S3: Forest plot showing causal estimates between AML and breast cancer in reverse MR). 
Similarly, there was no significant causal association between AML and breast cancer derived from the Neale Lab UKBB (breast cancer of both sexes: OR = 0.9997; 95% CI 0.9989–1.0005; P = 0.466; breast cancer of female: OR = 0.9995; 95% CI 0.9979–1.0010; P = 0.504) or between AML and breast cancer derived from the iCOGS array (overall breast cancer: OR = 1.001; 95% CI 0.976–1.027; P = 0.925) (Fig.S3: Forest plot showing causal estimates between AML and breast cancer in reverse MR). The same conclusion was corroborated by four additional MR analytical approaches, namely MR Egger, weighted median, simple mode and weighted mode, as consistently shown in Table S4 and Fig.S3-6. Using the BWMR approach, we examined the causal relationship between AML and breast cancer data from the Lee Lab UKBB, and our findings did not reveal any significant causal association (P > 0.05) (Fig.S3). Analysis of AML and breast cancer data from the Neale Lab UKBB and the iCOGS array using the BWMR approach also yielded the same results (all P > 0.05) (Fig.S3). These findings corroborate the results obtained from the IVW method.

DEGs identification of breast cancer

By comparing the normal group with the breast cancer group, 2,867 DEGs were identified, including 1,109 upregulated genes and 1,758 downregulated genes. A heat map was employed to visualize the expression patterns of DEGs across samples (Fig. [83]4A: A heatmap of the top 40 DEGs in breast cancer).

Fig. 4. Bulk-RNA-seq analysis revealing potential molecular links between breast cancer and AML. A A heatmap of the top 40 DEGs in breast cancer. B Module-trait relationships in AML. Each module contains the corresponding correlation coefficient and P-value.
C Venn diagram showing 102 overlapping genes between downregulated breast cancer genes and AML negative modules. D Venn diagram showing 56 overlapping genes between upregulated breast cancer genes and AML positive modules. E GO enrichment analysis for overlapping genes in the category of Biological Processes. F GO enrichment analysis for overlapping genes in the category of Molecular Functions. G GO enrichment analysis for overlapping genes in the category of Cellular Components. H KEGG pathway enrichment analysis for overlapping genes. AML acute myeloid leukemia, WGCNA Weighted Gene Co-expression Network Analysis, DEGs Differentially Expressed Genes, GO Gene Ontology, BP Biological Processes, MF Molecular Functions, CC Cellular Components, KEGG Kyoto Encyclopedia of Genes and Genomes

WGCNA identified the critical genes of AML

To identify the critical genes associated with AML, we used WGCNA to partition the gene expression profiles into multiple modules of highly correlated genes. The dynamic tree cutting method was then employed to detect and merge similar gene modules, ultimately identifying a total of 17 modules. Among these, the darkorange module (R = 0.14, P = 2e-04), darkturquoise module (R = 0.72, P = 4e-16), pink module (R = 0.27, P = 3e-13), grey60 module (R = 0.3, P = 4e-16), lightcyan module (R = 0.36, P = 2e-23), lightyellow module (R = 0.19, P = 3e-07), red module (R = 0.21, P = 2e-08), and black module (R = 0.11, P = 0.003) were negatively correlated with AML, collectively comprising 816 critical genes.
The violet module (R = 0.098, P = 0.009), midnightblue module (R = 0.095, P = 0.01), yellow module (R = 0.084, P = 0.02), darkred module (R = 0.034, P = 1e-20), green module (R = 0.096, P = 0.01), tan module (R = 0.083, P = 0.03), darkgreen module (R = 0.14, P = 1e-04), paleturquoise module (R = 0.22, P = 6e-09), and grey module (R = 0.11, P = 0.003) were positively correlated with AML, collectively comprising 422 critical genes (Fig. [86]4B: Module-trait relationships in AML).

GO function and KEGG pathway annotation of shared genes

After intersecting the critical targets of breast cancer and AML, we generated two Venn diagrams, displaying 102 and 56 overlapping genes, respectively (Fig. [87]4C: Venn diagram showing 102 overlapping genes between downregulated breast cancer genes and AML negative modules, Fig. [88]4D: Venn diagram showing 56 overlapping genes between upregulated breast cancer genes and AML positive modules). To gain deeper insights into the biological processes associated with these overlapping genes between breast cancer and AML, we conducted GO enrichment analysis and KEGG pathway analysis. The results of the GO analysis indicated that these genes primarily affect cellular component assembly, protein dimerization activity, and extracellular vesicles (Fig. [89]4E, F, G). Moreover, the KEGG enrichment analysis highlighted the impact of these overlapping genes on key cellular functions, including the p53 signaling pathway and the Apelin signaling pathway (Fig. [90]4H: KEGG pathway enrichment analysis of the overlapping genes).

PPI network and analysis of hub genes

The PPI network was constructed for the 158 overlapping genes, and the 15 hub genes were then visualized using Cytoscape software. Briefly, H2AC6, H4C8, H2AC11, H2AC20, CDKN2A, H2BC12, H1-2, H2BC5, H1-3, H2AC13, H2BC21, H3C4, H2BC11, H1-4 and H2BC4 were identified as hub genes. Deeper color indicates a higher score (Fig. [91]5A: The PPI network of 15 hub genes).

Fig. 5. Analysis of hub genes and construction of a diagnostic and predictive model for AML risk prediction. A The PPI network of 15 hub genes. B, C LASSO regression was used for machine learning gene set selection. Coefficient profile diagram (B) and cross-validation diagram (C). D, E ROC curves of 8 genes related to AML in breast cancer patients across training set and validation set. PPI Protein–Protein interaction, LASSO Least Absolute Shrinkage and Selection Operator, ROC Receiver Operating Characteristic

Construction of diagnostic and predictive model for AML risk prediction

After further refining the previously identified 15 hub genes using LASSO, 8 genes (H2BC12, H1-4, H2BC4, CDKN2A, H2BC5, H2AC20, H2BC21, H2AC6) were selected to develop a diagnostic and predictive model for assessing AML risk in breast cancer patients, utilizing 8 distinct machine learning algorithms (Fig. [94]5B, C: LASSO regression was used for machine learning gene set selection). The outcomes from both the training and validation cohorts demonstrated substantial discriminative efficacy (AUC > 0.90, accuracy > 0.50, sensitivity > 0.90, specificity > 0.90), with additional details provided in Table S5. Comprehensive analysis revealed that the GNB machine learning algorithm exhibited superior performance in terms of fit quality and evaluative reliability. The AUC, accuracy, sensitivity, and specificity in the training and validation cohorts were 1.000 and 0.996, 0.998 and 1.000, 1.000 and 0.994, 0.994 and 1.000, respectively (Table S5 and Fig. [95]5D, E: ROC curves of 8 genes related to AML in breast cancer patients across training set and validation set).

Discussion

Breast cancer is the most commonly diagnosed malignancy in women and remains the second leading cause of cancer-related mortality, despite significant advancements in treatment [[96]37, [97]38].
Although therapy-related AML is recognized as a significant long-term complication in breast cancer survivors treated with cytotoxic agents [[98]39], literature reports have also identified cases of AML in patients who have not been exposed to DNA-damaging therapies (including alkylating agents, antimetabolites, platinum-based antineoplastic agents, and topoisomerase inhibitors) during breast cancer treatment [[99]40]. Understanding the causal link between breast cancer and AML is vital for the surveillance of secondary tumors and long-term survival of breast cancer survivors. A retrospective cohort study utilizing the Korean national database revealed that breast cancer patients who did not receive adjuvant chemotherapy exhibited a cumulative 10-year incidence rate of approximately 0.024% for secondary AML [[100]41]. Similarly, a Danish population-based cohort study found that breast cancer survivors had a 90% increased risk of developing AML within 10 years of diagnosis, especially among those who did not receive endocrine therapy. Additionally, AML risk increased by 33% in patients who did not undergo chemotherapy [[101]42]. In line with the research findings mentioned above, our study used bidirectional MR analysis to provide genetic evidence supporting a causal effect of breast cancer on AML, whereas the reverse analysis found no evidence of a causal effect of AML on breast cancer. After excluding the confounding factors, including aging, gender, smoking, alcohol consumption, obesity, diabetes, exposure to benzene and formaldehyde, and previous radiochemotherapy, we found that breast cancer increases the risk of AML. Certain genetic variations might increase the risk of both breast cancer and AML in individuals. In breast cancer, mutations in the high-penetrance tumor suppressor genes BRCA1 and BRCA2 follow an autosomal-dominant inheritance pattern and typically manifest as either loss-of-function or missense mutations [[102]43]. Similarly, BRCA1 mutations have also been observed in AML [[103]44].
Breast cancer patients with a family history of breast or ovarian cancer, often linked to BRCA1 mutations, have an increased risk of developing leukemia [[104]45]. Although our MR analysis indicates that breast cancer is a risk factor for AML, the exact mechanism behind this association is still unclear. Therefore, we conducted a transcriptome overlap analysis of breast cancer and AML. Bioinformatic analyses indicated that extracellular vesicle functions, p53 signaling pathway and Apelin signaling pathway may mediate molecular links between breast cancer and AML. This perspective is consistent with findings from previous studies. Extracellular vesicles (EVs) are membrane-bound nanovesicles that carry nucleic acids and proteins essential for intercellular communication. They are released by all cell types under both normal physiological and pathological conditions [[105]46]. EVs are classified into three groups, including exosomes, microvesicles and apoptotic bodies, depending on their biogenesis and size [[106]47]. In breast cancer, EVs play a crucial role in promoting angiogenesis, tumor progression, immune evasion, metastasis, and chemoresistance [[107]48]. Likewise, during the progression of AML, exosomes alter the bone marrow microenvironment by transferring miRNAs or serving as carriers for oncogenic mRNAs, thereby increasing the potential for AML progression [[108]49]. Dysfunctions caused by TP53 tumor suppressor gene mutations are closely associated with tumor development [[109]50]. When TP53 gene mutates, the resulting mutant p53 loses its transcriptional activity and exhibits dominant-negative effects, which disrupt the p53 signaling pathway and promote cancer progression [[110]51, [111]52]. Studies have demonstrated that certain genes modulate the activity of the p53 pathway, leading to either enhanced proliferation and migration of breast cancer cells or inhibition of their proliferation and promotion of apoptosis [[112]53, [113]54]. 
Similarly, research has revealed the prevalence of p53 mutations in secondary AML [[114]55]. Both previous studies and our findings suggest that abnormalities in the p53 signaling pathway and the functions of extracellular vesicles may be common characteristics shared by both breast cancer and AML. Thus, detecting abnormalities in the p53 signaling pathway and extracellular vesicle functions may serve as an early warning marker for AML in breast cancer patients. By integrating transcriptome data from both breast cancer and AML, we identified H2BC12, H1-4, H2BC4, CDKN2A, H2BC5, H2AC20, H2BC21, and H2AC6 as potential diagnostic biomarkers. Among the aforementioned genes, CDKN2A (cyclin-dependent kinase inhibitor 2A) has been identified as being associated with both breast cancer and AML. In breast cancer, CDKN2A has been found to be both mutated and deleted, with its variants showing significant associations with the disease [[115]56, [116]57]. Similarly, in AML patients, downregulation of CDKN2A was identified through apoptotic gene expression analysis, and CDKN2A deletion was detected using oligonucleotide-array comparative genomic hybridization (oaCGH) [[117]58, [118]59]. Therefore, CDKN2A may be a shared gene involved in the pathogenesis of both breast cancer and AML. Monitoring its expression in breast cancer patients could potentially help predict the risk of secondary AML. Despite the significance of these findings, our research has several limitations. Firstly, the GWASs used primarily involved patients of European descent, which may limit the applicability of our results to other ethnic populations. Secondly, our analysis did not incorporate nonlinear associations, underscoring the need for a comprehensive dataset that includes all relevant exposure and outcome variables, as well as data on genetic instruments.
Thirdly, the causal relationships inferred from MR should be further validated with real-world clinical data to enhance the robustness and reliability of our findings. Finally, this study exclusively employed bioinformatics to explore shared diagnostic genes and potential functions related to AML in breast cancer patients. These diagnostic genes require further validation in clinical cohorts to confirm their relevance and potential for clinical application.

Conclusions

Our MR analysis suggested a causal relationship between breast cancer and AML. Furthermore, transcriptome analysis provided a theoretical basis for understanding the potential mechanisms and identifying potential therapeutic targets for AML in breast cancer patients.

Supplementary Information

Supplementary material 1 (4.4MB, xls)

Acknowledgements