Abstract Background Breast cancer is the most prevalent and deadly cancer among women globally, necessitating more effective diagnostic and therapeutic approaches. This study aims to explore new treatment targets and diagnostic tools. Methods Employing machine learning techniques and utilizing PCR, IHC technologies, and multiple databases, we identified and validated genes closely linked with breast cancer and copper-induced cell death. We then explored how their expression levels impact cancer diagnosis, prognosis, immune cell infiltration, and drug sensitivity. Results This investigation identified three crucial genes—MT1M, GRHL2, and PKM—intimately associated with the copper death mechanism in breast cancer pathology. Validated through comprehensive analysis across cells, tissue models, and diverse databases, these genes showed significant differential expression (P-value < 0.05), affirming their pivotal role in enhancing diagnostic accuracy (AUC values: 0.917, 0.970, 0.951) and prognostic assessment (HR = 0.65, P = 0.018; HR = 1.69, P = 0.0011; HR = 1.51, P = 0.012) in breast cancer. Additionally, their expression levels influence the infiltration of immune cells and the sensitivity to certain drugs. Conclusion MT1M, GRHL2, and PKM are novel diagnostic and therapeutic targets for breast cancer. These findings enhance prognostic evaluations, deepen our understanding of its mechanisms. Supplementary Information The online version contains supplementary material available at 10.1007/s12672-025-03340-2. Keywords: Biomarker, Breast cancer, Copper death, Immune infiltration, Machine learning Introduction Breast cancer is the most common malignant tumor in women, with the highest incidence and mortality rates [[38]1–[39]4]. Scientists have developed comprehensive biological markers for breast cancer, including ER, PR and HER2 [[40]5–[41]8], reliable predictive factors are still lacking for patients in advanced stages [[42]1]. Therefore, finding effective biomarkers to assess the prognosis of breast cancer patients and exploring new approaches of breast cancer treatment are particularly important. ‘Copper death’ and ferroptosis [[43]9, [44]10] are similar as both represent specialized forms of cell death, distinct from common apoptosis and autophagy. ‘Copper death’ specifically refers to cell death induced by copper and includes direct effects such as oxidative stress, as well as indirect mechanisms like copper poisoning [[45]11]. This process can lead to tumor cell death and affect tumor growth [[46]12–[47]15], making copper-related factors promising targets for cancer therapy [[48]16–[49]18].The role of copper death-related genes in breast cancer diagnosis, prognosis, and immune response remains to be fully understood. This study aims to use machine learning to pinpoint key genes and assess their relevance in breast cancer, offering fresh perspectives for treatment and diagnosis strategies. As illustrated by the specific research route shown in Fig. [50]1. Fig. 1. [51]Fig. 1 [52]Open in a new tab Flowchart for research Materials and methods Screening and processing of gene expression dataset We retrieved dataset [53]GSE42568 ([54]GPL571) from the GEO database ([55]https://www.ncbi.nlm.nih.gov/geo/), which includes 104 breast cancer samples and 17 normal breast biopsy specimens. Data normalization was performed using the ‘Limma’ package in R, with probe IDs converted to gene symbols based on the platform’s information. From the Genecard database, we selected 1038 genes related to Copper death (Table [56]S1), using a correlation score threshold of greater than 7. Further analysis with the ‘Limma’ package identified differentially expressed genes (DEGs) in [57]GSE42568, applying criteria of|log2 fold change (FC)| >1.5 and FDR-adjusted P-value < 0.05. To control for multiple testing, p-values were adjusted using the Benjamini-Hochberg method. Genes with log FC > 1.5 and FDR-adjusted P-value < 0.05 were classified as up-regulated, while those with log FC < -1.5 and FDR-adjusted P-value < 0.05 were considered down-regulated. For visualization, we employed the ‘Pheatmap’ and ‘ggplot2’ packages in R to create heatmaps and volcano plots of these DEGs, respectively. Analysis of immune infiltration, gene interactions, and regulatory networks We utilized the CIBERSORT [[58]19] algorithm to analyze immune cell composition in breast cancer and normal samples from the [59]GSE42568 dataset. The correlation between immune cells and key genes was assessed using the TIMER resource ([60]https://cistrome.shinyapps.io/timer/) [[61]20], with significance set at P-value < 0.05. GeneMANIA ([62]http://genemania.org) [[63]21, [64]22] was employed to construct a functional association network among hub genes. Unlike tools dedicated solely to PPI such as STRING, GeneMANIA integrates various data types—including co-expression, genetic interaction, pathway, and PPI—providing a broader view of functional relationships., and the GEPIA platform ([65]http://gepia.cancer-pku.cn/) [[66]23] helped analyze the relationship between hub gene expression and tumor profiles in breast cancer. NetworkAnalyst ([67]https://www.networkanalyst.ca/) [[68]24] was used for regulatory network construction, incorporating predictions from the JASPAR [[69]25] and TarBase [[70]26] databases. Survival analysis was conducted using the Kaplan-Meier plotter ([71]http://kmplot.com/analysis/) [[72]27], evaluating the impact of hub genes on breast cancer patient survival through hazard ratios (HR) and log-rank p-values. This comprehensive approach facilitates a deeper understanding of the molecular mechanisms in breast cancer. Weighted gene co-expression analysis “WGCNA” package [[73]28]construct co-expression networks and modules for control and patient groups. The “pickSoftThreshold” function was employed to filter and validate the optimal Soft Threshold b = 7 (Fig. [74]4B) [[75]29]. A hierarchical clustering tree was constructed using a dynamic tree-cutting approach. Each branch represents a module combining all genes with similar expression levels, and each leaf denotes an individual gene (Fig. [76]4C, D). Subsequently, the Pearson correlation coefficient was employed to establish an adjacency matrix, which was then transformed into a topology overlap matrix (TOM) with appropriate power values and topological anisotropy (1-TOM). Gene modules were delineated through hierarchical clustering based on the I-TOM dissimilarity matrix, continuously allocating genes into 33 colored modules for analyzing module-clinical trait (control and patient) co-expression similarity and adjacency. Fig. 4. [77]Fig. 4 [78]Open in a new tab Acquisition and functional enrichment analysis of 79 DEGCUs. A Venn diagrams of 79 DEGCUs. B, C The results of GO and KEGG analysis are displayed by bubble plots, respectively. GO, Gene Ontology; KEGG, Kyoto Encyclopedia of Genes and Genomes; BP, biological process; CC, cellular component; MF, molecular function Functional and pathway enrichment analysis Gene Ontology (GO) [[79]30] and Kyoto Encyclopedia of Genes and Genomes (KEGG) [[80]31] functional enrichment analyses were conducted using the “clusterProfiler” R package in R. A q-value threshold of < 0.05 was set as the selection criteria to evaluate biological processes (BP), molecular functions (MF), cellular components (CC), and gene signaling pathways associated with the genes. Hub gene selection via machine learning The Least Absolute Shrinkage and Selection Operator (LASSO) logistic regression, performed using the “glmnet” package, is a data mining technique. By applying the L1 penalty (λ), it sets the coefficients of unimportant variables to zero, allowing the selection of crucial variables and the construction of the optimal classification model [[81]32]. LASSO was chosen over Ridge and Elastic Net regression due to its automatic feature selection capability through L1 regularization, which effectively identifies a small subset of key genes from high-dimensional data while providing superior interpretability—characteristics particularly suited for biomarker discovery in genomic studies. Support Vector Machine-Recursive Feature Elimination (SVM-RFE) is a common machine learning approach based on embedded methods, identifying the best core genes by discarding feature vectors produced by SVM [[82]33]. In this study, SVM-RFE was implemented with a 10-fold cross-validation strategy. The dataset was randomly divided into ten folds, and recursive feature elimination was applied within each fold to rank genes. The top-ranked genes were then evaluated across subsets of 1 to 40 features to determine the optimal feature number by examining error rate and accuracy. This approach mimics the functionality of RFECV. The SVM classifier was implemented using the e1071 R package with a linear kernel and default parameters (C = 1, gamma = 0.1). Random Forest (RF) analysis is a machine learning method rooted in decision trees, primarily identifying optimal variables through the R package e1071, scoring the significance of each variable to assess their importance [[83]34]. Hub genes are defined as the intersection of genes filtered by the three machine learning methods using the R package Venn. Gene set enrichment analysis We performed single-gene Gene Set Enrichment Analysis (GSEA) [[84]35] to further investigate the potential mechanisms through which the hub genes impact breast cancer. Quantitative RT–PCR The normal human mammary epithelial cell line MCF10A were obtained from Cellcook Biotech Company (Guangzhou, China). and the human breast cancer cell line MDA-MB-231 were obtained from Procell Life Science & Technology (Wuhan, China). Total RNA was extracted and purified using the RNA simple Total RNA Kit (Tiangen Biotech). For reverse transcription, the FastKing RT Kit (with gDNase) (Tiangen Biotech) was employed. qRT-PCR was performed according to the manufacturer’s instructions using the SuperReal PreMix Plus (SYBR Green) from Tiangen Biotech. Each experiment for every sample was repeated at least three times. The 2(-ΔΔCt) method was used to calculate relative expression levels, normalized to GAPDH. Differences in gene expression were statistically tested using the Student’s t-test. Results were analyzed and charts were created using GraphPad Prism (version 8.0). (*P-value < 0.05, ** P-value < 0.01, and *** P-value < 0.001). The sequences of the primers used are shown in Supplementary Table [85]S3. In addition, we collected more information on breast tumor cell lines from the Cancer Cell Line Encyclopedia (CCLE) database. Immunohistochemistry All patients from Harbin Medical University Cancer Hospital provided written informed consent. None of the breast cancer patients had received chemotherapy, radiotherapy, or any medical interventions before surgery. Post-surgery, all were pathologically diagnosed with invasive breast cancer. Tissue sections were prepared by paraffin embedding, heated at 58 °C for one hour, and treated for antigen retrieval. They were then incubated with MT1M (17281-1-AP, Proteintech), PKM (10078-2-AP, Proteintech), and GRHL2 (ab271023, Abcam) antibodies. After overnight incubation at room temperature with biotinylated secondary antibodies and a 30-minute incubation at 37 °C with horseradish peroxidase, sections were developed using DAB (OptiView Kit, Roche Diagnostics). Digital images of the tissue sections were analyzed using ImageJ software (version 2.1.0, [86]https://imagej.nih.gov/ij/). Analysis began by manually defining regions of interest (ROIs) and adjusting thresholds to identify positive staining. The immunohistochemistry score (IHC score) was calculated as the percentage of the positively stained area of the ROI, accurately assessing protein expression. Drug sensitivity analysis Data was obtained from the CellMiner database ([87]https://discover.nci.nih.gov/cellminer/home.do), with gene expression data derived from RNA sequencing of cell lines and drug compound activity measured by growth inhibition Z-scores. Expression levels were divided into high and low groups based on the median expression. Pearson correlation coefficients were used to determine the correlation between drug compound activity and gene expression. Analysis included 218 FDA-approved drugs and 574 drugs in clinical trials; drugs were selected based on a significance threshold of P-value < 0.05 (Table [88]S4). Statistical analysis Pearson correlation analysis was used to explore the relationship between the expression levels of the hub genes. Almost all bioinformatics and statistical analyses were performed using R software. A significance level of a two-sided p-value less than 0.05 was adopted. GraphPad Prism 8.0 (GraphPad Software, CA, USA) was used for statistical analysis, and unpaired t-tests were conducted to assess differential gene expression. Receiver Operating Characteristic (ROC) curve analysis was applied to evaluate the diagnostic value of hub genes. P-value < 0.05 was defined as statistically significant. Results Identification of DEGCUs Microarray data for breast cancer ([89]GSE42568) was sourced from the GEO database, yielding 900 differentially expressed genes (DEGs). From the Genecard database, 1036 Copper death-related genes were identified. An intersection of these datasets revealed 85 differentially expressed genes related to Copper death (DEGCUs), detailed in Fig. [90]2A-C. Fig. 2. [91]Fig. 2 [92]Open in a new tab Analysis of Differentially Expressed Genes (DEGs) and Immune Infiltration. A The volcano plot presents DEGs in dataset [93]GSE42568. B A heatmap displays the pattern of these DEGs. C Venn diagrams illustrate the overlap of DEGs with genes related to Copper death (DEGCUs). D Immune infiltration analysis explores the correlation between 22 types of immune infiltrating lymphocytes and DEGCUs in [94]GSE42568, comparing control (C) and patient (P) samples. Immune infiltration analysis The CIBERSORT algorithm was used to demonstrate that eight types of immune cell populations in breast cancer samples and normal samples (including CD8 + T cells, activated NK cells, M0 macrophages, M1 macrophages, M2 macrophages, resting dendritic cells, activated dendritic cells, and resting mast cells) are correlated. The results were statistically significant (P-value < 0.05) (Fig. [95]2D). Weighted co-expression network construction and core module selection The heatmap delineates module-trait correlations in breast cancer weighted co-expression networks. Diverse co-expression modules are color-coded, with corresponding correlation coefficients and p-values. The horizontal axis categorizes samples into control (‘c’) and patient (‘p’) groups. Notably, the turquoise module, comprising 2799 genes, exhibited the strongest correlation with breast cancer (Fig. [96]3A-C). Fig. 3. [97]Fig. 3 [98]Open in a new tab Construction of weighted gene co-expression networks. A Sample clustering dendrogram with tree leaves corresponding to individual samples. B Shows the original and combined modules under the clustering tree. C Heat map of module-trait relationships, where each color represents a co-expression module and the values represent module-trait correlation coefficients and p-values. It can be seen that the ME turquoise module has the highest correlation with breast cancer. WGCNA, weighted gene co-expression networks analysis Acquisition of 79 DEGCUs and Identification of hub genes by machine learning The identified DEGCUs comprising 79 genes, overlap between the 85 DEGCUs set and the turquoise module in WGCNA analysis (Fig. [99]4A). Functional enrichment using GO and KEGG highlighted their roles in biological processes such as peptide hormone response and localization in cellular components like vesicular lumens. Molecular Function analysis associated these genes with receptor activities and protein C-terminus binding (Fig. [100]4B). KEGG pathway analysis indicated their involvement in pathways like cancer proteoglycans, lipid metabolism, and the FoxO signaling pathway (Fig. [101]4C).Using the LASSO regression algorithm, 13 genes were extracted from the 79 DEGCUs (Fig. [102]5A). Next, the SVM-RFE algorithm identified 18 genes (Fig. [103]5B). Then, 32 genes were selected through the RF (Random Forest) algorithm (Fig. [104]5C). Subsequently, a Venn diagram was used to visualize the intersecting genes, ultimately yielding 8 genes, namely ANK2, ATPB1, ESD, FN1, GRHL2, MT1M, and PKM (Fig. [105]5D). Fig. 5. [106]Fig. 5 [107]Open in a new tab Screening hub genes by machine learning. A LASSO regression algorithm. B SVM-RFE algorithm. C RF algorithm. D Venn diagrams for three algorithms. LASSO, Least Absolute Shrinkage and Selection Operator; SVM-RFE, Support Vector Machine-Recursive Feature Elimination; RF, Random Forest Analysis of hub gene interaction We evaluated the correlation among hub genes using the Pearson correlation coefficient. Spearman correlation analysis revealed a significant positive correlation between ESD and ANK2, PKM, ATP8B1, GRHL2, and TOP2A, as well as a negative correlation between ANK2 and GRHL2, TOP2A, FN1, and between FN1 and ESD (Figure [108]S1A). To further investigate these relationships, we constructed a Gene–gene interaction (GGI) network of the hub genes using the GeneMANIA database. This network illustrates the interactions among genes in our list, forming a composite of selected networks from the database to optimally connect related genes. In the network, nodes symbolize genes, while links represent network connections, indicating that genes may be interconnected through multiple network types (Figure [109]S1B). Expression of hub genes and validation of external datasets We also compared the transcriptional levels of hub genes between breast cancer and normal tissues using GEPIA (Figure [110]S2). We found that ATPB1, FN1, CRHL2, PKM, and TOP2A are downregulated in tumor tissues, while ANK1 and MT1M mRNA levels are significantly upregulated(P-value < 0.05 ). Diagnostic and prognostic values of hub genes in breast cancer patients Hub genes with an Area Under the Curve (AUC) value greater than 0.9 were considered significant diagnostic markers. Results from the [111]GSE42568 dataset indicated that the AUC values were as follows: ANK2, 0.963; ATP8B1, 0.911; ESD, 0.920; FN1, 0.923; GRHL2, 0.970; MT1M, 0.917; PKM, 0.951; and TOP2A, 0.931 (Fig. [112]6A). As depicted in Figure [113]S3, MT1M demonstrated a positive correlation with the overall survival (OS) of breast cancer patients, evidenced by a hazard ratio (HR) of 0.65 (95% CI: 0.45–0.93, P = 0.018). Conversely, GRHL2 and PKM exhibited negative correlations, with HRs of 1.69 (95% CI: 1.23–2.33, P = 0.0011) and 1.51 (95% CI: 1.09–2.09, P = 0.012), respectively. We identified that only MT1M, GRHL2, and PKM show differential expression in breast cancer and possess significant diagnostic and prognostic value. Fig. 6. [114]Fig. 6 [115]Open in a new tab ROC Analysis, GSEA, and PCR Validation. A ROC curve analysis. Assesses the predictive values of these genes using ROC curves from the [116]GSE42568 dataset. B GSEA analysis of hub genes. C Real-Time PCR. Quantifies mRNA levels of the genes in breast cancer cells versus normal cells, with statistical significance marked as *P-value < 0.05, **P-value < 0.01, ***P-value < 0.001;'ns' for no significant difference GSEA analysis According to the GSEA results, the high expression group of GRHL2 primarily focuses on African trypanosomiasis and allograft rejection. The high expression group of MT1M mainly concentrates on asthma and axon guidance. Lastly, the high expression group of PKM primarily centers around 2-oxocarboxylic acid metabolism and collecting duct acid secretion (Fig. [117]6B). Correlation of hub genes with immune infiltration in breast cancer This study utilized the TIMER database to investigate the correlation between the expression of three central genes and immune cell infiltration in breast cancer (Figure [118]S4). Expression of GRHL2 showed a positive correlation with the infiltration of CD8 + T cells, macrophages, and neutrophils in breast cancer. PKM2 expression was positively correlated with the infiltration of B cells, neutrophils, and dendritic cells, but negatively correlated with the infiltration of CD4 + T cells in breast cancer. Moreover, infiltration of CD8 + T cells, CD4 + T cells, neutrophils, and dendritic cells showed a positive correlation with the expression of MT1M in breast cancer patients. RT-PCR and immunohistochemistry analysis PCR analysis revealed elevated PKM expression and reduced MT1M and GRHL2 expressions in MDA-MB-231 breast cancer cells compared to normal MCF-10 A cells (Fig. [119]6C). Immunohistochemical (IHC) analyses demonstrated high PKM and GRHL2 expressions in cancerous tissues, with MT1M primarily found in adjacent non-cancerous tissues (Fig. [120]7). Consistently, the CCLE database indicated widespread upregulation of PKM and variable expressions of MT1M and GRHL2 across various breast cancer cell lines (Figure S5). Fig. 7. [121]Fig. 7 [122]Open in a new tab Immunohistochemical Staining Results and Statistical Analysis of Key Genes in Infiltrating Breast Cancer. A Immunohistochemical images and local magnification of infiltrating breast cancer and adjacent non-cancerous tissues. B Statistical results of gene expression in infiltrating breast cancer and adjacent non-cancerous tissues Regulatory network construction To construct the regulatory network, we predicted miRNAs and transcription factors (TFs) on MT1M, GRHL2, PKM. The visualization of the regulatory network was achieved using Cytoscape (version 3.7.1). Utilizing the JASPAR database, we identified several transcription factors (TFs). Among these, 1 transcription factor had a degree (node connectivity) of ≥ 2 (Table [123]S2). Potential miRNAs were predicted using the TarBase database, resulting in 9 miRNAs with a degree of ≥ 2 (Figure S6). Drug sensitivity analysis This study evaluated the association between three genes (MT1M, GRHL2, PKM) and the sensitivity to specific drugs (aloin, AZD-9496, GSK-2126458). The results showed that the drug activity Z-scores of the high-expression groups for MT1M and GRHL2 were significantly higher than those of the low-expression groups, and they were significantly positively correlated with drug sensitivity (Cor = 0.609, P-value < 0.0001; Cor = 0.563, P-value < 0.00001). The correlation for PKM was weaker (Cor = 0.395, P = 0.0018), suggesting a potential association between the expression of these copper-related genes and drug sensitivity in breast cancer. However, since no experimental validation or molecular docking was conducted, these observations should be considered preliminary and hypothesis-generating, warranting further experimental investigation (Figure S7). Discussion Breast cancer is a common and dangerous tumor affecting women. A deep understanding of its pathogenesis is essential for better prognosis and new treatment pathways. The use of machine learning algorithms and other bioinformatics methods to identify biomarkers of disease has been employed in various illnesses, such as osteoarthritis [[124]36], breast cancer [[125]37, [126]38]and gastric cancer [[127]39]. “Copper death,” a cellular mortality mechanism instigated by copper, has emerged as a potential new target for cancer treatment. Additionally, immune cell infiltration is linked to various responses to breast cancer prognosis and treatment [[128]40]. Therefore, it is rational to explore the relationship between Copper death and breast cancer using machine learning and other techniques. The aim of this study is to identify potential Copper death-related genes in breast cancer and to explore the roles and mechanisms of these genes in immune infiltration within breast cancer tissues, offering new insights and directions for the potential mechanisms and early diagnosis of breast cancer. In this study, we retrieved 85 disease-associated differentially expressed gene clusters (DEGCUs) from the transcriptomic dataset GSE 42,568 in the GEO database. Through GO and KEGG enrichment analyses, we found that these genes are primarily enriched in the “proteoglycans in cancer” pathway. Additionally, employing machine learning, ROC analysis, and survival curve analysis, we identified that only MT1M, GRHL2, and PKM show differential expression in breast cancer and possess significant diagnostic and prognostic value. These genes are not only statistically significant but also play crucial roles in the biological processes of the disease, demonstrating strong potential for clinical application. We conducted experimental validations using qRT-PCR and immunohistochemistry (IHC), and the results were largely consistent with our bioinformatics findings, further reinforcing the accuracy of our research model. Importantly, we discovered that changes in the expression of these genes not only affect the infiltration of immune cells but also alter the sensitivity to certain specific drugs. This suggests that these genes may influence the progression and treatment response of breast cancer by modulating the immune microenvironment and drug response pathways. Through these analyses and validations, we have not only confirmed the biological significance of these genes but also highlighted their potential diagnostic and therapeutic value in clinical applications, providing a scientific basis for future clinical trials and treatment strategies. MT1M is a member of the metallothionein family and is also highly involved in cancer, demonstrating inhibitory effects [[129]41–[130]48]. MT1M has been shown to regulate cell apoptosis, proliferation, migration, invasion, and epithelial-mesenchymal transition (EMT) through its impact on signaling pathways such as Bcl 2, Bax, Caspase-3, and the NF-kB pathway [[131]42, [132]45].Therefore, MT1M can function as a tumor suppressor. In this study, we found that MT1M might be associated with Copper death and immune evasion. Moreover, MT1M expression is reduced in breast cancer patient tissues and cells, and it possesses significant diagnostic accuracy (with AUC > 0.90). Survival analysis indicates that high expression of MT1M positively correlates with patient prognosis, suggesting that MT1M might serve as a diagnostic and prognostic biomarker for breast cancer. This is consistent with previous research findings. GRHL2 belongs to the Grainyhead-like (GRHL) family [[133]49, [134]50]. It is associated with cancer development. Some studies suggest that GRHL2 has tumor-suppressing effects, countering epithelial-mesenchymal transition by upregulating epithelial markers or downregulating mesenchymal markers [[135]51–[136]53]. However, other research indicates that GRHL2 might enhance proliferation, replication potential, and evade cell apoptosis by activating the ErbB3 gene and inhibiting the expression of apoptotic receptors, suggesting it may also have oncogenic functions [[137]54–[138]57]. In our study, data retrieved from the TCGA + GTEx database and IHC indicates that GRHL2 is highly expressed in breast cancer tissues. High expression of GRHL2 negatively correlates with patient prognosis. However, cell experiments revealed that the expression of GRHL2 in breast epithelial cells, MCF10A, is higher than in the breast cancer cell line, MDA-MB-231. Thus, we cannot exclude the possibility that its expression in cancer tissues might be elevated in cells other than cancer cells, including immune cells. This elevated expression in immune cells also seems to potentially influence prognosis. PKM is the rate-limiting enzyme in the final step of aerobic glycolysis [[139]58, [140]59]. participates in the regulation of various biological processes, including cell proliferation, differentiation, migration, inflammation, signal transduction, and synthetic and catabolic pathways [[141]60–[142]62]. Knocking out PKM can inhibit the proliferation of several different types of tumor cells, leading to their apoptosis [[143]61–[144]63]. Our findings align with prior research, showing elevated PKM expression in breast cancer with negative OS correlation. Moreover. Aerobic glycolysis, as a novel therapeutic target in cancer, including in pancreatic cancer [[145]64], and liver cancer [[146]65], consistent with our GSEA results. However, previous research has never explored its function and significance in Copper death and immunotherapy. We discovered that this gene occupies a central position in the Copper death-related GGI analysis and is enriched with the most relevant functions. Therefore, the association between aerobic glycolysis and Copper death might be an innovative aspect worth exploring in the future. The aforementioned three pivotal genes are interconnected, playing a crucial role in Copper death and immunity within the context of breast cancer. Utilizing both WCGNA and cutting-edge machine learning methodologies, this research delved into an in-depth analysis of these genes within tumor and immune cells. The findings present a more holistic and trustworthy perspective compared to conventional methodologies. Future research endeavors will be constructed upon these three genes. Our study conducted a systematic analysis using bioinformatics approaches; however, we acknowledge several important limitations. First, the research relied heavily on publicly available transcriptomic datasets, and the functional roles of the identified hub genes (such as MT1M, GRHL2, and PKM) require further validation through in vitro and in vivo experiments to enhance the biological interpretability and translational potential of our findings. Second, in applying machine learning methods such as LASSO and SVM-RFE, we did not perform systematic hyperparameter tuning (e.g., grid search or random search for kernel type, C value, and gamma in SVM). Instead, default parameters were used. Although 10-fold cross-validation was incorporated into the SVM-RFE procedure to enhance the stability of feature selection, mimicking the behavior of RFECV, this manual implementation still presents limitations in terms of model robustness. Given that the performance of SVM is highly dependent on kernel functions and parameter settings, the absence of comprehensive optimization may introduce potential bias into the results. Finally, in the qRT-PCR validation, we used the 2^(-ΔΔCt) method to calculate relative expression levels, but did not experimentally validate the amplification efficiencies of the target and reference (GAPDH) genes. This method assumes comparable and near-100% amplification efficiencies between genes. Although previous studies suggest that differences in commonly used primers are usually minor, this unverified assumption may still introduce slight deviations. Moreover, while GAPDH is a widely used reference gene, its expression stability across experimental groups was not statistically assessed in our study, which may represent another source of variability. These methodological limitations suggest that future studies should incorporate experimental validation of primer efficiency, use multiple stable reference genes, and adopt comprehensive hyperparameter optimization strategies in machine learning analyses to improve the robustness and reproducibility of results. Conclusions This study accurately identified and recognized MT1M, GRHL2, and PKM—three key genes closely associated with Copper death and immune response in breast cancer. These genes play a crucial role in the progression of breast cancer and, as potential tumor biomarkers, hold significant importance for clinical translation. Supplementary Information Below is the link to the electronic supplementary material. [147]Supplementary Material 1^ (106.7MB, docx) [148]Supplementary Material 2^ (94.4KB, xlsx) Acknowledgements