Abstract

Background

   Breast cancer is the most prevalent and deadly cancer among women
   globally, necessitating more effective diagnostic and therapeutic
   approaches. This study aims to explore new treatment targets and
   diagnostic tools.

Methods

   Employing machine learning techniques and utilizing PCR, IHC
   technologies, and multiple databases, we identified and validated genes
   closely linked with breast cancer and copper-induced cell death. We
   then explored how their expression levels impact cancer diagnosis,
   prognosis, immune cell infiltration, and drug sensitivity.

Results

   This investigation identified three crucial genes—MT1M, GRHL2, and
   PKM—intimately associated with the copper death mechanism in breast
   cancer pathology. Validated through comprehensive analysis across
   cells, tissue models, and diverse databases, these genes showed
   significant differential expression (P-value < 0.05), affirming their
   pivotal role in enhancing diagnostic accuracy (AUC values: 0.917,
   0.970, 0.951) and prognostic assessment (HR = 0.65, P = 0.018;
   HR = 1.69, P = 0.0011; HR = 1.51, P = 0.012) in breast cancer.
   Additionally, their expression levels influence the infiltration of
   immune cells and the sensitivity to certain drugs.

Conclusion

   MT1M, GRHL2, and PKM are novel diagnostic and therapeutic targets for
   breast cancer. These findings enhance prognostic evaluations, deepen
   our understanding of its mechanisms.

Supplementary Information

   The online version contains supplementary material available at
   10.1007/s12672-025-03340-2.

   Keywords: Biomarker, Breast cancer, Copper death, Immune infiltration,
   Machine learning

Introduction

   Breast cancer is the most common malignant tumor in women, with the
   highest incidence and mortality rates [[38]1–[39]4]. Scientists have
   developed comprehensive biological markers for breast cancer, including
   ER, PR and HER2 [[40]5–[41]8], reliable predictive factors are still
   lacking for patients in advanced stages [[42]1]. Therefore, finding
   effective biomarkers to assess the prognosis of breast cancer patients
   and exploring new approaches of breast cancer treatment are
   particularly important. ‘Copper death’ and ferroptosis [[43]9, [44]10]
   are similar as both represent specialized forms of cell death, distinct
   from common apoptosis and autophagy. ‘Copper death’ specifically refers
   to cell death induced by copper and includes direct effects such as
   oxidative stress, as well as indirect mechanisms like copper poisoning
   [[45]11]. This process can lead to tumor cell death and affect tumor
   growth [[46]12–[47]15], making copper-related factors promising targets
   for cancer therapy [[48]16–[49]18].The role of copper death-related
   genes in breast cancer diagnosis, prognosis, and immune response
   remains to be fully understood. This study aims to use machine learning
   to pinpoint key genes and assess their relevance in breast cancer,
   offering fresh perspectives for treatment and diagnosis strategies. As
   illustrated by the specific research route shown in Fig. [50]1.

Fig. 1.

   [51]Fig. 1
   [52]Open in a new tab

   Flowchart for research

Materials and methods

Screening and processing of gene expression dataset

   We retrieved dataset [53]GSE42568 ([54]GPL571) from the GEO database
   ([55]https://www.ncbi.nlm.nih.gov/geo/), which includes 104 breast
   cancer samples and 17 normal breast biopsy specimens. Data
   normalization was performed using the ‘Limma’ package in R, with probe
   IDs converted to gene symbols based on the platform’s information. From
   the Genecard database, we selected 1038 genes related to Copper death
   (Table [56]S1), using a correlation score threshold of greater than 7.
   Further analysis with the ‘Limma’ package identified differentially
   expressed genes (DEGs) in [57]GSE42568, applying criteria of|log2 fold
   change (FC)| >1.5 and FDR-adjusted P-value < 0.05. To control for
   multiple testing, p-values were adjusted using the Benjamini-Hochberg
   method. Genes with log FC > 1.5 and FDR-adjusted P-value < 0.05 were
   classified as up-regulated, while those with log FC < -1.5 and
   FDR-adjusted P-value < 0.05 were considered down-regulated. For
   visualization, we employed the ‘Pheatmap’ and ‘ggplot2’ packages in R
   to create heatmaps and volcano plots of these DEGs, respectively.

Analysis of immune infiltration, gene interactions, and regulatory networks

   We utilized the CIBERSORT [[58]19] algorithm to analyze immune cell
   composition in breast cancer and normal samples from the [59]GSE42568
   dataset. The correlation between immune cells and key genes was
   assessed using the TIMER resource
   ([60]https://cistrome.shinyapps.io/timer/) [[61]20], with significance
   set at P-value < 0.05. GeneMANIA ([62]http://genemania.org) [[63]21,
   [64]22] was employed to construct a functional association network
   among hub genes. Unlike tools dedicated solely to PPI such as STRING,
   GeneMANIA integrates various data types—including co-expression,
   genetic interaction, pathway, and PPI—providing a broader view of
   functional relationships., and the GEPIA platform
   ([65]http://gepia.cancer-pku.cn/) [[66]23] helped analyze the
   relationship between hub gene expression and tumor profiles in breast
   cancer. NetworkAnalyst ([67]https://www.networkanalyst.ca/) [[68]24]
   was used for regulatory network construction, incorporating predictions
   from the JASPAR [[69]25] and TarBase [[70]26] databases. Survival
   analysis was conducted using the Kaplan-Meier plotter
   ([71]http://kmplot.com/analysis/) [[72]27], evaluating the impact of
   hub genes on breast cancer patient survival through hazard ratios (HR)
   and log-rank p-values. This comprehensive approach facilitates a deeper
   understanding of the molecular mechanisms in breast cancer.

Weighted gene co-expression analysis

   “WGCNA” package [[73]28]construct co-expression networks and modules
   for control and patient groups. The “pickSoftThreshold” function was
   employed to filter and validate the optimal Soft Threshold b = 7
   (Fig. [74]4B) [[75]29]. A hierarchical clustering tree was constructed
   using a dynamic tree-cutting approach. Each branch represents a module
   combining all genes with similar expression levels, and each leaf
   denotes an individual gene (Fig. [76]4C, D). Subsequently, the Pearson
   correlation coefficient was employed to establish an adjacency matrix,
   which was then transformed into a topology overlap matrix (TOM) with
   appropriate power values and topological anisotropy (1-TOM). Gene
   modules were delineated through hierarchical clustering based on the
   I-TOM dissimilarity matrix, continuously allocating genes into 33
   colored modules for analyzing module-clinical trait (control and
   patient) co-expression similarity and adjacency.

Fig. 4.

   [77]Fig. 4
   [78]Open in a new tab

   Acquisition and functional enrichment analysis of 79 DEGCUs. A Venn
   diagrams of 79 DEGCUs. B, C The results of GO and KEGG analysis are
   displayed by bubble plots, respectively. GO, Gene Ontology; KEGG, Kyoto
   Encyclopedia of Genes and Genomes; BP, biological process; CC, cellular
   component; MF, molecular function

Functional and pathway enrichment analysis

   Gene Ontology (GO) [[79]30] and Kyoto Encyclopedia of Genes and Genomes
   (KEGG) [[80]31] functional enrichment analyses were conducted using the
   “clusterProfiler” R package in R. A q-value threshold of < 0.05 was set
   as the selection criteria to evaluate biological processes (BP),
   molecular functions (MF), cellular components (CC), and gene signaling
   pathways associated with the genes.

Hub gene selection via machine learning

   The Least Absolute Shrinkage and Selection Operator (LASSO) logistic
   regression, performed using the “glmnet” package, is a data mining
   technique. By applying the L1 penalty (λ), it sets the coefficients of
   unimportant variables to zero, allowing the selection of crucial
   variables and the construction of the optimal classification model
   [[81]32]. LASSO was chosen over Ridge and Elastic Net regression due to
   its automatic feature selection capability through L1 regularization,
   which effectively identifies a small subset of key genes from
   high-dimensional data while providing superior
   interpretability—characteristics particularly suited for biomarker
   discovery in genomic studies. Support Vector Machine-Recursive Feature
   Elimination (SVM-RFE) is a common machine learning approach based on
   embedded methods, identifying the best core genes by discarding feature
   vectors produced by SVM [[82]33]. In this study, SVM-RFE was
   implemented with a 10-fold cross-validation strategy. The dataset was
   randomly divided into ten folds, and recursive feature elimination was
   applied within each fold to rank genes. The top-ranked genes were then
   evaluated across subsets of 1 to 40 features to determine the optimal
   feature number by examining error rate and accuracy. This approach
   mimics the functionality of RFECV. The SVM classifier was implemented
   using the e1071 R package with a linear kernel and default parameters
   (C = 1, gamma = 0.1). Random Forest (RF) analysis is a machine learning
   method rooted in decision trees, primarily identifying optimal
   variables through the R package e1071, scoring the significance of each
   variable to assess their importance [[83]34]. Hub genes are defined as
   the intersection of genes filtered by the three machine learning
   methods using the R package Venn.

Gene set enrichment analysis

   We performed single-gene Gene Set Enrichment Analysis (GSEA) [[84]35]
   to further investigate the potential mechanisms through which the hub
   genes impact breast cancer.

Quantitative RT–PCR

   The normal human mammary epithelial cell line MCF10A were obtained from
   Cellcook Biotech Company (Guangzhou, China). and the human breast
   cancer cell line MDA-MB-231 were obtained from Procell Life Science &
   Technology (Wuhan, China). Total RNA was extracted and purified using
   the RNA simple Total RNA Kit (Tiangen Biotech). For reverse
   transcription, the FastKing RT Kit (with gDNase) (Tiangen Biotech) was
   employed. qRT-PCR was performed according to the manufacturer’s
   instructions using the SuperReal PreMix Plus (SYBR Green) from Tiangen
   Biotech. Each experiment for every sample was repeated at least three
   times. The 2(-ΔΔCt) method was used to calculate relative expression
   levels, normalized to GAPDH. Differences in gene expression were
   statistically tested using the Student’s t-test. Results were analyzed
   and charts were created using GraphPad Prism (version 8.0).
   (*P-value < 0.05, ** P-value < 0.01, and *** P-value < 0.001). The
   sequences of the primers used are shown in Supplementary Table [85]S3.
   In addition, we collected more information on breast tumor cell lines
   from the Cancer Cell Line Encyclopedia (CCLE) database.

Immunohistochemistry

   All patients from Harbin Medical University Cancer Hospital provided
   written informed consent. None of the breast cancer patients had
   received chemotherapy, radiotherapy, or any medical interventions
   before surgery. Post-surgery, all were pathologically diagnosed with
   invasive breast cancer. Tissue sections were prepared by paraffin
   embedding, heated at 58 °C for one hour, and treated for antigen
   retrieval. They were then incubated with MT1M (17281-1-AP,
   Proteintech), PKM (10078-2-AP, Proteintech), and GRHL2 (ab271023,
   Abcam) antibodies. After overnight incubation at room temperature with
   biotinylated secondary antibodies and a 30-minute incubation at 37 °C
   with horseradish peroxidase, sections were developed using DAB
   (OptiView Kit, Roche Diagnostics). Digital images of the tissue
   sections were analyzed using ImageJ software (version 2.1.0,
   [86]https://imagej.nih.gov/ij/). Analysis began by manually defining
   regions of interest (ROIs) and adjusting thresholds to identify
   positive staining. The immunohistochemistry score (IHC score) was
   calculated as the percentage of the positively stained area of the ROI,
   accurately assessing protein expression.

Drug sensitivity analysis

   Data was obtained from the CellMiner database
   ([87]https://discover.nci.nih.gov/cellminer/home.do), with gene
   expression data derived from RNA sequencing of cell lines and drug
   compound activity measured by growth inhibition Z-scores. Expression
   levels were divided into high and low groups based on the median
   expression. Pearson correlation coefficients were used to determine the
   correlation between drug compound activity and gene expression.
   Analysis included 218 FDA-approved drugs and 574 drugs in clinical
   trials; drugs were selected based on a significance threshold of
   P-value < 0.05 (Table [88]S4).

Statistical analysis

   Pearson correlation analysis was used to explore the relationship
   between the expression levels of the hub genes. Almost all
   bioinformatics and statistical analyses were performed using R
   software. A significance level of a two-sided p-value less than 0.05
   was adopted. GraphPad Prism 8.0 (GraphPad Software, CA, USA) was used
   for statistical analysis, and unpaired t-tests were conducted to assess
   differential gene expression. Receiver Operating Characteristic (ROC)
   curve analysis was applied to evaluate the diagnostic value of hub
   genes. P-value < 0.05 was defined as statistically significant.

Results

Identification of DEGCUs

   Microarray data for breast cancer ([89]GSE42568) was sourced from the
   GEO database, yielding 900 differentially expressed genes (DEGs). From
   the Genecard database, 1036 Copper death-related genes were identified.
   An intersection of these datasets revealed 85 differentially expressed
   genes related to Copper death (DEGCUs), detailed in Fig. [90]2A-C.

Fig. 2.

   [91]Fig. 2
   [92]Open in a new tab

   Analysis of Differentially Expressed Genes (DEGs) and Immune
   Infiltration. A The volcano plot presents DEGs in dataset [93]GSE42568.
   B A heatmap displays the pattern of these DEGs. C Venn diagrams
   illustrate the overlap of DEGs with genes related to Copper death
   (DEGCUs). D Immune infiltration analysis explores the correlation
   between 22 types of immune infiltrating lymphocytes and DEGCUs in
   [94]GSE42568, comparing control (C) and patient (P) samples.

Immune infiltration analysis

   The CIBERSORT algorithm was used to demonstrate that eight types of
   immune cell populations in breast cancer samples and normal samples
   (including CD8 + T cells, activated NK cells, M0 macrophages, M1
   macrophages, M2 macrophages, resting dendritic cells, activated
   dendritic cells, and resting mast cells) are correlated. The results
   were statistically significant (P-value < 0.05) (Fig. [95]2D).

Weighted co-expression network construction and core module selection

   The heatmap delineates module-trait correlations in breast cancer
   weighted co-expression networks. Diverse co-expression modules are
   color-coded, with corresponding correlation coefficients and p-values.
   The horizontal axis categorizes samples into control (‘c’) and patient
   (‘p’) groups. Notably, the turquoise module, comprising 2799 genes,
   exhibited the strongest correlation with breast cancer (Fig. [96]3A-C).

Fig. 3.

   [97]Fig. 3
   [98]Open in a new tab

   Construction of weighted gene co-expression networks. A Sample
   clustering dendrogram with tree leaves corresponding to individual
   samples. B Shows the original and combined modules under the clustering
   tree. C Heat map of module-trait relationships, where each color
   represents a co-expression module and the values represent module-trait
   correlation coefficients and p-values. It can be seen that the ME
   turquoise module has the highest correlation with breast cancer. WGCNA,
   weighted gene co-expression networks analysis

Acquisition of 79 DEGCUs and Identification of hub genes by machine learning

   The identified DEGCUs comprising 79 genes, overlap between the 85
   DEGCUs set and the turquoise module in WGCNA analysis (Fig. [99]4A).
   Functional enrichment using GO and KEGG highlighted their roles in
   biological processes such as peptide hormone response and localization
   in cellular components like vesicular lumens. Molecular Function
   analysis associated these genes with receptor activities and protein
   C-terminus binding (Fig. [100]4B). KEGG pathway analysis indicated
   their involvement in pathways like cancer proteoglycans, lipid
   metabolism, and the FoxO signaling pathway (Fig. [101]4C).Using the
   LASSO regression algorithm, 13 genes were extracted from the 79 DEGCUs
   (Fig. [102]5A). Next, the SVM-RFE algorithm identified 18 genes
   (Fig. [103]5B). Then, 32 genes were selected through the RF (Random
   Forest) algorithm (Fig. [104]5C). Subsequently, a Venn diagram was used
   to visualize the intersecting genes, ultimately yielding 8 genes,
   namely ANK2, ATPB1, ESD, FN1, GRHL2, MT1M, and PKM (Fig. [105]5D).

Fig. 5.

   [106]Fig. 5
   [107]Open in a new tab

   Screening hub genes by machine learning. A LASSO regression algorithm.
   B SVM-RFE algorithm. C RF algorithm. D Venn diagrams for three
   algorithms. LASSO, Least Absolute Shrinkage and Selection Operator;
   SVM-RFE, Support Vector Machine-Recursive Feature Elimination; RF,
   Random Forest

Analysis of hub gene interaction

   We evaluated the correlation among hub genes using the Pearson
   correlation coefficient. Spearman correlation analysis revealed a
   significant positive correlation between ESD and ANK2, PKM, ATP8B1,
   GRHL2, and TOP2A, as well as a negative correlation between ANK2 and
   GRHL2, TOP2A, FN1, and between FN1 and ESD (Figure [108]S1A). To
   further investigate these relationships, we constructed a Gene–gene
   interaction (GGI) network of the hub genes using the GeneMANIA
   database. This network illustrates the interactions among genes in our
   list, forming a composite of selected networks from the database to
   optimally connect related genes. In the network, nodes symbolize genes,
   while links represent network connections, indicating that genes may be
   interconnected through multiple network types (Figure [109]S1B).

Expression of hub genes and validation of external datasets

   We also compared the transcriptional levels of hub genes between breast
   cancer and normal tissues using GEPIA (Figure [110]S2). We found that
   ATPB1, FN1, CRHL2, PKM, and TOP2A are downregulated in tumor tissues,
   while ANK1 and MT1M mRNA levels are significantly
   upregulated(P-value < 0.05 ).

Diagnostic and prognostic values of hub genes in breast cancer patients

   Hub genes with an Area Under the Curve (AUC) value greater than 0.9
   were considered significant diagnostic markers. Results from the
   [111]GSE42568 dataset indicated that the AUC values were as follows:
   ANK2, 0.963; ATP8B1, 0.911; ESD, 0.920; FN1, 0.923; GRHL2, 0.970; MT1M,
   0.917; PKM, 0.951; and TOP2A, 0.931 (Fig. [112]6A). As depicted in
   Figure [113]S3, MT1M demonstrated a positive correlation with the
   overall survival (OS) of breast cancer patients, evidenced by a hazard
   ratio (HR) of 0.65 (95% CI: 0.45–0.93, P = 0.018). Conversely, GRHL2
   and PKM exhibited negative correlations, with HRs of 1.69 (95% CI:
   1.23–2.33, P = 0.0011) and 1.51 (95% CI: 1.09–2.09, P = 0.012),
   respectively. We identified that only MT1M, GRHL2, and PKM show
   differential expression in breast cancer and possess significant
   diagnostic and prognostic value.

Fig. 6.

   [114]Fig. 6
   [115]Open in a new tab

   ROC Analysis, GSEA, and PCR Validation. A ROC curve analysis. Assesses
   the predictive values of these genes using ROC curves from the
   [116]GSE42568 dataset. B GSEA analysis of hub genes. C Real-Time PCR.
   Quantifies mRNA levels of the genes in breast cancer cells versus
   normal cells, with statistical significance marked as *P-value < 0.05,
   **P-value < 0.01, ***P-value < 0.001;'ns' for no significant difference

GSEA analysis

   According to the GSEA results, the high expression group of GRHL2
   primarily focuses on African trypanosomiasis and allograft rejection.
   The high expression group of MT1M mainly concentrates on asthma and
   axon guidance. Lastly, the high expression group of PKM primarily
   centers around 2-oxocarboxylic acid metabolism and collecting duct acid
   secretion (Fig. [117]6B).

Correlation of hub genes with immune infiltration in breast cancer

   This study utilized the TIMER database to investigate the correlation
   between the expression of three central genes and immune cell
   infiltration in breast cancer (Figure [118]S4). Expression of GRHL2
   showed a positive correlation with the infiltration of CD8 + T cells,
   macrophages, and neutrophils in breast cancer. PKM2 expression was
   positively correlated with the infiltration of B cells, neutrophils,
   and dendritic cells, but negatively correlated with the infiltration of
   CD4 + T cells in breast cancer. Moreover, infiltration of CD8 + T
   cells, CD4 + T cells, neutrophils, and dendritic cells showed a
   positive correlation with the expression of MT1M in breast cancer
   patients.

RT-PCR and immunohistochemistry analysis

   PCR analysis revealed elevated PKM expression and reduced MT1M and
   GRHL2 expressions in MDA-MB-231 breast cancer cells compared to normal
   MCF-10 A cells (Fig. [119]6C). Immunohistochemical (IHC) analyses
   demonstrated high PKM and GRHL2 expressions in cancerous tissues, with
   MT1M primarily found in adjacent non-cancerous tissues (Fig. [120]7).
   Consistently, the CCLE database indicated widespread upregulation of
   PKM and variable expressions of MT1M and GRHL2 across various breast
   cancer cell lines (Figure S5).

Fig. 7.

   [121]Fig. 7
   [122]Open in a new tab

   Immunohistochemical Staining Results and Statistical Analysis of Key
   Genes in Infiltrating Breast Cancer. A Immunohistochemical images and
   local magnification of infiltrating breast cancer and adjacent
   non-cancerous tissues. B Statistical results of gene expression in
   infiltrating breast cancer and adjacent non-cancerous tissues

Regulatory network construction

   To construct the regulatory network, we predicted miRNAs and
   transcription factors (TFs) on MT1M, GRHL2, PKM. The visualization of
   the regulatory network was achieved using Cytoscape (version 3.7.1).
   Utilizing the JASPAR database, we identified several transcription
   factors (TFs). Among these, 1 transcription factor had a degree (node
   connectivity) of ≥ 2 (Table [123]S2). Potential miRNAs were predicted
   using the TarBase database, resulting in 9 miRNAs with a degree of ≥ 2
   (Figure S6).

Drug sensitivity analysis

   This study evaluated the association between three genes (MT1M, GRHL2,
   PKM) and the sensitivity to specific drugs (aloin, AZD-9496,
   GSK-2126458). The results showed that the drug activity Z-scores of the
   high-expression groups for MT1M and GRHL2 were significantly higher
   than those of the low-expression groups, and they were significantly
   positively correlated with drug sensitivity (Cor = 0.609,
   P-value < 0.0001; Cor = 0.563, P-value < 0.00001). The correlation for
   PKM was weaker (Cor = 0.395, P = 0.0018), suggesting a potential
   association between the expression of these copper-related genes and
   drug sensitivity in breast cancer. However, since no experimental
   validation or molecular docking was conducted, these observations
   should be considered preliminary and hypothesis-generating, warranting
   further experimental investigation (Figure S7).

Discussion

   Breast cancer is a common and dangerous tumor affecting women. A deep
   understanding of its pathogenesis is essential for better prognosis and
   new treatment pathways. The use of machine learning algorithms and
   other bioinformatics methods to identify biomarkers of disease has been
   employed in various illnesses, such as osteoarthritis [[124]36], breast
   cancer [[125]37, [126]38]and gastric cancer [[127]39]. “Copper death,”
   a cellular mortality mechanism instigated by copper, has emerged as a
   potential new target for cancer treatment. Additionally, immune cell
   infiltration is linked to various responses to breast cancer prognosis
   and treatment [[128]40]. Therefore, it is rational to explore the
   relationship between Copper death and breast cancer using machine
   learning and other techniques. The aim of this study is to identify
   potential Copper death-related genes in breast cancer and to explore
   the roles and mechanisms of these genes in immune infiltration within
   breast cancer tissues, offering new insights and directions for the
   potential mechanisms and early diagnosis of breast cancer.

   In this study, we retrieved 85 disease-associated differentially
   expressed gene clusters (DEGCUs) from the transcriptomic dataset GSE
   42,568 in the GEO database. Through GO and KEGG enrichment analyses, we
   found that these genes are primarily enriched in the “proteoglycans in
   cancer” pathway. Additionally, employing machine learning, ROC
   analysis, and survival curve analysis, we identified that only MT1M,
   GRHL2, and PKM show differential expression in breast cancer and
   possess significant diagnostic and prognostic value. These genes are
   not only statistically significant but also play crucial roles in the
   biological processes of the disease, demonstrating strong potential for
   clinical application. We conducted experimental validations using
   qRT-PCR and immunohistochemistry (IHC), and the results were largely
   consistent with our bioinformatics findings, further reinforcing the
   accuracy of our research model. Importantly, we discovered that changes
   in the expression of these genes not only affect the infiltration of
   immune cells but also alter the sensitivity to certain specific drugs.
   This suggests that these genes may influence the progression and
   treatment response of breast cancer by modulating the immune
   microenvironment and drug response pathways.

   Through these analyses and validations, we have not only confirmed the
   biological significance of these genes but also highlighted their
   potential diagnostic and therapeutic value in clinical applications,
   providing a scientific basis for future clinical trials and treatment
   strategies.

   MT1M is a member of the metallothionein family and is also highly
   involved in cancer, demonstrating inhibitory effects [[129]41–[130]48].
   MT1M has been shown to regulate cell apoptosis, proliferation,
   migration, invasion, and epithelial-mesenchymal transition (EMT)
   through its impact on signaling pathways such as Bcl 2, Bax, Caspase-3,
   and the NF-kB pathway [[131]42, [132]45].Therefore, MT1M can function
   as a tumor suppressor. In this study, we found that MT1M might be
   associated with Copper death and immune evasion. Moreover, MT1M
   expression is reduced in breast cancer patient tissues and cells, and
   it possesses significant diagnostic accuracy (with AUC > 0.90).
   Survival analysis indicates that high expression of MT1M positively
   correlates with patient prognosis, suggesting that MT1M might serve as
   a diagnostic and prognostic biomarker for breast cancer. This is
   consistent with previous research findings.

   GRHL2 belongs to the Grainyhead-like (GRHL) family [[133]49, [134]50].
   It is associated with cancer development. Some studies suggest that
   GRHL2 has tumor-suppressing effects, countering epithelial-mesenchymal
   transition by upregulating epithelial markers or downregulating
   mesenchymal markers [[135]51–[136]53]. However, other research
   indicates that GRHL2 might enhance proliferation, replication
   potential, and evade cell apoptosis by activating the ErbB3 gene and
   inhibiting the expression of apoptotic receptors, suggesting it may
   also have oncogenic functions [[137]54–[138]57]. In our study, data
   retrieved from the TCGA + GTEx database and IHC indicates that GRHL2 is
   highly expressed in breast cancer tissues. High expression of GRHL2
   negatively correlates with patient prognosis. However, cell experiments
   revealed that the expression of GRHL2 in breast epithelial cells,
   MCF10A, is higher than in the breast cancer cell line, MDA-MB-231.
   Thus, we cannot exclude the possibility that its expression in cancer
   tissues might be elevated in cells other than cancer cells, including
   immune cells. This elevated expression in immune cells also seems to
   potentially influence prognosis.

   PKM is the rate-limiting enzyme in the final step of aerobic glycolysis
   [[139]58, [140]59]. participates in the regulation of various
   biological processes, including cell proliferation, differentiation,
   migration, inflammation, signal transduction, and synthetic and
   catabolic pathways [[141]60–[142]62]. Knocking out PKM can inhibit the
   proliferation of several different types of tumor cells, leading to
   their apoptosis [[143]61–[144]63]. Our findings align with prior
   research, showing elevated PKM expression in breast cancer with
   negative OS correlation. Moreover. Aerobic glycolysis, as a novel
   therapeutic target in cancer, including in pancreatic cancer [[145]64],
   and liver cancer [[146]65], consistent with our GSEA results. However,
   previous research has never explored its function and significance in
   Copper death and immunotherapy. We discovered that this gene occupies a
   central position in the Copper death-related GGI analysis and is
   enriched with the most relevant functions. Therefore, the association
   between aerobic glycolysis and Copper death might be an innovative
   aspect worth exploring in the future.

   The aforementioned three pivotal genes are interconnected, playing a
   crucial role in Copper death and immunity within the context of breast
   cancer. Utilizing both WCGNA and cutting-edge machine learning
   methodologies, this research delved into an in-depth analysis of these
   genes within tumor and immune cells. The findings present a more
   holistic and trustworthy perspective compared to conventional
   methodologies. Future research endeavors will be constructed upon these
   three genes.

   Our study conducted a systematic analysis using bioinformatics
   approaches; however, we acknowledge several important limitations.
   First, the research relied heavily on publicly available transcriptomic
   datasets, and the functional roles of the identified hub genes (such as
   MT1M, GRHL2, and PKM) require further validation through in vitro and
   in vivo experiments to enhance the biological interpretability and
   translational potential of our findings. Second, in applying machine
   learning methods such as LASSO and SVM-RFE, we did not perform
   systematic hyperparameter tuning (e.g., grid search or random search
   for kernel type, C value, and gamma in SVM). Instead, default
   parameters were used. Although 10-fold cross-validation was
   incorporated into the SVM-RFE procedure to enhance the stability of
   feature selection, mimicking the behavior of RFECV, this manual
   implementation still presents limitations in terms of model robustness.
   Given that the performance of SVM is highly dependent on kernel
   functions and parameter settings, the absence of comprehensive
   optimization may introduce potential bias into the results. Finally, in
   the qRT-PCR validation, we used the 2^(-ΔΔCt) method to calculate
   relative expression levels, but did not experimentally validate the
   amplification efficiencies of the target and reference (GAPDH) genes.
   This method assumes comparable and near-100% amplification efficiencies
   between genes. Although previous studies suggest that differences in
   commonly used primers are usually minor, this unverified assumption may
   still introduce slight deviations. Moreover, while GAPDH is a widely
   used reference gene, its expression stability across experimental
   groups was not statistically assessed in our study, which may represent
   another source of variability. These methodological limitations suggest
   that future studies should incorporate experimental validation of
   primer efficiency, use multiple stable reference genes, and adopt
   comprehensive hyperparameter optimization strategies in machine
   learning analyses to improve the robustness and reproducibility of
   results.

Conclusions

   This study accurately identified and recognized MT1M, GRHL2, and
   PKM—three key genes closely associated with Copper death and immune
   response in breast cancer. These genes play a crucial role in the
   progression of breast cancer and, as potential tumor biomarkers, hold
   significant importance for clinical translation.

Supplementary Information

   Below is the link to the electronic supplementary material.
   [147]Supplementary Material 1^ (106.7MB, docx)
   [148]Supplementary Material 2^ (94.4KB, xlsx)

Acknowledgements