Abstract

Background

   Metabolic reprogramming and immune evasion synergistically drive breast
   carcinogenesis, but their combined impact remains unclear.

Methods

   Transcriptomic data from the TCGA and GEO cohorts were integrated.
   Differentially expressed genes were identified, followed by WGCNA to
   detect immune-correlated co-expression modules.
   Immune-metabolism-related genes (IMGs) were screened using Genecards.
   Four machine learning algorithms (LASSO, SVM, RF, XGBoost) identified
   hub genes. The diagnostic value was evaluated by Kaplan-Meier and ROC
   analysis. CIBERSORT quantified immune microenvironment associations.
   The expression profiles of genes in different cells were plotted using
   single-cell RNA data. IHC validated protein expression in clinical
   samples.

Results

   Research has found that SELENOP and PKMYT1 are key immune metabolic
   hubs. Compared with normal tissues, the expression of SELENOP was
   significantly decreased (p < 0.05), while PKMYT1 showed an upward trend
   (p < 0.05). Both of these genes have demonstrated high accuracy in the
   diagnosis of breast cancer and can effectively predict the overall
   survival period of patients. Low SELENOP expression is associated with
   high PKMYT1 expression levels, which is significantly related to
   changes in immune infiltration and the expression patterns of
   checkpoint proteins. Immunohistochemical detection further confirmed
   that these genes were significantly correlated with histological grade,
   LAG-3, CD244, ER, PR and Her-2 and other indicators (p < 0.05).

Conclusion

   SELENOP and PKMYT1 are novel immunomodulatory factors related to
   multiple pathological indicators of breast cancer and can be used as
   diagnostic biomarkers.

   Keywords: Breast cancer, WGCNA, Biomarkers, Machine learning

Highlights

     * •
       SELENOP/PKMYT1 was identified as an immune-metabolic center gene
       through comprehensive bioinformatics.
     * •
       The expression and survival impact of key genes were verified using
       dual datasets.
     * •
       The IHC of clinical samples confirmed the diagnostic and prognostic
       value.

1. Introduction

   Breast cancer (BC) is the most commonly diagnosed malignancy in women
   [[41]1], with immune evasion and metabolic reprogramming being two
   major characteristics of the disease [[42]2]. While the immune system
   eliminates transformed cells through multiple effector mechanisms
   (including cytotoxicity, phagocytosis, and cytokine-mediated
   surveillance) [[43]3], BC cells disrupt immune surveillance by
   overexpressing immunosuppressive ligands such as PD-L1. This process
   induces apoptosis and functional exhaustion of tumor-specific CD8^+ T
   cells, thereby ultimately leading to their impairment [[44]4]. Emerging
   evidence suggests that there is complex crosstalk between immune
   dysfunction and metabolic dysregulation in BC progression [[45]5]. As
   tumors develop, cancer cells reprogram their metabolism in response to
   the nutrient-poor tumor microenvironment, utilizing metabolic
   reprogramming through glycolysis, oxidative phosphorylation, amino acid
   metabolism, lipid metabolism, and other metabolic pathways to meet
   bioenergetic and biosynthetic demands [[46]6,[47]7].

   Given the core role of immune evasion and metabolic reprogramming in
   the progression of breast cancer and their complex interactions,
   understanding how metabolic reprogramming promotes cancer progression
   by influencing the immune system is of significant importance in
   unraveling the complex mechanisms of cancer. To identify the key
   molecules mediating the interaction between immune evasion and
   metabolic reprogramming in breast cancer, this study utilized TCGA and
   GEO data, applying WGCNA and machine learning algorithms to screen out
   SELENOP and PKMYT1 genes closely related to immunity and metabolism, as
   shown in [48]Fig. 1. SELENOP, primarily a selenium transporter and
   antioxidant [[49]8], is implicated in modulating inflammatory responses
   and immune cell activity within the tumor microenvironment, linking
   selenium/redox metabolism to immune regulation [[50]9,[51]10]. PKMYT1,
   a critical cell cycle regulator at the G2/M checkpoint [[52]11], has
   emerging but less defined roles beyond proliferation, with potential
   intersections in metabolic signaling and tumor microenvironment
   influence. However, their specific contributions to the interplay
   between immune evasion and metabolic reprogramming in breast cancer,
   and the resulting prognostic significance, remain poorly understood.

Fig. 1.

   [53]Fig. 1
   [54]Open in a new tab

   Research flowchart.

2. Materials and methods

2.1. Cancer data sources

   The [55]GSE58135 and [56]GSE161529 datasets were downloaded from GEO
   ([57]https://www.ncbi.nlm.nih.gov/geo/). [58]GSE58135 ([59]GPL11154
   platform) contains 168 breast tissue samples. The scRNA-seq data under
   accession number [60]GSE161529
   was selected, including 28 samples: 4 from TNBC, 6 from HER + tumors,
   and 18 from ER + tumors. Additionally, 23,931 metabolism-related genes
   were downloaded from the Genecardss database.

2.2. Identification of DEGs

   DEG analysis was performed on [61]GSE58135 using the “limma” R package
   with thresholds of |log2 fold change| > 3 and false discovery rate
   (FDR) < 0.01, resulting in 1026 significant DEGs. Visualization was
   performed using the 'ggplot2' package, with a volcano plot generated to
   illustrate the results.

2.3. WGCNA screening for immune-related genes

   Weighted Gene Co-Expression Network Analysis (WGCNA) is a systems
   biology approach used to identify co-expressed gene modules and explore
   the relationship between these modules and phenotypic traits [[62]12].
   This module-based approach efficiently identifies genes acting in
   concert with immune processes, avoiding limitations of single-gene
   analysis. First, we constructed a gene co-expression network for breast
   cancer tissue samples using the WGCNA R package, which was then divided
   into different modules. Next, we used the CIBERSORT tool to obtain the
   immune cell distribution for each sample. Finally, we calculated the
   correlation between the module eigengenes (ME) and phenotypic
   variables, selecting the module with the strongest correlation to
   immune scores.

2.4. Screening hub genes using four machine learning methods

   In this study, four machine learning algorithms were used to identify
   immune-metabolic hub genes biomarkers for breast cancer, including
   LASSO, SVM, RF, and XGBoost. LASSO is a data mining technique that
   applies an absolute value penalty to regression coefficients, shrinking
   some coefficients to zero and eliminating less important features
   [[63]13]. SVM is a supervised learning algorithm that identifies
   important disease-associated genes by analyzing support vectors and
   decision boundaries [[64]14]. RF is an ensemble learning method that
   evaluates the importance of each gene in classification, helping to
   identify genes associated with the disease [[65]15]. XGBoost is a
   gradient boosting-based ensemble learning method that selects important
   genes based on feature importance scores [[66]16]. Compared to
   conventional differential expression analysis, these machine learning
   algorithms excel at identifying key biomarkers by capturing complex,
   non-linear interactions and synergistic effects among genes, rather
   than relying solely on individual gene expression changes. The
   overlapping genes identified by these four algorithms were considered
   as candidate gene biomarkers.

2.5. scRNA-seq data processing and cell type identification

   First, 28 breast cancer samples were included, with a total of
   186,113 cells. The criteria for cell inclusion were set to include
   genes expressed in 500 to 5000 genes per cell, and the maximum
   percentage of mitochondrial genes was set at 15 %. After filtering,
   87,597 cells were retained. The data from all 28 samples were then
   log-normalized. The FindVariableFeatures function was used to identify
   the top 2000 highly variable genes. Subsequently, cells were clustered
   using the FindNeighbors and FindClusters functions, which resulted in
   12 distinct clusters. Gene annotations for the clusters were gathered
   by reviewing the literature. The clusters were then categorized into
   different cell types based on these annotations. This single-cell
   method can precisely identify gene expression patterns within specific
   cell subpopulations, providing more specific resolution compared to
   conventional tissue analysis. Finally, the expression of hub genes
   across different tissues was analyzed to evaluate their relevance and
   potential as biomarkers for breast cancer.

2.6. ROC analysis and survival analysis

   Differential analysis of SELENOP and PKMYT1 was performed in the
   [67]GSE58135 dataset, and validation was carried out in the TCGA breast
   cancer dataset. Additionally, receiver operating characteristic (ROC)
   curve analysis demonstrated the diagnostic accuracy of SELENOP and
   PKMYT1 in distinguishing breast cancer tissues from normal tissues.
   Survival analysis for breast cancer patients was conducted using the
   Kaplan-Meier Plotter([68]http://kmplot.com/analysis/) [[69]17]. The
   Kaplan-Meier curve was used to assess the overall survival based on the
   expression levels of SELENOP and PKMYT1, providing insight into their
   prognostic value in breast cancer.

2.7. Immune infiltration analysis

   CIBERSORT, an algorithm that estimates the relative abundance of immune
   cell types by evaluating subsets of RNA transcripts, was used to
   analyze immune cell infiltration [[70]18]. This method uses a
   predefined reference gene expression matrix to infer the abundance of
   immune cell populations. A total of 22 immune cell types were
   evaluated, and the proportion and differential changes of immune cells
   between the tumor and normal groups were thoroughly analyzed and
   visualized. This approach provides a standardized and comprehensive
   view of the complex immune landscape within breast cancer tissues. The
   correlation between SELENOP and PKMYT1 and the 22 immune cell types was
   calculated.

2.8. Enrichment analysis

   Pathway enrichment analysis was performed using gene set enrichment
   analysis (GSEA) to identify biological processes associated with
   SELENOP and PKMYT1 expression [[71]19]. The tumor samples were divided
   into high expression group and low expression group according to the
   median expression level. Genes were ranked by log2 fold change between
   groups, and GSEA was performed against GO biological process terms
   (MSigDB c5.go.v7.4) with 1000 permutations. All analyses utilized the
   clusterProfiler R package.

2.9. Tissue samples

   A total of 60 paraffin-embedded tissue samples from breast cancer
   patients were included in this study (The First Affiliated Hospital of
   Bengbu Medical University). The study was approved by the Ethics
   Committee of Bengbu Medical University, and informed consent was
   obtained from all patients.

2.10. Immunohistochemistry (IHC)

   Immunohistochemistry was performed using the standard procedure. Tissue
   sections were prepared and underwent antigen retrieval and hydrogen
   peroxide blocking steps. Primary antibodies were applied for 1 h at
   37 °C, followed by PBS washing to remove unbound antibodies. Secondary
   antibodies were added and incubated for 30 min at 37 °C, followed by
   PBS washing. The sections were then stained with DAB (3 min), and the
   reaction was terminated after the color developed. Hematoxylin was used
   for counterstaining (1 min), followed by rinsing with tap water. The
   slides were dehydrated through successive ethanol solutions (75 %,
   85 %, 95 %, and 100 %) for 3 min each. Neutral balsam was applied to
   mount the slides, and a coverslip was placed to avoid air bubbles.

   All IHC slides were independently scored by two breast pathologists
   with >5 years of specialization experience, blinded to clinical data.
   Discordant cases (score difference ≥2 points) underwent consensus
   review using a multi-head microscope to finalize scores. The staining
   results were observed under a light microscope. Images were captured
   and analyzed. The staining intensity was scored as follows: no color (0
   points), pale yellow (1 point), brown-yellow (2 points), and brown (3
   points). The percentage of positive staining was scored as 0
   (negative), 1 (1–25 %), 2 (26–50 %), 3 (51–75 %), and 4 (76–100 %). The
   total score was the product of the “staining intensity score” and the
   “positive staining rate score."

2.11. Statistical analysis

   Data analysis was performed using R software (version 4.2.1). Pearson
   correlation analysis was conducted to explore the relationship between
   hub genes and clinical characteristics. A p-value <0.05 was considered
   statistically significant for all analyses.

3. Results

3.1. GenesSelection of immune- and metabolism-related genes

   To identify immune-related genes, we performed WGCNA on gene expression
   data from BRCA patients to construct a gene co-expression network.
   Initially, Pearson correlation coefficients were calculated for all
   genes to build the adjacency matrix and Topological Overlap Matrix
   (TOM). The optimal soft threshold for the scale-free topology model was
   determined to be 6 ([72]Fig. 2A). A total of 39,376 genes were divided
   into 13 modules([73]Fig. 2B). Next, we used the CIBERSORT package to
   estimate immune cell scores for each sample and computed the
   correlation between module eigengenes (ME) and immune cell scores.
   Among the immune-related feature genes, the yellow module exhibited the
   strongest correlation with immune cells (r = 0.97) ([74]Fig. 2C). We
   then used volcano maps to show the distribution of differences in gene
   expression between tumors and normal tissues. A threshold of |log2FC| ≥
   3 and p-value <0.05 was applied to filter genes, resulting in 1026
   genes being selected from the [75]GSE58135 dataset. The top 10
   up-regulated genes and the top 10 down-regulated genes are
   displayed([76]Fig. 2D). Subsequently, we extracted 2831 immune-related
   genes from the yellow module in the WGCNA analysis and retrieved 23,931
   metabolism-related genes from the Genecards database. By taking the
   intersection of these three gene sets, we ultimately identified 51
   immune-metabolic genes (IMGs) ([77]Fig. 2E).

Fig. 2.

   [78]Fig. 2
   [79]Open in a new tab

   Acquisition of IMGs (A) Scatter plot of the scale-free topology fitting
   index and the mean connectivity plot for determining the optimal soft
   threshold. (B) Hierarchical clustering dendrogram based on the
   Topological Overlap Matrix (TOM), illustrating the gene modules. (C)
   Heatmap showing the correlation between gene modules and immune cell
   scores. (D) Volcano plot of differentially expressed genes, where 701
   genes are upregulated and 325 genes are downregulated. (E) The Venn
   diagram illustrates the intersection of immune-related genes,
   metabolism-related genes, and genes screened in the “DESeq2″ R package,
   resulting in the identification of 51 immune-metabolic genes (IMGs).

3.2. B gene selection through machine learning

   First, LASSO was applied to filter out 7 genes from the
   immune-metabolic genes (IMGs)([80]Fig. 3A-B). Next, the SVM algorithm
   was used to analyze IMGs, identifying 12 significant genes([81]Fig.
   3C-D). Subsequently, RF analysis was performed, identifying 11 key
   genes([82]Fig. 3E-F). The XGBoost algorithm ranked genes based on their
   importance scores, and we visualized the top 12 feature genes
   identified by this method([83]Fig. 3G). Finally, a Venn diagram was
   used to analyze the intersection of the gene sets selected by LASSO,
   SVM, RF, and XGBoost. The intersection revealed two genes: SELENOP and
   PKMYT1([84]Fig. 3H).

Fig. 3.

   [85]Fig. 3
   [86]Open in a new tab

   Central genes were screened by machine learning. (A–B) The LASSO
   algorithm. (C–D) The SVM algorithm. (E–F) The RF algorithm. (G) The
   XGBoost algorithm. (H) The Venn diagram of the four algorithms.

3.3. Expression of hub genes and validation with external datasets

   Differential expression analysis was performed to validate the
   expression of the hub genes (SELENOP and PKMYT1) in external datasets.
   In the [87]GSE58135 dataset, SELENOP was found to be downregulated,
   while PKMYT1 was upregulated compared to normal control
   samples([88]Fig. 4A-B). Next, we validated these findings using the
   TCGA breast cancer dataset. The expression patterns of SELENOP and
   PKMYT1 in TCGA were consistent with those in the [89]GSE58135 dataset,
   and both genes exhibited statistically significant differences
   (p < 0.05)([90]Fig. 4C-D). To assess the diagnostic potential of these
   genes, ROC analysis was performed. In the [91]GSE58135 dataset, the AUC
   for SELENOP and PKMYT1 were 0.990 and 0.948, respectively([92]Fig.
   4E-F). In the TCGA dataset, SELENOP and PKMYT1 had AUC values of 0.919
   and 0.979, respectively([93]Fig. 4G-H), suggesting strong diagnostic
   potential for both genes. Finally, survival analysis was conducted
   using the Kaplan-Meier Plotter tool ([94]https://kmplot.com/), and the
   survival curves for SELENOP and PKMYT1 were plotted. The results
   indicated that high expression of PKMYT1 was associated with poorer
   prognosis, while high expression of SELENOP was correlated with better
   survival outcomes([95]Fig. 4I-J). This further supports the potential
   of these genes as prognostic markers in breast cancer.

Fig. 4.

   [96]Fig. 4
   [97]Open in a new tab

   Validation using TCGA Data (A–B) Differential expression of SELENOP and
   PKMYT1 in the [98]GSE58135 dataset. (C–D) Validation of differential
   expression for both genes in TCGA cohort. (E–F) Diagnostic potential of
   SELENOP and PKMYT1 assessed by ROC analysis. (G–H) ROC analysis of
   SELENOP and PKMYT1 in the TCGA dataset. (I–J) Survival analysis of
   SELENOP and PKMYT1.

3.4. GSEA between high-risk and low-risk groups

   To analyze the impact of the high-expression and low-expression groups
   on cancer progression, we performed Gene Set Enrichment Analysis (GSEA)
   to identify the most significantly enriched pathways between the two
   groups. The GSEA results showed that the SELENOP high-expression group
   was highly enriched in pathways related to ABC transporters, the MTOR
   signaling pathway, and the coagulation cascade([99]Fig. 5A). On the
   other hand, the PKMYT1 high-expression group was highly enriched in
   pathways related to DNA replication, the cell cycle, spliceosome, and
   mismatch repair([100]Fig. 5B). Next, the “ESTIMATE” package was used to
   compare the ESTIMATE scores between the two groups. As shown in the
   figure, the SELENOP high-expression group was positively correlated
   with the ESTIMATE score, while the PKMYT1 high-expression group showed
   a negative correlation with the ESTIMATE score([101]Fig. 5C-D).

Fig. 5.

   [102]Fig. 5
   [103]Open in a new tab

   GSEA Analysis and Correlation with ESTIMATE Scores (A–B) GSEA based on
   the expression levels of SELENOP and PKMYT1. (C–D) Distribution of
   ESTIMATE scores based on the expression levels of SELENOP and PKMYT1.

3.5. Immune infiltration analysis

   The interaction between tumor development and immune cells is
   inseparable. To further investigate the relationship between immune
   cell infiltration and SELENOP and PKMYT1, we used the “CIBERSORT” tool
   in R to score immune cells in the TCGA dataset. Based on the expression
   levels of SELENOP and PKMYT1, we divided the samples into
   high-expression and low-expression groups and analyzed their
   correlation with various immune cells. The heatmap of 22 immune cell
   types showed that SELENOP and PKMYT1 genes are correlated with multiple
   immune cells([104]Fig. 6A-B). We further analyzed the relationship
   between immune cells, immune checkpoints, and SELENOP and PKMYT1. The
   results indicated that FAS, CD28, CD40LG, CD4, CD244, BTLA, CD8A, CD48,
   CD274, and CD8B were positively correlated with the SELENOP index,
   while LAG3 and IDO1 were negatively correlated with the SELENOP
   index([105]Fig. 6C). On the other hand, LAG3, CD80, PDCD1, IDO1, CTLA4,
   ICOSLG, and ICOS were positively correlated with the PKMYT1 index,
   while FAS, CD40LG, TIGIT, CD28, CD244 were negatively correlated with
   the PKMYT1 index([106]Fig. 6D). Subsequently, we found that the
   expressions of CD244 and LAG3 were closely related to the expressions
   of SELENOP and PKMYT1, and the differences were significant between the
   high-expression and low-expression groups([107]Fig. 6E-H). This
   analysis suggests that both SELENOP and PKMYT1 are involved in the
   immune microenvironment, influencing the immune cell landscape and
   immune checkpoint interactions.

Fig. 6.

   [108]Fig. 6
   [109]Open in a new tab

   Analysis of Immune Cell Infiltration and its Relationship with SELENOP
   and PKMYT1 (A–B) Correlation analysis of SELENOP and PKMYT1 expression
   with 22 immune cell types. (C–D) Relationships between SELENOP/PKMYT1
   and immune therapy related molecules visualized by chord diagrams.
   (E–H) Differential expression of key genes in CD244/LAG-3 high-versus
   low-expression groups across TCGA and GEO cohorts.

3.6. Single-cell analysis of BRCA

   We selected 28 single-cell RNA sequencing samples from the
   [110]GSE161529 dataset. First, we filtered out low-quality cells based
   on quality control standards (with a mitochondrial gene proportion of
   15 %), ultimately obtaining 87,597 high-quality cells for subsequent
   analysis. Then, we identified 2000 highly variable genes for downstream
   dimensionality reduction([111]Fig. 7A). In addition, we selected 9
   principal components (PCS) based on ElbowPlot and PC Heatmap, and
   divided the samples into 12 cell clusters([112]Fig. 7B-C). Use the
   clustering heat map to display the top 5 marker genes of each
   subgroup([113]Fig. 7D). Using the “SingleR” package, we annotated the
   cell clusters based on marker genes and identified six cell types:
   epithelial cells, endothelial cells, fibroblasts, myeloid cells, T
   cells, and B cells([114]Fig. 7E). The expression of key marker genes
   for each cell type was visualized using a bubble plot([115]Fig. 7F).
   SELENOP was primarily expressed in T cells and B cells, while the
   PKMYT1 gene was mainly expressed in endothelial cells, T cells, and B
   cells([116]Fig. 7H-I).

Fig. 7.

   [117]Fig. 7
   [118]Open in a new tab

   Identification of Six Distinct Annotated Cell Clusters Based on
   Single-Cell RNA-seq Data (A) Scatter plot of the top 2000 highly
   variable genes. (B) ElbowPlot of 28 samples. (C) PC Heatmap of the top
   20 PCs. (D) The five marker genes with the highest average logFC in
   each subgroup. (E) Twelve cell clusters were identified as six types of
   cells. (F) Based on the composition of marker genes, the 12 cell
   clusters were identified as six distinct cell types. (H–I) Expression
   levels of SELENOP and PKMYT1 in each cell cluster.

3.7. Immunohistochemical analysis

   We collected breast cancer tissue samples from 60 volunteers and
   performed IHC experiments to validate the expression of SELENOP, PKMYT1
   and immune checkpoints in BRCA tissue. Considering the expression
   distribution characteristics of SELENOP and PKMYT1, we chose the median
   value to divide the high and low expression groups to ensure group
   balance and statistical power. We found significant differences in the
   expression of SELENOP, PKMYT1, LAG3 and CD244 between the cancer
   tissues and adjacent cancer tissues. SELENOP and CD244 were highly
   expressed in the adjacent cancer tissues, while PKMYT1 and LAG3 were
   highly expressed in the cancer tissues([119]Fig. 8A-B). Subsequently, a
   correlation analysis of these genes was performed. The results showed
   that SELENOP was negatively correlated with PKMYT1, LAG-3, ER, PR,
   HER-2 and Ki-67, while it was positively correlated with CD244. The
   results showed that PKMYT1 was negatively correlated with SELENOP,
   CD244, HER-2 and Ki-67, while it was positively correlated with
   LAG-3,ER and PR([120]Fig. 8C). The expression of SELENOP was not
   statistically significant concerning patient age (p > 0.05); however,
   differences in histological grade, lymph node metastasis, LAG-3, CD244,
   estrogen receptors, progesterone receptors, and Her-2 expression were
   statistically significant (p < 0.05) ([121]Table 1). The expression of
   PKMYT1 was not statistically significant concerning patient age, and
   lymph node metastasis (p > 0.05); however, differences in histological
   grade, LAG-3, CD244, estrogen receptors, progesterone receptors, and
   Her-2 expression were statistically significant (p < 0.05) ([122]Table
   1).

Fig. 8.

   [123]Fig. 8
   [124]Open in a new tab

   IHC Validation (A) Representative IHC images of SELENOP, PKMYT1, LAG-3,
   and CD244. (B) Analysis of expression differences of SELENOP, PKMYT1,
   LAG-3 and CD244 in adjacent tissues and cancer. (C) Correlation
   analysis between SELENOP, PKMYT1, LAG-3, CD244, ER, PR, HER-2 and
   Ki-67.

Table 1.

   Correlation between the expression of SELENOP and PKMYT1 and
   clinico-pathological parameters.
   Variable n SELENOP (n)
     __________________________________________________________________

   P value PKMYT1 (n)
     __________________________________________________________________

   P value
   High expression Low expression High expression Low expression
   Age (years)
    ≤50 22 12 10 0.762 10 12 0.348
    >50 38 18 20 20 18
   Grade
    Ⅰ∼Ⅱ 47 29 18 0.033 18 29 <0.001
    Ⅲ 13 1 12 12 1
   Lymph node
    0∼3 42 24 18 0.018 20 22 0.207
    >3 18 6 12 10 8
   LAG-3
    0∼1 35 23 12 0.039 11 24 0.007
    >1 25 7 18 19 6
   CD244
    0∼1.5 26 5 21 <0.001 19 8 0.017
    >1.5 34 25 9 11 23
   Estrogen receptor
     22 15 7 0.001 12 10 0.036
    + 38 15 23 18 20
   Progesterone
     34 20 14 0.002 16 18 <0.001
    + 26 10 16 14 12
   Her-2
     13 9 4 0.034 2 11 0.045
    + 47 21 26 28 19
   [125]Open in a new tab

4. Discussion

   The occurrence of breast cancer is the result of the combined effects
   of various factors, including genetics, environment, hormones, and
   lifestyle, and the specific molecular mechanisms are still under
   investigation [[126]20]. Studies have shown that the progression of
   breast cancer is closely related to immune system responses and
   metabolic regulation, both of which play important roles in tumor
   growth and metastasis [[127]21]. Several studies using WGCNA have
   revealed key genes and their potential molecular mechanisms in breast
   cancer, providing new insights into the pathophysiological processes of
   the disease [[128]22]. After the preliminary identification of
   differentially expressed genes (DEGs), we combined WGCNA and machine
   learning techniques to further screen potential biomarkers. We
   identified two immune-metabolic hub genes, SELENOP and PKMYT1, and
   validated them using the TCGA dataset. We performed time-dependent ROC
   curve analysis to evaluate the diagnostic value of these hub genes.

   Selenoprotein P (SELENOP) is a protein that contains multiple
   selenium-cysteine residues and is primarily responsible for selenium
   transport in the body. It also plays a crucial role in antioxidant
   defense [[129]23]. Numerous experimental studies have shown that
   selenium can inhibit the growth and proliferation of cancer cells
   through various mechanisms, including the regulation of oxidative
   stress and the modulation of signaling pathways [[130]24,[131]25]. In
   breast cancer research, selenium has been found to enhance the
   anticancer effect of mouse breast cancer 4T1 cells by regulating the
   signaling pathways of androgen receptor (AR), insulin-like growth
   factor-1 receptor (IGF-1R), epidermal growth factor receptor (EGFR),
   and calcium-binding protein TROP2 [[132]26]. Additionally, another
   study indicated that methyl selenite significantly inhibits breast
   cancer growth by suppressing the JAK2/STAT3 signaling pathway
   [[133]27]. We found that in the high SELENOP expression group, the
   PI3K/AKT/mTOR signaling pathway was upregulated. This may be consistent
   with SELENOP's role in protecting cells from oxidative damage by
   neutralizing free radicals, thus reducing cancer risk.

   PKMYT1 is a membrane-associated protein kinase that regulates the
   progression of the cell cycle mainly by phosphorylating specific sites
   on cyclin-dependent kinase 1 (CDK1) [[134]28]. It is generally believed
   that PKMYT1 inhibits CDK1 activity by phosphorylating Thr14 and Tyr15
   of CDK1, preventing cells from entering mitosis (M phase) [[135]29]. A
   proteogenomic analysis based on patient-derived xenografts (PDX) from
   22 ER + breast cancer patients suggested that PKMYT1 is a WEE1 homolog
   [[136]30]. WEE1 can phosphorylate Tyr15 of CDK1 and is involved in
   regulating the activity of CDK1 and CDK2 [[137]29]. In tumor cells, the
   mutation rate of the tumor suppressor gene p53 is relatively high, with
   mutations found in over 50 % of tumor cells [[138]31]. Tumor cells with
   p53 dysfunction exhibit inactivation of the G1/S checkpoint [[139]32],
   making DNA repair largely dependent on the G2/M checkpoint. At this
   point, inhibiting WEE1 further inactivates the G2/M checkpoint
   [[140]33], driving cells to prematurely enter mitosis, resulting in
   mitotic catastrophe and causing a synthetic lethality effect (SL)
   [[141]34]. This finding is consistent with our functional enrichment
   analysis, which showed that PKMYT1 plays a key role in cell cycle
   regulation and DNA damage repair. In this study, we found that PKMYT1
   expression was significantly elevated in breast cancer tissues and
   demonstrated high diagnostic accuracy for breast cancer, suggesting its
   potential diagnostic value.

   Two key interconnected mechanisms driving cancer progression are immune
   escape and metabolic reprogramming. Our findings demonstrate distinct
   cellular distributions for SELENOP and PKMYT1, with SELENOP
   predominantly expressed in immune cells and PKMYT1 widely expressed in
   endothelial cells, suggesting their involvement in microenvironment
   regulation through different cellular compartments. High SELENOP
   expression shows significant positive correlations with CD8^+ T cell
   infiltration and immune activating molecules and significant negative
   correlations with immunosuppressive molecules. These observations
   provide novel insights, indicating that dysregulation of selenium
   metabolism, as reflected by SELENOP expression, may contribute to
   immune escape by modulating immune cell activity. Notably, high PKMYT1
   expression is significantly enriched in poorly differentiated tumors
   and shows a positive correlation with Estrogen Receptor and
   Progesterone Receptor status. This pattern suggests PKMYT1 holds
   potential as a clinical biomarker relevant to tumor grading and
   molecular subtyping.

   This study has certain limitations. Firstly, although RNAseq analysis
   based on public databases combines immunohistochemical validation of
   independent cocohort (n = 60), the statistical power of subgroup
   analysis may still be limited. Secondly, the mechanisms of SELENOP and
   PKMYT1 in the occurrence and development of breast cancer remain
   unclear, which will be the focus of subsequent experimental research.

5. Conclusion

   This study establishes SELENOP and PKMYT1 as key metabolic-immune
   regulators reprogramming the breast cancer microenvironment, serving as
   diagnostic biomarkers and therapeutic targets.

CRediT authorship contribution statement

   Guohui Tang: Writing – original draft, Methodology, Conceptualization.
   Zheng Zhang: Investigation, Formal analysis. Bo Pang: Software, Formal
   analysis. Ruonan Li: Resources, Data curation. Yuting Liu: Resources,
   Data curation. Haotian Cai: Resources, Data curation. Wenrui Wang:
   Visualization, Supervision. Changjie Chen: Visualization, Supervision.
   Yurong Ou: Visualization, Supervision. Qingling Yang: Writing – review
   & editing, Funding acquisition.

Ethical approval

   This study was approved by the Ethics Committee of Bengbu Medical
   University (Reference number: 2023-236). The patient provided their
   written informed consent to participate in this study.

Ethics statement

   The study was approved by the Ethics Committee of Bengbu Medical
   College.

Funding

   This study was supported by the University Synergy Innovation Program
   of Anhui Province (GXXT-2022-064), excellent Scientific Research and
   Innovation Team of Anhui Universities (2024AH010021), the Major Program
   of Anhui Educational Committee (No.: KJ2019ZD28), the Program for
   graduate's research of Bengbu Medical College (Byycx22042, Byycx23013).

Declaration of Competing interest

   The authors declare no conflict of interest.

Acknowledgments