Abstract [36]graphic file with name ao2c08227_0012.jpg Objectives: We aim to identify the breast cancer (BC) subtype clusters and the crucial gene classifier prognostic signatures by integrating genomic analysis with the tumor immune microenvironment (TME). Methods: Data sets of BC were derived from the Cancer Genome Atlas (TCGA), METABRIC, and Gene Expression Omnibus (GEO) databases. Unsupervised consensus clustering was carried out to obtain the subtype clusters of BC patients. Weighted gene coexpression network analysis (WGCNA), least absolute shrinkage and selection operator (LASSO), and univariate and multivariate regression analysis were employed to obtain the gene classifier signatures and their biological functions, which were validated by the BC dataset from the METABRIC database. Additionally, to evaluate the overall survival rates of BC patients, Kaplan–Meier survival analysis was carried out. Moreover, to assess how BC subtype clusters are related to the TME, single-cell analysis was performed. Finally, the drug sensitivity and the immune cell infiltration for different phenotypes of BC patients were also calculated by the CIBERSORT and ESTIMATE algorithms. Results: TCGA–BC samples were divided into three subtype clusters, S1, S2, and S3, among which the prognosis of S2 was poor and that of S1 and S3 were better. Three key pathways and 10 crucial prognostic-related gene signatures are screened. Finally, single-cell analysis suggests that S1 samples have the most types of immune cells, S2 with more sensitivity to tumor treatment drugs are enriched with more neutrophils, and more multilymphoid progenitor cells are involved in subtype cluster S3. Conclusions: Our novelty was to identify the BC subtype clusters and the gene classifier signatures employing a large-amount dataset combined with multiple bioinformatics methods. All of the results provide a basis for clinical precision treatment of BC. 1. Introduction Breast cancer (BC), a heterogeneous tumor, has many etiologies and clinical features.^[37]1,[38]2 Studies have found that multiple subtypes have been identified in the development and progression of BC.^[39]3 Based on the 2013 St. Gallen International Breast Cancer Congress, more than five molecular subtypes of BC can be defined, i.e., luminal A, luminal B, human epidermal growth factor receptor 2 (HER2) + B2, HER2 overexpression, basal-like triple-negative breast cancer (TNBC), and other subtypes.^[40]4 The classical classification methods of breast cancer subtypes include immunohistochemical-based estrogen receptor (ER)-, progesterone receptor (PR)-, and HER2 and intrinsic gene expression-based classification methods.^[41]5 Correspondingly, the resulting tumor subtypes with distinct genomic alterations show different clinical outcomes and treatment responses. Identification of BC subtypes can provide insight into the pathogenesis of the disease and contribute to the precise diagnosis of BC and personalized targeted therapy. Clinical phenotypes such as disease stage, metastasis, and tumor resectability, as well as histological subtypes, are frequently used in current clinical approaches for outcome prediction and treatment decision-making for BC treatment. However, the development of molecular profiling more recently has given rise to the possibility of quantitative tumor analysis based on profiles of protein expression, mutations, and/or genome-wide gene transcription. These assays promise to characterize tumor subtypes with more precision and accuracy and to more correctly forecast how particular tumor types will respond to various therapies. For instance, Perou et al. divided BC into five intrinsic molecular subtypes based on gene expression signatures—basal-like, normal-like, HER2-enriched, and luminal A and B, revealing variations in the genesis of tumor cells as well as disparate patterns of progression.^[42]6 Another study classified BC into 1q/16q, amplifier, and complex subtypes based on the CNA pattern.^[43]7 Additionally, it is widely known that the cross-talk between the tumor cells and tumor microenvironment (TME) influences genetic and epigenetic changes that affect the formation, incidence, and therapeutic resistance of cancer. In BC patients, a better clinical result has been related to immune infiltration levels.^[44]8,[45]9 Particularly, higher CD8+ T-cell infiltration is strongly associated with better overall survival (OS) in ER-negative patients.^[46]10,[47]11 Moreover, high immune infiltration is additionally associated with a better response to adjuvant chemotherapy.^[48]12 Many studies in recent years have shown that the TME can be explained by transcriptomic data.^[49]13−[50]18 These findings indicate that a lower risk of breast cancer recurrence is correlated with a higher expression of leukocyte-related genes.^[51]13,[52]16,[53]19,[54]20 Notably, Ali et al. and Bense et al. recently reported in a metastudy how specific immune cell types affect breast cancer prognosis.^[55]13,[56]21 However, the role and clinical relevance of immunity in BC still need to be elucidated by more comprehensive analysis. Presently, three BC prognostic subtype clusters and a set of genetic signatures that can be used to classify BC subtypes were identified using BC data in public databases, which may provide effective evidence for precise clinical treatment of BC and help explore BC heterogeneity between patients. The single-cell sequence analysis of BC suggests progressive immune infiltration of clinically relevant immune clusters. Through characterization of the immune components of the TME, the immune infiltration and poor prognosis group identified as tumor-related. Further subtype analysis revealed that most types of immune cells are enriched in subtype cluster S1 samples, more neutrophils are distributed in the subtype S2 with more sensitivity to tumor treatment drugs, and more multilymphoid progenitor cells are involved in subtype cluster S3. All of the results may provide an effective basis for the clinical precision treatment of BC. 2. Materials and Methods 2.1. Data Source Presently, a total of three publicly available data sets, i.e., cohorts A, B, and C, are used in this study, which are respectively derived from the Cancer Genome Atlas (TCGA), Molecular Taxonomy of Breast Cancer International Consortium (METABRIC), and Gene Expression Omnibus (GEO) databases. Among them, the gene expression profiling data of cohort A obtained from TCGA was used as a training set for model construction, which contains 103 paracancerous samples and 903 BC samples.^[57]22 The gene expression data of cohort B derived from METABRIC, which included 154 paracancerous samples and 1826 BC samples, were used to assess the reliability of the constructed model.^[58]23 Further, to explore the underlying biology of BC, single-cell analysis was performed on cohort C (accession number [59]GSE118389), which includes six primary TNBC patients with gene expression profile data.^[60]24 2.2. Clustering Analysis for Determining the BC Subtypes To determine the molecular subtypes of BC in patients, the consensus clustering analysis was carried out by employing the ConsensusClusterPlus package of R.^[61]25 Before running the program, the “max cluster number (maxK),” “cluster algorithm (clusterAlg),” and “distance” parameters in this package were set to “5”, “hc,” and “Pearson,” respectively. In order to ensure the stability and reliability of the classification, all cluster analyses were repeated 1000 times. Additionally, by calculating the consensus matrix, cumulative distribution function (CDF), and relative change in the area under the CDF curve, the optimal number of subtypes (K) was obtained. 2.3. Construction of the Coexpression Network To identify the modules associated with clinical features, weighted gene coexpression network analysis (WGCNA)^[62]26 was carried out on BC cohort A data by the corresponding package of R. At first, a topological overlap matrix (TOM) was obtained by the adjacency matrix. Then, all genes were split into various gene modules by the TOM dissimilarity measure. The outliers were identified and removed by clustering analysis and the optimal value of soft threshold β was employed to ensure a scale-free network. When minModuleSize = 30 and merge CutHeight = 0.25, the gene module was identified as a key module. Finally, the gene significance (GS) and module membership (MM) of the genes in the module were calculated, which represent the correlation between genes and subtypes. Herein, the hub genes are the genes with GS > 0.5 and MM > 0.9. 2.4. Functional Enrichment Analysis of Genes in Key Modules To investigate the potential biological functions performed by the acquired genes in key modules, Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses were carried out by employing several R packages, which contain “org. Hs.Eg.Db,” “ggplot2,” and “clusterProfiler.”^[63]27,[64]28 Subsequently, the enriched GO or pathway terms with significant differences were screened by a threshold p-value <0.05. 2.5. BC Subtype-Related Gene Classification Signature Screening To screen the key genes classification signature associated with BC subtypes, we first built the gene classifier as previously described.^[65]29 In brief, employing the logistic regression, the candidate genes associated with BC subtype distinction (with p < 0.05) were identified. Finally, using the glmnet and decision tree packages^[66]30 in R, the LASSO regression was carried out to narrow the variables and obtain the final gene classification signature. 2.6. Establishment of the Risk Prognostic Model To investigate the impact of gene classification signatures on the patient’s survival outcome and survival time, the BC gene expression data of cohort A were also used to establish a clinical prediction model. Employing the gene classifiers obtained above, the risk prognostic model was built based on the following formula. 2.6. 1 where coefi denotes the coefficient and id denotes the normalized count for each gene. Additionally, the TCGA–BC patients were split into high- and low-risk groups based on the median risk score. The overall survival rates of two patient groups were assessed using Kaplan–Meier survival analysis. Additionally, employing the survival package^[67]31 in R, the ability of the classifier to predict the tumor lymph node metastasis (TNM) status, stage, age, and risk was assessed by the time-dependent receiver operating characteristic (ROC) curve. Moreover, area under the curve (AUC) values were computed to determine the classifier’s or pertinent clinical factors’ predictive potential. Finally, the correlation between TCGA-based breast cancer clinical immunohistochemical subtypes and previous high- and low-risk subtypes was evaluated and visualized using the ggstatsplot package.^[68]32 2.7. Validation of the Classification Model To further confirm the robustness of gene classified signatures, the data of cohort B derived from METABRIC are employed for external validation. By calculating the individual risk score of high- and low-risk BC patients, their individual survival status was evaluated. In addition, the molecular intrinsic subtypes of BC patients were determined by the PAM50 classifier algorithm.^[69]33 Also, to compare the overall survival among subtypes, log-rank tests were carried out to assess the correlation between the basic classification clusters and tumor stage. 2.8. Evaluation Heterogeneity among Different Subtype Clusters To further evaluate the heterogeneity of the molecular intrinsic subtypes of BC patients, single-cell analysis was performed on cohort C ([70]GSE118389). To standardize the expression matrix of cohort C and to perform cell cluster analysis, the Seurat package, FindNeighbors, and FindClusters are used. In addition, the differentially expressed genes (DEGs) among cell clusters were obtained by FindAllMarkers in the Seurat package. Herein, the criteria for screening DEGs were |log fold change (FC)| ≥ 1 and the adjusted p-value ≤0.05. Finally, the cell subgroups determined by the prognostic gene classifier signatures were observed by violin plots, and the interaction between cells was analyzed by the R cellchat (1.6.1) package.^[71]34 2.9. Prognostic Nomogram Construction For further exploring the prognostic role of gene classification signatures, univariate and multivariate regression analysis was also performed to filter the significant prognostic factors. To determine which gene classifier signatures were significantly associated with OS, the univariate regression models were used and p-values were calculated. The gene classifier signatures with a p-value <0.05 are significantly related to clinicopathologic features. Subsequently, a prognostic nomogram was constructed to improve the prediction ability by integrating important features derived from multiple regression analysis with the survival and RMS packages.^[72]35 2.10. Evaluation of Drug Sensitivity and Immune Cell Infiltration For investigating the molecular characteristics associated with drug sensitivity and drug resistance, the Genomics of Drug Sensitivity in Cancer (GSDC) database ([73]https://www.cancerrxgene.org/) was employed, which is a public resource for the discovery of biomarkers in cancer cells. Using the pRRophetic package^[74]36 of R, the sensitivity data of three BC subtype clusters to different drugs were counted, and the drug sensitivity of different phenotypes of BC patients from the gene expression data was also predicted. Additionally, for exploring the differences of different subtypes of immune cells based among multiple clusters of TCGA–BC samples, CIBERSORT,^[75]37,[76]38 a tool to deconvolute the expression matrix of immune cell subtypes, was used. Besides, to explore the degree of immune cell infiltration between BC subgroups, the ESTIMATE^[77]39 algorithm was also performed. 3. Results 3.1. Identification of Clusters Associated with Subtypes of BC Patients Employing the ConsensusClusterPlus package of R, the gene expression data allowed the 1006 samples in the TCGA–BC cohort A to be divided into various groups. With the changes in the consensus matrix (CM) K value, the resulting clusters in CM ([78]Figure [79]1 a–d) and their corresponding prognosis ([80]Figure [81]1e–h) are also different. Particularly, when the maxK value is equal to 3, three clean clusters associated with molecular subtypes of BC in patients are obtained, which are named cluster S1 (n = 382), cluster S2 (n = 328), and cluster S3 (n = 296) subgroups ([82]Figure [83]1b). Subsequently, Kaplan–Meier survival analysis was performed on the gene expression data of the patients using the log-rank test ([84]Figure [85]1e–h). Interestingly, only when K = 3, the differences in prognosis among the three clustering subgroups are statistically significant (log-rank test P = 0.028 < 0.05, [86]Figure [87]1f), which is consistent with the above results. In detail, the prognosis of cluster S2 patients was the worst, and that of cluster S3 patients was the best ([88]Figure [89]1f). In contrast, moderate OS was observed in patients of cluster 1 ([90]Figure [91]1f). Besides, we evaluated the relative changes of areas under the CDF curve, and the results showed that when K = 3, the clustering results tend to be stable ([92]Figure [93]1i). Further, the principal component analysis of gene expression data in these three clusters is depicted in [94]Figure [95]1j. Consequently, a total of three subtype clusters of BC with significantly different prognoses were classified by unsupervised consistent clustering, which are used for further research. Figure 1. [96]Figure 1 [97]Open in a new tab Consensus clustering for TCGA–BC. (a–d) Consistency matrix heat map at k = 2–5. (e–h) Kaplan–Meier curves for OS of BC patients stratified by consensus clustering (2–5). (i) Cumulative distribution function curve tracking graph. (j) PCA of expression when subtype classification number is 3. 3.2. WGCNA Analysis Presently, WGCNA was carried out on the TCGA–BC cohort A to establish a weighted coexpression network. Clearly, the connection between genes in the network satisfies the scale-free distribution when the soft threshold β = 6 (scale-free R^2 = 0.9) ([98]Figure [99]2 a). Subsequently, the phylogenetic tree was used to retrieve the coexpression modules (cut height = 0.25) ([100]Figure [101]2a). Additionally, the hierarchical cluster analysis showed that modules with similar gene expression patterns were clustered on the same branch ([102]Figure [103]2b). [104]Figure [105]2c shows the calculated and visible correlations between modules and the subtype clusters, in which the red color module had a significantly negative association with subtype cluster S1 (r = 0.70, p < 0.01). Besides, the black module had the strongest association with subtype cluster S2 (r = 0.67, p < 0.01), while the blue module had the strongest association with subtype cluster S3 (r = −0.81, p < 0.01). Finally, the correlation between genes and subtypes is depicted in [106]Figure [107]2d–f. Through WGCNA, a total of three subtype-related gene modules were found, which can be considered as gene signatures specific to each BC subtype. Figure 2. Figure 2 [108]Open in a new tab WGCNA analysis of TCGA–BC. (a) Scale-free fit index analysis for different soft thresholds. (b) Cluster dendrogram of TCGA–BC. (c) Correlation between each module and different subtype clusters. (d–f) Correlation between module gene expression and subtype clusters. 3.3. BC Subtype Cluster-Specific Enrichment Analysis For investigating the biological pathways and biological processes related to BC subtype clusters, enrichment analysis of significant genes in modules was performed. [109]Figure [110]3 a,b shows that most of the biological processes and pathways associated with subtype cluster S1 are related to the regulation of angiogenesis and the PI3K-Akt pathway. The occurrence of these tumor vessels may be related to the tumor progression, which is consistent with the moderate survival time of subtype cluster S1 ([111]Figure [112]1f). Additionally, the genes in the subtype cluster S2 module show significant enrichment for hormone secretion and the AMPK signaling pathway ([113]Figure [114]3c,d). Actually, these are linked to the body’s active immune metabolic processes, which indicate the body’s immunological response as cancer progresses.^[115]40 After observing [116]Figure [117]3e,f, we found that the genes in subtype cluster S3 are mainly enriched in epidermis development and the Wnt signaling pathway. Overall, through enrichment analysis of the gene modules related to the three subtypes, combined with the analysis of the prognosis of BC, the accuracy of unsupervised consistent clustering for BC subtyping is further confirmed, and the basis for clinical treatment of these subtypes is also provided. Figure 3. [118]Figure 3 [119]Open in a new tab GO bioprocess functional enrichment and pathway analysis of genes in (a, b) subtype cluster S1, (c, d) subtype cluster S2, and (e, f) subtype cluster S3 modules. 3.4. Identification of Gene Classifier Signatures Associated with BC Subtype Clusters The distribution of different BC stages and ages in the three subtype clusters is depicted in [120]Figure [121]4 a. Obviously, patients with N and M stages had no significant difference in age, while lower stage and T stages were more common in subtype clusters S1 and S3 ([122]Figure [123]4a). To avoid overfitting the subtyping characteristics, the LASSO regression on the gene expression of TCGA–BC patients was performed. As a result, a total of 160 genes that are related to BC subtype clusters are obtained ([124]Figure [125]4b) and the optimal value of penalty parameters through 10 rounds of cross-validation is also determined ([126]Figure [127]4c). Additionally, the boxplots of the three stages also suggest obvious differences among the three BC subtype clusters, revealing the robustness of constructed gene classifier signatures ([128]Figure [129]4d). Figure 4. Figure 4 [130]Open in a new tab Identification and evaluation of the gene classifier for BC subtyping. (a) Distribution of follow-up data of breast cancer patients in three subtypes. (b) LASSO coefficient profiles of the module genes. (c) Selection of the penalty parameter (λ) in the LASSO model. (d) Gene classifier is represented by the boxplot in the three subtypes. 3.5. Assessment of the Gene Classifier Prognostic Signature for BC By analyzing the Kaplan–Meier curve, we found that the OS of BC samples in the high-risk group had a lower value than those in the low-risk group, indicating that the prognostic marker of the risk score was effective (p < 0.05) ([131]Figure [132]5 a). In addition, the risk curve and scatter plot indicate that the risk factors and mortality of the high-risk group were higher than those of the low-risk group ([133]Figure [134]5b). Furthermore, ROC analysis reveals that the AUC of the risk score was 0.703 ([135]Figure [136]5c), suggesting the reliability of the obtained gene classifier signatures of BC. At the same time, the survival analysis results of BC subtype clusters obtained by using the unsupervised consistent clustering were relatively consistent with the prognosis results in high- and low-risk groups obtained from the prognostic model. In detail, the subtype cluster S3 with a better survival rate was similar to that in the low-risk group and the subtype cluster S2 with poor survival was close to that in the high-risk group ([137]Figure [138]5d). All of these findings imply that the gene signature is a predictive factor unique to BC patients. Figure 5. Figure 5 [139]Open in a new tab Survival evaluation of gene classifier signatures in TCGA. (a) Kaplan–Meier survival curve of the patients in high- and low-risk scores. (b) Patient subtypes were grouped based on the median level of gene classifier labels in TCGA. (c) Results of ROC analysis for the clinical factors. (d) Distribution of subtypes in high- and low-risk levels. 3.6. Validation of the Prognostic Signature for BC For further validating the stability of gene classifier signatures, the patients of BC in the METABRIC database were classified into two different groups, namely, high- and low-risk groups. Then, the OS of patients in high- and low-risk groups was analyzed by Kaplan–Meier survival analysis. As a result, samples from high-risk groups have higher OS than those from low-risk groups, suggesting that prognostic markers of risk scores were valid (P < 0.05) ([140]Figure [141]6 a). The generated risk curves and scatterplots indicate that compared with the risk factor and mortality in low-risk groups, those in high-risk groups were high ([142]Figure [143]6b). All of these results suggest that the gene classifiers are highly reliable in marking prognostic markers of BC. Moreover, the molecular intrinsic subtypes of BC patients were determined by the PAM50 classifier algorithm. Clearly, according to gene classified signatures, three subtype clusters were identified ([144]Figure [145]6c) and there were also significant differences in prognosis (p < 0.05) among three subtype clusters ([146]Figure [147]6d), which is in agreement with the above results. Overall, all of the findings suggest that the genetic classification markers are independent prognostic factors in BC patients. Figure 6. Figure 6 [148]Open in a new tab Survival evaluation of gene classifier signatures in METABRIC. (a) Patient subtypes were grouped based on the median level of gene classifier labels in METABRIC. (b) Kaplan–Meier survival curve of OS in different groups for the BC patients in METABRIC. (c) K-means algorithm for molecular subtype classification in the BC dataset of METABRIC (n = 1904). (d) Kaplan–Meier survival curve of OS between subtypes in METABRIC. 3.7. Independent Prognostic Ability of Gene Classifier Signatures For evaluating the independent prognostic ability of the obtained gene classifier signatures, we also investigated the correlation between the gene classifier signatures and clinicopathological features of TCGA–BC samples through univariate regression analysis and multivariate Cox regression analysis. The clinically common factors like age, tumor stage, group, and TNM status were registered as covariates for analysis. As a result, age, M status, and group are independent factors that can be used to predict the prognosis of BC patients ([149]Figure [150]7 a,b). In addition, the clinicopathological information and independent prognostic factors are combined and then a nomogram involving three items in TCGA was constructed, which is used as a clinically relevant quantitative method for predicting the mortality of BC patients. Obviously, we found that by adding points for each prognostic parameter, each patient will be assigned a total score value and higher total scores corresponded to poorer patient outcomes ([151]Figure [152]7c). Moreover, calibration plots for the TCGA cohort A reveal a similar performance of the nomogram to the developed prognostic model ([153]Figure [154]7d). In addition, the DCA curve also indicated that the nomogram had good stability and reliability net benefit curve in age compared with other clinical factors ([155]Figure [156]7e). Finally, we further screened the gene signatures that are associated with the prognosis by employing univariate regression analysis. As a result, a total of ten gene signatures significantly related to prognosis were obtained, which contain BARD1, CASP7, CYP11A1, IL1RAP, ITGB1, RAB13, RPL28, TGFBR1, DUSP1, and CXCL13 ([157]Figure [158]7f). Figure 7. [159]Figure 7 [160]Open in a new tab Correlation between the gene classifier signatures and clinicopathological features of TCGA–BC samples. (a, b) Univariate and multivariate analysis including risk status and clinical factors. (c) Nomogram for comprehensively predicting the 1-, 3-, and 5-year OS of BC patients in the TCGA database. (d) Calibration plots for estimating the 1-, 3-, and 5-year OS for TCGA–BC samples. (e) Decision curve plots of the “nomogram,” “group,” “pathology_M,” “age,” “All,” and “None” models by Cox regression analysis. (f) Significantly prognostic genes screened by univariate analysis. 3.8. Evaluation of Cellular Heterogeneity in the Subtype Cluster of BC Patients To evaluate the heterogeneity of the molecular intrinsic subtypes of BC patients, single-cell analysis was performed on [161]GSE118389. Employing the immune cell-specific marker genes obtained from the published article,^[162]41 more than 1500 cells from six primary BC patients were clustered into 13 clusters and 12 cell types, i.e., CD8+ T-cells, endothelial cells, mast cells, stromal cells, neutrophil, natural killer T-cells, multilymphoid progenitor cells, epithelial cells, secretory cells, B-cells, plasmacytoid dendritic cells, and cancer stem cells ([163]Figure [164]8 a). Subsequently, the DEGs between different cell types were identified and visualized by the volcano plots ([165]Figure [166]8b). In addition, the signature distribution of key prognostic gene classifiers reveals that BARD1 is mainly expressed in T and cancer stem cells, while CASP7 is mainly expressed in endothelial cells ([167]Figure [168]8c). Moreover, DUSP1, ITGB1, RAB13, and RPL28 are expressed in subsets other than T-cells and B-cells ([169]Figure [170]8c). With respect to TGFBR1, it was mainly expressed in T-cells, neutrophils, and epithelial cells. In contrast, CYP11A1, IL1RAP, and CXCL13 were not specifically expressed ([171]Figure [172]8c). Furthermore, to characterize the communication between key immune cells, tumor cells, and other microenvironmental cells, the significant ligand–receptor interactions between 12 cell types were detected and analyzed through the CellChat analysis. The results show that there is extensive cellular communication among tumor cells, NK-T cells, epithelial cells, and secretory cells ([173]Figure [174]8d). Figure 8. [175]Figure 8 [176]Open in a new tab Single-cell sequencing analysis. (a) Cells are divided into thirteen types, including CD8+ T-cells, endothelial cells, mast cells, stromal cells, neutrophil, natural killer T-cells, multilymphoid progenitor cells, epithelial cells, secretory cells, B-cells, plasmacytoid dendritic cells, and cancer stem cells. (b) Volcano of DEGs between different cell populations. (c) Violin plots displaying the expression of representative gene classifier signatures across the cell types identified in the single-cell dataset. The y-axis shows the normalized read count. (d) Cellular communication analysis among tumor cells, NK-T cells, epithelial cells, and secretory cells. 3.9. Distribution of Subtype Clusters in BC Intrinsic Profiles and Risk Subgroups To investigate the association between the subtype clusters and the clinicopathological characteristics and risk subgroups of BC patients, we assessed the proportion of subtype clusters in BC molecular intrinsic profiles (i.e., basal, Her2, LumA, LumB, and normal) and the risk subgroups (i.e., high, low) using the chi-square analysis.^[177]42 Interestingly, all of the BC molecular intrinsic profiles were identified in the three subtype clusters, but the proportions are different ([178]Figure [179]9 a). For example, the proportion of subtype cluster S2 with poor prognosis in the basal-like profile was 80% (p = 9.49 × 10^–33), whereas subtype cluster S1 (68%) showed a BC normal-enriched profile (p = 2.35 × 10^–4, [180]Figure [181]9a). Of note, 52% of subtype cluster S3 exhibited a BC LumB-enriched profile (p = 3.21 × 10^–10, [182]Figure [183]9a). Additionally, compared with the proportion of subtype cluster S2 patients in the low-risk group, that in the high-risk group was significantly higher (p = 3.38 × 10^–6), whereas the proportion of samples in subtype clusters S1 and S3 in the low-risk group are higher (p = 1.15 × 10^–4, [184]Figure [185]9b). All of the results indicate that the classified subtype clusters and the clinicopathological features of BC patients exhibit a certain correlation. Figure 9. Figure 9 [186]Open in a new tab Correlation analysis of the clinical type and subtype clusters. (a) Histogram of the correlation between clinical immunohistochemical typing and subtype clusters. (b) Histogram of the correlation between high- and low-risk typing and subtype clusters. 3.10. Drug Sensitivity Analysis and Immune Cell Infiltration Evaluation Using the “pRRophetic” package of R, the IC50 differences of eight targeted and chemotherapeutic drugs in three different subtypes of clustering were studied to predict their sensitivities to drug therapy. [187]Figure [188]10 a–h, respectively, shows the drug sensitivity results of Sorafenib, Gefitinib, Bleomycin, Bosutinib, Etoposide, Lenalidomide, Camptothecin, and Methotrexate in three subtypes of BC clusters. The statistical results demonstrated that subtype cluster S2 samples are generally more sensitive to tumor treatment drugs than the other two subtype clusters. In contrast, subtype cluster S3 is less sensitive to most drugs ([189]Figure [190]10a–h). Figure 10. [191]Figure 10 [192]Open in a new tab Drug sensitivity analysis as well as immune cell infiltration evaluation. (a–h) Scatter plot shows that drug sensitivity prediction scores are distributed differently among subtypes. (i) Comparison of immune cell proportions between subtypes (TCGA). (j–m) Scatter plot for the immune score, stromal score, estimated score, and tumor purity, which are distributed differently among subtypes. Additionally, the differences of 22 subtypes of immune cells among TCGA–BC samples in three subtype clusters are investigated using the CIBERSORT algorithm. According to the results, there are significant differences in immune infiltration between the three subtype clusters for B-cells naive, plasma cells, T-cells CD4 memory resting, T-cells CD4 memory activated, T-cells follicular helper, T-cells regulatory Tregs, T-cells γ delta, NK cells resting, monocytes, macrophages M0, M1, M2, dendritic cells activated, mast cells resting, and eosinophils ([193]Figure [194]10 i), taking up a sizable portion of immune cell invasion. Besides, compared to other clusters with a poorer prognosis, the subtype cluster S1 with better survival showed a higher number of M1 and M2 macrophages and activated NK cells ([195]Figure [196]10i). Finally, the tumor purity for all BC samples in three subtype clusters as well as the stromal and immune scores were calculated. As a result, subtype cluster S2 had better immune, stromal, and estimate scores than the others, while subtype clusters S1 and S3 had more pure tumors ([197]Figure [198]10j–m). 4. Discussion With the rapid development of high-throughput sequencing and the accumulation of massive biomedical data, a large number of bioinformatics methods have been used to identify new gene signatures related to the progression, survival, or prognosis of malignant carcinoma.^[199]43−[200]45 Particularly, the critical role of the TME in tumor biology has made targeted TME therapy for tumors a research hotspot. BC, as one of the most common malignant tumors in women, has attracted more attention from researchers due to its poorer prognosis.^[201]46 As a biologically heterogeneous disease, BC has different molecular subtypes according to the gene expression profiles. Presently, we attempt to identify the BC subtype clusters by integrating genomic analysis with the TME. As far as we know, a large-amount dataset combined with multiple bioinformatics methods was first used to identify the intrinsic subtypes and the gene classifier signatures for BC patients. First, we performed unsupervised consistent clustering on the mRNA profiles of TCGA–BC cohort A and a total of three BC subtype clusters were obtained, which are named cluster S1 (n = 382), cluster S2 (n = 328), and cluster S3 (n = 296) subtypes ([202]Figure [203]1 b). Then, three corresponding subtype-related gene modules were obtained from WGCNA, which are specific to each BC subtype cluster ([204]Figure [205]2c). Further analysis of enrichment reveals that genes in subtype cluster S1 are principally involved in the regulation of angiogenesis and PI3K-Akt pathways, and genes in subtype cluster S2 exhibit significant enrichment for hormone secretion and AMPK signaling pathways ([206]Figure [207]3a–d). Additionally, genes in subtype cluster S3 are mainly enriched in epidermis development and Wnt signaling pathways ([208]Figure [209]3e,f). Actually, these biological processes and signaling pathways exhibit crucial roles in the development of BC. For instance, angiogenesis is a crucial process in the formation of inflammatory BC (IBC), and genes and proteins involved in angiogenesis are often overexpressed and overactivated in IBC.^[210]47,[211]48 Recent studies have shown that hormone receptor (HR) + BC causes the majority of BC-related deaths.^[212]49 Yu et al. have also reported that in BC patients with bone, lung, and brain metastases, actionable mutations frequently occur in the PI3K/AKT/mTOR pathway.^[213]50 Inhibiting the activation of the PI3K/Akt pathway partially restores the sensitivity of trastuzumab-resistant HER2-positive BC cells.^[214]51 In addition, the genetic diversities of BC patients demonstrate that it is important to identify gene signatures with distinct survival and druggable targets. Presently, LASSO regression analysis on the gene expression profiles of TCGA–BC patients obtained a total of 160 gene classifiers related to the BC subtype ([215]Figure [216]4 b). Employing the obtained gene classifiers, we built the risk prognostic model, and this model was also then validated by the gene expression data of cohort B in the METABRIC database. The results show that the OS of samples from high-risk groups is higher than those in low-risk groups, suggesting that the risk model of prognostic markers was reliable (P < 0.05) ([217]Figure [218]6a). Indeed, the important role of OS in multiple stages of tumorigenesis and development has been reported in many studies.^[219]52,[220]53 Further, evaluation of the independent prognostic ability of obtained gene classifier signatures indicates that the BC patients in subtype cluster S2 have the poorest prognosis, while subtype clusters S1 and S3 exhibit good prognosis ([221]Figure [222]5d). Subsequently, ten gene signatures significantly related to prognosis were obtained, which contain BARD1, CASP7, CYP11A1, IL1RAP, ITGB1, RAB13, RPL28, TGFBR1, DUSP1, and CXCL13 ([223]Figure [224]7f). All of these results suggest that the independent prognostic factors of BC patients can be the genetic classifier signatures. Besides, it is well-known that TME has an important role during the progression of human cancers, as well as the OS after the operation.^[225]54,[226]55 Presently, the heterogeneity of the molecular subtypes of BC patients was also explored by the single-cell analysis on [227]GSE118389. The results demonstrate that most of the key prognosis-related gene classification signatures are distributed in immune cells. Notably, there were more neutrophils in the subtype S2 with poor prognosis ([228]Figure [229]8 c), which is consistent with the previous study.^[230]56 Single-cell analysis also shows that subtype cluster S1 contains more types of immune cells, while subtype cluster S3 with a good prognosis includes more multilymphoid progenitor cells ([231]Figure [232]8c). Moreover, the chi-square analysis exhibits that 80% of basal-like profile samples are found in subtype cluster S2 (p = 9.49 × 10^–33), 68% of the normal-enriched profiles are subtype cluster S1 samples, and 52% of the subtype cluster S3 showed LumB enrichment (p = 3.21 × 10^–10, [233]Figure [234]9a), indicating the association among the classification subtype cluster and the clinicopathological features of patients with BC. Finally, drug sensitivity analysis demonstrates that subtype cluster S2 samples are generally more sensitive to tumor treatment drugs and subtype cluster S3 is less sensitive to most drugs ([235]Figure [236]10). All of the findings may provide a basis for targeting the subtype-specific molecular and immunotherapy of BC. However, the gene signatures that characterize each subtype cluster of BC need further be studied and validated by biological experiments. 5. Conclusions Taken together, three subtype clusters, S1, S2, and S3, are obtained from the gene expression profiles of TCGA–BC cohort A. WGCNA and pathway enrichment analysis revealed that PI3K-Akt, AMPK, and Wnt signaling pathways, respectively, participated in the subtype clusters S1, S2, and S3. Additionally, the risk prognostic model was constructed by the obtained 160 gene classifiers associated with the BC subtype and validated by the data in the METABRIC database. Clinical prognostic evaluation reveals the poor prognostic ability of subtype cluster S2 and the good prognostic ability of subtype clusters S1 and S3. Then, ten crucial prognostic-related gene signatures, i.e., BARD1, CASP7, CYP11A1, IL1RAP, ITGB1, RAB13, RPL28, TGFBR1, DUSP1, and CXCL13, are screened. Finally, single-cell analysis demonstrates that most types of immune cells are enriched in subtype cluster S1 samples, more neutrophils are distributed in the subtype S2 with more sensitivity to tumor treatment drugs, and more multilymphoid progenitor cells are involved in subtype cluster S3. All of the results may provide an effective basis for the clinical precision treatment of BC. Acknowledgments