Abstract

   [36]graphic file with name ao2c08227_0012.jpg

   Objectives: We aim to identify the breast cancer (BC) subtype clusters
   and the crucial gene classifier prognostic signatures by integrating
   genomic analysis with the tumor immune microenvironment (TME). Methods:
   Data sets of BC were derived from the Cancer Genome Atlas (TCGA),
   METABRIC, and Gene Expression Omnibus (GEO) databases. Unsupervised
   consensus clustering was carried out to obtain the subtype clusters of
   BC patients. Weighted gene coexpression network analysis (WGCNA), least
   absolute shrinkage and selection operator (LASSO), and univariate and
   multivariate regression analysis were employed to obtain the gene
   classifier signatures and their biological functions, which were
   validated by the BC dataset from the METABRIC database. Additionally,
   to evaluate the overall survival rates of BC patients, Kaplan–Meier
   survival analysis was carried out. Moreover, to assess how BC subtype
   clusters are related to the TME, single-cell analysis was performed.
   Finally, the drug sensitivity and the immune cell infiltration for
   different phenotypes of BC patients were also calculated by the
   CIBERSORT and ESTIMATE algorithms. Results: TCGA–BC samples were
   divided into three subtype clusters, S1, S2, and S3, among which the
   prognosis of S2 was poor and that of S1 and S3 were better. Three key
   pathways and 10 crucial prognostic-related gene signatures are
   screened. Finally, single-cell analysis suggests that S1 samples have
   the most types of immune cells, S2 with more sensitivity to tumor
   treatment drugs are enriched with more neutrophils, and more
   multilymphoid progenitor cells are involved in subtype cluster S3.
   Conclusions: Our novelty was to identify the BC subtype clusters and
   the gene classifier signatures employing a large-amount dataset
   combined with multiple bioinformatics methods. All of the results
   provide a basis for clinical precision treatment of BC.

1. Introduction

   Breast cancer (BC), a heterogeneous tumor, has many etiologies and
   clinical features.^[37]1,[38]2 Studies have found that multiple
   subtypes have been identified in the development and progression of
   BC.^[39]3 Based on the 2013 St. Gallen International Breast Cancer
   Congress, more than five molecular subtypes of BC can be defined, i.e.,
   luminal A, luminal B, human epidermal growth factor receptor 2 (HER2) +
   B2, HER2 overexpression, basal-like triple-negative breast cancer
   (TNBC), and other subtypes.^[40]4 The classical classification methods
   of breast cancer subtypes include immunohistochemical-based estrogen
   receptor (ER)-, progesterone receptor (PR)-, and HER2 and intrinsic
   gene expression-based classification methods.^[41]5 Correspondingly,
   the resulting tumor subtypes with distinct genomic alterations show
   different clinical outcomes and treatment responses. Identification of
   BC subtypes can provide insight into the pathogenesis of the disease
   and contribute to the precise diagnosis of BC and personalized targeted
   therapy.

   Clinical phenotypes such as disease stage, metastasis, and tumor
   resectability, as well as histological subtypes, are frequently used in
   current clinical approaches for outcome prediction and treatment
   decision-making for BC treatment. However, the development of molecular
   profiling more recently has given rise to the possibility of
   quantitative tumor analysis based on profiles of protein expression,
   mutations, and/or genome-wide gene transcription. These assays promise
   to characterize tumor subtypes with more precision and accuracy and to
   more correctly forecast how particular tumor types will respond to
   various therapies. For instance, Perou et al. divided BC into five
   intrinsic molecular subtypes based on gene expression
   signatures—basal-like, normal-like, HER2-enriched, and luminal A and B,
   revealing variations in the genesis of tumor cells as well as disparate
   patterns of progression.^[42]6 Another study classified BC into 1q/16q,
   amplifier, and complex subtypes based on the CNA pattern.^[43]7

   Additionally, it is widely known that the cross-talk between the tumor
   cells and tumor microenvironment (TME) influences genetic and
   epigenetic changes that affect the formation, incidence, and
   therapeutic resistance of cancer. In BC patients, a better clinical
   result has been related to immune infiltration levels.^[44]8,[45]9
   Particularly, higher CD8+ T-cell infiltration is strongly associated
   with better overall survival (OS) in ER-negative
   patients.^[46]10,[47]11 Moreover, high immune infiltration is
   additionally associated with a better response to adjuvant
   chemotherapy.^[48]12 Many studies in recent years have shown that the
   TME can be explained by transcriptomic data.^[49]13−[50]18 These
   findings indicate that a lower risk of breast cancer recurrence is
   correlated with a higher expression of leukocyte-related
   genes.^[51]13,[52]16,[53]19,[54]20 Notably, Ali et al. and Bense et al.
   recently reported in a metastudy how specific immune cell types affect
   breast cancer prognosis.^[55]13,[56]21 However, the role and clinical
   relevance of immunity in BC still need to be elucidated by more
   comprehensive analysis.

   Presently, three BC prognostic subtype clusters and a set of genetic
   signatures that can be used to classify BC subtypes were identified
   using BC data in public databases, which may provide effective evidence
   for precise clinical treatment of BC and help explore BC heterogeneity
   between patients. The single-cell sequence analysis of BC suggests
   progressive immune infiltration of clinically relevant immune clusters.
   Through characterization of the immune components of the TME, the
   immune infiltration and poor prognosis group identified as
   tumor-related. Further subtype analysis revealed that most types of
   immune cells are enriched in subtype cluster S1 samples, more
   neutrophils are distributed in the subtype S2 with more sensitivity to
   tumor treatment drugs, and more multilymphoid progenitor cells are
   involved in subtype cluster S3. All of the results may provide an
   effective basis for the clinical precision treatment of BC.

2. Materials and Methods

2.1. Data Source

   Presently, a total of three publicly available data sets, i.e., cohorts
   A, B, and C, are used in this study, which are respectively derived
   from the Cancer Genome Atlas (TCGA), Molecular Taxonomy of Breast
   Cancer International Consortium (METABRIC), and Gene Expression Omnibus
   (GEO) databases. Among them, the gene expression profiling data of
   cohort A obtained from TCGA was used as a training set for model
   construction, which contains 103 paracancerous samples and 903 BC
   samples.^[57]22 The gene expression data of cohort B derived from
   METABRIC, which included 154 paracancerous samples and 1826 BC samples,
   were used to assess the reliability of the constructed model.^[58]23
   Further, to explore the underlying biology of BC, single-cell analysis
   was performed on cohort C (accession number [59]GSE118389), which
   includes six primary TNBC patients with gene expression profile
   data.^[60]24

2.2. Clustering Analysis for Determining the BC Subtypes

   To determine the molecular subtypes of BC in patients, the consensus
   clustering analysis was carried out by employing the
   ConsensusClusterPlus package of R.^[61]25 Before running the program,
   the “max cluster number (maxK),” “cluster algorithm (clusterAlg),” and
   “distance” parameters in this package were set to “5”, “hc,” and
   “Pearson,” respectively. In order to ensure the stability and
   reliability of the classification, all cluster analyses were repeated
   1000 times. Additionally, by calculating the consensus matrix,
   cumulative distribution function (CDF), and relative change in the area
   under the CDF curve, the optimal number of subtypes (K) was obtained.

2.3. Construction of the Coexpression Network

   To identify the modules associated with clinical features, weighted
   gene coexpression network analysis (WGCNA)^[62]26 was carried out on BC
   cohort A data by the corresponding package of R. At first, a
   topological overlap matrix (TOM) was obtained by the adjacency matrix.
   Then, all genes were split into various gene modules by the TOM
   dissimilarity measure. The outliers were identified and removed by
   clustering analysis and the optimal value of soft threshold β was
   employed to ensure a scale-free network. When minModuleSize = 30 and
   merge CutHeight = 0.25, the gene module was identified as a key module.
   Finally, the gene significance (GS) and module membership (MM) of the
   genes in the module were calculated, which represent the correlation
   between genes and subtypes. Herein, the hub genes are the genes with GS
   > 0.5 and MM > 0.9.

2.4. Functional Enrichment Analysis of Genes in Key Modules

   To investigate the potential biological functions performed by the
   acquired genes in key modules, Gene Ontology (GO) and Kyoto
   Encyclopedia of Genes and Genomes (KEGG) enrichment analyses were
   carried out by employing several R packages, which contain “org.
   Hs.Eg.Db,” “ggplot2,” and “clusterProfiler.”^[63]27,[64]28
   Subsequently, the enriched GO or pathway terms with significant
   differences were screened by a threshold p-value <0.05.

2.5. BC Subtype-Related Gene Classification Signature Screening

   To screen the key genes classification signature associated with BC
   subtypes, we first built the gene classifier as previously
   described.^[65]29 In brief, employing the logistic regression, the
   candidate genes associated with BC subtype distinction (with p < 0.05)
   were identified. Finally, using the glmnet and decision tree
   packages^[66]30 in R, the LASSO regression was carried out to narrow
   the variables and obtain the final gene classification signature.

2.6. Establishment of the Risk Prognostic Model

   To investigate the impact of gene classification signatures on the
   patient’s survival outcome and survival time, the BC gene expression
   data of cohort A were also used to establish a clinical prediction
   model. Employing the gene classifiers obtained above, the risk
   prognostic model was built based on the following formula.
   2.6. 1

   where coefi denotes the coefficient and id denotes the normalized count
   for each gene.

   Additionally, the TCGA–BC patients were split into high- and low-risk
   groups based on the median risk score. The overall survival rates of
   two patient groups were assessed using Kaplan–Meier survival analysis.
   Additionally, employing the survival package^[67]31 in R, the ability
   of the classifier to predict the tumor lymph node metastasis (TNM)
   status, stage, age, and risk was assessed by the time-dependent
   receiver operating characteristic (ROC) curve. Moreover, area under the
   curve (AUC) values were computed to determine the classifier’s or
   pertinent clinical factors’ predictive potential. Finally, the
   correlation between TCGA-based breast cancer clinical
   immunohistochemical subtypes and previous high- and low-risk subtypes
   was evaluated and visualized using the ggstatsplot package.^[68]32

2.7. Validation of the Classification Model

   To further confirm the robustness of gene classified signatures, the
   data of cohort B derived from METABRIC are employed for external
   validation. By calculating the individual risk score of high- and
   low-risk BC patients, their individual survival status was evaluated.
   In addition, the molecular intrinsic subtypes of BC patients were
   determined by the PAM50 classifier algorithm.^[69]33 Also, to compare
   the overall survival among subtypes, log-rank tests were carried out to
   assess the correlation between the basic classification clusters and
   tumor stage.

2.8. Evaluation Heterogeneity among Different Subtype Clusters

   To further evaluate the heterogeneity of the molecular intrinsic
   subtypes of BC patients, single-cell analysis was performed on cohort C
   ([70]GSE118389). To standardize the expression matrix of cohort C and
   to perform cell cluster analysis, the Seurat package, FindNeighbors,
   and FindClusters are used. In addition, the differentially expressed
   genes (DEGs) among cell clusters were obtained by FindAllMarkers in the
   Seurat package. Herein, the criteria for screening DEGs were |log fold
   change (FC)| ≥ 1 and the adjusted p-value ≤0.05. Finally, the cell
   subgroups determined by the prognostic gene classifier signatures were
   observed by violin plots, and the interaction between cells was
   analyzed by the R cellchat (1.6.1) package.^[71]34

2.9. Prognostic Nomogram Construction

   For further exploring the prognostic role of gene classification
   signatures, univariate and multivariate regression analysis was also
   performed to filter the significant prognostic factors. To determine
   which gene classifier signatures were significantly associated with OS,
   the univariate regression models were used and p-values were
   calculated. The gene classifier signatures with a p-value <0.05 are
   significantly related to clinicopathologic features. Subsequently, a
   prognostic nomogram was constructed to improve the prediction ability
   by integrating important features derived from multiple regression
   analysis with the survival and RMS packages.^[72]35

2.10. Evaluation of Drug Sensitivity and Immune Cell Infiltration

   For investigating the molecular characteristics associated with drug
   sensitivity and drug resistance, the Genomics of Drug Sensitivity in
   Cancer (GSDC) database ([73]https://www.cancerrxgene.org/) was
   employed, which is a public resource for the discovery of biomarkers in
   cancer cells. Using the pRRophetic package^[74]36 of R, the sensitivity
   data of three BC subtype clusters to different drugs were counted, and
   the drug sensitivity of different phenotypes of BC patients from the
   gene expression data was also predicted.

   Additionally, for exploring the differences of different subtypes of
   immune cells based among multiple clusters of TCGA–BC samples,
   CIBERSORT,^[75]37,[76]38 a tool to deconvolute the expression matrix of
   immune cell subtypes, was used. Besides, to explore the degree of
   immune cell infiltration between BC subgroups, the ESTIMATE^[77]39
   algorithm was also performed.

3. Results

3.1. Identification of Clusters Associated with Subtypes of BC Patients

   Employing the ConsensusClusterPlus package of R, the gene expression
   data allowed the 1006 samples in the TCGA–BC cohort A to be divided
   into various groups. With the changes in the consensus matrix (CM) K
   value, the resulting clusters in CM ([78]Figure [79]1
   a–d) and their corresponding prognosis ([80]Figure [81]1e–h) are also
   different. Particularly, when the maxK value is equal to 3, three clean
   clusters associated with molecular subtypes of BC in patients are
   obtained, which are named cluster S1 (n = 382), cluster S2 (n = 328),
   and cluster S3 (n = 296) subgroups ([82]Figure [83]1b). Subsequently,
   Kaplan–Meier survival analysis was performed on the gene expression
   data of the patients using the log-rank test ([84]Figure [85]1e–h).
   Interestingly, only when K = 3, the differences in prognosis among the
   three clustering subgroups are statistically significant (log-rank test
   P = 0.028 < 0.05, [86]Figure [87]1f), which is consistent with the
   above results. In detail, the prognosis of cluster S2 patients was the
   worst, and that of cluster S3 patients was the best ([88]Figure
   [89]1f). In contrast, moderate OS was observed in patients of cluster 1
   ([90]Figure [91]1f). Besides, we evaluated the relative changes of
   areas under the CDF curve, and the results showed that when K = 3, the
   clustering results tend to be stable ([92]Figure [93]1i). Further, the
   principal component analysis of gene expression data in these three
   clusters is depicted in [94]Figure [95]1j. Consequently, a total of
   three subtype clusters of BC with significantly different prognoses
   were classified by unsupervised consistent clustering, which are used
   for further research.

Figure 1.

   [96]Figure 1
   [97]Open in a new tab

   Consensus clustering for TCGA–BC. (a–d) Consistency matrix heat map at
   k = 2–5. (e–h) Kaplan–Meier curves for OS of BC patients stratified by
   consensus clustering (2–5). (i) Cumulative distribution function curve
   tracking graph. (j) PCA of expression when subtype classification
   number is 3.

3.2. WGCNA Analysis

   Presently, WGCNA was carried out on the TCGA–BC cohort A to establish a
   weighted coexpression network. Clearly, the connection between genes in
   the network satisfies the scale-free distribution when the soft
   threshold β = 6 (scale-free R^2 = 0.9) ([98]Figure [99]2
   a). Subsequently, the phylogenetic tree was used to retrieve the
   coexpression modules (cut height = 0.25) ([100]Figure [101]2a).
   Additionally, the hierarchical cluster analysis showed that modules
   with similar gene expression patterns were clustered on the same branch
   ([102]Figure [103]2b). [104]Figure [105]2c shows the calculated and
   visible correlations between modules and the subtype clusters, in which
   the red color module had a significantly negative association with
   subtype cluster S1 (r = 0.70, p < 0.01). Besides, the black module had
   the strongest association with subtype cluster S2 (r = 0.67, p < 0.01),
   while the blue module had the strongest association with subtype
   cluster S3 (r = −0.81, p < 0.01). Finally, the correlation between
   genes and subtypes is depicted in [106]Figure [107]2d–f. Through WGCNA,
   a total of three subtype-related gene modules were found, which can be
   considered as gene signatures specific to each BC subtype.

Figure 2.

   Figure 2
   [108]Open in a new tab

   WGCNA analysis of TCGA–BC. (a) Scale-free fit index analysis for
   different soft thresholds. (b) Cluster dendrogram of TCGA–BC. (c)
   Correlation between each module and different subtype clusters. (d–f)
   Correlation between module gene expression and subtype clusters.

3.3. BC Subtype Cluster-Specific Enrichment Analysis

   For investigating the biological pathways and biological processes
   related to BC subtype clusters, enrichment analysis of significant
   genes in modules was performed. [109]Figure [110]3
   a,b shows that most of the biological processes and pathways associated
   with subtype cluster S1 are related to the regulation of angiogenesis
   and the PI3K-Akt pathway. The occurrence of these tumor vessels may be
   related to the tumor progression, which is consistent with the moderate
   survival time of subtype cluster S1 ([111]Figure [112]1f).
   Additionally, the genes in the subtype cluster S2 module show
   significant enrichment for hormone secretion and the AMPK signaling
   pathway ([113]Figure [114]3c,d). Actually, these are linked to the
   body’s active immune metabolic processes, which indicate the body’s
   immunological response as cancer progresses.^[115]40 After observing
   [116]Figure [117]3e,f, we found that the genes in subtype cluster S3
   are mainly enriched in epidermis development and the Wnt signaling
   pathway. Overall, through enrichment analysis of the gene modules
   related to the three subtypes, combined with the analysis of the
   prognosis of BC, the accuracy of unsupervised consistent clustering for
   BC subtyping is further confirmed, and the basis for clinical treatment
   of these subtypes is also provided.

Figure 3.

   [118]Figure 3
   [119]Open in a new tab

   GO bioprocess functional enrichment and pathway analysis of genes in
   (a, b) subtype cluster S1, (c, d) subtype cluster S2, and (e, f)
   subtype cluster S3 modules.

3.4. Identification of Gene Classifier Signatures Associated with BC Subtype
Clusters

   The distribution of different BC stages and ages in the three subtype
   clusters is depicted in [120]Figure [121]4
   a. Obviously, patients with N and M stages had no significant
   difference in age, while lower stage and T stages were more common in
   subtype clusters S1 and S3 ([122]Figure [123]4a). To avoid overfitting
   the subtyping characteristics, the LASSO regression on the gene
   expression of TCGA–BC patients was performed. As a result, a total of
   160 genes that are related to BC subtype clusters are obtained
   ([124]Figure [125]4b) and the optimal value of penalty parameters
   through 10 rounds of cross-validation is also determined ([126]Figure
   [127]4c). Additionally, the boxplots of the three stages also suggest
   obvious differences among the three BC subtype clusters, revealing the
   robustness of constructed gene classifier signatures ([128]Figure
   [129]4d).

Figure 4.

   Figure 4
   [130]Open in a new tab

   Identification and evaluation of the gene classifier for BC subtyping.
   (a) Distribution of follow-up data of breast cancer patients in three
   subtypes. (b) LASSO coefficient profiles of the module genes. (c)
   Selection of the penalty parameter (λ) in the LASSO model. (d) Gene
   classifier is represented by the boxplot in the three subtypes.

3.5. Assessment of the Gene Classifier Prognostic Signature for BC

   By analyzing the Kaplan–Meier curve, we found that the OS of BC samples
   in the high-risk group had a lower value than those in the low-risk
   group, indicating that the prognostic marker of the risk score was
   effective (p < 0.05) ([131]Figure [132]5
   a). In addition, the risk curve and scatter plot indicate that the risk
   factors and mortality of the high-risk group were higher than those of
   the low-risk group ([133]Figure [134]5b). Furthermore, ROC analysis
   reveals that the AUC of the risk score was 0.703 ([135]Figure [136]5c),
   suggesting the reliability of the obtained gene classifier signatures
   of BC. At the same time, the survival analysis results of BC subtype
   clusters obtained by using the unsupervised consistent clustering were
   relatively consistent with the prognosis results in high- and low-risk
   groups obtained from the prognostic model. In detail, the subtype
   cluster S3 with a better survival rate was similar to that in the
   low-risk group and the subtype cluster S2 with poor survival was close
   to that in the high-risk group ([137]Figure [138]5d). All of these
   findings imply that the gene signature is a predictive factor unique to
   BC patients.

Figure 5.

   Figure 5
   [139]Open in a new tab

   Survival evaluation of gene classifier signatures in TCGA. (a)
   Kaplan–Meier survival curve of the patients in high- and low-risk
   scores. (b) Patient subtypes were grouped based on the median level of
   gene classifier labels in TCGA. (c) Results of ROC analysis for the
   clinical factors. (d) Distribution of subtypes in high- and low-risk
   levels.

3.6. Validation of the Prognostic Signature for BC

   For further validating the stability of gene classifier signatures, the
   patients of BC in the METABRIC database were classified into two
   different groups, namely, high- and low-risk groups. Then, the OS of
   patients in high- and low-risk groups was analyzed by Kaplan–Meier
   survival analysis. As a result, samples from high-risk groups have
   higher OS than those from low-risk groups, suggesting that prognostic
   markers of risk scores were valid (P < 0.05) ([140]Figure [141]6
   a). The generated risk curves and scatterplots indicate that compared
   with the risk factor and mortality in low-risk groups, those in
   high-risk groups were high ([142]Figure [143]6b). All of these results
   suggest that the gene classifiers are highly reliable in marking
   prognostic markers of BC. Moreover, the molecular intrinsic subtypes of
   BC patients were determined by the PAM50 classifier algorithm. Clearly,
   according to gene classified signatures, three subtype clusters were
   identified ([144]Figure [145]6c) and there were also significant
   differences in prognosis (p < 0.05) among three subtype clusters
   ([146]Figure [147]6d), which is in agreement with the above results.
   Overall, all of the findings suggest that the genetic classification
   markers are independent prognostic factors in BC patients.

Figure 6.

   Figure 6
   [148]Open in a new tab

   Survival evaluation of gene classifier signatures in METABRIC. (a)
   Patient subtypes were grouped based on the median level of gene
   classifier labels in METABRIC. (b) Kaplan–Meier survival curve of OS in
   different groups for the BC patients in METABRIC. (c) K-means algorithm
   for molecular subtype classification in the BC dataset of METABRIC (n =
   1904). (d) Kaplan–Meier survival curve of OS between subtypes in
   METABRIC.

3.7. Independent Prognostic Ability of Gene Classifier Signatures

   For evaluating the independent prognostic ability of the obtained gene
   classifier signatures, we also investigated the correlation between the
   gene classifier signatures and clinicopathological features of TCGA–BC
   samples through univariate regression analysis and multivariate Cox
   regression analysis. The clinically common factors like age, tumor
   stage, group, and TNM status were registered as covariates for
   analysis. As a result, age, M status, and group are independent factors
   that can be used to predict the prognosis of BC patients ([149]Figure
   [150]7
   a,b). In addition, the clinicopathological information and independent
   prognostic factors are combined and then a nomogram involving three
   items in TCGA was constructed, which is used as a clinically relevant
   quantitative method for predicting the mortality of BC patients.
   Obviously, we found that by adding points for each prognostic
   parameter, each patient will be assigned a total score value and higher
   total scores corresponded to poorer patient outcomes ([151]Figure
   [152]7c). Moreover, calibration plots for the TCGA cohort A reveal a
   similar performance of the nomogram to the developed prognostic model
   ([153]Figure [154]7d). In addition, the DCA curve also indicated that
   the nomogram had good stability and reliability net benefit curve in
   age compared with other clinical factors ([155]Figure [156]7e).
   Finally, we further screened the gene signatures that are associated
   with the prognosis by employing univariate regression analysis. As a
   result, a total of ten gene signatures significantly related to
   prognosis were obtained, which contain BARD1, CASP7, CYP11A1, IL1RAP,
   ITGB1, RAB13, RPL28, TGFBR1, DUSP1, and CXCL13 ([157]Figure [158]7f).

Figure 7.

   [159]Figure 7
   [160]Open in a new tab

   Correlation between the gene classifier signatures and
   clinicopathological features of TCGA–BC samples. (a, b) Univariate and
   multivariate analysis including risk status and clinical factors. (c)
   Nomogram for comprehensively predicting the 1-, 3-, and 5-year OS of BC
   patients in the TCGA database. (d) Calibration plots for estimating the
   1-, 3-, and 5-year OS for TCGA–BC samples. (e) Decision curve plots of
   the “nomogram,” “group,” “pathology_M,” “age,” “All,” and “None” models
   by Cox regression analysis. (f) Significantly prognostic genes screened
   by univariate analysis.

3.8. Evaluation of Cellular Heterogeneity in the Subtype Cluster of BC
Patients

   To evaluate the heterogeneity of the molecular intrinsic subtypes of BC
   patients, single-cell analysis was performed on [161]GSE118389.
   Employing the immune cell-specific marker genes obtained from the
   published article,^[162]41 more than 1500 cells from six primary BC
   patients were clustered into 13 clusters and 12 cell types, i.e., CD8+
   T-cells, endothelial cells, mast cells, stromal cells, neutrophil,
   natural killer T-cells, multilymphoid progenitor cells, epithelial
   cells, secretory cells, B-cells, plasmacytoid dendritic cells, and
   cancer stem cells ([163]Figure [164]8
   a). Subsequently, the DEGs between different cell types were identified
   and visualized by the volcano plots ([165]Figure [166]8b). In addition,
   the signature distribution of key prognostic gene classifiers reveals
   that BARD1 is mainly expressed in T and cancer stem cells, while CASP7
   is mainly expressed in endothelial cells ([167]Figure [168]8c).
   Moreover, DUSP1, ITGB1, RAB13, and RPL28 are expressed in subsets other
   than T-cells and B-cells ([169]Figure [170]8c). With respect to TGFBR1,
   it was mainly expressed in T-cells, neutrophils, and epithelial cells.
   In contrast, CYP11A1, IL1RAP, and CXCL13 were not specifically
   expressed ([171]Figure [172]8c). Furthermore, to characterize the
   communication between key immune cells, tumor cells, and other
   microenvironmental cells, the significant ligand–receptor interactions
   between 12 cell types were detected and analyzed through the CellChat
   analysis. The results show that there is extensive cellular
   communication among tumor cells, NK-T cells, epithelial cells, and
   secretory cells ([173]Figure [174]8d).

Figure 8.

   [175]Figure 8
   [176]Open in a new tab

   Single-cell sequencing analysis. (a) Cells are divided into thirteen
   types, including CD8+ T-cells, endothelial cells, mast cells, stromal
   cells, neutrophil, natural killer T-cells, multilymphoid progenitor
   cells, epithelial cells, secretory cells, B-cells, plasmacytoid
   dendritic cells, and cancer stem cells. (b) Volcano of DEGs between
   different cell populations. (c) Violin plots displaying the expression
   of representative gene classifier signatures across the cell types
   identified in the single-cell dataset. The y-axis shows the normalized
   read count. (d) Cellular communication analysis among tumor cells, NK-T
   cells, epithelial cells, and secretory cells.

3.9. Distribution of Subtype Clusters in BC Intrinsic Profiles and Risk
Subgroups

   To investigate the association between the subtype clusters and the
   clinicopathological characteristics and risk subgroups of BC patients,
   we assessed the proportion of subtype clusters in BC molecular
   intrinsic profiles (i.e., basal, Her2, LumA, LumB, and normal) and the
   risk subgroups (i.e., high, low) using the chi-square analysis.^[177]42
   Interestingly, all of the BC molecular intrinsic profiles were
   identified in the three subtype clusters, but the proportions are
   different ([178]Figure [179]9
   a). For example, the proportion of subtype cluster S2 with poor
   prognosis in the basal-like profile was 80% (p = 9.49 × 10^–33),
   whereas subtype cluster S1 (68%) showed a BC normal-enriched profile (p
   = 2.35 × 10^–4, [180]Figure [181]9a). Of note, 52% of subtype cluster
   S3 exhibited a BC LumB-enriched profile (p = 3.21 × 10^–10, [182]Figure
   [183]9a). Additionally, compared with the proportion of subtype cluster
   S2 patients in the low-risk group, that in the high-risk group was
   significantly higher (p = 3.38 × 10^–6), whereas the proportion of
   samples in subtype clusters S1 and S3 in the low-risk group are higher
   (p = 1.15 × 10^–4, [184]Figure [185]9b). All of the results indicate
   that the classified subtype clusters and the clinicopathological
   features of BC patients exhibit a certain correlation.

Figure 9.

   Figure 9
   [186]Open in a new tab

   Correlation analysis of the clinical type and subtype clusters. (a)
   Histogram of the correlation between clinical immunohistochemical
   typing and subtype clusters. (b) Histogram of the correlation between
   high- and low-risk typing and subtype clusters.

3.10. Drug Sensitivity Analysis and Immune Cell Infiltration Evaluation

   Using the “pRRophetic” package of R, the IC50 differences of eight
   targeted and chemotherapeutic drugs in three different subtypes of
   clustering were studied to predict their sensitivities to drug therapy.
   [187]Figure [188]10
   a–h, respectively, shows the drug sensitivity results of Sorafenib,
   Gefitinib, Bleomycin, Bosutinib, Etoposide, Lenalidomide, Camptothecin,
   and Methotrexate in three subtypes of BC clusters. The statistical
   results demonstrated that subtype cluster S2 samples are generally more
   sensitive to tumor treatment drugs than the other two subtype clusters.
   In contrast, subtype cluster S3 is less sensitive to most drugs
   ([189]Figure [190]10a–h).

Figure 10.

   [191]Figure 10
   [192]Open in a new tab

   Drug sensitivity analysis as well as immune cell infiltration
   evaluation. (a–h) Scatter plot shows that drug sensitivity prediction
   scores are distributed differently among subtypes. (i) Comparison of
   immune cell proportions between subtypes (TCGA). (j–m) Scatter plot for
   the immune score, stromal score, estimated score, and tumor purity,
   which are distributed differently among subtypes.

   Additionally, the differences of 22 subtypes of immune cells among
   TCGA–BC samples in three subtype clusters are investigated using the
   CIBERSORT algorithm. According to the results, there are significant
   differences in immune infiltration between the three subtype clusters
   for B-cells naive, plasma cells, T-cells CD4 memory resting, T-cells
   CD4 memory activated, T-cells follicular helper, T-cells regulatory
   Tregs, T-cells γ delta, NK cells resting, monocytes, macrophages M0,
   M1, M2, dendritic cells activated, mast cells resting, and eosinophils
   ([193]Figure [194]10
   i), taking up a sizable portion of immune cell invasion. Besides,
   compared to other clusters with a poorer prognosis, the subtype cluster
   S1 with better survival showed a higher number of M1 and M2 macrophages
   and activated NK cells ([195]Figure [196]10i). Finally, the tumor
   purity for all BC samples in three subtype clusters as well as the
   stromal and immune scores were calculated. As a result, subtype cluster
   S2 had better immune, stromal, and estimate scores than the others,
   while subtype clusters S1 and S3 had more pure tumors ([197]Figure
   [198]10j–m).

4. Discussion

   With the rapid development of high-throughput sequencing and the
   accumulation of massive biomedical data, a large number of
   bioinformatics methods have been used to identify new gene signatures
   related to the progression, survival, or prognosis of malignant
   carcinoma.^[199]43−[200]45 Particularly, the critical role of the TME
   in tumor biology has made targeted TME therapy for tumors a research
   hotspot. BC, as one of the most common malignant tumors in women, has
   attracted more attention from researchers due to its poorer
   prognosis.^[201]46 As a biologically heterogeneous disease, BC has
   different molecular subtypes according to the gene expression profiles.
   Presently, we attempt to identify the BC subtype clusters by
   integrating genomic analysis with the TME. As far as we know, a
   large-amount dataset combined with multiple bioinformatics methods was
   first used to identify the intrinsic subtypes and the gene classifier
   signatures for BC patients.

   First, we performed unsupervised consistent clustering on the mRNA
   profiles of TCGA–BC cohort A and a total of three BC subtype clusters
   were obtained, which are named cluster S1 (n = 382), cluster S2 (n =
   328), and cluster S3 (n = 296) subtypes ([202]Figure [203]1
   b). Then, three corresponding subtype-related gene modules were
   obtained from WGCNA, which are specific to each BC subtype cluster
   ([204]Figure [205]2c). Further analysis of enrichment reveals that
   genes in subtype cluster S1 are principally involved in the regulation
   of angiogenesis and PI3K-Akt pathways, and genes in subtype cluster S2
   exhibit significant enrichment for hormone secretion and AMPK signaling
   pathways ([206]Figure [207]3a–d). Additionally, genes in subtype
   cluster S3 are mainly enriched in epidermis development and Wnt
   signaling pathways ([208]Figure [209]3e,f). Actually, these biological
   processes and signaling pathways exhibit crucial roles in the
   development of BC. For instance, angiogenesis is a crucial process in
   the formation of inflammatory BC (IBC), and genes and proteins involved
   in angiogenesis are often overexpressed and overactivated in
   IBC.^[210]47,[211]48 Recent studies have shown that hormone receptor
   (HR) + BC causes the majority of BC-related deaths.^[212]49 Yu et al.
   have also reported that in BC patients with bone, lung, and brain
   metastases, actionable mutations frequently occur in the PI3K/AKT/mTOR
   pathway.^[213]50 Inhibiting the activation of the PI3K/Akt pathway
   partially restores the sensitivity of trastuzumab-resistant
   HER2-positive BC cells.^[214]51

   In addition, the genetic diversities of BC patients demonstrate that it
   is important to identify gene signatures with distinct survival and
   druggable targets. Presently, LASSO regression analysis on the gene
   expression profiles of TCGA–BC patients obtained a total of 160 gene
   classifiers related to the BC subtype ([215]Figure [216]4
   b). Employing the obtained gene classifiers, we built the risk
   prognostic model, and this model was also then validated by the gene
   expression data of cohort B in the METABRIC database. The results show
   that the OS of samples from high-risk groups is higher than those in
   low-risk groups, suggesting that the risk model of prognostic markers
   was reliable (P < 0.05) ([217]Figure [218]6a). Indeed, the important
   role of OS in multiple stages of tumorigenesis and development has been
   reported in many studies.^[219]52,[220]53 Further, evaluation of the
   independent prognostic ability of obtained gene classifier signatures
   indicates that the BC patients in subtype cluster S2 have the poorest
   prognosis, while subtype clusters S1 and S3 exhibit good prognosis
   ([221]Figure [222]5d). Subsequently, ten gene signatures significantly
   related to prognosis were obtained, which contain BARD1, CASP7,
   CYP11A1, IL1RAP, ITGB1, RAB13, RPL28, TGFBR1, DUSP1, and CXCL13
   ([223]Figure [224]7f). All of these results suggest that the
   independent prognostic factors of BC patients can be the genetic
   classifier signatures.

   Besides, it is well-known that TME has an important role during the
   progression of human cancers, as well as the OS after the
   operation.^[225]54,[226]55 Presently, the heterogeneity of the
   molecular subtypes of BC patients was also explored by the single-cell
   analysis on [227]GSE118389. The results demonstrate that most of the
   key prognosis-related gene classification signatures are distributed in
   immune cells. Notably, there were more neutrophils in the subtype S2
   with poor prognosis ([228]Figure [229]8
   c), which is consistent with the previous study.^[230]56 Single-cell
   analysis also shows that subtype cluster S1 contains more types of
   immune cells, while subtype cluster S3 with a good prognosis includes
   more multilymphoid progenitor cells ([231]Figure [232]8c). Moreover,
   the chi-square analysis exhibits that 80% of basal-like profile samples
   are found in subtype cluster S2 (p = 9.49 × 10^–33), 68% of the
   normal-enriched profiles are subtype cluster S1 samples, and 52% of the
   subtype cluster S3 showed LumB enrichment (p = 3.21 × 10^–10,
   [233]Figure [234]9a), indicating the association among the
   classification subtype cluster and the clinicopathological features of
   patients with BC. Finally, drug sensitivity analysis demonstrates that
   subtype cluster S2 samples are generally more sensitive to tumor
   treatment drugs and subtype cluster S3 is less sensitive to most drugs
   ([235]Figure [236]10). All of the findings may provide a basis for
   targeting the subtype-specific molecular and immunotherapy of BC.
   However, the gene signatures that characterize each subtype cluster of
   BC need further be studied and validated by biological experiments.

5. Conclusions

   Taken together, three subtype clusters, S1, S2, and S3, are obtained
   from the gene expression profiles of TCGA–BC cohort A. WGCNA and
   pathway enrichment analysis revealed that PI3K-Akt, AMPK, and Wnt
   signaling pathways, respectively, participated in the subtype clusters
   S1, S2, and S3. Additionally, the risk prognostic model was constructed
   by the obtained 160 gene classifiers associated with the BC subtype and
   validated by the data in the METABRIC database. Clinical prognostic
   evaluation reveals the poor prognostic ability of subtype cluster S2
   and the good prognostic ability of subtype clusters S1 and S3. Then,
   ten crucial prognostic-related gene signatures, i.e., BARD1, CASP7,
   CYP11A1, IL1RAP, ITGB1, RAB13, RPL28, TGFBR1, DUSP1, and CXCL13, are
   screened. Finally, single-cell analysis demonstrates that most types of
   immune cells are enriched in subtype cluster S1 samples, more
   neutrophils are distributed in the subtype S2 with more sensitivity to
   tumor treatment drugs, and more multilymphoid progenitor cells are
   involved in subtype cluster S3. All of the results may provide an
   effective basis for the clinical precision treatment of BC.

Acknowledgments