Abstract

   Breast cancer is a highly heterogeneous disease that is clinically
   classified into several subtypes. Among these subtypes, basal-like
   breast cancer largely overlaps with triple-negative breast cancer
   (TNBC), and these two groups are generally studied together as a single
   entity. Differences in the molecular makeup of breast cancers can
   result in different treatment strategies and prognoses for patients
   with different breast cancer subtypes. Compared with other subtypes,
   basal-like and other ER+ breast cancer subtypes exhibit marked
   differences in etiologic factors, clinical characteristics and
   therapeutic potential. Anthracycline drugs are typically used as the
   first-line clinical treatment for basal-like breast cancer subtypes.
   However, certain patients develop drug resistance following
   chemotherapy, which can lead to disease relapse and death. Even among
   patients with basal-like breast cancer, there can be significant
   molecular differences, and it is difficult to identify specific drug
   resistance proteins in any given patient using conventional variance
   testing methods. Therefore, we designed a new method for identifying
   drug resistance genes. Subgroups, personalized biomarkers, and therapy
   targets were identified using cluster analysis of differentially
   expressed genes. We found that basal-like breast cancer could be
   further divided into at least four distinct subgroups, including two
   groups at risk for drug resistance and two groups characterized by
   sensitivity to pharmacotherapy. Based on functional differences among
   these subgroups, we identified nine biomarkers related to drug
   resistance: SYK, LCK, GAB2, PAWR, PPARG, MDFI, ZAP70, CIITA and ACTA1.
   Finally, based on the deviation scores of the examined pathways, 16
   pathways were shown to exhibit varying degrees of abnormality in the
   various subgroups, indicating that patients with different subtypes of
   basal-like breast cancer can be characterized by differences in the
   functional status of these pathways. Therefore, these nine
   differentially expressed genes and their associated functional pathways
   should provide the basis for novel personalized clinical treatments of
   basal-like breast cancer.

Introduction

   Breast cancer is highly heterogeneous and most frequently occurs in
   females. This disease is divided into several different clinical
   subtypes, including luminal A, luminal B, basal- and normal-like, based
   on differences in gene expression profiling and immunohistochemical
   indicators. Among these subtypes, basal-like breast cancer largely
   overlaps with triple-negative breast cancer (TNBC). Therefore,
   basal-like breast cancer is usually described as a form of TNBC, and
   these diseases are commonly studied together as a single group [[39]1].
   TNBC is an aggressive subtype of breast cancer that is defined by the
   absence of ER, PR and HER2, and it is characterized by poor prognosis
   and rarely benefits from targeted therapy [[40]2,[41]3]. Effective
   targeted therapies specific for TNBC currently do not exist, and
   treatment regimens for TNBC are limited [[42]4]. Hence, elucidating the
   mechanisms underlying resistance to adjunctive chemotherapy drugs and
   identifying new biomarkers and potential therapeutic targets in TNBC
   patients remain significant and challenging goals for modern clinical
   practice.

   Drug resistance is commonly observed in TNBC patients and is more
   common than in non-TNBC patients [[43]5,[44]6]. Studies have revealed
   numerous drug resistance mechanisms in TNBC patients, and multiple
   genes and biological pathways have been implicated in this process. For
   example, CD73 and CD133 have been shown to impact drug-mediated
   anti-tumor immune responses, and IMP3 regulates the drug resistance
   proteins, ABCG2 and HSF1, as well as autophagy related protein 7 (ATG7)
   [[45]4,[46]7–[47]9]. The PI3K/AKT/mTOR pathways have also been linked
   to drug resistance [[48]10] through the regulation of multiple
   biological processes in the human body.

   To realize the possibility of personalized therapy in TNBC patients,
   much research has been devoted to identifying personalized signatures
   by subdividing TNBC patients into subgroups that present different
   molecular characteristics or prognoses. One of the most significant
   works in this field was that of Lehmann and colleagues [[49]3]. Lehmann
   et al. identified six TNBC subgroups using K-means clustering by
   amassing TNBC patient data from multiple platforms. They demonstrated
   that TNBC can be divided into distinct subgroups, each of which has
   distinct molecular characteristics. However, whether chemotherapy
   response was significantly different between these six subgroups was
   not addressed in detail. To efficiently identify prognosis signatures,
   Ke-Da Yul [[50]11] treated the chemotherapy sensitive and resistant
   groups as two independent subgroups of TNBC, eventually identifying
   seven gene prognosis signatures. The purpose of this study was to
   integrate the main ideas of Lehmann and Ke-Da Yu to take into account
   both heterogeneity among TNBC subgroups and personalized resistant
   biomarkers in TNBC patients.

   One commonly used method to identify important specific genes that
   mediate a given phenotype is to determine the gene expression signature
   of the case group and then compare this with a control group. Genes
   that exhibit statistically significant differences in expression
   between these two groups are then potentially linked to the phenotype
   of the case group. Conventional variance test methods, such as t tests,
   Mann-Whitney tests, and the significance analysis of Microarray (SAM)
   approach [[51]12], have numerous advantages for analyzing homogeneous
   tumors. However, these methods are not suitable for highly
   heterogeneous tumors such as breast cancer. In particular, conventional
   variance test methods lack the sensitivity to identify personalized
   drug resistance genes when comparing breast cancer patients as a single
   group to controls. Indeed, potential specific biomarker genes are
   likely to only exhibit significant differential expression within
   particular subgroups, where they could play important roles in the
   individualized development of drug resistance [[52]13,[53]14].

   To identify these personalized genes, we developed an algorithm to
   score the differential expression of each gene between the sensitive
   group and the drug-resistant group and then verify significant genes
   via random perturbation. In addition to differentially expressed genes
   that would have been identified using conventional variance tests, this
   new method can also effectively identify personalized genes that are
   differentially expressed in specific subgroups of the case group. Next,
   the identified genes were subjected to cluster analysis and enrichment
   analysis. Finally, based on molecular correlations, basal-like cancer
   patients were divided into two high risky subgroups and two low risky
   subgroups, and drug-resistance-related genes and pathways were
   identified for each subgroup. This analysis allowed us to quantify the
   degree of similarity and specificity for the mechanisms of drug
   resistance between the different subgroups. Each subgroup showed a
   specific set of differentially expressed genes and pathways, although
   several common drug resistance genes and biological pathways were
   shared between multiple subgroups. It is possible that these subgroups
   share the same terminal target, and certain drug-related proteins may
   eventually be affected in all subgroups, despite the fact that
   different upstream biological processes are involved. The nine
   identified genes represent potential biomarkers and targets for
   personalized clinical treatment, and they may help to improve clinical
   efficacy and reduce the side effects of anti-cancer drugs.

Materials and Methods

Discovery cohort

   A total of 178 samples were downloaded from the expression profile
   dataset [54]GSE34138 in the GEO database, including 46 basal-like
   subtype patients (24 drug-sensitive patients and 22 drug-resistant
   patients), 68 luminal A subtype patients (four drug-sensitive patients
   and 64 drug-resistant patients), 44 luminal B subtype patients (six
   drug-sensitive patients and 38 drug-resistant patients), and 20
   normal-like and Her2 type patients. As we aimed to study the
   personalized resistant biomarkers for the Basal (HER2 negative) type of
   breast cancer, we removed the 20 normal-like and Her2 type patients.

Validation cohort

   To demonstrate that the resistant-related genes identified here have
   portability and repeatability, we obtained a validation cohort from the
   GEO and EBI databases ([55]GSE1456, [56]GSE3494, E-TABM-158). A total
   of 93 TNBC patients were screened based on ER, PR or HER2 status. The
   validation data were normalized using RMA methods and then integrated
   into one profile. Considering that the chemoresistance-induced relapse
   ratio was highest during the first 3 years [[57]11], we treated
   patients with DSS Time (Disease-Specific Survival Time in years) values
   less than 3 years as chemoresistant and those with DSS Time values
   greater than 3 years as chemosensitive.

Extraction of differentially expressed genes

   Data from breast cancer samples of the luminal and basal-like subtypes
   were first selected from the expression profile dataset [58]GSE34138.
   Due to intrinsic molecular variations between the samples of these two
   subtypes, the 46 basal-like subtype samples and 112 luminal subtype
   samples were analyzed separately to maintain sample homogeneity. The
   two sets of expression profiles were normalized using the RMA method to
   eliminate inherent differences in the levels of gene expression. All
   expression values were subject to Z-score corrections. In the
   normalized expression profiles, samples that were sensitive to
   chemotherapy (CR) were assigned to the sensitive group and those that
   were insensitive to chemotherapy (NOCR) were assigned to the
   drug-resistant group. The normal range was defined based on the
   expression values of the sensitive group (mean ± standard deviation).
   Effectively, this meant that the expression values of the patients in
   the sensitive group were defined as fluctuating within the normal range
   and that drug resistance was likely to occur when a patient expression
   values were outside that normal range. The number of samples in the
   sensitive group was defined as n1 and the number of samples in the
   drug-resistant group was defined as n2, and the score of gene g was
   calculated for expression outside the normal range for all
   drug-resistant groups using Formula 1.

   Formula 1. Calculation of deviation score
   [MATH: <mtable columnalign="left"><mtr><mtd><mspace
   width="10em"></mspace><mtext>score</mtext><mo>=</mo><mspace
   width="0.2em"></mspace><mstyle
   displaystyle="true"><munderover><mo>∑</mo><mrow><mi>i</mi><mo>=</mo><mn
   >1</mn></mrow><mrow><mi>n</mi><mn>2</mn></mrow></munderover></mstyle><m
   o
   stretchy="false">(</mo><msub><mi>X</mi><mrow><mn>2</mn><mi>i</mi></mrow
   ></msub><mo>−</mo><mi>X</mi><mo>′</mo><mo
   stretchy="false">)</mo></mtd></mtr><mtr><mtd><mtext>where</mtext></mtd>
   </mtr><mtr><mtd><mspace
   width="8em"></mspace><mi>X</mi><mo>′</mo><mspace
   width="0.2em"></mspace><mo>=</mo><mrow><mo>{</mo><mrow><mtable><mtr><mt
   d><mrow><mspace
   width="0.4em"></mspace><msub><mi>X</mi><mrow><mi>m</mi><mi>a</mi><mi>x<
   /mi></mrow></msub><mspace
   width="0.2em"></mspace><mi>i</mi><mi>f</mi><mspace
   width="0.3em"></mspace><msub><mi>X</mi><mrow><mn>2</mn><mi>i</mi></mrow
   ></msub><mo>></mo><msub><mi>X</mi><mrow><mi>m</mi><mi>a</mi><mi>x</mi><
   /mrow></msub></mrow></mtd></mtr><mtr><mtd><mrow><msub><mi>X</mi><mrow><
   mn>2</mn><mi>i</mi></mrow></msub><mspace
   width="0.2em"></mspace><mi>i</mi><mi>f</mi><mspace
   width="0.2em"></mspace><msub><mi>X</mi><mrow><mi>m</mi><mi>i</mi><mi>n<
   /mi></mrow></msub><mo><</mo><msub><mi>X</mi><mrow><mn>2</mn><mi>i</mi><
   /mrow></msub><mo><</mo><msub><mi>X</mi><mrow><mi>m</mi><mi>a</mi><mi>x<
   /mi></mrow></msub></mrow></mtd></mtr><mtr><mtd><mrow><mspace
   width="0.5em"></mspace><msub><mi>X</mi><mrow><mi>m</mi><mi>i</mi><mi>n<
   /mi></mrow></msub><mspace
   width="0.2em"></mspace><mi>i</mi><mi>f</mi><mspace
   width="0.3em"></mspace><msub><mi>X</mi><mrow><mn>2</mn><mi>i</mi></mrow
   ></msub><mo><</mo><msub><mi>X</mi><mrow><mi>m</mi><mi>i</mi><mi>n</mi><
   /mrow></msub></mrow></mtd></mtr></mtable></mrow></mrow></mtd></mtr></mt
   able> :MATH]
   (1)

   In this formula, X[2i] represents the expression value of gene g in
   patient I in the drug-resistant group, and X[max] and X[min] represents
   the two limit values of the normal range. The raw score of gene g was
   calculated using the cumulative sum of the scores for gene g in the n2
   patients in the drug-resistant group.

   After obtaining the raw score for gene g, the n1 samples in the
   sensitive group and the n2 samples in the drug-resistant group were
   subject to 10,000 random permutations, and the n1 samples were then
   randomly selected as the sensitive group, with the remaining n2 samples
   counting as the drug-resistant group. Then, a new random score was
   calculated using Formula 1. The above procedure was repeated 10,000
   times to acquire the background distribution of scores for gene g,
   which was then converted to a P value. A gene was considered to be
   significantly different between the groups when P<0.05.

Hierarchical cluster analysis

   Cluster analysis was performed to examine the extracted genes with
   significantly different levels of expression between patients with
   diverse outcomes in the luminal and basal-like subtypes. Average
   linkage hierarchical cluster analysis using Pearson correlations was
   conducted using the Cluster 3.0 program, and the data were visualized
   in Treeview [[59]15,[60]16]. The expression profiling data were
   filtered and standardized using the Cluster 3.0 program, and the genes
   and samples were standardized using the median center method. Centered
   correlation was used in the similarity matrix, and hierarchical
   clustering was used for the cluster analysis. The analyses were
   visualized as heat maps. As luminal type breast cancer can be divided
   into two subgroups, luminal A and luminal B [[61]17], the
   differentially expressed genes in luminal breast cancer patients were
   extracted for cluster analysis to evaluate the efficacy of the proposed
   method for subdividing this single breast cancer subtype into
   personalized subgroups. The basal-like type of breast cancer was then
   analyzed to identify intrinsic subgroups.

Allocation of subgroup-specific genes

   Samples that had similar molecular expression profiles were clustered
   together using hierarchical clustering. In the drug-resistant group,
   patients in different subgroups may share similar specific drug
   resistance mechanisms. By comparing the specific expression patterns in
   each subgroup, candidate genes were allocated into each subgroup.
   Assuming a total of m subgroups were obtained in the drug-resistant
   group and n subgroups in the sensitive group via hierarchical
   clustering, determination of whether a gene was differentially
   expressed in a specific subgroup was made by calculating the mean value
   of this gene in the subgroups of the drug-resistant group (x1, x2…xm)
   and the mean value of this gene in the subgroups of the sensitive group
   (y1, y2…yn). The fluctuation range was then calculated based on the
   mean expression values of the drug-resistant group and the sensitive
   group. If the mean value for a given gene within the subgroups of the
   drug-resistant group was outside of the fluctuation range of the
   sensitive group, this indicated that the gene was differentially
   expressed in the subgroups of the drug-resistant group compared with
   the sensitive group; therefore, this gene was considered to be specific
   for the drug-resistant subgroup. However, when the mean value of a
   given gene within the subgroups of the sensitive group was outside the
   fluctuation range of the drug-resistant group, this indicated that the
   gene was stably expressed in the sensitive group and that its abnormal
   expression could lead to drug resistance; therefore, genes of this type
   were allocated to the sensitive subgroup.

Identification of specific pathways and genes related to drug resistance

   The corresponding specific gene set was obtained by allocating
   differentially expressed genes to various subgroups based on their mean
   expression values. These subgroup-specific genes exhibited significant
   differences in expression when compared with the sensitive group. Thus,
   they represent candidate genes that may be involved in drug resistance
   mechanisms in the different subgroups, and research into the functions
   of these specific genes and the biological processes they affect could
   be extremely useful for personalized clinical treatment. To analyze the
   biological processes in which a specific set of genes are involved,
   functional enrichment analysis was performed for the specific genes in
   each subgroup. KEGG pathway enrichment analysis was completed using the
   molecule annotation system V3.0 [[62]18], and pathways with P values
   lower than 0.05 were considered to be statistically significant.

Pathway deviation score

   Functional annotation analysis was conducted on the specific gene set
   of the subgroups in the drug-resistant group. As specific genes
   exhibited different expression patterns in the different subgroups, the
   corresponding functional levels also varied. Functional pathways
   exhibiting differential expression levels in drug-resistant patients
   compared with sensitive patients could provide important clues for the
   development of personalized therapies for breast cancer. Hence,
   quantitative scoring of potential pathways was performed based on genes
   that were enriched in each potential pathway.

   Formula 2.
   [MATH: <mrow><mtext>A</mtext><mrow><mo
   stretchy="false">(</mo><mtext>P</mtext><mo
   stretchy="false">)</mo></mrow><mspace
   width="0.1em"></mspace><mo>=</mo><mtext>lg</mtext><mrow><mo
   stretchy="false">(</mo><mrow><mfrac><mn>1</mn><mi>N</mi></mfrac><msqrt>
   <mrow><msubsup><mi>∑</mi><mrow><mi>i</mi><mo>=</mo><mn>1</mn></mrow><mi
   >N</mi></msubsup><msup><mrow><mrow><mo
   stretchy="false">(</mo><mrow><mover
   accent="true"><mrow><msub><mi>X</mi><mi>i</mi></msub></mrow><mo
   stretchy="true">¯</mo></mover><mo>−</mo><mover
   accent="true"><mrow><msub><mi>Y</mi><mi>i</mi></msub></mrow><mo
   stretchy="true">¯</mo></mover></mrow><mo
   stretchy="false">)</mo></mrow></mrow><mn>2</mn></msup></mrow></msqrt></
   mrow><mo stretchy="false">)</mo></mrow></mrow> :MATH]
   (2)

   For pathway P, A(P) is the deviation score of the pathway, N is the
   number of differentially expressed genes in this pathway, X[i] is the
   mean expression value of gene i across the subgroups, and Yi is the
   mean expression value of gene i in the sensitive groups. The normal
   deviation level of pathway P in a given subgroup was obtained by
   calculating the decimal logarithm of the cumulative sum of the
   Euclidean distance of all genes in pathway P. Finally, functional
   pathways in which specific differences were observed among the various
   subgroups and the associated regulatory genes were identified by
   comparing the deviation of each pathway between the different
   subgroups.

Resistant biomarker identification

   We constructed the protein interaction network using resistance-related
   genes in common between the functional pathways. The gene degree in
   network was calculated, which indicates how many genes are directly
   linked to this gene. Genes with higher values have much more impact on
   the network as a whole, and could therefore influence the expression of
   multiple resistance-related genes, so therefore, we treated these genes
   as important resistance-related biomarkers. The interaction information
   was integrated from the STRING, HPRD, BioGrid databases. The degree of
   all nodes was converted using the base 2 logarithm.

Performance on the validation cohort

   The discovery and validation cohorts were treated as the training and
   test set, respectively. All of the identified resistance biomarkers
   were treated as features. We trained a decision tree model on the
   training set and then evaluated the ability of the algorithm to
   distinguish between patients with different prognosis in the test set
   using identified biomarkers. To eliminate differences among the
   platforms, data from the discovery and validation cohorts were treated
   with discretization. For example, for gene i, the mean value of all
   samples was μ, and the standard deviation was s. If the expression
   value of gene i was greater than μ + s, the expression value was set as
   1; if the expression value was lower than μ - s, the expression value
   was set as -1. All other expression values were set as 0. The decision
   tree model was trained using the discovery cohort and resistance
   biomarkers and was then used to predict outcomes (relapse and
   non-relapse) in the validation cohort.

Results

Extraction of differentially expressed genes

   We identified a total of 2047 luminal-related differentially expressed
   genes between the resistant and sensitive patients through random
   perturbation analysis of the sample data, including 1149 downregulated
   genes and 898 upregulated genes. A total of 3020 basal-related
   differentially expressed genes were obtained, including 2087
   downregulated genes and 933 upregulated genes. Different subtypes of
   breast cancer have diverse chemosensitivity and may require different
   chemotherapy strategies. Therefore, we extracted differentially
   expressed genes between the resistant and sensitive groups in the
   luminal and basal subtypes, respectively. The comparison between the
   two subtypes is shown in [63]Fig 1.

Fig 1. Comparison of differentially expressed genes (DEGs) in Basal and
Luminal breast cancer (BC).

   Fig 1
   [64]Open in a new tab

   The left circle (blue) represents DEGs in basal type BC patients, and
   the right circle (orange) represents DEGs in luminal type BC patients.
   The overlapping and unique DEGs in two types of BC are shown using a
   Venn diagram.

   [65]Fig 1 shows a comparison of the genes that were differentially
   expressed between the basal and luminal-like breast cancer samples. Of
   these, 503 genes were found in common, accounting for 16% and 24% of
   the differentially regulated genes in the basal and luminal-like types
   of breast cancer, respectively. Therefore, the genes in this
   intersection were stably expressed in these two types of breast cancer
   and may be involved in the mechanisms of drug resistance shared by
   multiple breast cancer subtypes. On the other hand, many drug
   resistance-related genes were specific for only one of the two
   subtypes, as 84% of the basal-related genes and 76% of the
   luminal-related genes were differentially expressed in only a single
   subtype. Therefore, unique drug resistance mechanisms may exist in
   different subtypes of breast cancer. Identifying the specific
   mechanisms of drug resistance in these subtypes could provide the basis
   for personalized therapies in clinical practice.

Hierarchical clustering

   To identify and validate the existence of different subgroups within a
   breast cancer subtype, hierarchical clustering was performed using
   samples of luminal and basal-like breast cancer based on the genes that
   were identified as differentially expressed in these two subtypes. As
   there are two subgroups of luminal breast cancer, luminal A and luminal
   B [[66]17], hierarchical clustering was first performed with the 112
   luminal breast cancer samples, based on the 2047 differentially
   expressed genes, to validate the ability of our method to distinguish
   specific subgroups within the same subtype of breast cancer (shown in
   [67]Fig 2).

Fig 2. Hierarchical clustering of luminal breast cancer samples.

   [68]Fig 2
   [69]Open in a new tab

   A green-red heat map was used to visualize the clustering results. As
   illustrated, luminal type BC can be divided into multiple subgroups,
   indicated with different colors. Both similarities and differences were
   present between the subgroups. The red and green color key in the heat
   map represent up- and downregulated genes, respectively.

   [70]Fig 2 shows the clustering results for luminal breast cancer. The
   112 luminal breast cancer patients were divided into multiple subgroups
   based on similarities in the expression levels of the differentially
   expressed genes. There were 27 patients in group 1 (blue), 89% of which
   had the luminal A type of breast cancer. There were 14 breast cancer
   patients in group 2 (green), 86% of which had the luminal A type breast
   cancer. There were 18 breast cancer patients in group 3 (yellow), 61%
   of which had the luminal B type of breast cancer. There were eight
   breast cancer patients in group 4 (orange), 88% of which had the
   luminal A type of breast cancer. There were seven breast cancer
   patients in group 5 (red), all of which had the luminal A type of
   breast cancer. There were 21 breast cancer patients in group 6
   (purple), 71% of which had the luminal B type of breast cancer. Group 7
   (grey) was the mixed type, which consisted of 16 samples, and 90% of
   the sensitive group samples were in this group. Detailed Luminal
   patients labels in each subgroups were shown in [71]Table 1.

Table 1. Subgroups of luminal breast cancer.

   sub-group sample num luminal A luminal B CR dominant subtype
   1         27         24        3         0  luminal A
   2         14         12        2         0  luminal A
   3         18         7         11        0  luminal B
   4         8          7         1         0  luminal A
   5         7          7         0         1  luminal A
   6         21         6         15        0  luminal B
   7         16         5         11        9  luminal B
   [72]Open in a new tab

   Table l lists the seven subgroups identified through hierarchical
   clustering of luminal samples, where “sample num” refers to the number
   of samples in each subgroup, “CR” is the number of sensitive samples in
   each subgroup, and “dominant subtype” is the dominant breast cancer
   subtype in each subgroup.

   As shown in the clustering results, nearly all of the samples in the
   sensitive group were clustered in the same subgroup (group 7),
   indicating that the expression of these genes exhibited significant
   gene expression differences between the sensitive group and the
   drug-resistant group. In addition, the luminal A and luminal B samples
   were effectively separated and allocated into different subgroups,
   demonstrating that the proposed algorithm based on differential gene
   expression was able to effectively identify and distinguish the
   different luminal subgroups with high accuracy (minimum 61%, maximum
   100%).

   As this method could effectively distinguish between the luminal A and
   B subgroups, this method was also employed for clustering analysis
   using the basal-like subtype breast cancer samples to identify
   potential subgroups with different drug resistance mechanisms within
   this subtype. The clustering results for the basal-like subtype are
   shown in [73]Fig 3.

Fig 3. The clustering results for the basal-like subtype.

   [74]Fig 3
   [75]Open in a new tab

   A heat map was used to visualize the clustering results for basal BC.
   Basal BC can be divided into 4 subgroups, indicated with different
   colors. CR and NOCR represent sensitive and drug resistant patients,
   respectively.

   [76]Fig 3 shows the clustering results for the basal-like subtype. A
   total of four subgroups were identified, with group 1 in blue (12
   samples), group 2 in green (9 samples), group 3 in orange (11 samples),
   and group 4 in red (14 samples). The majority of the NOCR and CR
   patients could be divided into different clusters based on differences
   at the molecular level. In addition, both NOCR and CR patients could be
   further subdivided into two distinct subgroups, suggesting that
   differential drug sensitivity is present even in the same clinical
   phenotype. Detailed sample information is presented in [77]Table 2.

Table 2. Subgroups of Basal BC.

   sub-group sample num NOCR CR
   1         12         7    4
   2         9          9    0
   3         11         3    8
   4         14         3    11
   [78]Open in a new tab

   Table 2 shows the distribution of NOCR and CR patients in the four
   subgroups. CR and NOCR represent sensitive and drug-resistant patients,
   respectively.

   As shown in [79]Table 2, among the four subgroups of basal-like breast
   cancer, NOCR cases were predominantly found in groups 1 and 2, which
   were designated as the drug-resistant groups. CR cases were
   predominantly found in groups 3 and 4, which were designated as the
   sensitive groups. Therefore, subgroups with distinct molecular profiles
   were present in both the drug-resistant group and the sensitive group
   of basal-like breast cancer. Groups 1 and 2 provide evidence for
   variation in the molecular basis of drug resistance. Indeed, the
   patients with drug-resistant basal-like breast cancer were divided into
   at least two different subgroups, indicating the existence of
   significantly different drug resistance mechanisms. Importantly, this
   result could explain why resistance to a given drug is only observed in
   a subset of breast cancer patients with the same cancer subtype,
   whereas other patients with the same subtype remain sensitive to that
   drug. Groups 3 and 4 consisted predominantly of drug-sensitive samples.
   These groups belonged to the same class in the first round of cluster
   analysis, suggesting that the sensitive group could also be further
   subdivided into subgroups based on similarities at the molecular level.
   These subgroup-specific biomarkers could help to create personalized
   treatments for patients with differential drug sensitivity. To identify
   these biomarker genes, we allocated genes to subgroups based on
   expression differences among the subgroups.

Allocation of specific genes in the subgroups

   Four different subgroups were identified among the basal-like breast
   cancer samples, including two drug-resistant groups and two
   drug-sensitive groups. The differentially expressed genes for
   basal-like breast cancer were allocated based on their mean expression
   values in the four subgroups to yield the specific gene sets for each
   of these subgroups. The number of genes identified for four groups were
   1079, 1203, 1122, 1236, respectively ([80]S1 Table). There were 819
   common genes in the intersection of subgroup 1, 2, 3 and 4. The genes
   in the intersection between these groups exhibited stable differential
   expression in the drug-resistant group and the sensitive group; thus,
   they are likely to participate in the regulation of the shared
   mechanisms underlying drug resistance or sensitivity in these different
   subgroups. However, genes that did not overlap between these groups
   likely represent genes that are specific to a given subgroup. These
   genes only exhibit differential expression in specific subgroups, and
   therefore, they are likely to be involved in the specific mechanisms of
   drug resistance unique to each subgroup of the drug-resistant group.

Functional annotation analysis of subgroup specific genes

   To further study the drug resistance mechanisms that were shared by
   multiple subgroups or were specific to a single subgroup, KEGG
   functional pathway annotation analysis was conducted using the genes
   specific to groups 1 and 2, as well as the genes in the intersection
   between these groups, as shown in [81]Fig 4 ([82]S2 and [83]S3 Tables).

Fig 4. Pathway enrichment analysis.

   [84]Fig 4
   [85]Open in a new tab

   This figure depicts the results of the KEGG functional pathway
   enrichment analysis with genes specific to subgroups 1 and 2, as well
   as the genes shared between these subgroups. The pathways in the blue
   box represent the pathways enriched for the subgroup 1-specific genes,
   the pathways in the green box represent those enriched for the subgroup
   2-specific genes, and those in the purple box represent those enriched
   for the common genes. Only the top five pathways with the highest
   significance are listed in the figure; more detailed results are
   described in [86]S2 Table.

   Comparisons revealed that subgroup 1-specific genes were primarily
   involved in intercellular signal transduction processes, including the
   regulation of actin, cell adhesion, hematopoietic cell linkage and
   leukocyte migration. Subgroup 2-specific genes were primarily involved
   in the regulation of actin, focal adhesion, and the synthesis and
   metabolism of amino acids and sphingomyelin. The pathways enriched in
   the genes that showed overlapping expression patterns in the two
   subgroups were primarily involved in immune regulatory processes,
   including antigen presentation, natural killer cell-dependent
   processes, cytotoxicity effects and cytokine receptor signaling. These
   findings indicate that the genes that are misexpressed in different
   subgroups of the drug-resistant basal-like breast cancer group are
   primarily involved in immune regulation. Therefore, the immune response
   to chemotherapy agents is likely an important driver of drug
   resistance. In addition, subgroup 1 and subgroup 2 differentially
   expressed specific subsets of genes involved in similar pathways, such
   as actin regulation. These findings indicate that in response to
   chemotherapy drugs, abnormal connections of the extracellular matrix to
   intracellular cytoskeletal proteins, resulting from adhesion plaques or
   actin irregularities, could lead to the blockage of drug absorption by
   target cells and contribute to drug resistance. Furthermore, genes
   specific to subgroups 1 and 2 are also critical for processes such as
   blood cell linkage, leukocyte migration and the metabolism of glutamic
   acid and sphingomyelin, indicating that abnormalities in blood cell
   functions or glutamic acid and sphingomyelin metabolism may be
   important biomarkers for the onset of drug resistance.

   Similarly, functional enrichment analysis was also conducted for genes
   that showed similar expression in subgroups 3 and 4. These genes were
   stably expressed in both sensitive groups, although their expression
   levels were quite distinct from those observed in the drug-resistant
   groups. With respect to the specific drug resistance mechanisms
   acquired by patients in subgroups 1 and 2, the genes in the sensitive
   group intersection may be involved in common drug resistance mechanisms
   that regulate basal-like breast cancer and may in fact represent a
   common resistance pathway. The enrichment results are shown in
   [87]Table 3.

Table 3. Pathway enrichment of subgroups 3 and 4.

   Pathway                                Count p-Value  q-Value
   Jak-STAT signaling pathway             6     1.41E-05 4.40E-04
   Cytokine-cytokine receptor interaction 6     2.55E-04 0.003289
   GnRH signaling pathway                 4     4.61E-04 0.004927
   Non-homologous end-joining             2     9.70E-04 0.008849
   Purine metabolism                      4     0.001703 0.014915
   MAPK signaling pathway                 5     0.002323 0.016957
   Hematopoietic cell lineage             3     0.003097 0.020551
   Gap junction                           3     0.003737 0.024069
   Nucleotide excision repair             2     0.009445 0.040375
   Notch signaling pathway                2     0.010724 0.041937
   [88]Open in a new tab

   Table 3 lists significantly enriched pathways. Counts represent the
   number of DEGs enriched in each pathway. The p values were calculated
   using the hypergeometric distribution. Q values are the adjusted p
   values after FDR.

   As shown in [89]Table 3, genes stably expressed in the sensitive group
   were enriched in pathways regulating intercellular interactions,
   including cytokine receptors, gap junctions, and Notch signaling, as
   well as multiple other signaling pathways, such as STAT, GnRH and MAPK.
   Among these pathways, the STAT signaling pathway was the most
   significantly affected, with six genes being identified, including
   IL-4R, IL-15RA, STAT2, IL-2RA, CBLB and CSH1. In addition to the STAT
   pathway, the GnRH [[90]19] and MAPK [[91]20] signaling pathways were
   also identified as being related to drug resistance in breast cancer.

Calculation of pathway deviation scores

   To study functional differences in the specific and common pathways in
   the various subgroups, the specific and common pathways were integrated
   for subgroups 1 and 2, and the common pathways were integrated for
   subgroups 3 and 4. The abnormal deviation scores obtained for each of
   the 16 pathways in subgroups 1 and 2 and the sensitive subgroups 3 and
   4 are listed in [92]Table 4. Pathways showing abnormalities in specific
   functions were identified by comparing the scores of the two
   drug-resistant groups (subgroups 1 and 2) and the sensitive groups
   (subgroups 3 and 4), as shown in [93]Fig 5.

Table 4. Deviation scores of the pathways.

 Pathway                                      subgroup 1 subgroup 2 subgroup 3/4
 Aminoacyl-tRNA biosynthesis                     0.67       1.42        0.18
 Antigen processing and presentation             0.84       0.65        0.14
 Cell adhesion molecules (CAMs)                  0.75       0.56        0.01
 Cytokine-cytokine receptor interaction          0.7        0.59        0.06
 Focal adhesion                                  0.71       0.82        0.57
 Glutamate metabolism                            0.67       1.42        0.18
 Glycosphingolipid biosynthesis—ganglioseries    0.38       1.32        0.46
 GnRH signaling pathway                          0.53       0.86        0.64
 Hematopoietic cell lineage                      1.09       0.77        0.29
 Jak-STAT signaling pathway                      0.91       0.89        0.4
 Leukocyte transendothelial migration            0.75       0.59        0.1
 Natural killer cell mediated cytotoxicity       0.74       0.61        0.07
 Non-homologous end-joining                      0.27       0.81        0.31
 Pathogenic Escherichia coli infection—EHEC      1.06       0.68        0.37
 Purine metabolism                               0.79       0.73        0.34
 Regulation of actin cytoskeleton                0.88       0.59        0.3
   [94]Open in a new tab

   Table 4 shows the deviation scores of each of the 16 pathways for the
   different subgroups that were calculated using Formula 1. Higher scores
   indicate a higher degree of deviation from the normal levels of the
   pathway, suggesting more obvious abnormalities in pathway function.

Fig 5. Pathway deviation scores of the subgroups.

   [95]Fig 5
   [96]Open in a new tab

   Subgroups 1 and 2 were marked with blue and red lines. To observe the
   deviation of subgroup 1 and subgroup 2 from the sensitive range, we
   also calculated the deviation scores of subgroup 3 and subgroup 4,
   marked in green (the deviation scores were the same for these two
   subgroups). The scores of the 16 pathways ranged from 0 to 1.6.

   As shown in the [97]Fig 5, subgroup 1 exhibited abnormalities in
   Pathogenic Escherichia coli infection, Natural killer cell mediated
   cytotoxicity, Hematopoietic cell lineage, Antigen processing and
   presentation, and Cell adhesion molecules (CAMs), whereas subgroup 2
   displayed abnormalities in aminoacyl-tRNA biosynthesis, glutamate
   metabolism, Glycosphingolipid biosynthesis–ganglioseries and
   Non-homologous end-joining. Except for aminoacyl-tRNA biosynthesis, all
   of these pathways that were abnormal in the drug-resistant group have
   been shown to be involved in the development of drug resistance
   [[98]21–[99]25]. However, a study by Palaskas N. et al. in 2011
   reported that genes related to basal-like breast cancer were highly
   enriched in the functional pathway of aminoacyl-tRNA biosynthesis
   [[100]26], indicating a close relationship between basal-like breast
   cancer and irregularities in this pathway.

Resistant biomarker identification

   A total of 118 drug resistance candidate genes were extracted from the
   16 pathways related to drug resistance. These genes exhibited
   significant differential expression in at least in one drug-resistant
   subgroup or one sensitive group. Drug-resistant genes often exert their
   effects on downstream functional pathways by inducing the abnormal
   expression of associated genes, eventually leading to decreased
   sensitivity towards drugs. Therefore, to identify important
   drug-resistance genes, a PPI network was created for the 118 candidate
   genes and the 819 common genes based on protein-protein interaction,
   and the degree distribution was calculated for the nodes. The degree of
   the nodes was converted using the base 2 logarithm, and the number of
   nodes that were distributed at different intervals was statistically
   analyzed. As shown in [101]Fig 6, the number of nodes with a degree
   distribution at intervals 1–4 in the network was the largest, whereas
   the number of nodes (21) with a degree greater than 4 was the smallest.
   Finally, genes that showed interactions with at least 25
   drug-resistance candidate genes and common genes were selected as
   important drug resistance-related biomarkers, which were the gene nodes
   with a degree greater than 4.7 after logarithm conversion. A total of 9
   important marker genes were obtained at the end of the analysis, as
   shown in [102]Table 5.

Fig 6. Degree distribution in the PPI network.

   [103]Fig 6
   [104]Open in a new tab

   The network was constructed using candidate genes and common resistance
   genes. The degree of a gene represents the number of adjacent genes in
   the network that directly interact with that gene. The degree of all
   genes was converted using base the base 2 logarithm. Genes with a
   degree distribution of 4–6 were least common. These genes have a
   greater impact on the network and are considered to be important
   drug-resistance markers.

Table 5. Degree of resistance biomarkers.

   resistant biomarker log(Degree)
   SYK                 6.11
   LCK                 6.11
   PPARG               5.52
   GAB2                5.17
   ZAP70               5.13
   PAWR                5
   MDFI                5
   CIITA               4.75
   ACTA1               4.7
   [105]Open in a new tab

   Table 5 lists the nine resistant biomarkers and their corresponding
   network degrees. All of these biomarkers have large degree values,
   indicating they should have a larger impact on drug sensitivity
   compared with other genes.

   [106]Table 5 lists the 9 genes and their corresponding degree
   distributions after logarithm conversion. These genes are associated
   with multiple drug-resistance candidate genes or common genes in the
   network, and therefore play important roles in the development of drug
   resistance. Verification data were then used to test the effectiveness
   of these genes in determining the prognosis of TN breast cancer
   patients.

Performance of the resistance biomarkers in the validation cohort

   The discovery validation cohorts were both treated with discretization,
   and then we trained the decision tree model using the expression of
   nine genes from the discovery cohort. The average accuracy reached 83%,
   representing 95% accuracy for predicting recurrent TN patients and 47%
   accuracy for non-recurrent TN patients, as shown in [107]Table 6. This
   result demonstrates the nine identified resistant biomarkers are highly
   informative for the prediction of relapse due to drug resistance to
   chemotherapies, although they were less effective for patients who were
   sensitive to chemotherapies with no relapse. This difference was likely
   caused by sample imbalance within the validation cohort. As the number
   of samples without relapse was significantly larger than with relapse
   (67 vs. 23), the accuracy was lower for the patients without relapse,
   which had a large number of samples.

Table 6. Classification report of the validation cohort.

                  precision recall f1-score support
   Recurrence       0.95     0.64    0.77   23
   Non recurrence   0.47     0.91    0.62   67
   avg / total      0.83     0.71    0.73   90
   [108]Open in a new tab

   Table 6 A decision tree model was used to predict outcomes in the
   validation cohort, and the average accuracy reached 83%. The precision,
   recall, f1-score and support for recurrence and non-recurrence are also
   included in the classification report.

Discussion

   Breast tissue is composed of highly heterogeneous epidermal cells,
   including luminal and basal cells, and these two types of cells are
   considered to be the progenitor cells from which the breast tissue
   originates [[109]27]. These cell types have significantly different
   expression markers and biological functions [[110]28], and as a result,
   different subtypes of breast cancer have remarkably different
   treatments and prognoses. Clinically, breast cancer is classified based
   on immunohistochemistry, and most patients diagnosed with basal-like
   breast cancer (i.e., TNBC) are usually treated with the same
   therapeutic regimen. However, although some patients are sensitive to
   this treatment, others develop drug resistance and may suffer relapse.
   This phenomenon suggests that breast cancers of the same subtype can
   exhibit markedly different responses to therapeutic agents due to
   variations at the molecular level. Thus, optimizing treatment outcomes
   will require personalized therapies. Sensitivity to chemotherapy agents
   is primarily determined by drug absorption, distribution, metabolism
   and excretion (ADME), as well as the function of drug efflux pump
   proteins [[111]29]. By contrast, lower correlations are observed
   between drug sensitivity and histochemistry types. Therefore, the
   classification of TNBC patients based on functional protein levels is
   an urgent clinical need. Personalized therapies based on the
   sensitivity of the patients to chemotherapy are expected to improve
   efficacy and reduce unnecessary side effects.

   Among TNBC patients, both the drug-resistant and drug-sensitive groups
   could be further divided into two subgroups, suggesting complex
   mechanisms underlying drug resistant to clinical chemotherapies. In the
   two subgroups of the drug-resistant group, the abnormal functions in
   subgroup 1 were primarily in pathways related to the immune system,
   such as natural killer cell mediated cytotoxicity, antigen processing
   and presentation. By contrast, in subgroup 2, the abnormal functions
   were enriched for pathways associated with the biosynthesis of cell
   membranes and protein, such as aminoacyl-tRNA biosynthesis and
   glutamate metabolism. Finally, 9 resistant biomarkers were identified
   from these aberrant pathways and were validated using the validation
   cohort, with the mean accuracy reaching 83%.

   Central to this study was the use of subgroup-specific genetic markers
   to determine whether TNBC patients are candidates for routine clinical
   chemotherapies. If a patient is predicted to be resistant to
   chemotherapy using this model, other treatment methods should be
   considered to improve prognosis, such as targeted treatments that
   avoids toxicity. On the other hand, in the validation cohort, survival
   “over 3 years” or “less than 3 years” was used to indicate chemotherapy
   sensitivity or resistance based on concept that non-pCR in TNBC is
   equivalent to recurrence or poor survival [[112]11]. Therefore, the
   model established in this study can not only predict the sensitivity of
   patients to chemotherapies, but it can also determine prognosis, such
   as risk for relapse.

   Two drug-resistant subgroups were identified in this study. These two
   subgroups exhibited significant differences at the functional level,
   indicating distinct mechanism of drug resistance between these two
   types of TNBC patients. Therefore, for patients in subgroup 1, drugs
   that improve immune functions might be considered to increase drug
   sensitivity and improve prognosis. For patients in subgroup 2,
   inhibitors of aminoacyl-tRNA and glutamate synthesis could be used to
   decrease the proliferative capability of tumor cells. As the 9
   resistant biomarkers displayed high-level degree distribution in the
   PPI network, they broadly regulate multiple drug resistance-related
   genes and the corresponding downstream pathways. Therefore, these
   biomarkers should provide new targets for clinical treatments.

   This research is highly relevant, as most currently available methods
   determining breast cancer treatments fail to consider heterogeneity
   when extracting differentially expressed genes, although a few
   signatures have been developed to evaluate chemosensitivity of the TNBC
   patients [[113]30,[114]31]. Therefore, we employed random disturbance
   to identify specific genes that were only differentially expressed
   among subgroups to recover individualized chemoresistance genes that
   would be missed by other methods, in which only the difference between
   drug-resistant and drug-sensitive groups was examined. Two
   drug-resistant subgroups were identified with significant differences
   at the functional level, and the functions of the genes that were
   misexpressed in each subgroup provide novel insights into the selection
   of clinical treatment strategies. The nine-gene signature identified in
   this study can not only predict chemosensitivity, but it can also be
   used to assess the survival length and the risk of relapse.

   This study has several limitations. First, the sample size in the
   discovery cohort and in the homogeneous validation cohort was limited.
   In particular, the discovery cohort had unequal numbers of samples of
   the two prognosis types (67 samples without relapse vs. 23 samples with
   relapse), leading to a higher predictive accuracy in patients with
   relapse and a lower predictive accuracy in patients without relapse.
   Second, the method used to standardize the data from the validation
   cohort does is not applicable to all published data. For example, the
   same gene could show large variance between different studies or when
   different detection methods were used. As a result, to rule out
   variation in the data across platforms, the validation cohort in this
   study was selected from the same platform ([115]GPL96), and the data
   were standardized using the RMA method.

   In conclusion, we identified two subgroups of chemoresistant TNBC
   patients and characterized their personalized abnormal functions. A
   nine-gene signature was proposed to classify TNBC patients with diverse
   chemosensitivity and prognoses, and these genes were derived from each
   resistant subgroup as personalized biomarkers. Therefore, these genes
   also represent potential therapy targets. By monitoring the expression
   changes of these genes, it may be possible to optimize therapeutic
   strategies and dosage adjustments, which could minimize treatment
   failure and side effects from overdoses. Although further validation
   and additional research are required, this study points the way towards
   novel personalized therapeutic strategies.

Supporting Information

   S1 Table. Specific genes in subgroups.

   (XLSX)
   [116]Click here for additional data file.^ (71.3KB, xlsx)
   S2 Table. Subgroup1 pathway enrichment.

   (XLSX)
   [117]Click here for additional data file.^ (14.8KB, xlsx)
   S3 Table. Subgroup2 pathway enrichment.

   (XLSX)
   [118]Click here for additional data file.^ (11.5KB, xlsx)

Acknowledgments