Abstract

Background

   Prioritization of breast cancer patients based on the risk of
   resistance to tamoxifen plays a significant role in personalized
   therapeutic planning and improving disease course and outcomes.

Methods

   In this work, we demonstrate that a genome-wide pathway-centric
   computational framework elucidates molecular pathways as markers of
   tamoxifen resistance in ER+ breast cancer patients. In particular, we
   associated activity levels of molecular pathways with a wide spectrum
   of response to tamoxifen, which defined markers of tamoxifen resistance
   in patients with ER+ breast cancer.

Findings

   We identified five biological pathways as markers of tamoxifen failure
   and demonstrated their ability to predict the risk of tamoxifen
   resistance in two independent patient cohorts (Test cohort1: log-rank
   p-value = 0.02, adjusted HR = 3.11; Test cohort2: log-rank
   p-value = 0.01, adjusted HR = 4.24). We have shown that these pathways
   are not markers of aggressiveness and outperform known markers of
   tamoxifen response. Furthermore, for adoption into clinic, we derived a
   list of pathway read-out genes and their associated scoring system,
   which assigns a risk of tamoxifen resistance for new incoming patients.

Interpretation

   We propose that the identified pathways and their read-out genes can be
   utilized to prioritize patients who would benefit from tamoxifen
   treatment and patients at risk of tamoxifen resistance that should be
   offered alternative regimens.

Funding

   This work was supported by the Rutgers SHP Dean's research grant,
   Rutgers start-up funds, Libyan Ministry of Higher Education and
   Scientific Research, and Katrina Kehlet Graduate Award from The NJ
   Chapter of the Healthcare Information Management Systems Society.

   Keywords: Therapeutic resistance, Pathway markers, Tamoxifen, Breast
   cancer
     __________________________________________________________________

Research in context.

Evidence before this study

   Treatment resistance plays a central role in disease management and
   outcomes, especially for patients with oncologic malignancies. Several
   groups have studied response to tamoxifen in ER+ breast cancer, yet the
   identification of markers that accurately predict tamoxifen resistance
   remains limited.

Added value of this study

   In this work, we derived a computational framework to predict treatment
   resistance to tamoxifen in ER+ breast cancer patients based on behavior
   of their molecular pathways, defined from changes in mRNA expression
   profiles. Our analysis identified five molecular pathways and their
   corresponding read-out genes that successfully predict risk to fail
   tamoxifen, as validated in two independent patient cohorts. These
   pathways have not been previously reported as associated with risk of
   tamoxifen resistance and provide an accurate estimate of response to
   tamoxifen, outperforming previously known resistance signatures.

Implications of all the available evidence

   Our study derived a marker panel to identify patients at risk of
   primary tamoxifen failure. Together with other studies, our work builds
   a foundation for personalized therapeutic planning for patients with
   ER+ breast cancer patients.

   Alt-text: Unlabelled box

1. Introduction

   Despite recent advances in diagnosis, classification, and therapeutic
   management, breast cancer (BC) remains one of the leading causes of
   cancer-related death in women worldwide [[29][1], [30][2], [31][3]].
   Nearly 70% of all diagnosed cases of breast tumors are estrogen
   receptor-positive (ER+) [[32]4,[33]5], making treatments that have
   anti-estrogen effects in the breast cells, such as tamoxifen, the
   standard-of-care for patients with ER+ breast cancers [[34]4,[35][6],
   [36][7], [37][8], [38][9]]. Despite the significant success of
   tamoxifen administration, nearly 30% of treated patients develop
   therapeutic resistance, ultimately leading to metastasis and lethality
   [[39]1,[40]10]. Therefore, prioritization of patients based on the risk
   of resistance to tamoxifen before treatment administration could play a
   significant role in personalize therapeutic planning for patients with
   ER+ breast cancer and builds a foundation to improve disease course and
   outcomes.

   Tamoxifen is a selective estrogen receptor modulator (SERM) and has
   agonist or antagonist activity depending on the tissue type [[41]11].
   In the breast cells, tamoxifen directly binds to the ER, blocking
   estrogen from attaching to the receptor and thus inhibiting the
   activity of estrogen-regulated genes and causing the repression of
   estrogenic effects [[42]4,[43]5,[44]12,[45]13]. However, the emergence
   of alternative mechanisms of estrogenic stimulation has been shown to
   cause resistance to tamoxifen. For example, some studies have
   demonstrated that ER+ breast cancers that overexpress HER2 and EGFR can
   activate the components of downstream signaling pathways which then
   stimulate both ER and estrogen receptor co-activator AIB1, and thus
   induce the estrogen agonistic activity of tamoxifen in breast cancer
   cells [[46]14,[47]15]. Another study noticed that the increased
   expression of HER2 signaling can also downregulate progesterone
   receptor (PR) levels in the ER+ breast tumors, where losing the PR
   expression serves as a biomarker of hyperactive growth factor
   signaling, leading to another possible mechanism of tamoxifen
   resistance [[48]16]. Despite the emerging role of HER2 in tamoxifen
   resistance, it only accounts for 10% of ER+ breast cancers
   [[49]12,[50]17], suggesting more complex resistance mechanisms in these
   cases and presenting a central clinical problem for patients with ER+
   breast cancer [[51]4,[52]5,[53]10,[54]12].

   In recent years, several groups have developed gene expression
   signatures of tamoxifen response for ER+ patients, including 10
   gene-signature by Men et al. [[55]18], 21 gene-signature by Paik et al.
   [[56]19] (known as Oncotype DX), and 2 gene-signature by Ma et al.
   [[57]20] While these signatures provide substantial advances to our
   understanding of individual genes involved in resistance, they do not
   yet capture the complex interplay between biological mechanisms that
   governs tamoxifen resistance. Here, we propose a pathway-centric
   computational framework to elucidate tamoxifen resistance and
   demonstrate that it outperforms known gene-based approaches. Advantages
   of our pathway-based approach lies in (i) its ability to identify a
   tightly connected cooperative group of genes unified by the same
   function [[58][21], [59][22], [60][23]]; (ii) studying molecular
   pathways, rather than individual genes, produces more reliable read-out
   outputs as they are less susceptible to experimental noise [[61]24];
   (iii) pathway-level view enhances our understanding of the biological
   mechanisms related to disease and treatment response [[62][25],
   [63][26], [64][27], [65][28]]; and finally (iv) looking at alterations
   in biological pathways enhances the likelihood of identifying potential
   therapeutic targets to preclude or overcome resistance.

   In this work, we have established a systematic pathway-centric
   computational framework to elucidate molecular pathways as markers of
   tamoxifen resistance in ER+ breast cancer patients, which we call
   pathER. Through the analysis of pathway activity in each ER+ patient
   and their association with response to tamoxifen (Training cohort,
   n = 53), we identified five biological pathways as markers of tamoxifen
   resistance: Retrograde Neurotrophin Signalling, Loss of NLP from
   Mitotic Centrosomes, RNA Polymerase III Transcription Initiation from
   Type 2 Promoter, EIF2 pathway, and Valine, Leucine and Isoleucine
   Biosynthesis. We have demonstrated the ability of the identified five
   candidate pathways to predict the risk of tamoxifen resistance in two
   independent patient cohorts [[66]29] (Test cohort 1, n = 66: log-rank
   p-value = 0.02, accuracy of leave one out cross-validation
   (LOOCV) = 85.8%; Test cohort 2, n = 77: log-rank p-value = 0.01,
   accuracy of LOOCV = 82.5%) and their independence from known
   covariates, such as age, tumor grade, tumor size, lymph node status,
   and PR status, as the absence of PR in ER+ tumor can be an indicator of
   HER2 activation and an aggressive phenotype [[67]16] (Test cohort 1,
   adjusted hazard ratio = 3.11; Test cohort 2, adjusted hazard ratio =
   4.24). Furthermore, we performed stratified Kaplan-Meier survival
   analyses on the PR+ and PR- patients as well as patients with Ki-67 low
   and Ki-67 high status and demonstrated the five candidate pathways can
   predict risk of resistance to tamoxifen in each PR and Ki-67 group with
   high accuracy. Importantly, as a negative control, we have demonstrated
   that the identified five candidate pathways did not classify patients
   simply based on the disease aggressiveness (log-rank p-value = 0.7,
   hazard ratio = 1.246) and that in fact pathways associated with disease
   aggressiveness do not overlap with the five candidate pathways. We have
   compared our method to other computational techniques to tackle
   treatment response, including Epsi et al. [[68]28] (which utilized
   extreme-responder analysis, using tails of the treatment response
   distribution to define a treatment response signature), Zhong et al.
   [[69]30] (which used Support Vector Machine approach as a base), Yu
   et al. [[70]31] (which uses random forest approach as a base), and mRNA
   data alone (without considering molecular pathways) and demonstrated
   that our method outperforms these techniques in predicting risk of
   resistance to tamoxifen. Furthermore, we have compared our pathway
   signature to other known signatures of tamoxifen response [71][18],
   [72][19], [73][20] and have shown the superiority of our pathway-based
   approach (adjusted hazard ratio = 3.11, hazard p-value = 0.0278).
   Finally, to enhance clinical applicability of our finding, we derived
   five read-out genes (each of which reflects activity changes in a
   corresponding pathway) and defined their treatment failure scoring
   system, indicating risk of developing tamoxifen resistance in two
   patient cohorts (Test cohort 1, adjusted hazard ratio = 3.1; Test
   cohort 2, adjusted hazard ratio = 6.95). Thus, we propose that the
   identified five candidate pathways and their read-out genes can
   potentially be used to prioritize patients who would benefit from
   tamoxifen treatment as their first-line therapy, and to identify
   patients at risk of tamoxifen resistance who should be offered an
   alternative regimen plan.

2. Materials and Methods

2.1. Patient cohorts utilized for study analysis

   All gene expression datasets of patients with ER+ breast cancer were
   obtained from publicly available GEO data repository [[74]32] from
   multi-institutional multi-PI comprehensive Loi et al. [[75]29] study
   [76]GSE6532 ([77]Supplementary Fig. 1, [78]Supplementary Table 1): (i)
   KIT-[79]GSE6532 utilized as a Training cohort; (ii) GUYT-[80]GSE6532,
   utilized as Test cohort 1; (iii) OXFT-[81]GSE6532, utilized as a Test
   cohort 2; and (iv) KIU-[82]GSE6532, utilized as a negative control
   cohort. Training cohort contains patient profiles of primary ER+ breast
   tumors (n = 57), archived at the Uppsala University Hospital (Uppsala,
   Sweden), profiled on Affymetrix Human Genome U133A array and Affymetrix
   Human Genome U133B array. Test cohort 1 contains patient profiles of
   primary tumors from patients with ER+ breast cancer (n = 70), archived
   at the Guy's Hospital (London, United Kingdom), profiled on Affymetrix
   Human Genome U133 Plus 2.0 Array. Test cohort 2 contains patient
   profiles of primary ER+ breast tumors (n = 77), archived at the John
   Radcliffe Hospital (Oxford, United Kingdom), profiled on Affymetrix
   Human Genome U133A, B array. Negative control cohort consists of
   not-treated patients with ER+ primary breast tumors (n = 51), profiled
   on Affymetrix Human Genome U133A, B array. All primary tumors samples
   in Training and Test cohorts were collected through surgery, diagnosed
   between 1980 and 1995 and received tamoxifen-only treatment for 5 years
   post-diagnosis as their adjuvant treatment.

2.2. Data normalization and filtering

   For each gene expression microarray dataset, a matrix of RMA (Robust
   Microarray Analysis) normalized signal intensity values was used
   [[83]29]. Using the most updated annotation file from GEO and the
   latest Affymetrix annotation files from Thermo Fisher database
   [[84]33], each probe set ID was annotated to gene ID; thereafter, probe
   IDs that annotated to different gene IDs or did not annotate to any
   gene ID were excluded. When multiple probe set IDs were mapped to the
   same gene, probes with the highest coefficient of variation (CV) were
   selected [[85]34, [86]35].

2.3. Breast cancer molecular subtypes

   Gene expression classifier of the breast cancer subtypes (PAM50) was
   applied to assign breast cancer patients to one of the intrinsic
   molecular subtypes: luminal A, luminal B, HER2-enriched,
   triple-negative/basal-like, and normal-like [[87]36,[88]37]. The
   subtype classification of each patient was determined based on the
   closeness between the average expression profile of 50 genes in each
   subtype centroid and the corresponding gene expression pattern of
   patient tumor, where the distances were measured utilizing Spearman's
   rank correlation [[89]36]. We utilized genefu package in R,
   intrinsic.cluster.predict function with pam50 [[90]38] to assign
   subtype membership and eliminate samples with HER2-enriched,
   triple-negative/basal-like, and normal-like subtypes (i.e., non ER+).

2.4. Single-sample gene set enrichment analysis (ssGSEA)

   To estimate pathway enrichment in each ER+ patient, we performed
   single-sample (i.e., single-patient) pathway enrichment analysis, where
   standardized gene expression profile for each patient was used as
   reference and genes from each biological pathway were used as a query
   in an unweighted (i.e., each gene had the same weight) single-sample
   gene set enrichment analysis (ssGSEA) [[91]39,[92]40]. For such
   analysis gene expression values for each gene were transformed into
   standardized scores (i.e., z-scores) in order to bring the expression
   level into a common scale across all samples [[93]41,[94]42]. Z-score
   for each gene in each sample was computed by subtracting the average
   intensity of this gene across samples from the intensity of this gene
   in each sample and dividing it by the gene's standard deviation (SD),
   where mean and standard deviation were estimated for each gene across
   all samples [[95]41]. In this way, after such z-scoring, each gene's
   mean is standardized to 0 and standard deviation to 1. Ranked list of
   z-scores across all genes for a given sample then defines a
   single-sample (i.e., single-patient) signature, utilized as a reference
   for pathway enrichment analysis.

   To acquire a comprehensive list of pathways, we utilized MSigDB C2
   pathway database [[96]43] which includes curated selections of 833
   pathways obtained from human gene sets using Reactome [[97]44], KEGG
   [[98]45], and BioCarta databases. Reactome database is a curated
   resource that describes fundamental biological processes including cell
   signaling, metabolism, regulatory, and human diseases with particular
   focus on signaling and metabolic pathways derived from a wide list of
   biomedical experiments and literature references [[99]44,[100][46],