Abstract

   The biology of breast cancer response to neoadjuvant therapy is
   underrepresented in the literature and provides a window-of-opportunity
   to explore the genomic and microenvironment modulation of tumours
   exposed to therapy. Here, we characterised the mutational, gene
   expression, pathway enrichment and tumour-infiltrating lymphocytes
   (TILs) dynamics across different timepoints of 35 HER2-negative primary
   breast cancer patients receiving neoadjuvant eribulin therapy
   (SOLTI-1007 NEOERIBULIN-[62]NCT01669252). Whole-exome data (N = 88
   samples) generated mutational profiles and candidate neoantigens and
   were analysed along with RNA-Nanostring 545-gene expression (N = 96
   samples) and stromal TILs (N = 105 samples). Tumour mutation burden
   varied across patients at baseline but not across the sampling
   timepoints for each patient. Mutational signatures were not always
   conserved across tumours. There was a trend towards higher odds of
   response and less hazard to relapse when the percentage of subclonal
   mutations was low, suggesting that more homogenous tumours might have
   better responses to neoadjuvant therapy. Few driver mutations (5.1%)
   generated putative neoantigens. Mutation and neoantigen load were
   positively correlated (R^2 = 0.94, p = <0.001); neoantigen load was
   weakly correlated with stromal TILs (R^2 = 0.16, p = 0.02). An
   enrichment in pathways linked to immune infiltration and reduced
   programmed cell death expression were seen after 12 weeks of eribulin
   in good responders. VEGF was downregulated over time in the good
   responder group and FABP5, an inductor of epithelial mesenchymal
   transition (EMT), was upregulated in cases that recurred (p < 0.05).
   Mutational heterogeneity, subclonal architecture and the improvement of
   immune microenvironment along with remodelling of hypoxia and EMT may
   influence the response to neoadjuvant treatment.

   Subject terms: Cancer genomics, Breast cancer, Tumour biomarkers

Introduction

   Breast cancer is the most commonly diagnosed cancer and the leading
   cause of female cancer death worldwide^[63]1. It represents a
   heterogeneous group of tumours with characteristic molecular features,
   prognosis and responses to available therapy^[64]2,[65]3.

   In the early stage breast cancer setting, treatment decisions are
   guided by clinical subtypes, namely hormone receptor (HR) positive
   (HR+/HER2−), human epidermal growth factor receptor 2 amplified (HER2+)
   and triple-negative breast cancer (TNBC). This general classification
   does not take into account the complex genomic landscape and breast
   cancer evolution during therapy administration and disease recurrence
   or progression^[66]4,[67]5.

   Currently, the biology of the neoadjuvant response to therapy is
   underrepresented in the literature. Large-scale genomic studies have
   mostly focused on the analysis of single primary breast
   cancers^[68]2,[69]3,[70]6–[71]9, which do not provide information on
   cancers over time. The analysis of the gene expression landscape of
   tumours has been shown to correlate with response to cytotoxic
   therapies^[72]10–[73]12, though very limited work has been done to
   characterise the genomic and transcriptomic changes across breast
   cancer patients receiving neoadjuvant therapy^[74]13. Therefore, the
   neoadjuvant setting in breast cancer provides a window-of-opportunity
   to explore the genomic and microenvironment modulation of tumours
   exposed to therapy over time^[75]5,[76]14.

   In this study, we temporally characterised the mutational, gene
   expression, pathway enrichment and tumour-infiltrating lymphocytes
   (TILs) dynamics across different timepoints over a 12-week period in
   HER2-negative primary breast cancers enrolled in the single-arm
   SOLTI-1007 NEOERIBULIN phase II clinical trial ([77]NCT01669252). We
   show that the mutational and immune tumour microenvironment remodelling
   of HER2-negative primary breast cancers provides a path forward for
   gathering biological insights from primary breast cancers.

Results

A clinical cohort of HER2-negative breast cancer patients

   Primary breast cancer tumour specimens were obtained from the
   open-label, single-arm SOLTI-1007 NEOERIBULIN phase II clinical trial
   ([78]NCT01669252). We included sequential primary tumour biopsies from
   35 HER2-negative (22 HR-positive and 13 HR-negative) breast cancer
   patients (1–3 tumour tissue samples per patient) during eribulin
   administration. Whole-exome data (N = 88 samples) generated mutational
   profiles and candidate neoantigens and were analysed along with
   RNA-Nanostring 545-gene expression (N = 96 samples) and stromal TILs
   (N = 105 samples) from 35 patients with HER2-negative breast cancer
   (Fig. [79]1a).

Fig. 1. The study schematics.

   [80]Fig. 1
   [81]Open in a new tab

   a. Tumour tissue samples underwent (i) Whole-exome sequencing (WES) for
   mutation and clonality detection followed by neoantigen prediction;
   (ii) Nanostring gene expression profiling; and (iii) stromal TILs
   counting. Our goal was to select samples that passed quality control
   and perform the temporal characterisation of the mutational, gene
   expression and TILs in serial primary HER2-negative breast cancers that
   were good responders or poor responders to eribulin. DNA sequencing
   (WES) was performed in 88 primary invasive breast cancers and matched
   the normal DNA of each patient. Of these, 66 tumour samples were used
   for mutational and clonality analyses. RNA-Nanostring gene expression
   profiling was performed in 96 primary invasive breast cancers. From the
   DNA sequencing data, candidate neoantigens were predicted. Stromal TILs
   were counted from the H&E slides in 91 out of 105 tumour specimens.
   Clinical features and the PAM50 intrinsic molecular subtypes of each of
   the sequential primary tumour’s biopsies were examined. TMB tumour
   mutation burden, ORR overall response rate. b Schematics of the
   clinical trial. Temporal tumour sampling and a number of samples
   included in each analysis and time point are depicted. Distribution of
   PAM50 molecular intrinsic breast cancer subtypes at V1 (diagnostic
   biopsy), and ORR at V3 (surgery). E eribulin administration, V1 visit
   one, V2 visit two, V3 visit three, VR visit recurrence, CR complete
   response, PR partial response, SD stable disease, PD progressive
   disease.

   Eighty per cent of cases had Ki67 greater than 14% at diagnosis.
   Disease recurrence after neoadjuvant therapy with eribulin was observed
   in six patients, despite the use of anthracyclines as adjuvant therapy.
   Although six patients presented clinic-radiologic recurrence, tumour
   material was available at the time of recurrence in one patient. The
   samples that passed quality control (i.e., tumour cellularity,
   sequencing quality, see Fig. [82]1a and “Methods”), were further
   processed and analysed.

   Clinical features of the cohort are summarised in Table [83]1 and the
   schematics of the study design are shown in Fig. [84]1b. The 5-year
   relapse-free survival was 85.6% (95% CI: 74.7–98.1%) after breast
   cancer was diagnosed and the overall survival rate at 5 years was 91.3%
   (95% CI: 82.4–100%) for the patients analysed here.

Table 1.

   Clinicopathological characteristics of the study cohort.
   Number of patients 35 p value (Fisher’s exact test)
   Age (mean) 52.8 years
   Histologic grade Diagnosis Surgery
   Grade 1 4 1
   Grade 2 16 18
   Grade 3 14 14 0.5
   Missing or not available 1 2
   ER status
   ER+ 21 18
   ER− 14 15 0.8
   Missing or not available 0 2
   PR status
   PR+ 17 17
   PR− 18 16 1
   Missing or not available 0 2
   KI67 at diagnosis
   > or = 14% 28 30
   <14% 7 3 0.3
   Missing or not available 0 2
   Clinical tumour stage at diagnosis
   T0 0 2
   T1 1 21
   T2 31 10
   T3 2 1 <0.001
   Node
   N0 24 19
   N1 11 12
   N2 0 4 0.1
   Histology
   IDC 30 27
   ILC 2 2
   other 3 4 0.9
   Missing or not available – 2
   Treatment response
   pCR 2
   Good responders (CR or PR) 19 (5 CR, 14 PR)
   Poor responders (PR or SD) 16 (12 SD, 4 PD)
   Missing or not available 3
   Residual cancer burden (RCB) after neoadjuvant treatment RCB1 = 1,
   RCB2 = 18, RCB3 = 13 patients, NA = 3
   Recurrence 6
   [85]Open in a new tab

The mutational and TILs profiling across temporal primary breast cancers

   We characterised the mutational landscape of the tumours through an
   analysis of tumour mutational burden (TMB), breast cancer driver
   identification^[86]15, mutational signatures^[87]16 and mutational
   clonality^[88]17 at baseline (V1) (28 cases) and correlated these at
   visit 2 (V2) (24 cases) and with the response after 12 weeks of therapy
   at the time of surgery (V3) (13 cases).

   TMB varied among patients at baseline (V1) (median 1.75 mutations per
   megabase (Mb) [range: 0.59–8.19], with a standard deviation of 1.48)
   but there was no evidence of variation across the three sampling time
   points (V2, median 1.62 mutations/Mb (range: 0.13–8.19), p = 0.62; V3,
   median 1.73 mutations/Mb (range: 0.38–7.11), p = 0.81) (mean coverage
   was ~40×) (Fig. [89]2a). At least one breast cancer driver
   gene^[90]18,[91]19 was detected in 28 (80%) patients further analysed.
   TP53 (n = 15) and PIK3CA (n = 11) were the most prevalent mutated
   driver genes in the cohort (Supplementary Fig. [92]1, [93]Supplementary
   Data). Mutations within these breast cancer drivers were detected at
   all time points as well as in the recurrent sample (case N021). In V1,
   TP53 mutation was more prevalent in basal cancers (N = 9), as compared
   to Luminal-A (N = 4) and Luminal-B (N = 2). Relapse free survival was
   not statistically significant when samples bearing TP53 mutations in V1
   (N = 15) were compared to those without the mutation (N = 20)
   [HR = 2.07, 95% CI: 0.35–12.4, p value = 0.43]. Likewise, no difference
   was found in patients with PIK3CA mutation (n = 11) [HR = 1.46, 95% CI:
   0.24–8.75, p value = 0.68] (Supplementary Fig. [94]1). However, this
   cohort was not adequately powered to detect an association between
   mutations in driver genes and clinical outcomes.

Fig. 2. Mutational landscape of HER2-negative primary breast cancers under
neoadjuvant chemotherapy.

   [95]Fig. 2
   [96]Open in a new tab

   a Landscape of mutational alterations over time. Stacked plots display
   mutational burden (top), breast cancer drivers (tile plots, middle),
   PAM50 molecular intrinsic subtypes, clinical–pathological responses per
   patient and purity (tile plots, middle), mutation signatures (filled
   histogram). b Mutation clonality and subclonal distribution across
   different responses to eribulin. Left panels: odds for complete or
   partial response; right panels: a relative hazard for relapse-free
   survival (RFS). The analysis was performed for all comers (top panels)
   and for HR-positive and HR-negative patients (bottom panels). c
   Distribution of selected driver mutations generating neoantigens.
   Driver gene mutations are coloured whether the mutation is clonal or
   subclonal. d TILs across PAM50 intrinsic subtypes. (*) refer to p
   value < 0.05; ns nonsignificant. For each box plot, the centre line,
   the boundaries of the box, the ends of the whiskers and points beyond
   the whiskers represent the median value, the interquartile range, the
   minimum and maximum values, and the outliers, respectively. e
   Relationship between predicted neoantigen load (y-axis) and
   nonsynonymous mutational load (x-axis) and between predicted neoantigen
   load (y-axis) and mean stromal TILs per patient (x-axis).

   We next investigated signatures of mutagenic biological processes to
   determine whether there were any detectable shifts during neoadjuvant
   therapy. A non-negative matrix factorisation technique to identify
   mutagenic processes in breast cancer including ageing, APOBEC cytidine
   deaminases, defective DNA repair, BRCA1/BRCA2 deficiency was
   used^[97]16. Mutational signatures were studied in each tumour sample
   across neoadjuvant therapy (Fig. [98]2a—bottom panel). Signature 1
   (C > T transitions at CpG dinucleotides) contributed to a relatively
   higher proportion of mutations, but others were predominant in some
   patients, such as APOBEC (signatures 2 and 13), were more prevalent in
   luminal B patients, and BRCA-related (signature 3) in basal-like
   tumours. Signatures were not always conserved across tumour samples
   (defective DNA repair signature was conserved in 62.5% of patients
   (n = 8) while APOBEC and BRCA-related in 40% (n = 7 and n = 10,
   respectively). No statistical differences were observed between
   mutation signatures and overall response rate (ORR) (all p
   values > 0.05, Fisher’s test) (Supplementary Fig. [99]1), although the
   study was not initially powered to detect this association.

   We computed tumour clonal composition and CCFs by applying the
   ABSOLUTE^[100]17 algorithm across temporal primary tumours from each
   patient to determine how clonal composition relates to outcomes.
   Mutations predicted to be clonal (found in every cancer cell) or
   subclonal (found in a subset of cancer cells) were identified: the
   proportion of clonal mutations was 44.7% on the first visit, 21.0% on
   the second visit, 28.9% at surgery and 5% at recurrence).

   We studied the shape of the association of subclonal mutations at
   baseline (V1) and the patientsʼ odds to obtain a complete or partial
   response (PR) and also the hazard to relapse (Fig. [101]2b, see
   “Methods”). Overall, there was a trend towards higher odds of response
   and less hazard to relapse when the percentage of subclonal mutations
   was low (i.e., from 0 to 40%), suggesting that less heterogeneous
   tumours might have better responses to neoadjuvant therapy. Further
   increases above 40% in the percentage of subclonal did not seem to
   impact outcomes. The same analysis was carried out for HR-positive and
   HR-negative patients (Fig. [102]2b, bottom panels). Overall,
   HR-positive patients had a trend to have better outcomes.

   We generated HLA-class I binding predictions for somatic mutations that
   may result in protein changes. Totally, 492 out of 2009 (24.50%) unique
   mutations were predicted to encode for at least one strong binder
   (binding rank < 0.5%) of HLA-class I. We analysed candidate neoantigens
   generated from nonsynonymous mutations. A total of 1002 putative
   neoantigens (average of 36 per patient, [range: 2–153]) were present in
   27 patients ([103]Supplementary Data).

   We inferred candidate neoantigens derived from breast cancer driver
   mutations (5.19% of total neoantigens) and whether they were clonal or
   subclonal. Overall, 52 neoantigens (average of 3.11 per patient [range:
   1–11]) were generated by 22 (unique) driver mutations in 17 patients.
   Driver mutations including TP53 (p.R110P, p.R209Kfs*6), PIK3CA
   (p.H1047L, p.K111E), MAP3K1 (p.R961Sfs*44, p.E504Vfs*36) predicted
   neoantigens in more than one patient, and in general, they were clonal
   and present across all samples of each patient (Fig. [104]2c). Each
   candidate neoantigen was predicted to be bound to one or more HLA
   class-I molecule.

   Finally, stromal TILs were counted in 91 samples of 34 patients and had
   a median of 9.3% [range: 0–40] among patients. Among the PAM50
   intrinsic subtypes, basal-like tumours had significantly more TILs than
   luminal A (p value = 0.02) but not luminal B (p value = 0.11) (Fig.
   [105]2d, Supplementary Fig. [106]1).

   We observed that the higher nonsynonymous mutation load was strongly
   correlated with neoantigen load and the latter was weakly correlated
   with stromal TILs. A significant positive correlation was observed in
   both analyses (R^2 = 0.94, p = <0.001 and R^2 = 0.16, p = 0.02),
   respectively (Fig. [107]2e).

   Therefore, although TMB and driver mutations were temporally conserved
   across neoadjuvant therapy, cases with a lower percentage of subclonal
   mutations, thus more homogeneous breast cancers, had a trend for better
   responses. TILs were weakly correlated with neoantigen load, and a
   small fraction of neoantigens were derived from driver gene mutations.

Multiple expressed genes across temporal primary breast cancers

   To identify differentially expressed gene candidates during the
   administration of neoadjuvant chemotherapy, we performed targeted gene
   expression profiling using a custom 545-gene Nanostring panel composed
   of breast cancer-related genes among 35 patients ([108]Supplementary
   Data).

   We observed a total of 155 genes differentially expressed in patients
   responding to eribulin at surgery (good responders, CR or PR [N = 19])
   vs. those with poor responses (PD or SD [N = 16]). After a false
   discovery rate (FDR) adjustment p value < 0.05, ALDH1A1, EVI2A, ADRA2A,
   LHFP, LOC400043 and PTGER4, were upregulated while VEGFA, FANCA, ORC6,
   KIFC1 and ANGPTL4 were downregulated in the good responder group (Fig.
   [109]3a, [110]Supplementary Data).

Fig. 3. Gene expression profiling in longitudinal primary breast tumour
biopsies.

   [111]Fig. 3
   [112]Open in a new tab

   a ORR at surgery (V3) (patients with good response to eribulin vs. poor
   response) (Top). Genes identified have a change corresponding to
   eribulin treatment. Volcano plots with the strength of the association
   on the y-axis (−log10 p values) and the effect size on the x-axis (log
   2-fold change (FC)). Differentially expressed genes were highlighted
   during different timepoints across neoadjuvant therapy (V1–V3). Genes
   above the red dotted line represent those whose expression levels were
   significantly different (p value < 0.01). A full list of the most up
   and downregulated genes can be found on [113]Supplementary Data.
   Boxplots of good and poor responders on eribulin over time,
   colour-coded by their corresponding poor response (red) or good
   response (blue) in eribulin from baseline to surgery (Bottom). b Same
   as in (a). Patients with later clinical recurrence vs. no recurrence
   (Top). Boxplots of individual patients on eribulin over time,
   colour-coded by their corresponding recurrence (purple) or
   nonrecurrence (grey) in eribulin from baseline to surgery (Bottom).

   To assess variability within and between individuals, gene expression
   levels were analysed over a 12-week period for each patient (Fig.
   [114]3a bottom panels, Supplementary Fig. [115]2). Consistent with the
   general trend in the previous analysis, increased levels of ALDH1A1,
   EVI2A, ADRA2A, LHFP, LOC400043 and PTGER4 were modulated over time in
   patients responding to eribulin.

   Overall and across all time points, we analysed patients that developed
   disease recurrence after curative therapy (n = 6) vs. those that did
   not recur (n = 29). We observed 58 upregulated genes in the group of
   patients with clinical recurrence as compared to those who did not have
   a recurrence. Among those were FABP5, YBX1, TUBB6, PLOD1 and CXCL8
   genes that were upregulated and analysed over 12 weeks during
   neoadjuvant therapy.

   Our analyses evidenced fatty acid-binding protein 5 (FABP5)
   overexpression at surgery (V3) was the only statically significant
   after FDR adjustment p < 0.05 in cases that recurred, as compared to
   those that did not recur (FDR = 0.04) (Fig. [116]3b, Supplementary Fig.
   [117]2). FABP5 is an inductor of the epithelial-mesenchymal transition
   (EMT) transition process^[118]20. EMT has been reported to contribute
   to tumour aggressiveness and eribulin resistance^[119]21,[120]22.

   Finally, we used pathway enrichment analysis to identify the activity
   of the immune system and common cancer signalling pathways before and
   after neoadjuvant therapy (see “Methods”, Table [121]2). At the
   baseline V1 timepoint, it was observed an increase in pathways
   associated with angiogenesis, hypoxia and epithelial features, in
   patients that had poor responses to therapy (SD/PD). In the good
   responder group, eribulin treatment was associated with statistically
   significant enrichment of pathways associated to lymphocyte and T cell
   activation in the end of therapy (V3 or surgery), and with reduction in
   expression of the immunosuppressive programmed cell death pathway. In
   contrast, there was an increase in cell proliferation and reduction in
   cytotoxic T cell pathways for the poor responder group at V3 or surgery
   timepoint. This suggests that eribulin apparent reversal of EMT and
   restoration of hypoxia might lead to improvement of immunosuppressive
   features in the primary tumours of good responders, and an increase of
   proliferation and reduction of immune cytotoxic activity in poor
   responders.

Table 2.

   Gene ontology association with important biological pathways.
   Visit Effect in tested group (poor responders) Pathway Database Pathway
   Genes p value
   V1 Increased GO:0045766 Positive regulation of angiogenesis PTGS2,
   VEGFA, AKT3, ADM, ANGPTL4 and CXCL8 0.00697
   KEGG:04066 HIF-1 signalling pathway VEGFA, AKT3, ERBB2 and EGFR 0.0276
   GO:0050678 Regulation of epithelial cell proliferation STRAP, VEGFA,
   AKT3, MYC, ERBB2, EGFR and NFIB 0.0217
   GO:1901342 Regulation of vasculature development PTGS2, VEGFA, AKT3,
   ERBB2, ADM, ANGPTL4 and CXCL8 0.0462
   V3 Decreased GO:0046649 Lymphocyte activation ERCC1, IGF1, IGBP1, CD86,
   IGFBP2, CDKN1A, VAV3, IL6ST, PIK3R1, ZEB1, TGFBR2, AXL, PTGER4 and
   KIF13B 0.0000413
   GO:0042110 T cell activation IGF1, CD86, IGFBP2, IL6ST, PIK3R1, ZEB1,
   TGFBR2, PTGER4 and KIF13B 0.0227
   GO:0043069 Negative regulation of programmed cell death IGF1, IGBP1,
   NR4A3, TWIST1, CDKN1A, IL6ST, KLF4, SLC40A1, CYR61, PIK3R1, BTG2, AXL
   and TWIST2 0.00716
   GO:0043066 Negative regulation of the apoptotic process IGF1, IGBP1,
   NR4A3, TWIST1, CDKN1A, IL6ST, KLF4, SLC40A1, CYR61, PIK3R1, BTG2, AXL
   and TWIST2 0.00595
   Increased GO:0012501 Programmed cell death YBX3, GAL, MCM2, BIRC5,
   AARS, ITGA6, MYBL2, NDRG1, VEGFA, SNAI1, KRT17, TOP2A, CDK4, MYC,
   BRCA2, TP53BP2, CDCA7, HSPD1, ADM, CHEK1, MAD2L1, ANGPTL4, DDIT4, BUB1,
   CDK1, KRT19, KRT5, KRT14, S100A14 and PGAM5 0.00385
   GO:0008283 Cell proliferation TACC3, STRAP, LAMC2, CDH3, GAL, GRHL2,
   BIRC5, CDC6, CDKN3, NDRG1, EZH2, FOXM1, BYSL, VEGFA, TTK, CDC20, CENPF,
   CKS2, DLGAP5, CCNB1, CDK4, MYC, BRCA2, KIF2C, CDCA7, HSPD1, CCNA2,
   MKI67, ADM, CHEK1, BTG3, CDC25C, FZD6, DDIT4, CXCL8, BUB1, CDK1, CTPS1,
   FANCA, PRC1 and BOP1 0.00000000266
   [122]Open in a new tab

Discussion

   Our study describes a step forward in gathering insights related to
   response to neoadjuvant therapy for HER2-negative primary breast
   cancers. Whole-exome sequencing, gene expression, pathway enrichment
   and TILs analyses of temporal primary breast tumours of 35
   HER2-negative breast cancer patients show cancer remodelling during
   neoadjuvant chemotherapy.

   The strengths of our analysis include the prospective design of the
   study, and the orthogonal genomic and immune infiltration data for each
   primary breast cancer systematically obtained from the SOLTI-1007
   NEOERIBULIN phase II clinical trial ([123]NCT01669252). Clonal driver
   mutations were maintained over time, but intratumoral genomic
   heterogeneity, measured as a fraction of subclonal mutations may affect
   response to therapy. Heterogeneity in mutational signatures across
   patients was considerably greater than at the intrapatient level, which
   was, in general, not conserved across temporal samples of each patient
   in 12 weeks of therapy administration. It should be noted, however,
   that eribulin is a weak chemotherapeutic agent which might explain the
   minimal change within the genomic landscape^[124]21. Its efficacy might
   be related to the fact that that it might change tumour phenotype
   (based on the EMT hypothesis)^[125]21 rather than changing tumour
   genotypes.

   Neoantigens have been predicted in a few primary breast cancer
   datasets^[126]23,[127]24. There is usually a discrepancy of neoantigens
   predicted computationally and those actually shown to leverage robust
   T-cell responses^[128]25,[129]26. An immunosuppressive tumour
   microenvironment might be associated with this fact. A recent analysis
   of The Cancer Genome Atlas (TCGA) dataset of invasive breast cancers,
   predicted HLA class I-binding neoepitopes for 870 breast cancer
   samples^[130]24. About 40% of the nonsynonymous mutations led to the
   generation of candidate neoepitopes and the neoepitope load was also
   highly correlated with the mutational burden. Here, 21% of the
   nonsynonymous mutations led to the generation of putative neoantigens.
   Candidate neoantigens and stromal TILs were positively correlated and
   were more frequent in basal-like primary tumours. Driver mutations such
   as frameshifts mutations in TP53 and MAP3K1 genes or missense mutations
   in the PIK3CA gene can generate tumour neoantigens, suggesting that a T
   cell-mediated immune response, if present and properly validated, would
   target all cancer cells^[131]15,[132]27,[133]28. The relevance of
   neoantigens derived from clonal driver mutations, particularly those
   arising from truncating mutations, is the possibility of being
   incorporated as targets for adoptive T cell therapies and cancer
   vaccines^[134]28,[135]29.

   The evaluation of TILs has been shown to represent a reliable surrogate
   of the immune anti-tumour activity and a robust independent prognostic
   biomarker in breast cancer patients, especially in the TNBC and
   HER2-positive breast cancer subtypes^[136]30,[137]31. Retrospective
   reports analysed TILs as prognostic and predictive markers in
   metastatic TNBC patients that received eribulin^[138]32. In a previous
   analysis, the high TILs group vs low TILs within the metastatic TNBC
   cases had better outcomes and suggested TILs to predict treatment
   response to eribulin in the TNBC metastatic setting^[139]32. In our
   early stage breast cancer cohort, stromal TILs detected in H&E slides
   were more frequent in basal-like primary tumours. A statistically
   significant higher percentage of stromal TILs after the end of
   neoadjuvant therapy was not observed possibly due to the small sample
   size and other biological factors including immunosuppressive TME.
   However, we observed an enrichment in pathways associated with immune
   infiltration and reduced programmed cell death after 12 weeks of
   eribulin administration corroborating findings observed in metastatic
   breast cancer. In a recent study^[140]33, lymphocyte infiltration
   increase and PDL1 expression turning to negative values were suggested
   to be improvements in the immunosuppressed TME of patients receiving
   eribulin for advanced breast cancer. Our findings add to current
   evidence for potentially combining checkpoint inhibitors to eribulin in
   the early breast cancer setting^[141]34.

   There is the emerging rationale for eribulin to activate the immune
   system in pre-clinical and retrospective studies, through EMT
   suppression, and vascular remodelling and improvement of the tumour
   immune microenvironment^[142]21. Our analyses indicate that temporal
   gene expression responses to eribulin might be associated with the
   process of restoration of hypoxia, and the EMT process, which should be
   contextualised with the immune TME. Eribulin treatment has shown to
   suppress genes that are known to be involved in hypoxic signalling
   cascades, including VEGF, contributing to TME remodelling and restoring
   the scenario of normoxia^[143]35,[144]36. We observed that VEGF was
   downregulated overtime during eribulin therapy in the good responder
   group. At diagnosis, poor responders presented an overexpression of
   angiopoietin-like 4 (ANGPTL4), a pro-angiogenic factor that is
   modulated by hypoxia and associated with poor prognosis, metastasis,
   cell differentiation and vascular permeability^[145]37. Previously,
   aldehyde dehydrogenase 1A1 (ALDH1A1) was reported as a cancer stem cell
   marker and suggested to have a favourable prognostic role for cervical
   cancer^[146]38,[147]39. Although we observed ALDH1A to be upregulated
   in good responders, its role in breast cancer is undefined yet.
   Furthermore, FABP5 promotes tumour cell growth in numerous cell types
   and is a negative prognostic marker in renal cell
   carcinoma^[148]20,[149]40. FABP5 was upregulated in cases that
   presented further clinic-radiological recurrence. However,
   identification of FABP5 as a biomarker in HER2-negative breast cancer
   requires substantially increased cohort size and mechanistic validation
   for robust interpretation.

   Our work has limitations which are mostly secondary to the small sample
   size and therefore lack of power to detect specific associations.
   Nonetheless, this HER2-negative breast cancer cohort gives a glimpse of
   the changes in cancer biology during neoadjuvant therapy
   administration. Another limitation is the 545 targeted gene expression
   panel used did contemplate only a few immune markers including IDO1,
   LAG3 and STAT1. Furthermore, the primary endpoint of the present
   clinical trial is pathologic complete response (pCR) in the breast. We
   used ORR for analyses of responses after the neoadjuvant therapy since
   there are only a few pCR cases; residual cancer burden was used but was
   not informative in the dataset; thus, the association between
   biomarkers and response is challenging.

   Taken together, these results suggest that mutational heterogeneity,
   subclonal architecture and the immune microenvironment along with
   remodelling of hypoxia and EMT may influence the response to
   neoadjuvant treatment in early stage disease, with possible
   implications for clinical decision-making and monitoring of treatment
   efficacy.

Methods

Patients and tissues

   Primary breast cancer tumour specimens were obtained from the phase II
   study, open-label, single-arm SOLTI-1007 NEOERIBULIN
   ([150]NCT01669252). The patients reported here were treated at the Vall
   Hebron Institute of Oncology, Barcelona. In brief, eribulin was
   administered as neoadjuvant treatment for stage I-II HER2-negative
   breast cancer with a dose of 1.4 mg/m^2 intravenously on Days 1 and 8
   every 21-day cycle, for 4 cycles, and the pathological CR was defined
   as the absence of residual invasive tumour in breast tumour specimens.
   All patients were negative for HER2 overexpression on clinical assays.
   The Ethics Committee of the Vall Hebron University Hospital, Barcelona,
   Spain approved the study. All patients gave informed consent for DNA
   and RNA sequencing.

Clinic-pathological response analyses

   We adopted a conservative method of defining ORR after neoadjuvant
   therapy as CR or PR by response evaluation criteria in solid tumours
   (RECIST) (major decrease in tumour burden following treatment) and poor
   response as progressive disease (PD) or stable disease (SD) by RECIST
   (major increase or stability in tumour burden following treatment)
   after eribulin therapy.

   Patients were classified into two groups depending on their clinical
   response to therapy at surgery: good responders—patients that had CR or
   PR; poor responders—patients that had PD or SD.

Tissue processing and DNA and RNA extraction

   Frozen biopsies were embedded in a frozen tissue matrix (OCT; Sakura
   Finetek, Torrance, CA) and cut at the cryostat for tumour cellularity
   assessment by a pathologist. Genomic DNA was isolated from 10 × 10um
   sections using the DNeasy Kit (Qiagen, Hilden, Germany). Samples with
   less than 10% tumour material or that produced low DNA yield were
   excluded from the analysis.

   A section of the FFPE breast tissue was first examined with
   haematoxylin and eosin staining to confirm the presence of invasive
   tumour cells (≥10%).

   For RNA purification (High Pure Formalin-Fixed Paraffin-embedded RNA
   Isolation Kit, Roche Diagnostics Limited, West Sussex, UK), one to five
   10 μm FFPE slides were used for each tumour specimen (at diagnosis,
   cycle 2 and surgery).

Gene expression and Intrinsic subtype analyses

   A minimum of ∼125 ng of total RNA was used to measure the expression of
   545 genes involved in breast cancer, including 5 housekeeping genes
   (ACTB, MRPL19, PSMC4, RPLP0 and SF3A1), using the Prosigna assay
   (NanoString Technologies, Seattle, USA). Samples with 20 or fewer
   counts in at least 70% of the genes were removed. Data were log base 2
   transformed and normalised using the housekeeping genes.

   The same RNA was used to measure the expression of 50 genes of the
   PAM50 intrinsic subtype predictor assay. For each sample, we calculated
   the PAM50 signature scores (Basal-like, HER2-E, Luminal A and B) and
   the proliferation signature score^[151]41. Differential gene expression
   analysis fold change of each gene was calculated as the ratio of
   average gene expression intensity of (i) the good responder group
   (n = 19 [5 CR, 14 PR]) to that of the poor responder group (n = 16 [12
   SD, 4 PD]) or (ii) the patients that recurred (n = 6) to that of the
   non-recurrent group (n = 29). A two-sample t test was used to compare
   gene expression intensities between groups.

   A gene was claimed to be differentially expressed if it showed a fold
   change of >1 (increased in good responders, or non-recurrent) or ≤ −1
   (increased in poor responders, or in the later recurred) and further
   adjustment FDR ≤ 0.05 was applied. Volcano plots were used to visualise
   log 2-fold change on the x-axis and −log10 p values on the y-axis.

   Pathway enrichment analysis was performed using the gprofiler
   toolkit^[152]42 comparing good responder vs. poor responder groups
   using two ontology databases as reference: (i) Gene Ontology and (ii)
   Kyoto Encyclopaedia of Genes and Genomes.

DNA sequencing

   WES was performed to the breast cancer tumour and matched normal DNA
   obtained from the buffy coat of each HER2-negative breast cancer
   patient.

   Libraries for Illumina sequencing were prepared using Illumina Nextera
   Rapid Capture Exome kit (cat. FC-140-1003, Illumina) as we reported
   previously. Prior to library preparation DNA concentrations for each
   sample were quantified using a fluorescence-based method (Quant-IT
   dsDNA BR, cat. [153]Q33130, Thermo Fisher Scientific) and 50 ng of
   genomic DNA was used for library preparation.

   Samples were processed following the manufacturer’s instructions (part#
   15037436 Rev. J, Illumina) for WES. Prior to the first hybridisation,
   all libraries were quantified using quantitative polymerase chain
   reaction (qPCR). KAPA Library Quantification Kit (cat. KK4873, KAPA
   Biosystems) as used as per manufacturer’s recommendations. A subset of
   libraries was analysed using DNA 1000 Kit (cat. 5067-1504, Agilent).

   Whole-genome libraries and exome libraries were normalised and pooled
   in equal volumes to create balanced pools. Each pool was normalised to
   molarity of 4 nM and used for sequencing with clustering concentration
   20 pM with 1% spike-in of PhiX control. Sequencing was performed on an
   Illumina HiSeq2500 using v4 chemistry and 50 cycles single-end for
   s-WGS and 75 cycles paired-end for WES. Demultiplexing was performed
   using Illumina’s bcl2fastq2 v.2.17 software using default options.
   FASTQ files were used for subsequent data analysis.

WES analyses

   For WES analysis, reads were mapped to the human genome (GRCh37) and
   base quality recalibration were performed using Novoalign v 3.02
   (Novocraft). Coordinate sorting of reads and PCR-duplicate marking was
   performed using Novosort (v 3.02). The resulting bam files for all
   samples for the same patient were locally realigned using the Genome
   Analysis Toolkit (GATK, v 3.4.46)^[154]43. MuTect (version 2) was run
   using default parameters^[155]44. Strelka (version 1.0.14) was run with
   recommended starting parameters for BWA and default parameters^[156]45.
   The mean coverage per sample was calculated with CollectWgsMetrics
   (Picard). A joint calling was done and filtered allelic fraction >10%
   in at least one sample of each patient. Somatic mutations were
   annotated using Variant Effect Predictor (VEP,
   [157]http://grch37.ensembl.org/) and visualised using IGV. Output with
   reads > 15 and purity > 20% were included in further analyses.

ABSOLUTE

   ABSOLUTE (v1.0.6) was used to infer the cancer cell fraction (CCF) of
   mutations and the mutations were classified as clonal or subclonal as
   previously described^[158]17,[159]46. A mutation was classified as
   clonal if its probability of being clonal was >50% or if the lower
   bound of the 95% confidence interval of its CCF was >90%. Mutations
   that did not meet the above criteria were considered subclonal.

   ABSOLUTE software was used to calculate tumour clonality, purity and
   ploidy. For running the ABSOLUTE we obtained the mutation annotation
   file by running vcf2maf script with VCF files for the corresponding
   tumour and normal control samples, annotation was performed by VEP. To
   find the segmentation we ran the CNVkit batch command with the.bam
   files from tumour and normal samples^[160]47. For running ABSOLUTE we
   subset the.cns files from the CNVkit output file for variant
   coordinates as well as probes and log2 values. We used the options
   min.ploidy = 0.95, max.ploidy = 10. Probability of a mutation to be
   clonal was defined essentially as described in ref. ^[161]48.

Mutation signature

   Decomposition of the mutational signature was performed using
   deconstructSigs^[162]16, based on the set of 30 mutational signatures.
   These signatures are largely defined by the relative frequency of the
   six possible base substitutions (C > A, C > G, C > T, T > A, T > C, and
   T > G) in the sequence context of their adjacent 5′ and 3′ base. Of
   these, COSMIC signatures 1, 2, 3, 6, 13, 9, 10, 15, have been
   associated with breast cancer. For clarity, we explored the following
   mutational signatures in this cohort: ageing/clock-like
   signatures^[163]49,[164]50 (signatures 1 and 5), APOBEC (signatures 2
   and 13), BRCA-related (signature 3), defective DNA repair (signatures 6
   and 15), polymerase n (signature 9), POLE mutations (signature 10),
   unknown breast (signature 30) and other signatures.

Neoantigen prediction

   The 4-digit HLA type was obtained from matched normal WES data and the
   OptiType 1.3.1 analysis software package was used^[165]51. WES and HLA
   typing was integrated, and NeoPredPipe pipeline^[166]52 was applied
   with minor modifications for the neoantigen prediction. First,
   non-synonymous cancer-specific mutations, i.e., present in all tumour
   cells and absent in all normal cells, were used to generate a
   comprehensive list of peptides (9–11 amino acids in length) with the
   mutated amino acid represented at each peptide position and used as
   input was previous described. Prioritisation of neoantigens: the
   selection of candidate neoantigens were based on a %Rank < 0.5;
   epitopes already existing in the reference proteome indicated by the
   Novelty parameter were excluded.

Tumour infiltrating lymphocytes

   Evaluations of stromal TILs were performed on haematoxylin and
   eosin-stained sections by an experienced board-certified pathologist
   (R.F.) according to the 2014 recommendations of the international TILs
   working group^[167]53.

Statistical analysis

   Continuous variables were expressed as median and range, while
   categorical variables were expressed as absolute values or percentages.
   For statistical comparison, we used Mann–Whitney test for independent
   continuous variables, Wilcoxon signed-rank test for paired continuous
   data and Fisher’s exact test for categorical data. To study if there
   was a significant variation of TMB across sampling timepoints the
   Wilcoxon signed-rank test was used to compare pairwise timepoints. In
   addition, the range and standard deviation of TMB in each time point
   were calculated to estimate intra-period TMB heterogeneity. To study
   the association between mutations in driver genes and relapse-free
   survival the Cox proportional hazard model was fitted. Furthermore, to
   analyse the association between the percentage of subclonal mutations
   at baseline and clinical outcomes, we used the logistic and Cox models
   after relaxed linearity assumption using restricted cubic splines by
   means of rms R package. Data analyses were carried out using R version
   statistical software 3.6.3.

Reporting summary

   Further information on research design is available in the [168]Nature
   Research Reporting Summary linked to this article.

Supplementary information

   [169]Supplementary Data 1^ (1.2MB, xlsx)
   [170]Supplementary Information^ (190.1KB, pdf)
   [171]Reporting Summary^ (71.7KB, pdf)

Acknowledgements