Graphical abstract graphic file with name fx1.jpg [85]Open in a new tab Highlights * • Proteogenomics reveals potential false HER2+ tumors without pCR to targeted therapy * • Tumor microenvironment, cell lineage, and EMT pathways are higher in non-pCR tumors * • Baseline GPRC5A and TPBG protein levels are higher in HER2+ tumors without pCR * • Meta-analysis of additional studies confirms higher GPRC5A and TPBG mRNA in non-pCR __________________________________________________________________ Jaehnig et al. perform proteogenomic analysis of needle-core biopsies from the CALG 40601 breast cancer trial and identify proteins with differential expression between responders and non-responders to neoadjuvant anti-HER2 therapy. Meta-analysis reveals reproducible association with response for two markers, GPRC5A and TPBG, in multiple independent studies. Introduction ERBB2 gene amplification, with consequent overexpression of mRNA and protein (human epidermal growth factor receptor 2 gene products are referred to as HER2 hereon), is present in ∼15% of breast cancer cases.[86]^1 For clinical Stage 2 and 3 HER2+ disease, trastuzumab-based neoadjuvant therapy is often undertaken. Patients who experience complete histological eradication of disease in the breast and axillary lymph nodes after neoadjuvant therapy (pathological complete response - pCR) have dramatically better outcomes in comparison with patients who have residual or resistant disease (non-pCR).[87]^2^,[88]^3 A pCR predictor, based on an analysis of a pretreatment sample, would be clinically impactful. For patients with a predicted high likelihood of a pCR and long-term freedom from relapse, the focus would be de-escalation, as highly responsive patients may be over-treated with multiple chemotherapy drugs and several anti-HER2 approaches. For patients predicted to have a tumor that is resistant to the standard of care, alternative approaches must be developed, which could include escalation of anti-HER2 treatment with a trastuzumab-based antibody drug conjugate, the addition of an anti-tumor immunotherapeutic, or other treatments identified through improvements in our understanding of treatment resistance. To investigate the efficacy of different HER2 targeting regimens in the neoadjuvant setting, the Cancer and Leukemia Group B (CALGB, now part of the Alliance for Clinical Trials in Oncology) conducted a three-arm randomized phase II clinical trial (CALGB 40601, [89]NCT00770809) that compared paclitaxel in combination with either trastuzumab (TH arm), lapatinib (TL arm), or the combination (THL arm). The total trial accrual was 305 patients. The pCR frequencies were 57% for the THL arm (118 patients), 45% for the TH arm (120 patients), and 30% for the TL arm (67 patients).[90]^4 The TL arm was closed early due to lower efficacy. Although the difference in pCR rates between treatment arms lacked statistical significance, the THL arm subsequently demonstrated a significant improvement in overall survival (OS) versus the TH or TL arms.[91]^4^,[92]^5 Research biopsies were obtained from all participants before therapy initiation. Previously reported RNA and DNA analyses that combined all arms indicated that PIK3CA mutation, transcriptomics-based intrinsic subtypes, and immune gene expression signatures were associated with treatment response.[93]^5^,[94]^6^,[95]^7^,[96]^8 Given the modest concordance between RNA and protein expression and potential of mass spectrometry-based protein and phosphoprotein data for pathway analysis,[97]^9^,[98]^10 we hypothesized that proteomic analysis would offer additional insights into biomarkers and pathways associated with treatment response, as well as candidate therapeutic targets for resistant cases. Herein we report the results of integrated proteogenomic profiling (DNA, RNA, protein, and phosphoprotein) in a subset of CALGB 40601 pretreatment biopsies that were snap frozen in optimal cutting temperature (OCT) resin. Techniques deployed were developed in a training set of OCT embedded frozen samples from NRG Discovery Protocol 1 (DP1) that enrolled patients treated with the standard of care for HER2+ breast cancer.[99]^11 Results Proteogenomic profiling of frozen core biopsies from CALGB 40601 Two hundred and eighteen snap frozen OCT resin-embedded core needle biopsies from 117 patients enrolled onto CALGB 40601 were obtained from NCTN Navigator and processed for proteogenomic analysis. [100]Figure 1A is a REMARK diagram that outlines the number of biopsies received, those successfully processed, and those excluded due to factors such as low tumor content (<25%), sample misidentification, or inadequate protein yield (refer to [101]STAR Methods). Eighty biopsies corresponding to 54 patients passed quality control (QC) for proteogenomic analysis ([102]Table S1 describes how the final cohort compares to the original cohort). Notably, a majority (117) of the biopsies that failed QC were insufficient for processing to begin with or were excluded because of low tumor content, thus highlighting the need for implementing diligent sample collection protocols. The upper part of the heatmap in [103]Figure 1B displays the data types obtained for each biopsy (1–2 biopsies per tumor) stratified by treatment arm. The processed data for this cohort is included in [104]Table S2. Seventy-five of 80 biopsies with sufficient tumor content yielded both tandem mass tag (TMT)-based proteomic and phosphoproteomic data (at least one biopsy per tumor). 67 biopsies also yielded whole-exome sequencing (WES) data, while QC-passed RNA sequencing (RNA-seq) data were obtained from 52 biopsies. Figure 1. [105]Figure 1 [106]Open in a new tab Proteogenomic profiling of fresh frozen core biopsies from CALGB40601 (A) REMARK diagram relating the final set of core biopsies processed for proteogenomics to the original clinical trial. See also [107]Table S1. (B) Heatmap showing proteogenomic data availability, clinical data, and proteogenomic features of genes on the ERBB2 amplicon. The data types obtained for each of the 80 biopsies profiled from 54 patients (genomics: DNA sequencing, transcriptomics: RNA-seq, TMT: proteomics [including phosphoproteomics]) are indicated in black (included) and gray (not available). Because two core biopsies were obtained from the same patient in several cases (technical replicates), biopsies from different patients are separated by spaces. PAM50 subtype assignment based on the RNA data obtained in this study (PAM50, updated) as well as from previous RNA profiling (PAM50, original)[108]^5 and Herceptest IHC scores are also shown. Purple boxes outline samples with HER2 protein levels in the lower 5% relative to the pCR distribution (see [109]Figure S1G). (C) Scatterplot showing the correlation between ERBB2 copy-number aberration (CNA) log2 ratios and HER2 protein levels for all biopsies. Spearman correlation rho = 0.57, p = 1.3 × 10^−6. (D and E) Comparison of centralized Herceptest IHC scores with ERBB2 CNA log2 ratios (D) and HER2 protein log2 TMT ratios (E). Boxplots show the median (center) and first (upper bound) and third (lower bound) quartiles for each group, while whiskers indicate 1.5× interquartile range (IQR). White dots show values from biopsies from non-pCR tumors whereas blue dots show values for pCR biopsies. p values are from Wilcoxon rank-sum tests comparing values in biopsies with central ERBB2 IHC scores of 0 and 1+ to those with IHC scores of 2+ or 3+. For (C)–(E), CNA and protein measurements from individual biopsies were treated as separate data points even when obtained from the same patient. See also [110]Figures S1 and [111]S2 and [112]Table S2. The RNA data included measurements for ∼21,500 individual genes ([113]Figure S1A). The respective proteomics identifications included peptides from ∼11,000 genes and phosphopeptides from ∼5000 genes ([114]Figures S1A and S1B). Most phosphoproteins had associated RNA and protein measurements ([115]Figure S1B). The numbers of individual biological features were comparable to the prior HER2+ focused (Discovery Protocol 1 [DP1]) pilot study ([116]Figures S1A and S1B).[117]^11 For each data type, measurements showed similar distributions in all samples ([118]Figure S1C). In addition, no evidence of batch effect was observed in the TMT data ([119]Figure S1D). For 28 of the 36 tumors with available RNA data from previous profiling,[120]^5 the Prediction Analysis of Microarray 50 (PAM50)-based intrinsic subtype for at least one biopsy matched the subtype previously reported ([121]Figure 1B). Consistent with previous Clinical Proteomic Tumor Analysis Consortium (CPTAC) proteogenomic studies,[122]^9^,[123]^10 the mean gene-wise RNA-protein correlation for the cohort was ∼0.43 ([124]Figure S1E). Correlations for pairs of genes annotated to participate in the same protein complex were significantly higher than for pairs assigned to different complexes at both the RNA and protein levels ([125]Figure S1F), and this difference was more pronounced at the protein level. Further attesting to the quality of the proteomics data, prediction of function from co-expression[126]^12^,[127]^13 based on protein data out-performed prediction based on RNA data for most Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways ([128]Figure S1G). Correlations between RNA and protein expression levels for pairs of biopsies were used as an additional sample quality metric.[129]^13 When considering the protein data, pairs of biopsies from the same tumor showed significantly higher correlation than pairs from different tumors (median Spearman correlation ∼0.6 vs. 0, Wilcoxon rank-sum test p = 1.45 × 10^−13; [130]Figure S2A). Similar results were observed for the RNA data (median correlation ∼0.7 vs. 0, Wilcoxon rank-sum p = 1.18 × 10^−10; [131]Figure S2A). The highest RNA-protein correlations occurred when the RNA and protein data were from the same patient, regardless of whether they were obtained from the same biopsy or a different biopsy from the same tumor, and lowest when the RNA and protein data were from tumors from different patients (Wilcoxon rank-sum p value equals 2.2 × 10^−16 for comparison of these two sets; [132]Figure S2B). The median correlation for all 2138 possible pairs involving RNA and protein from different patients was zero, whereas the median of the 47 pairwise correlations between RNA and protein from the same sample was 0.48, and the median correlation was 0.46 for pairs involving different biopsies from the same tumor. In conclusion, these quality control metrics agreed with previously published proteogenomic datasets allowing the analysis to proceed.[133]^9^,[134]^10^,[135]^11^,[136]^12 Lack of proteogenomic evidence for HER2 amplification and protein overexpression correlates with absence of pCR All cases enrolled in CALGB 40601 were accrued based on a positive clinical test for HER2. The predominant copy-number aberration by WES was therefore amplification of the ERBB2 locus on Chr17q12 ([137]Figure S2C). The next most prominent peak, a gain at 8q23.3, was previously reported as a region containing putative oncogenes (RSPO2, EIF3E).[138]^14^,[139]^15 The WES data demonstrated the typical somatic mutation pattern for HER2+ breast cancer. For example, TP53 was mutated in 57% of the tumors while PIK3CA was mutated in 19% ([140]Figure S2D). Phosphorylation of HER2 can be due to auto-phosphorylation, downstream kinase activation and heterodimerization partners. While none of the known HER2 auto-phosphorylation sites were observed, several other phosphorylation sites on HER2 were successfully measured. Thus, the mean log2 ratio of all sites on the HER2 protein was considered as a proxy for HER2 activation. Proteins with the highest correlation to overall HER2 phosphorylation included HER2 itself, other co-overexpressed protein products from genes adjacent to ERBB2 in the 17q12 locus (GRB7, MIEN1, and STARD3), and three MAP KKKs (MAP3K9, MAP3K10, and MAP3K21) ([141]Figure S2E). Furthermore, post-translational modification-set enrichment analysis (PTM-SEA)[142]^16 indicated that phosphosites from the EGFR pathway (in which upstream receptors include HER2 and other EGFR family members) and sites that are perturbed by treatment with U0126, an inhibitor of ERK kinases downstream of EGFR/HER2, were among the most highly correlated with HER2 phosphoprotein, thereby confirming established downstream signaling activation events ([143]Figure S2F). Next, a detailed analysis of ERBB2 copy-number (CNA) and associated HER2 RNA and protein expression data was conducted ([144]Figures 1B and 1C) because a subset of non-pCR samples showed lower protein levels than expected given the distribution observed for the pCR cases ([145]Figure S2G; highlighted by purple boxes in [146]Figures 1B and [147]S2G). Six samples from four patients lacked evidence for amplification of the ERBB2 locus in the gene copy-number data. Copy-number ratios were significantly correlated with protein levels (Spearman rho = 0.57, p = 1.3 × 10^−6; [148]Figure 1C), and all six samples lacking amplification had low HER2 RNA, protein and/or phosphorylation levels and were associated with absence of pCR ([149]Figures 1B and 1C). This analysis therefore suggests these cases are false HER2 positives, a phenomenon also observed in our initial DP1 dataset.[150]^11 Since the proteogenomic methods for the DP1 and CALGB 40601 studies were the same, the two datasets were combined. Of 68 cases, there were 5 (all without pCR) that were potentially misdiagnosed HER2+, with a significant interaction between lack of proteogenomic evidence for HER2+ status and absence of pCR (Fisher’s exact test p = 0.04). To investigate this hypothesis further, immunohistochemistry (IHC) staining for HER2 for several CALGB 40601 samples was repeated using FFPE tissue from separate biopsies. Fifteen samples demonstrated IHC scores of 0 or 1+, i.e., HER2 negative ([151]Figure 1B). Notably, all samples assigned IHC 0 for which WES data were available were among the samples without ERBB2 gene amplification, and IHC 0 and 1+ samples had lower levels of amplification than IHC 2+ and 3+ samples (Wilcoxon rank-sum p = 9.8 × 10^−8; [152]Figure 1D). Consistently, HER2 protein levels by proteomics were significantly lower in IHC 0 and 1+ samples than in IHC 2+ and 3+ samples (Wilcoxon rank-sum p = 7 × 10^−6; [153]Figure 1E). Since different biopsies were used for proteogenomics than were used for the initial clinical diagnosis, some potential false positive results may be a consequence of intra-tumor heterogeneity, but it is important to note that the biopsies used for central staining were also different from those used for proteogenomics. Thus, heterogeneity is unlikely to be a dominant explanation for cases where both proteogenomics and central IHC indicate lack of ERBB2 amplification and HER2 overexpression. To facilitate the identification of treatment-resistance biomarkers in true HER2+ breast cancer, the four CALGB 40601 cases with lack of proteogenomic evidence for ERBB2 gene amplification were excluded from subsequent analyses. Baseline extracellular matrix proteins are elevated in non-pCR-associated samples accrued from the THL arm To account for different responses in individual treatment arms and for duplicate biopsies from the same tumor, linear models were employed to compare non-pCR tumors to pCR tumors for the RNA, protein, and phosphosite data (see [154]STAR Methods, [155]Figure S3A; [156]Table S3). Since the combination treatment arm (TLH arm) had the highest pCR rate and lowest relapse and death rates, cases from this arm were subjected to proteogenomic discovery analyses ([157]Figure 2A). Gene set enrichment analysis (GSEA) of KEGG pathways using signed log10 p values from the linear models ([158]Table S3) as input was conducted on differential protein (y axis) and RNA (x axis) measurements to identify differentially enriched pathways ([159]Table S4).[160]^17^,[161]^18 Several immune-related pathways were concordantly lower in the non-pCR tumors at both the RNA and protein level ([162]Figure 2A, green points in lower left corner). Higher immune pathway activity at the RNA level in pCR cases is consistent with the prior molecular study of CALGB 40601.[163]^6 However, while “O-glycan biosynthesis” was elevated in non-pCR at both the RNA and protein levels, “extracellular matrix (ECM)-receptor interaction”, one of the other pathways most elevated in non-pCR tumors at the protein level, was not differentially enriched at the RNA level. Similarly, when performing Gene Ontology (GO) Biological Process (BP) enrichment ([164]Table S4),[165]^19 the most enriched pathways in non-pCR tumors at the protein level included related cellular processes: “extracellular structure organization (ECSO)”, “connective tissue development (CTD)”, and “collagen metabolic process”, although ECSO and CTD also showed significant enrichment at the RNA level. Differential proteins from this set of ECM-associated pathways included several collagens and matrix metalloproteinases (MMPs) as well as ECM signaling regulators ([166]Figure 2B). For example, transforming growth factor-β2 (TGFB2) was consistently higher in non-pCR cases versus the pCR cases (moderated t test p = 0.026; [167]Table S3). PTM-SEA analysis was then deployed to identify pCR-associated phosphosite sets in THL arm samples using the signed log10 p values from the linear models that were applied to the site data. These analyses indicated higher levels of CDK4 target sites, as well as sites altered by nocodazole treatment, in the pCR cases, suggesting higher levels of cell cycle activity/sensitivity to mitotic spindle inhibitors ([168]Figure 2C; [169]Table S4). Targets for PAK1 constituted one of the most enriched phosphosite sets in non-pCR cases. Of note, these ECM-related pathways and the PAK1 signature were not significantly elevated in non-pCR in the TH arm. A possible explanation is that non-pCR tumors with low levels of ECM proteins from this arm may have responded with the addition of lapatinib. However, random differences and cohort size are more parsimonious explanations for both differences between treatment arms and between results based on RNA and protein data. In conclusion, proteomic analysis emphasized the role of the tumor microenvironment in resistance to HER2 therapy, significantly identifying ECM pathway components and kinase activation events that were only partially evident from mRNA-based discovery. Figure 2. [170]Figure 2 [171]Open in a new tab ECM-related pathways are higher in non-pCR than in pCR in the THL arm (A) Scatterplot showing GSEA results for enrichment of KEGG pathways ([172]Table S4) using the signed log10 p values from the linear models comparing non-pCR to pCR in the THL arm for the protein data (y axis) vs. RNA data (x axis) as inputs ([173]Table S3). The top pathway enriched specifically with proteins higher in non-pCR than pCR, “ECM-receptor interaction,” is labeled purple. (B) Heatmap showing significantly differential proteins (p < 0.05 in moderated t tests from linear models comparing non-pCR to pCR in THL arm; [174]Table S3) from the top ECM-related enriched pathways from KEGG and GO BP. Each column represents a separate tumor; when two biopsies were available for the same tumor; mean log2 TMT ratios were calculated prior to Z score scaling. Checker plot on the right indicates whether (black) or not (white) each gene/protein is a member of each pathway or was significantly differential at the RNA level (RNA DE). (C) Volcano plot of PTM-SEA results from enrichment of differential phosphosites in the THL combination arm with phosphosite sets from PTMSigDB ([175]Table S4), using signed (by direction of DE) log10 p values from the linear models as input ([176]Table S3). See also [177]Figure S3. Immune gene expression and a protein-based T cell signature are elevated in baseline samples associated with pCR to trastuzumab To maximize the identification of molecular events associated with pCR, pathway enrichment analysis was also applied to the differential expression results from the combination of the two trastuzumab-containing arms (TH and THL) by using the signed log10 p values from the linear models ([178]Table S3) as input for GSEA (enrichment at the RNA or protein level) or PTM-SEA (enrichment at the phosphosite level) since trastuzumab is the standard of care ([179]Table S4). [180]Figures 3A and 3B display differential pathways (MSigDb Hallmark) and phosphosite signatures (PTMSigDB) between non-pCR and pCR.[181]^16^,[182]^18^,[183]^20 Genes from the EMT (epithelial-mesenchymal transition) pathway were higher in non-pCR versus pCR cases at the protein but not the RNA level in this larger sample set. Other pathways involving the remodeling of the tumor microenvironment and changes in cellular identity, such as angiogenesis, myogenesis, apical junction, and Wnt-β-catenin signaling, were also higher in non-pCR tumors primarily at the protein level (green labels corresponding to red points in [184]Figure 3A). Finally, cell cycle (purple labels) and immune-related (brown labels) pathways were higher in pCR tumors, and differences between the RNA and protein data were evident. Most cell cycle pathways were elevated in pCR cases specifically at the protein level. Interestingly, different immune-related Hallmark pathways were elevated specifically in the RNA data (interleukin-STAT signaling) versus those that were also elevated in the protein data (interferon responses). Consistent with gene-level enrichment observations, PTM-SEA indicated higher levels of phosphosites perturbed by treatment with the anti-inflammatory glucocorticoid dexamethasone in tumors associated with pCR ([185]Figure 3B). Sites induced by nocodazole perturbation as well as targets of three cell cycle-associated kinases, AURKB, CDK2, and DYRK2, were also elevated in pCR cases, identifying cell cycle phosphosite signatures that are potentially response associated. Sites with higher levels in non-pCR than in pCR were enriched for targets of GSK3A/B, PKACA, PKCE, PRKD, JNK1, AMPKA1, and CDK5, suggesting additional response associated signaling events, which could be used to infer alternative therapeutic targets. Furthermore, recent studies suggest CDK5 may be an immunosuppressive inhibitor of major histocompatibility complex class I expression.[186]^21 Figure 3. [187]Figure 3 [188]Open in a new tab Immune-related pathways are elevated in tumors demonstrating pCR to trastuzumab (A) Scatterplot showing GSEA results for enrichment of MSigDB Hallmark50 pathways ([189]Table S4) using the signed log10 p values from the THL + TH contrasts from protein data (y axis) vs. RNA data (x axis) as inputs ([190]Table S3). (B) Volcano plot showing PTM-SEA results for PTMSigDB sets ([191]Table S4) using signed log10 p values from comparing phosphosites in non-pCR to pCR ([192]Table S3). y axis shows −log10 p values from PTM-SEA whereas the x axis shows the corresponding normalized enrichment score (NES). (C) Heatmap illustrating immune microenvironment-related proteogenomic features in each tumor (mean of pairs for duplicate biopsies/technical replicates from same tumor) from the THL and TL arms. Features include Z scores of single-sample gene set enrichment analysis (ssGSEA) enrichment scores for immune Hallmark pathways that were differential at the protein level in (A); of mean levels of immune modulator proteins curated by Thorsson et al.[193]^22; of RNA-based immune signature scores from ESTIMATE, Cibersort, and xCell[194]^23^,[195]^24^,[196]^25; of the IgG signature previously found to be associated with pCR in this study[197]^6; of RNA, protein, and phosphoprotein (mean of all sites) levels for immune checkpoint inhibitor (ICI) targets; and of a differential protein-based immune signature from Bayes DeBulk[198]^26 ([199]Tables S2 and [200]S4). Also included are tumor-infiltrating CD3^+ cell classifications for each tumor and percentages of T cells based on anti-CD3 immunohistochemistry (IHC) for all available cases. Samples are ordered by increasing levels of protein-based total immune modulator scores. Gray boxes indicate data not available, and asterisks indicate features that were differential in Wilcoxon rank-sum tests comparing non-pCR to pCR using the values for single cores or the mean values for pairs of duplicate cores from the same tumor (p < 0.05). See also [201]Figure S3A. Given that immune-related pathways were also differential between non-pCR and pCR-associated cases in the recently reported RNA-level analysis of CALGB 40601,[202]^6 a more detailed overview of the proteogenomic features associated with the immune microenvironment is provided in [203]Figure 3C. Here, tumors were ordered by increasing levels of protein scores derived from immune modulator genes (regulators of antigen presentation),[204]^22 which were consistent with RNA-based immune scores from ESTIMATE, Cibersort, and xCell and protein-based Hallmark single-sample GSEA (ssGSEA) scores for the pathways that were differential at the protein level in [205]Figure 3A ([206]Figure 3C; [207]Tables S2 and [208]S4). While immune scores were numerically higher in pCR tumors, the differences were not statistically significant when directly comparing non-pCR to pCR cases for protein-based single-sample scores ([209]Figures 3C and [210]S3B). However, the IgG signature previously found to be elevated in pCR cases at the RNA level in CALGB 40601[211]^6 was statistically significant at both the RNA and protein level despite the smaller size of the proteogenomic dataset. The proteogenomic features for potential immune checkpoint inhibitor (ICI) targets were also included in [212]Figure 3C for reference. PD-L1, PD1, CTLA4, LAG3, and TIGIT showed significantly lower expression in non-pCR tumors at the RNA level. However, PD-L1 protein and phosphoprotein, PD1 phosphorylation, and TIM3 RNA and protein levels, while also being lower in non-pCR samples, were not significantly so ([213]Figure 3C). To further investigate the immune microenvironment from the proteomics perspective, BayesDeBulk (BDB), a recently developed tool for immune deconvolution from proteomics data, was utilized ([214]Table S2).[215]^26^,[216]^27 The CD8 T cell signature was significantly lower in non-pCR than in pCR ([217]Figure 3C). However, protein-derived ssGSEA scores for interferon alpha and gamma responses and allograft rejection were not significantly different between pCR and non-pCR ([218]Figure 3C) despite the fact these pathways were enriched with differential proteins ([219]Figure 3A) and that these and several other immune pathways were significantly positively correlated with the BDB CD8 T cell signature ([220]Figure S3B). The CD8 signature was also significantly positively correlated with two phosphosite signatures that were significantly higher in pCR, signatures for targets of the HIPK2 kinase and for sites perturbed by paclitaxel treatment ([221]Figure S3B). The CD8 signature was also significantly anti-correlated with several single-sample scores that were higher in non-pCR tumors, including protein-based scores for Hedgehog and TGFβ signaling and phosphosite scores for PRKD1 and PRKCE activity. A more conventional approach for assessing the presence of T cell infiltration, anti-CD3 IHC, was also applied to the samples for which histological sections were available (see [222]STAR Methods). CD3^+ T cell percentages were correlated with immune pathway activity scores inferred from the proteogenomic data (r = 0.42, p = 0.0061 for Pearson correlation with immune modulator protein scores). However, no significant difference in CD3^+ cell counts between pCR and non-pCR cases was observed. Thus, the immune microenvironment analysis was technically valid, but a predictive model for pCR based on proteogenomic or histological immune parameters did not clearly emerge from these data, possibly because of sample size given that both groups (pCR and non-pCR) show heterogeneity in the tumor-immune microenvironment. Evaluation of three independent HER2-targeted neoadjuvant therapy datasets nominates protein biomarkers associated with lack of pCR Conclusions from complex pathway analyses are limited when sample sizes are modest. A simpler approach was therefore undertaken to identify individual proteins, not pathways, that associate with pCR status. To reduce false discovery, data from two additional proteomic studies of neoadjuvant therapy for HER2+ breast cancer were examined; the DP1 study,[223]^11 which included 14 patients treated with trastuzumab and, in most cases, pertuzumab, and the recently published Debets study,[224]^28 which included 45 patients treated with trastuzumab and pertuzumab. After accounting for tumor and nodal stage and ER status as potential confounders to the linear models described above, differential analysis revealed a total of 362 proteins with higher levels in non-pCR and 485 proteins with higher levels in pCR for the combination of the two trastuzumab-containing arms from CALGB 40601 ([225]Figure 4A; [226]Table S3). When a similar differential expression analysis approach was applied to the DP1 study, 179 proteins were higher in non-pCR cases, and 27 were higher in pCR cases. Fifteen proteins with higher levels in non-pCR cases overlapped between CALGB 40601 and DP1, while nine proteins with higher levels in pCR overlapped ([227]Figure 4A). These overlaps were significantly greater than expected by chance if similar numbers of genes were drawn at random (hypergeometric test, p = 0.0004 and 0.00001, respectively). Except for GPRC5A and VLDR, which were also significantly differential in both RNA-seq datasets, applying this approach to the RNA data yielded slightly smaller and mostly distinct sets of genes (12 with higher expression in non-pCR and 4 with higher expression in pCR). A sample-level protein heatmap for the protein candidates from both cohorts is shown in [228]Figure 4B. Notably, all 24 candidates remained significant when linear models were further adjusted to consider tumor purity as a covariate despite the inclusion of samples with tumor content as low as 25%. These 24 candidates were then assessed in the Debets study.[229]^28 While the directions of changes in protein expression were consistent for multiple proteins, only four potential resistance markers were confirmed to be significantly associated with response in all three datasets, GPRC5A (G protein-coupled receptor class C group 5 member A), TPBG (trophoblast glycoprotein also referred to as 5T4), NEU1 (neuraminidase 1), and SP140L (SP140 nuclear body protein-like) ([230]Figure 4C). Figure 4. [231]Figure 4 [232]Open in a new tab GPRC5A, TPBG, NEU1, and SP140L protein levels are associated with lack of pCR to neadjuvant anti-HER2 therapy in three proteomics datasets (A) Analogous limma models to those described in [233]Figure S3A were applied to the DP1 study after excluding potential false-positive protein outliers. The Venn diagrams show the overlap between upregulated (top) and downregulated (bottom) proteins from the comparison of non-pCR to pCR for the THL + TH arms of the current study and the DP1 study (limma moderated t test, p < 0.05). Numbers included for each study in parentheses indicate numbers of differential proteins among the 8,488 proteins evaluated by both studies. p values for overlap are from hypergeometric tests evaluating the probability that the overlapping sets contain as many proteins as observed here by chance given the numbers of differential proteins for each study and the background of 8,488 proteins. (B) Heatmap showing protein levels in biopsies (duplicate biopsies from same tumor are included as separate samples) from the THL + TH arms in the current study and from the DP1 study for candidates identified by the workflow described in (A). Upper panel shows 15 proteins that had significantly higher expression in non-pCR than in pCR, and lower panel shows the nine proteins that were higher in pCR (limma moderated t test, p < 0.05). Asterisks indicate genes that were also significant in the current RNA dataset from CALGB 40601. (C) Comparison of non-pCR to pCR samples from a recently published proteomics dataset for tumors from patients treated with dual antibody therapy[234]^28 revealed that GPRC5A (top), TPBG (middle), and NEU1 (bottom) proteins were also significantly elevated in and the SP140L protein was significantly lower in non-pCR in that study as well. Boxplots show the median (center) and first (upper bound) and third (lower bound) quartiles of respective levels of each protein in pCR and non-pCR tumors. p values are from using Wilcoxon rank-sum tests comparing the two groups. (D) TPBG IHC of control sections. Positive controls include placenta and renal cell carcinoma (RCC) tissue and MCF7 cells, while the negative control is the RAJI cell line (a human lymphoblastoid cell line). 100 micron scale bar in the upper left image applies to all images in the panel. (E) TPBG IHC of representative sections from tumors from the CALGB 40601 trial for each ordinal score category (0, 1 ,2, 3). 100 micron scale bar in the upper left image applies to all images in the panel. (F) Comparison of mass spectroscopy data for TPBG protein (y axis) to IHC scores. Boxplots show the median (center) and first (upper bound) and third (lower bound) quartiles of TPBG protein log2 TMT ratios for each group (tumors scoring as 2 or 3 vs. those scoring as 0 and 1), and whiskers extend to 1.5× interquartile range (IQR). For tumors with duplicate biopsies, the mean TPGB log2 TMT ratio was used. p value is from Wilcoxon rank-sum test comparing these groups. See also [235]Tables S2 and [236]S3. An antibody to TPBG/5T4 is commercially available, enabling independent confirmation of one of the MS-based protein measurements by immunohistochemistry (IHC). Positive staining was observed in the placenta, a tissue that physiologically expresses TPBG, as well as in the MCF7 breast cancer cell line and a renal cell carcinoma sample (RCC) as expected ([237]www.proteinatlas.org).[238]^29 Staining was absent from a previously established negative cell line control, RAJI ([239]Figure 4D) ([240]www.proteinatlas.org). When IHC was applied to sections from the CALGB 40601 proteogenomics cohort, we observed considerable variability in the extent and intensity of membrane staining ([241]Figure 4E). Thus, the tumors were assigned a score ranging from 0 to 3 to represent the degree of staining. As shown in [242]Figure 4F, the protein levels from the MS measurements were significantly higher in the tumors that scored higher (2 and 3) than in those with lower IHC scores (0 and 1). Thus, TPBG expression by MS was independently confirmed by IHC. GPRC5A and TPBG mRNA levels are associated with lack of pCR in a meta-analysis of RNA data from neoadjuvant HER2+ breast cancer clinical trials RNA-protein correlations were calculated for the four candidate biomarkers to facilitate further validation. All candidates showed significant correlations (Pearson r ∼0.77 for GPRC5A, r∼0.54 for TPBG, r∼0.42 for NEU1, and r∼0.45 for SP140L; p < 0.05 for all) ([243]Figure S4A). Since RNA-seq was previously performed on most of the original samples in CALGB 40601, GPRC5A, TPBG, NEU1, and SP140L were evaluated in this larger dataset[244]^5 (dbGaP id: phs001570.v2.p1). GPRC5A, TPBG, and NEU mRNA levels were all confirmed to be significantly higher in non-pCR tumors (RD: residual disease) than in pCR tumors, but SP140L mRNA levels were not significantly different ([245]Figure S4B). Higher levels of GPRC5A and TPBG RNA were also associated with worse OS for the highest expression quartile (p = 0.0056 and p = 0.028, respectively); in contrast, neither NEU1 nor SP140L mRNA levels were significantly associated with OS ([246]Figure S4C). RNA-based profiling datasets are also available from multiple published investigations of anti-HER2 therapies including I-SPY2,[247]^30^,[248]^31 CHER-LOB,[249]^32 Park et al.,[250]^33 and Shen et al.[251]^34 Ultimately, mRNA datasets from four breast cancer clinical trials that evaluated ten different neoadjuvant anti-HER2 regimens were analyzed. ESR1, GPRC5A and TPBG mRNA levels were significantly elevated in non-pCR samples in three, three, and five of the ten cohorts, respectively ([252]Figure 5A; [253]Table S5). In contrast, HER2 had significantly elevated levels in pCR-associated cases in two of the datasets. NEU1, however, was not significantly different in any dataset. Several of the datasets were too small to be individually informative; however, a meta-analysis demonstrated that HER2 expression was lower in non-pCR than in pCR (p = 0.0072), whereas ESR1 (p = 0.0028), GPRC5A (p = 0.00016), and TPBG (p = 0.000081) expression levels were higher in non-pCR cases. Figure 5. [254]Figure 5 [255]Open in a new tab GPRC5A and TPBG gene expression is associated with non-pCR to neadjuvant anti-HER therapy in combination with chemotherapy in multiple clinical trial datasets (A) Heatmap summarizing the results from comparing non-pCR/residual disease (RD) to pCR in publicly available datasets from clinical trials where patients were treated with anti-HER2 therapies. p values are from Wilcoxon rank-sum tests, while log2 FC (fold change) is the difference in the log10 transformed median expression value for the RD and pCR groups. The anti-HER2 therapy for each treatment arm is indicated in the first set of boxes. The pie charts in the meta-analysis column summarize the combined number of samples included in the datasets by pCR status and anti-HER2 therapy. The Stouffer meta-p values show the p values resulting from combination of the individual p values for each dataset weighted by sample size of the corresponding treatment arm using Stouffer’s method. Weighted (by study size) means of the log2 fold changes from the comparison for non-pCR to pCR for each dataset are also included. (B) Receiver-operator characteristic (ROC) curves for classifying tumors as pCR vs. non-pCR/RD using Z scores for HER2, ESR1, GPRC5A, and TPBG gene expression (RNA) in the combination of arms treated with trasuzumab from CHER-LOB (left) or from I-SPY2 (middle) or the HER2+ samples from the neratinib-treated arm from I-SPY2 (right). GPRC5A+TPBG-ERBB2 shows the results for a combined score obtained by adding the Z scores for GPRC5A and TPBG in a given tumor and subtracting the HER2 Z score. Numbers in the legend show the AUROCs for each score. See also [256]Figures S4 and [257]S5 and [258]Table S5. GPRC5A, TPBG, and HER2 mRNA levels are more closely related to pCR status than ESR1 mRNA levels Since GPRC5A, TPBG, ESR1, and HER2 represent orthogonal biological pathways, the relationship between the mRNA expression levels of each gene, both alone and in combination, versus the likelihood of pCR was explored. To take into consideration differences in treatment across the studies under investigation, receiver-operator characteristic (ROC) curve analyses were applied to three groups of samples separately. These groups consisted of 1) the combination of all the treatment arms that received trastuzumab within the CHER-LOB study, 2) the combined trastuzumab cohort from the I-SPY2 trials, and 3) the neratinib arm from I-SPY2 ([259]Figure 5B). For the trastuzumab-treated tumors from CHER-LOB, GPRC5A mRNA expression was the best single indicator of non-pCR with an area under the ROC (AUROC) of 0.74, whereas TPBG mRNA was the best performing marker for non-pCR in the trastuzumab-treated samples in I-SPY2 (AUROC = 0.739). In contrast, HER2 mRNA was the best marker for predicting pCR to neratinib-treatment (AUROC = 0.74). However, the best performance for all three groups was achieved when GPRC5A, TPBG, and HER2 levels were combined into a single signature. The AUROC for the triple gene signature was 0.741 for CHER-LOB, 0.776 for I-SPY2 trastuzumab treatment, and 0.79 for the I-SPY2 neratinib trial. The addition of ESR1 mRNA levels did not further improve the ROC value in any of the three cohorts. GPRC5A and TPBG protein levels are associated with differential PRKD1 kinase and T cell signatures To study possible biological roles for GPRC5A and TPBG in resistance to HER2 targeting, their associations with protein- and phosphosite-derived single-sample activity scores were investigated ([260]Table S4). Protein-based TGF-beta signaling pathway scores and PRKD1 and PRKCE target phosphosite activity were among the scores associated with response that were positively correlated with GPRC5A protein levels ([261]Figure S5A). PRKD1 target scores were also significantly correlated with TPBG, and both proteins were positively correlated with early and late estrogen response, suggesting these markers are co-regulated, although their biological profiles do not completely overlap ([262]Figures S5A and S5B). While ESR1 protein levels were also significantly positively correlated with GPRC5A (Pearson r = 0.39, p = 0.025) and TPBG (Pearson r = 0.28, p = 0.031), some samples had high levels of ESR1 without elevation of GPRC5A or TPBG levels and vice versa ([263]Figure S5C). Furthermore, both proteins were significant despite including clinical ER IHC status as a confounding variable in the linear models. However, when PAM50 subtype was used instead of ER status, TPBG (unlike GPRC5A) was no longer significant, likely due to higher levels of TPBG in the luminal subtype samples ([264]Figure S5D). Finally, like TGFβ signaling and PRKD1 and PRKCE target scores ([265]Figure S3B), GPRC5A protein levels also showed significant negative correlation with the BDB T cell signature identified by the pathway analysis described in the previous section (Pearson r = −0.57, p = 0; [266]Figure S5E). TPBG was also negatively correlated with this signature (r = −0.46, p = 0.0019; [267]Figure S5E), and both TPBG and the T cell signature were associated with targets of HIPK2 ([268]Figures S3B and [269]S5B). In conclusion, despite a significant but modest correlation with ESR1 protein and estrogen response pathway activity, TPBG and GPRC5A protein levels were associated with several biological functions that are distinct from established roles of HER2 and ESR1 in breast cancer pathogenesis and response to treatment. Discussion The CALGB 40601 data presented herein is the largest proteogenomic study of clinically annotated HER2+ breast cancers conducted to date. Proteogenomic analysis of the ERBB2 amplicon preliminarily suggests that local laboratory testing produces a false positive result in 7% of cases. Thus, HER2 proteogenomics could find clinical utility as a quality control for HER2 immunohistochemistry. The effort to detect biomarkers consistently associated with pCR in HER2+ breast cancer involved three independent proteomic datasets and 10 RNA-based validation cohorts. These studies prioritized two potential candidates, GPRC5A and TPBG, with higher expression in tumors failing to undergo pCR. Pathway analyses indicated these proteins are associated with pathways orthogonal to HER2 and ESR1. GPRC5A is a retinoic acid-regulated member of the type 3G protein-coupled receptor family and has been associated with poor prognosis in pancreatic cancer[270]^35^,[271]^36 as well several other cancer types.[272]^37^,[273]^38 Interestingly, GPRC5A mRNA exists in a circular form, which is associated with poor outcome in colon cancer.[274]^39 GPRC5A was significantly correlated with PRKD1 and PRKCE kinase activity scores, which, like GPRC5A itself, were both negatively correlated with the BDB T cell protein signature. This is noteworthy because PRKCE and PRKD1 have been implicated in toll-like receptor signaling,[275]^40^,[276]^41 and PRKCE is associated with poor prognosis and low immune checkpoint target levels in kidney cancer.[277]^42 TPBG is an antagonist of the Wnt/β-catenin signaling pathway,[278]^43 an adverse prognostic factor in lung cancer,[279]^44 and a proposed promoter of metastasis.[280]^45 TPBG-based antibody drug conjugates,[281]^46 vaccine therapy (TroVax),[282]^47 T cell engagers,[283]^48 and CAR-T cells[284]^49 are currently under investigation. How GPRC5A and TPBG specifically relate to resistance to anti-HER2 therapies remains unclear. However, TPBG is linked to cancer stem cell biology and thus mechanisms that allow cancers to evade systemic therapies.[285]^50 In conclusion, the results reported herein support the use of proteogenomics for the discovery of resistance biomarkers and the identification of alternative therapeutic strategies. However, several hurdles must be overcome to promote more rapid progress, including the analysis of larger sample sets of HER2+ breast cancers treated with a uniform approach for more comprehensive discovery and validation. A recent report on the use of targeted proteomics on routine pathology slides is therefore promising in this regard.[286]^51 Limitations of the study There are several general limitations to correlative molecular studies. Firstly, the low sample size per treatment arm reduces the power to detect true associations, increasing the likelihood of Type II errors. This limitation, as well as the fact that study specific factors may confound any clinical trial, necessitates additional validation in orthogonal cohorts. However, we were able to leverage two other published proteomic studies of HER2+ breast cancer[287]^11^,[288]^28 and conduct further validation based on mRNA expression in additional datasets. Thus, the conclusion that TPBG and GPRC5A expression are associated with non-pCR is well-supported. However, additional large-scale proteomics datasets on highly curated clinical cohorts will be required for more discovery and validation cycles. Thirdly, experiments in relevant model systems are essential to rigorously assess the mechanistic roles of TPBG and GPRC5A in HER2+ breast cancer. Finally, proteogenomic analysis of tumors is still a specialized area, and translation of protein mass spectrometry to the clinical setting is the subject of ongoing research and technical development. Resource availability Lead contact Requests for further information, resources, and reagents should be directed to and will be fulfilled by the lead contact, Meenakshi Anurag (anurag@bcm.edu). Materials availability This study did not generate new unique reagents. Data and code availability * • Data availability: The genomics and transcriptomics data are available from the database of Genotypes and Phenotypes (dbGAP): phs003576.v1.p1 ([289]https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?s tudy_id=phs003576.v1.p1), and the proteomics data are accessible through the NCI Proteomics Data Commons: PDC0004582 for proteomics ([290]https://pdc.cancer.gov/pdc/study/PDC000582) and PDC000583 for phosphoproteomics ([291]https://pdc.cancer.gov/pdc/study/PDC000583). Processed genomics and normalized RNA, protein expression, and phosphosite abundance data are provided in [292]Tables S2A (CNA), [293]S2B (Mutation), [294]S2C (RNA), [295]S2D (Global proteome), and [296]S2E (Phosphosite). * • Code availability: Specific code used for systems biology analysis will be available upon request. * • Any additional information required to re-analyze the data reported in this work is available from the [297]lead contact upon request. Acknowledgments