Graphical abstract

   graphic file with name fx1.jpg
   [45]Open in a new tab

Highlights

     * •
       16 gene pairs (16-GPS) could predict HRDness for prostate cancer at
       individual level
     * •
       HRDness samples classified by 16-GPS showed HRD molecular and
       clinical features
     * •
       HRDness cells classified by 16-GPS tend to be sensitive to PARPi
     __________________________________________________________________

   Cancer; Genomics; Transcriptomics

Introduction

   Prostate cancer is the second most common solid tumor in males
   ([46]Siegel et al., 2020). Approximately 20–30% of patients suffer from
   disease recurrence after receiving radical resection, radiation, or
   androgen deprivation therapy (ADT), leading to a progression of the
   disease to metastatic castration-resistant prostate cancer (mCRPC), and
   finally, 90% of patients succumb to the disease without effective
   treatments ([47]Kodama et al., 2020; [48]Lalonde et al., 2014).
   Recently, based on the synthetic lethal effect, poly ADP-ribose
   polymerase inhibitor (PARPi) has been approved by the Food and Drug
   Administration for the treatment of mCRPC with BRCA1/2 or other
   homologous recombination repair (HRR) gene mutations ([49]Rescigno
   et al., 2018). Importantly, patients with HRR gene alterations also
   carried a higher Gleason score and prostate specific antigen (PSA)
   level, and a worse prognosis similar to BRCA alteration, indicating a
   homologous recombination defectiveness (HRDness) phenotype beyond
   narrow BRCA-related way ([50]Evans et al., 2016; [51]Marshall et al.,
   2019a; [52]Pilie et al., 2019). Thus, there is an urgent need to
   identify the homologous recombination deficiency (HRD) phenotype to
   extend the utilization of PARPi for tumors with BRCA-deficiency as well
   as HR-deficiency, thereby expanding the demographic of patients who may
   benefit from PARPi treatment ([53]Weston et al., 2010;
   [54]Mendes-Pereira et al., 2009).

   Approximately 20–25% mCRPC and 5–10% of primary prostate cancer cases
   exhibited mutations in the HRR genes ([55]Criscuolo et al., 2019;
   [56]Handy et al., 2018). Owing to fewer hotspot mutations and the
   requirement of high tumor purity, detection of genomic variations
   presents a high false-negative rate. Besides, several presently popular
   BRCAness and HR-deficiency signatures, named HRD score and HRDetect
   score, have been developed based on the genomic scars and applied in
   prostate, breast, and ovarian cancers with the same threshold
   ([57]Sztupinszki et al., 2020; [58]Telli et al., 2016; [59]Davies
   et al., 2017). However, it is irrational to use the same threshold for
   classifying different tumor types, considering the tissue-specific
   background for HRD signature ([60]Jonsson et al., 2019). Thus, in view
   of the aforementioned limitations, our study aims to discover a robust
   HRDness classification signature for prostate cancer.

   HRD is a genomic event. Changes in the genome can affect gene
   expression through transcription factor dependent and independent
   transcriptomic regulation ([61]Ladan et al., 2021; [62]Peng et al.,
   2014). However, transcriptome signature for HRD in prostate cancer has
   rarely been investigated. Regular transcriptional signatures based on
   the quantitative expression measurements of multiple key genes lack
   robustness because of the marked variation in expression levels of the
   same gene across samples, and this variation is inevitably induced by
   different experimental conditions ([63]Guan et al., 2018). Thus, we
   initiated a relative expression ordering (REO)-based discovery frame to
   overcome these limitations. The advantage of REOs is that individual
   patients can be classified by the within-sample REOs of genes without a
   complex scoring process. The REO-based biomarkers are relatively robust
   against batch or normalization effects with different technical sources
   and poor sample preparation or quality, which are likely responsible
   for most of the false-negative results in detecting gene mutations, and
   could be applied in individual cancer samples ([64]Qi et al., 2016;
   [65]Guo et al., 2018).

   In the present study, using transcriptomic quantitative measurement of
   prostate cancer from The Cancer Genome Atlas (TCGA) database, we
   developed a qualitative REO-based HRDness classification signature
   composed of 16 gene pairs (16-GPS) for prostate cancer.

Results

Identifying HRDness signature by a REO-based discovering workflow

   Briefly, the procedure of identifying HRDness signature is shown in the
   flow chart ([66]Figures 1A–1C). Firstly, we identified 384 genes whose
   expression levels were significantly correlated with the HRDetect score
   in the PRCA495 dataset (spearman's rank correlation, |r| > 0.2, FDR <
   0.05). Then, consensus clustering divided the PRCA495 samples into six
   clusters (k = 6, [67]Figure 1A). A total of 111 samples from the
   cluster 1 (C1) group had a significantly poorer biochemical
   recurrence-free survival (BFS) than the 316 samples from the non-C1
   group (p = 4.19 × 10^−4; hazard ratio, HR = 2.47; 95% confidence
   interval, CI: 1.47–4.14; [68]Figure 2A). In addition, within the
   samples in clusters C1–C6 (132, 122, 79, 48, 93, and 21 samples,
   respectively), C1 carried the most aggressive pathological features,
   including Gleason score ([69]Figure 2B) and PSA level ([70]Figure 2C).
   More importantly, the C1 cluster maintained the highest HRD-related
   scores, including HRDetect score ([71]Figure 2D), signature 3
   ([72]Figure 2E), signature 5 ([73]Figure 2E), and HRD score
   ([74]Figure 2E), among the six clusters. Therefore, based on an
   appropriate proportion of sample size, i.e., 26.67% (132/495), and
   distinct HRDness information, the C1 subtype was used as the
   HRDness-like subtype for training the transcriptional qualitative
   signature.

Figure 1.

   [75]Figure 1
   [76]Open in a new tab

   Workflow of this study

   (A) Identifying HRDetect score-related genes and then HRDness-like
   subgroups (C1) was defined by these genes through consensus clustering
   method.

   (B) based on gene pairs with stable REO patterns in C1 and
   significantly reverse in non-C1, a qualitative signature consisting of
   16 gene pairs (16-GPS) was developed by performing a forward selection
   approach.

   (C) Validation of 16-GPS in independent datasets.

Figure 2.

   [77]Figure 2
   [78]Open in a new tab

   HRD-related clusters of prostate cancer

   (A) Biochemical recurrence survival for subjects according to molecular
   clusters. p values were determined by log-rank test.

   (B–E) Violin plots illustrate the distribution of pathological
   malignant features and genomic signature scores, including Gleason
   score (B), PSA level (C), HRDetect score (D), signature 3, signature 5,
   and HRD score (E) among the six clusters. Each dot represents an
   individual sample. Horizontal lines represent the median. p values were
   determined by a one-sided Wilcoxon rank-sum test.

   The quantitative gene expression values of pairwise genes were
   transformed to binary values for discovering qualitative HRDness
   signature ([79]Figure 1B). First, we identified 256,099,836 highly
   stable gene pairs in 132 samples from C1 (see [80]Method details) and
   306,023,521 significantly stable gene pairs in 363 non-C1 samples
   (binomial distribution test, FDR < 0.01), wherein REOs of 1,100
   candidate gene pairs were stable in C1, but reversed in non-C1 samples.
   Then, for each candidate gene pair, we calculated the F-score to
   evaluate the prediction power of the gene pair. Next, using the gene
   pair ATP6V1C1-NUDT15, with the max F-score 0.71 as the seed, we finally
   obtained a set of 16-GPS that reached the highest F-score (0.92) and
   the largest Area Under Curve (AUC, 95.50%), with the threshold of 11.5,
   by performing forward selection and AUC of the receiver operating
   characteristic (AUC-ROC) analysis ([81]Figure 3A). A sample was
   classified into the HRDness subtype if at least 12 gene pairs voted for
   HRDness; else, the sample was classified into the non-HRDness subtype.

Figure 3.

   [82]Figure 3
   [83]Open in a new tab

   Performance of 16-GPS in the PRCA495 dataset

   (A) The area under receiver characteristic operating curves for
   confirming signature and threshold. AUC was highest at a threshold of
   11.5, and the signature was marked as 16-GPS.

   (B–E) The distribution of pathological malignant features and genomic
   signature scores among the C1, non-C1, HRDness, and non-HRDness
   subgroups in the PRCA495 dataset, including signature 3, signature 5,
   and HRD score (B), HRDetect score (C), Gleason score (D), and PSA level
   (E). p values were determined by a one-sided Wilcoxon rank-sum test.

   (F) Kaplan–Meier survival analysis of HRDness and non-HRDness patients
   in the PRCA495 dataset.

   (G) Kaplan–Meier survival analysis of HRDness & non-C1 and non-HRDness
   & C1 patients in the PRCA495 dataset.

   (H) Kaplan–Meier survival analysis of HRDness & non-C1 and non-HRDness
   patients in the PRCA495 dataset.

   (I) Kaplan–Meier survival analysis of HRD[pos] and HRD[neg] patients
   classified by HRD score in the PRCA495 dataset.

   (J) Kaplan–Meier survival analysis of HRD[pos] and HRD[neg] patients
   classified by HRDetect score in the PRCA495 dataset. p values marked in
   (F-G) were determined by log-rank test.

16-GPS displaying improved performance than HRDetect score and HRD score

   Comparative analysis with a one-sided Wilcoxon rank-sum test suggested
   that HRDness subtype from PRCA495, classified by 16-GPS, showed a
   significantly higher level of HRDness genomic and clinical features,
   including signature 3 (p = 8.22 × 10^−6, [84]Figure 3B), signature 5
   (p = 2.00 × 10^−4, [85]Figure 3B), HRD score (p < 2.20 × 10^−16,
   [86]Figure 3B), HRDetect score (p = 7.92 × 10^−14, [87]Figure 3C),
   Gleason score (p = 6.00 × 10^−16, [88]Figure 3D), and PSA value (p =
   3.51 × 10^−7, [89]Figure 3E), than the non-HRDness subtype.
   Particularly, significance of the difference between HRDness and
   non-HRDness was increased upon comparing the difference between C1 and
   non-C1 at the levels of signature 3 (p = 8.22 × 10^−6 vs. p = 1.87 ×
   10^−4, [90]Figure 3B), HRDetect score (p = 7.92 × 10^−14 vs. p = 2.61 ×
   10^−13, [91]Figure 3C), Gleason score (p = 6.00 × 10^−16 vs. p = 2.88 ×
   10^−13, [92]Figure 3D), and PSA (p = 3.51 × 10^−7 vs. p = 3.90 × 10^−4,
   [93]Figure 3E). Similarly, the difference in BFS between subtypes
   classified by 16-GPS (p = 7.06 × 10^−8, HR = 3.88, 95% CI: 2.28–6.59,
   [94]Figure 3F) was greater than that in consensus clustering subtypes
   (p = 4.19 × 10^−4, [95]Figure 2A).

   A total of 41 patients in the non-C1 cluster were classified into the
   HRDness group by 16-GPS, which were termed as HRDness & non-C1
   patients. Further, eight patients in C1 were classified into the
   non-HRDness group, and were termed as non-HRDness & C1. The 35 patients
   with PSA information in the HRDness & non-C1 group showed greater PSA
   levels than the eight patients in the non-HRDness & C1 group (p = 0.07,
   [96]Figure S2A). No significant results were observed when comparing
   with other features ([97]Figures S2B–S2D), which was mainly because of
   the small size of the non-HRDness & C1 group. Notably, although we
   obtained no significant results with a huge HR value (p = 0.12, HR =
   26.52 × 10^7, 95% CI: 0-Inf, [98]Figure 3G), seven out of eight
   patients (one without BFS record) had no disease relapse with intact
   BRCA. In contrast, when comparing the HRDness & non-C1 group with the
   non-HRDness subtype with a greater patient pool defined by 16-GPS, we
   found that the HRDness and non-C1 group showed a higher level of
   HRDness features, such as PSA level (p = 2.46 × 10^−4, [99]Figure S2A),
   HRDetect score (p = 8.17 × 10^−3, [100]Figure S2B), Gleason score (p =
   6.92 × 10^−5, [101]Figure S2C), signature 3 (p = 0.03,
   [102]Figure S2D), signature 5 (p = 0.17, [103]Figure S2D), and HRD
   score (p = 4.83 × 10^−5, [104]Figure S2D), than the non-HRDness group.
   Moreover, these HRDness & non-C1 patients had poorer BFS than the
   non-HRDness patients (p = 4.31 × 10^−6, HR = 4.87, 95% CI: 2.31–10.29,
   [105]Figure 3H). Directly, when applying other genomic signatures, HRD
   score (cutoff = 42) and HRDetect score (cutoff = 0.70), into the
   dataset PRCA495, the predicted HRD[pos] and HRD[neg] had no BFS
   difference (HRD score, p = 0.59, HR = 1.72, 95% CI: 0.24–12.45;
   HRDetect score, p = 0.10, HR = 1.68, 95% CI: 0.90–3.11, [106]Figures 3I
   and 3J). These results suggest that 16-GPS displays better performance
   compared with HRD score, HRDetect score, and even HRDetect score-based
   consensus clustering.

16-GPS showing robust classification in independent datasets

   In the independent validation dataset of PRCA213, 12 and 54 patients
   with HRD feature information were respectively divided into the HRDness
   and non-HRDness subtypes by 16-GPS. Patients in the HRDness group
   showed higher levels of HRD score (p = 0.01, [107]Figure 4A) and
   HRDetect score (p = 0.16, [108]Figure 4B) than the patients in the
   non-HRDness group. In the testing dataset PRCA281, a classification
   rule of “winner takes all” was applied, as only one gene pair
   overlapped with 16-GPS. A total of 157 patients in the HRDness group
   showed poorer BFS compared with that in the non-HRDness counterparts
   (p = 2.52 × 10^−3, HR = 1.54, 95% CI: 1.16–2.04, [109]Figure 4C), and
   the former showed a significantly higher Gleason score (one-sided
   Wilcoxon rank-sum test, p = 1.81 × 10^−3, [110]Figure 4D). Similarly,
   for the 131 samples from the PRCA131 dataset, 29 patients classified
   into the HRDness group by 16-GPS had a significantly lower BFS than 102
   patients classified into the non-HRDness group (p = 3.13 × 10^−3, HR =
   3.02, 95% CI: 1.40–6.53, [111]Figure 4E), and the HRDness patients
   displayed higher Gleason scores (p = 0.10, [112]Figure 4F) and PSA
   levels (p = 0.06, [113]Figure 4G) through a one-sided Wilcoxon rank-sum
   test.

Figure 4.

   [114]Figure 4
   [115]Open in a new tab

   Performance of 16-GPS in validation datasets

   (A and B) HRD and HRDetect scores in different groups classified by
   16-GPS from the PRCA213 dataset. p values were determined by a
   one-sided Wilcoxon rank-sum test.

   (C) Kaplan–Meier survival analysis of groups classified by 16-GPS in
   the PRCA281 dataset. p value were determined by log-rank test.

   (D) Gleason score in groups classified by 16-GPS in the PRCA281
   dataset. p value were determined by a one-sided Wilcoxon rank-sum test.

   (E) Kaplan–Meier survival analysis of groups classified by 16-GPS in
   the PRCA131 dataset. p value were determined by log-rank test.

   (F and G) Gleason score and PSA level in groups classified by 16-GPS
   from the PRCA131 dataset. p values were determined by a one-sided
   Wilcoxon rank-sum test.

   (H–J) Kaplan–Meier survival analysis of PARP[neg] and PARP[pos] samples
   in HRDness group from the PRCA495, PRCA281, and PRCA131 datasets. p
   values were determined by log-rank test.

   (K) LN_IC50 and AUDIC in cells classified by 16-GPS. p values were
   determined by a one-sided Wilcoxon rank-sum test.

   On the other hand, patients were classified into the PARP[neg] and
   PARP[pos] (high PARP expression) subgroups by the StepMiner algorithm
   (see [116]Method details). As expected, for the 139 HRDness patients
   from the PRCA495 dataset with expression profile, 84 PARP[neg] tumors
   showed greater BFS than 55 PARP[pos] tumors (p = 0.04, HR = 1.96, 95%
   CI: 1.01–3.80, [117]Figure 4H). A similar tendency was observed when
   analyzing the 157 HRDness patients from PRCA281 (p = 0.24, HR = 1.28,
   95% CI: 0.85–1.94, [118]Figure 4I). Unfortunately, we obtained an
   opposite result in PRCA131, where 15 PARP[neg] patients had a poor BFS
   than 14 PARP[pos] patients (p = 0.15, HR = 0.41, 95% CI: 0.11–1.45,
   [119]Figure 4J).

   Further, using the gene expression profiles of prostate cancer cell
   lines treated by five PARPi types from the Genomics of Drug Sensitivity
   in Cancer (GDSC), three and four cell lines were stratified by 16-GPS
   into the HRDness and non-HRDness groups, respectively. The HRDness
   cells tended to be sensitive to veliparib by comparing the natural log
   of the fitted half maximal inhibitory concentration (LN_IC50, p =
   4.80 × 10^−2, [120]Figure 4K) and area under the drug inhibition curve
   (AUDIC, p = 0.05, [121]Figure 4K) using one-sided Wilcoxon rank-sum
   test. HRDness cells also displayed the same tendency of sensitivity for
   the drugs, including olaparib, rucaparib, and talazoparib, when
   comparing the LN_IC50 ([122]Figure S3A) and AUDIC ([123]Figure S3B)
   with non-HRDness cells.

Robustness of 16-GPS against different tumor sampling sites and partial RNA
degradation

   Here, we proved that REO-based qualitative 16-GPS was not susceptible
   to different tumor sampling positions or RNA degradation in prostate
   cancer. 16-GPS was applied to classify six paired pre-ADT and post-ADT
   samples from the PRCA12-ADT dataset, and all samples were classified
   into the non-HRDness group. Although the quantitative expression values
   of 26 genes in 16-GPS varied considerably between paired samples, the
   REOs in 16-GPS were stable enough to stratify paired patients into the
   same group ([124]Figure S4). Similarly, the signature resulted in a
   classification where six paired fresh-frozen/formalin-fixed
   paraffin-embedded (FF/FFPE) samples from PRCA12-FF were assorted into
   the non-HRDness group. Relative expression ranks of 16-GPS were highly
   robust against partial RNA degradation in FFPE subjects, and consistent
   between pairwise samples ([125]Figure S5).

16-GPS characterizing distinct multi-omics features in HRDness prostate
cancer

   Multi-omics analysis was conducted in the PRCA495 dataset. A total of
   61 genes showed significantly different mutation frequencies between
   the HRDness and non-HRDness groups after filtering the mutation type of
   silent mutation (Fisher’s exact test, p < 0.05, [126]Figure 5A), and
   the HRDness samples showed significantly higher tumor mutational burden
   (TMB) than the non-HRDness samples (one-sided Wilcoxon rank-sum test,
   p < 2.20 × 10^−16, [127]Figure 5B). Further, the histogram showed a
   high frequency of copy number aberrations (CNA) in the HRDness samples
   ([128]Figure 5C). Specifically, focusing on the genomic region, 613
   lesions were identified with significantly different CNA frequencies
   between these two groups (Fisher’s exact test, p < 0.05), of which, 612
   of 613 genomic lesions had significantly higher frequencies of copy
   number gain or loss in the HRDness group, which was coincidentally
   higher than expected (binomial distribution test, p < 2.20 × 10^−16).

Figure 5.

   [129]Figure 5
   [130]Open in a new tab

   The distinct multi-omics characteristics of the HRDness and non-HRDness
   samples

   (A) Oncoplot showing 61 genes with significantly different mutation
   frequencies (rows) across subjects (columns) split by 16-GPS. The left
   shows the mutation percentage, and the top shows the overall number of
   mutations. Molecular subgroups divided by BRCA mutation, signature 5,
   signature 3, HRD score, and HRDetect score across subjects are shown at
   the top. Demographic and clinical annotations are provided at the
   bottom.

   (B) TMB levels between groups classified by 16-GPS. p value were
   determined by a one-sided Wilcoxon rank-sum test.

   (C) Percentage of samples (Y axis) with CNA (amplification or deletion)
   along the exome, separately shown for the HRDness and non-HRDness
   subtypes.

   (D) Concordance between DEGs detected from the different datasets
   (upper half), and the Venn diagram displaying the overlap between the
   consensus DEGs in the three datasets and differentially methylated
   genes (DMGs) detected from the PRCA495 dataset.

   (E) Protein expression levels of two HRR genes between the HRDness and
   non-HRDness samples. p values were determined by a one-sided Wilcoxon
   rank-sum test.

   We identified 2,281, 4,641, and 10,185 differentially expressed genes
   (DEGs) between HRDness and non-HRDness samples from the PRCA131,
   PRCA213, and PRCA495 datasets, respectively (Wilcoxon rank-sum test,
   FDR <0.01). Only 15 DEGs were identified between the HRDness and
   non-HRDness groups from the PRCA281 dataset because of the small
   detected gene scale (Wilcoxon rank-sum test, FDR <0.01). Concordance
   analysis for DEGs showed that the consistency of HRDness-related DEGs
   between both datasets, excluding the PRCA281 dataset, was over or equal
   to 92.23% ([131]Figure 5D), indicating significantly reproducible and
   consistent transcriptional differences between these two groups.

   By comparing the methylation profiles between HRDness and non-HRDness
   samples in PRCA495, we detected 2,800 differentially hypermethylated
   and 2,552 hypomethylated genes (Wilcoxon rank-sum test, FDR <0.01). As
   shown in [132]Figure 5D, 104 of the 2,800 hypermethylated genes and 196
   of the 2,552 hypomethylated genes overlapped with 831 unique and
   consistent deregulation direction DEGs, respectively, which were
   obtained by overlapping the identified DEGs in the three datasets.
   Further, 98.08% (102/104) hypermethylated genes were concordantly
   under-expressed in the HRDness group. Similarly, 98.98% (194/196)
   hypomethylated genes were concordantly overexpressed in the HRDness
   group, which could not be coincidental (binomial distribution test, p <
   2.20 × 10^−16).

   Further, we also identified 24 differentially expressed proteins by
   comparing the proteomic data of these two groups (Wilcoxon rank-sum
   test, FDR <0.01). These genes were enriched in some HRD- and/or
   immune-related pathways (hypergeometric distribution model, p < 0.05,
   [133]Table S1). Importantly, two key proteins, BRCA2 and MRE11A, which
   are essential for HR ([134]Li et al., 2019; [135]D'Andrea and Grompe,
   2003), were significantly under-expressed in HRDness compared with that
   in the non-HRDness group (one-sided Wilcoxon rank-sum test, BRCA2: p =
   8.87 × 10^−5; MRE11A: p = 0.03, [136]Figure 5E). Besides, the HRDness
   group suffered genomic instability and epigenomic alterations,
   including ATM mutation, POLD4 and NBN copy number gain, DEPDC1B and
   H2AFX loss, UBR5 hypomethylation, and FANCE hypermethylation, which
   were enriched in HRR genes.

HRDness prostate cancer presenting elevated immune infiltration

   The 296 consensus genes, which represent consistent alterations in the
   transcriptome and epigenome levels caused by HRDness, not only
   participated in the HR-related pathways, but also some typically
   immune-related pathways, such as “Endocytosis”, “Autophagy,” and “Hippo
   signaling pathway” (hypergeometric distribution model, p < 0.05,
   [137]Figure 6A). Therefore, considering emerging evidence to suggest
   that HRD may influence the immune system ([138]Pellegrino et al.,
   2020), we next investigated the main infiltrating immune cells,
   including B cells, CD4 T cells, CD8+ T cells, neutrophils, macrophages,
   and dendritic cells, between the HRDness and non-HRDness groups from
   the PRCA495 dataset in the Tumor Immune Estimation Resource (TIMER)
   database. The violin plot showed a significantly increased fraction of
   5/6 immune cells in the HRDness group than that in the non-HRDness
   group ([139]Figure 6B). Surprisingly, two important genes regulating
   immune cells, IRF3 and TBK1, were upregulated in the HRDness group in
   more than two datasets. Results suggest that the overexpression may
   contribute to B cell and CD8+ T cell activation via the STING-TBK1-IRF3
   mechanism as reported previously ([140]Figure 6C) ([141]Zheng et al.,
   2020).

Figure 6.

   [142]Figure 6
   [143]Open in a new tab

   Immune infiltration analysis of HRDness tumors

   (A) Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways
   significantly enriched with the 296 consensus genes were identified.
   Red, blue, and green arrows represent pathways related to immunity,
   HRD, and both, respectively (hypergeometric distribution model, p <
   0.05).

   (B) Abundance of six main infiltrating immune cells between the groups
   from the PRCA495 dataset. p values were determined by a one-sided
   Wilcoxon rank-sum test.

   (C) Relationship between the two key genes belonging to the cGAS-STING
   pathway and fractions of three types of immune cells.

   (D) Distribution of immune scores in groups classified by 16-GPS from
   the PRCA495 dataset. p value were determined by a one-sided Wilcoxon
   rank-sum test.

   (E) Heatmap displaying the distribution of log-transformed expression
   levels for immune-stimulators and immune-inhibitors with significant
   differences between the groups in the PRCA495 dataset.

   (F) Expression levels of four immunotherapeutic targets, viz., PD1,
   PD-L1, CTLA4, and PD-L2, between the two groups classified by 16-GPS
   from the PRCA495 dataset. p values were determined by a one-sided
   Wilcoxon rank-sum test.

   Further, the HRDness group exhibited a significantly increased immune
   score (one-sided Wilcoxon rank-sum test, p = 9.73 × 10^−12,
   [144]Figure 6D), which was likely responsible for the activation of
   immune-stimulators in the HRDness samples ([145]Figure 6E). In
   addition, when localizing immunotherapeutic targets, higher PD-L1,
   PD-L2, PD1, and CTLA4 expressions were noticed in the HRDness group
   ([146]Figure 6F). These results demonstrated that HRDness patients may
   benefit from immune checkpoint inhibitors (ICIs).

Genes in 16-GPS playing a central role in regulating the HRD-related pathways

   There were six overlaps with consistent deregulation directions between
   signature genes and 831 unique DEGs. Co-expression or co-methylation
   among these six signature genes and DEGs were used to construct the
   modulation network of HRDness. As shown in [147]Figure 7, the 82 DEGs
   regulated by SRPX were significantly enriched in genetic instability
   and DNA damage repair (DDR)-related biological pathway, namely
   “cGmp-pkg signaling pathway,” and an immune-related “ErbB signaling
   pathway” (p < 0.05, hypergeometric distribution model). Similarly,
   using the hypergeometric distribution model and 121 DEGs linked with
   UBE2W (p < 0.05), some HRD- and/or immune-related pathways were also
   observed, such as “cell cycle” and the “mTOR signaling pathway.”
   Focusing on HRR genes, we found one key HRR gene, viz., NBN, which was
   simultaneously connected to several signature genes, including
   ATP6V1C1, ESRP1, UBE2W, and NUCKS1, and was upregulated in the HRDness
   group, consistent with the results of previous research ([148]Min
   et al., 2020); the result may be due to the significantly higher
   frequency of copy number gain of NBN in HRDness tumors (Fisher’s exact
   test, p < 0.0001). Likewise, another HRR gene, viz., UBR5, was linked
   with ATP6V1C1 and UBE2W. In addition, UBR5 was over-expressed in the
   HRD group, which likely resulted from hypomethylation of this gene. As
   expected, these results suggest that the signature genes driving some
   HRD-related transcriptome alterations matched upstream-specific
   variations.

Figure 7.

   [149]Figure 7
   [150]Open in a new tab

   The regulatory network of signature genes for HRD

   The nodes (circle and hexagon) represent DEGs of co-expression or
   co-methylation with six differentially expressed signature genes
   (rhombus); the solid and dashed lines represent positive and negative
   relationships, respectively. KEGG pathways significantly enriched with
   DEGs connected with SRPX and ATP6V1CA are shown on the left and right
   sides, respectively. The pathways enriched with the HRD-related DEGs
   regulated by each of the remaining four signatures are displayed in
   [151]Table S2.

Discussion

   In this study, we derived a transcriptional qualitative signature,
   termed as 16-GPS, whose within-sample REOs could be used to demonstrate
   the HRD status of prostate cancer patients. To our knowledge, this was
   the first study to establish an HRDness transcriptional signature in
   prostate cancer for expanding patients who may benefit from PARPi
   treatment. As expected, the proportion of HRDness samples in the
   largest dataset, PRCA495, was slightly higher than the reported scale
   of HRR gene mutation (33.54% vs. 25%) ([152]Criscuolo et al., 2019;
   [153]Handy et al., 2018). In agreement with previous reports, the
   HRDness samples showed poor BFS, along with a high Gleason score and
   PSA level. Moreover, when 16-GPS was applied to the cell line data, we
   found that HRDness cells were more sensitive to PARPi ([154]Evans
   et al., 2016; [155]Marshall et al., 2019a). Further, analysis of
   genomic data demonstrated that HRDness prostate cancers classified by
   16-GPS displayed higher frequencies of mutation and CNA than the
   non-HRDness samples. HRD-associated alterations at the epigenomic level
   were consistent with transcriptional differences obtained between the
   HRDness and non-HRDness groups, suggesting that DNA methylation
   alterations of the CpG loci in these DEGs play key roles in promoting
   HRD. Further functional enrichment analysis showed that these
   consistent DEGs were enriched with some HRD- and/or immune-related
   pathways. Subsequently, HRDness samples were found with increased
   immune infiltration and expression of the immune checkpoint genes,
   suggesting that the HRDness tumors may respond to ICIs. Finally, in
   addition to some pathways related to HRD and/or immune system, we found
   that several HRR genes, such as NBN, UBR5, and RMI1, were regulated by
   genes involved in the 16-GPS, which were also regulated by genomic and
   epigenomic alterations.

   Presently, pre-clinical trials showed various PARPi response rates with
   different HRR gene mutations. Moreover, some cases without such
   mutations also responded to PARPi in prostate cancer ([156]Marshall
   et al., 2019b; [157]Mateo et al., 2015). This suggests that these
   genomic testing panels are not broad enough in illustrating the
   molecular characteristics of HRD tumors and/or to catch the mechanisms
   of PARPi vulnerability outside of the HRR gene mutations. There are two
   assumptions: (i) these tumors indeed exhibited the intact HRR gene, but
   shared a HRDness phenotype; (ii) current genomic testing methods have
   strict requirements for tumor materials, leading to false-negative
   results. Therefore, a more accurate biomarker for determining tumor HRD
   status is needed. Taking studies on breast and ovarian cancers as a
   reference, Zsofia et al. established the same HRD signatures in
   prostate cancer ([158]Sztupinszki et al., 2020; [159]Telli et al.,
   2016). However, setting a reliable cutoff for these HRD signatures
   induced by tissue-specific backgrounds is challenging ([160]Jonsson
   et al., 2019). Even signature 3, the basis of HRDetect discovery, lacks
   a discrete threshold to determine the BRCA states of tumors
   ([161]Alexandrov et al., 2013). Besides, a transcriptomics-based study
   created a DDR pathway signature, which mainly focused on predicting the
   prognosis of high-risk prostate cancer, rather than defining the DDR
   status ([162]Evans et al., 2016).

   Sokolova et al. shared a unique insight that it may be more reliable to
   detect the downstream consequences of HR function loss for accurately
   predicting the response to PARPi ([163]Sokolova et al., 2020).
   Therefore, considering the important role of the transcriptome as a
   “bridge” from genome to proteome and current high-throughput assays for
   gene expression detection, we developed a transcriptional qualitative
   signature based on the within-sample REOs rather than traditional
   quantitative biomarkers. Here, we chose HRDetect score rather than the
   HRD score to identify HRD-associated genes, because the former genomic
   signature showed a more accurate characterization of HRD than the
   latter in the prostate cancer whole-exome sequencing data from TCGA
   ([164]Sztupinszki et al., 2020). Strong genomic and proteomic evidence
   was obtained, which indicated that the HRDness tumors classified by
   16-GPS were associated with enhancing genomic instability and
   decreasing HRR protein expression. More importantly, the REO-based
   signature probably meet great application in clinical practice because:
   (i) the pairwise relative transcript abundance measurements of genes
   could be acquired using a low-throughput detection technique, such as
   real-time PCR; (ii) these qualitative signatures are relatively robust
   against experimental and technical variations; and (iii) the
   qualitative signature can be applied at an individual level, without
   pre-collecting a set of samples for setting a reliable cutoff or for
   data standardization ([165]Guan et al., 2018; [166]Bhuva et al., 2020).

   In summary, we developed 16-GPS based on REOs, which could be applied
   to define the HRD status of prostate cancer at an individual level. The
   HRDness samples classified by the 16-GPS showed distinct HRD
   characteristics at the multi-omics level and in clinical factors.
   Therefore, it is worthwhile to further verify the clinical applications
   of 16-GPS, which may reduce unnecessary medical expenses as well as
   cytotoxicity of some patients who represent PARPi-therapy-irrelevant
   low risk of death because of the presence of HRD phenotype.

Limitations of study

   There are a few limitations in this study. We were unable to collect
   prostate cancer samples with PARPi treatment to validate the inference
   that HRDness patients determined by 16-GPS could be benefited from
   PARPi. However, based on the expression of PARP genes, the prostate
   cancer samples were divided into PARP[neg] and PARP[pos] groups, and
   the PARP[neg] prostate cancer group was suggested to have a loss of
   PARP function. As expected, PARP[neg] patients with HRDness showing
   higher BFS were recognized rather than the PARP[pos] patients in the
   PRCA495 and PRCA281 datasets. However, a disagreement result was found
   in PRCA131 dataset, which may be caused by small sample size and low
   expression of PARP genes may be compensated by other genes or pathways
   in the HR deficient background and further promote cancer cell
   development ([167]Ghosh et al., 2016; [168]Ceccaldi et al., 2015).
   Besides, because of the absence of data with prostate cancer or cell
   lines treated by ICIs, we were only able to obtain some indirect
   results for the sensitivity of HRDness to ICIs, including the enlarged
   immune infiltration, high TMB, and expression of immunotherapeutic
   targets in HRDness samples. It is necessary to collect ICIs treatment
   data to explore the relationship between HRDness and ICIs in the
   future.

STAR★ Methods

Key resources table

   REAGENTE or RESOURCE SOURCE IDENTIFIER
   Deposited data
     __________________________________________________________________

   TCGA RNA-seq data (PRCA495) Genomic data commons application
   programming interface (GDC API)
   [169]https://gdc.cancer.gov/developers/gdc-application-programming-inte
   rface-api
   PRCA213 International Cancer Genome Consortium PRAD-CA
   ([170]https://icgc.org/)
   PRCA131 Gene Expression Omnibus (GEO) [171]GSE21032
   ([172]http://www.ncbi.nlm.nih.gov/geo/)
   PRCA281 GEO [173]GSE16560 ([174]http://www.ncbi.nlm.nih.gov/geo/)
   PRCA12-ADT GEO [175]GSE150368 ([176]http://www.ncbi.nlm.nih.gov/geo/)
   PRCA12-FF ArrayExpress E-MTAB-2523
   ([177]https://www.ebi.ac.uk/arrayexpress/)
   Genomics of Drug Sensitivity in Cancer GDSC
   [178]https://www.cancerrxgene.org/
   Cancer Genome Project CGP
   [179]https://www.nature.com/articles/nature11005#Sec18
   TCGA mutation data GDC API
   [180]https://gdc.cancer.gov/developers/gdc-application-programming-inte
   rface-api
   TCGA CNA data GDC API
   [181]https://gdc.cancer.gov/developers/gdc-application-programming-inte
   rface-api
   TCGA DNA methylation data cBioPortal [182]http://www.cbioportal.org/
   TCGA proteomics data Firehose [183]https://gdac.broadinstitute.org/
   TCGA immune-related data TIMER [184]http://timer.cistrome.org/
     __________________________________________________________________

   Software and algorithms
     __________________________________________________________________

   consensus cluster R-Bioconductor
   [185]https://bioconductor.org/packages/release/bioc/html/ConsensusClust
   erPlus.html
   ComplexHeatmap R-Bioconductor
   [186]https://bioconductor.org/packages/release/bioc/html/ComplexHeatmap
   .html
   clusterProfiler R-Bioconductor
   [187]https://bioconductor.org/packages/release/bioc/html/clusterProfile
   r.html
   Cyctoscape Graphpic Software [188]https://cytoscape.org/
   R (4.0.2) The R foundation [189]https://www.r-project.org/
   [190]Open in a new tab

Resource availability

Lead contact

   Further information and requests for resources should be directed to
   and will be fulfilled by the lead contact, Yunyan Gu
   ([191]guyunyan@ems.hrbmu.edu.cn).

Materials availability

   This study did not generate new unique materials.

Method details

Data processing

   A total of 1144 samples from six datasets were used in this study, as
   described in the [192]Key resources table. The dataset name is defined
   by the sample size. For survival analysis, we only used datasets
   containing more than 50 samples to avoid the bias of small sample size.
   For the training dataset of PRCA495, the mRNA-seq profiles of level 3
   Fragments Per Kilobase Million was extracted. For validation datasets
   PRCA213, PRCA131, PRCA281 and PRCA12-ADT, which were measured by four
   different platforms, the gene expression profiles were directly used
   after probe re-annotation without normalization. The probe-set
   identifiers (IDs) were mapped to Entrez gene IDs using the
   corresponding platform information. If multiple probe-sets were mapped
   to the same gene, the expression value of the gene was defined as their
   average value, and probe sets that did not map to any Entrez gene ID or
   mapped to multiple Entrez gene IDs were excluded in our study. For the
   dataset of PRCA12-FF, whose transcriptional data were derived from
   Illumina HiSeq 2000, the gene expression profiles of processed Reads
   Per Kilobase per Million mapped reads were used.

   TCGA mutation profiles were extracted according to the following
   inclusion criteria: (i) non-silent somatic mutation. (ii) At least one
   mutation called by Muse, Mutect2, Somaticsniper, or Varscan2 methods.
   Moreover, CNA, epigenomics and proteomics data were directly used in
   this study without any processed.

   LN_IC50 and AUDIC of five PARPi, viz., veliparib, olaparib, rucaparib,
   niraparib, and talazoparib, were obtained from GDSC database, and IC50
   values of veliparib and olaparib were obtained from CGP database. Data
   on the seven prostate cancer cell lines used in this study was
   referenced from the Catalogue of Somatic Mutations in Cancer
   ([193]https://cancer.sanger.ac.uk/cosmic).

Mutational signature, HRD, and HRDetect score

   The mutational signature information, including signature 3 and
   signature 5, HRD score, and HRDetect score for 495 samples from the
   PRCA495 dataset and 66 samples from the PRCA213 dataset was derived
   from research by [194]Sztupinszki et al. (2020).

Unsupervised clustering

   To determine the molecular subclass that is most biologically related
   to HRD in the RNA-Seq-based dataset (n = 495), we performed consensus
   clustering analysis in the range k = 2-6 (the pre-assigned number of
   clusters) based on the expression profiles of genes whose expressions
   were significantly correlated with the HRDetect score. Spearman's rank
   correlation was used for evaluating the relationships between gene
   expression and HRDetect scores. To avoid overfitting, the clustering
   data was resampled 1,000 times, each cycle consisting of a randomized
   selection of 80% samples and genes. Then, the resampled data was
   converted into a similarity matrix to calculate the probability of two
   samples clustered together, termed as consensus matrix. Although the
   lowest proportion of ambiguous clustering was observed at k = 5
   ([195]Figure S1A), considering the clinical and genomics
   characteristics that HRDness sample should carry and the appropriate
   sample size of HRDness, we chose k = 6, and C1 was considered the
   HRDness-like subclass ([196]Figure S1B). Clinical factors were compared
   between all cluster pairs using a one-sided Wilcoxon rank-sum test.

Development of the REO-based signature

   Transforming the quantitative expression of a single gene into a binary
   relation of the within-sample REOs of gene pairs, a binary matrix was
   obtained by pairwise comparisons of all genes, with E[i] > E[j] or
   E[i] < E[j] on a per sample basis, where E[i] and E[j] represent the
   expression values of gene i and gene j, respectively. We identified
   highly stable gene pairs whose REO pattern was preserved at ≥ 85% of C1
   samples, and the significant gene pairs in non-C1 samples were detected
   using the binomial distribution test with false discovery rate (FDR) <
   0.01. Further, gene pairs whose REO patterns were preserved in
   HRDness-like samples, but reversed in non-HRDness-like samples, were
   termed as candidate gene pairs. The p-values were adjusted using the
   Benjamini-Hochberg (BH) procedure. We classified the samples from the
   training dataset into the HRDness or non-HRDness subtypes for each
   candidate gene pair according to their within-sample REO pattern.
   Subsequently, we evaluated the sensitivity and specificity for each
   gene pair by calculating the ratio of correctly classified HRDness
   samples to all HRDness-like samples, and the ratio of correctly
   classified non-HRDness samples to all non-HRDness-like samples.
   Finally, from these candidate gene pairs, we performed a forward
   selection procedure to search for a signature that achieved the largest
   F-score value for determining the molecular subclasses of samples,
   which is defined as follows:
   [MATH: <mrow><mi>F</mi><mo
   linebreak="badbreak">−</mo><mi>s</mi><mi>c</mi><mi>o</mi><mi>r</mi><mi>
   e</mi><mo
   linebreak="goodbreak">=</mo><mfrac><mrow><mn>2</mn><mrow><mo>(</mo><mi>
   s</mi><mi>e</mi><mi>n</mi><mi>s</mi><mi>i</mi><mi>t</mi><mi>i</mi><mi>v
   </mi><mi>i</mi><mi>t</mi><mi>y</mi><mtext>∗</mtext><mi>s</mi><mi>p</mi>
   <mi>e</mi><mi>c</mi><mi>i</mi><mi>f</mi><mi>i</mi><mi>c</mi><mi>i</mi><
   mi>t</mi><mi>y</mi><mo>)</mo></mrow></mrow><mrow><mo>(</mo><mi>s</mi><m
   i>e</mi><mi>n</mi><mi>s</mi><mi>i</mi><mi>t</mi><mi>i</mi><mi>v</mi><mi
   >i</mi><mi>t</mi><mi>y</mi><mo
   linebreak="badbreak">+</mo><mi>s</mi><mi>p</mi><mi>e</mi><mi>c</mi><mi>
   i</mi><mi>f</mi><mi>i</mi><mi>c</mi><mi>i</mi><mi>t</mi><mi>y</mi><mo>)
   </mo></mrow></mfrac></mrow> :MATH]

   All candidate gene pairs were sorted in descending order according to
   their F-score value. The gene pair with the largest F-score was chosen
   as the seed, and the gene pairs that follow were added to the set one
   at a time, until the F-score value did not increase. Taking 10% as the
   gradient, we adopted 10-90% gene pairs contained in the signatures as
   the cutoff. A sample was classified into the HRDness group if more than
   the pre-defined cutoff of the REOs of gene pairs in the set was voted
   for HRDness; else, the sample was assigned to the non-HRDness group.
   Finally, a set of gene pairs with the largest AUC and F-score was
   selected as the final HRDness classification signature by employing
   AUC-ROC. Particularly, the classification cutoff correspondingly
   reduced with the number of matched gene pairs within the sample.

Defining PARP[neg] and PARP[pos] tumors

   Using the StepMiner algorithm ([197]Sahoo et al., 2007), other than the
   PRCA281 dataset, in which only PARP1 overlapped with the detected
   genes, if any two among PARP1, PARP2, and PARP3 expression levels from
   a sample were smaller than the corresponding StepMiner cutoff, then the
   sample was classified into the PARP[neg] group.

Pathway enrichment analysis

   Functional enrichment analysis of DEGs and differentially expressed
   proteins was performed using the R package ‘clusterProfiler’.
   Subsequently, bubble maps of the KEGG pathway were drawn.

Quantification and statistical analysis

Analysis of the multi-omics data

   Fisher’s exact test was used to identify genes or genomic regions with
   significantly different mutations or CNA frequencies between two groups
   classified by 16-GPS. TMB was calculated based on somatic nonsynonymous
   mutations. The Wilcoxon rank-sum test was used to detect DEGs or DMGs
   between two groups. The BH procedure was used to calculate FDR.
   Further, protein expression was compared between the two groups using a
   one-sided Wilcoxon rank-sum test.

Analysis of immunity features

   The one-sided Wilcoxon rank-sum test was used to compare the immunity
   features mentioned above and the expression of the immune checkpoint
   genes between two groups. Spearman rank correlation was used for
   evaluating the relationships between the expression of two key genes in
   the STING pathway and the three main infiltrating immune cells.

Network analysis

   Spearman's rank correlation, with |r| > 0.6 and FDR < 0.01, was used
   for identifying DEGs that were co-expressed and co-methylated with
   signature genes based on the expression and methylation profiles of
   PRCA495.

Survival analysis

   The Kaplan-Meier curves of BFS were estimated using the Kaplan-Meier
   method and compared with the log-rank test.

Statistical analysis

   All statistical analyses in this study were performed using R software
   versions 4.0.2.

Acknowledgments