Abstract

Objectives

   To analyze the correlations between the expression and effect of DNA
   damage repair genes and the immune status and clinical outcomes of
   urothelial bladder cancer (BLCA) patients. In addition, we evaluate the
   efficacy and value of utilizing the DNA damage repair genes signature
   as a prognosis model for BLCA.

Methods

   Two subtype groups (C1 and C2) were produced based on the varied
   expression of DNA damage repair genes. Significantly differentiated
   genes and predicted enriched gene pathways were obtained between the
   two subtypes. Seven key genes were obtained from the DNA damage
   repair-related genes and a 7-gene signature prognosis model was
   established based on the key genes. The efficacy and accuracy of this
   model in prognosis prediction was evaluated and verified in two
   independent databases. Also, the difference in biological functions,
   drug sensitivity, immune infiltration and affinity between the
   high-risk group and low-risk group was analyzed.

Results

   The DNA damage repair gene signature could significantly differentiate
   the BLCA into two molecular subgroups with varied genetic expression
   and enriched gene pathways. Seven key genes were screened out from the
   232 candidate genes for prognosis prediction and a 7-gene signature
   prognosis model was established based on them. Two independent patient
   cohorts (TCGA cohort and GEO cohort) were utilized to validate the
   efficacy of the prognosis model, which demonstrated an effective
   capability to differentiate and predict the overall survival of BLCA
   patients. Also, the high-risk group and low-risk group derived from the
   7-gene model exhibited significantly differences in drug sensitivity,
   immune infiltration status and biological pathways enrichment.

Conclusions

   Our established 7-gene signature model based on the DNA damage repair
   genes could serve as a novel prognosis predictive tool for BLCA. The
   differentiation of BLCA patients based on the 7-gene signature model
   may be of great value for the appropriate selection of specific
   chemotherapy agents and immune-checkpoint blockade therapy
   administration.

   Keywords: Bladder cancer, Prognosis, DNA Damage repair, Immunotherapy

1. Introduction

   Bladder cancer (BLCA) is one of the most common urological malignancies
   globally, leading in both incidence rate and mortality rate
   [[47]1,[48]2]. Smoking has been identified as a major contributing
   factor for the oncogenesis of BLCA [[49]3]. Clinically, BLCA could be
   divided into two subtypes as non-muscle invasive bladder cancer (NMIBC)
   and muscle-invasive bladder cancer (MIBC), which is based on the degree
   of tumor penetration into the muscle layer. By timely inhibitory
   therapeutics such as transurethral resection of bladder tumor (TURBT)
   and intravesical Bacille Calmette-Guérin (BCG) infusion treatment, A
   nearly 90% 5-year survival rate could be achieved in the NMIBC
   patients. However, what we cannot neglect is that NMIBC still has a
   high tendency of recurrence and progression into MIBC [[50]4,[51]5].
   Once BLCA patients progressed into MIBC, the 5-year rate would
   drastically decline into 50% and the probability of further metastasis
   would sharply increase [[52]6,[53]7]. Radical cystectomy in combination
   with neoadjuvant chemotherapy has long been utilized as the standard
   therapeutic for MIBC, while the high operation difficulty, multiple
   complications as well as the expensive surgical cost largely limited
   the patient benefits [[54]8,[55]9]. Recently, immunotherapy based on
   immune-checkpoint blockade (ICB) has shed a new light of hope on the
   clinical treatment of cancer, whose efficacy has been verified in a
   range of clinical trials [[56][10], [57][11], [58][12]]. Nevertheless,
   a great efficacy difference was observed among individuals, which
   suggested that the genetic and immune characteristics may determine the
   immune sensitivity of the BLCA patients and more detailed patient
   selection and grouping strategies based on gene signature are urgently
   warranted [[59]13]. In addition, innate or adaptive immune resistance
   seriously dampened the effectiveness of ICB, which indicated that a
   deeper understanding of regulatory mechanisms involving antitumor
   immunity is of great importance [[60]13,[61]14].

   A good preservation of the genomic information is an indispensable
   premise for the stability and homeostasis of an independent life.
   Meanwhile, genetic instability, mainly represented by mutagenesis, is
   the driving force for the evolution of life. The balance between these
   two divergent pathways ensured the perpetuation of life, and DNA damage
   repair (DDR) pathway is an integral part of this network
   [[62]15,[63]16]. Owing to high susceptibility to both exogenous and
   endogenous inducing factors, as well as the inevitable incidence of
   mistaken replication or repair of DNA, unfavorable mutations could
   emerge and accumulate in cells. Thanks to the DDR system, the damage to
   DNA could be eliminated timely and thereby avoiding disadvantageous
   mutations [[64]16,[65]17]. However, inappropriate regulation or
   functional defects of DDR system could be hijacked by tumor cells,
   which may help promote the formation of various hallmarks of cancer
   [[66]18,[67]19]. Recently, multiple studies have found that DDR-related
   genes play pivotal roles in the process of a range of cancers
   [[68][20], [69][21], [70][22]]. Moreover, the DDR-related gene
   signature has been identified as a potential tool to predict the ICB
   efficacy [[71]23]. In BLCA, DDR gene mutational status have been found
   to be associated with immune genes expression [[72]24]. In addition,
   disturbance or activation of DDR pathways have been found to be
   promising therapeutic targets for multiple treatments, including
   radiation [[73]25,[74]26] and ICB immunotherapy [[75]27]. Yin et al.
   made a primary evaluation of the prognostic value of DDR-related
   genomic alterations for BLCA patients and found that ATM alteration has
   a significant association with a worse prognosis [[76]28]. However,
   sufficient data illustrating the prognostic value of DDR-related genes
   in BLCA and studies exploring the relationship between DDR and immune
   status of BLCA still lacked.

   Here in this study, we established a prognosis model based on the
   expression of DDR-related genes in BLCA and made a deep analysis into
   the correlation between DDR and immune infiltration and tumor stemness.
   We found that the prognosis of BLCA patients could be effectively
   stratified and predicted. DDR pathway may provide more valuable
   information for the clinical diagnosis and treatment of BLCA.

2. Materials and methods

2.1. Data extraction and processing

   The Cancer Genome Atlas (TCGA) database
   ([77]https://portal.gdc.cancer.gov/) was utilized to collect the raw
   data of the RNA sequence-based counts along with the corresponding
   clinical parameters. A total of 408 BLCA patients and 19 normal tissues
   were finally enrolled in the cohort. For validation of our prognosis
   model, another testing cohort was extracted from the GENE EXPRESSION
   OMNIBUS (GEO) database. A total of 256 BLCA patients were finally
   enrolled from [78]GSE13507 and [79]GSE31684 datasets. The transcriptome
   information and the clinical parameters, including age, grade, TNM
   stage, and overall survival (OS) data were obtained. The data is shared
   voluntarily and could be downloaded and used freely in no requirement
   of ethics committee approval.

2.2. Molecular subgrouping method

   To establish the molecular subtypes of BLCA based on DDR-related genes,
   the RNA-sequencing expression data (level 3) and corresponding clinical
   parameters were extracted. ConsensusClusterPlus R package (v1.54.0) was
   utilized to conduct the consistency analysis, the maximum number of
   clusters is 6, and 80% of the total sample is drawn 100 times,
   clusterAlg = “hc”, innerLinkage = ‘ward.D2’. The R software package
   pheatmap (v1.0.12) was used to draw the clustering heatmaps. The genes
   displayed in the heatmap are kept with SD > 0.1. Top 1/4 genes will be
   filtrated out depending on the SD if the input genes outnumbered 1000.
   Two subtypes (C1 and C2) were formed as shown in the heatmap. The list
   of all the DDR-related genes used in the analysis has been provided in
   the supplemental data of [80]Table S1.

   To further explore the genetic and molecular difference between C1 and
   C2 subtypes, the R limma package was used to obtain the differentially
   expressed mRNA. The threshold was set as “Log2F > 1 or Log2F < −1 with
   the adjusted p value < 0.05” for the differential expression of mRNAs.
   To further predict the biological functions of the DDR-derived
   molecular subtypes, functional pathway enrichment analysis was
   performed on the obtained significantly up- or down-regulated genes. R
   ClusterProfiler package was utilized to analyze the GO function
   prediction and the KEGG pathway enrichment. The boxplot and heatmap
   were drew via R ggplot2 package and pheatmap package respectively. All
   the analysis methods and R package were implemented by R version 4.0.3.

2.3. Establishment and assessment of the prognosis model

   We first utilized the univariate Cox regression analysis to identify
   the crucial prognosis-associated genes (p < 0.01). Ten key genes were
   selected from the significantly-associated genes infiltrated by
   univariate Cox regression analysis. The After screening out the 10 key
   genes. The R glmnet package was used to conduct the feature selection
   through the method of Least absolute shrinkage and selection operator
   (LASSO) regression algorithm. Survival outcomes were compared between
   groups by Log-rank testing KM survival analysis. The predictive
   efficacy and the risk score of DDR-related 7 key genes were evaluated
   by TimeROC analysis. Univariate Cox proportional hazards regression and
   log-rank test were utilized to calculate the p-values and hazard ratio
   (HR) with 95% confidence interval (CI) and generate the Kaplan–Meier
   curves. P < 0.05 was viewed as statistically significant for certain
   difference. The R forest package was used to draw the forest map, which
   showed the HR, 95% CI and the p value, by the univariate and
   multivariable Cox regression analysis. Based on the multivariate COX
   proportional hazards analysis results, we utilized the R RMS package to
   generate the nomogram to make a prediction of the 1, 3, 5-year
   recurrence rate.

2.4. Immune infiltration and GSEA gene pathways enrichment analysis

   To explore the relationship between immune infiltration status and
   different DDR-divided subtypes in BLCA, we used the R package of
   immundeconv, which consists of six algorithms, including TIMER, xCell,
   MCP-counter, CIBERSORT, EPIC and quanTIseq. Each algorithm owns its
   specific advantages and unique characteristics. The immune function
   difference was compared between high-risk and low-risk groups through
   GSVA R package. Crucial small-molecular bioactive components between
   high-risk and low-risk groups of BLCA patients were compared through
   limma algorithm. To compare the expression distribution of immune
   checkpoints gene in high-risk and low-risk groups. RNA-sequencing
   expression (level 3) profiles and corresponding clinical information
   for BLCA were downloaded from the TCGA dataset
   ([81]https://portal.gdc.com). A range of key immune-checkpoint proteins
   were selected and the expression levels of these genes were obtained.
   The ggplot2 R package and pheatmap R package were used for the
   analysis. Potential ICB response and immune dysfunction and exclusion
   status were predicted with TIDE algorithm. TCIA database was used to
   compare the T cell exhaustion status between high-risk and low-risk
   group through wilcox test.

   Gene Set Enrichment Analysis (GSEA)
   ([82]http://www.broad.mit.edu/gsea/) was utilized to determine the
   concordant and significant differences in the defined genes set and
   pathways between 2 biological conditions.

2.5. Drug sensitivity prediction analysis

   The RNA-sequencing expression information and related clinical
   information were downloaded from the TCGA dataset. The chemotherapeutic
   response for each group was predicted based on the available
   pharmacogenomics database of the Genomics of Drug Sensitivity in Cancer
   (GDSC) ([83]https://www.cancerrxgene.org). The R pRRophetic package was
   used to perform the prediction. The samples' half-maximal inhibitory
   concentration (IC50) was measured using ridge regression. All
   parameters were set as the default values. The batch effect of combat
   and tissue type were used for all tissues, and the duplicate gene
   expression was set as mean value.

3. Results

3.1. DDR-based molecular subtype grouping

   Distinct subgroups of 408 BLCA patients were generated according to the
   expression levels of DDR-related genes. Sufficient selection could be
   achieved when k was set as 2 ([84]Fig. 1A–B), and two molecular
   subgroups were finally divided from the whole cohort. Two subgroups as
   C1 and C2 were successfully formed according to proportion of ambiguous
   clustering (PAC) ([85]Fig. 1C–D). Comparative analysis on the core
   clinical parameters, including T and N stage, Tumor grade, gender, and
   age, was conducted between C1 and C2 ([86]Fig. 1E). Significant
   difference was only identified in the tumor grade between C1 and C2.
   More low-grade patients were found in the C2 group. When it comes to
   tumor stage, gender, and age, no significant difference was found
   between the two groups. Differentially expressed genes (DEGs) were
   compared between the two subgroups. In addition, GO annotation and KEGG
   pathway enrichment analysis were performed to explore the downstream
   regulatory genetic functions ([87]Fig. 2A–C). Also, a comparative
   analysis between C1 and C2 with the uninvolved normal tissues may also
   provide with more valuable and accurate information. Therefore, we also
   conducted a DEGs analysis and the gene enrichment analysis between C1
   ([88]Fig. 3A, B, E, G), C2 ([89]Fig. 3C, D, F, H) and the uninvolved 19
   uninvolved normal tissues. Fifty-nine and 32 DDR-related genes were
   found to show contrasting expression between C1 and C2 group compared
   to uninvolved tissue, the details are shown in [90]Table S2.

Fig. 1.

   [91]Fig. 1
   [92]Open in a new tab

   Molecular subtypes of BLCA based DDR-related genes. (A, B) Delta area
   curve of consensus clustering, representing the alterations in area
   under the cumulative distribution function (CDF) curve for each
   category number k compared with k − 1. (C) The heatmap showing
   differentiation of DDR-related gene expression in C1 and C2, the color
   of red and blue represents high and low expression respectively. (D)
   The heatmap illustrating consensus clustering solution (k = 2) for
   DDR-related genes. (E) Comparative analysis on the core clinical
   parameters. (For interpretation of the references to color in this