Abstract Objectives To analyze the correlations between the expression and effect of DNA damage repair genes and the immune status and clinical outcomes of urothelial bladder cancer (BLCA) patients. In addition, we evaluate the efficacy and value of utilizing the DNA damage repair genes signature as a prognosis model for BLCA. Methods Two subtype groups (C1 and C2) were produced based on the varied expression of DNA damage repair genes. Significantly differentiated genes and predicted enriched gene pathways were obtained between the two subtypes. Seven key genes were obtained from the DNA damage repair-related genes and a 7-gene signature prognosis model was established based on the key genes. The efficacy and accuracy of this model in prognosis prediction was evaluated and verified in two independent databases. Also, the difference in biological functions, drug sensitivity, immune infiltration and affinity between the high-risk group and low-risk group was analyzed. Results The DNA damage repair gene signature could significantly differentiate the BLCA into two molecular subgroups with varied genetic expression and enriched gene pathways. Seven key genes were screened out from the 232 candidate genes for prognosis prediction and a 7-gene signature prognosis model was established based on them. Two independent patient cohorts (TCGA cohort and GEO cohort) were utilized to validate the efficacy of the prognosis model, which demonstrated an effective capability to differentiate and predict the overall survival of BLCA patients. Also, the high-risk group and low-risk group derived from the 7-gene model exhibited significantly differences in drug sensitivity, immune infiltration status and biological pathways enrichment. Conclusions Our established 7-gene signature model based on the DNA damage repair genes could serve as a novel prognosis predictive tool for BLCA. The differentiation of BLCA patients based on the 7-gene signature model may be of great value for the appropriate selection of specific chemotherapy agents and immune-checkpoint blockade therapy administration. Keywords: Bladder cancer, Prognosis, DNA Damage repair, Immunotherapy 1. Introduction Bladder cancer (BLCA) is one of the most common urological malignancies globally, leading in both incidence rate and mortality rate [[47]1,[48]2]. Smoking has been identified as a major contributing factor for the oncogenesis of BLCA [[49]3]. Clinically, BLCA could be divided into two subtypes as non-muscle invasive bladder cancer (NMIBC) and muscle-invasive bladder cancer (MIBC), which is based on the degree of tumor penetration into the muscle layer. By timely inhibitory therapeutics such as transurethral resection of bladder tumor (TURBT) and intravesical Bacille Calmette-Guérin (BCG) infusion treatment, A nearly 90% 5-year survival rate could be achieved in the NMIBC patients. However, what we cannot neglect is that NMIBC still has a high tendency of recurrence and progression into MIBC [[50]4,[51]5]. Once BLCA patients progressed into MIBC, the 5-year rate would drastically decline into 50% and the probability of further metastasis would sharply increase [[52]6,[53]7]. Radical cystectomy in combination with neoadjuvant chemotherapy has long been utilized as the standard therapeutic for MIBC, while the high operation difficulty, multiple complications as well as the expensive surgical cost largely limited the patient benefits [[54]8,[55]9]. Recently, immunotherapy based on immune-checkpoint blockade (ICB) has shed a new light of hope on the clinical treatment of cancer, whose efficacy has been verified in a range of clinical trials [[56][10], [57][11], [58][12]]. Nevertheless, a great efficacy difference was observed among individuals, which suggested that the genetic and immune characteristics may determine the immune sensitivity of the BLCA patients and more detailed patient selection and grouping strategies based on gene signature are urgently warranted [[59]13]. In addition, innate or adaptive immune resistance seriously dampened the effectiveness of ICB, which indicated that a deeper understanding of regulatory mechanisms involving antitumor immunity is of great importance [[60]13,[61]14]. A good preservation of the genomic information is an indispensable premise for the stability and homeostasis of an independent life. Meanwhile, genetic instability, mainly represented by mutagenesis, is the driving force for the evolution of life. The balance between these two divergent pathways ensured the perpetuation of life, and DNA damage repair (DDR) pathway is an integral part of this network [[62]15,[63]16]. Owing to high susceptibility to both exogenous and endogenous inducing factors, as well as the inevitable incidence of mistaken replication or repair of DNA, unfavorable mutations could emerge and accumulate in cells. Thanks to the DDR system, the damage to DNA could be eliminated timely and thereby avoiding disadvantageous mutations [[64]16,[65]17]. However, inappropriate regulation or functional defects of DDR system could be hijacked by tumor cells, which may help promote the formation of various hallmarks of cancer [[66]18,[67]19]. Recently, multiple studies have found that DDR-related genes play pivotal roles in the process of a range of cancers [[68][20], [69][21], [70][22]]. Moreover, the DDR-related gene signature has been identified as a potential tool to predict the ICB efficacy [[71]23]. In BLCA, DDR gene mutational status have been found to be associated with immune genes expression [[72]24]. In addition, disturbance or activation of DDR pathways have been found to be promising therapeutic targets for multiple treatments, including radiation [[73]25,[74]26] and ICB immunotherapy [[75]27]. Yin et al. made a primary evaluation of the prognostic value of DDR-related genomic alterations for BLCA patients and found that ATM alteration has a significant association with a worse prognosis [[76]28]. However, sufficient data illustrating the prognostic value of DDR-related genes in BLCA and studies exploring the relationship between DDR and immune status of BLCA still lacked. Here in this study, we established a prognosis model based on the expression of DDR-related genes in BLCA and made a deep analysis into the correlation between DDR and immune infiltration and tumor stemness. We found that the prognosis of BLCA patients could be effectively stratified and predicted. DDR pathway may provide more valuable information for the clinical diagnosis and treatment of BLCA. 2. Materials and methods 2.1. Data extraction and processing The Cancer Genome Atlas (TCGA) database ([77]https://portal.gdc.cancer.gov/) was utilized to collect the raw data of the RNA sequence-based counts along with the corresponding clinical parameters. A total of 408 BLCA patients and 19 normal tissues were finally enrolled in the cohort. For validation of our prognosis model, another testing cohort was extracted from the GENE EXPRESSION OMNIBUS (GEO) database. A total of 256 BLCA patients were finally enrolled from [78]GSE13507 and [79]GSE31684 datasets. The transcriptome information and the clinical parameters, including age, grade, TNM stage, and overall survival (OS) data were obtained. The data is shared voluntarily and could be downloaded and used freely in no requirement of ethics committee approval. 2.2. Molecular subgrouping method To establish the molecular subtypes of BLCA based on DDR-related genes, the RNA-sequencing expression data (level 3) and corresponding clinical parameters were extracted. ConsensusClusterPlus R package (v1.54.0) was utilized to conduct the consistency analysis, the maximum number of clusters is 6, and 80% of the total sample is drawn 100 times, clusterAlg = “hc”, innerLinkage = ‘ward.D2’. The R software package pheatmap (v1.0.12) was used to draw the clustering heatmaps. The genes displayed in the heatmap are kept with SD > 0.1. Top 1/4 genes will be filtrated out depending on the SD if the input genes outnumbered 1000. Two subtypes (C1 and C2) were formed as shown in the heatmap. The list of all the DDR-related genes used in the analysis has been provided in the supplemental data of [80]Table S1. To further explore the genetic and molecular difference between C1 and C2 subtypes, the R limma package was used to obtain the differentially expressed mRNA. The threshold was set as “Log2F > 1 or Log2F < −1 with the adjusted p value < 0.05” for the differential expression of mRNAs. To further predict the biological functions of the DDR-derived molecular subtypes, functional pathway enrichment analysis was performed on the obtained significantly up- or down-regulated genes. R ClusterProfiler package was utilized to analyze the GO function prediction and the KEGG pathway enrichment. The boxplot and heatmap were drew via R ggplot2 package and pheatmap package respectively. All the analysis methods and R package were implemented by R version 4.0.3. 2.3. Establishment and assessment of the prognosis model We first utilized the univariate Cox regression analysis to identify the crucial prognosis-associated genes (p < 0.01). Ten key genes were selected from the significantly-associated genes infiltrated by univariate Cox regression analysis. The After screening out the 10 key genes. The R glmnet package was used to conduct the feature selection through the method of Least absolute shrinkage and selection operator (LASSO) regression algorithm. Survival outcomes were compared between groups by Log-rank testing KM survival analysis. The predictive efficacy and the risk score of DDR-related 7 key genes were evaluated by TimeROC analysis. Univariate Cox proportional hazards regression and log-rank test were utilized to calculate the p-values and hazard ratio (HR) with 95% confidence interval (CI) and generate the Kaplan–Meier curves. P < 0.05 was viewed as statistically significant for certain difference. The R forest package was used to draw the forest map, which showed the HR, 95% CI and the p value, by the univariate and multivariable Cox regression analysis. Based on the multivariate COX proportional hazards analysis results, we utilized the R RMS package to generate the nomogram to make a prediction of the 1, 3, 5-year recurrence rate. 2.4. Immune infiltration and GSEA gene pathways enrichment analysis To explore the relationship between immune infiltration status and different DDR-divided subtypes in BLCA, we used the R package of immundeconv, which consists of six algorithms, including TIMER, xCell, MCP-counter, CIBERSORT, EPIC and quanTIseq. Each algorithm owns its specific advantages and unique characteristics. The immune function difference was compared between high-risk and low-risk groups through GSVA R package. Crucial small-molecular bioactive components between high-risk and low-risk groups of BLCA patients were compared through limma algorithm. To compare the expression distribution of immune checkpoints gene in high-risk and low-risk groups. RNA-sequencing expression (level 3) profiles and corresponding clinical information for BLCA were downloaded from the TCGA dataset ([81]https://portal.gdc.com). A range of key immune-checkpoint proteins were selected and the expression levels of these genes were obtained. The ggplot2 R package and pheatmap R package were used for the analysis. Potential ICB response and immune dysfunction and exclusion status were predicted with TIDE algorithm. TCIA database was used to compare the T cell exhaustion status between high-risk and low-risk group through wilcox test. Gene Set Enrichment Analysis (GSEA) ([82]http://www.broad.mit.edu/gsea/) was utilized to determine the concordant and significant differences in the defined genes set and pathways between 2 biological conditions. 2.5. Drug sensitivity prediction analysis The RNA-sequencing expression information and related clinical information were downloaded from the TCGA dataset. The chemotherapeutic response for each group was predicted based on the available pharmacogenomics database of the Genomics of Drug Sensitivity in Cancer (GDSC) ([83]https://www.cancerrxgene.org). The R pRRophetic package was used to perform the prediction. The samples' half-maximal inhibitory concentration (IC50) was measured using ridge regression. All parameters were set as the default values. The batch effect of combat and tissue type were used for all tissues, and the duplicate gene expression was set as mean value. 3. Results 3.1. DDR-based molecular subtype grouping Distinct subgroups of 408 BLCA patients were generated according to the expression levels of DDR-related genes. Sufficient selection could be achieved when k was set as 2 ([84]Fig. 1A–B), and two molecular subgroups were finally divided from the whole cohort. Two subgroups as C1 and C2 were successfully formed according to proportion of ambiguous clustering (PAC) ([85]Fig. 1C–D). Comparative analysis on the core clinical parameters, including T and N stage, Tumor grade, gender, and age, was conducted between C1 and C2 ([86]Fig. 1E). Significant difference was only identified in the tumor grade between C1 and C2. More low-grade patients were found in the C2 group. When it comes to tumor stage, gender, and age, no significant difference was found between the two groups. Differentially expressed genes (DEGs) were compared between the two subgroups. In addition, GO annotation and KEGG pathway enrichment analysis were performed to explore the downstream regulatory genetic functions ([87]Fig. 2A–C). Also, a comparative analysis between C1 and C2 with the uninvolved normal tissues may also provide with more valuable and accurate information. Therefore, we also conducted a DEGs analysis and the gene enrichment analysis between C1 ([88]Fig. 3A, B, E, G), C2 ([89]Fig. 3C, D, F, H) and the uninvolved 19 uninvolved normal tissues. Fifty-nine and 32 DDR-related genes were found to show contrasting expression between C1 and C2 group compared to uninvolved tissue, the details are shown in [90]Table S2. Fig. 1. [91]Fig. 1 [92]Open in a new tab Molecular subtypes of BLCA based DDR-related genes. (A, B) Delta area curve of consensus clustering, representing the alterations in area under the cumulative distribution function (CDF) curve for each category number k compared with k − 1. (C) The heatmap showing differentiation of DDR-related gene expression in C1 and C2, the color of red and blue represents high and low expression respectively. (D) The heatmap illustrating consensus clustering solution (k = 2) for DDR-related genes. (E) Comparative analysis on the core clinical parameters. (For interpretation of the references to color in this