Abstract The potential role of homologous recombination deficiency (HRD) in the diagnosis and treatment of colon adenocarcinoma (COAD) remains incompletely explored. Differential gene expression analysis was conducted using Limma to identify genes with altered expression levels. Key genes associated with HRD were identified through the integration of WGCNA and machine learning techniques. For the unsupervised grouping of samples, ConsensusClusterPlus was applied. To quantify gene expression and protein abundance in clinical tissues and cell lines, RT-qPCR and Western Blotting (WB) assays were performed, respectively. The “pRRophetic” package was employed to predict drug sensitivity profiles. Molecular docking simulations and optimal pose presentations were conducted by using CB-Dock2. Our comprehensive analysis of multiple COAD data sets, leveraging WGCNA and machine learning, unveiled five novel, previously unreported biomarkers of HRD: TNFRSF11A, SERPINA1, SPINK4, REG4, and CYP2B6. We devised an innovative HRD-linked molecular classification system and a predictive nomogram that accurately forecasts patient outcomes. Experimental validation substantiated the upregulation of CYP2B6 in COAD, enhancing proliferation and migration capabilities, and demonstrated a robust positive association with established HRD indicators RAD51 and γH2AX. Notably, CYP2B6 emerged as a promising predictor of PARP inhibitor (PARPi) sensitivity, offering potential therapeutic implications. In conclusion, our study, harnessing machine learning and experimental validation, has uncovered novel biomarkers of HRD and PARPi sensitivity, shedding light on potential avenues for tailored clinical treatment strategies in COAD, thereby advancing personalized medicine. __________________________________________________________________ graphic file with name ao4c10672_0014.jpg __________________________________________________________________ graphic file with name ao4c10672_0012.jpg Introduction Colon adenocarcinoma (COAD) is the third most common malignant tumor globally, with approximately 1.148 million new cases reported worldwide in 2020. It is projected that the number of new cases will reach around 2.2 million by 2030. The mortality rate remains high, with about 576,000 new deaths globally in 2020, making it the second leading cause of cancer-related deaths worldwide. Due to the lack of clear early symptoms, the majority of COAD patients are diagnosed at an advanced stage. Even after undergoing treatments such as surgery, metastasis and recurrence are common, resulting in a relatively low 5-year survival rate. − In the field of oncology, particularly in the realm of COAD research, homologous recombination deficiency (HRD) has emerged as a key biological characteristic and a potential therapeutic target, attracting increasing scientific attention. − HRD denotes a state where cells, confronted with severe DNA damage such as double-strand breaks (DSBs), are unable to effectively repair these lesions due to deficiencies in the homologous recombination (HR) repair pathway. − Homologous recombination repair (HRR), a primary mechanism for DSB repair, relies on high-fidelity restoration facilitated by proteins encoded by genes including BRCA1/2. When HRR functionality is compromised, DSBs increasingly rely on alternative repair pathways like nonhomologous end-joining (NHEJ) and microhomology-mediated end-joining (MMEJ), which may lead to insertions/deletions in nucleic acid sequences, copy number abnormalities, and ultimately, chromosomal cross-linking and genomic instability. − The HRD score is a value calculated by using a proprietary laboratory-specific algorithm based on one or more HRD biomarkers, which is used to quantify the degree of functional defects in the homologous recombination repair pathway of cells. Knijnenburg et al. previously calculated the HRD score for all tumor samples in the TCGA cohort by integrating loss of heterozygosity (LOH), telomeric allelic imbalance (TAI), and large-scale state transition (LST). COAD, a prevalent malignancy in the digestive system, is frequently accompanied by myriad genetic and molecular aberrations during its initiation and progression. , Research underscores the ubiquitous presence of HRD across multiple cancer types, including COAD. The existence of HRD status contributes to tumorigenesis by enhancing genomic instability, thereby fostering tumor cell proliferation and metastasis. In the context of COAD, the occurrence of HRD has been linked to mutations in genes such as BRCA1/2, promoter methylation, or abnormalities in other HRR pathway genes. Cancer cells harboring HRD status exhibit heightened sensitivity to specific classes of anticancer agents, notably PARP inhibitors (PARPis). − By inhibiting PARP-mediated DNA single-strand break repair, PARPis synergize with HRD to elicit a “synthetic lethality” effect, ultimately leading to tumor cell demise. , Consequently, the detection of HRD status holds paramount significance in identifying patients who may benefit from PARPi therapy. The potential for individualized therapeutic strategies based on HRD status represents an emerging trend in the treatment of COAD. By assessing patients’ HRD status, clinicians can devise more precise and efficacious treatment plans, thereby enhancing therapeutic outcomes and mitigating adverse effects. In summary, as an important biological characteristic and therapeutic target of COAD, exploring and developing new biomarkers for HRD provide new perspectives and ideas for the diagnosis, treatment, and prognosis assessment of tumors. Materials and Methods Data Access Download the transcriptome expression profiles, clinical information, and somatic mutation data of all samples from TCGA-COAD in the TCGA database ([31]https://portal.gdc.cancer.gov/). Additionally, We retrieved the expression profiles and prognostic information on the validation set for colon cancer from the Gene Expression Omnibus (GEO) database, including [32]GSE40967 and [33]GSE17536. Subsequently, We searched and downloaded data sets from the GEO database that contain both transcriptome sequencing data and BRCA1/2 mutation data, encompassing [34]GSE49481, [35]GSE54219, and [36]GSE17764. The HRD scores for all samples in the TCGA-COAD cohort were obtained from the study by Knijnenburg et al., where HRD score is defined as the sum of LOH, LST, and TAI. Weighted Correlation Network Analysis (WGCNA) and Gene Differential Analysis We retrieved differential expression genes (DEGs) between tumor and normal tissues in TCGA-COAD from the GEPIA database and utilized these DEGs in WGCNA. Upon filtering out the top half of genes exhibiting the lowest Median Absolute Deviation (MAD) and applying the “goodSamplesGenes” methodology to eliminate outliers, we proceeded to integrate the remaining samples and genes into a WGCNA. The configuration parameters were strategically established as a minimum module size set at 30, sensitivity adjusted to 3, a module merging threshold of 0.25, and β fixed at 6. Subsequently, to distinguish between clusters of coexpressed genes, we assigned distinct colors to the modules, with genes that could not be assigned to any specific module consolidated into a designated “gray” module. Our investigation then shifted to exploring the correlations between module eigengenes and clinical attributes, with the objective of pinpointing those modules that demonstrated a statistically significant link to clinical features. The methodology described above has been widely reported in previous literature. − Using the “LIMMA” package, we conducted gene differential analysis, applying a threshold of an absolute fold change >1.5 and a false discovery rate (FDR) < 0.05 to identify significant differences. Gene Set Enrichment Analysis and Immune Infiltration Evaluation Gene enrichment analysis leveraged Metascape, incorporating KEGG, GO, Reactome, Canonical, CORUM, and WikiPathways. xCell algorithm analyzed the tumor immune microenvironment via transcriptomics, quantifying 64 immune cell types. R package “xCell” facilitated xCell score analysis. Machine Learning Framework Ten diverse machine learning approaches and a total of 101 algorithmic permutations were consolidated to pinpoint crucial genes linked to OS. − These algorithms encompassed RSF (Random Survival Forest), Enet (Elastic Network), Lasso, Ridge, Stepwise Cox, CoxBoost, plsRcox (Partial Least Squares for Cox), SuperPC (Supervised Principal Components), GBM (Generalized Boosted Regression Modeling), and survival-SVM (Survival Support Vector Machine). The methodology proceeded as follows: (1) An input data set was formed by intersecting Limma-identified differentially expressed genes (DEGs) with key modules from WGCNA (Weighted Gene Coexpression Network Analysis). (2) Utilizing this input, 101 combinations of the algorithms were applied to build prediction models within the TCGA-COAD cohort, adopting a leave-one-out cross-validation (LOOCV) strategy. (3) Each model’s performance was tested across distinct AJCC stages (I/II and III/IV) for validation. (4) The Harrell’s C-index, a metric for model concordance, was computed for each model across the entire TCGA-COAD cohort and within the specified AJCC stages. (5) The model exhibiting the highest average C-index was designated as the optimal model for identifying the key genes. Unsupervised Clustering The cluster analysis process was carried out leveraging the ConsensusClusterPlus package, adopting an agglomerative pam clustering approach and utilizing 1-Pearson correlation as the distance metric. This analysis incorporated a resampling strategy where 80% of the samples were selected, and the process was repeated 10 times. The determination of the optimal number of clusters was facilitated through the assessment of the empirical cumulative distribution function plot. Drug Sensitivity Analysis and Molecular Docking To ascertain the half-maximal inhibitory concentration (IC[50]) values, we conducted regression analysis within the R programming environment, leveraging the pRRophetic package. This package, grounded in the Genomics of Drug Sensitivity in Cancer (GDSC) database, was instrumental in our analysis. , Molecular docking simulations were executed using the CB-Dock2 online platform ([37]https://cadd.labshare.cn/cb-dock2), an AutoDock Vina-based tool for molecular docking. All parameters were maintained at their default settings to ensure consistency. The chemical structures of the aforementioned drugs were retrieved from the PubChem database ([38]https://pubchem.ncbi.nlm.nih.gov/), while the target protein structure was sourced from the Protein Data Bank (RCSB PDB; [39]https://www.rcsb.org/). , Reverse Transcription-Quantitative PCR (RT-qPCR) RNA isolation was carried out using TRIzol reagent procured from Ambion, USA. Subsequently, the reverse transcription of mRNA into cDNA was accomplished with the aid of PrimeScriptTM RT Master Mix sourced from Takara, Japan. Gene transcript abundances were quantified through RT-qPCR using ChamQ SYBR qPCR Master Mix from Vazyme, China. Relative expression levels were evaluated by employing the 2^–ΔΔCT approach, utilizing GAPDH as the housekeeping gene for normalization: For accurate measurement of CYP2B6 and GAPDH expression, distinct primer pairs were employed: CYP2B6-specific primers comprised a forward sequence (5′-GCAATACTCGCCTTACGGCT-3′) and a reverse sequence (5′-TACACACCTTGGTAGTACGCC-3′), while GAPDH-specific primers consisted of a forward primer (5′-AACAGCCTCAAGATCATCAGC-3′) and a reverse primer (5′-GGATGATGTTCTGGAGAGCC-3′). To ensure reproducibility, the experiment was repeated in triplicate and the average values were derived. The RT-qPCR technique was harnessed to assess the gene expression levels. Western Blotting Analysis After digestion with RIPA, total protein extracts were obtained. We utilized a BCA assay kit from Beyotime to quantify the cellular protein concentrations. For electrophoretic separation, a 10% SDS-PAGE gel was prepared, with 50 μg of protein loaded per lane. The resolved proteins were then transferred onto PVDF membranes, which were blocked with 5% milk solution to reduce nonspecific interactions. For primary antibody recognition, we selected rabbit polyclonal antibodies against CYP2B6 (Abcam, catalog no. ab140609; 1:1000) and GAPDH (Abcam, catalog no. ab8245; 1/10,000). Following extensive washing with TBS-T buffer, the membranes were incubated with secondary antibodies labeled with horseradish peroxidase (HRP), obtained from Abcam (catalog number ab6721), for 1 h at 4 °C. Finally, to visualize the protein bands, we employed an enhanced chemiluminescence (ECL) reagent system that enhances the sensitivity of the chromogenic reaction. Migration Ability Assay The HCT116 cells were seeded into a 24-well plate at a density of 1 × 10^5 cells per well. These plates were then incubated at a temperature of 37 °C in an atmosphere containing 5% CO[2] for a period of 24 h. Subsequently, the culture medium was discarded, and the monolayer of cells was delicately scratched using a 10 μL pipet tip, followed by a clear demarcation of the scratch. To remove any residual debris, we gently rinsed the cells twice with phosphate-buffered saline (PBS). Afterward, 1 mL of Dulbecco’s modified Eagle’s medium (DMEM) was introduced to maintain the cellular environment. Images of the scratches were captured at various time points: immediately (0 h), after 24 h, and again after 48 h. Implant HCT116 cells into the wells of a Matrigel-coated plate prefilled with serum-free DMEM, at a concentration of 5 × 10^4 cells per well. Position a lower well beneath, containing 500 μL of a full-strength medium comprising DMEM supplemented with 10% fetal bovine serum. Following a 24 h incubation period at 37 °C, we carefully eliminated any nonmigratory cells from the upper well using a cotton-tipped applicator. Subsequently, the cells residing in the lower chamber were fixed with 5% glutaraldehyde for a duration of 10 min. Stain these fixed cells with a 1% crystal violet solution diluted in 2% ethanol, allowing the staining process to occur at room temperature for 20 min. After staining, images of the cells were captured and proceeded with their enumeration. All of the above experiments were repeated three times. Colony Formation Assay Cells were enzymatically dissociated using 0.25% trypsin, followed by seeding at a density of 500 cells per well in DMEM supplemented with 10% FBS. Following a 5-day culture period, when macroscopic colonies were observable, the colonies were fixed with paraformaldehyde for 20 min and subsequently stained with 0.1% crystal violet. CCK-8 To assess cell viability, the CCK-8 assay kit (Solarbio, cat. no. CA1210) was employed. Cells were seeded in 96-well plates at a density of 2000 cells per well and incubated for varying durations (0, 24, 48, and 72 h). Post-incubation, 10 μL of CCK-8 solution was added to each well, followed by an additional incubation period of 2 h at 37 °C. Subsequently, the absorbance at 450 nm was measured to quantify cell viability. Statistical Analyses Version 4.2.1 of R software was utilized for executing all statistical analyses. The Wilcoxon/Kruskal–Wallis test was applied for continuous variable comparisons, while the chi-square test was employed for assessing differences in proportions. Statistical significance was set at a p-value threshold of <0.05. Kaplan–Meier (KM) curves were generated to analyze overall survival (OS) and disease-free survival (DFS) among distinct subgroups, with validation through the log-rank test. Additionally, receiver operating characteristic (ROC) curve analysis was conducted to appraise the predictive accuracy of the prognostic model. Correlation analysis was carried out using the Pearson correlation coefficient method. Results Prognostic Value of HRD in COAD We employed the R package ‘maxstat’ (version 0.7–25) to determine the optimal cutoff value for the HRD score, thereby stratifying patients in the TCGA-COAD cohort into high and low HRD score groups. Patients with a high HRD score demonstrated inferior OS (cutoff = 15, HR = 1.65, p = 0.02; [40]Figure A) and DFS (cutoff = 13, HR = 1.63, p = 0.02; [41]Figure B). Multivariate Cox regression analysis revealed that the HRD score was an independent prognostic factor for OS in patients with COAD (HR = 1.02, p = 0.014; [42]Figure C), indicating a 2% increase in the risk of poorer OS for every one-unit increment in the HRD score. 1. [43]1 [44]Open in a new tab (A) OS analysis in the TCGA-COAD cohort, stratified by high and low HRD score groups with a cutoff of 15. (B) DFS analysis in the TCGA-COAD cohort, stratified by high and low HRD score groups with a cutoff of 13. (C) Multivariate Cox analysis of the prognostic value of age, gender, tumor stage (TMN), and HRD score for OS. Results of WGCNA Utilizing the criteria of an absolute FC > 1 and an FDR < 0.05, a total of 2860 downregulated genes and 2692 upregulated genes were retrieved from the GEPIA database, and these DEGs were subsequently included in the WGCNA. Based on the R ^2 and mean connectivity values, the soft-thresholding power was set to 6 ([45]Figure S1A,B). After excluding outlier samples ([46]Figure S1C), all included genes were categorized into 16 distinct coexpression modules ([47]Table S1), with the brown module (n = 192) exhibiting the strongest positive correlation with the HRD score and the green-yellow module (n = 70) demonstrating the strongest negative correlation with the HRD score ([48]Figure A). A highly significant correlation was observed between Gene significance (GS) and module membership (MM), specifically within the brown-yellow and green-yellow modules ([49]Figure B,C). Within the brown-yellow and green-yellow modules, a total of 51 genes (18 downregulated, 33 upregulated; [50]Figure D) were differentially expressed between the upper and lower quartiles of the HRD score groups ([51]Figure E and [52]Table S2). Metascape enrichment analysis revealed that these 51 genes were enriched in NABA MATRISOME ASSOCIATED, protein autoprocessing, photoreceptor cell differentiation, retinol metabolism, epithelial cell differentiation, and regulation of the inflammatory response ([53]Figure F). 2. [54]2 [55]Open in a new tab (A) Correlation between different gene coexpression modules and HRD in the WGCNA analysis. Correlation between MM and GS for the brown (B) and green-yellow (C) modules. (D) Volcano plot showing DEGs in brown and green-yellow modules between high and low HRD score groups. (E) Heatmap showing differential genes between high and low HRD groups. (F) Pathway enrichment analysis of DEGs in brown and green-yellow modules between high and low HRD score groups. Machine Learning-Based Screening of HRD-Hub Genes Utilizing 101 machine learning approaches, we screened for the Hub genes among the 51 DEGs that were most correlated with OS. Our analysis revealed that the RSF model exhibited the optimal C-index across various stages of COAD ([56]Figure ). 3. [57]3 [58]Open in a new tab Heat maps and average C-index bars illustrating the performance of 101 machine learning prognostic models across different stages of COAD cohorts. RSF identified as the optimal model, highlighting the top five genes with the highest importance. The top five genes with the highest Variable Relative Importance in the RSF model are identified as Hub genes, specifically TNFRSF11A, SERPINA1, SPINK4, REG4, and CYP2B6. CYP2B6 was significantly overexpressed in patients with high HRD scores, whereas TNFRSF11A, SERPINA1, SPINK4, and REG4 demonstrated significant overexpression in patients with low HRD scores ([59]Figure A). ROC analysis indicated that the five Hub genes exhibited a good diagnostic performance for high HRD scores ([60]Figure B). In the independent validation sets, [61]GSE49481 and [62]GSE54219, all five genes demonstrated significant differential expression in BRCA1/2-mutated samples. For [63]GSE17764, only expression levels of TNFRSF11A, SERPINA1, and REG4 were available, and all three genes showed significantly downregulated expression levels in the BRCA1/2-mutated samples ([64]Figure S2). 4. [65]4 [66]Open in a new tab (A) Gene expression levels of HRD-hub genes between high and low HRD score groups. (B) ROC analysis for predicting high HRD scores using HRD-hub genes. Prognostic Value of HRD-Hub Genes The GEPIA database was utilized to explore the diagnostic and prognostic value of these five hub genes in COAD. All five hub genes were significantly overexpressed in COAD samples, with higher expression of TNFRSF11A, SPINK4, and REG4 indicating better OS ([67]Figure A,B). Additionally, in the HPA database, all five hub genes exhibited low expression levels in adjacent normal tissues, whereas their protein levels were significantly upregulated in colon cancer samples ([68]Figure S3). Immunofluorescence (IF) data from the HPA database indicate that CYP2B6, SERPINA1, and SPINK4 are primarily expressed in the cell nucleus, while TNFRSF11A is predominantly expressed in the cytoplasm ([69]Figure S4). 5. [70]5 [71]Open in a new tab (A) Expression levels of HRD-hub genes in COAD tumors and normal tissues. (B) KM survival analysis of OS between high and low expression groups of HRD-hub genes. Expression levels of HRD-hub genes across different CNV (C) and SNV (D) statuses. We further explored the impact of CNV and SNV mutations on the gene expression levels of these five hub genes in the TCGA-COAD cohort. The occurrence of CNV gain and loss did not significantly alter the gene expression levels of CYP2B6, SERPINA1, and REG4. However, a CNV gain in SPINK4 was associated with a significant decrease in its expression level, whereas a CNV loss in TNFRSF11A led to a notable reduction in its expression ([72]Figure C). The occurrence of SNV mutations in CYP2B6 resulted in a significant decrease in its expression level ([73]Figure D). Establishment of HRD-Related Molecular Subtypes and Nomogram In the TCGA-COAD cohort, unsupervised clustering based on the gene expression levels of TNFRSF11A, SERPINA1, SPINK4, REG4, and CYP2B6 was performed. By analyzing the cumulative distribution function (CDF) and the area under the CDF curve, it was found that patients could be effectively stratified into two distinct HRD clusters ([74]Figure A–C). 6. [75]6 [76]Open in a new tab (A) CDF curve of unsupervised clustering. (B) Relative change in area under the CDF curve. (C) Clustering heatmap of two types of patients. (D) Cluster-consensus plot. (E) Heatmap of HRD-hub gene expression levels in different HRD-clusters. At K = 2, the consensus values between the two clusters are the closest, indicating that the partitioning of TCGA-COAD into two molecular subtypes is appropriate ([77]Figure D). The expression level of CYP2B6 is higher in HRD-cluster 2, while the remaining four genes, TNFRSF11A, SERPINA1, SPINK4, and REG4, exhibit higher expression levels in HRD-cluster 1 ([78]Figure E). We validated the robustness of the HRD-clusters in an independent colitis validation cohort, where patients from both [79]GSE40967 and [80]GSE17536 were successfully stratified into two distinct HRD-clusters, with HRD-cluster 1 exhibiting a more favorable prognosis ([81]Figure S5). Patients in HRD-cluster 1 exhibited a more favorable prognosis, with a median OS of 8.19 years, compared to a median OS of 5.30 years for patients in HRD-cluster 2 (HR = 1.69, p = 0.01; [82]Figure A). Multivariate Cox analysis indicated that patients in HRD-Cluster 2 have a significantly worse prognosis, with a 64% increased risk of poor prognosis compared to patients in HRD-Cluster 1 (p = 0.032; [83]Figure B). 7. [84]7 [85]Open in a new tab (A) KM survival curves for OS between HRD-cluster 1 and HRD-cluster 2. (B) Multivariable Cox analysis of age, gender, TMN tumor stage, and HRD-cluster in the TCGA-COAD cohort. Clinical prognostic factors with p-values less than 0.05 in the multivariate Cox regression analysis were included in the construction of the nomogram, encompassing age, cluster, stage, M stage, and N stage ([86]Figure A). Patients with higher nomogram scores have a worse prognosis, with a median OS of 3.00 years, whereas patients with lower nomogram scores have a median OS of 12.33 years (HR = 5.15, p < 0.0001; C-index = 0.791, 95% CI (0.740–0.842), p < 0.0001, [87]Figure B). ROC analysis revealed that the AUC values of the nomogram score in predicting 1-, 3-, and 5-year OS for COAD patients were all above 0.8, indicating a good predictive performance ([88]Figure C). The calibration curve indicates that the nomogram score exhibited good accuracy in predicting the 3-year OS of patients with COAD ([89]Figure D). 8. [90]8 [91]Open in a new tab (A) Nomogram predicting OS incorporating age, m, n, stage, and HRD-cluster. (B) KM curves for OS between high- and low-risk groups defined by the nomogram. (C) ROC analysis for predicting OS using nomogram risk scores. (D) Calibration curve for predicting 3-year OS using nomogram risk scores. Immune Infiltration in HRD-Related Molecular Subtypes The infiltration levels of diverse intratumoral immune cells in all samples from the TCGA-COAD cohort were evaluated by using the xCell algorithm ([92]Table S3). Subsequently, unsupervised clustering based on these immune cell infiltration levels was performed, resulting in the categorization of all patients into four distinct xCell clusters ([93]Figure A,B). In xCell-cluster 4, tumors exhibited higher levels of infiltration by neutrophils, macrophages, monocytes, and dendritic cells (DCs), whereas in xCell-cluster 1, the infiltration levels of these immune cell types were significantly lower ([94]Figure A). We further explored the proportion of patients belonging to different immune subtypes within distinct HRD-Clusters and found that the number and proportion of patients classified as the xCell-cluster 4 subtype were significantly higher in HRD-Cluster 1 ([95]Figure C,D). The xCell algorithm was utilized to evaluate the comprehensive infiltration levels of stromal and immune cells within tumor tissues, revealing a significantly elevated Immune Score in HRD-Cluster 1 compared with HRD-Cluster 2 ([96]Figure S6A). 9. [97]9 [98]Open in a new tab (A) Unsupervised clustering of tumor-infiltrating immune or stromal cell scores derived from the Xcell algorithm. (B) Clustering heatmap of 4 types of Xcell-cluster patients. Patient numbers (C) and proportions (D) in different Xcell-clusters for patients in HRD-cluster 1 and HRD-cluster 2. To further validate our findings, we employed the ImmuCellAI algorithm to reassess the infiltration levels of characteristic immune cells from xCell-cluster 4 across different HRD-Clusters ([99]Table S4). The significantly higher infiltration levels of neutrophils and various types of DCs observed in HRD-Cluster 1 are consistent with our findings from the xCell algorithm analysis ([100]Figure S6B). Furthermore, we employed ssGSEA to conduct an enrichment analysis of immune cell functions in each sample, revealing significantly higher enrichment levels of CCR, Check-point, Cytolytic_activity, Parainflammation, and T_cell_coinhibition functional pathways in HRD-Cluster 1 ([101]Table S5 and [102]Figure S6C). Predictive Capability of HRD-Hub Genes in Chemotherapy Sensitivity Initially, we employed pRRophetic to evaluate the IC50 values of commonly used tumor chemotherapeutic drugs across distinct HRD-Clusters. The HRD-Cluster 1 demonstrated significantly superior drug sensitivity to Vinblastine, Cytarabine, Lapatinib, Cisplatin, Gefitinib, and Metformin compared with HRD-Cluster 2 ([103]Figure S7). Numerous previous studies have established that tumor HRD status serves as one of the biomarkers for the treatment response to PARPis. Consequently, we delved further into exploring the potential of novel HRD-hub Genes as predictive indicators of PARPi responsiveness. Using the median expression level of each HRD-hub Gene as the cutoff, patients were stratified into high and low expression groups. The IC50 values of Veliparib (ABT-888), Rucaparib (AG-014699), and Olaparib (AZD2281) were then evaluated and compared between these two groups ([104]Figure ). Patients with a low expression of CYP2B6 exhibited higher sensitivity to Veliparib, Rucaparib, and Olaparib. Conversely, the expression level of REG4 did not offer a predictive value for treatment sensitivity to PARPis. Patients with low expressions of SERPINA1 and SPINK4 showed increased sensitivity to Veliparib and Rucaparib. Notably, the high expression group of TNFRSF11A had a lower sensitivity to Rucaparib but a higher sensitivity to Olaparib. 10. [105]10 [106]Open in a new tab Drug sensitivity to PARPi between high and low expression groups of the HRD-hub genes. To further elucidate the underlying mechanisms of HRD-hub Genes in predicting the PARPi response, we performed molecular docking analyses individually between the proteins encoded by these HRD-hub Genes and Veliparib, Rucaparib, and Olaparib. Each molecular docking process was repeated five times, and the docking pose with the lowest Vina score was presented ([107]Figure ). It is noteworthy that the vina score for the docking between CYP2B6 and Rucaparib is −0.9 kcal/mol, indicating a favorable binding interaction between CYP2B6 and Rucaparib ([108]Figure S8). 11. [109]11 [110]Open in a new tab Optimal docking poses of veliparib, rucaparib, and olaparib with TNFRSF11A, SERPINA1, SPINK4, REG4, and CYP2B6 proteins. Experimental Validation of the Tight Linkage between CYP2B6 and HRD in COAD A total of 9 pairs of COAD tumor tissues and adjacent normal tissues were collected and subjected to RT-qPCR analysis to assess the alterations in the CYP2B6 expression levels. Compared to adjacent normal tissues, not only was the gene expression of CYP2B6 significantly upregulated in COAD tumors, but also the protein expression level of CYP2B6 was markedly elevated ([111]Figure S9A,B). Memantine hydrochloride was employed to inhibit CYP2B6 in HCT116 cells, and the proliferation of HCT116 cells was progressively suppressed with an increasing concentration of memantine hydrochloride. The maximal degree of inhibition was observed at a concentration of 1 mM ([112]Figure S9C). Upon treatment of HCT116 cells with 1 mM Memantine hydrochloride, both the wound scratch assay and Transwell assay indicated a decreased migratory capacity of the cells, while the clonal formation assay suggested a reduced tumor clonal formation ability ([113]Figure S9D,E). In the HPA database, we integrated all available data that concurrently featured immunohistochemical (IHC) analysis for both CYP2B6 and RAD51, which are crucial molecules required for HRD formation. Our analysis revealed that as the average optical density (AOD) value of CYP2B6 increased, so did the AOD value of RAD51, indicating a strong positive correlation between the protein expression levels of CYP2B6 and RAD51 (R = 0.9310, p = 0.0018; [114]Figure S9F). Furthermore, immunofluorescence analysis revealed a significantly elevated expression level of γH2AX, a marker of HRD, in tumor cells with a high expression of CYP2B6 protein. Conversely, γH2AX expression was nearly absent in CYP2B6-negative cells ([115]Figure S9G). Discussion In breast, pancreatic, and prostate cancers, HRD has been established as a significant predictor for the clinical application of PARP inhibitors or platinum-based drugs and patient outcomes. − However, the potential role of HRD in the diagnosis and treatment of COAD has been incompletely explored. Our study, integrating bioinformatics analysis and machine learning, has ultimately identified five novel HRD biomarkers that have not been previously reported in the literature. The novel HRD biomarkers identified include TNFRSF11A, SERPINA1, SPINK4, REG4, and CYP2B6. TNFRSF11A, also known as TRANCE receptor and RANK, is a type I transmembrane protein. Mice with RANK gene knockout exhibit severe osteopetrosis, lack mature osteoclasts, exhibit impaired lymph node development, and have defective B and T cell maturation. , SERPINA1, a primary 1-antitrypsin synthesized in specific cells, plays diverse roles in physiological and pathological processes such as angiogenesis, intravascular fibrinolysis, and tumor metastasis. , Previous studies have indicated that the expression of SERPINA1 is associated with a poor prognosis in various cancer patients. Furthermore, research has suggested that SERPINA1 exhibits superior diagnostic value for colorectal cancer and early-stage colorectal cancer compared to CEA and CA19-9. Overexpression of SPINK4 significantly promotes proliferation, metastasis, and tumor growth of CRC cells. Furthermore, they markedly inhibit ferroptosis in CRC cells. Results from mouse models further demonstrate that SPINK4 overexpression suppresses ferroptosis in CRC cells and promotes tumor growth. REG4 participates in infection and inflammation by promoting macrophage polarization toward the M2 type, activating the epidermal growth factor receptor (EGFR)/Akt/CREB pathway, and eliminating inflammatory Escherichia coli. It is closely associated with tumorigenesis and exhibits high expression in gastric cancer, colorectal cancer, pancreatic cancer, gallbladder cancer, ovarian cancer, and urothelial cancer, correlating with tumor aggressiveness and poor prognosis. Given that the molecular docking of CYP2B6 protein with the PARPi Rucaparib exhibited the lowest binding energy of −0.9 kcal/mol and data from the GDSC database suggested a significant correlation between CYP2B6 expression levels and drug sensitivity to three PARPi, we selected CYP2B6 for further analysis and validation. Cytochrome P450 2B6, encoded by the human CYP2B6 gene, is a significant subtype within the cytochrome P450 (CYP) enzyme family. Notably, the CYP2B6 gene exhibits the highest level of polymorphism among human CYP superfamily genes, with over 100 known DNA variants. − In a hospital-based case-control study investigating the role of CYP gene polymorphisms in gastrointestinal (GI) cancer susceptibility in rural Maharashtra, the variant (T) allele of CYP2B65 (rs3211371) was found to possess a significantly elevated risk (OR = 4.43; 95% CI: 2.20–8.90; P < 0.0001) of GI cancer in the studied population. These findings suggest that polymorphism of the CYP2B65 gene may be involved in the development of GI cancer. In human liver microsomes (HLMs), Rucaparib exhibited reversible inhibition of CYP1A2, CYP2C9, CYP2C19, CYP2D6, and CYP3A isoforms, but not of CYP2B6 or CYP2C8. In cultured human hepatocytes, Rucaparib demonstrated concentration-dependent downregulation of CYP2B6 mRNA. Hence, the predictive ability of CYP2B6 for Rucaparib sensitivity may not be attributed to its role as a CYP enzyme in drug metabolism, suggesting potential alternative reasons for the association between CYP2B6 and HRD. Studies investigating the effect of CYP2B6 on the treatment efficacy of other chemotherapeutic agents for cancer have been reported. A study involving 145 breast cancer patients receiving cyclophosphamide combination chemotherapy revealed minor allele frequencies of 0.27, 0.29, and 0.07 for CYP2B69, CYP2B64, and CYP2B6*5, respectively. Notably, the haplotypes CYP2B6 *5/*6, *6/*9, or *6/*6 were significantly associated with a shorter time to disease recurrence. No significant associations were observed with myelotoxicity. CYP2B6 is a cytochrome P450 enzyme involved in the metabolism of various drugs, including some PARPi. The metabolism of PARPi by CYP2B6 can affect the drug’s pharmacokinetics and pharmacodynamics, potentially influencing its efficacy and toxicity. Variations in CYP2B6 expression and activity can lead to interindividual differences in drug response, making it an important factor to consider in the clinical use of PARPi. Currently, detecting tumor HRD is a complex and intricate process, involving the assessment of pathogenic variants in BRCA1/2, variations in other genes related to the HRR signaling pathway, as well as the determination of HRD scores and threshold rules that characterize genomic instability; consequently, this imposes a significant economic burden on tumor patients. , Therefore, further exploration of novel potential HRD biomarkers holds promise for simplifying and optimizing HRD detection strategies, offering greater prospects for clinical application. Our study not only screened and validated a novel HRD biomarker, CYP2B6, but also developed an HRD-related molecular subtyping scheme and nomogram that demonstrate robust predictive capability for OS in COAD patients and exhibit some predictive power for sensitivity to common tumor chemotherapeutic agents. Therefore, this HRD-related molecular subtyping and nomogram holds promise for facilitating the formulation of personalized clinical treatment strategies for COAD patients. Furthermore, our study suggested that tumors from patients with distinct HRD subtypes exhibit varying immune infiltration profiles; specifically, HRD-Cluster 1, the cluster with a higher degree of HRD, displays elevated levels of neutrophils and distinct subsets of DCs. Previous research has consistently demonstrated that colon tumors exhibit an overexpression of neutrophil chemoattractants in comparison to healthy colonic tissues. This overexpression facilitates the recruitment of neutrophils to both the invasive margin and central regions of colon tumors. Notably, tumor-associated neutrophils expressing tumor necrosis factor α (TNF-α), a marker typically associated with an antitumoral phenotype, are predominantly localized within the invasive margin. , Tumor-associated neutrophils derived from this margin exhibit an antitumoral phenotype, characterized by elevated expression of intercellular adhesion molecule-1 (ICAM-1) and CD95, underscoring their potential role in tumor immunity. Research has revealed that DCs, particularly PD-L1+ DCs within the tumor microenvironment of colon cancer, are associated with enhanced patient survival. This association is further corroborated by a positive correlation with CD8+ T cell infiltration, indicative of an immunologically “hot” tumor microenvironment that fosters robust antitumor immune responses. − Hence, variations in the infiltration levels of these immune cell populations may underlie the differential prognostic outcomes observed among patients stratified into distinct HRD clusters. Acknowledging the limitations of our study, it is important to note that our findings are primarily grounded in bioinformatic analyses and machine learning methods. While we have established a correlation between CYP2B6 and HRD, we have yet to delve into the intricate mechanisms that underlie the influence of CYP2B6 on HRD, leaving this as an avenue for future investigation. Conclusions In summary, our study integrated multiple COAD data sets and employed WGCNA and machine learning techniques to identify five novel, unreported biomarkers of HRD: TNFRSF11A, SERPINA1, SPINK4, REG4, and CYP2B6. We developed a novel HRD-associated molecular classification scheme and nomogram that effectively predicts patient prognosis. Experimental validation confirmed the upregulation of CYP2B6 in COAD, promoting proliferation and migration, and revealed a strong positive correlation between CYP2B6 and canonical HRD markers RAD51 and γH2AX. Furthermore, CYP2B6 emerged as a promising marker for sensitivity to PARP inhibitors. Thus, our study provides new resources and insights into optimizing personalized treatments for COAD. Supplementary Material [116]ao4c10672_si_001.pdf^ (3.6MB, pdf) Acknowledgments