Abstract Background The DNA damage response (DDR) has a major impact on the development and progression of pancreatic ductal adenocarcinoma (PDAC). Investigating biomarkers linked to the DDR may facilitate prognostic assessment and prediction of immunological characteristics for patients with PDAC. Methods The single-cell RNA sequencing (scRNA-seq) dataset [40]GSE212966 was obtained from the GEO database, whereas the bulk RNA-seq data were sourced from The Cancer Genome Atlas (TCGA) and Genotype-Tissue Expression (GTEx) databases. Least absolute shrinkage and selection operator (LASSO) and univariate Cox regression analyses were used to select genes to construct a prognostic risk model. Finally, the correlations of the model score with drug sensitivity, immunological checkpoints, and immune infiltration were assessed. Results We used 16 DDR marker genes to construct a predictive model. Furthermore, we established that the model had strong performance in both the training and validation cohorts. For PDAC, the model risk score served as an independent predictor of prognosis. There were notable differences in the proportions of the immune cells in the tumor microenvironment and drug sensitivity between the high and low risk score groups. The study confirmed that the risk score model is useful for predicting the immunotherapy response. Our experiments verified that knockdown of LY6D inhibits cell proliferation, promotes apoptosis and DNA damage. Conclusion Our creative integration of bulk RNA sequencing and scRNA-seq data allowed us to construct a DDR-related prognostic model. Our model can be used to predict the immunological features, treatment response and prognosis of PDAC with a relatively high degree of accuracy. Supplementary Information The online version contains supplementary material available at 10.1007/s12672-025-02293-w. Keywords: Pancreatic ductal adenocarcinoma, DNA damage- and repair-related genes, Survival, Chemotherapeutic response Introduction With a five-year survival rate of 10% across all stages, pancreatic ductal adenocarcinoma (PDAC) is one of the deadliest cancers worldwide [[41]1, [42]2]. The high mortality rate and poor prognosis of PDAC stem from the absence of clinical signs in the disease's early stages, the high recurrence rate, and substantial resistance to treatment modalities, including chemotherapy and radiation. Many clinical studies on PDAC treatments have been unsuccessful, and thus alternative treatment options are limited. The molecular and functional heterogeneity of PDAC may partly explain the lack of clinical therapeutic advancements. Genomic instability is defined as a high frequency of harmful alterations in the genomic structure that result from a compromised DNA repair response, which is a hallmark of cancers. The emergence of several cancers has been linked to the silencing of or to harmful mutation in tumor suppressor genes, including checkpoint and DNA repair genes such as P53, P16, and ataxia telangiectasia mutated (ATM). The DNA damage response (DDR), which encompasses signaling and repair, is necessary for genome stability. Multiple DDR mechanisms, including direct reversal, base excision repair, nucleotide excision repair, mismatch repair, single-strand break repair, and double-strand break repair, can occur [[43]3–[44]5]. DDR pathways play a role in tumor development and progression and the tumor response to treatment [[45]6]. Certain studies have shown a correlation between DDR deficiency and the stimulation of anticancer immunity via the activation of the classical cGAS-STING pathway. In addition to causing alterations in the tumor immune milieu, DDR anomalies also impact the therapeutic efficacy of immunological checkpoint inhibitors (ICIs) [[46]7]. Moreover, the main factor limiting the response to single drugs in cancer therapy is tumor acquisition of resistance. Drug resistance may result from the interplay of DDR mechanisms. Personalized medicine for PDAC is a relatively new research area, and there are currently no predictive biomarkers available. Several studies have been performed to investigate numerous possible biomarkers for predicting the response to different treatments (chemotherapy, immunotherapy, or targeted therapy) and the prognosis of PDAC. The most recognized biomarkers are BRCA1 and BRCA2 mutations, which are linked to advantages from platinum-based therapies and PARP inhibitors (PARPis). The only approved targeted maintenance treatment for pancreatic cancer is olaparib, a DNA-damaging drug designed for hormone receptor (HR)-deficient carriers of germline BRCA (gBRCA) mutations [[47]8]. With the exception of the < 1% of patients with microsatellite instability-high (MSI-H) tumors, PDAC patients are mostly resistant to FDA-approved immunotherapies that have improved the prognosis of other solid tumors. The widespread resistance of PDAC to immune checkpoint blockade (ICB) is almost unparalleled among human cancers. One important feature of PDAC, which is often hereditary in nature, is impaired DNA damage repair. Sixty-three genetic changes were found in each PDAC case in a large-scale genomic study, and these genes are enriched in DNA damage control pathways [[48]9]. Familial breast cancer is often linked to germline DDR gene mutations (ATM, BRCA1, BRCA2, MLH1, MSH2, PALB2, PMS2, and STK11) as well as mutations in traditional cancer risk-related genes such as CDKN2 A or TP53 [[49]10]. ATM is the most often mutated DDR gene in somatically altered sporadic PDAC. DDR deficiency may increase the susceptibility of PDAC patients to novel treatment interventions that increase the DNA damage burden above a threshold that can be tolerated. Consequently, investigating the characteristic genes that impact the DDR and creating a predictive signature based on characteristic genes associated with the DDR would aid in the development of tailored therapy and monitoring regimens for patients with PDAC. Through the integration of bulk and single-cell RNA sequencing data, we performed multiomics studies to identify the pathways involved in malignant development and to predict cancer prognosis effectively. To identify DDR response-related DEGs in PDAC, we assessed single-cell and bulk RNA sequencing data from patients with PDAC, with a focus on genes associated with DNA damage repair. A risk score (RS) model using these genes was built to predict the prognosis of PDAC. Our findings suggest that the RS model shows stability and potential in predicting PDAC prognosis, providing a foundation for further exploration of its clinical utility. Materials and methods Data download From the Gene Expression Omnibus (GEO) database ([50]https://www.ncbi.nlm.nih.gov/geo), we obtained the scRNA-seq dataset [51]GSE212966 [[52]11]. We also retrieved the expression profiles and patient survival data from the [53]GSE62452 and [54]GSE57495 datasets from GEO. The data were processed in the following ways: (1) samples lacking clinical follow-up information were eliminated; (2) samples with a unified unit of survival time of days, < 0 days, and unknown survival time were eliminated; (3) probe identifiers were transformed into gene symbols; (4) samples corresponding to multiple genes were eliminated; and (5) expression values corresponding to multiple gene symbols were summed to determine the median expression value. A total of 10,526 DDR-related genes were identified via the term"DNA damage repair"in the GeneCards database ([55]http://www.genecards.org). Among them, 9,723 genes associated with the DDR that had a correlation coefficient greater than 1.0 were selected. Single-cell sequencing data analysis Initially, we used the R package"Seurat"(5.0.2) [[56]12] to generate a Seurat object for the scRNA-seq data. We added quality control criteria on the basis of the prior threshold. Cells meeting the following criteria were included: range of counts, 1800–20,000; number of features, 300–8000; proportion of mitochondrial genes, < 20%; percentage of ribosomal genes < 20%; and percentage of red blood cell reads < 5%. Next, we eliminated the cell cycle effects, normalized the Seurat objects, reduced their dimensionality (1:15), clustered them at a resolution of 0.5, and annotated them. Seurat groups cells according to their principal component analysis (PCA) scores, with each PDAC cell basically serving as a"meta feature"that incorporates data from a correlated feature set to mitigate the significant technical noise in any single feature for scRNA-seq data. Data on the top 2000 hypervariable genes were used for the PCA. The"harmony"R package was used for batch correction to mitigate the batch effect on future analyses since single-cell sequencing data were obtained from many datasets. The FindClusters approach was utilized to find tumor cell subgroups with a resolution of 0.5 after refining and consulting the clustering findings in the initial contribution. UMAP was used to further decrease the dimension of the data that were acquired following PCA. The DimPlot function was used to depict the various cell types once they were separated into low-dimensional areas. The expression, landscape distribution, and distinctive genes of tumor cells were visualized via the FeaturePlot, DotPlot, DoHeatmap, and VlnPlot tools. In each cell subgroup, marker genes associated with particular DDR reactions were identified via the DotPlot tool. The"FindAllMarkers"tool was utilized to locate cell cluster marker genes. Finally, we carried out manual annotation. The reference cell markers were obtained from a published study [[57]13] and the CellMarker database ([58]https://xteam.xbio.top/CellMarker/). Identification of active subgroups Marker gene intersections were chosen for each cell subgroup on the basis of DDR-related parameters and substantial population specificity (adj.p < 0.05& |avg_log2 FoldChange|> 1.5 & pct.1 > 0.5 & pct.2 < 0.5). The"AUCell"R program was utilized to determine each cell's activity score based on the expression levels of the intersecting genes. The threshold for this analysis was established via the AUCell_explore thresholds function. The area under the ROC curve (AUC) score of every cell was used to color-code the UMAP plot of cell clusters, allowing researchers to determine which fraction of cells in a given subgroup was active with respect to particular genes connected to the DDR. With the default settings, the FindAllMarkers tool was used to identify the DEGs between the active and inactive subgroups. Marker genes with significant specificity (avg_log2 FC > 1) were used to identify the functions of the related cell population. Enrichment analysis To rank genes according to the degree of differential expression between two samples, gene set enrichment analysis (GSEA) was performed with predetermined gene sets. The most highly upregulated and downregulated genes in predetermined gene sets were subsequently used for analysis. These specific gene sets are often obtained via functional annotation or derived from earlier studies. GSEA was used to identify the distinctive genes enriched by active cell subgroups via the"ClusterProfiler"[[59]14] and"hallmark"pathway gene set R packages. Significant enrichment was defined as gene sets with p < 0.01. Additionally, we used the Kyoto Encyclopedia of Genes and Genomes (KEGG) and Gene Ontology (GO) analyses to conduct pathway enrichment studies. Identification of differentially expressed genes (DEGs) To improve the comparability of genomes between cancer and paraneoplastic disorders, we merged normal samples from the Genotype-Tissue Expression Project (GTEx) database with samples from the TCGA-PDAC cohort. To identify DEGs between the tumor and normal groups (log2 FoldChange > 1.5 & P < 0.05), the"limma"R program was utilized. Development of a prognostic model and survival analysis We used the DNA damage response-related marker genes (DDRMGs) (log2 FoldChange > 0.25) to intersect the DEGs between the tumor and normal samples. The predictive model was built using clinical data for the TCGA-PDAC cohort, including age, TNM stage, and gene expression (in transcripts per million mapped reads (TPM) format). Univariate Cox regression analysis was then used to identify genes linked to the prognosis of PDAC. The gene set was further reduced via least absolute shrinkage and selection operator (LASSO) regression analysis. In the end, 16 DDRMGs were chosen to construct the prognostic model. This following formula was used to obtain the risk score (RS). Gene expression × coefficient = RS. Patients were divided into high- and low-risk groups on the basis of the median RS. We next used the [60]GSE62452 and [61]GSE57495 cohorts to validate the model. The overall survival (OS) time of patients with PDAC was predicted via Kaplan‒Meier (K‒M) survival analysis. To assess the specificity and sensitivity of the model, a receiver operating characteristic (ROC) curve was generated. Calculating the proportions of invading immune cells and the immune score for DEG identification The"CIBERSORT"R program was used to determine the degree of immune infiltration in PDAC. Initially, we computed the proportions of 21 classical immune cell types and compared them between the high- and low-risk groups. Additionally, correlation analysis was performed to clarify the association between the proportions of infiltrating immune cells and the model RS. Immune checkpoints can be used to ascertain a patient's responsiveness to immunotherapy. We thus assessed the differences in immune checkpoint gene expression between the two risk groups. Drug sensitivity and immunotherapy efficacy Gene expression profiles and half-maximum inhibitory concentration (IC50) values of many anticancer medications were obtained via the R program"OncoPredict"[[62]15]. To predict the medication sensitivity of the high- and low-risk categories, we employed the"calcPhenotype"tool. We also investigated the relationship between the IC50 values and the activity of our model genes. Immunotherapy efficacy in the two risk groups was predicted via the"IMvigor210 CoreBiologies"R package. Pathway enrichment analysis Several pathways that were enriched in two risk categories were evaluated via gene set variation analysis (GSVA). The"HALLMARK"and"KEGG"reference gene sets were obtained from the Molecular Signature Database (MSigDB; [63]https://www.gseamsigdb.org/gsea/msigdb/). Immunohistochemical staining To determine the expression of LY6D, 25 PDAC tissue samples from Fudan University Shanghai Cancer Center were subjected to immunohistochemistry. After two hours of drying at 65 degrees Celsius, the tissue slices were deparaffinized in xylene and then rehydrated in ethanol. The sections were blocked with 10% goat serum for 10 min and washed with PBS following antigen extraction in 0.1 mol/L citrate buffer solution. After a 4 °C overnight incubation period, the sections were incubated with a primary antibody (82,094–1-RR, Proteintech, China;1:1000), and then, the sections were incubated with biotin-labeled goat anti-rabbit IgG for half an hour. The sections were treated in a consecutive manner at room temperature, first with diaminobenzidine for 1 min and then with hematoxylin for 2 min. The proportion of stained cells and the intensity of the cell staining on each slide were independently graded for the assessment of LY6D expression. Two separate pathologists evaluated the immunohistochemistry data in a blinded fashion. A score of 0–4 was assigned for the percentage of positive cells: 0, < 5%; 1, 5–25%; 2, 26–50%; 3, 51–75%; and 4, > 75%. The following system was used to score stain intensity: 0, no staining; 1, weakly positive; 2, moderately positive; and 3, strongly positive. The percentage of positive cells and the staining intensity were multiplied to determine the final staining score. Cell culture The SW1990 and PATU- 8988 T cell lines were purchased from the National Infrastructure of Cell Line Resources. All cell lines were cultured in Dulbecco's modified Eagle's medium (DMEM) supplemented with 10% fetal bovine serum (FBS), 100 units/mL penicillin, and 100 μg/mL streptomycin. Cell transfection Small interfering RNA (siRNA) targeting LY6D were obtained from Genomeditech (Shanghai, China). The target sequences of the siRNAs were as follows: siLY6D#1:5′-GCAAUGAGAAGCUGCACAA- 3′; siLY6D#2:5′-GUUUAUUUUUGUACUCAAA − 3′. Transfection of SW1990 and PATU- 8988 T cells was carried out using Lipofectamine 3000 (Invitrogen, Waltham, MA, USA). RT-qPCR Total RNA was extracted using the TRIzol method. PrimeScript RT Master Mix (RR036 A, Takara) was used to generate cDNA from 500 ng RNA. The ChamQ Universal SYBR qPCR Master Mix (RR820 A, Takara) was used to generate the reactive solution for RT-qPCR, and the Quantastudio 7 device captured the data. Gene expression was computed using the 2 − ΔΔCT method, and the results were normalised to the mRNA expression of ACTB in each sample. The specific primers used were: human LY6D forward 5'-GGAATCTGGTGAAGAAGGACTGTG- 3′ and reverse 5'-GGTCCTCCTGGCAGCACTG- 3′; human GAPDH forward 5'-CTGGGCTACACTGAGCACC- 3′ and reverse 5'′-AAGTGGTCGTTGAGGGCAATG- 3′. Immunofluorescence staining of γ-H2 AX and 53BP1 The assessment of γ-H2 AX immunofluorescence labeling was performed via a DNA damage test kit from Beyotime(C2038S) in Shanghai, China. After three cold PBS washes, the transfected SW1990 cells and PATU- 8988 T cells were blocked with QuickBlock™ Blocking Buffer for Immunol Staining for one hour at 4 °C. A mouse monoclonal antibody (1:200) against γ-H2 AX and a rabbit monoclonal antibody (1:200) against TP53BP1-Specific antibody (83,809–1-RR, Proteintech) were then added to the cells, which were then incubated for 1 h at room temperature. After being rinsed with Immunol Staining Wash Buffer containing 0.5% Triton X- 100, SW1990 cells and PATU- 8988 T cells were treated for 80 min at 4 °C with Coralite®488-Conjugated Goat Anti-Rabbit IgG(H + L) (SA00013 - 2, Proteintech;1:200) and Cy3-conjugated anti-mouse secondary antibody (C2038S,Beyotime; 1:200). Images were obtained using a confocal microscope after the nuclei were labeled with 4,6-diamidino- 2-phenylindole (DAPI). 5-Ethynyl- 2′-deoxyuridine (EdU) cell proliferation assay An EdU cell proliferation kit (C0078S, Beyotime) was used to measure the capacity for cell proliferation. The process was completed according to the kit's instructions. EdU was used at a working concentration of 10 μM. Cell cycle assays A cell cycle and apoptosis analysis kit (C1052, Beyotime) was used to measure cell cycle changes. The process was completed according to the kit's instructions. Apoptosis detection An caspase- 3 activity and apoptosis detection kit for live cell (C1077S, Beyotime) was used to measure the caspase- 3 activity and apoptosis in cultured cells. The process was completed according to the kit's instructions. The concentration of GreenNuc™ Caspase- 3 Substrate is 1 mM. Statistical analyses The differences between the two groups of samples were analyzed via the Wilcoxon signed-rank test, whereas the differences among several groups of samples were assessed via the Kruskal‒Wallis test. NS indicates p > 0.05, ^∗ indicates p ≤ 0.05, ^∗∗ indicates p ≤ 0.01, ^∗∗∗ indicates p ≤ 0.001, and ^∗∗∗∗ indicates p ≤ 0.0001. Results Identification of cellular subgroups exhibiting an active DDR Tumor single-cell landscape The single-cell sequencing dataset [64]GSE212966 obtained from the Gene Expression Omnibus (GEO) database comprises tumor samples from six individuals diagnosed with PDAC. On the basis of the findings of"VlnPlot"analysis, thresholds were set to filter the cells (Supplementary Fig. S1 A). After filtering and normalization, a total of 30,483 cells remained for use in the subsequent studies. The results of PCA revealed that cells can be successfully segregated from one another, and the top 14 principal components were identified as important principal components for further research via the"ElbowPlot"tool (Supplementary Fig. S1B). With UMAP dimensionality reduction, the expression patterns of individual cells were visualized on the basis of clustering outcomes based on the single-cell sequencing dataset and the annotation data acquired with SingleR. The cells were split into 25 subgroups, labeled 0 to 24, according to the data (Fig. [65]1A). We manually annotated the data and identified 10 primary cell types. Figure [66]1B displays the distribution of various PDAC cell subgroups. The cell percentage bar graph revealed that the tumor microenvironment is an intricately interwoven system composed of immune cells, ductal cells, endothelial cells, and tumor-associated fibroblasts, among other cell types (Fig. [67]1C). The two most prominent marker genes for every cell cluster are displayed in the violin diagram (Fig. [68]1D). Additionally, there was considerable heterogeneity in the cell components of the various PDAC samples. By intersecting DDR-related genes with cell cluster marker genes, we were able to obtain 478 genes to serve as the input gene set for the following stage. The bubble diagram shows how they manifested in the designated cell subgroups (Fig. [69]1E). Fig. 1. [70]Fig. 1 [71]Open in a new tab Identification of cell subgroups and analysis of marker gene expression via single-cell RNA sequencing (scRNA-seq) data. A The distribution of pancreatic ductal adenocarcinoma (PDAC) cell subgroups is displayed on the UMAP plot. B PDAC cell subgroup annotation findings are displayed on the UMAP plot. C The cell types distributed in PDAC patients are displayed in a cumulative histogram. D Gene expression associated with distinct cell subgroups is depicted in a violin diagram. E The expression of genes linked to DNA damage and repair responses in each cell subgroup is depicted in the bubble diagram Identification of cell subgroups with an active DDR Using the"AUCell"approach, we then calculate the activation level of each cell and ranked the cells according to AUC score. We demonstrated the expression of the top 10 most active DDRMGs in various cell subpopulations using UMAP plots (Fig. [72]2A). A total of 9807 single cells were defined as active, with a threshold of 0.14 (Fig. [73]2B). The detailed layout of these active cells is displayed in a UMAP plot (Fig. [74]2C). The results indicate that tumor-associated fibroblasts and macrophages may play significant roles in the DDR process, as evidenced by the composition and proportions of active and inactive cells in Fig. [75]2D. DDR-active cells contribute to PDAC progression and therapeutic resistance by maintaining genomic stability through enhanced DNA repair mechanisms (e.g., homologous recombination), enabling tumor cells to evade apoptosis triggered by chemotherapy or radiation. These cells also remodel the tumor microenvironment (TME) by promoting stromal interactions (e.g., tumor-associated fibroblasts) and immune evasion, as shown by their high DDR activity in our single-cell analysis (Fig. [76]2C). Such cells likely drive resistance to genotoxic therapies and support tumor survival, aligning with studies linking DDR activation to chemoresistance in PDAC [[77]16]. Fig. 2. [78]Fig. 2 [79]Open in a new tab Determination of subgroups of activated cells. A The UMAP plots show the expression of the top ten most active DDRMGs in different cell subpopulations. B The cutoff value for the AUC score of the DNA damage and repair marker genes was 0.14. C The cell activity score is displayed on the color-coded UMAP plot. Brighter colors are associated with higher activity levels. D The histogram illustrates the cumulative distribution of active and inactive cell populations. E GSEA findings for DDR-activated genes. There were four repressed and four activated signaling pathways among the eight differentially enriched pathways The identification of 478 DDR-related genes across cell clusters reveals key molecular mechanisms underlying PDAC heterogeneity. These genes are enriched in pathways critical for DNA repair (e.g., BRCA1/2, ATM), cell cycle regulation (e.g., CDK1), and immune modulation (e.g., CXCL12). This highlights potential therapeutic targets, such as PARP inhibitors for BRCA-deficient tumors or checkpoint inhibitors targeting DDR-immunomodulatory crosstalk. By mapping these genes to specific cell types (e.g., fibroblasts and macrophages), our study provides insights into stromal-driven resistance mechanisms and personalized treatment strategies. Identification of functions of DDR-reactive cell subgroups By employing GSEA, the expression of marker genes with high specificity was used to infer the function of the population of active DDR cells. After the marker gene sets were subjected to GO and KEGG pathway enrichment analyses, the ten most significantly enriched terms and pathways were used to create a bubble diagram. The analysis revealed that marker gene sets were considerably enriched in extracellular tissue-related activities and pathways linked to neuroactive ligand–receptor interactions and cancer (Supplemental Fig. [80]1C–1D). On the basis of the log2 FC value, GSEA was performed on the DEGs with hallmark gene sets. Eight pathways were enriched. The G2M checkpoint pathway was notably enriched (Fig. [81]2E). Enrichment of pathways such as"Allograft rejection,""Coagulation,""E2 F targets,""Epithelial-mesenchymal transition (EMT),""G2/M checkpoint,""Myogenesis,""Pancreas beta cells,"and"UV response dn"significantly influences the tumor microenvironment (TME) and patient prognosis. Allograft rejection pathways indicate immune activation, which can enhance anti-tumor immunity [[82]16]. Coagulation pathways are linked to tumor progression and metastasis by promoting angiogenesis and fibrin deposition [[83]17]. E2 F targets regulate cell cycle progression, and their dysregulation is associated with uncontrolled proliferation [[84]18]. EMT facilitates tumor invasion and metastasis by enhancing cell motility and resistance to apoptosis [[85]19]. Myogenesis and pancreas beta cell pathways may reflect stromal interactions and metabolic reprogramming in the TME [[86]20]. UV response dn pathways are often downregulated in tumors, potentially reducing DNA repair capacity [[87]21]. The G2/M checkpoint is crucial for maintaining genomic stability by ensuring proper cell division. Dysregulation of this checkpoint, often observed in cancers, leads to mitotic errors and chromosomal instability, promoting tumor aggressiveness and poor prognosis [[88]22]. C646 has been shown to impede G2/M cell cycle-related proteins and augment the antitumor efficacy against pancreatic cancer [[89]23]. G2/M checkpoint is also triggered by the ATR-CHK1-WEE1 pathway in response to DNA damage [[90]24]. Understanding these pathways provides insights into TME dynamics and potential therapeutic strategies. Gene expression pattern and bulk RNA-sequencing analyses of PDAC samples We acquired TCGA-PDAC raw count data to perform an analysis of differential expression. For differential expression analysis, 179 PDAC samples and 171 normal samples were acquired by merging normal samples from the GTEx project with TCGA-PDAC data. With the use of the"limma"R package, 3671 DEGs—2108 upregulated genes and 1563 downregulated genes—were found. The PCA diagram revealed the effective separation of tumor samples from normal samples (Supplementary Fig. S2 A). The DEGs between tumor and normal tissues were visualized in a volcano plot and a gene expression heatmap (Supplementary Fig. S2B-C). After the DEGs and the marker genes of the active cell subgroup intersected, 1422 differentially active marker genes were identified. Development and validation of the prognostic risk model Identifying the gene signatures for prognosis in PDAC To determine whether marker genes in the active cell population were associated with prognosis (p < 0.05), univariate Cox analysis was employed. After removing duplicate genes from the training set via LASSO regression analysis, a seed value of 10,210 was established. Sixteen genes were ultimately associated with the prognosis of PDAC, and the results are shown in Figs. [91]3A–3 C. The 16 prognostic genes were systematically selected from 1422 active markers and 3671 DEGs through multi-step filtering (intersection analysis, survival validation, and LASSO regression), ensuring statistical rigor and clinical relevance to PDAC prognosis. To categorize the patients into high- and low-risk groups, the median expression value of each of the sixteen genes—which were identified via Cox regression analysis—was utilized as the cutoff value. K–M survival curves were generated for patients from the TCGA-PDAC cohort to identify genes that reliably predict patient survival. The KM curves of these 16 genes showed notable variations from one another. The prognosis was poorer for patients in the high-risk group than for those in the low-risk group (Fig. [92]3D- 3S). Fig. 3. [93]Fig. 3 [94]Open in a new tab Cox and LASSO regression analyses were conducted on the TCGA-PDAC dataset. A The logarithm of the independent variable lambda is represented by the abscissa of the LASSO regression independent variable, while its coefficient is represented by the ordinate. B Confidence interval for LASSO regression for each lambda. C Significant prognostic gene LASSO regression coefficient. D–S Kaplan–Meier (K–M) survival curves for patients grouped according to the expression of the prognostic genes identified via Cox regression analysis. Red signifies the high-expression group, whereas blue represents the low-expression group, and significant differences are indicated Verification of the robustness of the model with both internal and external datasets By dividing the patients into high-risk and low-risk groups using the median RS value as the cutoff, we assessed the robustness of the models created using the sixteen gene signatures. We selected the TCGA-PDAC cohort as the model training cohort. We also selected [95]GSE62452 and [96]GSE57495 as validation cohorts to confirm the validity of our model. The K‒M analysis revealed that, across all datasets, the OS time of the high-risk group was significantly shorter than that of the low-risk group (Supplementary Fig. S3 A–3 C). The model's ability to predict prognosis was assessed using ROC curves (Supplementary Fig. S3D–3 F). In the TCGA-PDAC training cohort, the AUCs for 3- and 5-year survival were 0.87 and 0.85, respectively. In the [97]GSE62452 cohort, the AUCs of the [98]GSE62452 cohort was 0.82 at 3 years and 0.62 at 5 years. The AUCs of the [99]GSE57495 cohort for 3- and 5-year survival were 0.74 and 0.70, respectively. Collectively, these findings show that our predictive model performed well across several cohorts. Relationships between clinical characteristics and the model risk score in PDAC To determine whether the RS can be used as an independent prognostic indicator, multivariate Cox regression analysis was performed on patient clinical characteristics, including age and stage (Fig. [100]4A–4 C). The heatmap in Fig. [101]4A displays prognostic genes (e.g., LY6D, TSPYL2) identified through single-cell and bulk RNA-seq integration. While LY6D and TSPYL2 [[102]25] show preliminary links to DDR processes (see Results), further validation is needed to confirm their functional roles. LY6D knockdown experiments demonstrated increased DNA damage (Fig. [103]8D, [104]E), supporting its potential DDR relevance. Fig. 4. [105]Fig. 4 [106]Open in a new tab In terms of clinical features, the RS is an independent predictor of outcome. A Risk score graphs for the TCGA. The dashed line at the top of the graphic represents the median risk score for each cohort. Patients were classified into several risk groups on the basis of these criteria. The appropriate patient data are displayed in the middle and bottom of the diagram. B The findings of the Cox regression analysis conducted on the clinical features of the TCGA cohort are displayed in a forest map. C The nomogram based on the prediction model. The clinical factor's contribution to outcome events is depicted by the line segment, whereas the likelihood of 1-, 3-, and 5-year survival is illustrated by the lower three lines. The total points represent the cumulative score derived from the individual scores of all the variable values Fig. 8. [107]Fig. 8 [108]Open in a new tab LY6D affects DNA damage repair and proliferation in PDAC cells. A Immunohistochemical staining was used to detect the expression of LY6D in PDAC tissues and adjacent tissues. The dotted diagram shows a significant increase in LY6D expression in PDAC tissues compared to normal tissues (p < 0.0001). B IHC scores for each tissue type. C qPCR results showing the knockdown efficiency of LY6D in SW1990 cells and PATU- 8988 T cells. D, E Co-staining of 53BP1 and γ-H2 AX confirmed that LY6D knockdown induces DSBs, with colocalized foci observed in both SW1990 and PATU- 8988 T cells. F–H EdU assays demonstrated the proliferation ability of different SW1990 and PATU- 8988 T treatment groups Examination of the model's immune cell infiltration features The incidence and progression of malignancies are impacted by immune cell infiltration in the TME. Consequently, we used the CIBERSORT method to assess differences in the TME features between patients in the high-risk and low-risk groups. The immune cell bar chart provides a detailed representation of the 21 immune cell groups and their proportions (Supplementary Fig. S2 A). The data revealed notable differences in the proportions of immune cells infiltrating tumors between the high- and low-risk patient groups (Fig. [109]5A). Specifically, significant differences were observed for immune cells including M0 and M1 macrophages, naive B cells and eosinophils. Figure [110]5B shows the distribution of 21 immune cell types among patients in the high-risk and low-risk groups. Immunological cells secrete chemicals called immunological checkpoints that have the ability to control the level of immune activation. They have a significant effect on the development of autoimmune illnesses. Consequently, we examined the relationships among the expression levels of 16 model genes, cytokines, and five kinds of immunomodulators. The results revealed a robust relationship between immunological checkpoints and CXCL9 (Fig. [111]5C). Fig. 5. [112]Fig. 5 [113]Open in a new tab Assessment of the associations of the model risk score with immune cell infiltration and immune markers. A The variation in the percentage of 21 infiltrating immune cells between the patients in the high- and low-risk groups is displayed in the box diagram. The high-risk group is shown in yellow, whereas the low-risk group is shown in blue. B The proportions of immune cells in the high-risk and low-risk groups. C Relationships between immunological checkpoint genes and model gene expression Estimating drug sensitivity and the immunotherapy response In the TCGA-PDAC cohort, we computed the IC50 values and anticipated drug sensitivity for the high- and low-risk groups. The nine medications with the lowest P values are displayed individually in the box plot (Fig. [114]6A–J). Compared with the high-risk group, the low-risk group exhibited increased drug sensitivity, as indicated by lower IC50 values. On the basis of these data, we next carried out a correlation study between the IC50 values of various medications and the model gene expression levels. Figure [115]6K shows a good correlation between the IC50 values of these medicines and the expression level of the gene FAM83 A. In addition, the IMvigor210 dataset was divided into groups on the basis of the median value of the RS and was used to ascertain the model's ability to predict the response to immunotherapy. We discovered that, as shown in Fig. [116]6L, the RS determined by our model roughly demonstrated a favorable trend toward an immunotherapy response. Additionally, there were discernible variations in their overall response rates (ORRs). Compared with the low-risk group, the high-risk group presented greater complete response (CR) and partial response (PR) rates and lower progressive disease (PD) and stable disease (SD) rates (Fig. [117]6M). In conclusion, immunotherapy efficacy data revealed that high-risk group patients often benefited more from ICI treatment. In conclusion, these results may lead to a new understanding of the benefit of treating high-risk PDAC patients with ICIs and low-risk PDAC patients with chemotherapeutic medication. Fig. 6. [118]Fig. 6 [119]Open in a new tab The treatment results for PDAC patients predicted by the RS model. A–J The variation in the IC50 values of ten chemotherapeutic medicines between the low-risk and high-risk cohorts. Red indicates the high risk group, whereas blue indicates the low risk group. K The heatmap indicates the relationship between the IC50 values of chemotherapeutic medications and the expression of model genes. The color of the dots indicate the value of the correlation coefficient (positive or negative), whereas the symbol * denotes significance. L Violin plot illustrating the distribution of RSs across several immunotherapy response categories. The numerical number indicates the substantial disparity in the replies of the two groups. Immunotherapy response was classified into four groups: complete response (CR), partial response (PR), stable disease (SD), and progressive disease (PD). M The cumulative distribution of immunotherapy response for patients in the high-risk and low-risk groups is displayed in a histogram Distinctions in enriched pathways among patients in various risk groups Finally, we delineated distinct signaling pathways that were enriched in both the high- and low-risk TCGA-PDAC groups. The"GSVA"algorithm was applied for HALLMARK and KEGG pathway enrichment analyses. There were notable differences in the pathways enriched in the various risk groups. Heatmaps were used to display the top 20 pathways with the highest log2 FoldChange values for both the KEGG analysis and the HALLMARK analysis (Fig. [120]7A, [121]B). Additionally, Fig. [122]7C shows the results of an investigation of the relationship between RS and enrichment scores determined by our approach. These results will be useful in understanding the fundamental mechanism of the DDR in tumorigenesis. Fig. 7. [123]Fig. 7 [124]Open in a new tab Variations in pathway enrichment between the high and low risk groups. A KEGG pathways. B HALLMARK pathways. C Research into the correlation between RS and the KEGG pathway enrichment scores LY6D knockdown induces DSBs Figure [125]3C shows that LY6D made the greatest contribution to the RS computation. Consequently, we decided to carry out further experimental validation using LY6D. The IHC results revealed higher LY6D expression in pancreatic ductal adenocarcinoma tissues than in normal tissues (Fig. [126]8A). We used the H score for quantitative immunohistochemical analysis and Fig. [127]8B indicates the basis of our score. We knocked down LY6D in SW1990 cells and PATU- 8988 T cells using siRNAs (Fig. [128]8C). Then, we performed co-staining of 53BP1 (a marker of DNA double-strand breaks, DSBs) and γ-H2 AX (a marker of DNA damage signaling) in LY6D-knockdown PDAC cells. The merged images (DAPI/53BP1/γ-H2 AX) clearly demonstrate colocalization of 53BP1 foci and γ-H2 AX signals, confirming that LY6D depletion induces DSBs(Fig. [129]8D, [130]E). The expression of LY6D influences the cycle, apoptosis, and proliferation of PDAC cells To further validate the effect of LY6D on pancreatic cancer cells, we performed EDU assays to detect the effect of LY6D on cell proliferation (Fig. [131]8F-H), as well as apoptosis(Fig. [132]9A-D) and cycle assays(Fig. [133]9E). These results clearly show that LY6D promotes PDAC cell proliferation by enhancing the S-phase and G2/M transition while decreasing apoptosis. Fig. 9. [134]Fig. 9 [135]Open in a new tab Effect of LY6D on the phenotype of PDAC cells. A, B Flow cytometry analysis of the effect of knockdown of LY6D on apoptosis in SW1990 and PATU- 8988 T cells. C, D The percentage of apoptosis in the experimental and control groups was compared by calculating the sum of Q1-UR and Q1-LR, which was validated by three repetitions of the experiment showing p < 0.0001. E Impact of LY6D knockdown on pancreatic cancer cell cycle as determined by flow cytometry Discussion PDAC ranks as one of the most detrimental types of cancer in humans, as it has a very poor prognosis. PDAC is the most common type of pancreatic cancer, accounting for approximately 96% of cases [[136]26]. PDAC whole-exome and whole-genome sequencing has revealed changes in the key driver genes of most pancreatic tumors [[137]27, [138]28]. Genetic mutations in germline genes related to the DDR have been found in patients with sporadic PDAC [[139]29]. The BRCA2 gene has the most frequent germline mutations (1.4–7.4%). A recent prospective phase III study (the POLO trial, Pancreas Cancer Olaparib Ongoing, [140]NCT02184195) was conducted to assess the effectiveness of olaparib in patients with BRCA-mutant metastatic PDAC [[141]8]. PARPi have been approved for use as maintenance therapy in patients with metastatic pancreatic cancer who harbor pathogenic germline mutations in BRCA1/2. Recently, our team's latest findings reveal the dual functional significance of TPX2 in controlling DNA DSB repair pathway selection and mitotic progression, providing a potential PARPi therapeutic strategy for pancreatic cancer patients [[142]30]. Studies have shown that DDR deficiency is present in as many as 24% of patients with PDAC [[143]31]. Other studies have revealed the critical roles that DDR malfunction plays in PDAC carcinogenesis and treatment outcomes [[144]32]. Therefore, discovering more about the underlying processes of the DDR will aid in the determination of the best treatment option for patients and the understanding of PDAC characteristics. In this work, we combined bulk and single-cell RNA sequencing data to develop a predictive model for PDAC. On the basis of single-cell sequencing data from PDAC patients from the GEO database, we annotated and distinguished several cell subgroups. The findings revealed notable variations in the TME of various PDAC patients. The marker genes with significant specificity for each cell subgroup overlapped with a total of 478 DDR genes. On the basis of the expression patterns of these 478 DDR genes, we were able to identify active cell subgroups and their respective roles. Afterward, we used the"AUCell"technique to identify DDRMGs. GO and KEGG pathway enrichment analyses revealed notable enrichment of pathways linked to the ECM-receptor interaction and focal adhesion. The ECM, a critical component of the TME, undergoes significant remodeling in PDAC, contributing to tumor progression and treatment resistance. Desmoplasia, characterized by excessive fibroblast activity and ECM deposition, is a hallmark of PDAC histology and plays a key role in its fibrotic microenvironment [[145]33]. Our AUCell score analysis further highlighted the importance of tumor-associated fibroblasts, which exhibited the highest DDRMG scores. Recent studies have also demonstrated a close relationship between ECM remodeling and DDR mechanisms [[146]34–[147]36]. Our findings provide a fresh view of the DDR mechanism, revealing how it modifies the ECM and induces treatment resistance in PDAC. Next, we determined which marker genes were differentially expressed in PDAC patients via bulk RNA-sequencing data. The expression patterns of sixteen genes—LY6D, CXCL9, GMNN, FAM111B, NANOG, MYEOV, UCA1, SAMD9, DSG3, ZPLD1, FAM83 A, HMGA2, SLC2 A11, PLD4, CCDC154 and TSPYL2—that are linked to prognosis were used to construct prognostic risk models. Modifications in gene expression patterns mostly impact the immunological landscape and tumor features of various individuals. According to the latest research reports, LY6 proteins, such as LY6D, are becoming well known as superior tumor-associated antigens that, when combined with effective mismatch repair, may significantly suppress antitumor immunity in patients with malignancies [[148]37]. The lymphocyte antigen 6 complex, locus D (LY6D, commonly known as E48), is a GPI-anchored protein that is expected to be present on the plasma membrane and cell surface. It is located on human chromosome 8 in the 8q24 region [[149]38]. Recent studies have revealed that LY6D is associated with cell adhesion and has elevated expression in some types of malignancies [[150]39, [151]40]. LY6D is a CRC antigen that is expressed preferentially. By utilizing a TcE, one may effectively reroute and activate cytotoxic T cells to lyse CRC cells that express LY6D [[152]41]. Our study demonstrated that LY6D is highly expressed in pancreatic ductal adenocarcinoma and that knockdown of LY6D affects DNA damage repair capacity and cell proliferation in PDAC. According to a recent significant study, the CXCL9:SPP1 ratio can be used to describe the antitumor immune cell abundance in cancer, the gene expression patterns of each type of tumor-infiltrating cell, the control of communication networks that determine tumor control or progression, and the response to immunotherapy [[153]42]. When the median RS value was used to categorize patients into high- and low-risk groups, the ROC curve demonstrated that the model performed well in prognostic prediction. According to the survival study, individuals in the high-risk subgroup had a poorer prognosis than those in the low-risk subgroup did. Furthermore, our GSEA findings revealed a strong correlation between DDRMGs and the"MYC_TARGETS_V2"and"MYC_TARGETS_V1"pathways. MYC functions as a transcription factor for numerous DDR genes, such as WRN helicase, which is involved in the repair of replication forks that have stopped and abnormal replication intermediates. According to one study, MYC functions downstream in the DDR, regulating the TP53/MDM2 (MDM2)/ubiquitin-protein ligase tumor suppressor P19 ARF/E3 [[154]43]. This further validates the accuracy and guiding value of our models. Immune checkpoints are crucial for helping immune cells avoid immune surveillance, according to recent research [[155]44]. We also investigated the relationships between immunological checkpoints and sixteen signature genes that were incorporated into the prognostic model. Immune checkpoints and CXCL9 have been shown to be strongly correlated [[156]45]. A study demonstrated that, by suppressing CD8 + cytotoxic T cell infiltration, CXCL9 mediates antitumor immune responses in PDAC [[157]46]. Studies have demonstrated that tumor cell and TME immune features play major roles in the determining the efficacy of immunotherapy [[158]47]. DDR pathway abnormalities can alter immunogenicity and immune checkpoint levels, which directly impacts the efficacy of ICI therapy [[159]48]. Additionally, we analyzed the sensitivity of the two risk groups to routinely used anticancer medications. The IC50 values of these medications were positively correlated with the RS, according to our data, and the low-risk group responded better to the treatments. The effectiveness of immunotherapy showed that patients in high-risk groups often benefited more from ICI treatment. Our results indicate that the most beneficial treatment differed between risk groups; thus, our model can provide guidance for improving PDAC treatment options. In summary, our DDR-associated prognostic model offers valuable insights into the molecular mechanisms that may influence PDAC prognosis and treatment efficacy. This study presents a promising approach for integrating bulk RNA sequencing with single-cell sequencing data, which may contribute to advancing research in pancreatic cancer. The immunological environment, cell-specific markers, and cell heterogeneity can all be revealed by single-cell RNA sequencing methods, whereas bulk RNA sequencing can be used to detect significant changes in gene expression and signaling pathways. However, there are some limitations to our study. Since this work is retrospective in nature, additional validation of the model's accuracy with larger sample sizes and prospective cohorts is needed. Additionally, further investigation is needed to validate the regulatory roles of sixteen genes in the initiation and progression of PDAC and the DDR. Conclusion A novel prognostic model created on the basis of the sixteen DDRGs identified in PDAC may have practical applications, particularly in predicting immunotherapy efficacy. Supplementary Information [160]Additional file1^ (12.3MB, tif) [161]Additional file2^ (15.9MB, tif) [162]Additional file3^ (10.5MB, tif) [163]Additional file4^ (2MB, docx) Acknowledgments