Abstract Pancreatic cancer’s high incidence and mortality rates are underscored by ineffective treatments, particularly immunotherapy’s poor performance. This could stem from an unclear immune microenvironment, where NK cells may play a unique role. Analyzing the NK cell-differentially expressed genes (NKDEGs) from the PAAD_GSE162708 single-cell dataset and utilizing the TCGA-PAAD and ICGC-PACA-AU datasets, we identified 11 NKDEGs linked to pancreatic adenocarcinoma (PAAD) prognosis and developed a prognostic model. This model’s risk scores significantly outperformed traditional grading and TNM staging systems, validated through clinical and pathological analyses. Functional enrichment analysis pointed to the Neuroactive ligand-receptor interaction and MAPK signaling pathways, suggesting NK cells’ distinctive role in PAAD. High-risk groups showed decreased overall NK cells but increased activated NK cells, which may mediate adverse inflammatory responses. NK cells exhibit synergistic interactions with plasma cells and macrophages and negative regulation by monocytes and naive B cells. Our model accurately predicts immunotherapy responses, indicating potential for targeted drugs to enhance treatment. Additionally, we introduced an NKDEGs-based immunotyping approach for personalized medicine and clinical decision-making in PAAD. This study emphasizes NK cells’ potential in PAAD treatment, offering precise patient stratification and therapeutic targets for immunotherapy. Supplementary Information The online version contains supplementary material available at 10.1007/s12672-025-02266-z. Keywords: Machine learning, Immunotherapy, NK cell, Pancreatic cancer, Prognostic model, Molecular subtype Introduction Pancreatic cancer (PAAD) is the third leading cause of cancer-related deaths, with pancreatic ductal adenocarcinoma (PDAC) being the predominant pathological type [[32]1]. Due to its rising incidence and relatively stable mortality rate, PAAD is projected to become the second leading cause of cancer-related deaths by 2030 [[33]2]. While surgery and adjuvant chemotherapy can achieve long-term survival, they are only applicable to a small subset of patients with resectable disease [[34]3]. Over the past decade, combined chemotherapy regimens have provided significant options for patients with advanced and metastatic disease, allowing newly diagnosed patients to experience short-term partial remission or disease stabilization. However, nearly all patients eventually relapse, and second-line treatment options are limited [[35]4]. Consequently, the five-year overall survival rate for PAAD patients in the United States is only 9% [[36]5], underscoring the urgent need for alternative treatment strategies. Recent advances in the genomic and immune profiling of PAAD have spurred the development of targeted therapies, but these are currently limited to a small fraction of PAAD patients [[37]6]. Therefore, there is an urgent need to identify effective immunotherapy targets suitable for various PAAD patient subsets and to screen for matching drugs, in order to improve PAAD survival rates. Except for less than 1% of patients with high microsatellite instability (MSI-H) tumors, pancreatic ductal adenocarcinoma (PDAC) is nearly universally resistant to FDA-approved immunotherapies, which have significantly improved outcomes for patients with advanced solid tumors such as melanoma and lung cancer. The resistance of PDAC to immune checkpoint blockade (ICB) is almost unique in human oncology. PAAD, traditionally characterized as having an “immunologically cold” tumor microenvironment (TME) due to its capacity for immune evasion, exhibits remarkable heterogeneity. Rather than a uniformly inert landscape, PAAD comprises immune-depleted “desert” regions alongside infiltrated niches, demonstrating spatial divergence in immune activity. This compartmentalization underscores the presence of localized immunosuppressive networks, necessitating further mechanistic investigation [[38]7–[39]9]. Natural killer (NK) cells, among the most potent anti-tumor immune cells, are found in very low numbers in pancreatic ductal adenocarcinoma (PDAC), constituting less than 0.5% of tumor-infiltrating lymphocytes (TILs). Despite their limited numbers, NK cells have a significant anti-tumor effect in PAAD, and targeted activation of NK cells can enhance anti-tumor immunity and improve survival rates [[40]10–[41]12]. Additionally, NK cells appear to play a crucial role in regulating T cells and B cells within the PDAC environment [[42]13]. However, research into the phenotype and function of tumor-associated NK cells is still scarce, and more detailed characterization is likely beneficial for understanding their role in controlling PDAC progression. Machine learning models have become pivotal in identifying key therapeutic targets for cancer and in developing precise and effective diagnostic and prognostic models [[43]1, [44]14]. In this study, we analyzed the PAAD_GSE162708 single-cell dataset to identify NK cell-differentially expressed genes (NKDEGs). We then utilized the TCGA-PAAD and ICGC-PACA-AU datasets, applying 101 different machine learning models to discover 11 NKDEGs that are differentially expressed in PAAD and associated with patient prognosis. We constructed and validated an accurate prognostic model based on these findings. Additionally, our research highlights the potential role of NK cells in PAAD and offers insights for distinguishing patient subgroups and identifying effective therapeutic targets for immune-based treatments of PAAD. Materials and methods Selection of differentially expressed NK cell-differentially expressed genes (NKDEGs) in PAAD We acquired single-cell RNA sequencing data for pancreatic ductal adenocarcinoma (PAAD), which included 22,133 cells, from the PAAD_GSE162708 dataset available on the GEO website. The analysis was conducted using the TISH2 platform [[45]15]. We utilized Principal Component Analysis (PCA) for dimensionality reduction, and applied K-Nearest Neighbors (KNN) and Louvain algorithms to identify clusters. Cell types were annotated based on marker genes [[46]15, [47]16]. Differentially expressed genes within NK cell clusters were determined using the Wilcoxon test, focusing on genes with a logarithmic fold change (|FoldChange|≥ 1.5) and a false discovery rate (FDR) of less than 0.05 [[48]17]. Functional enrichment analysis of different cell clusters To explore the functional roles of various cell type populations, we performed gene set enrichment analysis using the TISH2 platform. This involved ranking genes according to fold changes obtained from differential analysis [[49]15]. We utilized KEGG enrichment analyses to identify and visualize pathways that were significantly upregulated or downregulated within each cell cluster (FDR ≤ 0.05). This approach enabled us to conduct a comprehensive functional enrichment analysis across the different clusters [[50]15]. Collection of PAAD transcriptome and clinicopathological data We acquired transcriptomic data from the TCGA-PAAD dataset, which includes 179 PAAD samples along with 4 paired adjacent normal tissues. Of these, 178 patients had comprehensive survival data, clinical pathological details, and somatic mutation information. Additionally, we gathered transcriptomic and survival data for 92 PAAD patients from the ICGC database. Construction and validation of prognosis-related NKDEGs model in PAAD Using the TCGA-PAAD dataset as our training cohort, we performed univariate Cox regression analysis (FDR < 0.05) to pinpoint NK cell-differentially expressed genes (NKDEGs) associated with PAAD prognosis. To mitigate the risk of model overfitting, we employed a diverse set of 101 machine learning models. These models were carefully selected based on their popularity, relevance, historical significance, practical applications, and educational value. This comprehensive approach ultimately led to the development of a PAAD prognosis model incorporating 11 NKDEGs. Table S1 provides a comparative summary of the key parameters and performance of the models. The NKDEGs included in this prognostic model and their associated gene coefficients are detailed in Table S2. We then calculated risk scores for each PAAD sample in the TCGA-PAAD dataset. Samples were categorized into low and high-risk groups using a 1:1 cutoff value. The model’s predictive accuracy for PAAD prognosis was assessed through Kaplan–Meier survival analysis and ROC curve analysis, utilizing R packages survival [[51]18] and timeROC. Additionally, we validated the robustness of the NKDEGs model using the ICGC dataset. Functional enrichment analysis of different risk score groups Differential gene expression analysis between high-risk and low-risk groups in TCGA-PAAD was carried out using the limma package in R, with thresholds set at |logFC|> 1 and FDR < 0.05. For exploring the molecular mechanisms underlying the PAAD risk score groups, we employed clusterProfiler [[52]18] in R for KEGG and GO pathway analyses. Additionally, Gene Set Enrichment Analysis (GSEA) was performed to evaluate biological functional differences between low and high-risk subgroups, with criteria set at |NES|> 1 and FDR < 0.05. Assessing the clinical relevance of NKDEGs models To establish that the risk score serves as an independent prognostic factor for PAAD, we performed both univariate and multivariate Cox regression analyses (FDR < 0.05) to examine the association between the risk score and various clinicopathological factors of PAAD, utilizing the survival package in R. Following this, we employed the survminer [[53]18] and timeROC [[54]18] packages in R to generate ROC curves, which illustrate the comparative effectiveness of the risk score against other clinicopathological factors in PAAD prognosis. Additionally, we visualized the relationships between the risk score and other clinicopathological variables using the ComplexHeatmap and reshape2 packages in R. Evaluate the ability of the NKDEGs model to characterize the immune microenvironment of PAAD We employed a range of algorithms, including TIMER [[55]19], Estimation [[56]19], ssGSEA [[57]19], MCP Counter [[58]19], EPIC [[59]19], CIBERSORT [[60]19], and QUANTISEQ [[61]19], to determine the relative levels of tumor-infiltrating immune cells in TCGA-PAAD samples. Additionally, both ssGSEA and CIBERSORT algorithms were used to estimate the abundance of various cell subpopulations within these samples. Furthermore, the ssGSEA algorithm was utilized to evaluate the immune functional status in TCGA-PAAD samples. Assessment of response to immunotherapy Using the R packages limma and reshape2 [[62]19], we identified differential expression of immune checkpoint genes across different risk score groups. Higher expression levels of these genes could be associated with enhanced effectiveness of immune checkpoint inhibitors. To assess the efficacy of immunotherapy, we applied a prognostic model based on the Biomarker Response to Immunotherapy Group (NKDEGs) to a cohort of 348 patients with malignant tumors undergoing immune therapy, as recorded in the IMvigor210 dataset. Identification of novel molecular subtypes of PAAD The R package ConsensusClusterPlus [[63]19] was utilized for unsupervised consensus clustering to uncover new molecular subtypes of PAAD. This package provides several critical outputs: the Consensus Matrix (CM), the cumulative distribution function (CDF) plot, and the consensus heatmap, all of which are instrumental in determining the optimal number of clusters for PAAD classification. The CDF plot offers insight into the stability of clustering results across different numbers of clusters. The CM, shown as a matrix, reflects the frequency with which sample pairs are consistently grouped together across multiple iterations, serving as a quantitative indicator of clustering stability. The consensus heatmap visually depicts the CM, aiding in the clearer interpretation of clustering outcomes. Statistical analysis All statistical analyses were conducted using R version 4.3.3. The Kruskal–Wallis test was applied to assess variations in immune scores, immune checkpoint gene expressions, and drug sensitivities across different clusters. To compare survival rates between two risk groups, a log-rank test from the R survival package was utilized for Kaplan–Meier (KM) analysis. Statistical tests were two-sided, with p values < 0.05 deemed statistically significant. Significance levels are indicated by asterisks: *p < 0.05, **p < 0.01, ***p < 0.001. To account for multiple testing in differential gene expression and pathway enrichment analyses, the false discovery rate (FDR) was controlled using the Benjamini–Hochberg method. Genes and pathways with FDR-adjusted p values below 0.05 were deemed statistically significant. Additionally, during machine learning-based model construction, stringent feature selection criteria were applied to prevent overfitting. Results NK cell-related immune microenvironmental crosstalk in PAAD We identified NK cell-specific genes following the workflow outlined in Fig. [64]1A. Specifically, we analyzed single-cell data from the PAAD_GSE162708 dataset, performing cell clustering and differential analysis to identify genes differentially expressed across various cell populations (Fig. [65]1B–D). Enrichment analysis of these differential genes using Hallmark Gene-Sets revealed that NK cells are involved in a wide range of PAAD-related functions, including cellular components, development, DNA damage, immune responses, metabolic pathways, proliferation, and signaling (Fig. [66]1E). KEGG enrichment analysis highlighted both known functions and pathways associated with NK cells, such as Natural killer cell mediated cytotoxicity, complement and coagulation cascades, and ECM-receptor interaction. Additionally, it uncovered potentially novel roles in PAAD, such as involvement in ribosome, spliceosome, and T cell receptor signaling pathway, suggesting that NK cells may have a unique role in regulating RNA and mediating T cell signaling in PAAD (Fig. [67]1F). Transcription factors play a crucial role in PAAD progression [[68]20], so we sought to identify NK cell-specific transcription factors to further elucidate their potential functions. Our results identified key transcription factors with specific expression in NK cells, including KMT2 A, MED1, BRD4, ERG, E2 F6, MAF, HEY1, H2 AFZ, STAT1, and CDK9 (Fig. [69]1G, [70]H). Since intercellular communication is likely crucial in PAAD, we employed CellChat analysis to explore potential interactions between NK cells and other cell clusters. We found that Myofibroblasts, Fibroblasts, and CD8 T cells are the most closely connected clusters to NK cells (Fig. [71]1I). Additionally, as donors, NK cells may regulate Myofibroblasts, Fibroblasts, and CD8 T cells through the ADGRE5-CD55 interaction (Fig. [72]1J). Conversely, as recipients, NK cells might be strongly regulated by B cells, CD8 T cells, Endothelial cells, malignant cells, and Myofibroblasts via HLA-E-KLRC2 and HLA-E-CD94/NKG2 C interactions (Fig. [73]1K). These findings suggest that NK cells not only act as effectors in the traditional B cell and T cell immune system but may also regulate the functions of these cells. Fig. 1. [74]Fig. 1 [75]Open in a new tab NK cell-related immune microenvironmental crosstalk in PAAD. A A detailed flowchart outlines the process for identifying and validating the prognostic model associated with NKDEGs. B The cellular map depicts the distribution and abundance of different cell subgroups (B, CD8 T, Endothelial, Fibroblasts, Malignat, Mast, Mono/Macro, Myofibroblasts, NK cells) within PAAD. C The pie chart shows the precise counts of each cell subgroup (B, CD8 T, Endothelial, Fibroblasts, Malignat, Mast, Mono/Macro, Myofibroblasts, NK cells) in PAAD. D The bar graph illustrates the proportional distribution of each cell subgroup (B, CD8 T, Endothelial, Fibroblasts, Malignat, Mast, Mono/Macro, Myofibroblasts, NK cells) among individual PAAD patients. E The heatmap presents the HALLMARK gene sets regulated by each cell subgroup (B, CD8 T, Endothelial, Fibroblasts, Malignat, Mast, Mono/Macro, Myofibroblasts, NK cells). F The heatmap displays the KEGG gene sets regulated by each cell subgroup (B, CD8 T, Endothelial, Fibroblasts, Malignat, Mast, Mono/Macro, Myofibroblasts, NK cells). G The heatmap highlights key transcription factors that are differentially expressed across various cells within PAAD. H The dot plot reveals significantly expressed transcription factors in NK cells of PAAD, including KMT2 A, MED1, BRD4, ERGE2 F6, MAF, HEY1H2 AFZ, STAT1, and CDK9. I Using CellChat, the interaction probability between specific cell groups and other cell groups is visualized. J Using CellChat, the interaction probability between NK cells as donors and specific gene pairs in other cells is illustrated. K Using CellChat, the interaction probability between NK cells as recipients and specific gene pairs in other cells is depicted Construction of NKDEGs prognostic model of PAAD based on multiple machine learning models To elucidate the role of NKDEGs in PAAD, we first analyzed all differentially expressed genes in PAAD tissues using the TCGA-PAAD dataset, and presented the expression levels of the top 50 genes across all samples (Fig. [76]2A). Additionally, we performed univariate Cox regression analysis and identified 29 NKDEGs strongly associated with PAAD prognosis (Fig. [77]2B). Leveraging these NKDEGs, we developed accurate prognostic models for PAAD using 101 different machine learning algorithms, and validated these models with data from the PACA-AU dataset in the ICGC database. The results indicated that NKDEGs demonstrated robust prognostic prediction capability for PAAD patients in both the TCGA-PAAD and ICGC validation datasets, with the CoxBoost + Ridge model achieving the highest predictive accuracy (mean AUC of 0.69) (Fig. [78]2C, Table S1). Consequently, we selected the CoxBoost + Ridge model to construct our prognostic model, extracting weights for 11 specific NKDEGs to calculate the risk scores for each PAAD sample (Table S2). Survival curves revealed that the risk scores effectively predicted PAAD patient outcomes in both training and validation sets (Fig. [79]2D, [80]E). Moreover, ROC curves confirmed that the NKDEGs prognostic model has excellent predictive accuracy (Fig. [81]2F, [82]G). Fig. 2. [83]Fig. 2 [84]Open in a new tab Establishment of prognostic model of NKDEGs in PAAD. A The heatmap illustrates the expression levels of NKDEGs associated with PAAD prognosis in both PAAD samples and paired adjacent non-tumor samples. B The univariate Cox regression analysis highlights the NKDEGs related to PAAD prognosis. C The NKDEGs prognostic model was developed using 101 machine learning models (training set: TCGA, validation set: ICGC). D The survival curves for the TCGA cohort are shown based on risk scores derived from the NKDEGs prognostic model. Based on patient risk scores assessed by the model, the graphs indicate that the survival prognosis for the high-risk group (red line) is significantly worse than that for the low-risk (blue line). E The survival curves for the ICGC cohort are depicted based on risk scores derived from the NKDEGs prognostic model. Based on patient risk scores assessed by the model, the graphs indicate that the survival prognosis for the high-risk group (red line) is significantly worse than that for the low-risk (blue line). F The ROC curve for the NKDEGs prognostic model risk scores in the TCGA cohort is presented. G The ROC curve for the NKDEGs prognostic model risk scores in the ICGC cohort is shown Clinical correlation analysis of NKDEGs prognostic model for PAAD Next, we analyzed the clinical relevance and potential clinical utility value of the NKDEGs prognostic model. Univariate Cox regression analysis revealed that age, grade, and risk score are all prognostic factors for PAAD. Notably, the hazard ratio (HR) for the risk score (162.964; 95% CI: 41.361–642.075) was significantly higher than that of other clinical pathological factors (Fig. [85]3A). Multivariate Cox regression analysis indicated that only age and risk score served as independent prognostic factors for PAAD (Fig. [86]3B). Concordance analysis demonstrated that the risk score accurately predicted the survival of PAAD patients across different time frames, outperforming other clinicopathological factors (Fig. [87]3C). Subsequently, we incorporated the significant variables, age and risk score, from the multivariate Cox regression analysis into a nomogram. This nomogram provides a straightforward and precise method for predicting the survival of PAAD patients, thereby offering valuable guidance for clinical decision-making (Fig. [88]3D, [89]E). Clinical relevance analysis showed that PAAD patients with high-risk scores were more likely to have advanced tumor stage and higher T classification (Fig. [90]3F, [91]G). There was also an apparent association between high-risk scores and high grade, although it did not reach statistical significance due to limited sample size (p = 0.054) (Fig. [92]3F, [93]G). Furthermore, we assessed the applicability of the NKDEGs prognostic model across various clinical subgroups. The results indicated that the NKDEGs prognostic model exhibited strong predictive capability for PAAD patient survival across different age, gender, grade, and stage groups (Fig. [94]3H–K), highlighting the model’s consistency within diverse clinical subpopulations. Fig. 3. [95]Fig. 3 [96]Open in a new tab Clinical correlation analysis of NKDEGs prognostic model in PAAD. A Single-factor Cox regression analysis examines the relationship between individual variables(age, gender, tumor grade, cancer stage, and risk scores) and overall survival (OS) in PAAD patients. B Multi-factor Cox regression analysis assesses whether risk scores and other clinicopathological factors can serve as independent prognostic factors for PAAD. C The C-index curve compares the concordance index (C-index) of risk scores with other clinical variables (age, gender, tumor grade, and stage). Risk scores demonstrate the highest predictive power for PAAD prognosis over time. D The nomogram integrates age and risk scores to predict individual probabilities of 1-, 3-, and 5-year survival. E Displays the agreement between predicted and observed survival rates for 1, 3, and 5 years, validating the model’s predictive accuracy with a high C-index (0.708). F The heatmap visualizes the distribution of clinical (age, gender) and pathological (tumor grade, stage, T, M, N categories) features across low- and high-risk score groups. Notable clustering patterns emerge, showing differences in clinical characteristics based on risk levels. G The bubble plot highlights the proportions and correlations between risk score groups (high vs. low) and clinical variables such as age, gender, grade, and tumor-node-metastasis (TNM) classification. H Kaplan–Meier survival curve stratifies PAAD patients younger than 65 and older than 65, showing significantly worse OS in the high-risk group for both age categories (p < 0.001 and p = 0.006, respectively). I Kaplan–Meier survival curve separates patients by gender (male and female), demonstrating that high-risk patients consistently have poorer OS across both groups (p < 0.001 and p = 0.002). J Kaplan–Meier survival curve analyzes tumor grade (G1 - 2 vs. G3 - 4), with high-risk scores predicting worse OS in both categories (p < 0.001 and p = 0.036). K Kaplan–Meier survival curve examines tumor stage (Stage I–II vs. Stage III–IV), indicating that high-risk patients have worse OS, with significant differences in both early- and late-stage disease (p < 0.001 and p = 0.023). *p < 0.05, **p < 0.01, ***p < 0.001 Characterization and analysis of NKDEGs prognostic model genes We further analyzed the genes included in the NKDEGs prognostic model to gain a better understanding of the role of NK cells in PAAD and why NKDEGs are effective in predicting PAAD patient outcomes. Our in-depth analysis of the [97]GSE162708 single-cell dataset revealed that IL32, CLEC2B, UCP2, EVL, and SATB1 are significantly upregulated in NK cells, whereas PSAP, SEZ6L2, TSPAN8, EFNB2, ITGA6, and ARHGAP29 are significantly downregulated in NK cells (Fig. [98]4A). To validate our findings, we selected IL32, the most representative NK-related differentially expressed gene, for experimental verification due to its highest fold change in NK cells (log2 FC = 1.56) and predominant expression in this cell type. qRT-PCR confirmed that IL32 expression was significantly elevated in the NK cell line NK- 92 compared to PAAD cell lines, including BXPC- 3, PANC- 1, SW1990, ASPC1, and COLO357 (Figure S1 A). Functional assays using IL32 knockdown and overexpression in NK- 92 cells co-cultured with PANC- 1 demonstrated that IL32 overexpression promoted PAAD cell proliferation, whereas knockdown suppressed it (Figure S1B). These findings provide compelling evidence that IL32, highly expressed in NK cells, plays a crucial role in PAAD progression. Survival analysis indicated that high expression of IL32, CLEC2B, TSPAN8, EFNB2, ITGA6, and ARHGAP29 correlates with poorer survival in PAAD patients, while the remaining genes are associated with better survival outcomes (Fig. [99]4B). We also investigated the correlations among the NKDEGs prognostic model genes to assess their potential synergistic effects. Our findings showed that SATB1 and ARHGAP29 have the highest correlation (0.53), while TSPAN8 and EFNB2 exhibit correlations with almost all other NKDEGs prognostic model genes, suggesting they may function as hub genes (Fig. [100]4C). Additionally, using GeneMANIA, we characterized the specific interactions and functional relationships among the NKDEGs prognostic model genes. The analysis revealed that these genes primarily regulate pathways related to the regulation of cell activation and positive regulation of leukocyte activation, which further underscores the potential role of NK cells in PAAD. Fig. 4. [101]Fig. 4 [102]Open in a new tab Characteristics of NKDEGs prognostic model genes. A A comprehensive reanalysis of the single-cell dataset [103]GSE162708 showcases the expression patterns of NKDEGs prognostic model genes in various cell clusters. B Survival analysis curves visually represent the significant influence of NKDEGs prognostic model genes on the outcomes for PAAD patients. C The gene correlation analysis reveals the interrelationships among the NKDEGs prognostic model genes. D Insights derived from the Genemania database analysis illustrate the functional regulation of NKDEGs prognostic model genes within the context of PAAD. *p < 0.05, **p < 0.01, ***p < 0.001 Functional enrichment in different risk score groups for PAAD To investigate the unique characteristics of various risk score groups and clarify the potential role of NK cells in PAAD regulation, we conducted a differential gene expression analysis using the TCGA-PAAD dataset. This analysis identified 58 upregulated genes and 210 downregulated genes (Fig. [104]5A), with the top fifty differentially expressed genes highlighted (Fig. [105]5B). Subsequent Gene Ontology (GO) analysis suggested that NK cells may play a role in key cellular molecular functions and biological processes in PAAD, including membrane potential regulation, signal release, and hormone secretion (Fig. [106]5C, [107]D). We then performed KEGG enrichment analysis to identify pathways differentially represented across risk score groups. This analysis revealed significant pathways such as Neuroactive Ligand-Receptor Interaction and the MAPK signaling pathway (Fig. [108]5E). Finally, Gene Set Enrichment Analysis (GSEA) was used to assess the importance of these enriched pathways. The results showed that, in the high-risk score group, pathways like aminoacyl-tRNA biosynthesis and the cell cycle were prominently enriched (Fig. [109]5F). Conversely, in the low-risk score group, the enriched pathways included primary immunodeficiency and steroid hormone biosynthesis (Fig. [110]5F). This comprehensive analysis provides valuable insights into the molecular mechanisms underlying different risk profiles in PAAD and the involvement of NK cells. Fig. 5. [111]Fig. 5 [112]Open in a new tab Functional enrichment analysis of different risk score groups. A The volcano plot displays the genes with differential expression across high and low risk patients. Genes highlighted in red have a fold change greater than 2 and an FDR less than 0.05, while those in green have a fold change less than − 2 and an FDR below 0.05. B The heatmap illustrates the expression levels of the top fifty differentially expressed genes among different risk score groups in PAAD samples. C The bubble plot highlights the pathways from the GO enrichment analysis, showing their proportions across different risk score groups. Key pathways include those involved in immune response regulation, extracellular matrix organization, and cellular metabolic processes. D The circular diagram provides a comprehensive view of GO enrichment analysis, categorizing pathways into three domains: Biological Processes (BP), Cellular Components (CC), and Molecular Functions (MF). Each section indicates the number of associated genes, with statistical significance (adjusted p values) represented by a color gradient. This visualization emphasizes the interconnected roles of pathways, including those involved in immune system processes, signal transduction, and tumor microenvironment remodeling. E The bubble plot summarizes the results of KEGG pathway enrichment analysis for different risk score groups. Notable pathways include the MAPK signaling pathway, neuroactive ligand-receptor interaction, and insulin secretion, which are closely linked to tumor growth, metastasis, and metabolic dysregulation in PAAD. F GSEA analysis identifies the key pathways enriched, including epithelial-mesenchymal transition (EMT) and immune-related processes in PAAD patients. G The box plot compares the tumor mutation burden (TMB) between high- and low-risk groups. High-risk patients exhibit a significantly elevated TMB (p = 0.033), suggesting that genetic instability correlates with higher risk scores. H Pearson correlation analysis reveals the relationship between risk scores and tumor mutation burden in PAAD patients. The positive correlation (R = 0.18, p = 0.007) indicates that higher risk scores are associated with increased TMB, suggesting a potential link between genetic mutations and risk stratification. I Analysis of somatic mutation data highlights the mutation rates of key genes in PAAD patients. The color-coded mutations (e.g., missense mutations, frameshift deletions) highlight the mutation frequencies, with high-risk patients exhibiting higher overall mutation rates. J The Kaplan–Meier survival curve evaluates the prognostic impact of TMB alone. Patients are divided into high- and low-TMB groups, with high TMB associated with worse overall survival (p = 0.009). K The Kaplan–Meier curve combines TMB and risk score groups to assess their joint impact on survival. Four subgroups are defined: TMB-high/risk-high, TMB-low/risk-high, TMB-high/risk-low, and TMB-low/risk-low. Patients in the TMB-high/risk-high group exhibit the poorest survival, while the TMB-low/risk-low group has the best prognosis (p < 0.001). *p < 0.05, **p < 0.01, ***p < 0.001 Mutations in critical genes within tumor cells are strongly linked to the progression and prognosis of PAAD [[113]21]. We sought to investigate whether the functionality of NK cells in PAAD is associated with these mutations. Our findings revealed that patients in the high-risk score group exhibited a higher tumor mutation burden (TMB), with a positive correlation between risk scores and TMB (Fig. [114]5G, [115]H). Further analysis showed that individuals with high-risk scores had notably higher somatic mutation rates in essential genes, such as KRAS (72% vs. 47%) and TP53 (65% vs. 49%), compared to those with low-risk scores (Fig. [116]5I). Additionally, PAAD patients with elevated TMB experienced significantly poorer prognoses than those with lower TMB (Fig. [117]5J). When evaluating prognosis using both TMB and risk scores, patients with both high TMB and high-risk scores had the poorest outcomes, while those with low TMB and low-risk scores had the best outcomes. Patients with either high TMB or high-risk scores alone had intermediate prognoses (Fig. [118]5K). These results suggest a potential link between NK cell functionality and somatic mutations in PAAD, and indicate that combining the NKDEGs prognosis model with TMB could improve the prediction of patient outcomes. Assessment of immune microenvironment in different risk score groups of PAAD NK cells, as essential immune components, are likely crucial in regulating the immune microenvironment of pancreatic ductal adenocarcinoma (PAAD). An analysis of the immune microenvironment reveals that patients with high-risk scores in PAAD tend to have a lower ESTIMATE score (p = 0.051), though there are no significant differences in StromalScore and ImmuneScore (Fig. [119]6A). Regarding immune cell infiltration, a significant negative correlation exists between risk scores and NK cell infiltration, with various B cells and T cells displaying similar trends to NK cells (Fig. [120]6B, [121]C). Using the CIBERSORT algorithm, we identified a significant increase in activated NK cells accompanied by a concurrent depletion of resting NK cell subsets in high-risk PAAD patients. Despite the elevated NK cell levels observed in this group, their prolonged activation—likely induced by sustained exposure to immunosuppressive cytokines (e.g., TGF-β, IL- 10) and metabolic stress within the tumor microenvironment (TME)—may drive aberrant inflammatory responses and contribute to an unfavorable prognosis. Additionally, the responses of T cells and B cells exhibited an inverse trend relative to the overactivation of NK cells (Fig. [122]6D). Further immune correlation analysis revealed a marked negative correlation between activated NK cells and the responses of B cells and T cells, indicating potential intercellular communication, consistent with findings in Fig. [123]1 (Fig. [124]6E). Additionally, we explored the potential regulatory role of NKDEGs prognostic model genes in intercellular communication and identified SEZ6L2, CLEC2B, and EFNB2 as key genes potentially regulating NK cell interactions with other cells (Fig. [125]6F). Fig. 6. [126]Fig. 6 [127]Open in a new tab Assessment of the immune microenvironment in different risk score groups. A The violin plot illustrates the StromalScore, ImmuneScore, and ESTIMATE scores across different risk score groups. B The heatmap shows the distribution and proportions of immune cells across various risk score groups, as determined by the CIBERSORT algorithm. C The bubble plot reveals the correlation between risk scores and immune cell infiltration in PAAD samples. D The box plot displays the abundance of immune cells across different risk score groups, as calculated using the CIBERSORT algorithm. E The heatmap highlights the correlations between various immune cells and functions in PAAD. F The heatmap illustrates the regulatory interactions of NKDEGs prognostic model genes with different immune cells and functions in PAAD. G The box plot presents the expression levels of immune checkpoint genes across different risk score groups. H Survival analysis evaluates the response and efficacy of immune therapy across different risk score groups within the IMvigor210 immune therapy cohort. I The box plot shows the variation in risk scores among different immune therapy response cohorts. J The association between TCGA immune subtypes and NKDEGs prognostic subtypes is examined. *p < 0.05, **p < 0.01, ***p < 0.001 We next evaluated the potential of the NKDEGs prognostic model in guiding clinical immunotherapy. Our analysis revealed that several immune checkpoint genes are upregulated in high-risk patients, suggesting that these individuals with PAAD might be more responsive to immune checkpoint inhibitors (Fig. [128]6G). Additionally, drug sensitivity analysis indicated that high-risk patients might respond better to targeted therapy with BI- 2536, while those in the low-risk group are more likely to benefit from AZD5363, Afuresertib, and BMS- 754807 (Figure S2 A). In the IMvigor210 immunotherapy cohort, we confirmed that high-risk patients experience notably poorer outcomes from immunotherapy (Fig. [129]6H). Furthermore, we observed a trend indicating that patients with immune therapy responses (CR/PR) tend to have lower risk scores compared to non-responders (SD/PD), although this trend did not reach statistical significance due to sample size limitations (Fig. [130]6I). Moreover, the TCGA official team has classified PAAD into four subtypes: Wound Healing (Immune C1), IFN-gamma Dominant (Immune C2), Inflammatory (Immune C3), and TGF-beta Dominant (Immune C6), which has become a significant reference [[131]22]. Our findings demonstrate that the NKDEGs prognostic model can effectively differentiate among these immune subtypes, particularly showing a significant increase in Immune C1 patients within the high-risk group, and a notable decrease in Immune C3 and C6 patients (Fig. [132]6J). Novel molecular classification based on 11-NKDEGs The therapeutic plateau of current PAAD immunotherapies, evidenced by a universal 100% non-durable response rate, necessitates novel stratification paradigms accounting for immune heterogeneity [[133]23]. Leveraging our 11-NKSG prognostic signature, we implemented a consensus clustering framework (ConsensusClusterPlus) to delineate molecular subtypes. Cluster stability optimization via Monte Carlo cross-validation identified k = 4 as the inflection point, demonstrating minimal cluster disassociation index and maximum proportional area change in cumulative distribution function analysis (Fig. [134]7A, [135]B). The sample distribution across different k values is presented in Fig. [136]7C, with the consistency matrix heatmap showing consistent blue shading when k = 4 (Fig. [137]7D). Kaplan–Meier survival analysis delineated a striking dichotomy: the C4 subtype exhibited superior median overall survival, while C1-C3 shared comparable mortality trajectories (Fig. [138]7E). A Sankey diagram illustrated the connection between the novel molecular subtypes and the prognosis model categories (Fig. [139]7F). Both PCA and tSNE analyses clearly differentiated PAAD patients according to the newly identified molecular subtypes (Fig. [140]7G, [141]H). Further, we examined whether the molecular subtyping approach could distinguish PAAD patients based on their immune characteristics. Immune microenvironment analysis revealed that the C1 group exhibited elevated ESTIMATE, Stromal, and ImmuneScores, while patients in the C4 group had lower scores, and those in the C2 and C3 groups showed intermediate values (Fig. [142]7I). Tumor Purity exhibited an inverse trend (Fig. [143]7I). Regarding immune cell infiltration, the C4 group exhibited the most pronounced NK cell infiltration and T cell activation, suggesting that the potent anti-tumor effects of NK cells might be a key factor in their favorable prognosis (Fig. [144]7J). Conversely, C1 group patients showed extensive activation of various immune cells, which might be beneficial for tumor killing but could also contribute to the poor prognosis due to excessive inflammation (Fig. [145]7J). In contrast, C3 group patients exhibited suppressed immune cells, representing an “immunologically desert” state with a lack of anti-tumor immune response (Fig. [146]7J). The C2 group showed no distinct immune cell characteristics, which might indicate other molecular mechanisms contributing to their poor prognosis (Fig. [147]7J). Furthermore, we analyzed the expression of immune checkpoint genes across different molecular subtypes. Patients in each PAAD molecular subtype could potentially benefit from targeted immune checkpoint inhibitors based on their characteristic gene expression profiles (Fig. [148]7K). We also identified potential targeted drugs suitable for each molecular subtype. C1 group patients were most likely to benefit from GSK343, C2 group patients from Sapitinib, C3 group patients from AZD6738, and C4 group patients from Linsitinib (Figure S3). Fig. 7. [149]Fig. 7 [150]Open in a new tab Novel molecular subtyping for identification of PAAD based on NKDEGs. A Delta area curves for varying numbers of classifications. B Cumulative Distribution Function (CDF) curves for different classification numbers, where each curve represents the model stability at different K values. C The heatmap displays the distribution of PAAD patients across different K values. D The consistency clustering plot shows the clustering results when K equals 4. E Survival curves illustrate the prognosis of PAAD patients with different molecular subtypes. F The Sankey diagram illustrates the correspondence between different molecular subtypes of PAAD patients and various risk score groups. G PCA analysis reveals the distribution of samples across different PAAD molecular subtypes. H t-SNE analysis shows the distribution of samples across different PAAD molecular subtypes. I The box plot demonstrates the ESTIMATE score, StromalScore, ImmuneScore, and tumor purity across different PAAD molecular subtypes. J The heatmap displays immune cell infiltration status across different PAAD molecular subtypes, based on various algorithms. K The box plot shows the expression levels of immune checkpoint genes across different PAAD molecular subtypes. *p < 0.05, **p < 0.01, ***p < 0.001 Discussion NK cells have been confirmed to have antitumor effects, but because pancreatic ductal adenocarcinoma (PAAD) is considered a “cold tumor” with a unique immune microenvironment, its response to immunotherapy is extremely limited [[151]24, [152]25]. Thus, identifying targets to harness the antitumor effects of NK cells and exploring their specific roles in PAAD is crucial for advancing treatment strategies [[153]26]. In this study, we analyzed the single-cell dataset PAAD_GSE162708 and identified NKDEGs. Enrichment analysis of these NKDEGs revealed that NK cells may regulate PAAD progression through pathways such as ribosome, spliceosome, and T cell receptor signaling pathway—areas that are not fully understood but have been associated with cancer progression in other contexts [[154]27, [155]28]. Additionally, we identified cell clusters interacting with NK cells, with myofibroblasts, fibroblasts, and CD8 T cells being the most closely associated. Subsequent analysis of the immune microenvironment revealed a positive correlation between NK cell infiltration and the presence of T cells and B cells, supporting the notion that NK cells are regulated by T cell and B cell signaling. However, there was a negative correlation between the activation of NK cell functions and those of T cells and B cells. This is intriguing because the unique immune evasion mechanisms of PAAD lead to T cell and B cell functional suppression, yet NK cell activity remains active, suggesting the involvement of additional regulatory mechanisms in PAAD. Notably, the Neuroactive ligand-receptor interaction and MAPK signaling pathways emerge as central enriched pathways within NK cell signature genes (NKDEGs). The former exerts its influence on NK cell cytotoxic functions through specific receptors and ligands, such as Killer Immunoglobulin-like Receptors (KIRs) and HLA class I molecules [[156]29]. On the other hand, the MAPK signaling pathways, including the ERK/MEK and p38 MAPK cascades, are pivotal in transducing extracellular signals into intracellular responses. These pathways enhance NK cell development, survival, and function, especially through the activation of NK cell receptors and the upregulation of ligands such as MICA/B, facilitating NK-mediated tumor cell lysis [[157]30]. Furthermore, in PAAD, NK cells act not only as effectors in the traditional B cell and T cell immune system but may also regulate these cells’ functions. Given that NK cells are the only activated immune cells in PAAD, targeting NK cells for immunotherapy underscores their significant therapeutic potential. We also revealed the impact of NK cells on the prognosis of PAAD. Machine learning (ML) is increasingly being used in clinical oncology to diagnose cancer, predict patient outcomes, and guide treatment plans [[158]31]. While liquid biopsy holds the potential to revolutionize cancer diagnosis, molecular analysis of solid tumors is currently more advanced, offering high-resolution molecular and clinical information that can better describe cancer prognosis [[159]32]. By leveraging NKDEGs and existing 101 machine learning models, we developed an accurate prognostic model for PAAD. Among these, the CoxBoost+ Ridge model exhibited the best performance, achieving the highest C-index in both cohorts (0.735 for PCA-AU, 0.644 for TCGA). The model’s validity was further substantiated through its application to the ICGC dataset. Our findings indicate that risk scores can effectively predict survival time in PAAD patients. Compared to traditional clinical pathological features, risk scores are an independent prognostic factor with predictive power far surpassing that of classic TNM staging or stage. From a clinical perspective, we constructed a nomogram using age and risk scores—both independent prognostic factors—to help clinicians accurately predict PAAD patient outcomes. This nomogram demonstrated that risk scores are broadly applicable across various clinical subgroups. To get a better understanding of the role of NKDEGs in PAAD, survival analysis combined with expression analysis indicated that pro-tumor genes such as IL32, CLEC2B were upregulated, while TSPAN8, EFNB2, ITGA6, and ARHGAP29 were downregulated. Overexpression of IL32 has been seen as a hallmark of PAAD development and progression as its association with increased inflammation, which may contribute to chronic inflammation and enhanced angiogenesis and extracellular matrix remodelling [[160]33, [161]34].As to CLEC2B, limited studies have shown that it may play a role in interacting with natural killer (NK) cells, potentially influencing their cytotoxic activity against tumor cells but still need further research [[162]34]. Notably, while UCP2 is traditionally linked to tumor progression and poor prognosis in many cancers due to its role in reducing oxidative stress and supporting metabolic reprogramming, our findings reveal a potential association with improved prognosis in PAAD. This discrepancy highlights the complex and context-dependent nature of UCP2 function within the unique tumor microenvironment of PAAD. Future studies are warranted to elucidate the precise mechanisms underlying this association, particularly its interaction with stromal and immune components. Additionally, our NKDEGs prognostic model is applicable to immunotherapy cohorts, where patients with high-risk scores showed poorer responses to immune treatment. However, our analysis of immune checkpoint genes and drug sensitivity indicated that high-risk PAAD patients could still benefit from targeted therapy with BI- 2536, providing valuable guidance for clinical decision-making. Beyond this, the NKDEGs identified in our study—IL32, CLEC2B, UCP2, EVL, SEZ6L2, TSPAN8, EFNB2, ITGA6, ARHGAP29, and SATB1—highlight potential avenues for NK-targeted therapeutic strategies. For instance, IL32 could be leveraged to enhance NK cell activation, while targeting EFNB2 and ITGA6 might promote NK cell infiltration into the tumor microenvironment. Emerging therapies such as adoptive NK cell transfer and checkpoint inhibitors for NK cells could also be optimized based on our findings. For example, NKDEGs like CLEC2B, involved in immune synapse formation, and ARHGAP29, regulating cellular signaling, may inform the design of more effective CAR-NK cells or combination therapies [[163]35, [164]36]. These insights not only reinforce the utility of our NKDEGs prognostic model but also underscore its potential to guide the development of innovative NK cell-targeted therapies. A significant reason for the suboptimal efficacy of immunotherapy in PAAD is the complexity and variability of the immune microenvironment among different PAAD patients, coupled with the lack of effective molecular subtyping to distinguish between various subtypes for personalized and precise treatment [[165]37]. Utilizing NKDEGs prognostic model genes, we identified four reliable molecular subtypes of PAAD. Analysis of the immune microenvironment revealed that patients in the C1 group had higher ESTIMATE scores, StromalScores, and ImmuneScores, while those in the C4 group had lower scores, with C2 and C3 groups falling in between. TumorPurity showed an opposite trend. In terms of immune cell infiltration, C1 was characterized as the immune hyperactivation type, with robust immune activity yet a poorer prognosis due to potentially dysfunctional immune responses. C2 and C3 were identified as the immune-inert and immune-desert types, respectively, each exhibiting low immune activity and poor prognoses. In contrast, the C4 group demonstrated the highest NK cell infiltration and T cell activation, suggesting that NK cell-mediated tumor killing may play a critical role in the improved prognosis observed in this group. Based on the aforementioned observations, we observed an increase in resting NK cells among patients with low-risk scores. Therefore, it is reasonable to hypothesize that NK cells in the C4 subgroup may also be in a resting state, essentially in a “primed” state, awaiting activation in response to specific signals, such as the presence of infected or malignant cells [[166]38–[167]40].These findings align with existing clinical classifications of PAAD, where subtypes resembling C1 and C4 may correspond to “Classical” types associated with better outcomes, and C2 and C3 align with “Basal-like” subtypes, known for their aggressive nature and poor survival [[168]41, [169]42]. Additionally, the C4 subtype’s strong NK cell activity suggests the potential for future clinical trials targeting NK cells to improve outcomes in high-risk subtypes such as C2 and C3. Therapeutic strategies, such as enhancing NK cell cytotoxicity through immune checkpoint blockade (e.g., targeting KIRs or NKG2 A) or adoptive NK cell therapies, hold promise for overcoming the immunosuppressive environments in these subtypes. Preclinical studies using subtype-stratified models could validate these approaches, further advancing personalized immunotherapy in PAAD [[170]43, [171]44]. Furthermore, the identification of molecular subtypes opens avenues for targeted therapeutic strategies. Based on immune checkpoint gene expression and drug sensitivity analyses, potential treatments were proposed: GSK343 for C1, Sapitinib for C2, AZD6738 for C3, and Linsitinib for C4. These therapies reflect the unique molecular and immune characteristics of each subtype and provide a framework for future personalized interventions. While our study provides novel insights into NK cell-related prognostic modeling in PAAD, several limitations warrant consideration. First, the exclusive reliance on retrospective databases (TCGA, ICGC) introduces potential selection bias, particularly regarding underrepresentation of ethnic diversity and regional healthcare disparities. Second, although we employed rigorous machine learning approaches across 101 algorithms, the observational nature of the data precludes causal inference between NKDEGs and clinical outcomes, and residual confounding by unmeasured variables such as chemotherapy adherence and comorbidities remains possible. While this study identified 11 NKDEGs significantly associated with PAAD prognosis, their precise roles in tumor progression, molecular interactions, and immune regulation remain incompletely understood. Although functional validation confirmed that IL32, highly expressed in NK cells, promotes PAAD cell proliferation, the contributions of other NKDEGs warrant further investigation. Comprehensive in vitro and in vivo studies, including gene overexpression, knockdown, and co-culture assays, are required to elucidate their mechanistic roles in tumor progression and modulation of the immune microenvironment. Finally, while our immunotyping strategy shows clinical promise, its generalizability requires validation in prospective multicenter cohorts with standardized immunotherapy regimens. Conclusion In summary, our study suggests that NK cells in PAAD may exert their tumoricidal effects through, but not limited to, synergistic interactions with B cells and T cells. The unique regulatory mechanisms of NK cells in PAAD warrant further investigation. Additionally, we have developed a reliable NKDEGs prognostic model and identified new molecular subtypes, providing important and valuable insights for the diagnosis, prognosis, and personalized treatment strategies of PAAD. Supplementary Information [172]12672_2025_2266_MOESM1_ESM.tif^ (2.9MB, tif) Additional file1 Expression and functional validation of IL32 in PAAD [173]12672_2025_2266_MOESM2_ESM.tif^ (1.6MB, tif) Additional file2 Drug sensitivity analysis of different risk score groups and immune characterization of NKDEGs prognostic model genes. A) The box plot illustrates the sensitivity of different risk score groups to various drugs. B) The association between NKDEGs prognostic model genes and immune cells and functions. *p < 0.05, **p < 0.01, ***p < 0.001 [174]12672_2025_2266_MOESM3_ESM.tif^ (1.9MB, tif) Additional file3 Drug sensitivity analysis of different PAAD molecular subtypes. A) The box plot illustrates the sensitivity of different PAAD molecular subtypes to various drugs. *p < 0.05, **p < 0.01, ***p < 0.001 [175]12672_2025_2266_MOESM4_ESM.xlsx^ (13.6KB, xlsx) Additional file4 Model Construction and Performance [176]12672_2025_2266_MOESM5_ESM.xlsx^ (8.6KB, xlsx) Additional file5 11 specific NKDEGs and Their Associated Risk Scores Acknowledgements