Abstract

Pancreatic cancer (PC) is a highly aggressive and fatal malignancy, primarily affecting older males. Curcumin, a potential anti-cancer agent, has been shown to regulate key molecules in cancer progression, but its specific mechanisms in PC remain unclear. We conducted a comprehensive database search to identify curcumin-related targets in PC. Gene expression and immune correlations were analyzed using the GEO database to identify differentially expressed hub genes (DEHGs). Machine learning was used to select feature genes and build a nomogram, with external datasets and molecular docking providing preliminary validation. Consensus clustering and subgroup comparisons were also performed based on DEHG expression. We identified 35 DEHGs strongly associated with immune cell infiltration. Five feature genes (VIM, CTNNB1, CASP9, AREG, HIF1A) were used to build a nomogram, with the classification model showing AUC values above 0.9 in both training and validation groups. Molecular docking highlighted potential curcumin binding sites on the proteins encoded by the five feature genes. Clustering analysis categorized PC samples into four distinct subgroups: C1 and CII, which showed high expression and elevated immune cell infiltration, and C2 and CI, which exhibited the opposite pattern. DEHG scores differed significantly between C1 and C2 and between CI and CII. Curcumin may target DEHGs to influence PC, regulating immune and tumor proliferation mechanisms. These findings provide potential insights for clinical applications and future research.

Supplementary Information

The online version contains supplementary material available at 10.1038/s41598-025-05346-w.
Keywords: Curcumin, Pancreatic cancer, Machine learning, Nomogram, Network pharmacology

Subject terms: Cancer microenvironment, Cancer, Biomarkers, Gastroenterology, Oncology, Risk factors

Introduction

Pancreatic cancer (PC) is a highly lethal tumor with a poor prognosis^1. It ranks as the seventh leading cause of cancer-related death globally and is anticipated to surpass breast cancer by 2025^2. PC is characterized by early metastasis and resistance to chemotherapy and radiotherapy, and radical surgical resection remains the primary treatment. However, most cases are diagnosed at advanced stages, with around 50% presenting distant metastases at diagnosis, thus missing the optimal window for surgery^3. Despite recent advances, the 5-year survival rate is still around 7%, emphasizing the need for new biomarkers to improve early detection and management.

Curcumin, a polyphenolic compound extracted from turmeric (Curcuma longa), has been examined as a prospective therapy for PC. It has historically been used as a food ingredient and in medicinal applications^4–7, and preclinical investigations have shown that it exerts anti-inflammatory, antioxidant, and anticancer effects by regulating multiple signaling pathways in various malignancies, including PC^8,9. Curcumin’s safety and low toxicity make it attractive for therapeutic use. Clinical trials have tested curcumin’s anticancer effects, with promising results for its application in cancer treatment^10–14. These findings suggest curcumin as a promising therapeutic for PC. However, current studies are limited by small sample sizes and a lack of mechanistic focus, necessitating further investigation into curcumin’s molecular mechanisms in PC.
Network pharmacology, which analyzes molecular interactions between drugs and diseases from a systemic, biological-network perspective, is an essential approach for identifying active components and elucidating the pathways of traditional Chinese medicine (TCM)^15. Therefore, this investigation used network pharmacology to discover the core targets of curcumin in PC therapy. Expression of the core targets was assessed using Gene Expression Omnibus (GEO) datasets. Feature genes were selected using machine learning models, and a nomogram for PC classification was developed. Molecular docking analysis was performed to explore potential binding sites between curcumin and the core targets. Lastly, clustering analysis examined the role of differentially expressed hub genes (DEHGs) in the subgroup classification of PC, revealing curcumin’s potential mechanisms of action. The overall study procedure is illustrated in Fig. 1.

Fig. 1. The flowchart of this study.

Methods

Components and targets in Curcumin

The structure and isomeric SMILES of curcumin were retrieved from the PubChem database. Target prediction was performed using SwissTargetPrediction (http://www.swisstargetprediction.ch/) and SuperPred (https://prediction.charite.de/). Additionally, target predictions were supplemented using the TCM System Pharmacology Technology Platform (TCMSP, http://tcmspw.com/tcmsp.php), HERB (http://herb.ac.cn/), and DrugBank (https://go.drugbank.com/). The predicted targets were then validated through the UniProt database. Finally, all identified targets were compiled to construct a drug-target database.

PC targets

PharmGKB (https://www.pharmgkb.org/), Online Mendelian Inheritance in Man (OMIM, https://omim.org/), and GeneCards (https://www.genecards.org/) were searched for “Pancreatic cancer” to identify targets associated with PC.
The acquired targets were consolidated, and duplicates were eliminated.

Intersection of therapeutic targets

The intersection of curcumin’s targets and PC-associated targets was identified using the Venn package (version 1.12). These overlapping targets represent potential targets for curcumin in the intervention of PC.

Construction of protein-protein interaction (PPI) network

The PPI network of the intersecting targets was constructed using the STRING database (https://string-db.org), limited to Homo sapiens. A minimum interaction score of 0.40 (medium confidence) was used to include a comprehensive set of potential interactions while maintaining sufficient network connectivity. For PPI network analysis, we applied the MCODE plugin of Cytoscape (version 3.8.0) with the following parameters: degree cut-off = 2, node score cut-off = 0.2, K-core = 2, and max depth = 100, to identify significant clusters within the network. Clusters with more than 10 nodes were selected, resulting in two major clusters (C1 and C2). Sub-networks were constructed for C1 and C2 based on half the average degree of nodes within each cluster. The genes within these clusters were considered key targets potentially involved in curcumin’s mechanism in pancreatic cancer.

Data collection and processing

To obtain samples from the GEO database (https://www.ncbi.nlm.nih.gov/geo/), we searched with the keyword “Pancreatic cancer” and restricted the data type to expression profiling by array and the organism to Homo sapiens. Gene expression data and clinical information were then downloaded. GSE62165 (13 non-tumoral and 118 pancreatic tumor samples) was used as the training set, and GSE71729 (46 normal and 145 primary pancreatic tumor samples) served as the validation set.
The raw data were preprocessed and annotated with official gene symbols, and expression levels of core genes associated with curcumin intervention in PC were extracted and compared between normal and PC samples.

Expression difference of core genes, chromosome position, and expression correlation of significantly differently expressed core genes

We performed differential expression analysis of the previously identified hub genes from the PPI network using the limma package (version 3.56.2). Gene expression profiles from healthy pancreatic tissue and PC samples were visualized using pheatmap (version 1.0.12) and ggpubr (version 0.6.0). Hub genes with an adjusted p-value < 0.05 and |log2 fold change| ≥ 1 were defined as differentially expressed hub genes (DEHGs). The chromosomal locations of these DEHGs were plotted using the RCircos package (version 1.2.2). In addition, the correlation coefficients among DEHGs were calculated using the “cor” function and subsequently visualized.

Infiltration, difference, and correlation of immune cells in PC samples

The proportions of immune cells were estimated from 1,000 simulations with the CIBERSORT tool in R, and the immune cell composition of each sample was displayed as a bar plot. The R packages “GSVA” and “GSEABase” were used to conduct single-sample gene set enrichment analysis (ssGSEA) to evaluate differences in immune cell composition between the healthy and PC groups, and the ssGSEA results were shown as box plots. Finally, correlation coefficients between the ssGSEA scores and DEHG expression were calculated and visualized.
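The DEHG definition given in the differential expression subsection (adjusted p-value < 0.05 and |log2 fold change| ≥ 1) amounts to a two-condition filter over per-gene statistics. The following is a minimal Python sketch of that rule only, not the limma workflow used in the study; the `HubGeneStat` record, the `select_dehgs` function, and the gene names and values are hypothetical placeholders chosen for illustration.

```python
from typing import NamedTuple


class HubGeneStat(NamedTuple):
    """Per-gene summary as a differential expression tool might report it (hypothetical)."""
    gene: str
    log2_fc: float  # log2 fold change, tumor vs. normal
    adj_p: float    # multiple-testing adjusted p-value


def select_dehgs(stats, lfc_cutoff=1.0, p_cutoff=0.05):
    """Return (upregulated, downregulated) gene lists passing both thresholds."""
    up, down = [], []
    for s in stats:
        if s.adj_p < p_cutoff and abs(s.log2_fc) >= lfc_cutoff:
            (up if s.log2_fc > 0 else down).append(s.gene)
    return up, down


# Placeholder values, not results from the study.
example = [
    HubGeneStat("GENE_A", 2.3, 0.001),   # passes, upregulated
    HubGeneStat("GENE_B", -1.0, 0.04),   # passes, downregulated (|log2FC| = 1 is included)
    HubGeneStat("GENE_C", 0.8, 0.0001),  # fails the fold-change cutoff
    HubGeneStat("GENE_D", 1.7, 0.20),    # fails the adjusted p-value cutoff
]
up, down = select_dehgs(example)
```

Splitting by the sign of the fold change mirrors how the Results report DEHGs as highly expressed in either the normal or the PC group.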
Selection of machine learning model and nomogram for curcumin in the treatment of PC

To ensure robust and accurate identification of PC-related biomarkers from high-dimensional gene expression data, we employed four complementary machine learning algorithms: Generalized Linear Models (GLMs), Support Vector Machines (SVMs), Random Forests (RF), and Extreme Gradient Boosting (XGBoost). GLMs provide interpretable statistical associations by modeling both linear and nonlinear effects. SVMs handle high-dimensional data effectively, using kernel functions to focus on informative gene subsets. RF captures complex nonlinear interactions and offers feature importance rankings, while XGBoost enhances predictive performance through gradient boosting and regularization, making it suitable for sparse or imbalanced datasets. This multi-model strategy enhances the stability and reliability of feature selection.

Expression values of the DEHGs were used as input features to develop the four machine learning models (GLM, SVM, RF, and XGB). The GSE62165 dataset served as the training set, while GSE71729 was used as the independent validation set. Model performance was assessed using ROC curves, AUC, reverse cumulative residual plots, and residual box plots. We compared the predictive performance of all four models, and the one with the highest AUC and lowest residual variance was selected as the optimal classifier. Key feature genes among the DEHGs were identified from the residual and ROC analyses. These feature genes and their expression profiles in normal and cancer samples were then used to construct a predictive nomogram. Decision curve analysis (DCA) and calibration plots were further applied to evaluate its clinical utility and accuracy.
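Because model selection above rests largely on the AUC, it may help to recall what that number measures: the probability that a randomly chosen tumor sample receives a higher model score than a randomly chosen normal sample, with ties counted as one half. The Python sketch below computes the AUC directly from that pairwise definition. It is an illustration under assumed inputs, not the R tooling used in the study; the `roc_auc` function and the labels and scores are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = tumor, 0 = normal) and classifier scores.
labels = [1, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.1]
auc = roc_auc(labels, scores)  # 5 of 6 tumor/normal pairs are ordered correctly
```

The pairwise form is quadratic in sample count, which is acceptable for cohorts of a few hundred samples; rank-based formulas give the same value more efficiently.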
Molecular docking

The three-dimensional structures of curcumin and of the proteins encoded by the feature genes were retrieved from PubChem (https://pubchem.ncbi.nlm.nih.gov) and the Protein Data Bank (PDB) (http://www.rcsb.org/), respectively. To assess the interaction of curcumin with the target proteins, molecular docking was performed using AutoDock Vina (version 1.2.2). The protein structures were prepared by removing water molecules and any bound ligands, followed by the addition of hydrogen atoms using AutoDockTools. Curcumin was optimized using Chem3D to minimize energy and ensure proper geometry before docking. Docking simulations were conducted with AutoDock Vina, using a grid box centered on the binding site of each target protein. The grid dimensions were set appropriately, with coordinates defined based on the active sites of the proteins. The most favorable binding poses were selected based on binding energy, and the interactions were visualized using PyMOL. The binding energy for each complex was recorded. Finally, the ligand-receptor interaction patterns were analyzed using Discovery Studio Visualizer, and visual representations highlighting molecular forces such as hydrogen bonds and hydrophobic interactions were generated.

Clusters of DEHGs and analysis between DEHGs clusters

PC specimens were clustered based on DEHG expression using the R package “ConsensusClusterPlus” (version 1.66.0) with the k-means algorithm, Euclidean distance, and a maximum of nine clusters. The resulting clusters were compared in terms of expression levels using heat maps and box plots. Separation among clusters was evaluated by principal component analysis (PCA).
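The idea behind consensus clustering can be sketched briefly: samples are repeatedly subsampled and clustered, and for each pair of samples one records how often they land in the same cluster when both are drawn. The Python sketch below builds such a consensus matrix with a plain k-means on Euclidean distance. It is a simplified illustration, not the ConsensusClusterPlus implementation; the function names, the 80% subsampling fraction, the number of resamples, and the toy data are all assumptions made for the example.

```python
import random


def kmeans(points, k, rng, max_iter=50):
    """Lloyd's k-means with squared Euclidean distance; returns one label per point."""
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def consensus_matrix(points, k, n_resamples=100, frac=0.8, seed=0):
    """Fraction of resamples in which each pair of samples shares a cluster."""
    rng = random.Random(seed)
    n = len(points)
    size = max(k, round(frac * n))
    together = [[0] * n for _ in range(n)]
    co_sampled = [[0] * n for _ in range(n)]
    for _ in range(n_resamples):
        idx = rng.sample(range(n), size)
        labels = kmeans([points[i] for i in idx], k, rng)
        for a, i in enumerate(idx):
            for b, j in enumerate(idx):
                co_sampled[i][j] += 1
                together[i][j] += labels[a] == labels[b]
    return [
        [together[i][j] / co_sampled[i][j] if co_sampled[i][j] else 0.0 for j in range(n)]
        for i in range(n)
    ]


# Hypothetical toy data: two well-separated groups of three samples each.
toy = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [5.0, 5.0], [5.1, 5.2], [5.2, 5.1]]
cm = consensus_matrix(toy, k=2)
```

For cleanly separated data, within-group entries of the matrix sit at 1 and between-group entries at 0. Choosing the number of clusters then involves comparing such matrices across candidate values of k, for example through the CDF and delta area plots reported in the Results, which this sketch omits.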
A CIBERSORT analysis of the DEHG clusters was performed to generate a bar plot depicting the immune cell composition of each specimen across clusters and to assess differences in immune cell content between clusters. GMT files obtained from the GSEA platform (http://www.gsea-msigdb.org/) provided the Kyoto Encyclopedia of Genes and Genomes (KEGG)^16–18 and Gene Ontology (GO) gene sets, and Gene Set Variation Analysis (GSVA, version 1.50.1) was performed in R to evaluate the enrichment of these terms across clusters. Finally, differential analysis was conducted on gene expression between DEHG clusters, using filtering criteria of |logFC| > 0.585 and adjusted P-value < 0.05.

Enrichment analysis in differentially expressed genes (DEGs) between DEHGs clusters

The DEGs between DEHG clusters underwent GO enrichment analysis across the molecular function (MF), biological process (BP), and cellular component (CC) categories, in addition to KEGG pathway enrichment analysis. The analyses were performed using the R packages “clusterProfiler” and “enrichplot,” with a screening criterion of p-value < 0.05. The results were shown as circular plots and bar graphs.

Clusters of DEGs and analysis among DEG clusters

We performed an additional clustering analysis based on DEG expression (from the GSE62165 dataset), using the same clustering algorithm described above for the DEHG clusters, and chose the DEG clustering with the best accuracy. DEG expression levels, differences in DEHGs, and immune cell composition across these clusters were then analyzed according to the DEG clustering results and presented as box plots.
DEHGs scores and differential analysis, and construction of alluvial plot

We used PCA to compute a DEHG score for each specimen by aggregating PC1 and PC2 derived from DEHG expression levels (from the GSE62165 dataset). DEHG scores were compared between DEHG clusters and between DEG clusters using the R packages “limma” (version 3.56.2) and “ggpubr” (version 0.6.0). Box plots were generated to depict the DEHG scores of samples grouped by DEHG cluster and by DEG cluster. An alluvial diagram was created with the R package “ggalluvial” to illustrate the correspondence between the DEHG clusters, DEG clusters, and samples with high and low DEHG scores.

Statistical analysis

All statistical analyses were performed using R version 4.3.3. t-tests were used to compare two independent samples, and the Wilcoxon signed-rank test was used for two paired samples. For datasets with three or more groups, one-way analysis of variance (ANOVA) and the Kruskal-Wallis rank sum test were used, while correlation analysis was conducted using the Spearman rank correlation test. Statistical significance was defined as a P-value below 0.05 or a Benjamini-Hochberg false discovery rate (FDR) adjusted P-value below 0.05.

Results

Identification of potential targets of curcumin in the management of PC

To identify the potential targets of curcumin, we searched five databases (SuperPred, SwissTargetPrediction, HERB, ETCM, and TCMSP) and identified 407 curcumin-related targets for further analysis (Fig. 2A). PC-related targets were obtained from the GeneCards, PharmGKB, and OMIM databases, yielding a total of 13,972 targets (Fig. 2B). Intersecting the drug and disease targets in a Venn diagram yielded 234 overlapping targets, which represent candidate mediators of curcumin’s therapeutic effects against PC (Fig. 2C).
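The consolidation and intersection of target lists reported here reduce to set operations: targets from several sources are merged with duplicates removed, and the drug and disease sets are then intersected. A minimal Python sketch follows; it is not the Venn package workflow used in the study, and the `merge_sources` helper and the gene identifiers are hypothetical placeholders.

```python
def merge_sources(*sources):
    """Union several target lists, normalizing case and whitespace and dropping duplicates."""
    merged = set()
    for src in sources:
        merged.update(g.strip().upper() for g in src if g.strip())
    return merged


# Placeholder identifiers standing in for per-database target lists.
drug_targets = merge_sources(["GENE1", "gene2"], ["GENE2", "GENE3 "])
disease_targets = merge_sources(["GENE2", "GENE4"], ["gene3", "GENE5"])

# Candidate therapeutic targets are those present in both sets.
shared = sorted(drug_targets & disease_targets)
```

Normalizing symbols before merging matters in practice, since the same gene can appear with different capitalization or stray whitespace across databases.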
Figure 2D illustrates the PPI network of these targets. To identify key targets, we used the MCODE plugin to cluster the PPI network, which yielded two clusters, C1 and C2, each containing more than 10 nodes (Fig. 2E and F). Ultimately, we identified 52 key targets, considered the primary targets of curcumin in PC, including MYC, MAPK8, IL6, CDH1, CCND1, VIM, CTNNB1, CASP9, AREG, and HIF1A. For more comprehensive information on the key targets, please refer to Supplementary Table 1.

Fig. 2. Target analysis of curcumin and PC. (A) Venn diagram of curcumin targets obtained from five databases. (B) Venn diagram of PC-associated targets retrieved from the GeneCards, PharmGKB, and OMIM databases. (C) Overlap of identified target genes between curcumin and PC. (D) PPI network from the STRING database visualized in Cytoscape 3.8.0. The color intensity of each node reflects its degree value: the deeper the blue, the higher the degree; the lighter the color, the lower the degree. (E) The C1 network cluster with more than 10 nodes identified through MCODE clustering. (F) The C2 network cluster with more than 10 nodes identified through MCODE clustering.

DEHG expression differences, chromosomal localization, and intergene associations between PC and normal tissues

The boxplot (Fig. 3A) displays the expression levels of key hub genes in PC tissues compared to control tissues. The heatmap (Fig. 3B) highlights the 35 DEHGs between control and PC samples. Among them, CASP9, MTOR, AKT1, TERT, MYC, and NANOG are highly expressed in the normal group, while BRCA1, HSP90AA1, MET, TNF, CASP8, STAT1, MYD88, CASP3, TLR4, IL1A, PTGS2, HIF1A, AREG, MMP9, TGFB1, HMOX1, IL6, NFE2L2, STAT3, NFKB1, PDGFRB, CTNNB1, MMP2, VIM, AXL, PTK2, TJP1, MAPK3, and PPARG are highly expressed in the PC group. Figure 3C displays the chromosomal locations of these genes.
Both the chord diagram and the correlation heatmap demonstrate strong relationships between these genes (Fig. 3D-E).

Fig. 3. Expression and correlation heatmap of key hub genes. (A) Boxplot displaying the differential expression of key genes between control and PC tissues. Red represents control samples, and blue represents PC tissues. ***p < 0.001, **p < 0.01, *p < 0.05. (B) Heatmap depicting the expression profiles of significantly differentially expressed genes in control vs. PC specimens. The color gradient from red to blue represents high to low expression levels. (C) Chromosomal distribution of the differentially expressed genes (DEGs). Each bar indicates the number and position of DEGs mapped to specific chromosomal regions, providing a genomic overview of their spatial organization. (D) Chord diagram displaying pairwise correlations among selected DEGs. The color of the connecting ribbons represents the direction and strength of correlation: red indicates strong positive correlation (approaching +1), and green indicates strong negative correlation (approaching −1). (E) Gene-gene correlation heatmap, with red representing strong positive correlations and blue showing negative associations.

Correlation of DEHGs with immune cells in PC

To explore the roles of DEHGs in PC from additional perspectives, we conducted an immune cell infiltration analysis. First, we analyzed the relative abundance of various immune cell types in PC and control samples (Fig. 4A). Figure 4B presents a box plot of immune cell proportions, revealing that activated dendritic cells, regulatory T cells (Tregs), and M2 macrophages were significantly more abundant in PC samples compared to controls. In contrast, resting mast cells and naive B cells were more prevalent in control tissues. These findings suggest that PC is associated with an immunosuppressive environment, potentially contributing to tumor progression.
Figure 4C displays a heatmap of the correlations between immune cell types and key hub genes. Notably, genes such as IL6 and NFKB1 were strongly positively correlated with immune-activating cells (e.g., activated mast cells and memory CD4+ T cells), while genes like MAPK3, MET, and TERT showed significant negative correlations with immunosuppressive cells, such as M2 macrophages and Tregs. These correlations underscore the possible function of these genes in modulating the immune microenvironment in PC and suggest their value as therapeutic targets.

Fig. 4. Correlation of immune cell infiltration and DEHGs. (A) Bar plot showing the relative abundance of immune cell types in control and PC tissues. The x-axis represents the sample groups (control vs. PC), and the y-axis shows the relative percentage of each immune cell type, with different colors indicating various immune cell populations. (B) Box plot showing the proportions of immune cell types in control and PC tissues. Red represents control samples, and blue represents PC samples. ***p < 0.001, **p < 0.01, *p < 0.05. (C) Heatmap illustrating the correlations between immune cell types and DEHGs. The color gradient indicates correlation values (red for positive correlation, blue for negative correlation). ***p < 0.001, **p < 0.01, *p < 0.05.

Development of PC prediction models depending on DEHGs

We constructed several machine learning prediction models based on DEHG expression data. In the training cohort (GSE62165), Fig. 5A displays the boxplots of residuals for four models: RF, SVM, XGB, and GLM. The RF and SVM models demonstrated the lowest residuals and the steepest cumulative distribution of residuals (Fig. 5C), indicating superior predictive performance. Figure 5B shows the feature importance of each model, highlighting the genes that contributed most to predictive accuracy. The ROC curves for all models are shown in Fig.
5D, with the RF model achieving the highest AUC of 1.000, followed by SVM (0.980), XGB (0.981), and GLM (0.943), confirming that RF and SVM had the best sensitivity and specificity in the training set. In the validation cohort (GSE71729), Supplementary Fig. 1A-B again displays the boxplots and cumulative distribution of residuals for each model. In this cohort, SVM achieved the highest AUC of 0.918 (Fig. 5E), followed by RF (0.897), XGB (0.884), and GLM (0.687) (Supplementary Fig. 1C-E). These results further support the robustness of the SVM model in predicting PC. From the SVM model, we selected the five genes with the highest importance scores (VIM, CTNNB1, CASP9, AREG, and HIF1A) to construct a nomogram (Fig. 5F). The probability of PC can be estimated from the total score calculated from the expression levels of these key genes. Figure 5G shows the calibration curve of the nomogram, indicating consistency between predicted and actual probabilities. The close alignment of the curves suggests good predictive accuracy. Figure 5H presents the decision curve analysis, where the red line represents the model, demonstrating net benefit across a range of threshold probabilities. The model consistently outperforms the “no model” strategy, highlighting its clinical utility.

Fig. 5. Performance and feature importance of PC prediction models. (A) Boxplots of residuals for RF, SVM, XGB, and GLM models in the training cohort, with lower residuals indicating better performance. (B) Feature importance analysis of the models in the training cohort (GSE62165), highlighting the key genes contributing to predictive accuracy. (C) Cumulative distribution of residuals for each model, with steeper curves indicating better performance. (D) ROC curves for the four models in the training cohort. (E) ROC curves in the validation cohort for SVM models.
(F) Nomogram predicting disease risk based on molecular markers VIM, CTNNB1, CASP9, AREG, and HIF1A. Each marker contributes a corresponding score, and the total score correlates with overall disease risk. (G) Calibration curve comparing predicted probabilities to actual outcomes. (H) Decision curve analysis assessing the clinical utility of the nomogram model, showing higher net benefit across threshold probabilities compared to no model.

Molecular docking analysis based on five molecular markers

To investigate the potential binding between curcumin and the proteins encoded by the five key molecular markers identified in our analysis (CTNNB1, HIF1A, AREG, VIM, and CASP9), we performed molecular docking using AutoDock Vina v1.2.2. The 3D structure of curcumin was retrieved from the PubChem database (CID: 969516), and the crystal or NMR structures of the five protein targets were obtained from the RCSB PDB database: CTNNB1 (PDB ID: 1g3j), HIF1A (1h2k), AREG (2rnl), VIM (3g1e), and CASP9 (1jxq). Prior to docking, all structures were preprocessed by removing water molecules, adding polar hydrogens, and converting to PDBQT format. Grid boxes were centered on the functional domains of each protein to ensure adequate coverage of potential binding pockets. Docking results indicated that curcumin formed stable complexes with all five proteins, with binding energies below −5.0 kcal/mol, suggesting favorable interactions (Supplementary Table 2). The docking poses showed that curcumin occupied hydrophobic pockets and established hydrogen bonds and electrostatic interactions with key residues at the binding sites (Fig. 6A-E).

Fig. 6. The molecular docking models of curcumin with the five molecular markers. (A) CTNNB1. (B) HIF1A. (C) AREG. (D) VIM. (E) CASP9. (I) Cartoon representations showing the superimposed structures of curcumin and the corresponding protein targets.
(II) Three-dimensional visualization of the binding pockets generated using Discovery Studio. Non-bonded interactions between receptor and ligand atoms are depicted as dashed lines in various colors. Bold lines highlight the ligand and receptor residues directly involved in binding, while lighter lines indicate surrounding residues forming the pocket. (III) Hydrogen bond interaction models: receptor residues acting as hydrogen bond donors are shown beneath pink surfaces, while acceptors are shown beneath cyan surfaces. (IV) Hydrophobic interaction models: the color gradient ranges from brown (indicating highly hydrophobic regions) to blue (indicating less hydrophobic regions). Green labels indicate amino acid three-letter codes and corresponding residue IDs.

Identification of two PC subtypes based on DEHGs

Using unsupervised K-means consensus clustering, two distinct subtypes of PC were identified based on 35 DEHGs. The optimal clustering was obtained at K = 2 (Fig. 7A). PCA further confirmed a clear distinction between the two clusters (Fig. 7B). Compared with Cluster 1, 15 DEHGs, except for MAPK3, PPARG, and TJP1, were significantly upregulated in Cluster 2 (Fig. 7C-D). To explore the biological processes and pathways involved in each subtype, we performed GSVA. GO analysis showed that processes related to cell proliferation and immune evasion, such as Positive regulation of macromolecule biosynthetic process, Epithelial structure maintenance, and Regulation of myeloid leukocyte differentiation, were upregulated in Cluster 2 (Fig. 7E). On the other hand, functions related to growth inhibition, such as Negative modulation of growth, were enriched in Cluster 1. KEGG pathway analysis revealed that pathways connected to metabolism and proliferation, such as Insulin resistance, Oxidative phosphorylation, and Cell cycle, were activated in Cluster 2 (Fig.
7F), while immune regulation and cellular homeostasis pathways, including the NOD-like receptor signaling pathway, Autophagy, and Phosphatidylinositol signaling system, were activated in Cluster 1. Finally, Fig. 7G illustrates the immune cell infiltration in the two subtypes. Cluster 1 exhibited higher levels of immune-effector cells, such as resting memory CD4+ T cells and activated NK cells. The box plot in Fig. 7H further quantified the significant differences in the proportions of various immune cell types between the two groups.

Fig. 7. Analysis of two molecular subtypes of PC samples. (A) Consensus cumulative distribution function (CDF) curves and delta area plot, with a consensus heatmap identifying two distinct subtypes (C1 and C2). (B) PCA plot demonstrating sample distribution across the two subtypes. (C) Heatmap of differential gene expression, where red shows high expression and blue shows low expression. (D) Boxplots showing the expression of DEHGs in the two subtypes, with red for C1 and blue for C2. (E, F) GSVA results displaying the GO biological processes (E) and KEGG pathways (F) enriched in each subtype. Red bars represent processes enriched in C2, while blue bars represent those enriched in C1. (G) Immune cell infiltration landscape for the two subtypes. (H) Boxplots illustrating the proportions of different immune cell types in C1 and C2.

Comprehensive analysis of the molecular characteristics and immune microenvironment of DEHGs subtypes

To better understand the gene expression patterns between the subtypes and explore the underlying biological mechanisms, we screened for DEGs between clusters C1 and C2. In the volcano plot (Fig. 8A), the DEGs are highlighted, showing significantly upregulated (red) and downregulated (blue) genes between the two subtypes. The heatmap analysis (Fig.
8B) illustrates the clustering of these DEGs across samples, revealing clear expression differences between the two clusters (C1 and C2) and suggesting distinct molecular signatures. Figure 8C presents the GO enrichment analysis, which highlights key biological processes associated with each subtype. Enrichment of pathways related to signal transduction, protein localization, and regulation of cell morphogenesis in one subtype suggests more aggressive tumor behavior, whereas the other subtype shows enrichment in immune response and metabolic processes. Figure 8D presents the top 20 significantly enriched pathways, with cytokine-cytokine receptor interaction, Toll-like receptor signaling, and other immune-related pathways being particularly prominent in Cluster 2, indicating that Cluster 2 may have a more active immune microenvironment. Figure 8E shows the consensus clustering analysis, identifying the optimal number of clusters (K = 2) based on the CDF curve and delta area plot. Figure 8F further supports this division with PCA, demonstrating a clear separation between the two subtypes (CI and CII). Figure 8G presents the heatmap of differential gene expression in CI and CII, as well as the box plots of DEHGs in CI and CII. DEHGs such as MMP9, HMOX1, IL1A, IL6, PTGS2, and AREG are upregulated in CII, while PPARG is downregulated in CII. Figure 8H displays the immune infiltration landscape, where Cluster II is associated with a higher proportion of immunosuppressive cells, such as M2 macrophages and regulatory T cells, suggesting that this subtype might be more immune evasive. To confirm the robustness of these results, we compared DEHG scores between clusters. PCA-based analysis of the DEHG clusters indicated a statistically significant difference between the two clusters (Fig.
[119]8I), with Cluster 2 exhibiting a higher score and Cluster 1 a lower score. Likewise, DEHG scores differed significantly between the two DEG clusters, with CI showing higher scores and CII lower scores. Figure [120]8J illustrates that DEHGs cluster C1 mostly aligns with DEG cluster CII, while DEHGs cluster C2 corresponds to DEG cluster CI. Furthermore, high and low DEHG scores mostly align with DEG clusters CI and CII, respectively.

Fig. 8. Analysis of Molecular Characteristics and Immune Microenvironment of DEHGs Subtypes. (A) Volcano plot displaying differentially expressed genes between the two subtypes, with red representing upregulated genes and blue representing downregulated genes. (B) Heatmap of gene expression, demonstrating clustering of samples into two distinct subtypes. (C) GO enrichment analysis of biological processes involved in each subtype. (D) Top 20 enriched pathways between the subtypes, showing significant involvement of immune and metabolic pathways. (E) Consensus clustering CDF curve and delta area plot, identifying the optimal number of clusters (K = 2). (F) PCA plot displaying the separation between the two subtypes. (G) Expression patterns of DEGs and DEHGs in clusters CI and CII. (H) Box plot showing differential expression analysis of DEHG scores between DEG clusters. (I) Box plot displaying differential expression analysis of DEHG scores between DEHG clusters. (J) Alluvial diagram illustrating the correspondence between different sample clusters.

Discussion

Pancreatic cancer is a highly aggressive disease that is often diagnosed at an advanced stage, leaving limited treatment options. In recent years, research has indicated that curcumin, a natural compound, may have a role in the management of PC^[123]19. Nevertheless, the specific pathways through which curcumin acts on the disease are still unclear.
In this investigation, we used network pharmacology, molecular docking, and machine learning models to identify DEHGs associated with PC. By analyzing the interactions between curcumin and pathogenic targets of PC, we aimed to deepen our understanding of curcumin’s multi-target characteristics and to provide a theoretical foundation for its application in PC treatment. This approach not only helps elucidate curcumin’s mechanisms of action but may also offer new insights for developing innovative therapeutic strategies.

Our research yielded several important findings. First, through a “drug-disease-target” network analysis, we found that curcumin’s therapeutic effects on PC are primarily mediated through multiple molecular targets. We used a PPI network to identify core targets, followed by differential expression analysis of these targets in the GEO dataset to identify DEHGs. Machine learning further refined the selection of these genes, highlighting five key feature genes (VIM, CTNNB1, CASP9, AREG, and HIF1A) as significant targets in PC. These genes play vital roles in the onset and progression of the disease, potentially influencing the biological characteristics of tumors and patient prognosis. For instance, VIM is essential to the epithelial-mesenchymal transition (EMT) process in PC cells^[124]20. CTNNB1, which encodes β-catenin, is often abnormally activated in PC cells, and this activation is closely linked to tumor proliferation, survival, and metastasis^[125]21. CASP9 is closely involved in apoptosis, and studies show that it is frequently downregulated in PC cells, leading to chemotherapy resistance and promoting tumor survival. AREG also contributes to tumor progression by mediating EMT; its expression in PC cell lines can significantly affect cell migration and invasion^[126]22.
Additionally, under hypoxic conditions, HIF1A binds to the promoter of the gene encoding fascin, resulting in increased levels of this protein, which is associated with poor differentiation and prognosis in PC^[127]23. These results underscore the function of these genes in promoting the invasive characteristics of PC cells. A comprehensive knowledge of these genes and their interactions is vital for developing new therapeutic strategies.

Furthermore, we built a nomogram from the five feature genes selected by SVM to quantitatively estimate the risk of PC and to assess curcumin treatment sensitivity and predictive accuracy. Molecular docking and analysis of GEO datasets were used to validate the interactions with these feature genes and the model construction. Our study demonstrates that the constructed nomogram effectively distinguishes between normal and PC groups, showing good clinical utility for risk prediction in PC. While CT and MRI are crucial for staging pancreatic cancer, they often miss early or subtle cases^[128]24. Our gene-based nomogram, built on the expression of VIM, CTNNB1, CASP9, AREG, and HIF1A, offers a simplified yet effective risk assessment tool. Decision curve analysis shows consistent net benefit across thresholds, highlighting its potential to aid early risk stratification and guide clinical decisions. This model complements traditional imaging by providing an interpretable, molecular-based approach to patient evaluation.

Molecular docking of curcumin with the PC-related target proteins not only estimates its binding affinity but also provides insights into its mechanisms of action. In our study, docking analysis indicated that curcumin binds stably to the five key targets, with binding energies all below −5.0 kcal/mol, indicating favorable and stable interactions.
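As a rough aid to interpreting the −5.0 kcal/mol threshold, a binding free energy can be converted to an approximate dissociation constant through ΔG = RT ln(Kd). The sketch below is an illustration only and is not part of the study's pipeline: it assumes a temperature of 298.15 K and treats a docking score as if it were a true binding free energy, which docking scores only approximate. The function name `dissociation_constant` is hypothetical.

```python
from math import exp

# Gas constant in kcal/(mol*K).
R_KCAL = 1.987204e-3


def dissociation_constant(delta_g_kcal, temperature_k=298.15):
    """Approximate Kd (mol/L) from a binding free energy via dG = RT ln(Kd).

    Assumption: the docking score is used as a stand-in for dG, which is
    only a coarse approximation.
    """
    return exp(delta_g_kcal / (R_KCAL * temperature_k))


# The -5.0 kcal/mol threshold quoted in the text.
threshold = -5.0
kd = dissociation_constant(threshold)
print(f"{threshold} kcal/mol -> Kd of about {kd * 1e6:.0f} micromolar")
```

Under these assumptions, −5.0 kcal/mol corresponds to a Kd of roughly 0.2 mM, and each further decrease of about 1.36 kcal/mol lowers the estimated Kd tenfold. The conversion gives a sense of scale for comparing docking energies; it does not replace experimental affinity measurements.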
The exceptionally low binding energy observed with AREG (−147.879 kcal/mol) is an outlier, possibly due to structural or computational factors, although it may suggest particularly strong binding. The other targets also showed favorable binding energies (CASP9: −11.496 kcal/mol; HIF1A: −9.771 kcal/mol; CTNNB1: −9.705 kcal/mol; VIM: −8.371 kcal/mol). Functionally, these targets regulate key pathogenic pathways such as EMT, apoptosis, and hypoxia response, which are critical for PC progression. The combination of computational binding stability and known biological roles supports the hypothesis that curcumin may exert therapeutic effects by modulating these proteins^[129]20–[130]23,[131]25. While molecular docking provides a theoretical basis, further experimental validation will be necessary to confirm these interactions and their therapeutic relevance. Nevertheless, this integrated approach offers a solid foundation for understanding the multi-targeted mechanisms of curcumin in PC treatment.

The two molecular subtypes of PC identified based on DEHGs exhibit distinct transcriptional landscapes that profoundly influence tumor progression, immune evasion, and microenvironmental remodeling. To better understand the biological underpinnings of this stratification, we examined the interplay among the differentially expressed genes and their associated pathways. Cluster 2 displayed significant upregulation of genes such as MMP2, HMOX1, STAT3, IL1A, IL6, HIF1A, CASP9, TGFB1, TNF, STAT1, NFKB1, HSP90AA1, PTGS2, and AREG. These genes are widely recognized for promoting tumor proliferation, inflammation, apoptosis resistance, and immune evasion. For instance, STAT3, IL6, and TNF are key inflammatory mediators that activate downstream NF-κB signaling and support a tumor-permissive microenvironment^[132]26,[133]27.
MMP2 facilitates extracellular matrix degradation and invasion^[134]28, while HIF1A and HMOX1 contribute to hypoxia tolerance and oxidative stress response^[135]29, respectively. The activation of TGFB1 and PTGS2 indicates involvement in epithelial–mesenchymal transition (EMT) and immunosuppressive signaling^[136]30,[137]31. On the other hand, Cluster 1 was marked by the upregulation of MAPK3, PPARG, and TJP1, genes known for their roles in maintaining epithelial integrity, modulating cell growth, and promoting anti-inflammatory responses. PPARG acts as a transcriptional regulator of lipid metabolism and is linked to immune homeostasis^[138]32. TJP1 (tight junction protein 1) is critical for preserving epithelial barrier function^[139]33, while MAPK3 is involved in mitogen-activated signaling and may counterbalance inflammatory signaling^[140]34. These gene expression patterns are consistent with our GSVA results, in which Cluster 2 showed enrichment in proliferative and metabolic pathways, including oxidative phosphorylation and insulin resistance, whereas Cluster 1 was enriched in pathways related to immune surveillance, autophagy, and cellular homeostasis, such as the NOD-like receptor signaling pathway and the phosphatidylinositol signaling system. Therefore, Cluster 2 represents a more aggressive and immune-suppressive molecular subtype of PC, while Cluster 1 maintains epithelial characteristics and a more active immune microenvironment. This stratification has potential implications for prognosis and therapeutic targeting.

Lastly, we explored the potential pathways of curcumin’s effects on PC from an immune perspective. ssGSEA revealed statistically significant differences in the abundance of certain immune cell types between the normal and PC groups.
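A two-group comparison of per-sample immune scores, like the one just described, can be sketched with a rank-based test. The text does not state which test was applied to the ssGSEA scores; the Mann-Whitney U test is assumed here as a common choice, implemented with the normal approximation and without a tie correction to the variance. The scores are made-up placeholder values, not data from the study, and the function name `mann_whitney_u` is hypothetical.

```python
from math import erfc, sqrt


def mann_whitney_u(x, y):
    """Return (U statistic for x, two-sided p-value) via the normal approximation."""
    # Pool both groups, remembering which group each value came from.
    combined = sorted([(v, 0) for v in x] + [(v, 1) for v in y])

    # Assign 1-based ranks, averaging ranks across tied values.
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        j = i
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1

    n1, n2 = len(x), len(y)
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, combined) if group == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2

    # Normal approximation to the null distribution of U (no tie correction).
    mean_u = n1 * n2 / 2
    sd_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u_x - mean_u) / sd_u
    p_two_sided = erfc(abs(z) / sqrt(2))
    return u_x, p_two_sided


# Placeholder enrichment scores for one immune cell type (illustrative only).
normal_scores = [0.21, 0.25, 0.19, 0.30, 0.27, 0.23]
tumor_scores = [0.35, 0.41, 0.29, 0.38, 0.44, 0.33]

u_stat, p_value = mann_whitney_u(normal_scores, tumor_scores)
print(f"U = {u_stat}, two-sided p = {p_value:.4f}")
```

With groups this small, an exact test would be preferable to the normal approximation. When many cell types are tested at once, the resulting p-values would also need a multiple-testing correction.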
Correlation analysis of DEHG expression with immune cells indicated that genes such as IL6 and NFKB1 were strongly positively correlated with immune-activating cells (e.g., activated mast cells and memory CD4+ T cells)^[141]35,[142]36, while genes such as MAPK3, MET, and TERT showed significant negative correlations with immunosuppressive cells, such as regulatory T cells (Tregs) and M2 macrophages^[143]37–[144]39. These correlations underscore the potential roles of these genes in regulating the immune microenvironment in PC and highlight their value as therapeutic targets. Clustering analysis further extended these findings, suggesting that the mechanisms through which curcumin acts on PC are related to immune pathways such as IL17, TNF, NF-κB, and NOD-like receptor signaling. These results align with the current understanding that curcumin’s therapeutic mechanisms primarily involve modulating inflammatory responses, blocking pro-cancer signaling pathways, and improving the immune microenvironment^[145]40–[146]44, thereby suppressing the proliferation and metastasis of PC cells through various means.

Overall, our analysis through molecular docking and network pharmacology supports the multi-target mechanisms of curcumin in PC, providing a theoretical basis for its potential use as an adjuvant treatment. However, the current study is primarily based on network pharmacology and bioinformatics analyses; the efficacy of curcumin in experimental and clinical settings still requires further verification. Because of high computational demands and resource constraints, molecular dynamics (MD) simulations were not performed at this stage. We acknowledge this as a limitation and plan to incorporate MD analysis in future work to further validate the dynamic stability of the complexes.
Future research should incorporate large-scale clinical trials to explore the effects of curcumin in combination with other treatment modalities, such as radiotherapy and chemotherapy, to offer more therapeutic choices for individuals with PC.

Conclusion

This study integrates multiple databases and machine learning models to reveal the key targets and regulatory mechanisms of curcumin in PC, particularly in relation to immune cell infiltration. We identified 35 DEHGs, among which five feature genes (VIM, CTNNB1, CASP9, AREG, HIF1A) were used to construct a nomogram model with clinical predictive value. Molecular docking results suggest that the proteins encoded by these genes may serve as binding targets for curcumin. Furthermore, clustering analysis based on DEHGs expression categorized PC samples into four subgroups, revealing significant differences in immune infiltration and gene expression among them. These findings provide new directions for future clinical research, indicating that curcumin may serve as a potential adjuvant drug in PC immunotherapy.

Electronic supplementary material

Below is the link to the electronic supplementary material. [147]Supplementary Material 1 (444.9KB, docx)

Author contributions

H.M.X. conceived the idea for the investigation. J.B.L. obtained the studies for inclusion and extracted the data. H.B.H. conducted the statistical analysis. Z.W.Z. and J.X.L. critically revised the paper. All authors participated in the article’s creation and approved the submitted version.

Funding

This work was supported by the 2024 Zhongshan Traditional Chinese Medicine Heritage and Innovation Development Research Program (Grant No. 2024B3073).

Data availability

All data produced or examined during this investigation are included in this publication.

Declarations

Competing interests

The authors declare no competing interests.
Footnotes

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

These authors contributed equally: HongMing Xie and JieBin Liang.

Contributor Information

Zewei Zhuo, Email: zhuozewei@gdph.org.cn.
JiaXuan Li, Email: hcoco_001@163.com.

References