Abstract This study was based on the use of whole-genome DNA methylation sequencing technology to identify DNA methylation biomarkers in tumor tissue that can predict the prognosis of patients with pancreatic cancer (PCa). TCGA database was used to download PCa-related DNA methylation and transcriptome atlas data. Methylation driver genes (MDGs) were obtained using the MethylMix package. Candidate genes in the MDGs were screened for prognostic relevance to PCa patients by univariate Cox analysis, and a prognostic risk score model was constructed based on the key MDGs. ROC curve analysis was performed to assess the accuracy of the prognostic risk score model. The effects of PIK3C2B knockdown on malignant phenotypes of PCa cells were investigated in vitro. A total of 2737 differentially expressed genes were identified, with 649 upregulated and 2088 downregulated, using 178 PCa samples and 171 normal samples. MethylMix was employed to identify 71 methylation-driven genes (47 hypermethylated and 24 hypomethylated) from 185 TCGA PCa samples. Cox regression analyses identified eight key MDGs (LEF1, ZIC3, VAV3, TBC1D4, FABP4, MAP3K5, PIK3C2B, IGF1R) associated with prognosis in PCa. Seven of them were hypermethylated, while PIK3C2B was hypomethylated. A prognostic risk prediction model was constructed based on the eight key MDGs, which was found to accurately predict the prognosis of PCa patients. In addition, the malignant phenotypes of PANC-1 cells were decreased after the knockdown of PIK3C2B. Therefore, the prognostic risk prediction model based on the eight key MDGs could accurately predict the prognosis of PCa patients. Keywords: Pancreatic cancer, Whole-genome DNA methylation, TCGA database, Methylation driver genes, Prognostic risk prediction model, Prognostic markers A list of abbreviation words DCA Decision curve analysis DEG Differentially expressed gene DMG Differentially methylated gene MDG Methylation driver gene MSK Memorial Sloan-Kettering Cancer NCI National Cancer Institute NHGRI National Human Genome Research Institute ROC, Receiver Operating Characteristic Curve TCGA The Cancer Genome Atlas 1. Introduction Pancreatic cancer (PCa) is the 14th most common cancer in the world, and the 7th highest cause of cancer mortality, with a 5–7% 5-year survival rate, and the age-standardized incidence of PCa increased by 1.03 % per year between 1973 and 2014, meaning that it is expected to rise from the 4th to the 2nd most common cause of cancer-related deaths in the US by 2030 [[41]1]. Incidence rates vary widely by country, with age-standardized incidence rates highest in Europe and North America and lowest in Africa and South Central Asia, with a general trend towards higher incidence rates in developed countries compared with developing countries and higher incidence rates of PCa in both men and women in higher HDI countries [[42]2]. The causes of PCa are complex, and the main causes that are relatively difficult to modify are age, sex, race, blood group, gut microbiota, diabetes, family history, and genetic predisposition [[43]3]; the wide variation in the incidence of PCa between countries also suggests that environmental factors play an important role as risk factors for the disease and could be improved by modifying the environment and lifestyle. Examples include smoking, alcohol, chronic pancreatitis, obesity, dietary factors, and H. pylori infection [[44]4]. Currently, analytical methods used for whole-genome DNA methylation sequencing are divided into (1) track-specific methods (MSP, methyl light, combined bisulfite restriction analysis, bisulfite-sanger sequencing, bisulfite pyrose sequencing, base-specific cleavage coupled to mass spectrometry, heavy methylation, methylation-sensitive single nucleotide primer extension, methylation-sensitive melting curve analysis, methylation-sensitive high-resolution melting); (2) global estimation of methylcytosine content (aminomethylation analysis); (3) genome-scale methods (restriction landmark genome scanning, methylation-specific arbitrary primer PCR, amplification of inter methylation sites, methylation-sensitive representation differential analysis); (4) microarray-based methods (restriction enzyme treatment, affinity enrichment, bisulfite treatment) [[45]5]. DNA promoter hypermethylation as an early oncogenic mechanism, researchers developed a diagnostic prediction model based on an optimized accelerated bisulfite treatment protocol, methylation-specific PCR of 28 genes on plasma samples. They found that cell-free DNA promoter hypermethylation has the potential to be a diagnostic marker for pancreatic adenocarcinoma and to distinguish between malignant and benign pancreatic disease [[46]6]. Using a whole-genome pharmacological transcriptomic approach to identify novel cancer-specific DNA methylation alterations in PCa cell lines, the investigators identified promoter DNA methylation of BNC1 and ADAMTS1 among eight promising genes as potential biomarkers for the detection of early PCa [[47]7]. Important genes associated with cancer development are called driver genes, and driver genes determine the most important cause of this cancer. Methylated driver genes are oncogenes driven by DNA methylation. If a gene is under-expressed in a cancer sample compared to a normal sample, this gene is highly methylated in the cancer sample. Therefore, we consider this gene a driver gene, and its differential methylation value (DM-value) is its degree of differential methylation. For mixed samples of multiple identical cancer types, it is also possible to subtype that cancer type based on their DM-values [[48]8]. A growing body of research suggests that altered DNA methylation plays an important role in PCa, with Koutsioumpa et al. finding that lysine (K)-specific methyltransferase 2D (KMT2D), which is subject to biallelic CpG methylation, undergoes transcriptional repression and gene inactivation in human pancreatic tumors thereby promoting tumor growth; in parallel, knockdown of KMT2D promotes tumor growth by regulating SLC2A3 increased aerobic glycolysis and proliferation rates and altered the lipidome profile of PCa cells [[49]9]. Researchers identified eight genes in 65 invasive PCa samples that showed aberrant hypermethylation, such as the SARP2 gene. Epigenetic inactivation caused by aberrant methylation could confer resistance to apoptosis, leading to PCa progression [[50]10]. Our study aimed to screen for DNA methylation markers that predict prognosis in PCa patients based on whole-genome DNA methylation sequencing technology. Scientifically, our study provides a strong theoretical basis for screening eight key MDGs (LEF1, ZIC3, VAV3, TBC1D4, FABP4, MAP3K5, PIK3C2B, and IGF1R) that could be used as prognostic markers for PCa patients by whole-genome DNA methylation sequencing technology for the first time, which is of great scientific importance. PCa is a very lethal solid tumor with a poor prognosis. Surgery remains the basis for the cure of PC, but the majority (80–85 %) of patients presenting with unresectable or metastatic tumors are lost to surgery [[51]11]. In addition to advances in surgical techniques and chemotherapeutic approaches, exploring effective and new biomarkers is another effective way to improve early diagnosis rates and predict prognosis in patients with PCa [[52]12]. The prognostic risk prediction model constructed in our study based on eight key MDGs could accurately predict the prognosis of PCa patients for early preventive and therapeutic measures, which is expected to improve the survival rate of PCa patients and bring hope for survival to PCa patients and is of great clinical importance. 2. Materials and methods 2.1. Acquisition and analysis of DNA methylation data associated with PCa The Cancer Genome Atlas (TCGA) is a joint project of the National Cancer Institute (NCI) and the National Human Genome Research Institute (NHGRI) initiated in 2006. It is an important source of clinical data, genomic variation, mRNA expression, miRNA expression, methylation, and other data for cancer researchers. We retrieved human PCa methylation data and transcriptomic data from the TCGA database. The methylation data consisted of 185 tumor samples and 10 adjacent non-tumor samples, obtained from the Illumina Human Methylation 450k platform. The transcriptomic analysis data included 178 tumor samples and 4 adjacent non-tumor samples, with no information on subtype expression and miRNA expression quantification. To augment our study, we also obtained a set of 167 pancreatic transcriptome sequencing data from the GTEx database, specifically from the normal control group. This data was downloaded from the UCSC-XENA website ([53]http://xena.ucsc.edu/). First, we normalized the downloaded data and used the limma package in R to obtain differentially expressed genes (DEGs) and differentially methylated genes (DMGs). Then, we used the MethylMix package for integrated analysis. MethylMix is a statistical package for R software. The function of this package is to identify potential methylation driver oncogenes by integrating DNA methylation data and RNA expression data to identify methylation driver genes (MDGs), i.e., if a gene is highly expressed and hypomethylated, or low expressed and hypermethylated, and the gene expression level is significantly negatively correlated with the methylation level, then we consider this gene is considered to be MDGs. Using the MethylMix algorithm, we calculated the correlation between gene methylation levels and gene expression and constructed β mixed models using |logFC| > 0.5, P < 0.05, and Cor < −0.3 to obtain MDGs [[54]13]. The data provided by TCGA is publicly available and does not require the approval of the local ethics committee. 2.2. GEO database In this study, we obtained the PCa RNA-seq microarray dataset ([55]GSE183795) from the Gene Expression Omnibus (GEO) database ([56]https://www.ncbi.nlm.nih.gov/geo/). The sequencing platform used was [57]GPL6244, and the microarray dataset included a total of 139 PCa tissues and 102 adjacent tissues. 2.3. Functional enrichment analysis of GO and KEGG for DEGs After obtaining MDGs, to further understand the biological functions of these selected genes, we performed GO analysis and KEGG pathway enrichment analysis using the “ClusterProfiler” R package for GO and KEGG enrichment analysis at the biological process (BP), cellular component (CC) and P < 0.05 was set as the standard threshold, and the results were visualized using Goplot [[58]14]. 2.4. Constructing a prognostic risk score model based on MDGs To analyze the relationship between MDGs and prognosis, we performed a one-way Cox analysis using the “survival” package in conjunction with the prognostic outcome of PCa in TCGA. Lasso regression and multivariate Cox analysis were performed on the significant correlates (genes) from the univariate Cox analysis results. Risk values were calculated for each PCa patient to construct a prognostic risk score model. The “survival” package and the “survminer” package ([59]https://CRAN.R-project.org/package=survivalsvm) were used to draw prognostic. The survival curve of the risk score model was calculated, and the p-value was calculated. Patients were divided into high-risk and low-risk groups according to the median risk score. Kaplan-Meier survival analysis was performed to compare the difference in overall survival (OS) between the high- and low-risk groups, with P < 0.05 selected as the cut-off value. ROC curve analysis was performed on the prognostic risk score model using the “survivalROC” package ([60]https://CRAN.R-project.org/package=survivalROC) and the prognostic risk score model AUC (Area Under Curve) values [[61]15,[62]16]. 2.5. Construction and validation of Nomograms To expand the clinical application and usability of prognostic risk scoring models, a column line graph was constructed that could be easily used to calculate the expected survival of individual PCa patients by combining the genes involved in the construction of the prognostic risk scoring model with common clinicopathological features, using the 'rms' package to Nomograms were drawn for the genes involved in the construction of the prognostic risk score using the 'rms' package and calibration curves were drawn to assess the agreement between actual and predicted values. decision curve analysis (DCA) is an evaluation method developed by Memorial Sloan Kettering Cancer Institute (MSKCC) in 2006 [[63]17]. DCA is a method for evaluating and comparing multiple clinical prediction models [[64]18]. 2.6. Analysis of genetic alterations The cBio Cancer Genomics Portal, cancer genomics portal (cBioPortal), is a major online platform for analyzing cancer genomics data. cBio Cancer Genomics Portal ([65]http://cbioportal.org), funded by Memorial Sloan-Kettering The cBio Cancer Genomics Portal (.), funded by Memorial Sloan-Kettering Cancer (MSK), is designed to address a large amount of data available from extensive sample tumor genomics studies to facilitate the more accessible and more direct application of large sample tumor genomic findings to oncology research [[66]19]. LungCNS/Brain is a tool for visualizing cancer genomic data analysis. Genomics data analysis could be performed through a simple set-up of the web interface to explore the association between genetic alterations and clinicopathological features. No special bioinformatics knowledge is required. The results are presented graphically and easy to understand, and article-quality graphs could be produced to facilitate the use of the results. To date, 169 studies derived from multiple databases such as TCGA, ICGC, and published papers are included (data list: Head/Neck [67]http://www.cbioportal.org/data_sets.jsp); data types include somatic mutations, DNA copy number alterations, mRNA and microRNA expression, DNA methylation, protein abundance; data available for download. We used cBioPortal to study genetic alterations in MDGs in PCa patients [[68]20]. 2.7. GSEA pathway enrichment analysis Gene Set Enrichment Analysis (GSEA) is used to assess the trend in the distribution of genes in a predefined gene set in a table of genes ranked by their phenotypic relevance to determine their contribution to the phenotype. The input data consists of two parts, a gene set of known functions (either GO annotations, MsigDB annotations, or other formatted gene set definitions) and an expression matrix (also a sorted list), where the software sorts the genes according to their phenotypic relevance (understood as the change in expression value) from highest to lowest and then determines whether the genes under each annotation in the gene set are enriched in the upper or lower part of the gene list after the phenotypic correlation ranking, and thus determine the effect of synergistic changes in genes within this gene set on phenotypic changes. This is distinctly different from using significant differential genes for enrichment analysis, which screens for differential genes and then determines in which annotated pathways the differential genes are enriched; this involves setting a differential threshold, which is subjective and could only be used for genes with large variation in expression, i.e., what we define as significant differential genes. GSEA, on the other hand, is not limited to differential genes and is theoretically more likely to encompass the effects of subtle but coordinated changes in biological pathways, especially in sets of genes with less significant fold differences [[69]21,[70]22], from the perspective of gene set enrichment. The GSEA pathway enrichment analysis was performed using the “fgsea” package and the “clusterprofiler” of R software. All analytical methods and R packages above were performed using v4.0.3 of R software (R Foundation for Statistical Computing, 2020). p < 0.05 was considered statistically significant. 2.8. Cell culture and transfection The normal human pancreatic cell line HPDE6c7 was purchased from Shanghai Zeye Biotechnology Co., Ltd. (AC338285). The PANC-1 PCa cell line was obtained from the ATCC (CRL-1469). The PCa cell lines, BxPC-3 (CBP60542) and AsPC-1 (CBP60546), were purchased from Nanjing Cobioer Biotechnology Co., Ltd. All the aforementioned cell lines were cultured in DMEM (10569044, Gibco, USA) supplemented with 10 % FBS (100099141, Gibco, USA) and 1 % penicillin-streptomycin (15070063, Gibco, USA) and incubated in a CO[2] incubator at 37 °C [[71]23]. The cells were digested with trypsin when PANC-1 and BxPC-3 cells were in the log phase, inoculated in 6-well plates at 1 × 10^5 cells per well, and incubated routinely for 24 h. After the cells had reached about 40 % fusion, the cells were infected according to the lentivirus infection instructions and divided into the sh-NC group (infected with the no-load control lentivirus sh-NC), sh-PIK3C2B-1 group (infected with the lentivirus sh -PIK3C2B-) and sh-PIK3C2B-2 (infected with the lentivirus sh-PIK3C2B-2), and cells were collected 72 h after transfection for subsequent experiments. shRNA sequences are shown in [72]Table S1. 2.9. RT-qPCR Total tissue or cellular RNA was extracted using Trizol (16096020, Thermo Fisher Scientific, New York, NY, USA). cDNA was obtained by reverse transcription using a reverse transcription kit (RR047A, Takara, Japan). cDNA was spiked using an SYBR Premix Ex TaqTM II kit (DRR081, Takara, Japan). Spiking was performed, and samples were subjected to RT-qPCR reactions in a real-time fluorescence quantitative PCR instrument (ABI 7500, ABI, Foster City, CA, USA). GAPDH was used as an internal reference for the coding gene. RT-qPCR was set up with 3 replicate wells. The 2-ΔΔCt method was used to quantify gene expression [[73]24]. The primer design is shown in [74]Table S2. 2.10. Western Blot analysis Total protein was extracted from cells using RIPA lysis buffer (P0013B, Beyotime Biotechnology) and incubated on ice for 30 min. The lysate was then centrifuged at 8000 g for 10 min at 4 °C, and the supernatant was collected. Protein concentration was determined using the BCA assay kit ([75]A53226, ThermoFisher, USA). After separation by polyacrylamide gel electrophoresis, the proteins were transferred to a PVDF membrane (IPVH85R, Millipore, Germany) using the wet transfer method. The membrane was blocked with 5 % BSA at room temperature for 1 h and then incubated overnight at 4 °C with mouse anti (MA5-26243, Invitrogen) and rabbit anti-GAPDH (ab9485, Abcam) primary antibodies. After washing the membrane three times with TBST (each for 10 min), it was incubated for 1 h with HRP-conjugated goat anti-rabbit IgG H&L secondary antibody (ab97051, diluted 1:2000, Abcam, Cambridge, UK) or HRP-conjugated goat anti-mouse IgG H&L secondary antibody (31430, 1:2000, Invitrogen). After TBST washing, the membrane was placed on a clean glass plate. An appropriate amount of A and B solutions from the ECL fluorescence detection kit (abs920, Absin Biotechnology Co., Ltd., Shanghai, China) were mixed and then applied to the membrane in a darkroom. The relative protein levels were analyzed using Quantity One V4.6.2 software from Bio-Rad (USA), with the grayscale value of the corresponding protein band normalized to the grayscale value of the GAPDH protein band [[76]25]. 2.11. CCK-8 Cell viability after lentivirus treatment was assessed using the CCK-8 (K1018, Apexbio, USA) kit by spreading 1 × 10^4 cells per well in 96-well plates (100 μL/well), followed by the addition of 10 μL of CCK-8 solution at each time point (0 h, 24 h, 48 h, and 72 h) for 2 h at 37 °C. The absorbance of each sample at 480 nm was measured by an enzyme marker (Bio-Rad, Hercules, CA, USA) to measure the absorbance of each well at 480 nm [[77]26]. 2.12. Scratch assay Cells were scraped from the bottom of the wells using a 10 μl tip, and the floating cells were gently washed twice using a DMEM medium. Images of each well were captured using a microscope (magnification, × 100) at 0 and 24 h intervals after wounding (Olympus X51; Olympus, Tokyo, Japan) [[78]27]. 2.13. Transwell assay Transwell chambers with 8 μm pore size polycarbonate membranes containing matrigel were pre-filled with 600 mL of DMEM medium with 20 % FBS in the lower chamber and equilibrated at 37 °C for 1 h. PANC-1 and BxPC-3 cells were resuspended in DMEM medium without FBS after 48 h of transfection, and 1 × 10^6/mL of cells were inoculated into the upper chamber and incubated at 37 °C with 5 % CO[2] for 24 h. The cells were fixed with 4 % paraformaldehyde for 20 min and stained with 0.1 % crystal violet for 10 min. The surface cells were wiped off, and the cells were observed under an inverted fluorescent microscope (TE2000, Nikon, Japan). Five fields of view were photographed. The number of cells passing through the chambers was counted [[79]26]. 2.14. Statistical analysis All statistical analyses of our study data were performed using SPSS 21.0 (IBM, USA) statistical software. Measures were expressed as mean ± standard deviation, first tested for normality and chi-square, tested for conformity to normal distribution and chi-square, and unpaired t-test between two groups; data were compared between groups at different time points using repeated measures ANOVA with Bonferroni method for post-hoc tests. p < 0.05 indicated statistically significant differences. 3. Results 3.1. Identification of 2737 DEGs and 71 MDGs in PCa samples through bioinformatics analysis A total of 3548 DEGs were identified from 178 PCa samples and 171 normal samples (4 from TCGA database, 167 from GTEx dataset). Among these DEGs, 1752 were upregulated, and 1796 were downregulated ([80]Fig. 1A). MDGs analysis using the MethylMix package identified 71 MDGs (50 hypermethylated and 21 hypomethylated) from 185 TCGA PCa samples and 10 normal samples. The methylation levels of these MDGs were displayed using a heatmap ([81]Fig. 1B). An intersection analysis between DEGs and MDGs, conducted using the jvenn tool, revealed that a majority of MDGs were also differentially expressed ([82]Fig. 1C). Fig. 1. [83]Fig. 1 [84]Open in a new tab Screening of DEGs and MDGs in PCa samples based on the TCGA database. Note: A: Volcano plot showing DEGs between normal and PCa samples in the TCGA database (blue dots represent upregulated genes, red dots indicate downregulated genes). B: Heatmap analysis of MDGs in PCa from the TCGA database (where the colors range from green to red, indicating the trend from low to high methylation). C: Venn diagram showing the intersection between DEGs and MDGs. D: Bubble chart presenting the results of the GO analysis for MDGs. E: Bubble chart illustrating the results of the KEGG enrichment analysis for MDGs (where the size of the circles represents the number of selected genes and the color represents the P-value of the enrichment analysis). GO analysis revealed significant enrichment of these genes in anatomical structure morphogenesis, circulatory system development, kinase activity, protein kinase activity, and membrane raft ([85]Fig. 1D). Pathway analysis using the KEGG database showed significant enrichment of these genes in Pathways in cancer, FoxO signaling pathway, and MAPK signaling pathway ([86]Fig. 1E). 3.2. Prognostic risk score model based on 8 key MDGs accurately predicts the prognosis of PCa patients To further determine the relevance of 71 MDGs to the prognosis of PCa patients, we screened 54 MDGs based on MDG-related survival data in TCGA_PADD patients by one-way Cox analysis to correlate with the prognosis of PCa patients. Next, the results of the univariate analysis were further screened using Lasso regression and multivariate Cox analysis, resulting in 8 key MDGs (LEF1, ZIC3, VAV3, TBC1D4, FABP4, MAP3K5, PIK3C2B, IGF1R) which could be used as PCa prognostic genes ([87]Table S3; [88]Fig. 2A and B) and given a risk score Calculation formula. Risk score = (−0.1231) * LEF1 + (-8e^−04) * ZIC3 + (−0.0732) * VAV3 + (−0.2221) * TBC1D4 + (−0.0234) * FABP4 + (0.0251) * MAP3K5 + (0.0898) * PIK3C2B + (−0.0481) * IGF1R. Among these 8 key MDGs, the methylation levels of LEF1, ZIC3, VAV3, TBC1D4, FABP4, and IGF1R were negatively correlated with the risk score, while the methylation levels of the remaining genes were positively correlated with the risk score. Fig. 2. [89]Fig. 2 [90]Open in a new tab Risk score model to predict prognosis of PCa based on 8 key MDGs. Note: A: Partial likelihood deviance of LASSO coefficient distribution, Lambda = 0.04; B: Through LASSO analysis, 8 candidate genes are identified, Lambda = 0.08; C: Compare and analyze the prognostic differences between high-risk group and low-risk group of PCa patients' OS using Kaplan-Meier curve based on the risk scores analyzed by LASSO for 8 key MDGs; D: ROC analysis at 1, 3, and 5-year time points based on the risk scores analyzed by LASSO; E: Kaplan-Meier curve analysis of the progression-free survival (FPS) of PCa patients for the 8 key MDGs; F: Display the distribution of high-risk group, low-risk group, and survival status. Based on the median risk score, PCa patients (excluding those with a survival time of 0 or NA, leaving a total of 175 patients) were further categorized into the high-risk group (n = 88) and the low-risk group (n = 87). Kaplan-Meier survival curves revealed a statistically significant difference in OS between the high-risk and low-risk groups ([91]Fig. 2C). The predictive performance of the risk scoring model was assessed using the ROC curve, and the results showed that the area under the curve (AUC) for the prognostic risk scoring model based on 9 key MDGs was greater than 0.60, suggesting its accurate prediction of PCa prognosis ([92]Fig. 2D). Furthermore, the curves of progression-free survival were analyzed for the high-risk and low-risk groups, with the high-risk group exhibiting significantly lower progression-free survival compared to the low-risk group ([93]Fig. 2E). We also explored the relationship between the different risk scores and patient follow-up time, events, and changes in gene expression, and observed a clear decrease in patient survival rate as the risk score increased ([94]Fig. 2F). These findings indicate that the prognostic risk scoring model based on 8 key MDGs can accurately predict PCa prognosis. 3.3. Construction of gene nomogram to predict OS prognosis in PCa patients In order to enhance the clinical application and usability of prognostic risk scoring models, a gene-clinical-pathological nomogram was constructed by combining the genes involved in building the prognostic risk scoring model with the clinical pathological characteristics of PCa patients, including age, sex, stage classification, grade staging, and FPS. This nomogram provides a convenient tool for calculating the expected survival rate of individual PCa patients. In both univariate and multivariate regression analyses, age, PFS, and MAP3K5 were identified as significant predictive factors ([95]Table 1). Considering all of these significant predictive factors, the constructed nomogram is presented in [96]Fig. 3A, with a C-index of 0.68 and P < 0.001. Table 1. Univariate and multivariate regression analysis. Characteristics __________________________________________________________________ Univariate analysis __________________________________________________________________ Multivariate analysis __________________________________________________________________ HR (95%CI) Pvalue HR (95%CI) Pvalue Age 1.029 (1.008–1.051) 0.007 1.028 (1.001–1.055) 0.045 Sex MALE Reference Reference FEMALE 1.219 (0.810–1.837) 0.343 1.160 (0.667–2.019) 0.599 Stage StageI Reference Reference StageII 2.333 (1.069–5.090) 0.033 1.152 (0.468–2.840) 0.758 StageIV 2.100 (0.430–10.251) 0.359 1.558 (0.118–20.594) 0.737 StageIII 1.256 (0.153–10.284) 0.832 0.000 (0.000-Inf) 0.995 Grade G1 Reference Reference G2 1.977 (1.017–3.843) 0.044 1.314 (0.533–3.241) 0.553 G3 2.614 (1.299–5.262) 0.007 1.136 (0.454–2.843) 0.785 G4 1.643 (0.210–12.824) 0.636 0.148 (0.015–1.436) 0.099 GX 1.763 (0.226–13.735) 0.588 2.779 (0.222–34.856) 0.428 PFS 0.995 (0.993–0.996) <0.001 0.994 (0.992–0.995) <0.001 VAV3 1.000 (1.000–1.000) 0.422 1.000 (1.000–1.000) 0.804 TBC1D4 1.000 (1.000–1.000) 0.419 1.000 (1.000–1.000) 0.474 FABP4 1.000 (1.000–1.000) 0.758 1.000 (1.000–1.000) 0.369 MAP3K5 1.000 (1.000–1.000) 0.759 1.000 (1.000–1.000) 0.448 LEF1 1.000 (0.999–1.001) 0.915 1.000 (0.998–1.002) 0.953 PIK3C2B 1.001 (1.000–1.002) 0.225 1.001 (0.999–1.002) 0.548 IGF1R 0.992 (0.986–0.997) 0.003 0.988 (0.979–0.998) 0.016 ZIC3 0.998 (0.992–1.004) 0.537 1.002 (0.995–1.009) 0.602 [97]Open in a new tab Fig. 3. [98]Fig. 3 [99]Open in a new tab Construction of gene nomogram based on 8 MDGs and clinicopathological characteristics of PCa patients. Note: A: Plot of results of one-way Cox analysis of gene expression and clinicopathological characteristics (P value, risk factor HR, and confidence intervals are shown); B: Plot of results of multi-way Cox analysis of gene expression and clinicopathological characteristics (P value, risk factor HR and confidence intervals are shown); C: Columnar line plot constructed by MDGs (predicts 1, 3 and 5-year total survival, C_index 0.68, p < 0.001); D: OS calibration curve for the column line graph model (diagonal dashed lines indicate ideal column line graphs, blue, red and orange lines indicate observed 1-, 3- and 5-year column line graphs); E: Decision curve for OS at X years (DCA curve) (horizontal coordinates represent threshold probabilities, which could be interpreted as the probability of being grouped by a value The threshold probability could be interpreted as the number of samples greater than this value always/total number of samples after grouping by a certain value. The vertical coordinate is the net benefit, the relative benefit obtained by subtracting the proportion of true positives from the proportion of false positives weighted by the threshold probability ratio. (for other models, the model is better). Moreover, the calibration curves for the OS at 1 year, 3 years, and 5 years demonstrated the accurate predictive capability of the model ([100]Fig. 3B). Additionally, the DCA revealed that the gene-based nomogram possesses strong clinical predictive ability ([101]Fig. 3C–E). The net benefit results indicated that the combined model outperforms the single-factor model, suggesting that the nomogram can assist clinicians in accurately assessing patient prognosis. Due to its construction based on multiple prognostic factors, the nomogram is more effective than individual factors alone, performing well in short-term and long-term OS predictions. Therefore, we believe that the nomogram has the potential to aid physicians in making informed medical decisions throughout the diagnosis and treatment process of PCa. 3.4. Verification of the stability of MDG's predictions and prognosis in the external validation set To assess the reliability and reproducibility of our prognosis model, we performed an independent external validation using the GEO dataset [102]GSE183795 which included 139 PCa patients with complete clinical information out of a total of 134 patients. Subsequently, patients in each cohort were divided into high-risk and low-risk groups based on their stroke risk scores. Across all three validation cohorts, patients in the high-risk group exhibited poorer survival outcomes ([103]Fig. 4A). The predictive performance of the prognosis risk scoring model constructed by MDGs was evaluated using ROC curves, which consistently showed an area under the curve (AUC) greater than 0.60, further supporting the accurate prognostication of PCa patients by our model ([104]Fig. 4B). Fig. 4. [105]Fig. 4 [106]Open in a new tab Validation of the prognosis model marked by MDGs in the GEO dataset. Note: A: Survival curve of high and low-risk groups; B: ROC curve of high and low-risk groups in the testing queue; C: Scatter plot of high and low-risk groups and survival status distribution. Furthermore, we analyzed the relationship between different risk scores and patients' follow-up time, events, as well as the expression changes of various genes. Interestingly, as the risk score increased, the patients' survival rate significantly decreased ([107]Fig. 4C). Therefore, these results demonstrate a trend similar to the TCGA dataset and indicate the stability and reliability of the MDGs as biomarkers. 3.5. Methylation levels of 8 key MDGs in PCa tissue and normal pancreatic tissue Of the eight key MDGs, seven (LEF1, ZIC3, VAV3, TBC1D4, FABP4, MAP3K5 and IGF1R) were hypermethylated, while PIK3C2B was hypomethylated ([108]Fig. 5A). In addition, there was a significant negative correlation between the methylation levels of the eight key MDGs and gene expression levels ([109]Fig. 5B). Additionally, Kaplan-Meier survival curves were plotted based on the expression levels of the PIK3C2B gene in TCGA PCa data, revealing a negative correlation between the expression of PIK3C2B and OS of patients ([110]Fig. 5B). Fig. 5. [111]Fig. 5 [112]Open in a new tab Methylation level and correlation analysis of 8 key MDGs in PCa tissues and normal pancreatic tissues Note: A: Histogram of the distribution of methylation status of MDGs (histogram shows the distribution of LEF1, ZIC3, VAV3, TBC1D4, FABP4, MAP3K5 methylation in the samples, Beta values represent methylation levels (ranging from 0 to 1), horizontal black bars indicate the distribution of methylation values in non-tumor samples); B: Regression analysis between gene expression levels of MDGs and methylation regression analysis between gene expression levels and methylation levels (vertical axis is gene expression levels, the horizontal axis is methylation levels). Kaplan-Meier survival curves for PIK3C2B gene expression in TCGA PCa samples, with the red curve representing samples with low PIK3C2B expression and the blue curve representing samples with high PIK3C2B expression. 3.6. PIK3C2B is highly expressed in PCa cells, and knockdown of PIK3C2B significantly reduces the proliferation, invasion, and migration of PCa cells The expression of LEF1, ZIC3, VAV3, TBC1D4, FABP4, MAP3K5, IGF1R, and PIK3C2B was assessed in the human normal pancreatic cell line HPDE6c7 and PCa cell lines PANC-1, BxPC-3, and AsPC-1 using RT-qPCR. The results showed that compared to the human normal pancreatic cell line HPDE6c7, the expression levels of LEF1, ZIC3, VAV3, TBC1D4, FABP4, MAP3K5, and IGF1R were significantly decreased, while the expression level of PIK3C2B was significantly increased in the PCa cell lines PANC-1, BxPC-3, and AsPC-1 ([113]Fig. 6A). Additionally, we validated the expression of these eight genes in the three PCa cell lines through the CCLE website, and found that TBC1D4, MAP3K5, IGF1R, and PIK3C2B exhibited higher expression levels, while the remaining genes were expressed at lower levels ([114]Fig. S1). Subsequently, we knocked down PIK3C2B in PANC-1 cells using shRNA, and RT-qPCR and WB results showed a significant decrease in PIK3C2B expression in PANC-1 and BxPC-3 PCa cell lines compared to the sh-NC group, with sh-PIK3C2B-1 showing a more pronounced reduction ([115]Fig. 6B, [116]Fig. S2A). Therefore, sh-PIK3C2B-1 was selected for subsequent experiments and referred to as sh-PIK3C2B. CCK8, scratch assay, and Transwell assay revealed a significant decrease in the proliferation, migration, and invasion abilities of PANC-1 and BxPC-3 cells in the sh-PIK3C2B group compared to the sh-NC group ([117]Fig. 6C–E, [118]Figs. S2B–D). Collectively, PIK3C2B is highly expressed in PCa cells, and the knockdown of PIK3C2B significantly reduces their proliferation, invasion, and migration abilities. Fig. 6. [119]Fig. 6 [120]Open in a new tab Knockdown of PIK3C2B significantly inhibits the proliferation, migration, and invasion of PCa cell lines PANC-1 A: RT-qPCR to detect the expression of LEF1, ZIC3, VAV3, TBC1D4, FABP4, MAP3K5, IGF1R, and PIK3C2B in the human normal pancreatic cell line HPDE6c7 and PCa cell line PANC-1, BxPC-3 and AsPC-1; B: RT-qPCR and Western Blot to detect the expression of PIK3C2B in the PANC-1 cell line after shRNA knockdown of PIK3C2B PIK3C2B; C: CCK8 assay to detect changes in the proliferative capacity of PANC-1 cell lines after knockdown of PIK3C2B; D: Scratch assay to detect changes in migratory capacity of PANC-1 cell lines after knockdown of PIK3C2B cell lines; E: Transwell assay to detect changes in invasive capacity of PANC-1 cell lines after knockdown of PIK3C2B cell lines changes in the invasive ability of PANC-1 cell lines after knockdown of PIK3C2B cell lines. Cell experiments were performed in three independent biological replicates. Compared with the human normal pancreatic cell line HPDE6c7 or sh-NC group, * indicates p < 0.05. 4. Discussion We screened 2737 DEGs and 71 MDGs in PCa samples by bioinformatics analysis. Further, we screened the results of univariate analysis using Lasso regression and multivariate Cox analysis to finally obtain 8 key MDGs (LEF1, ZIC3, VAV3, TBC1D4, FABP4, MAP3K5, PIK3C2B, IGF1R). These results provide some theoretical basis for subsequent PCa-related studies, and the eight key MDGs could establish risk-scoring models that could improve the prediction of the prognosis of PCa patients. LEF1 is a downstream mediator of the Wnt/β-catenin signaling pathway that maintains stem cell and organ development, and abnormal expression is associated with cancer cell proliferation, migration, and invasion. Therefore, it could be a prognostic biomarker [[121]28]. In addition, low expression of LEF1 has been found to promote carcinogenesis and progression [[122]29]. ZIC3 is a transcription factor with varying functions in gastric and lung cancers, and LncRNA IGBP1-AS1 has been shown to reduce ZIC3 expression and promote proliferation and invasion in breast cancer [[123]30]. TBC1D4 may be a cancer suppressor gene, acting through the sponge miR-21 in neuroblastoma [[124]31]. MAP3K5 is an important early responder gene in the P38MAPK pathway and an apoptosis-related gene. MAP3K5 was significantly under-expressed in tumor tissue [[125]32]. Low expression of CircRNF13 accelerated IGF1R-induced malignant progression of PCa by targeting miR-139-5p and inhibiting IGF1R expression, thereby promoting proliferation, migration, and invasion of PCa [[126]33]. In addition, elevated levels of PIK3C2B expression may lead to enhanced tumor cell invasion [[127]34]. The Kaplan-Meier survival curves showed a statistically significant difference in the OS of PCa patients in the high-risk and low-risk groups. In addition, the ROC curves confirmed that the prognostic risk scoring model based on the eight key MDGs could accurately predict the prognosis of PCa patients. Methylation inhibits gene transcription, leads to gene inactivation, and, in most cases, promotes tumor progression. Our study constructed a risk assessment model based on eight key MDGs that better integrate key genes associated with PCa progression. The model may be of great importance in providing clinicians with a tissue-based prognostic assessment tool to aid their treatment decisions, which is expected to improve patient survival. Targeted methylation profiling of tumor tissue and adjacent normal tissue from 50 patients with stage I-II PCa has been used to construct and validate a tissue-based DNA methylation risk score model to predict prognosis and identify PCa patients at high risk of mortality at the surgery with a significant correlation between risk score and OS [[128]35]. Other investigators have also developed a combined predictive risk assessment model consisting of three m6A methylation regulators (i.e., IGF2BP2, IGF2BP3, and METTL16) and four iron deposition regulators (i.e., ENPP2, ATP6V1G2, ITGB4, and PROM2) that more effectively reflects PCa progression and prognosis [[129]36]. We combined the genes in constructing the prognostic risk score model with the clinicopathological characteristics (age, gender, pT stage, pN stage, pM stage) of PCa patients to construct a gene nomogram. DCA showed that the gene nomogram had strong clinical predictive power. This result indicates that the accuracy of our prognostic risk assessment model is very high, and the expression abundance and methylation of key driver genes, which could correspond to the clinicopathological characteristics of PCa patients themselves, especially the clinical staging analysis of patients' disease, could accurately predict the prognosis of PCa patients. One investigator analyzed prognostic factors to create a labeled graph to predict overall OS in patients with advanced pancreatic body tail cancer (PBTC, a type of PCa), using univariate and multivariate Cox regression analysis on 1256 patients with advanced PBTC, identifying age, grade, stage N, stage M, surgery and chemotherapy as independent risk factors, demonstrating in both calibration and DCA curves that the model in the training and validation cohorts Age, grade, M stage, and surgery were independently associated with OS in good agreement in the calibration and DCA curves [[130]37]. Other researchers showed that age, histologic subtype, primary site, N stage, and tissue metastasis were risk factors for the development of PCLM through multivariate logistic regression analysis based on the SEER database; models were developed to correlate prognostic factors such as age, grade, tumor size, histologic subtype, surgery, chemotherapy, and tissue metastasis with patient OS. DCA showed that both clinical practice prediction models were highly reliable [[131]38]. In a report with the highest relevance to our findings, researchers developed a novel model of prognosis and drug sensitivity consisting of three DNA repair-related genes, DRG (RECQL, POLQ, and RAD17), with DCA curves and calibration curves showing that risk scores and TNM staging coincide, could be used to predict prognosis and treatment response in PCa [[132]39]. We also found that among the 8 key MDGs, 7 (LEF1, ZIC3, VAV3, TBC1D4, FABP4, MAP3K5, and IGF1R) were hypermethylated, while PIK3C2B was hypermethylated. 8 key MDGs were not only affected by methylation but also by gene amplification, deletion, and mutation. In addition, GSEA pathway enrichment analysis revealed that the 8 key MDGs might be involved in various processes associated with PCa, such as immune escape, inflammatory response, apoptosis, and cell migration. These results have a very important clinical guidance value, and the eight key MDGs have the potential to be markers or drug targets for early diagnosis and therapeutic options to assess PCa. Our findings provide the necessary bioinformatics and relevant theoretical basis for future PCa research, potential biomarkers for precision treatment and prognosis prediction, and a basis for future research into the molecular mechanisms of PCa. Relevant to the findings are reports that LEF1 has strong and diffuse nuclear markers in solid pseudopapillary tumors of the pancreas, is highly methylated, and could be used as a diagnostic marker [[133]40]. In addition, inhibition of IGF1R expression in PCa cells has been reported to promote proliferation, migration, and invasion of PCa, accelerating IGF1R-induced malignant progression of PCa [[134]33]. PIK3C2B overexpressed in tumor cells inhibits apoptosis and leads to chemoresistance in tumor cells [[135]41]. 5. Conclusions In summary, we could draw the following preliminary conclusions: whole-genome DNA methylation sequencing technology screened eight key MDGs (LEF1, ZIC3, VAV3, TBC1D4, FABP4, MAP3K5, PIK3C2B, IGF1R) as prognostic markers for PCa patients, and the prognostic risk prediction model constructed based on the eight key MDGs could accurately predict the prognosis of PCa patients ([136]Fig. S3). Furthermore, the prognostic risk prediction model based on the eight key MDGs could accurately predict the prognosis of PCa patients ([137]Fig. S3). In addition, these eight key MDGs may be involved in various processes associated with PCa, such as immune escape, inflammatory response, apoptosis, and cell migration. Therefore, these genes could be markers for early diagnosis, therapeutic options, or drug targets to assess PCa. Although further experimental validation is needed, our findings provide the necessary bioinformatics and relevant theoretical basis for future PCa research. Our study provides potential biomarkers for precise treatment and prognosis prediction by analyzing MDGs in PCa. Furthermore, it provides a basis for future research into the molecular mechanisms of PCa. Ethics approval and consent to participate Not applicable. Consent for publication Not applicable. Funding statement This work was supported by grants from the Shanghai Municipal Health Commission (201940019), Shanghai Qingpu District Science and Technology Commission (QKY2023–04, R2023-05), National Natural Science Foundation of China (81872366, 81871941, 81827807), Natural Science Programs offered by Bengbu Medical University (2022byzd160) and Scientific Research Project of Shanghai Qingpu District Health Commission (QWJ2022-03). Data availability statement The data that supports the findings of this study are available on request from the corresponding author. CRediT authorship contribution statement Chao Song: Writing – review & editing, Methodology, Investigation, Funding acquisition, Formal analysis, Conceptualization. Ganggang Wang: Writing – review & editing, Writing – original draft, Data curation. Mengmeng Liu: Writing – review & editing, Writing – original draft, Investigation, Data curation, Conceptualization. Zijin Xu: Writing – original draft, Software, Data curation, Conceptualization. Xin Liang: Writing – original draft, Validation, Data curation. Kai Ding: Writing – review & editing, Writing – original draft, Methodology, Investigation, Conceptualization. Yu Chen: Writing – original draft, Visualization, Methodology, Data curation, Conceptualization. Wenquan Wang: Writing – original draft, Validation, Methodology, Formal analysis, Conceptualization. Wenhui Lou: Writing – original draft, Software, Methodology, Formal analysis, Data curation, Conceptualization. Liang Liu: Writing – review & editing, Writing – original draft, Visualization, Validation, Project administration, Methodology, Investigation, Data curation, Conceptualization. Declaration of competing interest The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper. Footnotes ^Appendix A Supplementary data to this article can be found online at [138]https://doi.org/10.1016/j.heliyon.2024.e29914. Contributor Information Wenquan Wang, Email: wang.wenquan@zs-hospital.sh.cn. Wenhui Lou, Email: lou.wenhui@zs-hospital.sh.cn. Liang Liu, Email: liuliang.zlhospital@fudan.edu.cn. Appendix A. Supplementary data The following are the Supplementary data to this article: Multimedia component 1 [139]mmc1.docx^ (19.9KB, docx) figs1. [140]figs1 [141]Open in a new tab figs2. [142]figs2 [143]Open in a new tab figs3. [144]figs3 [145]Open in a new tab References