Abstract Pancreatic adenocarcinoma (PAAD) remains challenging to diagnose and treat clinically due to its difficult early diagnosis, low surgical resection rate, and high risk of postoperative recurrence and metastasis. SMAD4 is a classical mutated gene in pancreatic cancer and is lost in up to 60%–90 % of PAAD patients, and its mutation often predicts a poor prognosis and treatment resistance. In this study, based on the expression profile data in The Cancer Genome Atlas database, we identified a ceRNA network composed of 2 lncRNAs, 1 miRNA, and 4 mRNAs through differential expression analysis and survival prognosis analysis. Among them, high expression of KLK10/LIPH/PARD6B/SLC52A3 influenced the prognosis and overall survival of PAAD patients. We confirmed the high expression of these target genes in pancreatic tissue of pancreatic-specific SMAD4-deficient mice. In addition, immune infiltration analysis showed that the high expression of these target genes affects the tumor immune environment and contributes to the progression of PAAD. Abnormal overexpression of these target genes may be caused by hypermethylation. In conclusion, we found that KLK10/LIPH/PARD6B/SLC52A3 is a potential prognostic marker for PAAD based on a competing endogenous RNA-mediated mechanism and revealed the potential pathogenic mechanism by which deficient expression of SMAD4 promotes pancreatic cancer progression, which provides a new pathway and theoretical basis for targeted therapy or improved prognosis of pancreatic cancer. These data will help reveal potential therapeutic targets for pancreatic cancer and improve the prognosis of pancreatic cancer patients. Keywords: ceRNA, PAAD, KLK10, LIPH, PARD6B, SMAD4 Graphical abstract Image 1 [51]Open in a new tab Research Highlights * • A regulatory axis through differential expression analysis and survival prognosis analysis was identified. * • KLK10/LIPH/PARD6B/SLC52A3 are promising molecular biomarkers for the treatment of pancreatic cancer through a ceRNA network. * • This study provides a new pathway and theoretical basis for targeted therapy or improved prognosis of pancreatic cancer. 1. Introduction Pancreatic adenocarcinoma (PAAD) is one of the most lethal malignant gastrointestinal tract tumors with an aggressive nature and poor prognosis. According to the 2023 Cancer Data [[52]1], PAAD is now the third most common cause of cancer-related death. Currently, the incidence of PAAD is increasing at a rate of 1 % per year and the mortality rate of PAAD has not significantly improved compared with other cancers. Furthermore, its 5-year survival rate is only 12 %, and the mortality rate is almost equal to the incidence rate [[53]2]. The overall prognosis for PPAD patients remains poor due to the lack of effective early screening programs and the delayed onset of early symptoms [[54]3]. The best treatment for PAAD is still surgical resection, but only 15–20 % of patients are suitable for resection [[55]4], and the recurrence rate is as high as 76.6 % even after surgical treatment [[56]5]. In addition, there are adjuvant therapies such as radiotherapy, chemotherapy, targeted therapy, and immunotherapy, and although some progress has been made in each of these treatments, it is still challenging to improve patient survival [[57]6,[58]7]. New biomarkers are essential for the early diagnosis of the onset and recurrence of PAAD and the outcome of advanced PAAD and will be invaluable for improving postoperative monitoring after radical surgery [[59]8]. To improve the early diagnosis of PAAD and its treatment and prognosis, there is an urgent need to identify new prognostic biomarkers and new therapeutic targets. SMAD4 is one of the most mutation-prone genes in PAAD [[60]9], and SMAD4 mutations are considered to be the most specific for PAAD [[61]10]. SMAD4, a signal transducer of TGF-β, is one of the Smad family members and can inhibit cell growth and promote apoptosis downstream of TGF-β [[62]11]. SMAD4 has been shown to play a tumor-suppressive role in a variety of tumors including PAAD [[63]12]. According to the literature, approximately 60–90 % of PAAD patients have SMAD4 mutations, which cause deletion of SMAD4 expression [[64]13], resulting in loss of the SMAD4 tumor-suppressive function in a variety of cancers [[65]14] and accelerating tumor cell invasion and metastasis, leading to poor prognosis of cancer patients and reducing their overall survival (OS) rate. Although some studies on the effects of SMAD4 deficiency on PAAD have reported the role of glycolysis and the immune microenvironment, the pathogenic mechanisms by which SMAD4 deficiency promotes the development of PAAD at the epigenetic level have not been investigated. Cancer development is often caused by dysregulated gene expression, and numerous studies have revealed that non-coding RNA (ncRNA) can affect the biological function of cells by regulating the expression of target genes at the transcriptional level [[66]15], thus playing an important role in the proliferation and invasion of a variety of tumor cells [[67]16,[68]17]. ncRNAs are functional RNA molecules that do not encode proteins but function by regulating gene expression and proteins. ncRNAs include long ncRNA (lncRNA), small ncRNA (mainly miRNA), and circular RNA [[69]18]. miRNAs are short ncRNAs of 18–24 nucleotides in length that can bind complementarily to the target mRNA through miRNA-recognition elements (MREs) on the target mRNA. Binding leads to the degradation of the target mRNA or the inhibition of mRNA translation, thereby downregulating the expression of the target mRNA and thus regulating different biological processes including cell development, cell proliferation and differentiation, apoptosis, and tumor growth [[70]19]. LncRNAs are ncRNAs of more than 200 nucleotides in length and are one of the important families of regulatory molecules in the human genome. They mainly regulate gene expression at the transcriptional or post-transcriptional level, and dysregulation of lncRNA expression or altered function directly affects cell proliferation [[71]20], apoptosis [[72]21], and migration and invasion [[73]22] in a variety of malignancies, including PAAD. The “competing endogenous RNA (ceRNA) hypothesis” was proposed in 2011 [[74]23] and suggests that lncRNA also exists in the MRE region and can act as a ceRNA that binds to miRNAs [[75]24,[76]25]. The binding of ceRNAs attenuates the degradation or repression of target mRNAs by miRNAs, thus upregulating the expression of target mRNAs [[77]26]. There is increasing evidence that ceRNA mechanisms are involved in regulating the development of various cancers including pancreatic cancer [[78]27], thyroid cancer [[79]28], and oral squamous carcinoma [[80]29]. However, whether the SMAD4 expression deficiency in PAAD is associated with ceRNAs has not yet been revealed. In this study, we identify a SMAD4-related ceRNA network through public databases to reveal potential prognostic biomarkers for PAAD. This study sought to identify a ceRNA regulatory axis to reveal the effect of SMAD4 loss on PAAD and to develop possible prognostic biomarkers based on the validation of the ceRNA hypothesis in numerous malignancies ([81]Fig. S1). 2. Materials and methods 2.1. Data sources and processing Sequencing data and clinical information of 178 pancreatic cancer samples and four paracancerous samples were downloaded from TCGA database ([82]https://portal.gdc.cancer.gov/), and RNAseq data were obtained from 167 normal pancreatic samples from the GTEx database ([83]https://commonfund.nih.gov/GTEx/). Expression data of target mRNAs in 117 cancer and 106 paracancerous samples was obtained from Ruijin Hospital (Shanghai, China). In addition, we downloaded the transcriptomic data of PAAD published in Cell (including 140 PAAD and 21 paracancerous samples) and performed differential expression analysis. The CPTAC database ([84]https://pdc.cancer.gov/pdc/), Gene Expression Profiling Interactive Analysis (GEPIA) ([85]http://gepia.cancer-pku.cn/) and The Human Protein Atlas (THPA) database ([86]https://www.proteinatlas.org/) were used to analyze expression at the gene and protein level. The 178 PAAD tumor samples obtained from TCGA were divided into SMAD4^high and SMAD4^low groups (grouped by the median expression level of SMAD4). The log2FC values and p-values of lncRNAs, miRNAs, and mRNAs with high SMAD4 expression (n = 89) and low SMAD4 expression (n = 89) were calculated using the R package “gplots" (p < 0.05), and screening thresholds for differentially expressed miRNAs (DEmiRNAs) were |logFC| > 0.2 (p < 0.05). RNA expression was considered statistically different when the p-value was <0.05. Volcano maps were created using GraphPad Prism software (version 8.4). 2.2. Construction and identification of the ceRNA network Mircode ([87]http://www.mircode.org/) was used to predict DEmiRNAs that are mutually regulated with differentially expressed lncRNAs (DElncRNAs). TargetScan ([88]https://www.targetscan.org/vert_80/) and miRDB ([89]https://mirdb.org/mirdb/index.html) were applied to screen the target mRNAs of DEmiRNAs. DElncRNA sequences were obtained from the LNCipedia database ([90]https://lncipedia.org/). The lncLocator 2.0 database ([91]http://www.csbio.sjtu.edu.cn/bioinf/lncLocator2/) was used to determine the DElncRNA cellular localization. Finally, Cytoscape software ([92]http://www.cytoscape.org) was used to visualize the ceRNA network. Survival curves were plotted using Kaplan–Meier curves. Statistical analysis was performed using GraphPad Prism (version 8.4). 2.3. Functional enrichment analysis of differentially expressed mRNAs (DEmRNAs) and cancer pathway scoring To investigate the pathway enrichment information for screening PAAD-related prognostic target mRNAs, we obtained RNAseq data in HTSeq-FPKM format from TCGA-PAAD ([93]https://portal.gdc.cancer.gov/) database and performed a log2 transformation using the stat R package (version 3.6.3). Spearman correlation analysis was performed to obtain the top 100 genes associated with DEmRNAs (p < 0.05). Then, KEGG pathway enrichment analysis and GO function enrichment analysis (including biological process, cellular component, and molecular function) were performed on bubble plots for the respective gene sets using the R package “ggplot 2." Pathway activity scores for 10 cancer-related pathways from 178 PAAD samples from TCGA database were calculated using reverse-phase protein array (RPPA) data from The Cancer Proteome Atlas (TCPA) database ([94]https://www.tcpaportal.org/tcpa/). RPPA data are relative protein levels after normalizing the standard deviation (SD) of all samples centered on the median. The pathway activity score is the sum of the relative protein levels of all positive regulatory components minus the relative protein levels of negative regulatory components in each pathway [[95]30]. 2.4. Genetic alteration and methylation analysis Mutations in target genes in pancreatic cancer are available in the cBioPortal database ([96]https://www.cbioportal.org/). Mutation sequence data were collected from cBioportal ([97]http://www.cbioportal.org/), CAMOIP ([98]http://www.camoip.net/) and OncoVar ([99]https://oncovar.org/welcome/index). The degree of DNA methylation of target genes is available in the UALCAN database ([100]http://ualcan.path.uab.edu/analysis.html). MEXPRESS ([101]https://mexpress.Be) was used to visualize target gene expression and the association between clinical factors and DNA methylation. MethSurv ([102]https://biit.cs.ut.ee/methsurv/) was used to analyze the degree of methylation at different CpG sites of target genes. 2.5. Analysis of immune infiltration levels The immune infiltration and mRNA expression module of the Gene Set Cancer Analysis (GSCA) website ([103]http://bioinfo.life.hust.edu.cn/GSCA/#/) allows the assessment of tumor infiltration of 24 immune cell types using ImmuCellAI. Lollipop plots were obtained with Sento Academic using the R package GSVA visualization. In addition, the relationship between target gene expression levels and lymphocytes was analyzed using the TISIDB database ([104]http://cis.hku.hk/TISIDB/index.php). 2.6. Predictive analysis of gene-targeted drugs The HERB database ([105]http://herb.ac.cn/) was used to predict the possible target herbs for the target genes. Using the GSCA platform ([106]http://bioinfo.life.hust.edu.cn/GSCA/#/), we integrated the sensitivity correlations of drugs with target genes from the GDSC and GTRP databases. 2.7. Cell culture Human pancreatic cancer cell lines MIA PaCa-2, CFPAC-1, and CAPAN-1 were obtained from the Cell Bank of the Chinese Academy of Sciences (Shanghai, China). All cell lines were cultured in a humidified incubator (37 °C) containing 5 % CO[2] in media (DMEM and 1640) supplemented with 10 % fetal bovine serum and 1 % penicillin/streptomycin. 2.8. Mouse model construction SMAD4-deficient C57BL/6J mice were constructed using the Cre-LoxP system. PDX1-cre; SMAD4^flox/+ (PS) or PDX1-cre; SMAD4^flox/+;KRAS^G12D/- (PKS) mice were obtained by crossing PDX1-cre (pancreatic-specific cre tool mice), SMAD4^flox/+ mice (MGI:894293), and/or KRAS^G12D/- mutant mice (MGI: 96680). Genotyping was performed using PCR and the Mouse Tail DNA Extraction Kit. The PCR primers used were as follows: PDX1 genotyping forward primers 5′-CTGTCCCTGTATGCCTCTGG-3′ and 5′-AGATGGAGAAAGGACTAGGCTACA-3′, and. PDX1 genotyping reverse primers 3′-CCTGGACTACATCTTGAGTTGC-5′ and 3′-AGGCAAATTTTGGTGTACGG-5'; SMAD4 genotyping forward primer 5′-CTGCAGAGTAATGGTAAGTA-3′, and. SMAD4 genotyping reverse primer 3′-GTTCAACATGTCCCGAGTCA-5'; and. KRAS-G12D forward primer 5′-CTAGCCACCATGGCTTGAGT-3′, and. KRAS-G12D reverse primer 3′-TCCGAATTCAGTGACTACAGATG-5'. The PCR products were identified through gel electrophoresis on a 1.5 % agarose gel (PDX1-cre: heterozygote = 650 bp and 415 bp; SMAD4: heterozygote = 823 bp and 694 bp; and KRASG12D: heterozygote = 350 bp). 2.9. RNA extraction, quantitative PCR analysis and RNA sequencing Total RNA was extracted using AG's SteadyPure Universal RNA Extraction Kit (Accurate Biology, Changsha, China) and reverse transcribed to cDNA using the Evo M-MLV RT Mix Kit with gDNA Clean for qPCR Kit. The synthesized cDNA was used as a template for real-time PCR (qPCR), and qPCR was performed on the qTOWER system using SYBR® Green Premix Pro Taq HS qPCR Kit (Code No. AG11701). hsa-mir-98–5p, [107]AC108134.1, [108]AC015987.1, SM [109]AC015987.1, SMAD4, KLK10, LIPH, PARD6B, SLC52A3, and PRSS22 PCR primers (human and mouse) were designed by Biotech (see [110]Table S1 for primers). Quantitative analysis was performed using the 2^−ΔΔct method and plotted with the aid of Prism 8.3. Total RNA-seq library construction was performed from the RNA samples using the TruSeq Stranded RNA Sample Preparation Kit and bar-coded with individual tags following the manufacturer's instructions (Illumina). Libraries were prepared on an Agilent Bravo Automated Liquid Handling System. Quality control was performed at every step and the libraries were quantified using the TapeStation system. Indexed libraries were prepared and run on HiSeq 4000 paired end 75 base pairs to generate a minimum of 80 million reads per sample library with a target of greater than 90 % mapped reads, typically with four-sample pools. The raw Illumina sequence data were then demultiplexed and converted to FASTQ files, and adaptor and low-quality sequences were quantified. Reads were mapped to the hg38 human genome reference and underwent significant QC steps. Samples passing this QA/QC were then clustered with other expression data from similar and distinct tumor types to confirm expected expression patterns. 2.10. Statistical analysis All differential expression data were analyzed using GraphPad Prism software (version 8.4). Data are expressed as the mean ± SD. The Mann–Whitney test and the independent t-test were used to calculate differences between the two groups of data. Differences between groups were evaluated using one-way ANOVA and the chi-square test. P < 0.05 was considered statistically significant. 3. Results 3.1. SMAD4 expression is downregulated in PAAD, and low expression is associated with poor prognosis Based on THPA database, SMAD4 was highly expressed in normal pancreatic tissues ([111]Fig. 1A and [112]Fig. S2A), but there was low expression in PAAD tissues ([113]Fig. S2B). Immunohistochemical staining (IHC) showed similar results ([114]Fig. 1B). In the CPTAC database, the protein expression of SMAD4 was significantly lower in PAAD tissues than in normal tissues ([115]Fig. 1C). To explore the clinical relevance of low SMAD4 expression in tumor samples, we divided the samples into two groups (high and low) based on the SMAD4 mRNA expression (bounded by the median expression level) in each sample in TCGA database and plotted Kaplan–Meier survival curves ([116]Fig. 1D). Low SMAD4 expression significantly correlated with poorer OS in pancreatic cancer patients. The differential expression of SMAD4 in TCGA-sourced samples showed a significant correlation with the clinical T-stage ([117]Fig. S2C), suggesting that low expression of SMAD4 influences the prognosis of PAAD. Furthermore, using online TIMER ([118]http://timer.cistrome.org/) server, we produced a bar graph showing the frequency of mutations in the SMAD4 gene in each cancer in TCGA database ([119]Fig. 1E). SMAD4 was most frequently mutated in PAAD, where it was mutated in up to 20 % (36/178) of tumors. Combined with the high-frequency mutation gene waterfall map provided by OncoVar ([120]Fig. S2D), the mutation rate of SMAD4 in 158 PAAD samples was 18 %, which was in the top three of high-frequency mutations. We then analyzed the genomic and copy-number alterations of SMAD4 in TCGA data using cBioPortal. The copy number of SMAD4 in PAAD was positively correlated with its mRNA expression ([121]Fig. 1F), and there was a 30 % mutation rate of SMAD4 in PAAD, most of which were profound deletion mutations ([122]Fig. 1G). Thus, the downregulation of SMAD4 expression significantly affects the survival and prognosis of PAAD patients. Copy-number alterations of SMAD4 are one possible cause of low SMAD4 expression. Fig. 1. [123]Fig. 1 [124]Open in a new tab Characteristics of SMAD4 in PAAD. (A) Protein expression and distribution of SMAD4 in normal tissues. (B) Immunohistochemical expression of SMAD4 at the translational level was obtained from the HPA database. (C) Protein expression of SMAD4 in PAAD samples from the CPTAC database. (D) Correlation analysis between high and low expression levels of SMAD4 and overall survival of patients. (E) Frequency of SMAD4 mutations in different cancer types in TCGA dataset. (F) Correlation between SMAD4 copy number and mRNA expression. (G) Bar graph showing the frequency of SMAD4 mutations in TCGA-PAAD dataset. 3.2. Construction of a ceRNA network composed of lncRNA, miRNA, and mRNA Numerous studies have demonstrated that ceRNAs may act as a regulatory mechanism of tumor progression in a variety of tumors. Therefore we attempted to identify the potential ceRNA-mediated regulatory mechanisms by constructing a SMAD4-related ceRNA network in which deletion of SMAD4 expression affects PAAD prognosis. We divided 178 PAAD patients from TCGA database into two groups (high SMAD4 expression and low SMAD4 expression according to the median mRNA expression of SMAD4) and performed differential mRNAs (DEmRNAs) screening. A total of 2015 DElncRNAs (655 upregulated and 1360 downregulated), 192 DEmiRNAs (62 upregulated and 130 downregulated), and 3971 DEmRNAs (1358 upregulated and 2613 downregulated) were screened ([125]Fig. 2A–C). Next, based on the miRcode and TargetScan databases, we obtained all lncRNA-miRNA and miRNA-mRNA relationship pairs. By taking the intersection of these RNAs that could constitute a network with our screened DERNAs, we identified potential ceRNA networks in PAAD, including 188 DElncRNAs, 24 DEmiRNAs, and 838 DEmRNAs, and visualized them as ceRNA networks using Cytoscape 3.9 ([126]Fig. 2D). Kaplan–Meier regression analysis was then performed on these DERNAs, and DERNAs associated with PAAD survival prognosis were obtained ([127]Fig. S3A). These DERNAs included four DElncRNAs ([128]AC015987.1, [129]AC108134.1, [130]AL136115.1, and HCG11), one DEmiRNA (hsa-mir-98–5p), and five DEmRNAs (KLK10, LIPH, PARD6B, PRSS22, and SLC52A3) that made up a potential ceRNA network associated with PAAD survival and prognosis ([131]Fig. 2E). In addition, we performed GO and KEGG enrichment analyses of DEmRNAs in this network using Metascape and found that these differential DEmRNAs were mainly enriched in the “synaptic signaling,” “membrane potential regulation,” “cell adhesion,” and “cell morphogenesis” pathways ([132]Fig. 2F). Fig. 2. [133]Fig. 2 [134]Open in a new tab Regulatory network of lncRNA-miRNA-mRNA screened according to SMAD4^LOWand SMAD4^HIGH(red represents upregulation, and blue represents downregulation). (A–C) The volcano plots describe (A) DElncRNAs (|log2FC| > 0.5, p < 0.05), (B) DEmiRNAs (|log2FC| > 0.2, p < 0.05), and (C) DEmRNAs (|log2FC| > 0.5, p < 0.05). (D–E) SMAD4-related ceRNA network. Diamonds represent lncRNAs, rectangles represent miRNAs, and ellipses represent mRNAs. (D) All triple ceRNA networks in PAAD. (E) Survival-associated ceRNA networks. (F) Functional enrichment analysis of DEmRNAs in the network using KEGG and GO. 3.3. Identification of a ceRNA network associated with SMAD4 expression To confirm the SMAD4-related ceRNA network, we verified the expression of the ceRNA axes in the SMAD4 high expression group and the SMAD4 low expression group ([135]Fig. S3B). Next, given that lncRNAs can only exert competitive effects as ceRNAs in the cytoplasm, we analyzed the subcellular localization of [136]AC015987.1 and [137]AC108134.1 using lncLocator. The localization results showed that both were mainly localized in the cytoplasm ([138]Fig. 3A and B), a result that suggests that [139]AC015987.1 and [140]AC108134.1 may act as ceRNAs. Using miRDB and TargetScan databases, we found predicted binding sites of hsa-mir-98–5p with [141]AC015987.1, [142]AC108134.1, KLK10, LIPH, PARD6B, PRSS22, and SLC52A3 ([143]Fig. 3C). The expression correlation between all DERNAs in this ceRNA network was then analyzed in TCGA-PAAD samples, and the expression levels of [144]AC015987.1 and [145]AC108134.1 were significantly positively correlated with the expression of KLK10, LIPH, PARD6B, SLC52A3, and PRSS22 (p < 0.05). Furthermore, there was a negative although insignificant correlation between [146]AC108134.1 and hsa-mir-98–5p ([147]Fig. 3D and E); hsa-mir-98–5p was significantly negatively correlated with the expression of KLK10, LIPH, PARD6B, SLC52A3, and PRSS22 (p < 0.05; [148]Fig. 3F). The correlation among DEmRNAs was also analyzed, and it was found that there was a significant positive correlation between all these target mRNAs ([149]Fig. S3C). These data further support the possibility of this ceRNA regulatory axis ([150]Fig. 3G). Fig. 3. [151]Fig. 3 [152]Open in a new tab Construction of the ceRNA network associated with PAAD prognosis. (A–B) Subcellular localization of (A) [153]AC108134.1 and (B) [154]AC015987.1. (C) Base pairings between hsa-mir-98–5P and target sites ([155]AC108134.1, [156]AC015987.1, KLK10, LIPH, PARD6B, SLC52A3, and PRSS22) predicted using lncRNABase, MiRcode, and TargetScan databases. (D) Expression correlation analysis between [157]AC108134.1 and five DEmRNAs (KLK10, LIPH, PARD6B, SLC52A3, and PRSS22), and hsa-mir-98–5p. (E) Expression correlation analysis between [158]AC015987.1 and five DEmRNAs (KLK10, LIPH, PARD6B, SLC52A3, and PRSS22), and hsa-mir-98–5p. (F) Expression correlation analysis between hsa-miR-98–5p and five DEmRNAs (KLK10, LIPH, PARD6B, SLC52A3, and PRSS22). (G) PAAD prognostic model constructed using the ceRNA network. 3.4. Experimental validation of the correlation between SMAD4 and the ceRNA axis To confirm the correlation between this ceRNA axis and SMAD4, we first analyzed the correlation between the expression of DERNAs and SMAD4 using TCGA database. The results showed that the expression of SMAD4 was significantly negatively correlated with the expression of [159]AC108134.1; there was only a negative correlation trend with the expression of [160]AC015987.1; significantly positively correlated with the expression of hsa-mir-98–5p; and significantly negatively correlated with the expression of five target mRNAs ([161]Fig. 4A). This result is consistent with the ceRNA hypothesis mechanism. Then, to confirm the feasibility of this ceRNA axis, we used SMAD4-related pancreatic cancer cell lines (CAPAN-1, CFPAC-1, and MIA PaCa-2) to validate the expression of DEmRNAs in this regulatory axis. CFPAC-1 cells are deficient in SMAD4 expression; and MIA PaCa-2 cells do not have a SMAD4 mutation and show normal SMAD4 expression [[162]31]. qPCR results confirm that, compared with MIA PaCa-2, CAPAN-1 and CFPAC-1 cells had deficient SMAD4 expression. The expression of [163]AC108134.1 and [164]AC015987.1 was significantly upregulated, the expression of hsa-mir-98–5P was significantly downregulated, and the expression of target mRNAs was significantly upregulated ([165]Fig. 4B). These results are consistent with the bioinformatic analysis and further support a ceRNA network axis composed of [166]AC108134.1, [167]AC015987.1-hsa-mir-98-5P-KLK10, LIPH, PARD6B, and SLC52A3 as a possible PAAD prognostic regulatory axis associated with SMAD4 expression deficiency. Fig. 4. [168]Fig. 4 [169]Open in a new tab Correlation analysis between DERNAs and SMAD4 expression. (A) Expression correlation analysis between SMAD4 and DERNAs ([170]AC108134.1, [171]AC015987.1, KLK10, LIPH, PARD6B, SLC52A3, and PRSS22). (B) Expression of DERNAs in SMAD4-associated pancreatic cancer cell lines. 3.5. Expression of target mRNAs in PAAD and analysis of their clinical relevance Next, to determine the potential biological functions of target mRNAs, we individually investigated the expression and clinical relevance of target mRNAs in the ceRNA regulatory axis. We analyzed the expression of target mRNAs in PAAD and normal pancreatic samples using a combination of TCGA and GTEX databases. KLK10, LIPH, PARD6B, and SLC52A3 were significantly more highly expressed in PAAD tumor tissues than in normal tissues ([172]Fig. 5A). The expression of LIPH, PARD6B, and SLC52A3 in TCGA-PAAD database was significantly associated with the clinicopathological stage of PAAD by GEPIA (p < 0.05; [173]Fig. 5B). Interestingly, we also found that smoking influenced the prognosis of pancreatic cancer patients, and high expression of target mRNAs in pancreatic cancer patients who smoked significantly decreased the prognostic survival time of patients compared with non-smoking patients ([174]Fig. 5C). In addition to SMAD4, KRAS, TP53, and CDKN2A are high-frequency mutated genes in PAAD [[175]32], and their expression is closely associated with PAAD progression. Therefore, we analyzed the correlation between the expression of the four target mRNAs and these classically mutated genes ([176]Fig. 5D). High expression of the four target mRNAs was only associated with high expression of KRAS (p < 0.05; [177]Fig. 5E). Fig. 5. [178]Fig. 5 [179]Open in a new tab Correlation analysis between gene expression and clinical factors. (A) Expression of DEmRNAs in PAAD from TCGA and GTEX databases. (B) Correlation between the expression of DEmRNAs and OS in TCGA-PAAD (Alive = 86; Dead = 92). (C) Total survival curve of target mRNAs. (D) Clinical pathological stages of DEmRNAs using log2 (TPM + 1) for log-scale in TCGA-PAAD (TCGA database). ns, p ≥ 0.05; *, p < 0.05; **, p < 0.01; and ***, p < 0.001. To further analyze the prognostic significance of target mRNAs, we performed univariate and multifactorial Cox regression analyses to determine the correlation between clinicopathological characteristics and clinical prognosis ([180]Tables S2–S5). While “lymph node metastasis" and “KLK10/LIPH expression" were shown to be independent risk factors for OS in the multivariate regression analysis (p < 0.05), the univariate regression analysis revealed that “tumor size," “lymph node metastasis," “clinical stage," and “KLK10/LIPH expression" were all independent risk factors for clinical prognosis (p < 0.05). 3.6. Correlation analysis between the expression of target mRNAs and immune infiltration Given that multiple cancers are associated with immune cell infiltration and that tumor cells and tumor-infiltrating cells can act together to promote tumor progression, we assessed the expression of the KLK10/LIPH/PARD6B/SLC52A3 using ImmuCellAI. We assessed the correlation between the expression of the four target mRNAs and 24 infiltrating immune cells and found that KLK10/LIPH/PARD6B/SLC52A3 correlated positively with monocytes, Th17 cells, nTreg cells, NK cells, Tfh cells, and CD4^+ T cells ([181]Fig. 6A). Based on TCGA database, the expression of each target mRNA in pancreatic cancer samples was also significantly positively correlated with NK cells (CD56^+) and Th2 cells and significantly negatively correlated with Tfh cells ([182]Fig. 6B). We also performed a correlation analysis of the expression of the four target mRNAs with the abundance of tumor-infiltrating lymphocytes (TILs) using the TISIDB database ([183]Fig. 6C). In addition, we found in the “immunoassay” module of the CAMOIP database [[184]33] that high expression of KLK10, LIPH, PARD6B, and SLC52A3 resulted in an increased tumor mutational load ([185]Fig. 6D). It has been reported in the literature that a high TMB implies a higher type and number of neoantigens produced by tumor cells, a higher probability of being recognized by the immune system, a higher probability of tumor cells being killed, and thus a higher probability of immunotherapy being effective in that patient [[186]34,[187]35]. Therefore, these four target genes have potential as new biomarkers for the prognosis of pancreatic cancer patients. Fig. 6. [188]Fig. 6 [189]Open in a new tab Correlation between gene expression and immune infiltration. (A) Correlation between gene expression and infiltration of 24 immune cells evaluated through ImmuCellAI. (B) Lollipop illustration of tumor-infiltrating lymphocyte infiltration of target genes in TCGA-PAAD. (C) Correlation of gene expression with tumor-infiltrating immune cells. (D) Correlation between gene expression and tumor mutation burden (TMB). *p < 0.05; **p < 0.01; ***p < 0.001; and ****p < 0.0001. 3.7. Genomic alterations of target mRNAs and methylation analysis There is growing evidence that genomic instability is a hallmark of many cancers and that gene copy-number variation plays an important role in the development of cancer [[190]36]. Using the cBioportal database [[191]37], we examined the frequency and type of genomic alterations in KLK10, LIPH, PARD6B, and SLC52A3 in 168 PAAD cases. The frequency of genomic alterations was 1.1 % for KLK10 and PARD6B, 2.3 % for LIPH, and a relatively low frequency of 0.6 % for SLC52A3 ([192]Fig. S4A). The types of alterations in the KLK10, LIPH, PARD6B, and SLC52A3 genes were all gene amplification ([193]Fig. S4B). The expression of KLK10/LIPH/PARD6B was significantly and positively correlated with the copy-number changes ([194]Fig. S4C). Tumor development is associated with alterations in DNA methylation levels, which affect tumor invasion and metastasis by silencing the expression of tumor suppressor genes or activating oncogenes [[195]38]. In the UALCAN and DiseaseMeth databases (version 3.0), the methylation levels of KLK10, LIPH, PARD6B, and SLC52A3 were lower in PAAD samples than in normal pancreatic tissues ([196]Fig. 7A and B). In the MEXPRESS database [[197]39], KLK10, LIPH, PARD6B, and SLC52A3 methylation was significantly associated with several clinical factors, including “histological type,” “tumor histological grade,” “tumor stage,” “gender,” “sample type,” and “OS” ([198]Fig. 7C). Correlations between the methylation levels of KLK10, LIPH, PARD6B, and SLC52A3 and copy-number alterations were also observed in PAAD samples (r = −0.07, 0.267, 0.219, and 0.002, respectively). The four target genes were methylated at different CPG sites, most of which were significantly negatively correlated with the expression of target mRNAs. The methylation heatmap obtained from the MethSurv database [[199]40] showed that in terms of genomic distribution, the CPG sites of KLK10, SLC52A3, LIPH, and PARD6B were mainly distributed in the 5′UTR, TSS1500, and Body regions. The CPG sites of KLK10 were mainly in the S-shore and Island regions, the CPG sites of LIPH were mainly in S-shore regions, and the CPG sites of PARD6B and SLC52A3 were mainly in Island regions ([200]Figs. S5A–E). In addition, several loci (cg24512400, cg04772683, and cg17766007 for KLK10; cg02124892 and cg12611448 for LIPH; cg09072870, cg0377621, and cg22478261 for PARD6B; and cg01479232 and cg04972979) were hypermethylated in PAAD. Fig. 7. [201]Fig. 7 [202]Open in a new tab Relationship between the expression and methylation of predicted target genes. (A) Methylation levels of genes assessed using UALCAN. (B) Methylation levels of genes assessed using Disease 3.0. (C) Methylation sites of predicted target genes in PAAD assessed using MEXPRESS. 3.8. Functional enrichment analysis of target mRNAs Based on the GSCA database [[203]41], we explored the possible functions of KLK10/LIPH/PARD6B/SLC52A3 in the cancer-related pathways ([204]Fig. 8A). KLK10 mainly inhibited the cell cycle, DNA damage response, hormone AR, and hormone ER pathways while activating the apoptosis and RAS/MAPK pathways. LIPH mainly inhibited the apoptosis, cell cycle, DNA damage response, EMT, hormone AR, hormone ER, and PI3K/AKT pathways while activating the RAS/MAPK, RTK, and TSC/mTOR pathways. PARD6B mainly inhibited the apoptosis, DNA damage response, EMT, and TSC/mTOR pathways while activating the hormone AR, hormone ER, and RTK pathways. SLC52A3 played an important activating role in the hormone pathway, hormone ER, and RAS/MAPK pathways and a major inhibiting role in the apoptosis, cell cycle, DNA damage response, EMT, and PI3K/AKT pathways. In addition, in the GeneMANIA database [[205]42], we identified what appeared to be potential protein interactions between the four target mRNAs ([206]Fig. S6). Therefore, the top 100 genes with the greatest correlation with the target mRNAs were identified separately using GEPIA2 for GO and KEGG enrichment analyses ([207]Figs. S6B–E) to explore the potential biological functions of the four target mRNAs. The enrichment analysis showed that KLK10/LIPH/PARD6B/SLC52A3-related genes were all enriched in the “cell adhesion”-related pathway, which was also enriched in SMAD4-related DEmRNAs, suggesting that the high expression of these four target mRNAs may alter cell adhesion. In addition, KLK10 and SLC52A3 were enriched in the “epidermal differentiation” pathway, suggesting that these two molecules may also promote epidermal differentiation. Fig. 8. [208]Fig. 8 [209]Open in a new tab Drug sensitivity analysis and prediction of Chinese herbal medicine targets. (A) Different roles of predicted target genes in the PAAD cancer pathway. (B–C) Correlation between gene expression and the sensitivity to GDSC drugs (top 30) in pan-cancer based on the (A) CTRP and (B) GDSC databases. A positive correlation means that the high expression of the gene is associated with resistance to the drug, and vice versa. (D–F) Prediction of Chinese herbal medicine targets, (D) KLK10, (E) LIPH, PARD6B, and (F) SLC52A3. 3.9. Prediction of targeted drugs To explore whether there are specific drugs for these target mRNAs, we assessed the correlation between the expression of the four target mRNAs and the sensitivity to multiple drugs using the GDSC and CTRP databases [[210]43], which lacked SLC52A3-related data. In the GDSC database, high expression of KLK10/LIPH/PARD6B was associated with resistance to 24 drugs, including AR-42, BX-795, and XMD13-2, and sensitivity to five drugs, namely afatinib, cetuximab, erlotinib, gefitinib, and lapatinib. However, different results were obtained in the CTRP database, where high expression of KLK10/LIPH/PARD6B was associated with resistance to five drugs (PD 153035, afatinib, Austoocystin D, erlotinib, and saracatinib) and sensitivity to AT7867, AZD4547, vincristine, and 25 other drugs ([211]Fig. 8B and C). In addition, we predicted herbs that may target the four target genes, 22 herbs targeting KLK10, one herb targeting LIPH, three herbs targeting PARD6B, and 19 herbs targeting SLC52A3 with the help of the HERB database [[212]44] ([213]Fig. 8D–F). 3.10. Expression of target mRNAs in human and mouse pancreatic tissues To verify the expression of target mRNAs in pancreatic tissues, we obtained 117 PAAD samples and 106 normal pancreatic samples from Ruijin Hospital (Shanghai, China) and performed transcriptome sequencing. Through differential expression analysis, we found that KLK10/LIPH/PARD6B/SLC52A3 was significantly more highly expressed in PAAD samples than in normal pancreatic tissues ([214]Fig. 9A). Similar results were obtained in 94 paired cancer and paracancerous samples ([215]Fig. 9B). In addition, we downloaded the transcriptomic data of PAAD published in Cell in 2021 and performed differential expression analysis; the results were consistent with those of Ruijin Hospital ([216]Fig. 9C and D). Fig. 9. [217]Fig. 9 [218]Open in a new tab Expression of target mRNAs in human pancreatic tissues. (A) KLK10/LIPH/PARD6B/SLC52A3 was significantly more highly expressed in PAAD samples than in normal pancreatic tissues. (B) Expression of target mRNAs in 94 paired cancer and paracancerous samples. (C) The transcriptomic data of PAAD published in Cell in 2021 and differential expression analysis were performed. (D) Expression of target mRNAs in 21 paired cancer and paracancerous samples in Cell cohort. In addition, we conditionally knocked out SMAD4 in the pancreas of mice using the cre-loxP-mediated recombination system (the “cre-loxP system") to further validate the effect of the deletion of SMAD4 expression on these target mRNAs, as well as the pancreas ([219]Fig. 10A). The results of mouse genotyping are shown in [220]Fig. 10B (KRAS-positive bands 350 bp; PDX1-cre heterozygotes 415 bp and 650 bp; SMAD4 heterozygotes 498 bp and 647 bp). qPCR analysis of mouse pancreatic tissue RNA showed that, compared with controls, in SMAD4-knockout PS mice, KLK10, LIPH, and SLC52A3 expression was upregulated, but the difference in PARD6B expression was not statistically significant ([221]Fig. 10C). Notably, the pancreas of PKS mice (8 months old) showed significant precancerous lesions compared with the tissues of PS mice ([222]Fig. 10D). Fig. 10. [223]Fig. 10 [224]Open in a new tab Construction of mouse models and identification of target gene expression. (A) Construction of a pancreatic-specific SMAD4-deficient mouse model using the Cre-LoxP system. (B) Genotype identification results of mice (M = marker, HE = heterozygote, and WT = wild type). (C) The expression of target genes in SMAD4-deficient pancreatic tissues (ns, p ≥ 0.05; *, p < 0.05; **, p < 0.01; and ***, p < 0.001). (D) The pancreatic tissues of mice stained with hematoxylin and eosin (HE). 4. Discussion PAAD is one of the most lethal malignant gastrointestinal tumors, and its aggressive nature and poor prognosis remain challenging to treat in clinical practice. There is an urgent need to discover new biomarkers to improve patient prognosis. SMAD4 is one of the classical mutated genes in PAAD, and the loss of SMAD4 expression has an oncogenic function in PAAD [[225]11]. Therefore, revealing the potential mechanism of SMAD4 deletion in PAAD and mining new biomarkers are crucial to improve patient prognosis. Based on bioinformatics, we obtained results consistent with previous studies that approximately 20 % of pancreatic cancer patients in TCGA database had SMAD4 mutations and that these mutations resulted in the loss of SMAD4 expression and an oncogenic effect due to loss of SMAD4. It is unclear through which mechanism the deficiency of SMAD4 in PAAD promotes tumor development at the epigenetic level. The ceRNA hypothesis [[226]23] proposes that ceRNAs regulate tumor development in a variety of malignancies, including pancreatic cancer [[227]45,[228]46], lung cancer [[229]47], gastric cancer [[230]48], and nasopharyngeal carcinoma [[231]49]. Therefore, this study aimed to uncover a SMAD4-related ceRNA mechanism to understand how SMAD4 deletion plays a role in PAAD and to identify potential prognostic biomarkers. To construct a regulatory axis involving a SMAD4-related ceRNA mechanism, we first divided 178 pancreatic cancer patients in TCGA database into two groups—89 cases with low SMAD4 expression and 89 cases with high SMAD4 expression—and performed differential expression analysis. Then, the [232]AC108134.1/[233]AC015987.1-hsa-mir-98-5p-KLK10/LIPH/PARD6B/SLC52A3 axis was constructed through OS analysis, expression difference analysis, and expression correlation analysis. Next, we verified the differential expression of these DERNAs. Interestingly, all DEmRNAs in the ceRNA network were found to be independent predictive markers for pancreatic cancer through Cox regression analysis. We performed clinical correlation, immuno-infiltration, and methylation studies on KLK10, LIPH, PARD6B, and SLC52A3 to investigate how these DEmRNAs affect tumor growth. High expression of KLK10/LIPH/PARD6B/SLC52A3 indicated a poor prognosis for pancreatic cancer patients, while the expression of LIPH/PARD6B/SLC52A3 significantly correlated with clinical stage. In addition, we examined the correlation between the expression of these four genes and other classically mutated genes in pancreatic cancer and found that they were significantly positively correlated with the expression of KRAS, which is mutated in 95 % of pancreatic malignancies [[234]50]. This result suggests that KLK10/LIPH/PARD6B/SLC52A3 may serve as a potential biomarker for the prognosis of pancreatic cancer. Tumor growth is associated with the cooperation between tumor cells and infiltrating immune cells. Similar to immune infiltration in other solid tumors, T cells, B cells, mast cells, macrophages, and neutrophils heavily infiltrate pancreatic cancer [[235]51]. In the present study, we found that KLK10/LIPH/PARD6B/SLC52A3 expression was negatively correlated with Tfh, NK, and pDC cells and positively correlated with infiltration of TH2 and CD56^bright NK cells. In addition, tumor mutational load is a promising biomarker for immunotherapy, with higher TMB indicating that tumor cells produce more neoantigens that are likely to be recognized by the immune system [[236]52]. Several studies have shown that TMB in tumors is associated with improved immune checkpoint blockade therapy [[237]53]. We found that tumors with high KLK10/LIPH/PARD6B/SLC52A3 expression had a high TMB. These findings suggest that these four genes may be involved in the immune infiltration-related processes that control carcinogenesis and progression. Many cancers, including esophageal cancer [[238]54], small cell lung cancer [[239]55], and prostate cancer [[240]56], exhibit changes in gene copy number, which are closely associated with the onset and course of cancer. Moreover, genes can adjust their expression levels when the copy number of the gene is altered. Using biological databases, we identified mutations in the KLK10, LIPH, PARD6B, and SLC52A3 genes. The main type of modification was an amplification mutation, and mutations in KLK10, LIPH, and PARD6B were positively correlated with their expression. In addition, epigenetics, such as DNA methylation [[241]57], plays an important role in cancer development and propagation. Improper methylation can suppress tumor suppressor genes and/or activate oncogenes. Therefore, we evaluated the methylation levels of the four target genes in PAAD using multiple databases. The results showed that KLK10/LIPH/PARD6B/SLC52A3 methylation levels were significantly lower in pancreatic cancer tissues than in healthy tissues. Meanwhile, we found that KLK10/LIPH/PARD6B/SLC52A3 methylation was significantly associated with many clinical variables, including “histological type,” “tumor histological grade,” “tumor stage,” “gender,” and “OS.” In addition, we examined the methylation sites of each gene and found that all four genes were hypermethylated at several CPG sites, most of which were negatively correlated with mRNA expression. Based on the assessment of aberrant DNA methylation compared with normal tissue and the identification of differentially methylated CpG islands, we found that KLK10/LIPH/PARD6B/SLC52A3 may be an important therapeutic target to improve the prognosis of pancreatic cancer. Next, we found protein interactions between these four genes using the GeneMANIA database; therefore, we performed enrichment analysis using the top 100 most relevant genes for each gene to further investigate the potential mechanisms of the four genes in tumor promotion. All four genes have pathways associated with cell adhesion. This suggests that KLK10, LIPH, PARD6B, and SLC52A3 may use cell adhesion pathways to promote the spread and invasion of pancreatic cancer cells. The high mortality of pancreatic cancer can be attributed to the aggressive nature of the disease, its rapid progression, atypical early signs, and poor postoperative prognosis. The heterogeneity of pancreatic cancer itself continues to limit patient benefit, despite advances in the effectiveness of pancreatic cancer treatment using surgery, radiation, chemotherapy, and immunotherapy. Therefore, to achieve long-term disease management in pancreatic cancer patients, there is an urgent need for new treatment modalities to prevent the invasion and metastasis of pancreatic cancer cells. We evaluated the relationship between the four genes and the sensitivity to multiple drugs using the GDSC and CTRP databases. High expression of KLK10/LIPH/PARD6B was sensitive to nine drugs, namely afatinib, cetuximab, erlotinib, gefitinib, lapatinib, PD 153035, growth cystatin D, erlotinib, and saracatinib. Next, we found in clinical samples that the expression of KLK10/LIPH/PARD6B/SLC52A3 was significantly higher in cancer samples than in normal tissues from TCGA and GTEX databases, clinical samples from Ruijin Hospital. These findings support our hypothesis that the high expression of KLK10/LIPH/PARD6B/SLC52A3 is associated with the development of pancreatic cancer. Although our clinical data could not analyze the differential expression of miRNAs and lncRNAs in the ceRNA network, mir-98–5p has been reported to inhibit the growth of PDAC cells [[242]58], which is consistent with our results regarding ceRNA mechanisms. Finally, we constructed pancreatic-specific SMAD4-deficient mice using the Cre-LoxP technique, and found that KLK10/LIPH/SLC52A3 expression was significantly elevated in the pancreatic tissue of PS mice. Together, these results identify a ceRNA network axis associated with SMAD4 deletion in pancreatic cancer, namely the [243]AC108134.1/[244]AC015987.1-hsa-mir-98-5P-KLK10/LIPH/PARD6B/SLC52A3 regulatory axis. Deficient expression of SMAD4 may promote tumor formation through this ceRNA mechanism. Aberrant overexpression of KLK10 has been shown to promote PDAC [[245]59], and LIPH has been associated with immunosuppression or escape in pancreatic cancer [[246]60]. However, a tumor-promoting role of PARD6B and SLC52A3 in this disease has not been demonstrated. Therefore, this study proposes that PARD6B and SLC52A3 may be prognosis markers for SMAD4-deficient pancreatic cancer. However, the current study does have some limitations. For example, we could not verify the expression of miRNAs and lncRNAs in normal and cancerous tissues, and the specific oncogenic mechanisms of target mRNAs in PAAD are unclear and need to be further investigated. Ethics approval and consent to participate The present study was approved by the Human Ethics Committee and Review Board (IRB) of Ruijin Hospital Affiliated to Shanghai Jiao Tong University School of Medicine. Informed consent was obtained from all subjects and/or their legal guardian(s). The authors declare that the study is reported in accordance with ARRIVE guidelines. All procedures in the experiments were in compliance with Guide for the Care and Use of Laboratory Animals and agreed by the Institutional Animal Care and Use Committee guidelines at Shanghai Tenth People's Hospital, Tongji University School of Medicine (SYDW-KY20-126). Consent for publication Not applicable. Funding This study was supported partly by grants from National Natural Science Foundation of China (82,373,348, 82,372,865, 82,272,766), Shanghai Natural Science Foundation (20ZR1472400), Project Foundation of Taizhou School of Clinical Medicine, Nanjing Medical University (TZKY20220311, TZKY20220204 and TZKY20220205), Nantong Health Science and Technology Project (MS2022053), Science and technology project of Jiangsu Provincial Health Commission (Z2020023), Nantong Basic Science and Technology Program (JC22022027), and Nantong University Clinical Medicine Special Project (2022JZ014). The conflict of interest statement The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper. Data availability statement There is no data need to deposit into a publicly available repository. CRediT authorship contribution statement Meng Zhang: Data curation, Conceptualization. Lin Jiang: Data curation. Xin-Yun Liu: Data curation. Fu-Xing Liu: Data curation. Hui Zhang: Data curation. Yan-Juan Zhang: Data curation. Xiao-Mei Tang: Data curation. Yu-Shui Ma: Data curation. Hui-Yi Wu: Data curation. Xun Diao: Data curation. Chun Yang: Data curation. Ji-Bin Liu: Conceptualization. Da Fu: Conceptualization. Jie Zhang: Conceptualization. Hong Yu: Conceptualization. Declaration of competing interest The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper. Acknowledgments