Abstract

   Pancreatic cancer (PC) is a malignant digestive system tumor with a
   very poor prognosis. N6-methyladenosine (m6A) is mediated by a variety
   of readers and participates in important regulatory roles in PC. Based
   on TCGA_PAAD, ICGC_AU_PAAD, ICGC_CA_PAAD, [41]GSE28735 and [42]GSE62452
   datasets, We mapped the multi-omics changes of m6A readers in PC and
   found that m6A readers, especially IGF2BP family genes, had specific
   changes and were significantly associated with poor prognosis. An
   unsupervised consensus clustering algorithm was used to explore the
   correlation between specific expression patterns of m6A readers in PC
   and enrichment pathways, tumor immunity and clinical molecular
   subtypes. Then, the principal component analysis (PCA) algorithm was
   used to quantify specific expression patterns and screen core genes.
   Machine learning algorithms such as Bootstrapping and RSF were used to
   quantify the expression patterns of core genes and construct a
   prognostic scoring model for PC patients. What's more, pharmacogenomic
   databases were used to screen sensitive drug targets and small molecule
   compounds for high-risk PC patients in an all-around and multi-angle
   way. Our study has not only provided new insights into personalized
   prognostication approaches, but also thrown light on integrating
   tailored risk stratification with precision therapy based on
   IGF2BP2-mediated m6A modification patterns.

   Keywords: IGF2BP2, m6A modification, Precision medicine, Pancreatic
   cancer

Highlights

     * •
       Our work developed a novel gene signature, poor prognostic
       signature (PPS), for prognostic prediction of IGF2BP2-mediated m6A
       modification patterns in PC.
     * •
       The PPS model has important clinical significance and we also
       identifiedpotential therapeutic targets and compounds target this
       model, which opens a new direction for the application of precision
       medicine in PC.
     * •
       This study has not only provided new insights into personalized
       prognostication approaches, but also thrown light on integrating
       tailored prognosis prediction with precision therapy.

1. Introduction

   Pancreatic cancer (PC) is an insidious, rapidly progressing digestive
   tract malignant tumor with poor therapeutic effect and prognosis
   [[43]1]. It is the seventh leading cause of cancer death, with a 5-year
   survival rate of only 10%. At present, surgical resection is considered
   to be the only cure for PC. However, clinical experience shows that,
   due to the absence of hallmark signs in the early stage of PC, patients
   usually have advanced stage by the time of diagnosis, and only about
   20% of patients meet the surgical indications [[44]2]. Unfortunately,
   the vast majority of patients will eventually experience tumor
   recurrence even after surgical resection, and current radiotherapy,
   chemotherapy, immunotherapy, and targeted therapy do not significantly
   improve the survival of patients with PC [[45]3]. In the past decade,
   despite rapid advances in diagnostic methods, perioperative management,
   radiotherapy techniques, and systemic treatment of advanced disease,
   the clinical outcome of patients with PC has received little benefit.
   Therefore, there is an urgent need to classify tumors according to the
   molecular similarities and differences related to clinical and biology,
   improve the established morphological and imaging methods, so as to
   achieve a more accurate prognosis, better optimize individual patient
   management and risk stratification, and improve the selection of
   systematic treatment options [[46]4,[47]5].

   Epigenetics refers to the regulation of transcription or translation
   processes to affect the characteristics and function of genes without
   changing the DNA sequence, which mainly includes DNA methylation,
   histone modification, genomic imprinting, chromosome remodeling, etc
   [[48]6]. In recent years, with the development of high-throughput
   sequencing technology, the study of RNA modification has received more
   and more attention. To date, more than 100 types of modifications have
   been identified in various RNAs such as messenger RNAs (mRNAs),
   transfer RNAs (tRNAs), ribosomal RNAs (rRNAs), microRNAs (miRNAs), and
   long noncoding RNAs (lncRNAs) [[49]7]. Among them, the earliest
   discovered and most widely studied is N6-methyladenosine (m6A), which
   has been considered as the most common methylation modification in
   eukaryotic mRNAs [[50]8]. It has been estimated that about 0.1–0.4% of
   adenine in mRNA is specifically modified by m6A, and m6A modification
   usually occurs in RRACH sequences (R = A, G, or U; R = A or G; H = A,
   C, or U), mainly enriched in the 3’ non-coding region and near the stop
   codon. Regulation of m6A modification is dynamically reversible,
   mediated by m6A methyltransferases (also known as “Writers”), and can
   be removed by m6A demethylases (also known as “Erasers”). More
   importantly, the effect of m6A modification on RNA metabolism mainly
   depends on the recognition of different m6A-binding proteins (also
   known as “Readers”) [[51]9].

   The m6A-modified transcript can be identified and bound by the m6A
   reader protein, which controls gene expression through diverse
   processes. These include the regulation of mRNA stability [[52]10],
   mRNA splicing [[53]11], mRNA structure [[54]12], mRNA export [[55]13],
   translation efficiency [[56]14], and miRNA biogenesis [[57]15], thereby
   exerting its regulatory effects. Emerging findings indicate a close
   association between m6A reader and the development and advancement of
   multiple human cancers, especially in PC [[58]16]. For example, m6A
   reader IGF2BP2 could interact with LncRNA-PACERR and miR-671-3p,
   thereby promoting M2 polarization and pro-tumor function in TAMs of PC
   [[59]17]. Liu et al. unveiled the pivotal tumor-inhibitory functions of
   the YTHDC1/miR-30d axis in the progression of PC, showcasing its
   significance across biological, mechanistic, and clinical realms in the
   context of human PC and glycolytic pathways [[60]18].

   Given that the majority of existing research concentrates on one
   individual reader, a thorough and profound comprehension of the overall
   traits of malign biological conduct in PC, orchestrated by the combined
   impact of comprehensive m6A readers, is currently lacking. Hence, in
   this study, starting from all m6A readers, we first drew the
   multi-omics level change map of m6A readers through the public
   database, and then explored the unique expression pattern of m6A
   readers in PC by unsupervised clustering and principal component
   analysis algorithm and screened the core genes. Finally, we used
   machine learning algorithms to construct core-gene expression
   pattern-based scoring models for pharmacogenomic studies, so as to
   tailor specialized management for PC patients. The overall workflow is
   displayed in [61]Fig. 1.

Fig. 1.

   [62]Fig. 1
   [63]Open in a new tab

   Flowchat of this study.

2. Materials and Methods

2.1. Data acquisition and preprocessing

   Public transcriptional profiling datasets, including the TCGA_PAAD
   dataset from The Cancer Genome Atlas Program (TCGA), [64]GSE62452 and
   [65]GSE28735 dataset from Gene Expression Omnibus (GEO), ICGC_AU_PAAD
   dataset for International Cancer Genome Consortium (ICGC) were
   included. For the TCGA_PAAD dataset, somatic mutation, copy number
   variation (CNV), RNA expression in transcript per million fragments
   mapped (FPKM) format data, methylation data, complete clinical
   information, alternative splicing (AS) data, histone modification data
   were obtained from UCSC Xena ([66]https://xenabrowser.net/datapages/),
   TCGA Splice Seq dataset
   ([67]http://projects.insilico.us.com/TCGASpliceSeq/), and Cistrome Data
   Browser database ([68]http://cistrome.org/db/), respectively. RNA-seq
   data in fragments per kilobase of FPKM format was then transformed into
   transcripts per million (TPM). GEO datasets were downloaded from
   [69]https://www.ncbi.nlm.nih.gov/geo/in normalized data format. R
   package “sva” was then utilized to combine different sourced GEO
   datasets and TCGA datasets. ICGC_AU_PAAD dataset was downloaded from
   [70]https://dcc.icgc.org/, somatic mutation data, FPKM RNA-seq data and
   clinical information were included in our study. The somatic mutation
   data of ICGC_CA_PAAD was also included in our study and downloaded from
   [71]https://dcc.icgc.org/. The expression data of normal tissue was
   downloaded from the Genotype-Tissue Expression (GTEx) database
   ([72]https://gtexportal.org/home/). 16 m6A readers (YTHDF1, YTHDF2,
   YTHDF3, YTHDC1, YTHDC2, IGF2BP1, IGF2BP2, IGF2BP3, FMR1, HNRNPC,
   HNRNPA2B1, EIF3A, PRRC2A, RBMX, LRPPRC, SND1) were screened from
   literatures [[73]14,[74]19]. These genes with deleted expression
   profiles in TCGA_PAAD, ICGC_AU_PAAD, ICGC_CA_PAAD, [75]GSE28735 and
   [76]GSE62452 were excluded. Finally, 14 m6A readers were included in
   the subsequent study, and the deleted genes were HNRNPC and RBMX. For
   pharmacogenomic research, the CRISPR-Cas9 Essentiality REvealing Score
   (CERES) score was obtained from the Dependency Map (DepMap) dataset
   ([77]https://depmap.org/portal/) to measure the dependence of a gene of
   interest in a PC cell line. A lower CERES score indicates that the gene
   is more important for cell growth and survival of a given PC.
   Expression profile data of human cancer cell lines (CCLs) were obtained
   from the Broad Institute Cancer Cell Line Encyclopedia (CCLE) project
   ([78]https://portals.broadinstitute.org/ccle/). Drug sensitivity data
   of CCLs were achieved from the Cancer Therapeutics Response Portal
   (CTRP v.2.0, released October 2015,
   [79]https://portals.broadinstitute.org/ctrp) and profiling relative
   inhibition simultaneously in mixtures (PRISM) dataset (19Q4, released
   December 2019, [80]https://depmap.org/portal/prism/). The above two
   datasets used the Area Under the Curve (AUC) as a measure of drug
   sensitivity, and a lower AUC value indicated that the drug was more
   sensitive to treatment. The connective Map (CMap) database
   ([81]https://clue.io/) was also used to explore the interaction network
   between drugs/small molecule compounds, genes and disease states. The
   correlation score was obtained according to the enrichment of
   differentially expressed genes in the reference gene set, and the value
   was between −100 and 100. A positive number indicated that genes were
   positively correlated with the reference gene set, while a negative
   number indicated that differentially expressed genes were negatively
   correlated with the reference gene set.

2.2. Unsupervised consensus clustering

   After the expression matrix was standardized using sweep () function in
   R, the package “ConsensusClusterPlus” was applied for gene expression
   clustering. The parameters in this study were set as: maxK = 4,
   reps = 100, pItem = 0.8, pFeature = 1, title = title,
   clusterAlg = “pam”, distance = “spearman”. The cumulative distribution
   function (CDF) value and delta area of the CDF curve were used as
   evaluation criteria for every single clustering.

2.3. Differential analysis and pathway analysis

   The “limma” package in R was employed to screen for differentially
   expressed genes (DEGs) between Readercluster_A and Readercluster_B
   groups. To rectify false positive results, adjusted P-values were
   applied using the default Benjamini-Hochberg false discovery rate (FDR)
   method. The cutoff values for identifying DEGs were set at FDR <0.01
   and |log2fold change (FC)| > 2. Gene Ontology (GO) and Kyoto
   Encyclopedia of Genes and Genomes (KEGG) pathway analyses, were
   performed on the DEGs by using the “clusterProfiler” R package. Gene
   set enrichment analysis (GSEA) provided by MsigDB was adopted to
   determine the statistical significance of molecular pathways as well as
   the consistent heterogeneities between high and low-PPS groups based on
   the R package “fgsea”. FDR <0.05 was considered statistically
   significant. Pathifier analysis is mainly used to identify specific
   signaling pathways at specific stages of cancer through correlation,
   variance stability and principal component analysis methods. For each
   PC sample, the Pathway Deregulation Score (PDS) was calculated to
   estimate the extent to which the activity of a certain pathway in PC
   samples deviates from that in normal samples, and thus to aggregate the
   gene-level data Conversion to pathway-level data was used for
   subsequent analysis. The Pathifier algorithm relies on “Pathifier”
   package in R.

2.4. Dimensionality reduction analysis of random forest (RF)

   To further extract the core genes for subsequent modeling analysis, the
   Boruta dimensionality reduction algorithm based on RF was used here.
   Specific analysis steps and codes are as follows: (1) findCorrelation
   () function in “Caret” R package was used to eliminate genes with high
   collinearity (cor >0.85); (2) Dimensionality reduction analysis is
   carried out by using the “Boruta” R package and the Boruta () function
   in the “randomForest” R package, TentativeRoughFix () function for
   feature determination, and getSelectedAttributes () function for final
   feature selection. The Boruta gene obtained after the operation is the
   gene obtained after dimensionality reduction.

2.5. Principal component analysis (PCA) analysis and contribution screening

     * (1)
       In the TCGA_PAAD dataset, based on the expression matrix of 14 m6A
       readers, the consensus clustering method was used to obtain the
       specific expression pattern of m6A readers, namely Readercluster;
     * (2)
       DEGs between specific expression patterns of m6A readers were
       obtained using the differential analysis method, which was defined
       as m6A reader phenotypic-related differential genes;
     * (3)
       The expression patterns of m6A reader phenotypic-related
       differential genes were obtained by consensus clustering analysis
       again, namely Reader. gene.cluster;
     * (3)
       The univariate Cox regression analysis method was used to obtain
       prognostic genes with p < 0.05;
     * (4)
       The aforementioned RF dimensionality reduction algorithm was used
       to further obtain the robust prognostic genes;
     * (5)
       The prcomp () function in the R package “prcomp” was used for PCA
       analysis, and principal component 1 (PC1) and principal component 2
       (PC2) were selected to construct the scoring model, which was named
       as Readerscore model, and the calculation formula was as follows:

   [MATH: <mrow><mtext>Readerscore</mtext><mo
   linebreak="badbreak">=</mo><mo>∑</mo><mrow><mo
   stretchy="true">(</mo><mrow><msub><mrow><mtext>PC</mtext><mn>1</mn></mr
   ow><mi mathvariant="normal">i</mi></msub><mo
   linebreak="badbreak">+</mo><msub><mrow><mtext>PC</mtext><mn>2</mn></mro
   w><mi mathvariant="normal">i</mi></msub></mrow><mo
   stretchy="true">)</mo></mrow></mrow> :MATH]

   where i is the expression level of core genes in (4). The scoring
   system calculates a score based on genes in the gene set that have
   strong positive (or negative) correlations, while reducing the weight
   of unrelated genes. Then, the surv_cutpoint () function in the
   “Survminer” package was used to determine the optimal cut-off value,
   and the samples were divided into two subgroups: Readerscore_high and
   Readerscore_low for subsequent analysis.

   (6) The factoextra () function in R package “prcomp” was used to draw
   PCA component diagram for the above PCA results, in which the value of
   coordinate axis PC1/2 was the explanation rate of the overall
   difference, the point in the figure represented the sample, the color
   represented the group, the arrow represented the original variable, and
   the direction represented the correlation between the original variable
   and the principal component. The length represents the contribution of
   the original data to the principal components. This step was used to
   screen the core genes with the highest contribution to principal
   components for subsequent analysis.

2.6. Gene set variation analysis (GSVA) analysis and single sample gene set
enrichment analysis (ssGSEA) analysis

   To explore diverse enrichment status in gene function for different
   clusters or subgroups, GSVA and ssGSEA were applied by R package “GSVA”
   and “GSEAbase”. Two gene sets were conducted for functional annotation
   from MsigDB ([82]http://www.gsea-msigdb.org/gsea/msigdb/), which were
   c2. cp.kegg.v7.4. symbols and h. all.v7.4. symbols, respectively. The
   “limma” R package was then utilized to distinguish the enrichment
   differences between different subgroups. The pathways with |log2FC| >
   0.1 and FDR <0.05 were considered the most differentially enriched
   molecular pathways between the two subtypes.

2.7. Construction of poor prognosis signature (PPS)

   In order to meet the standards of machine learning algorithms, RNA-Seq
   data and clinical data of 328 samples from TCGA_PAAD, ICGC_AU_PAAD and
   [83]GSE62452 and [84]GSE28735 were included. Data preprocessing methods
   are as follows:
     * (1)
       According to the median expression of IGF2BP2, 328 PC samples were
       divided into two subgroups: IGF2BP2_high and IGF2BP2_low, and the
       “limma” R package was used for difference analysis. On the basis of
       |log2FC| > 0 and adj. p < 0.01, we screened out IGF2BP2 related
       differential genes;
     * (2)
       apply () and findCorrelation () functions were utilized to remove
       genes that had strong collinearity (R > 0.9) and low variability
       (MAD <0.5). The preProcess () function in R package “caret” was
       used to normalize and standardize the expression of DEGs. Finally,
       the createDataPartition () function was used to divide the treated
       differential gene expression matrix into training set and testing
       set according to the ratio of 7:3.
     * (3)
       The coxph () function in the R package “survival” was used to
       perform univariate Cox regression analysis on the data of the
       training set, and statistically significant, prognostic
       differential genes (p < 0.05);
     * (4)
       In order to re-evaluate the correlation between gene expression and
       prognosis, the “Bootstrapping” method described above was used to
       test the robustness of genes with prognostic differences. We
       performed univariate Cox regression analysis on 70% of the samples
       randomly selected from the training set. 1000 iterations were tuned
       to select the samples with significant prognostic value in 90% of
       the resampled samples, and were further included in the subsequent
       analysis;
     * (5)
       To further screen the core genes, rfsrc () function in the R
       package “randomForestSRC” was used for random survival forest (RSF)
       analysis. In this RSF analysis, the function used log-rank and
       score-splitting algorithms to build 1000 survival numbers, and the
       included DEGs were ranked according to the minimal depth (MD)
       method. The smaller the variable value, the better the prediction
       efficiency. 10-fold cross-validation was adopted for tuning.
       Finally, the above RSF analysis was iterated 1000 times, and the
       Concordance index (C-index) of out-of-bag samples was used as the
       criterion to judge the quality of the model. The higher the c-index
       value, the best-fitting effect of the model constructed by this
       gene set was observed, and this gene set was named PPS.
     * (6)
       Based on the gene set obtained in the previous step, coxph ()
       function in R package “survival” was used for multivariate Cox
       regression analysis. The PPS scoring model was constructed with the
       correlation coefficients obtained from multivariate Cox regression,
       and the calculation formula was as follows:

   [MATH: <mrow><mtext>PPS</mtext><mspace
   width="0.25em"></mspace><mtext>score</mtext><mo
   linebreak="badbreak">=</mo><munderover><mo>∑</mo><mrow><mi
   mathvariant="normal">i</mi><mo
   linebreak="badbreak">=</mo><mn>1</mn></mrow><mi
   mathvariant="normal">n</mi></munderover><msub><mtext>Coef</mtext><mi
   mathvariant="normal">i</mi></msub><mo>*</mo><msub><mi
   mathvariant="normal">x</mi><mi
   mathvariant="normal">i</mi></msub></mrow> :MATH]

   where Coef[i] is the multivariate Cox regression coefficient
   corresponding to the i[th] gene, and x[i] is the expression value of
   the i[th] gene. In addition, the AUC value of the training set and
   testing set was used as the basis to judge the effectiveness of the PPS
   model and other models.

2.8. Screening for potential drug targets

   In order to identify the target proteins (genes) with potential
   therapeutic significance in PC patients with high PPS scores, we first
   downloaded the detailed target information of 6125 compounds from The
   Drug Repurposing Hub database, and after deleting the duplicated target
   information, we performed the following analysis (Figure S1):
     * (1)
       Spearman correlation analysis was used to analyze the correlation
       between the protein expression profile of target genes and PPS
       score in PC patients. R > 0.4 and p < 0.05 were used as the
       criteria to screen the drug targets related to the poor prognosis
       of PC.
     * (2)
       PPS scores of PC cell lines in the CCLE database were calculated
       according to the above-mentioned scoring model construction
       formula, and Spearman correlation analysis was performed between
       CERES scores obtained from the DepMap database and PPS scores.
       Since a lower CERES score of a gene indicates a stronger dependence
       of the gene on the malignant biological phenotype of PC, R < −0.3
       and p < 0.05 were used as the criteria to screen the drug targets
       related to the poor prognosis of PC.
     * (3)
       The targets obtained in (1) and (2) were intersected, and the drug
       targets with high confidence were finally obtained.

2.9. Drug sensitivity analysis

   In order to conduct drug sensitivity analysis of PC patients with high
   PPS scores, the CTRP and PRISM databases were used for the analysis.
   The detailed data acquisition and analysis process are as follows
   (Figure S2):
     * (1)
       CTRP and PRISM database portals were used to download the gene
       expression matrix and drug sensitivity value (AUC) after drug
       intervention, and the cell lines with a missing value (NA value) of
       more than 20% or from hematopoietic and lymphoid tissues were
       excluded.
     * (2)
       The pRRopheticPredict () function in the “pRRophetic” R package was
       used to predict the drug sensitivity of PC samples included in this
       study, and the predicted AUC value was obtained.
     * (3)
       Wilcoxon rank-sum test was used to analyze the differential drug
       sensitivity between the high PPS score subgroup (top 10%) and the
       low PPS score subgroup (bottom 10%). To identify compounds with
       higher sensitivity (lower AUC values) in the subgroup with higher
       PPS scores. The screening criteria were: log2FC > 0.02 (CTRP),
       log2FC > 0.03 (PRISM);
     * (4)
       Spearman correlation analysis was performed between drug
       susceptibility values and PPS scores to identify compounds with
       negative correlation coefficients. The screening criteria were:
       R < −0.3 (CTRP),R < −0.45 (PRISM);
     * (5)
       By intersecting the compounds obtained in (3) and (4), highly
       sensitive drugs could be obtained based on CTRP and PRISM
       databases, respectively.

   For complementary validation of the higher sensitivity drugs screened
   by the above methods, we also performed CMap analysis. Firstly, the
   differences between PC samples and normal pancreatic tissue samples in
   the GTEx database were analyzed, and the 300 genes with the most
   significant differences (including 150 significantly up-regulated genes
   and 150 significantly down-regulated genes) were screened out. Then,
   these DEGs were submitted to the CMap cloud platform
   ([85]https://clue.io/query/) for analysis, and the CMap score of each
   compound was obtained. If the negative score of a compound is smaller,
   the gene expression pattern of the compound is opposite to the
   expression pattern of PC. It is suggested that this compound has a good
   therapeutic effect in PC.

2.10. Statistical analysis

   All statistical analyses were performed in R (version 4.1.1). The
   comparison of count data was tested by Fisher's test and Chi-square
   test. For the measurement data that conformed to the normal
   distribution, the Student-t test was applied; besides, the Wilcox test
   was applied for non-normal distribution data between independent
   subgroups. Spearman analysis was applied to estimate the correlations
   between two variables that are not linearly related. The Kaplan-Meier
   (K-M) test was utilized to validate the fraction of PC patients living
   for a certain survival time and the log-rank test was conducted to
   compare the significance of the difference. R package “survival” was
   used for depicting K-M survival curve. A two-tailed p-value of less
   than 0.05 was deemed to be statistically significant unless
   specifically stated.

3. Results

3.1. Landscape of multi-omics variation of m6A readers in PC

   Firstly, we analyzed the mutation variation of m6A readers in PC, three
   datasets (namely, TCGA_PAAD, ICGC_AU_PAAD and ICGC_CA_PAAD) were
   selected, all of which have complete gene expression and clinical data.
   As shown in Figure S3, the overall mutation rate of 14 m6A readers in
   335 samples was low, among which the top three genes with mutation rate
   were SND1, IGF2BP3 and IGF2BP2. From the perspective of mutation type,
   intronic mutation was the most common mutation in the readers, which
   indicated that the structural changes caused by intronic mutation might
   be the reason for the essential biological effects of the m6A readers.

   With regards to the CNV, data from TCGA_PAAD datasets were included.
   Among 140 PC samples with entire CNV data, 57.14% (8/14) of the m6A
   readers had copy number amplification frequency greater than copy
   number deletion frequency, indicating that copy number amplification
   was a relatively common mutation event of m6A readers at the DNA level.
   In terms of single genes, IGF2BP1 had the highest frequency of CNV
   events (9.22%), followed by PRRC2A (8.51%) and YTHDF2 (7.09%) (Figure
   S4A). The copy number increase or decrease of large genome fragments
   with copy number variation of more than 1 kb is mainly manifested as
   submicroscopic deletions and duplications, which are important
   components of genome structural variation. Therefore, we also explored
   the relationship between CNV events and the expression of m6A readers.
   When the copy number of m6A readers increased, its mRNA expression also
   showed an up-regulated trend. Similarly, when the copy number of m6A
   readers was deleted, its expression was down-regulated accordingly
   (Figure S4B-O).

   For the analysis of AS events of m6A readers in PC, only the TCGA_PAAD
   dataset with more comprehensive data information was included here.
   Through the TCGA Splice Seq database, we obtained 139 samples with AS
   events data, including exon skip (ES), alternate terminator (AT),
   alternate promoter (AP), alternate acceptor (AA), alternate donor (AD),
   retained intron (RI) and mutually exclusive (ME). From the level of
   whole transcriptome genes, ES events in PC samples ranked first
   (31.81%, 6750/21,218), AT and AP ranked second (17.98%, 3816/21,218)
   and third (17.55%, 3724/21,218), respectively (Figure S5A). However,
   for the 14 m6A readers, the overall number of AS events was small.
   Similarly, the most common type of AS was still ES (40%, 8/20) (Figure
   S5B). Therefore, exon deletion caused by ES is also an important link
   in the regulation of PC by m6A readers.

   As the most common modification in biology, m6A modification under the
   epigenetic branch has been widely reported to be closely related to the
   occurrence and development of PC. To further verify the epigenetic
   expression of m6A readers, DNA methylation and histone acetylation were
   analyzed respectively. After downloading PC methylation data from the
   TCGA portal and conducting data quality control, we obtained the
   methylation signal value matrix (β value matrix) of m6A reader genes to
   quantify the methylation level of each gene. As shown in Figure S6,
   correlation analysis between the expression level of m6A readers and β
   value shows that, at p < 0.05 level, 71.43% of the expression of
   readers were negatively correlated with methylation levels. Among them,
   YTHDF3 (R = −0.51, p = 1.9E-13), IGF2BP2 (R = −0.5, p = 6.3E-13), The
   expression of SND1 (R = −0.4, p = 2.5E-08) had the most significant
   negative correlation with methylation level, which indicated that most
   readers were regulated by methylation in PC. For the analysis of the
   histone acetylation level of the m6A readers, the most common histone
   H3K27ac was selected as the research object here due to the limitation
   of the database. Through the aforementioned method, we paired the
   acetylation data in the Cistrome Data Browser database with the UCSC
   Browser for visualization. Compared with normal pancreatic cells, the
   histone H3K27ac acetylation modification peaks in the promoter regions
   of IGF2BP1, IGF2BP2, IGF2BP3, YTHDF2, YTHDF3 and PRRC2A in PC cell
   lines were significantly increased (Figure S7). This indicates that the
   above m6A readers may be regulated by H3K27ac acetylation, which also
   provides a theoretical basis for subsequent related studies in the
   future.

   In the previous study, we systematically analyzed the m6A readers at
   the DNA level and epigenetic levels. On the whole, most of the m6A
   readers were regulated by SNP, CNV, AS, methylation and acetylation. To
   further analyze the effect of these readers at the mRNA level. We
   selected 141 PC samples from the TCGA_PAAD database and 165 normal
   samples from the GTEx database for differential expression analysis, as
   shown in Figure S8 mRNA expression levels of IGF2BP2 (log2FC = 5.45,
   FDR = 9.02E-89), IGF2BP3 (log2FC = 2.41, FDR = 4.47E-30) and IGF2BP1
   (log2FC = 1.58, FDR = 2.28E-86) in PC samples were significantly higher
   than those in normal tissues, which may be related to the high mutation
   frequency of IGF2BP2 and IGF2BP3 mentioned above. To further explore
   the relationship between m6A readers and the prognosis of PC patients,
   the 14 readers were divided into high expression group and low
   expression group according to their median expression level, and the
   overall survival (OS) between the two groups was compared. For YTHDC1,
   YTHDC2, YTHDF1, YTHDF2, YTHDF3 and PRRC2A, the high expression group of
   PC patients had better OS, while other readers showed opposite results
   (Figure S9A–N). At the same time, univariate Cox analysis was performed
   for 14 m6A readers, and the risk network diagram was drawn, which was
   consistent with the results of the survival analysis described above
   (Figure S9O). These results indicate that YTHDC1, YTHDC2, YTHDF1,
   YTHDF2, YTHDF3 and PRRC2A are protective factors in PC patients.
   IGF2BP1, IGF2BP2, IGF2BP3, FMR1, HNRNPA2B1, EIF3A, LRPPRC and SND1 were
   risk factors.

3.2. Construction of specific expression patterns of m6A readers

   In order to explore the specific expression pattern of m6A readers, a
   total of 328 PC samples were included (TCGA_PAAD, ICGC_AU_PAAD,
   [86]GSE28735 and [87]GSE62452 datasets). Based on the expression of 14
   m6A readers, we conducted an unsupervised consensus clustering
   analysis. According to the heatmap of consensus clustering ([88]Fig.
   2A), the optimal k = 2 could be determined. A total of 328 PC patients
   were divided into two subgroups: Readercluster_A (n = 161, 49.09%) and
   Readercluster_B (n = 167, 50.91%). To verify the reliability of the
   grouping, PCA analysis was performed on the expression profile of 14
   m6A readers. As shown in [89]Fig. 2B, in the PCA diagram, the samples
   classified as Readercluster_A (red) and Readercluster_B (blue) have
   high discrimination, indicating that the grouping effect is robust. In
   addition, to explore the effect of different m6A reader gene expression
   patterns on the prognosis of PC patients, a survival analysis was
   performed. The samples in the Readercluster_A group were significantly
   lower than those of the Readercluster_B group on OS (p < 0.001),
   suggesting that the Readercluster_A expression pattern may lead to poor
   prognosis in PC patients ([90]Fig. 2C).

Fig. 2.

   [91]Fig. 2
   [92]Open in a new tab

   Construction of specific expression patterns of m6A readers. (A) Heat
   map based on the expression matrix of 14 m6A readers, with the optimal
   grouping k value of 2. (B) PCA algorithm diagram shows that
   Readercluster_A and Readercluster_B groups have better classification
   effect. (C) Survival curve showed that the prognosis of patients in
   Readercluster_A group was poor. (D) Heat map of GSVA score based on
   KEGG background gene set. (E) ssGSEA analysis was performed based on
   the immune background gene set to judge the infiltration abundance of
   immune cells.

   To further detect the differences in biological behaviors among
   different m6A reader expression patterns, we performed GSVA analysis
   using the KEGG pathway as the background gene set. The expression
   pattern of the Readercluster_A subtype was mainly enriched in oncogenic
   pathways: Notch pathway, p53 pathway and pancreatic cancer pathway,
   while the expression pattern of Readercluster_B subtype was mainly
   enriched in metabolic pathways: fatty acid metabolism pathway, alanine
   metabolism pathway and tryptophan metabolism pathway ([93]Fig. 2D).
   These results provide a direction for further research on the oncogenic
   mechanism downstream of the m6A readers.

   We also preliminarily explored the difference in tumor immune
   microenvironment (TIME) between the specific expression patterns of the
   two m6A readers. ssGSEA analysis was performed using immune gene sets
   for the specific expression patterns of the two m6A readers. On the
   whole, the abundance of immune cell infiltration of the Readercluster_B
   expression pattern was significantly higher than that of
   Readercluster_A. From the perspective of the specific degree of cell
   infiltration, the Readercluster_B expression pattern has abundant
   innate immune cell infiltration, such as eosinophils, MDSC and
   dendritic cells ([94]Fig. 2E). These results provided a basis for the
   better prognosis of the Readercluster_B expression pattern.

   Next, to further explore the characteristics of the expression pattern
   of these m6A readers in different clinical features and biological
   behaviors, 141 PC samples in the TCGA_PAAD dataset with the most
   comprehensive clinical information were selected as the only study
   subjects for all subsequent analyses. Firstly, we used the expression
   matrix of 14 m6A readers in TCGA_PAAD dataset as the input object for
   consensus clustering analysis again. 141 PC patients were also divided
   into Readercluster_A (n = 77, 54.61%) and Readercluster_B (n = 64,
   45.39%) subgroups (Figure S10A-C).

   For checking the relationship between different m6A reader gene
   expression patterns and PC molecular subtypes, we first obtained the
   characteristic gene sets of three classical molecular classification
   (Moffitt classification, Collisson classification and Bailey
   classification) in the reference literature [[95]4,[96]20,[97]21]. Then
   consensus clustering analysis was performed on the expression matrix of
   these characteristic gene sets in TCGA_PAAD. The characteristic gene
   sets of the three molecular types were well clustered in the TCGA_PAAD
   dataset, so as to obtain the data of the three molecular
   classifications in the TCGA_PAAD dataset: (1) Moffitt classification:
   Classical (n = 66, 46.81%), Basel (n = 75, 53.19%); (2) Collisson
   classification: Classical (n = 36, 25.53%), Exocrine (n = 66, 46.81%)
   and QM-PDA (n = 39, 27.66%). (3) Bailey classification: squamous
   (n = 44, 31.21%), progenitor (n = 36, 25.53%), immunogenic (n = 21,
   14.89%) and Aberrantly Differentiated Endocrine Exocrine (ADEX)
   (n = 40, 28.37%) (Figure S10D–F). Finally, visual analyses were
   performed by adding column annotations (clinical information) to
   expression heatmaps. 84.1% (37/44) of the squamous type samples (with
   the worst prognosis) were found in the samples of the Readercluster_A
   group, and 85.71% (66/77) of the samples of the Readercluster_A group
   had mutations in the m6A readers (Figure S10G). These results may
   confirm that Readercluster_A was significantly associated with
   molecular subtypes with poor prognosis. In addition, among the 14 m6A
   readers, IGF2BP2 and IGF2BP3 were significantly differentially
   expressed between Readercluster_A group and Readercluster_B group
   (IGF2BP2, p < 0.01; IGF2BP3, p < 0.01), indicating that IGF2BP2 and
   IGF2BP3 may be key molecules with different expression patterns of m6A
   readers.

3.3. Generation of m6A readers phenotype-related expression patterns and
screening of key gene

   We found that the gene expression patterns of Readercluster_A and
   Readercluster_B in PC were significantly different, and affected tumor
   progression, metabolism and other pathways. To further explore and
   quantify the reasons for the differences in gene expression patterns of
   different m6A readers, we constructed a phenotype-related scoring model
   for m6A readers.

   Firstly, in the TCGA_PAAD cohort, we performed differential analysis on
   the gene expression patterns of Readercluster_A and Readercluster_B in
   PC, and screened 1468 DEGs related to the phenotype of m6A readers.
   Univariate Cox regression analysis was performed on 1468 differentially
   expressed genes to screen out 140 differentially expressed genes that
   were significantly associated with prognosis. To explore the specific
   gene expression patterns associated with different m6A reader
   phenotypes, we performed consensus cluster analysis on 140 genes
   associated with both the phenotype and prognosis of m6A readers. We
   obtained unique gene expression patterns related to the phenotype of
   two groups of m6A readers. They were defined as Reader. gene.cluster_A
   (n = 72, 51.06%) and Reader. gene.Cluster_B (n = 69, 48.94%),
   respectively ([98]Fig. 3A). Survival analysis showed that the
   expression pattern of Reader. gene.cluster_A had poor prognosis
   (p = 0.004) ([99]Fig. 3B). So far, we found that the expression pattern
   of Reader. gene.cluster was highly consistent with that of
   Readercluster: (1) Among the samples classified as Reader.
   gene.cluster_A group, 79.17% of the samples were also classified as
   Readercluster_A group; Among the samples classified as Reader.
   gene.Cluster_B group, 71.01% of the samples were also classified as
   Readercluster_B group; (2) The survival time of PC patients classified
   as Reader. gene.cluster_A group and Readercluster_A group was
   significantly lower than that of patients classified as Reader.
   gene.cluster_B group and Readercluster_B group; (3) IGF2BP2 and IGF2BP3
   were significantly differentially expressed in Reader. gene.cluster and
   Readercluster. These results indicated that there were indeed two
   specific modification patterns related to m6A readers in PC, and
   IGF2BP2 and IGF2BP3 may be the key regulatory genes.

Fig. 3.

   [100]Fig. 3
   [101]Open in a new tab

   Ceneration of m6A readers phenotype-related expression patterns and
   screening of key gene. (A) Heat map of concordance cluster analysis
   based on 140 genes related to both m6A reader phenotype and prognosis
   showed that the optimal grouping k value was 2; (B) Survival curve
   showed that the Reader. gene.cluster_A expression pattern had a poor
   prognosis. (C) Survival curve showed that the survival rate of patients
   decreased significantly with the decrease of readerscore; (D) The
   correlation impact map of the three subtypes showed that the expression
   patterns of Readercluster type, Reader. gene.Cluster type and
   Readerscore type were highly consistent. (E) PCA plots of expression
   patterns associated with two m6A reader phenotypes. The results showed
   that PC1+PC2 could explain 88.14% of the data variation, and the
   variable IGF2BP2 had the highest contribution.

   To further quantify the expression patterns related to the phenotype of
   m6A readers, a scoring system was constructed by PCA method. The best
   cutoff value (0.74) was used to divide into two subgroups:
   Readerscore_high (n = 58, 41.13%) and Readerscore_low (n = 83, 58.87%).
   As shown in [102]Fig. 3C, it can be found from the survival curve that
   with the reduction of Readerscore, the survival rate of patients
   significantly decreased. To explore the connection between these three
   subgroups mentioned above, we used the alluvial plot to more
   intuitively show the connection between sample attributes. The lower
   the Readerscore of PC patients, it is usually classified as
   Readercluster_A and Reader. gene.cluster_A group, and its survival time
   is also worse ([103]Fig. 3D). These results confirmed that our
   constructed Readerscore system can effectively quantify expression
   patterns related to m6A readers’ phenotypes.

   Next, we performed contribution analysis on the expression data which
   used to construct the model to explore the core genes between the two
   different expression patterns. As shown in [104]Fig. 3E, in this PCA
   plot, the explanation rate of principal component 1 (PC1) to the
   overall difference was 65.3%, the explanation rate of principal
   component 2 (PC2) to the overall difference was 23.1%, and the
   explanation rate of PC1+PC2 to the data difference reached 88.4%,
   indicating that the PCA had a good dimensionality reduction effect.
   Secondly, IGF2BP2 is the only selected m6A readers among the top 100
   genes with the highest contribution to PC1, suggesting that IGF2BP2 can
   effectively explain the variation of the overall data, which further
   suggests that IGF2BP2 was the key gene leading to the change of
   expression pattern related to m6A readers’ phenotypes.

3.4. Functional annotation of IGF2BP2-mediated modification patterns in PC

   To further investigate the potential biological behavior of
   IGF2BP2-mediated patterns, the “clusterProfiler” R package was utilized
   to screen the differential genes in different IGF2BP2 expression
   patterns, we used the RNA-Seq data of PC in TCGA, ICGC and GEO
   databases to determine the median expression value of IGF2BP2. 328 PC
   samples were divided into IGF2BP2_high and IGF2BP2_low subgroups and
   5044 differential genes were obtained, which were defined as the
   differential genes related to IGF2BP2-mediated phenotype. The top 2000
   differential genes with significant differences were selected for
   heatmap (Figure S11A) to judge the efficiency of differential analysis.
   Then, we conducted GO and KEGG pathway enrichment analysis on the above
   5044 differentially expressed genes. IGFBBP2_high subgroup was mainly
   enriched in cell cycle-related pathways (“mitotic cell cycle”, “cell
   cycle”), tumor-related EMT and transcriptional dysregulation pathways
   (“epithelial cell differentiation”, “Regulation of transcription by RNA
   polymera”, “transcriptional misregulation in cancer”; The IGFBBP2_low
   subgroup was mainly enriched in immune effector process (“immune
   effector process”, “immune response”, “regulation of immune system
   process”, “primary immunodeficiency”) (Figure S11B–C). All these
   results confirmed that the high expression of IGF2BP2 in PC patients
   may promote the malignant biological phenotype of PC by suppressing the
   immune system.

3.5. Development of poor prognostic signatures based on IGF2BP2-mediated
modification patterns via machine learning algorithms

   Based on the 5044 differential genes associated with the IGF2BP2
   phenotype obtained by differential analysis, we explored prognostic
   gene markers for PC. 328 samples in TCGA, ICGC and GEO databases were
   divided into training set (n = 230) and testing set (n = 98) according
   to the ratio of 7:3. For the samples in the training set, we first
   performed univariate Cox regression analysis to obtain 889 genes
   associated with prognosis. The Bootstrapping algorithm was used to
   obtain 181 genes that were significantly associated with prognosis in
   900 sampling sessions. Then, RSF analysis based on the minimum depth
   (MD-RSF) method was used to determine the final prognostic gene
   markers. In the RSF algorithm with 1000 iterations ([105]Fig. 4A–B).
   The 13 gene sets (FNDC3B, L1CAM, PLXNA1, HMGA2, FAM110B, FAM83A,
   COX7A1, PMAIP1, KIF20B, SPDL1, SNCG, TGM2 and MUC16) included when
   C_index reached the maximum (0.7128982) were selected as poor
   prognostic signatures, defined as PPS associated with IGF2BP2
   phenotype. Finally, the multivariate Cox regression coefficients of the
   above 13 PPS were calculated and the PPS scoring model was constructed.
   The specific PPS scoring formula was as follows:
   [MATH: <mrow><mtext>PPS</mtext><mspace
   width="0.25em"></mspace><mtext>socre</mtext><mo
   linebreak="badbreak">=</mo><mtext>Expr</mtext><mrow><mo
   stretchy="true">(</mo><mrow><mtext>FNDC</mtext><mn>3</mn><mi
   mathvariant="normal">B</mi></mrow><mo
   stretchy="true">)</mo></mrow><mo>*</mo><mspace
   width="0.25em"></mspace><mn>0.16009993</mn><mo
   linebreak="goodbreak">+</mo><mtext>Expr</mtext><mrow><mo
   stretchy="true">(</mo><mrow><mi
   mathvariant="normal">L</mi><mn>1</mn><mtext>CAM</mtext></mrow><mo
   stretchy="true">)</mo></mrow><mo>*</mo><mspace
   width="0.25em"></mspace><mn>0.28052790</mn><mo
   linebreak="goodbreak">+</mo><mtext>Expr</mtext><mrow><mo
   stretchy="true">(</mo><mrow><mtext>PLXNA</mtext><mn>1</mn></mrow><mo
   stretchy="true">)</mo></mrow><mo>*</mo><mspace
   width="0.25em"></mspace><mn>0.22958423</mn><mo
   linebreak="goodbreak">+</mo><mtext>Expr</mtext><mrow><mo
   stretchy="true">(</mo><mrow><mtext>HMGA</mtext><mn>2</mn></mrow><mo
   stretchy="true">)</mo></mrow><mo>*</mo><mspace
   width="0.25em"></mspace><mn>0.14046140</mn><mo>−</mo><mspace
   width="0.25em"></mspace><mtext>Expr</mtext><mrow><mo
   stretchy="true">(</mo><mrow><mtext>FAM</mtext><mn>110</mn><mi
   mathvariant="normal">B</mi></mrow><mo
   stretchy="true">)</mo></mrow><mo>*</mo><mspace
   width="0.25em"></mspace><mn>0.18477060</mn><mo
   linebreak="goodbreak">+</mo><mtext>Expr</mtext><mrow><mo
   stretchy="true">(</mo><mrow><mtext>FAM</mtext><mn>83</mn><mi
   mathvariant="normal">A</mi></mrow><mo
   stretchy="true">)</mo></mrow><mo>*</mo><mspace
   width="0.25em"></mspace><mn>0.04911509</mn><mo>−</mo><mspace
   width="0.25em"></mspace><mtext>Expr</mtext><mrow><mo
   stretchy="true">(</mo><mrow><mtext>COX</mtext><mn>7</mn><mi
   mathvariant="normal">A</mi><mn>1</mn></mrow><mo
   stretchy="true">)</mo></mrow><mo>*</mo><mspace
   width="0.25em"></mspace><mn>0.23187426</mn><mo>−</mo><mspace
   width="0.25em"></mspace><mtext>Expr</mtext><mrow><mo
   stretchy="true">(</mo><mrow><mtext>PMAIP</mtext><mn>1</mn></mrow><mo
   stretchy="true">)</mo></mrow><mo>*</mo><mspace
   width="0.25em"></mspace><mn>0.08461359</mn><mo
   linebreak="goodbreak">+</mo><mtext>Expr</mtext><mrow><mo
   stretchy="true">(</mo><mrow><mtext>KIF</mtext><mn>20</mn><mi
   mathvariant="normal">B</mi></mrow><mo
   stretchy="true">)</mo></mrow><mo>*</mo><mspace
   width="0.25em"></mspace><mn>0.16976500</mn><mo>−</mo><mspace
   width="0.25em"></mspace><mtext>Expr</mtext><mrow><mo
   stretchy="true">(</mo><mrow><mtext>SPDL</mtext><mn>1</mn></mrow><mo
   stretchy="true">)</mo></mrow><mo>*</mo><mspace
   width="0.25em"></mspace><mn>0.09529266</mn><mo
   linebreak="goodbreak">+</mo><mtext>Expr</mtext><mrow><mo
   stretchy="true">(</mo><mtext>SNCG</mtext><mo
   stretchy="true">)</mo></mrow><mo>*</mo><mspace
   width="0.25em"></mspace><mn>0.06892878</mn><mo>−</mo><mspace
   width="0.25em"></mspace><mtext>Expr</mtext><mrow><mo
   stretchy="true">(</mo><mrow><mtext>TGM</mtext><mn>2</mn></mrow><mo
   stretchy="true">)</mo></mrow><mspace
   width="0.25em"></mspace><mo>*</mo><mspace
   width="0.25em"></mspace><mn>0.01766628</mn><mo
   linebreak="goodbreak">+</mo><mtext>Expr</mtext><mrow><mo
   stretchy="true">(</mo><mrow><mtext>MUC</mtext><mn>16</mn></mrow><mo
   stretchy="true">)</mo></mrow><mspace
   width="0.25em"></mspace><mo>*</mo><mspace
   width="0.25em"></mspace><mn>0.04981059</mn></mrow> :MATH]

Fig. 4.

   [106]Fig. 4
   [107]Open in a new tab

   Development of poor prognostic signatures based on IGF2BP2-mediated
   modification patterns via machine learning algorithms. (A) C-index
   change curve of RSF algorithm after 1000 iterations; (B) curve of
   out-of-pocket sample error rate of RSF algorithm; According to the
   median value of PPS score, the survival curve was drawn in the training
   set (C), testing set (D), TCGA_PAAD dataset (G), and ICGC_AU_PAAD
   dataset (H), the results showed that the higher the PPS score, the
   worse the prognosis of PDAC patients. The area under the 1-year, 2-year
   and 3-year survival curve predicted by PPS model in the training set
   (E), testing set (F), TCGA_PAAD dataset (I) and ICGC_AU_PAAD dataset
   (J). (K) Comparison of prediction performance between PPS model and
   other models in the training set; (L) Comparison of prediction
   performance between PPS model and other models in the testing set.

   To evaluate the prognostic value of the model, PC samples were divided
   into two subgroups according to the median PPS score, namely PPS_high
   and PPS_low. Then, survival analysis was performed on the training set,
   testing set, TCGA_PAAD dataset and ICGC_AU_PAAD dataset, and survival
   curves were drawn. In the four independent data sets mentioned above,
   higher PPS score was positively correlated with shorter overall
   survival (training set, p < 0.0001; testing set, P = 0.031; TCGA_PAAD,
   p = 0.024; ICGC_AU_PAAD,p < 0.0001) ([108]Fig. 4C, D, G, H).

   In order to further verify the prediction efficiency of this model, the
   receiver operating characteristic curve (ROC) was selected for visual
   analysis. In the training and testing sets, the PPS scoring model was
   effective for patients in 1 year (training set, AUC = 0.727; testing
   set, AUC = 0.663; TCGA_PAAD, AUC = 0.671; ICGC_AU_PAAD, AUC = 0.700), 2
   years (training set, AUC = 0.802; testing set, AUC = 0.703; TCGA_PAAD,
   AUC = 0.710; ICGC_AU_PAAD, AUC = 0.709) and 3 years (training set,
   AUC = 0.805; testing set, AUC = 0.785; TCGA_PAAD, AUC = 0.660;
   ICGC_AU_PAAD, AUC = 0.710) had good efficacy in predicting survival
   ([109]Fig. 4E, F, I, J). In addition, it has been reported that several
   scholars have also constructed prognostic models for PC patients, such
   as Zheng [[110]22], Zhou [[111]23] and Yuan [[112]24]. In order to
   verify whether PPS score model has better predictive power compared
   with the prognostic model constructed by the above three teams, ROC
   analysis was also performed. PPS scoring model in the training set
   (PPS: 0.778 vs Zheng: 0.567, Zhou: 0.687, Yuan, 0.554) and testing set
   (PPS: 0.717 vs Zheng: 0.662, Zhou: The average AUC values of 0.651,
   Yuan, 0.584) were higher than those of the other three prognostic
   models ([113]Fig. 4K-L). The above results fully verify the robustness
   and predictive effectiveness of our PPS scoring model.

   Finally, in order to explore the relationship between 13 PPS genes and
   IGF2BP2, we conducted a correlation analysis. As shown in [114]Fig.
   5A–B, the expression of IGF2BP2 was highly correlated with PPS score
   (R = 0.64; p = 2.2E-16) and showed a significant correlation with the
   expression of 13 PPS genes. In addition, we also used the “GOSemSim” R
   package to analyze the enrichment pathway similarity of 14 genes
   (IGF2BP2 and 13 PPS genes) and the functional similarity distribution
   map among these genes is shown in [115]Fig. 5C. These results indicated
   that IGF2BP2 is not only highly correlated with the PPS gene at the
   transcriptome level, but also has high functional similarity with the
   PPS gene.

Fig. 5.

   [116]Fig. 5
   [117]Open in a new tab

   Exploration of the PPS-related molecular subtype characteristics. (A)
   Correlation map between PPS score and IGF2BP2 expression; (B)
   correlation map between 13 PPS genes and IGF2BP2 expression; (C) Box
   plot of enrichment similarity analysis between 13 PPS genes and IGF2BP2
   pathway. Correlation analysis between PPS score and Moffitt
   classification (D), Collisson classification (E), Bailey classification
   (F), m6A mutation status (G), IGF2BP2 copy number variation event (H).

3.6. Exploration of the PPS-related molecular subtype characteristics,
biological processes and drug targets

   At present, the classical clinical molecular subtypes for PC include
   Moffitt subtype, Collisson subtype and Bailey subtype. To explore the
   correlation between PPS score and molecular subtypes in PC, we
   conducted a correlation analysis between them. The Basel_like subtype
   in the Moffitt classification, the QM-PDA subtype in the Collisson
   classification, and the Squamous subtype in the Bailey classification
   had higher PPS scores, which have been shown to have a poor prognosis
   ([118]Fig. 5D–F). In addition, we also noted that patients with higher
   PPS scores had a higher frequency of m6A gene mutation events and
   IGF2BP2 copy number amplification events ([119]Fig. 5G–H). All these
   results indicated that PC patients with higher PPS scores have worse
   prognosis, which is consistent with the conclusion confirmed in the
   previous part.

   To investigate which biological processes contribute to the malignant
   phenotype in patients with high PPS scores, Pathifier analysis and GSEA
   analysis based on transcriptomics were performed. Firstly, the
   expression profiles of PPS_high and PPS_low subgroups were used for
   Pathifier analysis to obtain the PDS score of each sample, and the
   correlation between the PDS score and PPS score was calculated. If the
   correlation coefficient of a pathway (biological process) is positive,
   it can be considered that this biological process may be responsible
   for the poor prognosis of patients with high PPS score. Cell cycle and
   proliferation-related pathways (“mitotic spindle”, “G2M checkpoint”,
   “E2F targets”) had the highest correlation with poor prognosis
   ([120]Fig. 6A). Next, to further verify the reliability of the above
   conclusions, we performed GSEA analysis using RNA-seq data from samples
   in the training set. First, the correlation coefficient between the PPS
   score of the sample and RNA-Seq data was calculated, and then the
   correlation coefficient was used as the input for GSEA analysis. As
   shown in [121]Fig. 6B, genes with positive correlation coefficients
   were also enriched in cell cycle and proliferation-related pathways
   (“mitotic spindle”, “G2M checkpoint”, “E2F targets” and “Myc targets
   v1”). Through the above two methods, the obtained biological processes
   related to the PPS score were highly consistent, so we can infer that
   the dysregulation of the cell cycle and proliferation process may be
   the reason for the poor prognosis of patients with high PPS.

Fig. 6.

   [122]Fig. 6
   [123]Open in a new tab

   Identification of PPS-related biological processes and drug targets.
   (A) The PDS score was calculated based on PPS model grouping, and the
   heat map of PDS score was drawn. (B) GSEA analysis was performed based
   on PPS model grouping to laterally verify the reliability of enriched
   pathways in patients with high PPS scores. (C) Volcano plot and scatter
   plot of correlation between PPS score and drug gene targets. (D)
   Correlation between PPS score and CERES score of drug gene target
   volcano plot and scatter plot.

   In patients with high PPS scores, proteins that are highly positively
   correlated with PPS scores may be potential therapeutic targets.
   However, most proteins fail to exert their targeting effects due to the
   lack of active sites to which small molecule compounds can bind.
   Therefore, to explore potential drug therapeutic targets for patients
   with high PPS scores and poor prognosis, we conducted the following
   analysis: First, the correlation between protein expression profile
   data and PPS score after drug intervention was analyzed. 448 targeted
   proteins were obtained by the standard of R > 0.4 and p < 0.05.
   Secondly, correlation analysis was performed between the CERES score in
   the DepMap database and the PPS score of PC cell lines in CCLE. 87
   targeted proteins were screened out according to the standard of
   R < −0.3 and p < 0.05.7 potential protein targets (FOXM1, PRC1, CCNB1,
   SLC16A3, CCNA2, GGCX and AURKA) were screened out ([124]Fig. 6C–D).
   These results suggest that for patients with high PPS scores, effective
   therapeutic results may be achieved by inhibiting the above 7 protein
   targets.

3.7. Identification of potential therapeutic agents for high PPS score in PC

   To screen potential therapeutic compounds related to the PPS model,
   this part of the analysis was based on the gene expression data and
   drug sensitivity data (AUC) of PC cell lines in the CTRP and PRISM
   databases. The information of 437 compounds in the CTRP database and
   1438 compounds in the PRISM database were selected for subsequent
   analysis after removing the duplicated compound information in the two
   databases and removing the PC cell lines with NA values over 20% or
   originating from hematopoietic and lymphoid tissues ([125]Fig. 7A).

Fig. 7.

   [126]Fig. 7
   [127]Open in a new tab

   Identification of candidate agents with higher drug sensitivity in
   high-PPS score patients. (A) Compound information summary Venn diagram
   from CTRP and PRISM database; (B) predictive value of drug
   susceptibility between Kras mutation and non-mutation groups. (C) Six
   potential compounds derived from CTRP were screened by Spearman
   correlation analysis and drug sensitivity analysis; (D) Six potential
   compounds derived from PRISM were identified by Spearman correlation
   analysis and differential susceptibility analysis.

   We first obtained the AUC values of each compound in PC samples using
   the “pRRophetic” package with a built-in ridge regression model based
   on the gene expression profile data of PC samples. To verify the
   credibility of the AUC value, we conducted the following analysis:
   Selumetinib is a PI3K pathway inhibitor used in the treatment of PC,
   and a recent study found that PC patients with KRAS pathway mutations
   had longer survival after treatment than PC patients without KRAS
   pathway mutations [[128]25]. Therefore, according to the mutation
   status of the KRAS pathway, PC patients were divided into KRAS-altered
   and KRAS-unaltered subgroups. Wilcoxon rank-sum test was used to
   compare the sensitivity of the two subgroups to Selumetinib. As shown
   in [129]Fig. 7B, the AUC of PC patients in the KRAS-altered group was
   significantly decreased (p = 0.0056), which was consistent with the
   clinical findings of Selumetinib above. The above results further
   confirmed the reliability of our predicted AUC. For patients with high
   PPS scores, potential compounds were screened from CTRP database and
   PRISM database. 6 compounds from the CTRP database (Austocystin D,
   Tipifarnib, Merck60, Canertinib, BRD-K92856060 and Vincristine) and 6
   compounds from PRISM database (Brigatininb, Cephalomannine, YM-976,
   Triciribine, Tanespimycin and Vemurafenib) were obtained ([130]Fig.
   7C–D). The above 12 compounds had lower predictive AUC values in the
   samples with higher PPS scores, and their predictive AUC values were
   significantly negatively correlated with PPS scores.

   Although the sensitivity of these 12 compounds to patients with high
   PPS scores was high, the analysis of these 12 compounds alone does not
   adequately indicate the therapeutic potential of these 12 compounds in
   patients with PC. Therefore, as shown in [131]Fig. 8, we further
   conducted a multi-dimensional study. First, we used CMap analysis to
   explore compounds whose gene expression pattern was contrary to the
   specific expression pattern of PC (namely, gene expression was
   increased in PC tissues, but the expression was down-regulated after
   the intervention of some compounds). The CMap scores of three compounds
   (tipirifanib, vemurafenib and YM-976) were all less than −90. These
   compounds may have potential therapeutic effects on PC. Second, the
   expression levels (including mRNA and protein levels) of the target of
   the candidate compounds were analyzed differently between PC tissues
   and normal tissues. The larger the difference, the greater the
   potential of the candidate compounds to treat PC. Third, we conducted a
   comprehensive literature search in the PubMed database
   ([132]https://www.ncbi.nlm.nih/), in order to search for candidate
   compounds for the treatment of PC experimental and clinical evidence.
   Taken together, there is strong evidence that both tipifarnib and
   vemurafenib have the best therapeutic potential for patients with high
   PPS scores.

Fig. 8.

   [133]Fig. 8
   [134]Open in a new tab

   Multiangle validation was performed to identify the most likely
   potential therapeutic compounds for patients with high PPS scores.

4. Discussion

   More and more evidence has shown that m6A modification plays an
   indispensable role in inflammation, innate immunity and tumor
   regulation through the interaction of a variety of m6A regulatory
   factors [[135]26]. Since most of the current studies focus on a single
   regulator, there is no comprehensive and in-depth understanding of the
   overall characteristics of malignant biological behavior of PC mediated
   by the comprehensive action of multiple m6A readers. To this end, based
   on the 14 m6A readers, we found two distinct expression patterns of m6A
   readers, and determined that the two expression patterns had
   significant heterogeneity in pathway enrichment, tumor immune
   microenvironment, and clinical classification. Finally, through
   contribution analysis of the constructed scoring model, IGF2BP2 was
   identified as the core gene affecting the related expression pattern of
   m6A readers. IGF2BP2 is an RNA-binding protein that regulates multiple
   biological processes. Previously, IGF2BP2 was thought to be a type 2
   diabetes (T2D)-associated gene [[136]27]. Indeed IGF2BP2 modulates
   cellular metabolism in human metabolic diseases such as diabetes,
   obesity and fatty liver through post-transcriptional regulation of
   numerous genes in multiple cell types. Emerging evidence shows that
   IGF2BP2 is an m6A reader that participates in the development and
   progression of PC by communicating with different RNAs such as miRNAs,
   mRNAs and lncRNAs [[137]28]. Liu Y found that IGF2BP2 and miR-671-3p
   could interact with lncRNA-PACERR, inducing pro-tumor macrophages in PC
   [[138]17]. Also, IGF2BP2 and lncRNA-DANCR work together to promote
   cancer stemness-like properties and pancreatic cancer pathogenesis
   [[139]29].

   In the past decade, the molecular biology of PC has been greatly
   enhanced by high-throughput sequencing analysis of large samples, which
   allows researchers to detect key events in the development of PC. With
   the recognition of PC-specific mismatch repair defects and homologous
   recombination defects, germline variants that increase the risk of PC
   have been identified, and these defects are conferring sensitivity to
   PC targeted therapy, which marks that the treatment strategy for PC has
   officially entered the era of precision medicine. In our study, we have
   confirmed that IGF2BP2 is a marker gene with certain molecular
   characteristics in PC, and higher expression of IGF2BP2 is closely
   related to worse prognosis. Considering the unique clinical features of
   PC patients with high IGF2BP2 expression, it is necessary to develop
   targeted treatment strategies for these patients. Therefore, an
   effective prognostic prediction model, the PPS score model, was
   developed based on the different expression patterns of IGF2BP2 in PC.
   Moreover, compared with other prognostic models, the PPS model based on
   different IGF2BP2 expression patterns has more significant and better
   predictive power in terms of prognosis. These results enable us to use
   precise prognostic prediction methods for patients with similar
   biological patterns to achieve a more reliable prognostic treatment
   effect.

   PC is highly heterogeneous among individuals, and no study has found a
   treatment that is suitable for all patients with PC. In addition, due
   to the lack of corresponding biomarkers, patients with advanced PC have
   not achieved satisfactory therapeutic effects. Therefore, it is of
   great interest to find tailored treatment strategies for specific
   populations. The PPS model we constructed not only has good prognostic
   efficacy, but also can be used as a biomarker to guide targeted
   therapy. In this study, we identified seven potential therapeutic
   targets (FOXM1, PRC1, CCNB1, SLC16A3, CCNA2, GGCX, and AURKA) and two
   agents (tipirafenib and verorafenib) for PC patients with high PPS
   scores.

   The protein encoded by FOXM1 belongs to the forkhead box family of
   transcription factors, which is a transcription factor related to cell
   proliferation and is overexpressed in a variety of human malignant
   tumor cells [[140]30]. Studies have shown that silencing FOXM1 can
   down-regulate the expression of ATX, thereby inhibiting the
   proliferation, invasion and metastasis of pancreatic cancer cells
   [[141]31]. PRC1 is a member of the polycomb inhibitory complex, and
   inhibition of PRC1 can prevent immune escape and angiogenesis in double
   negative breast cancer cells [[142]32]. The protein encoded by CCNB1 is
   a cyclin, which is mainly involved in the regulation of cell mitosis.
   Silencing CCNB1 can inhibit the proliferation of pancreatic cancer
   cells and promote cell senescence by activating p53 signaling pathway
   [[143]33]. CCNA2-encoded proteins also belong to the highly conserved
   cyclin family, which binds to and activates cyclin-dependent kinase 2,
   thereby promoting the transition from G1/S to G2/M. The up-regulation
   of CCNA2 expression in KRAS mutant gastric cancer cell lines and
   primary tumors leads to increased sensitivity to PLK1 inhibitors, and
   it can predict the sensitivity of PLK1 inhibitors to the treatment of
   advanced gastric cancer [[144]34]. SLC16A3 belongs to the solute
   carrier family, and the depletion of its encoded protein can promote
   the increase of reactive oxygen species and metabolism in pancreatic
   cancer cells, thereby promoting cell death [[145]35]. The protein
   encoded by AURKA is a cell cycle regulating kinase, and inhibition of
   AURKA expression can reduce its phosphorylation level in pancreatic
   cancer cells and promote cell necrosis [[146]36]. The above results
   suggest that the seven targets identified so far play a specific role
   in malignancies, especially PC, suggesting that the corresponding
   therapeutic targets developed for PC patients with high-risk IGF2BP2
   expression patterns are feasible.

   Tipifarnib, a farnesyltransferase inhibitor, has been widely used as a
   targeted therapy for the treatment of a variety of solid tumors.
   Considering that fa acylation is a key step in Ras protein membrane
   anchoring required for Ras activity, and KRAS mutations exist in
   70%–90% of PC, inhibition of farnesyltransferase can inhibit KRAS gene
   function, thereby inhibiting the progression of PC [[147]37]. The
   efficacy of tipifarnib has been demonstrated in a phase III clinical
   study in patients with advanced PC. The results showed that gemcitabine
   combined with gemcitabine can enhance anti-PC efficacy and is more
   tolerated by patients compared with gemcitabine alone [[148]38]. In
   addition, tipifarnib combined with atorvastatin and celecoxib can
   synergistically reduce the sphere formation ability of PC cells, and
   can significantly inhibit cell proliferation and promote apoptosis of
   sphere-forming cells [[149]39].

   Vemurafenib is a low molecular weight compound used to inhibit the
   mutated serine-threonine kinase BRAF by selectively binding to the
   ATP-binding site of the BRAF V600E kinase, thereby inhibiting its
   activity. vemurafenib has been recommended as a first-line treatment
   for unresectable advanced BRAF V600E mutation-positive melanoma
   [[150]40]. However, serine synthesis was significantly inhibited by
   vemurafenib in PC cell lines. Under serine depletion conditions, the
   death of PC cells treated with vemurafenib was significantly increased
   [[151]41]. In a case report, after vemurafenib targeted therapy for a
   PC patient with BRAF V600E mutation, the primary tumor, liver
   metastases and lymphatic metastases of the patient were significantly
   reduced, and the skin toxicity was weak [[152]42]. Unfortunately, no
   study has shown vemurafenib to prolong survival in patients with PC.
   This study provides new insights into improving the therapeutic effect
   of vemurafenib by selecting patients with vemurafenib-sensitive PC,
   thus providing new ideas for the precision treatment of PC.

   The PPS scoring model constructed in our study specifically quantifies
   the expression pattern of IGF2BP2 and is significantly better than
   other methods in risk stratification and prediction of individualized
   treatment. However, this study still has several limitations. First, to
   obtain more reliable results with a larger sample size, we used a
   machine learning algorithm to estimate the IGF2BP2 expression pattern
   in a subset of patients. However, there may be slight differences
   between the estimated IGF2BP2 expression pattern and the actual IGF2BP2
   expression pattern in these patients. Second, the results of drug
   target prediction and compound prediction cannot be verified against
   each other, which reduces the power of the conclusions in this part.
   Finally, all the conclusions of this study were based on public
   databases, and further experimental and clinical validation are needed
   to promote the clinical translation of the findings.

5. Conclusion

   In conclusion, the current work developed a novel gene signature, PPS,
   for prognostic prediction of IGF2BP2-mediated m6A modification patterns
   in PC. The PPS model has important clinical significance in both low-
   and high-PPS patients. For patients with high PPS scores, based on
   multiple drug susceptibility and target databases, we have identified
   seven potential therapeutic targets and two compounds from multiple
   perspectives and all-around aspects, which opens a new direction for
   the application of precision medicine in PC. Overall, this study has
   not only provided new insights into personalized prognostication
   approaches but also thrown light on integrating tailored prognosis
   prediction with precision therapy.

Ethical statement

   Not applicable.

Funding

   This study is supported by the National Natural Science Foundation of
   China (No. 82000614; No. 81873589; No. 82203494), Natural Science
   Foundation of Hunan Province (No. 2020JJ5876; No. 2022JJ40741); Science
   and Technology Project of Changsha (No. kq2004146; No. kq2004143);
   Project of the Health Commission of Hunan Province (No. 202204015341;
   No. 202104011922); The Wisdom Accumulation and Talent Cultivation
   Project of The Third Xiangya Hospital of Central South University (No.
   YX202203) and The Shanxi Provincial Basic Research Program (Free
   Exploration) Youth Science Research Project. (No. 202303021212363 to
   L.Z.).

Data availability statement

   All data used in this study are publicly available. The materials that
   support the conclusion of this article have been included within the
   Materials and Methods section “Data acquisition and preprocessing”.

CRediT authorship contribution statement

   Dongjie Chen: Writing – review & editing, Writing – original draft,
   Methodology, Formal analysis, Data curation, Conceptualization. Longjun
   Zang: Writing – review & editing, Writing – original draft,
   Methodology, Investigation. Yanling Zhou: Writing – review & editing,
   Writing – original draft. Yongchao Yang: Resources, Project
   administration. Xianlin Zhang: Resources. Zheng Li: Resources. Yufeng
   Shu: Supervision, Resources. Wenzhe Gao: Visualization, Validation,
   Supervision. Hongwei Zhu: Visualization, Validation, Supervision. Xiao
   Yu: Visualization, Validation, Supervision.

Declaration of competing interest

   The authors declare that they have no known competing financial
   interests or personal relationships that could have appeared to
   influence the work reported in this paper.

Acknowledgments