Abstract
Pancreatic cancer (PC) is a malignant digestive system tumor with a
very poor prognosis. N6-methyladenosine (m6A) is mediated by a variety
of readers and participates in important regulatory roles in PC. Based
on TCGA_PAAD, ICGC_AU_PAAD, ICGC_CA_PAAD, [41]GSE28735 and [42]GSE62452
datasets, We mapped the multi-omics changes of m6A readers in PC and
found that m6A readers, especially IGF2BP family genes, had specific
changes and were significantly associated with poor prognosis. An
unsupervised consensus clustering algorithm was used to explore the
correlation between specific expression patterns of m6A readers in PC
and enrichment pathways, tumor immunity and clinical molecular
subtypes. Then, the principal component analysis (PCA) algorithm was
used to quantify specific expression patterns and screen core genes.
Machine learning algorithms such as Bootstrapping and RSF were used to
quantify the expression patterns of core genes and construct a
prognostic scoring model for PC patients. What's more, pharmacogenomic
databases were used to screen sensitive drug targets and small molecule
compounds for high-risk PC patients in an all-around and multi-angle
way. Our study has not only provided new insights into personalized
prognostication approaches, but also thrown light on integrating
tailored risk stratification with precision therapy based on
IGF2BP2-mediated m6A modification patterns.
Keywords: IGF2BP2, m6A modification, Precision medicine, Pancreatic
cancer
Highlights
* •
Our work developed a novel gene signature, poor prognostic
signature (PPS), for prognostic prediction of IGF2BP2-mediated m6A
modification patterns in PC.
* •
The PPS model has important clinical significance and we also
identifiedpotential therapeutic targets and compounds target this
model, which opens a new direction for the application of precision
medicine in PC.
* •
This study has not only provided new insights into personalized
prognostication approaches, but also thrown light on integrating
tailored prognosis prediction with precision therapy.
1. Introduction
Pancreatic cancer (PC) is an insidious, rapidly progressing digestive
tract malignant tumor with poor therapeutic effect and prognosis
[[43]1]. It is the seventh leading cause of cancer death, with a 5-year
survival rate of only 10%. At present, surgical resection is considered
to be the only cure for PC. However, clinical experience shows that,
due to the absence of hallmark signs in the early stage of PC, patients
usually have advanced stage by the time of diagnosis, and only about
20% of patients meet the surgical indications [[44]2]. Unfortunately,
the vast majority of patients will eventually experience tumor
recurrence even after surgical resection, and current radiotherapy,
chemotherapy, immunotherapy, and targeted therapy do not significantly
improve the survival of patients with PC [[45]3]. In the past decade,
despite rapid advances in diagnostic methods, perioperative management,
radiotherapy techniques, and systemic treatment of advanced disease,
the clinical outcome of patients with PC has received little benefit.
Therefore, there is an urgent need to classify tumors according to the
molecular similarities and differences related to clinical and biology,
improve the established morphological and imaging methods, so as to
achieve a more accurate prognosis, better optimize individual patient
management and risk stratification, and improve the selection of
systematic treatment options [[46]4,[47]5].
Epigenetics refers to the regulation of transcription or translation
processes to affect the characteristics and function of genes without
changing the DNA sequence, which mainly includes DNA methylation,
histone modification, genomic imprinting, chromosome remodeling, etc
[[48]6]. In recent years, with the development of high-throughput
sequencing technology, the study of RNA modification has received more
and more attention. To date, more than 100 types of modifications have
been identified in various RNAs such as messenger RNAs (mRNAs),
transfer RNAs (tRNAs), ribosomal RNAs (rRNAs), microRNAs (miRNAs), and
long noncoding RNAs (lncRNAs) [[49]7]. Among them, the earliest
discovered and most widely studied is N6-methyladenosine (m6A), which
has been considered as the most common methylation modification in
eukaryotic mRNAs [[50]8]. It has been estimated that about 0.1–0.4% of
adenine in mRNA is specifically modified by m6A, and m6A modification
usually occurs in RRACH sequences (R = A, G, or U; R = A or G; H = A,
C, or U), mainly enriched in the 3’ non-coding region and near the stop
codon. Regulation of m6A modification is dynamically reversible,
mediated by m6A methyltransferases (also known as “Writers”), and can
be removed by m6A demethylases (also known as “Erasers”). More
importantly, the effect of m6A modification on RNA metabolism mainly
depends on the recognition of different m6A-binding proteins (also
known as “Readers”) [[51]9].
The m6A-modified transcript can be identified and bound by the m6A
reader protein, which controls gene expression through diverse
processes. These include the regulation of mRNA stability [[52]10],
mRNA splicing [[53]11], mRNA structure [[54]12], mRNA export [[55]13],
translation efficiency [[56]14], and miRNA biogenesis [[57]15], thereby
exerting its regulatory effects. Emerging findings indicate a close
association between m6A reader and the development and advancement of
multiple human cancers, especially in PC [[58]16]. For example, m6A
reader IGF2BP2 could interact with LncRNA-PACERR and miR-671-3p,
thereby promoting M2 polarization and pro-tumor function in TAMs of PC
[[59]17]. Liu et al. unveiled the pivotal tumor-inhibitory functions of
the YTHDC1/miR-30d axis in the progression of PC, showcasing its
significance across biological, mechanistic, and clinical realms in the
context of human PC and glycolytic pathways [[60]18].
Given that the majority of existing research concentrates on one
individual reader, a thorough and profound comprehension of the overall
traits of malign biological conduct in PC, orchestrated by the combined
impact of comprehensive m6A readers, is currently lacking. Hence, in
this study, starting from all m6A readers, we first drew the
multi-omics level change map of m6A readers through the public
database, and then explored the unique expression pattern of m6A
readers in PC by unsupervised clustering and principal component
analysis algorithm and screened the core genes. Finally, we used
machine learning algorithms to construct core-gene expression
pattern-based scoring models for pharmacogenomic studies, so as to
tailor specialized management for PC patients. The overall workflow is
displayed in [61]Fig. 1.
Fig. 1.
[62]Fig. 1
[63]Open in a new tab
Flowchat of this study.
2. Materials and Methods
2.1. Data acquisition and preprocessing
Public transcriptional profiling datasets, including the TCGA_PAAD
dataset from The Cancer Genome Atlas Program (TCGA), [64]GSE62452 and
[65]GSE28735 dataset from Gene Expression Omnibus (GEO), ICGC_AU_PAAD
dataset for International Cancer Genome Consortium (ICGC) were
included. For the TCGA_PAAD dataset, somatic mutation, copy number
variation (CNV), RNA expression in transcript per million fragments
mapped (FPKM) format data, methylation data, complete clinical
information, alternative splicing (AS) data, histone modification data
were obtained from UCSC Xena ([66]https://xenabrowser.net/datapages/),
TCGA Splice Seq dataset
([67]http://projects.insilico.us.com/TCGASpliceSeq/), and Cistrome Data
Browser database ([68]http://cistrome.org/db/), respectively. RNA-seq
data in fragments per kilobase of FPKM format was then transformed into
transcripts per million (TPM). GEO datasets were downloaded from
[69]https://www.ncbi.nlm.nih.gov/geo/in normalized data format. R
package “sva” was then utilized to combine different sourced GEO
datasets and TCGA datasets. ICGC_AU_PAAD dataset was downloaded from
[70]https://dcc.icgc.org/, somatic mutation data, FPKM RNA-seq data and
clinical information were included in our study. The somatic mutation
data of ICGC_CA_PAAD was also included in our study and downloaded from
[71]https://dcc.icgc.org/. The expression data of normal tissue was
downloaded from the Genotype-Tissue Expression (GTEx) database
([72]https://gtexportal.org/home/). 16 m6A readers (YTHDF1, YTHDF2,
YTHDF3, YTHDC1, YTHDC2, IGF2BP1, IGF2BP2, IGF2BP3, FMR1, HNRNPC,
HNRNPA2B1, EIF3A, PRRC2A, RBMX, LRPPRC, SND1) were screened from
literatures [[73]14,[74]19]. These genes with deleted expression
profiles in TCGA_PAAD, ICGC_AU_PAAD, ICGC_CA_PAAD, [75]GSE28735 and
[76]GSE62452 were excluded. Finally, 14 m6A readers were included in
the subsequent study, and the deleted genes were HNRNPC and RBMX. For
pharmacogenomic research, the CRISPR-Cas9 Essentiality REvealing Score
(CERES) score was obtained from the Dependency Map (DepMap) dataset
([77]https://depmap.org/portal/) to measure the dependence of a gene of
interest in a PC cell line. A lower CERES score indicates that the gene
is more important for cell growth and survival of a given PC.
Expression profile data of human cancer cell lines (CCLs) were obtained
from the Broad Institute Cancer Cell Line Encyclopedia (CCLE) project
([78]https://portals.broadinstitute.org/ccle/). Drug sensitivity data
of CCLs were achieved from the Cancer Therapeutics Response Portal
(CTRP v.2.0, released October 2015,
[79]https://portals.broadinstitute.org/ctrp) and profiling relative
inhibition simultaneously in mixtures (PRISM) dataset (19Q4, released
December 2019, [80]https://depmap.org/portal/prism/). The above two
datasets used the Area Under the Curve (AUC) as a measure of drug
sensitivity, and a lower AUC value indicated that the drug was more
sensitive to treatment. The connective Map (CMap) database
([81]https://clue.io/) was also used to explore the interaction network
between drugs/small molecule compounds, genes and disease states. The
correlation score was obtained according to the enrichment of
differentially expressed genes in the reference gene set, and the value
was between −100 and 100. A positive number indicated that genes were
positively correlated with the reference gene set, while a negative
number indicated that differentially expressed genes were negatively
correlated with the reference gene set.
2.2. Unsupervised consensus clustering
After the expression matrix was standardized using sweep () function in
R, the package “ConsensusClusterPlus” was applied for gene expression
clustering. The parameters in this study were set as: maxK = 4,
reps = 100, pItem = 0.8, pFeature = 1, title = title,
clusterAlg = “pam”, distance = “spearman”. The cumulative distribution
function (CDF) value and delta area of the CDF curve were used as
evaluation criteria for every single clustering.
2.3. Differential analysis and pathway analysis
The “limma” package in R was employed to screen for differentially
expressed genes (DEGs) between Readercluster_A and Readercluster_B
groups. To rectify false positive results, adjusted P-values were
applied using the default Benjamini-Hochberg false discovery rate (FDR)
method. The cutoff values for identifying DEGs were set at FDR <0.01
and |log2fold change (FC)| > 2. Gene Ontology (GO) and Kyoto
Encyclopedia of Genes and Genomes (KEGG) pathway analyses, were
performed on the DEGs by using the “clusterProfiler” R package. Gene
set enrichment analysis (GSEA) provided by MsigDB was adopted to
determine the statistical significance of molecular pathways as well as
the consistent heterogeneities between high and low-PPS groups based on
the R package “fgsea”. FDR <0.05 was considered statistically
significant. Pathifier analysis is mainly used to identify specific
signaling pathways at specific stages of cancer through correlation,
variance stability and principal component analysis methods. For each
PC sample, the Pathway Deregulation Score (PDS) was calculated to
estimate the extent to which the activity of a certain pathway in PC
samples deviates from that in normal samples, and thus to aggregate the
gene-level data Conversion to pathway-level data was used for
subsequent analysis. The Pathifier algorithm relies on “Pathifier”
package in R.
2.4. Dimensionality reduction analysis of random forest (RF)
To further extract the core genes for subsequent modeling analysis, the
Boruta dimensionality reduction algorithm based on RF was used here.
Specific analysis steps and codes are as follows: (1) findCorrelation
() function in “Caret” R package was used to eliminate genes with high
collinearity (cor >0.85); (2) Dimensionality reduction analysis is
carried out by using the “Boruta” R package and the Boruta () function
in the “randomForest” R package, TentativeRoughFix () function for
feature determination, and getSelectedAttributes () function for final
feature selection. The Boruta gene obtained after the operation is the
gene obtained after dimensionality reduction.
2.5. Principal component analysis (PCA) analysis and contribution screening
* (1)
In the TCGA_PAAD dataset, based on the expression matrix of 14 m6A
readers, the consensus clustering method was used to obtain the
specific expression pattern of m6A readers, namely Readercluster;
* (2)
DEGs between specific expression patterns of m6A readers were
obtained using the differential analysis method, which was defined
as m6A reader phenotypic-related differential genes;
* (3)
The expression patterns of m6A reader phenotypic-related
differential genes were obtained by consensus clustering analysis
again, namely Reader. gene.cluster;
* (3)
The univariate Cox regression analysis method was used to obtain
prognostic genes with p < 0.05;
* (4)
The aforementioned RF dimensionality reduction algorithm was used
to further obtain the robust prognostic genes;
* (5)
The prcomp () function in the R package “prcomp” was used for PCA
analysis, and principal component 1 (PC1) and principal component 2
(PC2) were selected to construct the scoring model, which was named
as Readerscore model, and the calculation formula was as follows:
[MATH: Readerscore=∑(PC1i+PC2i) :MATH]
where i is the expression level of core genes in (4). The scoring
system calculates a score based on genes in the gene set that have
strong positive (or negative) correlations, while reducing the weight
of unrelated genes. Then, the surv_cutpoint () function in the
“Survminer” package was used to determine the optimal cut-off value,
and the samples were divided into two subgroups: Readerscore_high and
Readerscore_low for subsequent analysis.
(6) The factoextra () function in R package “prcomp” was used to draw
PCA component diagram for the above PCA results, in which the value of
coordinate axis PC1/2 was the explanation rate of the overall
difference, the point in the figure represented the sample, the color
represented the group, the arrow represented the original variable, and
the direction represented the correlation between the original variable
and the principal component. The length represents the contribution of
the original data to the principal components. This step was used to
screen the core genes with the highest contribution to principal
components for subsequent analysis.
2.6. Gene set variation analysis (GSVA) analysis and single sample gene set
enrichment analysis (ssGSEA) analysis
To explore diverse enrichment status in gene function for different
clusters or subgroups, GSVA and ssGSEA were applied by R package “GSVA”
and “GSEAbase”. Two gene sets were conducted for functional annotation
from MsigDB ([82]http://www.gsea-msigdb.org/gsea/msigdb/), which were
c2. cp.kegg.v7.4. symbols and h. all.v7.4. symbols, respectively. The
“limma” R package was then utilized to distinguish the enrichment
differences between different subgroups. The pathways with |log2FC| >
0.1 and FDR <0.05 were considered the most differentially enriched
molecular pathways between the two subtypes.
2.7. Construction of poor prognosis signature (PPS)
In order to meet the standards of machine learning algorithms, RNA-Seq
data and clinical data of 328 samples from TCGA_PAAD, ICGC_AU_PAAD and
[83]GSE62452 and [84]GSE28735 were included. Data preprocessing methods
are as follows:
* (1)
According to the median expression of IGF2BP2, 328 PC samples were
divided into two subgroups: IGF2BP2_high and IGF2BP2_low, and the
“limma” R package was used for difference analysis. On the basis of
|log2FC| > 0 and adj. p < 0.01, we screened out IGF2BP2 related
differential genes;
* (2)
apply () and findCorrelation () functions were utilized to remove
genes that had strong collinearity (R > 0.9) and low variability
(MAD <0.5). The preProcess () function in R package “caret” was
used to normalize and standardize the expression of DEGs. Finally,
the createDataPartition () function was used to divide the treated
differential gene expression matrix into training set and testing
set according to the ratio of 7:3.
* (3)
The coxph () function in the R package “survival” was used to
perform univariate Cox regression analysis on the data of the
training set, and statistically significant, prognostic
differential genes (p < 0.05);
* (4)
In order to re-evaluate the correlation between gene expression and
prognosis, the “Bootstrapping” method described above was used to
test the robustness of genes with prognostic differences. We
performed univariate Cox regression analysis on 70% of the samples
randomly selected from the training set. 1000 iterations were tuned
to select the samples with significant prognostic value in 90% of
the resampled samples, and were further included in the subsequent
analysis;
* (5)
To further screen the core genes, rfsrc () function in the R
package “randomForestSRC” was used for random survival forest (RSF)
analysis. In this RSF analysis, the function used log-rank and
score-splitting algorithms to build 1000 survival numbers, and the
included DEGs were ranked according to the minimal depth (MD)
method. The smaller the variable value, the better the prediction
efficiency. 10-fold cross-validation was adopted for tuning.
Finally, the above RSF analysis was iterated 1000 times, and the
Concordance index (C-index) of out-of-bag samples was used as the
criterion to judge the quality of the model. The higher the c-index
value, the best-fitting effect of the model constructed by this
gene set was observed, and this gene set was named PPS.
* (6)
Based on the gene set obtained in the previous step, coxph ()
function in R package “survival” was used for multivariate Cox
regression analysis. The PPS scoring model was constructed with the
correlation coefficients obtained from multivariate Cox regression,
and the calculation formula was as follows:
[MATH: PPSscore=∑i=1nCoefi*xi :MATH]
where Coef[i] is the multivariate Cox regression coefficient
corresponding to the i[th] gene, and x[i] is the expression value of
the i[th] gene. In addition, the AUC value of the training set and
testing set was used as the basis to judge the effectiveness of the PPS
model and other models.
2.8. Screening for potential drug targets
In order to identify the target proteins (genes) with potential
therapeutic significance in PC patients with high PPS scores, we first
downloaded the detailed target information of 6125 compounds from The
Drug Repurposing Hub database, and after deleting the duplicated target
information, we performed the following analysis (Figure S1):
* (1)
Spearman correlation analysis was used to analyze the correlation
between the protein expression profile of target genes and PPS
score in PC patients. R > 0.4 and p < 0.05 were used as the
criteria to screen the drug targets related to the poor prognosis
of PC.
* (2)
PPS scores of PC cell lines in the CCLE database were calculated
according to the above-mentioned scoring model construction
formula, and Spearman correlation analysis was performed between
CERES scores obtained from the DepMap database and PPS scores.
Since a lower CERES score of a gene indicates a stronger dependence
of the gene on the malignant biological phenotype of PC, R < −0.3
and p < 0.05 were used as the criteria to screen the drug targets
related to the poor prognosis of PC.
* (3)
The targets obtained in (1) and (2) were intersected, and the drug
targets with high confidence were finally obtained.
2.9. Drug sensitivity analysis
In order to conduct drug sensitivity analysis of PC patients with high
PPS scores, the CTRP and PRISM databases were used for the analysis.
The detailed data acquisition and analysis process are as follows
(Figure S2):
* (1)
CTRP and PRISM database portals were used to download the gene
expression matrix and drug sensitivity value (AUC) after drug
intervention, and the cell lines with a missing value (NA value) of
more than 20% or from hematopoietic and lymphoid tissues were
excluded.
* (2)
The pRRopheticPredict () function in the “pRRophetic” R package was
used to predict the drug sensitivity of PC samples included in this
study, and the predicted AUC value was obtained.
* (3)
Wilcoxon rank-sum test was used to analyze the differential drug
sensitivity between the high PPS score subgroup (top 10%) and the
low PPS score subgroup (bottom 10%). To identify compounds with
higher sensitivity (lower AUC values) in the subgroup with higher
PPS scores. The screening criteria were: log2FC > 0.02 (CTRP),
log2FC > 0.03 (PRISM);
* (4)
Spearman correlation analysis was performed between drug
susceptibility values and PPS scores to identify compounds with
negative correlation coefficients. The screening criteria were:
R < −0.3 (CTRP),R < −0.45 (PRISM);
* (5)
By intersecting the compounds obtained in (3) and (4), highly
sensitive drugs could be obtained based on CTRP and PRISM
databases, respectively.
For complementary validation of the higher sensitivity drugs screened
by the above methods, we also performed CMap analysis. Firstly, the
differences between PC samples and normal pancreatic tissue samples in
the GTEx database were analyzed, and the 300 genes with the most
significant differences (including 150 significantly up-regulated genes
and 150 significantly down-regulated genes) were screened out. Then,
these DEGs were submitted to the CMap cloud platform
([85]https://clue.io/query/) for analysis, and the CMap score of each
compound was obtained. If the negative score of a compound is smaller,
the gene expression pattern of the compound is opposite to the
expression pattern of PC. It is suggested that this compound has a good
therapeutic effect in PC.
2.10. Statistical analysis
All statistical analyses were performed in R (version 4.1.1). The
comparison of count data was tested by Fisher's test and Chi-square
test. For the measurement data that conformed to the normal
distribution, the Student-t test was applied; besides, the Wilcox test
was applied for non-normal distribution data between independent
subgroups. Spearman analysis was applied to estimate the correlations
between two variables that are not linearly related. The Kaplan-Meier
(K-M) test was utilized to validate the fraction of PC patients living
for a certain survival time and the log-rank test was conducted to
compare the significance of the difference. R package “survival” was
used for depicting K-M survival curve. A two-tailed p-value of less
than 0.05 was deemed to be statistically significant unless
specifically stated.
3. Results
3.1. Landscape of multi-omics variation of m6A readers in PC
Firstly, we analyzed the mutation variation of m6A readers in PC, three
datasets (namely, TCGA_PAAD, ICGC_AU_PAAD and ICGC_CA_PAAD) were
selected, all of which have complete gene expression and clinical data.
As shown in Figure S3, the overall mutation rate of 14 m6A readers in
335 samples was low, among which the top three genes with mutation rate
were SND1, IGF2BP3 and IGF2BP2. From the perspective of mutation type,
intronic mutation was the most common mutation in the readers, which
indicated that the structural changes caused by intronic mutation might
be the reason for the essential biological effects of the m6A readers.
With regards to the CNV, data from TCGA_PAAD datasets were included.
Among 140 PC samples with entire CNV data, 57.14% (8/14) of the m6A
readers had copy number amplification frequency greater than copy
number deletion frequency, indicating that copy number amplification
was a relatively common mutation event of m6A readers at the DNA level.
In terms of single genes, IGF2BP1 had the highest frequency of CNV
events (9.22%), followed by PRRC2A (8.51%) and YTHDF2 (7.09%) (Figure
S4A). The copy number increase or decrease of large genome fragments
with copy number variation of more than 1 kb is mainly manifested as
submicroscopic deletions and duplications, which are important
components of genome structural variation. Therefore, we also explored
the relationship between CNV events and the expression of m6A readers.
When the copy number of m6A readers increased, its mRNA expression also
showed an up-regulated trend. Similarly, when the copy number of m6A
readers was deleted, its expression was down-regulated accordingly
(Figure S4B-O).
For the analysis of AS events of m6A readers in PC, only the TCGA_PAAD
dataset with more comprehensive data information was included here.
Through the TCGA Splice Seq database, we obtained 139 samples with AS
events data, including exon skip (ES), alternate terminator (AT),
alternate promoter (AP), alternate acceptor (AA), alternate donor (AD),
retained intron (RI) and mutually exclusive (ME). From the level of
whole transcriptome genes, ES events in PC samples ranked first
(31.81%, 6750/21,218), AT and AP ranked second (17.98%, 3816/21,218)
and third (17.55%, 3724/21,218), respectively (Figure S5A). However,
for the 14 m6A readers, the overall number of AS events was small.
Similarly, the most common type of AS was still ES (40%, 8/20) (Figure
S5B). Therefore, exon deletion caused by ES is also an important link
in the regulation of PC by m6A readers.
As the most common modification in biology, m6A modification under the
epigenetic branch has been widely reported to be closely related to the
occurrence and development of PC. To further verify the epigenetic
expression of m6A readers, DNA methylation and histone acetylation were
analyzed respectively. After downloading PC methylation data from the
TCGA portal and conducting data quality control, we obtained the
methylation signal value matrix (β value matrix) of m6A reader genes to
quantify the methylation level of each gene. As shown in Figure S6,
correlation analysis between the expression level of m6A readers and β
value shows that, at p < 0.05 level, 71.43% of the expression of
readers were negatively correlated with methylation levels. Among them,
YTHDF3 (R = −0.51, p = 1.9E-13), IGF2BP2 (R = −0.5, p = 6.3E-13), The
expression of SND1 (R = −0.4, p = 2.5E-08) had the most significant
negative correlation with methylation level, which indicated that most
readers were regulated by methylation in PC. For the analysis of the
histone acetylation level of the m6A readers, the most common histone
H3K27ac was selected as the research object here due to the limitation
of the database. Through the aforementioned method, we paired the
acetylation data in the Cistrome Data Browser database with the UCSC
Browser for visualization. Compared with normal pancreatic cells, the
histone H3K27ac acetylation modification peaks in the promoter regions
of IGF2BP1, IGF2BP2, IGF2BP3, YTHDF2, YTHDF3 and PRRC2A in PC cell
lines were significantly increased (Figure S7). This indicates that the
above m6A readers may be regulated by H3K27ac acetylation, which also
provides a theoretical basis for subsequent related studies in the
future.
In the previous study, we systematically analyzed the m6A readers at
the DNA level and epigenetic levels. On the whole, most of the m6A
readers were regulated by SNP, CNV, AS, methylation and acetylation. To
further analyze the effect of these readers at the mRNA level. We
selected 141 PC samples from the TCGA_PAAD database and 165 normal
samples from the GTEx database for differential expression analysis, as
shown in Figure S8 mRNA expression levels of IGF2BP2 (log2FC = 5.45,
FDR = 9.02E-89), IGF2BP3 (log2FC = 2.41, FDR = 4.47E-30) and IGF2BP1
(log2FC = 1.58, FDR = 2.28E-86) in PC samples were significantly higher
than those in normal tissues, which may be related to the high mutation
frequency of IGF2BP2 and IGF2BP3 mentioned above. To further explore
the relationship between m6A readers and the prognosis of PC patients,
the 14 readers were divided into high expression group and low
expression group according to their median expression level, and the
overall survival (OS) between the two groups was compared. For YTHDC1,
YTHDC2, YTHDF1, YTHDF2, YTHDF3 and PRRC2A, the high expression group of
PC patients had better OS, while other readers showed opposite results
(Figure S9A–N). At the same time, univariate Cox analysis was performed
for 14 m6A readers, and the risk network diagram was drawn, which was
consistent with the results of the survival analysis described above
(Figure S9O). These results indicate that YTHDC1, YTHDC2, YTHDF1,
YTHDF2, YTHDF3 and PRRC2A are protective factors in PC patients.
IGF2BP1, IGF2BP2, IGF2BP3, FMR1, HNRNPA2B1, EIF3A, LRPPRC and SND1 were
risk factors.
3.2. Construction of specific expression patterns of m6A readers
In order to explore the specific expression pattern of m6A readers, a
total of 328 PC samples were included (TCGA_PAAD, ICGC_AU_PAAD,
[86]GSE28735 and [87]GSE62452 datasets). Based on the expression of 14
m6A readers, we conducted an unsupervised consensus clustering
analysis. According to the heatmap of consensus clustering ([88]Fig.
2A), the optimal k = 2 could be determined. A total of 328 PC patients
were divided into two subgroups: Readercluster_A (n = 161, 49.09%) and
Readercluster_B (n = 167, 50.91%). To verify the reliability of the
grouping, PCA analysis was performed on the expression profile of 14
m6A readers. As shown in [89]Fig. 2B, in the PCA diagram, the samples
classified as Readercluster_A (red) and Readercluster_B (blue) have
high discrimination, indicating that the grouping effect is robust. In
addition, to explore the effect of different m6A reader gene expression
patterns on the prognosis of PC patients, a survival analysis was
performed. The samples in the Readercluster_A group were significantly
lower than those of the Readercluster_B group on OS (p < 0.001),
suggesting that the Readercluster_A expression pattern may lead to poor
prognosis in PC patients ([90]Fig. 2C).
Fig. 2.
[91]Fig. 2
[92]Open in a new tab
Construction of specific expression patterns of m6A readers. (A) Heat
map based on the expression matrix of 14 m6A readers, with the optimal
grouping k value of 2. (B) PCA algorithm diagram shows that
Readercluster_A and Readercluster_B groups have better classification
effect. (C) Survival curve showed that the prognosis of patients in
Readercluster_A group was poor. (D) Heat map of GSVA score based on
KEGG background gene set. (E) ssGSEA analysis was performed based on
the immune background gene set to judge the infiltration abundance of
immune cells.
To further detect the differences in biological behaviors among
different m6A reader expression patterns, we performed GSVA analysis
using the KEGG pathway as the background gene set. The expression
pattern of the Readercluster_A subtype was mainly enriched in oncogenic
pathways: Notch pathway, p53 pathway and pancreatic cancer pathway,
while the expression pattern of Readercluster_B subtype was mainly
enriched in metabolic pathways: fatty acid metabolism pathway, alanine
metabolism pathway and tryptophan metabolism pathway ([93]Fig. 2D).
These results provide a direction for further research on the oncogenic
mechanism downstream of the m6A readers.
We also preliminarily explored the difference in tumor immune
microenvironment (TIME) between the specific expression patterns of the
two m6A readers. ssGSEA analysis was performed using immune gene sets
for the specific expression patterns of the two m6A readers. On the
whole, the abundance of immune cell infiltration of the Readercluster_B
expression pattern was significantly higher than that of
Readercluster_A. From the perspective of the specific degree of cell
infiltration, the Readercluster_B expression pattern has abundant
innate immune cell infiltration, such as eosinophils, MDSC and
dendritic cells ([94]Fig. 2E). These results provided a basis for the
better prognosis of the Readercluster_B expression pattern.
Next, to further explore the characteristics of the expression pattern
of these m6A readers in different clinical features and biological
behaviors, 141 PC samples in the TCGA_PAAD dataset with the most
comprehensive clinical information were selected as the only study
subjects for all subsequent analyses. Firstly, we used the expression
matrix of 14 m6A readers in TCGA_PAAD dataset as the input object for
consensus clustering analysis again. 141 PC patients were also divided
into Readercluster_A (n = 77, 54.61%) and Readercluster_B (n = 64,
45.39%) subgroups (Figure S10A-C).
For checking the relationship between different m6A reader gene
expression patterns and PC molecular subtypes, we first obtained the
characteristic gene sets of three classical molecular classification
(Moffitt classification, Collisson classification and Bailey
classification) in the reference literature [[95]4,[96]20,[97]21]. Then
consensus clustering analysis was performed on the expression matrix of
these characteristic gene sets in TCGA_PAAD. The characteristic gene
sets of the three molecular types were well clustered in the TCGA_PAAD
dataset, so as to obtain the data of the three molecular
classifications in the TCGA_PAAD dataset: (1) Moffitt classification:
Classical (n = 66, 46.81%), Basel (n = 75, 53.19%); (2) Collisson
classification: Classical (n = 36, 25.53%), Exocrine (n = 66, 46.81%)
and QM-PDA (n = 39, 27.66%). (3) Bailey classification: squamous
(n = 44, 31.21%), progenitor (n = 36, 25.53%), immunogenic (n = 21,
14.89%) and Aberrantly Differentiated Endocrine Exocrine (ADEX)
(n = 40, 28.37%) (Figure S10D–F). Finally, visual analyses were
performed by adding column annotations (clinical information) to
expression heatmaps. 84.1% (37/44) of the squamous type samples (with
the worst prognosis) were found in the samples of the Readercluster_A
group, and 85.71% (66/77) of the samples of the Readercluster_A group
had mutations in the m6A readers (Figure S10G). These results may
confirm that Readercluster_A was significantly associated with
molecular subtypes with poor prognosis. In addition, among the 14 m6A
readers, IGF2BP2 and IGF2BP3 were significantly differentially
expressed between Readercluster_A group and Readercluster_B group
(IGF2BP2, p < 0.01; IGF2BP3, p < 0.01), indicating that IGF2BP2 and
IGF2BP3 may be key molecules with different expression patterns of m6A
readers.
3.3. Generation of m6A readers phenotype-related expression patterns and
screening of key gene
We found that the gene expression patterns of Readercluster_A and
Readercluster_B in PC were significantly different, and affected tumor
progression, metabolism and other pathways. To further explore and
quantify the reasons for the differences in gene expression patterns of
different m6A readers, we constructed a phenotype-related scoring model
for m6A readers.
Firstly, in the TCGA_PAAD cohort, we performed differential analysis on
the gene expression patterns of Readercluster_A and Readercluster_B in
PC, and screened 1468 DEGs related to the phenotype of m6A readers.
Univariate Cox regression analysis was performed on 1468 differentially
expressed genes to screen out 140 differentially expressed genes that
were significantly associated with prognosis. To explore the specific
gene expression patterns associated with different m6A reader
phenotypes, we performed consensus cluster analysis on 140 genes
associated with both the phenotype and prognosis of m6A readers. We
obtained unique gene expression patterns related to the phenotype of
two groups of m6A readers. They were defined as Reader. gene.cluster_A
(n = 72, 51.06%) and Reader. gene.Cluster_B (n = 69, 48.94%),
respectively ([98]Fig. 3A). Survival analysis showed that the
expression pattern of Reader. gene.cluster_A had poor prognosis
(p = 0.004) ([99]Fig. 3B). So far, we found that the expression pattern
of Reader. gene.cluster was highly consistent with that of
Readercluster: (1) Among the samples classified as Reader.
gene.cluster_A group, 79.17% of the samples were also classified as
Readercluster_A group; Among the samples classified as Reader.
gene.Cluster_B group, 71.01% of the samples were also classified as
Readercluster_B group; (2) The survival time of PC patients classified
as Reader. gene.cluster_A group and Readercluster_A group was
significantly lower than that of patients classified as Reader.
gene.cluster_B group and Readercluster_B group; (3) IGF2BP2 and IGF2BP3
were significantly differentially expressed in Reader. gene.cluster and
Readercluster. These results indicated that there were indeed two
specific modification patterns related to m6A readers in PC, and
IGF2BP2 and IGF2BP3 may be the key regulatory genes.
Fig. 3.
[100]Fig. 3
[101]Open in a new tab
Ceneration of m6A readers phenotype-related expression patterns and
screening of key gene. (A) Heat map of concordance cluster analysis
based on 140 genes related to both m6A reader phenotype and prognosis
showed that the optimal grouping k value was 2; (B) Survival curve
showed that the Reader. gene.cluster_A expression pattern had a poor
prognosis. (C) Survival curve showed that the survival rate of patients
decreased significantly with the decrease of readerscore; (D) The
correlation impact map of the three subtypes showed that the expression
patterns of Readercluster type, Reader. gene.Cluster type and
Readerscore type were highly consistent. (E) PCA plots of expression
patterns associated with two m6A reader phenotypes. The results showed
that PC1+PC2 could explain 88.14% of the data variation, and the
variable IGF2BP2 had the highest contribution.
To further quantify the expression patterns related to the phenotype of
m6A readers, a scoring system was constructed by PCA method. The best
cutoff value (0.74) was used to divide into two subgroups:
Readerscore_high (n = 58, 41.13%) and Readerscore_low (n = 83, 58.87%).
As shown in [102]Fig. 3C, it can be found from the survival curve that
with the reduction of Readerscore, the survival rate of patients
significantly decreased. To explore the connection between these three
subgroups mentioned above, we used the alluvial plot to more
intuitively show the connection between sample attributes. The lower
the Readerscore of PC patients, it is usually classified as
Readercluster_A and Reader. gene.cluster_A group, and its survival time
is also worse ([103]Fig. 3D). These results confirmed that our
constructed Readerscore system can effectively quantify expression
patterns related to m6A readers’ phenotypes.
Next, we performed contribution analysis on the expression data which
used to construct the model to explore the core genes between the two
different expression patterns. As shown in [104]Fig. 3E, in this PCA
plot, the explanation rate of principal component 1 (PC1) to the
overall difference was 65.3%, the explanation rate of principal
component 2 (PC2) to the overall difference was 23.1%, and the
explanation rate of PC1+PC2 to the data difference reached 88.4%,
indicating that the PCA had a good dimensionality reduction effect.
Secondly, IGF2BP2 is the only selected m6A readers among the top 100
genes with the highest contribution to PC1, suggesting that IGF2BP2 can
effectively explain the variation of the overall data, which further
suggests that IGF2BP2 was the key gene leading to the change of
expression pattern related to m6A readers’ phenotypes.
3.4. Functional annotation of IGF2BP2-mediated modification patterns in PC
To further investigate the potential biological behavior of
IGF2BP2-mediated patterns, the “clusterProfiler” R package was utilized
to screen the differential genes in different IGF2BP2 expression
patterns, we used the RNA-Seq data of PC in TCGA, ICGC and GEO
databases to determine the median expression value of IGF2BP2. 328 PC
samples were divided into IGF2BP2_high and IGF2BP2_low subgroups and
5044 differential genes were obtained, which were defined as the
differential genes related to IGF2BP2-mediated phenotype. The top 2000
differential genes with significant differences were selected for
heatmap (Figure S11A) to judge the efficiency of differential analysis.
Then, we conducted GO and KEGG pathway enrichment analysis on the above
5044 differentially expressed genes. IGFBBP2_high subgroup was mainly
enriched in cell cycle-related pathways (“mitotic cell cycle”, “cell
cycle”), tumor-related EMT and transcriptional dysregulation pathways
(“epithelial cell differentiation”, “Regulation of transcription by RNA
polymera”, “transcriptional misregulation in cancer”; The IGFBBP2_low
subgroup was mainly enriched in immune effector process (“immune
effector process”, “immune response”, “regulation of immune system
process”, “primary immunodeficiency”) (Figure S11B–C). All these
results confirmed that the high expression of IGF2BP2 in PC patients
may promote the malignant biological phenotype of PC by suppressing the
immune system.
3.5. Development of poor prognostic signatures based on IGF2BP2-mediated
modification patterns via machine learning algorithms
Based on the 5044 differential genes associated with the IGF2BP2
phenotype obtained by differential analysis, we explored prognostic
gene markers for PC. 328 samples in TCGA, ICGC and GEO databases were
divided into training set (n = 230) and testing set (n = 98) according
to the ratio of 7:3. For the samples in the training set, we first
performed univariate Cox regression analysis to obtain 889 genes
associated with prognosis. The Bootstrapping algorithm was used to
obtain 181 genes that were significantly associated with prognosis in
900 sampling sessions. Then, RSF analysis based on the minimum depth
(MD-RSF) method was used to determine the final prognostic gene
markers. In the RSF algorithm with 1000 iterations ([105]Fig. 4A–B).
The 13 gene sets (FNDC3B, L1CAM, PLXNA1, HMGA2, FAM110B, FAM83A,
COX7A1, PMAIP1, KIF20B, SPDL1, SNCG, TGM2 and MUC16) included when
C_index reached the maximum (0.7128982) were selected as poor
prognostic signatures, defined as PPS associated with IGF2BP2
phenotype. Finally, the multivariate Cox regression coefficients of the
above 13 PPS were calculated and the PPS scoring model was constructed.
The specific PPS scoring formula was as follows:
[MATH: PPSsocre=Expr(FNDC3B)*0.16009993+Expr(L1CAM)*0.28052790+Expr(PLXNA1)*0.22958423+Expr(HMGA2)*0.14046140−Expr(FAM110B)*0.18477060+Expr(FAM83A)*0.04911509−Expr(COX7A1)*0.23187426−Expr(PMAIP1)*0.08461359+Expr(KIF20B)*0.16976500−Expr(SPDL1)*0.09529266+Expr(SNCG)*0.06892878−Expr(TGM2)*0.01766628+Expr(MUC16)*0.04981059 :MATH]
Fig. 4.
[106]Fig. 4
[107]Open in a new tab
Development of poor prognostic signatures based on IGF2BP2-mediated
modification patterns via machine learning algorithms. (A) C-index
change curve of RSF algorithm after 1000 iterations; (B) curve of
out-of-pocket sample error rate of RSF algorithm; According to the
median value of PPS score, the survival curve was drawn in the training
set (C), testing set (D), TCGA_PAAD dataset (G), and ICGC_AU_PAAD
dataset (H), the results showed that the higher the PPS score, the
worse the prognosis of PDAC patients. The area under the 1-year, 2-year
and 3-year survival curve predicted by PPS model in the training set
(E), testing set (F), TCGA_PAAD dataset (I) and ICGC_AU_PAAD dataset
(J). (K) Comparison of prediction performance between PPS model and
other models in the training set; (L) Comparison of prediction
performance between PPS model and other models in the testing set.
To evaluate the prognostic value of the model, PC samples were divided
into two subgroups according to the median PPS score, namely PPS_high
and PPS_low. Then, survival analysis was performed on the training set,
testing set, TCGA_PAAD dataset and ICGC_AU_PAAD dataset, and survival
curves were drawn. In the four independent data sets mentioned above,
higher PPS score was positively correlated with shorter overall
survival (training set, p < 0.0001; testing set, P = 0.031; TCGA_PAAD,
p = 0.024; ICGC_AU_PAAD,p < 0.0001) ([108]Fig. 4C, D, G, H).
In order to further verify the prediction efficiency of this model, the
receiver operating characteristic curve (ROC) was selected for visual
analysis. In the training and testing sets, the PPS scoring model was
effective for patients in 1 year (training set, AUC = 0.727; testing
set, AUC = 0.663; TCGA_PAAD, AUC = 0.671; ICGC_AU_PAAD, AUC = 0.700), 2
years (training set, AUC = 0.802; testing set, AUC = 0.703; TCGA_PAAD,
AUC = 0.710; ICGC_AU_PAAD, AUC = 0.709) and 3 years (training set,
AUC = 0.805; testing set, AUC = 0.785; TCGA_PAAD, AUC = 0.660;
ICGC_AU_PAAD, AUC = 0.710) had good efficacy in predicting survival
([109]Fig. 4E, F, I, J). In addition, it has been reported that several
scholars have also constructed prognostic models for PC patients, such
as Zheng [[110]22], Zhou [[111]23] and Yuan [[112]24]. In order to
verify whether PPS score model has better predictive power compared
with the prognostic model constructed by the above three teams, ROC
analysis was also performed. PPS scoring model in the training set
(PPS: 0.778 vs Zheng: 0.567, Zhou: 0.687, Yuan, 0.554) and testing set
(PPS: 0.717 vs Zheng: 0.662, Zhou: The average AUC values of 0.651,
Yuan, 0.584) were higher than those of the other three prognostic
models ([113]Fig. 4K-L). The above results fully verify the robustness
and predictive effectiveness of our PPS scoring model.
Finally, in order to explore the relationship between 13 PPS genes and
IGF2BP2, we conducted a correlation analysis. As shown in [114]Fig.
5A–B, the expression of IGF2BP2 was highly correlated with PPS score
(R = 0.64; p = 2.2E-16) and showed a significant correlation with the
expression of 13 PPS genes. In addition, we also used the “GOSemSim” R
package to analyze the enrichment pathway similarity of 14 genes
(IGF2BP2 and 13 PPS genes) and the functional similarity distribution
map among these genes is shown in [115]Fig. 5C. These results indicated
that IGF2BP2 is not only highly correlated with the PPS gene at the
transcriptome level, but also has high functional similarity with the
PPS gene.
Fig. 5.
[116]Fig. 5
[117]Open in a new tab
Exploration of the PPS-related molecular subtype characteristics. (A)
Correlation map between PPS score and IGF2BP2 expression; (B)
correlation map between 13 PPS genes and IGF2BP2 expression; (C) Box
plot of enrichment similarity analysis between 13 PPS genes and IGF2BP2
pathway. Correlation analysis between PPS score and Moffitt
classification (D), Collisson classification (E), Bailey classification
(F), m6A mutation status (G), IGF2BP2 copy number variation event (H).
3.6. Exploration of the PPS-related molecular subtype characteristics,
biological processes and drug targets
At present, the classical clinical molecular subtypes for PC include
Moffitt subtype, Collisson subtype and Bailey subtype. To explore the
correlation between PPS score and molecular subtypes in PC, we
conducted a correlation analysis between them. The Basel_like subtype
in the Moffitt classification, the QM-PDA subtype in the Collisson
classification, and the Squamous subtype in the Bailey classification
had higher PPS scores, which have been shown to have a poor prognosis
([118]Fig. 5D–F). In addition, we also noted that patients with higher
PPS scores had a higher frequency of m6A gene mutation events and
IGF2BP2 copy number amplification events ([119]Fig. 5G–H). All these
results indicated that PC patients with higher PPS scores have worse
prognosis, which is consistent with the conclusion confirmed in the
previous part.
To investigate which biological processes contribute to the malignant
phenotype in patients with high PPS scores, Pathifier analysis and GSEA
analysis based on transcriptomics were performed. Firstly, the
expression profiles of PPS_high and PPS_low subgroups were used for
Pathifier analysis to obtain the PDS score of each sample, and the
correlation between the PDS score and PPS score was calculated. If the
correlation coefficient of a pathway (biological process) is positive,
it can be considered that this biological process may be responsible
for the poor prognosis of patients with high PPS score. Cell cycle and
proliferation-related pathways (“mitotic spindle”, “G2M checkpoint”,
“E2F targets”) had the highest correlation with poor prognosis
([120]Fig. 6A). Next, to further verify the reliability of the above
conclusions, we performed GSEA analysis using RNA-seq data from samples
in the training set. First, the correlation coefficient between the PPS
score of the sample and RNA-Seq data was calculated, and then the
correlation coefficient was used as the input for GSEA analysis. As
shown in [121]Fig. 6B, genes with positive correlation coefficients
were also enriched in cell cycle and proliferation-related pathways
(“mitotic spindle”, “G2M checkpoint”, “E2F targets” and “Myc targets
v1”). Through the above two methods, the obtained biological processes
related to the PPS score were highly consistent, so we can infer that
the dysregulation of the cell cycle and proliferation process may be
the reason for the poor prognosis of patients with high PPS.
Fig. 6.
[122]Fig. 6
[123]Open in a new tab
Identification of PPS-related biological processes and drug targets.
(A) The PDS score was calculated based on PPS model grouping, and the
heat map of PDS score was drawn. (B) GSEA analysis was performed based
on PPS model grouping to laterally verify the reliability of enriched
pathways in patients with high PPS scores. (C) Volcano plot and scatter
plot of correlation between PPS score and drug gene targets. (D)
Correlation between PPS score and CERES score of drug gene target
volcano plot and scatter plot.
In patients with high PPS scores, proteins that are highly positively
correlated with PPS scores may be potential therapeutic targets.
However, most proteins fail to exert their targeting effects due to the
lack of active sites to which small molecule compounds can bind.
Therefore, to explore potential drug therapeutic targets for patients
with high PPS scores and poor prognosis, we conducted the following
analysis: First, the correlation between protein expression profile
data and PPS score after drug intervention was analyzed. 448 targeted
proteins were obtained by the standard of R > 0.4 and p < 0.05.
Secondly, correlation analysis was performed between the CERES score in
the DepMap database and the PPS score of PC cell lines in CCLE. 87
targeted proteins were screened out according to the standard of
R < −0.3 and p < 0.05.7 potential protein targets (FOXM1, PRC1, CCNB1,
SLC16A3, CCNA2, GGCX and AURKA) were screened out ([124]Fig. 6C–D).
These results suggest that for patients with high PPS scores, effective
therapeutic results may be achieved by inhibiting the above 7 protein
targets.
3.7. Identification of potential therapeutic agents for high PPS score in PC
To screen potential therapeutic compounds related to the PPS model,
this part of the analysis was based on the gene expression data and
drug sensitivity data (AUC) of PC cell lines in the CTRP and PRISM
databases. The information of 437 compounds in the CTRP database and
1438 compounds in the PRISM database were selected for subsequent
analysis after removing the duplicated compound information in the two
databases and removing the PC cell lines with NA values over 20% or
originating from hematopoietic and lymphoid tissues ([125]Fig. 7A).
Fig. 7.
[126]Fig. 7
[127]Open in a new tab
Identification of candidate agents with higher drug sensitivity in
high-PPS score patients. (A) Compound information summary Venn diagram
from CTRP and PRISM database; (B) predictive value of drug
susceptibility between Kras mutation and non-mutation groups. (C) Six
potential compounds derived from CTRP were screened by Spearman
correlation analysis and drug sensitivity analysis; (D) Six potential
compounds derived from PRISM were identified by Spearman correlation
analysis and differential susceptibility analysis.
We first obtained the AUC values of each compound in PC samples using
the “pRRophetic” package with a built-in ridge regression model based
on the gene expression profile data of PC samples. To verify the
credibility of the AUC value, we conducted the following analysis:
Selumetinib is a PI3K pathway inhibitor used in the treatment of PC,
and a recent study found that PC patients with KRAS pathway mutations
had longer survival after treatment than PC patients without KRAS
pathway mutations [[128]25]. Therefore, according to the mutation
status of the KRAS pathway, PC patients were divided into KRAS-altered
and KRAS-unaltered subgroups. Wilcoxon rank-sum test was used to
compare the sensitivity of the two subgroups to Selumetinib. As shown
in [129]Fig. 7B, the AUC of PC patients in the KRAS-altered group was
significantly decreased (p = 0.0056), which was consistent with the
clinical findings of Selumetinib above. The above results further
confirmed the reliability of our predicted AUC. For patients with high
PPS scores, potential compounds were screened from CTRP database and
PRISM database. 6 compounds from the CTRP database (Austocystin D,
Tipifarnib, Merck60, Canertinib, BRD-K92856060 and Vincristine) and 6
compounds from PRISM database (Brigatininb, Cephalomannine, YM-976,
Triciribine, Tanespimycin and Vemurafenib) were obtained ([130]Fig.
7C–D). The above 12 compounds had lower predictive AUC values in the
samples with higher PPS scores, and their predictive AUC values were
significantly negatively correlated with PPS scores.
Although the sensitivity of these 12 compounds to patients with high
PPS scores was high, the analysis of these 12 compounds alone does not
adequately indicate the therapeutic potential of these 12 compounds in
patients with PC. Therefore, as shown in [131]Fig. 8, we further
conducted a multi-dimensional study. First, we used CMap analysis to
explore compounds whose gene expression pattern was contrary to the
specific expression pattern of PC (namely, gene expression was
increased in PC tissues, but the expression was down-regulated after
the intervention of some compounds). The CMap scores of three compounds
(tipirifanib, vemurafenib and YM-976) were all less than −90. These
compounds may have potential therapeutic effects on PC. Second, the
expression levels (including mRNA and protein levels) of the target of
the candidate compounds were analyzed differently between PC tissues
and normal tissues. The larger the difference, the greater the
potential of the candidate compounds to treat PC. Third, we conducted a
comprehensive literature search in the PubMed database
([132]https://www.ncbi.nlm.nih/), in order to search for candidate
compounds for the treatment of PC experimental and clinical evidence.
Taken together, there is strong evidence that both tipifarnib and
vemurafenib have the best therapeutic potential for patients with high
PPS scores.
Fig. 8.
[133]Fig. 8
[134]Open in a new tab
Multiangle validation was performed to identify the most likely
potential therapeutic compounds for patients with high PPS scores.
4. Discussion
More and more evidence has shown that m6A modification plays an
indispensable role in inflammation, innate immunity and tumor
regulation through the interaction of a variety of m6A regulatory
factors [[135]26]. Since most of the current studies focus on a single
regulator, there is no comprehensive and in-depth understanding of the
overall characteristics of malignant biological behavior of PC mediated
by the comprehensive action of multiple m6A readers. To this end, based
on the 14 m6A readers, we found two distinct expression patterns of m6A
readers, and determined that the two expression patterns had
significant heterogeneity in pathway enrichment, tumor immune
microenvironment, and clinical classification. Finally, through
contribution analysis of the constructed scoring model, IGF2BP2 was
identified as the core gene affecting the related expression pattern of
m6A readers. IGF2BP2 is an RNA-binding protein that regulates multiple
biological processes. Previously, IGF2BP2 was thought to be a type 2
diabetes (T2D)-associated gene [[136]27]. Indeed IGF2BP2 modulates
cellular metabolism in human metabolic diseases such as diabetes,
obesity and fatty liver through post-transcriptional regulation of
numerous genes in multiple cell types. Emerging evidence shows that
IGF2BP2 is an m6A reader that participates in the development and
progression of PC by communicating with different RNAs such as miRNAs,
mRNAs and lncRNAs [[137]28]. Liu Y found that IGF2BP2 and miR-671-3p
could interact with lncRNA-PACERR, inducing pro-tumor macrophages in PC
[[138]17]. Also, IGF2BP2 and lncRNA-DANCR work together to promote
cancer stemness-like properties and pancreatic cancer pathogenesis
[[139]29].
In the past decade, the molecular biology of PC has been greatly
enhanced by high-throughput sequencing analysis of large samples, which
allows researchers to detect key events in the development of PC. With
the recognition of PC-specific mismatch repair defects and homologous
recombination defects, germline variants that increase the risk of PC
have been identified, and these defects are conferring sensitivity to
PC targeted therapy, which marks that the treatment strategy for PC has
officially entered the era of precision medicine. In our study, we have
confirmed that IGF2BP2 is a marker gene with certain molecular
characteristics in PC, and higher expression of IGF2BP2 is closely
related to worse prognosis. Considering the unique clinical features of
PC patients with high IGF2BP2 expression, it is necessary to develop
targeted treatment strategies for these patients. Therefore, an
effective prognostic prediction model, the PPS score model, was
developed based on the different expression patterns of IGF2BP2 in PC.
Moreover, compared with other prognostic models, the PPS model based on
different IGF2BP2 expression patterns has more significant and better
predictive power in terms of prognosis. These results enable us to use
precise prognostic prediction methods for patients with similar
biological patterns to achieve a more reliable prognostic treatment
effect.
PC is highly heterogeneous among individuals, and no study has found a
treatment that is suitable for all patients with PC. In addition, due
to the lack of corresponding biomarkers, patients with advanced PC have
not achieved satisfactory therapeutic effects. Therefore, it is of
great interest to find tailored treatment strategies for specific
populations. The PPS model we constructed not only has good prognostic
efficacy, but also can be used as a biomarker to guide targeted
therapy. In this study, we identified seven potential therapeutic
targets (FOXM1, PRC1, CCNB1, SLC16A3, CCNA2, GGCX, and AURKA) and two
agents (tipirafenib and verorafenib) for PC patients with high PPS
scores.
The protein encoded by FOXM1 belongs to the forkhead box family of
transcription factors, which is a transcription factor related to cell
proliferation and is overexpressed in a variety of human malignant
tumor cells [[140]30]. Studies have shown that silencing FOXM1 can
down-regulate the expression of ATX, thereby inhibiting the
proliferation, invasion and metastasis of pancreatic cancer cells
[[141]31]. PRC1 is a member of the polycomb inhibitory complex, and
inhibition of PRC1 can prevent immune escape and angiogenesis in double
negative breast cancer cells [[142]32]. The protein encoded by CCNB1 is
a cyclin, which is mainly involved in the regulation of cell mitosis.
Silencing CCNB1 can inhibit the proliferation of pancreatic cancer
cells and promote cell senescence by activating p53 signaling pathway
[[143]33]. CCNA2-encoded proteins also belong to the highly conserved
cyclin family, which binds to and activates cyclin-dependent kinase 2,
thereby promoting the transition from G1/S to G2/M. The up-regulation
of CCNA2 expression in KRAS mutant gastric cancer cell lines and
primary tumors leads to increased sensitivity to PLK1 inhibitors, and
it can predict the sensitivity of PLK1 inhibitors to the treatment of
advanced gastric cancer [[144]34]. SLC16A3 belongs to the solute
carrier family, and the depletion of its encoded protein can promote
the increase of reactive oxygen species and metabolism in pancreatic
cancer cells, thereby promoting cell death [[145]35]. The protein
encoded by AURKA is a cell cycle regulating kinase, and inhibition of
AURKA expression can reduce its phosphorylation level in pancreatic
cancer cells and promote cell necrosis [[146]36]. The above results
suggest that the seven targets identified so far play a specific role
in malignancies, especially PC, suggesting that the corresponding
therapeutic targets developed for PC patients with high-risk IGF2BP2
expression patterns are feasible.
Tipifarnib, a farnesyltransferase inhibitor, has been widely used as a
targeted therapy for the treatment of a variety of solid tumors.
Considering that fa acylation is a key step in Ras protein membrane
anchoring required for Ras activity, and KRAS mutations exist in
70%–90% of PC, inhibition of farnesyltransferase can inhibit KRAS gene
function, thereby inhibiting the progression of PC [[147]37]. The
efficacy of tipifarnib has been demonstrated in a phase III clinical
study in patients with advanced PC. The results showed that gemcitabine
combined with gemcitabine can enhance anti-PC efficacy and is more
tolerated by patients compared with gemcitabine alone [[148]38]. In
addition, tipifarnib combined with atorvastatin and celecoxib can
synergistically reduce the sphere formation ability of PC cells, and
can significantly inhibit cell proliferation and promote apoptosis of
sphere-forming cells [[149]39].
Vemurafenib is a low molecular weight compound used to inhibit the
mutated serine-threonine kinase BRAF by selectively binding to the
ATP-binding site of the BRAF V600E kinase, thereby inhibiting its
activity. vemurafenib has been recommended as a first-line treatment
for unresectable advanced BRAF V600E mutation-positive melanoma
[[150]40]. However, serine synthesis was significantly inhibited by
vemurafenib in PC cell lines. Under serine depletion conditions, the
death of PC cells treated with vemurafenib was significantly increased
[[151]41]. In a case report, after vemurafenib targeted therapy for a
PC patient with BRAF V600E mutation, the primary tumor, liver
metastases and lymphatic metastases of the patient were significantly
reduced, and the skin toxicity was weak [[152]42]. Unfortunately, no
study has shown vemurafenib to prolong survival in patients with PC.
This study provides new insights into improving the therapeutic effect
of vemurafenib by selecting patients with vemurafenib-sensitive PC,
thus providing new ideas for the precision treatment of PC.
The PPS scoring model constructed in our study specifically quantifies
the expression pattern of IGF2BP2 and is significantly better than
other methods in risk stratification and prediction of individualized
treatment. However, this study still has several limitations. First, to
obtain more reliable results with a larger sample size, we used a
machine learning algorithm to estimate the IGF2BP2 expression pattern
in a subset of patients. However, there may be slight differences
between the estimated IGF2BP2 expression pattern and the actual IGF2BP2
expression pattern in these patients. Second, the results of drug
target prediction and compound prediction cannot be verified against
each other, which reduces the power of the conclusions in this part.
Finally, all the conclusions of this study were based on public
databases, and further experimental and clinical validation are needed
to promote the clinical translation of the findings.
5. Conclusion
In conclusion, the current work developed a novel gene signature, PPS,
for prognostic prediction of IGF2BP2-mediated m6A modification patterns
in PC. The PPS model has important clinical significance in both low-
and high-PPS patients. For patients with high PPS scores, based on
multiple drug susceptibility and target databases, we have identified
seven potential therapeutic targets and two compounds from multiple
perspectives and all-around aspects, which opens a new direction for
the application of precision medicine in PC. Overall, this study has
not only provided new insights into personalized prognostication
approaches but also thrown light on integrating tailored prognosis
prediction with precision therapy.
Ethical statement
Not applicable.
Funding
This study is supported by the National Natural Science Foundation of
China (No. 82000614; No. 81873589; No. 82203494), Natural Science
Foundation of Hunan Province (No. 2020JJ5876; No. 2022JJ40741); Science
and Technology Project of Changsha (No. kq2004146; No. kq2004143);
Project of the Health Commission of Hunan Province (No. 202204015341;
No. 202104011922); The Wisdom Accumulation and Talent Cultivation
Project of The Third Xiangya Hospital of Central South University (No.
YX202203) and The Shanxi Provincial Basic Research Program (Free
Exploration) Youth Science Research Project. (No. 202303021212363 to
L.Z.).
Data availability statement
All data used in this study are publicly available. The materials that
support the conclusion of this article have been included within the
Materials and Methods section “Data acquisition and preprocessing”.
CRediT authorship contribution statement
Dongjie Chen: Writing – review & editing, Writing – original draft,
Methodology, Formal analysis, Data curation, Conceptualization. Longjun
Zang: Writing – review & editing, Writing – original draft,
Methodology, Investigation. Yanling Zhou: Writing – review & editing,
Writing – original draft. Yongchao Yang: Resources, Project
administration. Xianlin Zhang: Resources. Zheng Li: Resources. Yufeng
Shu: Supervision, Resources. Wenzhe Gao: Visualization, Validation,
Supervision. Hongwei Zhu: Visualization, Validation, Supervision. Xiao
Yu: Visualization, Validation, Supervision.
Declaration of competing interest
The authors declare that they have no known competing financial
interests or personal relationships that could have appeared to
influence the work reported in this paper.
Acknowledgments