Abstract
Background
Currently, there is lack of marker to accurately assess the prognosis
of patients diagnosed with pancreatic ductal adenocarcinoma (PDAC).
This study aims to establish a hypoxia-related risk scoring model that
can effectively predict the prognosis and chemotherapy outcomes of PDAC
patients.
Methods
Using unsupervised consensus clustering algorithms, we comprehensively
analyzed The Cancer Genome Atlas (TCGA) data to identify two distinct
hypoxia clusters and used the weighted gene co-expression network
analysis (WGCNA) to examine gene sets significantly associated with
these hypoxia clusters. Then univariate Cox regression, the least
absolute shrinkage and selection operator (LASSO) Cox regression and
multivariate Cox regression were used to construct a signature and its
efficacy was evaluated using the International Cancer Genome Consortium
(ICGC) PDAC cohort. Further, the correlation between the risk scores
obtained from the signature and carious clinical, pathological,
immunophenotype, and immunoinfiltration factors as well as the
differences in immunotherapy potential and response to common
chemotherapy drugs between high-risk and low-risk groups were
evaluated.
Results
From a total of 8 significantly related modules and 4423 genes, 5
hypoxia-related signature genes were identified to construct a risk
model. Further analysis revealed that the overall survival rate (OS) of
patients in the low-risk group was significantly higher than the
high-risk group. Univariate and multivariate Cox regression analysis
showed that the risk scoring signature was an independent factor for
prognosis prediction. Analysis of immunocyte infiltration and
immunophenotype showed that the immune score and the anticancer immune
response in the high-risk were significantly lower than that in the
low-risk group.
Conclusion
The constructed hypoxia-associated prognostic signature demonstrated
could be used as a potential risk classifier for PDAC.
Supplementary Information
The online version contains supplementary material available at
10.1186/s12957-023-03142-2.
Keywords: Gene signature, Prognosis, Hypoxia, Pancreatic ductal
adenocarcinoma (PDAC), ARID5A
Introduction
Pancreatic ductal adenocarcinoma (PDAC) is a highly lethal malignancy
affecting the devastating digestive tract, characterized by high
mortality [[31]1]. Due to the absence of early symptoms and reliable
early detection methods, most PDAC cases are diagnosed at an advanced
stage, resulting in a very poor 5-year overall survival rate (< 10%)
[[32]2, [33]3]. Despite advancements in therapies such as
chemoradiotherapy and immunotherapy, there has been limited improvement
in the survival rates of PDAC patients [[34]4, [35]5]. Currently,
surgical excision remains the only radical treatment option, with only
20% percent of patients eligible for curative surgery [[36]6].
An early diagnosis of cancer and early intervention for “high-risk”
patients are critical to improving the clinical outcomes of cancer
patients. Currently, predictive models based on cumulative clinical and
pathological parameters, such as the American Joint Committee on Cancer
(AJCC) grading system, which is based on the tumor-node-metastasis
(TNM) classification, are commonly used to assess the prognosis of
cancer patients [[37]7]. However, this system remains unsuitable for
the individualized treatment of PDAC patients because it is difficult
to reflect their molecular profile and lack the ability to adjust the
staging as the disease progresses [[38]8]. Furthermore, this
classification system may result potential undertreatment or
overtreatment.
Surgical resection combined with chemoradiotherapy remains the most
widely utilized treatment option for PDAC, despite only providing
modest improvements in survival rates [[39]9–[40]11]. In recently
years, immunotherapy targeting immune checkpoint inhibitors (ICIs) has
emerged as a promising treatment option for various cancers [[41]12].
Biomarkers such as programmed death-ligand 1 (PD-L1) expression, tumor
mutation burden (TMB), neoantigen load (NAL), and microsatellite
instability-high (MSI-H) have been identified as potential biomarkers
for the effectiveness of ICIs [[42]13–[43]15]. While immunotherapy
alone has shown limited efficacy in pancreatic cancer, the combination
of ICIs with chemoradiotherapy has demonstrated promising results
[[44]16]. Consequently, the identification of related reliable
prognostic and therapeutic biomarkers is crucial for precise and
individualized precise treatments for PDAC patients.
Hypoxia is a major feature of PDAC [[45]17]. Although tumor cells can
deplete oxygen levels in the microenvironment through rapid cell
proliferation and altered metabolism, they tend to exhibit a remarkable
tolerance to hypoxia, contributing to tumor progression and treatment
resistance [[46]18–[47]20]. PDAC is typically characterized by a median
oxygen level of less than 0.7%, along with the activation of
angiogenesis and glycolysis pathways within the hypoxic
microenvironment [[48]21, [49]22]. Numerous studies have indicated that
tumor hypoxia is associated with anti-apoptosis, tumor recurrence,
migration, immunosuppression, immune evasion, and resistance to
chemoradiotherapy [[50]19, [51]22–[52]25], further confirming that
hypoxic status is significantly correlated with the prognosis of PDAC.
In this study, we utilized the TCGA and ICGC databases to develop and
validate a prognostic signature for PDAC based on hypoxia-related gene
sets and further assessed the predictive ability of the model for both
immunotherapy and chemotherapy (Fig. [53]1). The results showed that
our developed signature may serve as a valuable supplement to the
existing clinical staging system, enabling early detection and
intervention for individuals at high-risk patients and improving their’
prognosis.
Fig. 1.
[54]Fig. 1
[55]Open in a new tab
Flowchart for the development and validation of a PDAC prognostic
signature based on hypoxia
Materials and methods
Data collection and preparation
Transcription expression profiles of PDAC patients in TCGA and ICGC
databases were obtained from the XENA platform
([56]https://xenabrowser.net/datapages, 2022) [[57]26]. The expression
values were quantified using the fragments per kilobase of exon per
million reads mapped (FPKM) method and then transformed using
log[2](X + 1). Non tumor samples were excluded from the analysis.
Hypoxia hallmark genes (n = 200) were obtained from the Molecular
Signatures Database (MsigDB,
[58]https://www.gsea-msigdb.org/gsea/msigdb/). Somatic mutation data
for PDAC patients from the TCGA dataset were acquired from the Genomic
Data Commons (GDC, [59]https://portal.gdc.cancer.gov/, 2022). The
clinical information for the TCGA dataset is provided in Table [60]1.
All analyses were performed using the R language (version 4.2.2).
Table 1.
Clinical information of included PDAC patients in TCGA
Characteristics Level Samples (percentage)
Total 141
Status (%) Alive 60 (42.6)
Dead 81 (57.4)
Age (%) Old 70 (49.6)
Young 71 (50.4)
Gender (%) Female 67 (47.5)
Male 74 (52.5)
Tumor_stage (%) I 13 (9.2)
II 121 (85.8)
III/IV 6 (4.3)
NA 1 (0.7)
N.stage (%) N0 37 (26.2)
N1 103 (73.0)
NA 1 (0.7)
T.stage (%) T1/T2 21 (14.9)
T3/T4 119 (84.4)
NA 1 (0.7)
M.stage (%) M0 65 (46.1)
M1 3 (2.1)
MX 73 (51.8)
Tumor site (%) Head 107 (75.9)
Other 34 (24.1)
Malignancy (%) Malignancy 9 (6.4)
Not 132 (93.6)
Grade (%) G1/G2 100 (70.9)
G3/G4 40 (28.4)
NA 1 (0.7)
[61]Open in a new tab
Unsupervised clustering of hypoxia genes
For the clustering analysis, we employed the R package
ConsensusClusterPlus (version 1.60.0) to cluster the expression
profiles of PDAC patients in TCGA based on the hypoxia-related gene set
obtained from MSigDB. The clustering process involved performing 1000
iteration, each utilizing 80% of the data. The optimal number of
clusters was determined by evaluating the Consensus–Matrix plot and the
relative change in the area under the cumulative distribution function
(CDF) curves. Ultimately, two clusters were selected to represent
different hypoxia statues. To assess the impact of these clusters on
patients’ survival, Kaplan–Meier (K-M) plots were generated using the
survival package (version 3.5–5). K-M plots were generated to compare
their overall survival (OS) of patients in cluster_1 and cluster_2.
Additionally, the R package progeny (version 1.18.0) was used to
calculate the activation scores of 14 typical pathways in the TCGA-PDAC
cohort [[62]27], and the Wilcoxon signed-rank test was employed to
assess differences in pathway activation between cluster_1 and
cluster_2 groups.
Weighted gene co-expression network analysis for hypoxia clustering
The R package WGCNA (version 1.72.1) was used to investigate
co-expression networks and identify gene sets associated with hypoxia
status in the TCGA-PDAC cohort [[63]28]. First, we calculated the
optimal soft threshold β that met the criteria for a scale-free
network, employed a one-step method to build the co-expression network,
and set the minimum number of genes within each module to 40 to ensure
robust modules. Additionally, module fusion was performed, setting the
height of dendrogram cutting for module merging at 0.1. Module
eigengenes (MEs) were generated by summarizing the first principal
component of each module’s gene expression. Person’s correlation
coefficient between MEs and the hypoxia status clusters were evaluated.
Modules with correlation coefficients greater than 0.5 were selected
for further analysis. The genes within these modules were considered
candidate genes for further analysis. Lastly, Kyoto Encyclopedia of
Genes and Genomes (KEGG) and Gene Ontology (GO) functional enrichment
analyses were performed on the candidate genes using the R package
clusterProfiler (version 3.16.1) [[64]29].
Identification of prognostic genes and construction risk signature
The TCGA-PDAC cohort was selected as the training set for constructing
the hypoxia-related risk model, while the ICGC cohort served as the
validation set, and jointed Cox regression analysis was performed to
construct the prognostic model. First, univariate Cox regression was
performed to identify genes significantly associated with OS from the
candidate gene set using R package Survival. Then, the least absolute
shrinkage and selection operator (LASSO) penalized Cox regression model
was applied to analyze the genes with p < 0.001 to reduce the risk of
overfitting using the glmnet package (version 4.1–4) [[65]30, [66]31].
Lastly, multivariate Cox regression and stepwise method were used to
identify independent genes for building the prognostic model and their
regression coefficients using the Survival package. The risk score of
the patients was calculated according to the normalized expression of
each gene and its coefficient and calculated using the following
formula:
[MATH: risk score(patient)=∑k=1
ncoef×exp :MATH]
where n represents all cases, exp indicates the expression level for
each risk gene, and the coef indicates their regression coefficients.
Then, the patients were stratified into a high- or low-risk groups
using the median risk score.
Evaluation and verification of the prognostic signature
The difference in OS between the two risk groups was estimated using
K-M curves and assessed using the log-rank test by Survival. Receiver
operating characteristic curves (ROC) based on the patients’ risk
scores were plotted suing the package timeROC (version 0.4) and ggplot2
(version 3.3.6). The concordance index (C index) was calculated to
evaluate the prediction accuracy of the prognostic signature by R
package survcomp (version 1.46.0) [[67]32]. Univariate and multivariate
cox regression analyses were performed on the TCGA-PDAC cohort to
verify the clinical predictive independence of the risk score model.
Differentially expression analysis between risk groups
The R package progeny was used to calculate the activation of 14
typical pathways in the TCGA-PDAC cohort [[68]27]. Wilcoxon signed-rank
test was used to detect differences in activation score between the
high- and low-risk groups. To explore the differentially expressed
genes (DEGs) between the high- and low-risk groups in the TCGA-PDAC
cohort, the R package DEseq2 (version 1.36.0) was employed. DEGs were
identified based on criteria as |Log Fold change (FC)|> 1 and
p-value < 0.01 [[69]33]. Pathway analysis was performed using Metascape
platform ([70]http://metascape.org/gp/index.html#/main/step1) [[71]34].
The mutation landscape of patients in the TCGA-PDAC cohort was analyzed
using the R package maftools (version 2.12.0) [[72]35].
Estimation of immune cell infiltration fractions
To investigate the differences in the immune microenvironment
composition between the high- and low-risk groups in the TCGA-PADC
cohort, various methods were employed. Tumor purity and infiltrating
immune/stromal cell scores were predicted using the package estimate
(version 1.0.13). Single-sample gene set enrichment analysis (ssGSEA)
was performed to quantitatively measure immune cell infiltration using
the expression profiling matrix from the R package GSVA (version
1.36.3) [[73]36, [74]37]. As a supplement, all patients’ immune cells
proportion based on TIMER, CIBERSORT, XCELL, QUANTISEQ, MCPcounter,
EPIC, and CIBERSORT-ABS were downloaded from the TIMER2.0 dataset
([75]http://timer.comp-genomics.org/) [[76]38–[77]44]. Wilcoxon
signed-rank test was used to detect differences in immune cell
composition between the high- and low-risk groups. Spearman correlation
was then used to calculate the correlation between the risk values and
the immune cells that were significantly different after the Wilcoxon
test.
The activation of the cancer immune cycle, which reflects the antitumor
immune response and influences the fate of tumor cells, was obtained
for the TCGA-PDAC cohort from Tracking Tumor Immunophenotype database
(TIP, [78]http://biocc.hrbmu.edu.cn/TIP/index.jsp). The Wilcoxon
signed-rank test was employed to identify pathways that displayed
differential activation between the high- and low-risk groups [[79]45].
Immunotherapy and chemotherapy response predictions
Multiple perspectives were used to assess immunotherapy efficacy in the
different risk groups from the TCGA-PADC cohort. The immunophenotype
(IPS) of all patients, obtained from the Cancer Immunome Atlas database
(TCIA, [80]https://tcia.at/home), was calculated by analyzing the
expression genes associated with immunogenicity [[81]46]. The TIDE
([82]http://tide.dfci.harvard.edu/) online algorithm was used for
immune checkpoint blockade therapy response predictions [[83]47]. Tumor
mutation burden (TMB) and microsatellite instability (MSI) have shown
superior ability in predicting immunotherapy response in various
cancers [[84]48–[85]50]. TMB of TCGA-PDAC patients was calculated using
the package maftools [[86]35]. The neoantigens were downloaded from
TCIA [[87]46]. We downloaded the MSI prediction results of TCGA-PDAC
patients based on the mantis method [[88]51]. We also used the R
package PreMSIm (version 1.0) to predict the MSI for TCGA-PDAC patients
[[89]52]. Furthermore, we calculated Spearman correlations between the
expression levels of 20 common immune checkpoint genes and risk scores.
Additionally, the differences in expression levels of these between the
high- and low-risk groups were analyzed using the Wilcoxon signed-rank
test.
The half-maximal inhibitory concentration (IC50) values of common PDAC
chemotherapy drugs (paclitaxel, erlotinib, gemcitabine, sorafenib,
cisplatin, 5-fluorouracil) in each TCGA-PDAC sample were estimated
based on the transcription data by R package pRRophetic (version 0.5)
and Genomics of Drug Sensitivity in Cancer (GDSC) database
([90]https://www.cancerrxgene.org) [[91]53, [92]54].
Functional analysis of risk genes
We used the Wilcoxon signed-rank test to analyze the difference in the
expression of risk genes in the high- and low-risk groups. The ssGSEA
method was used to estimate the contents of 28 typical immune cells in
the TCGA-PADC cohort, and the Spearman correlation coefficients among
risk genes were calculated. The expression of risk genes among
different cell types was examined using The Human Protein Atlas (HPA,
[93]https://www.proteinatlas.org/) and The Tumor Single-Cell Hub
(TISCH, [94]http://tisch.comp-genomics.org/home/., the PAAD_GSE154778
dataset was selected) [[95]55, [96]56]. The risk genes protein
distribution information of cancerous and normal pancreatic tissues
(Genotype-Tissue Expression database) were obtained from the HPA.
Pan-cancer expression analysis of risk genes was performed via the Gene
Expression Profiling Interactive Analysis (GEPIA2,
[97]http://gepia2.cancer-pku.cn/#index, choice match with TCGA normal
and GTEx data) [[98]57]. Pan-cancer survival analysis results of risk
genes were obtained from HPA. We used the Gene Expression database of
Normal and Tumor tissues (GENT2, [99]http://gent2.appex.kr/gent2/) and
the differential expression analysis in Tumor, Normal, and Metastatic
tissues (TNMplot, [100]https://tnmplot.com/analysis/) to detect the
expression difference of risk genes in multiple pancreatic cancer
microarray dataset between tumor and normal samples [[101]58, [102]59].
In addition, the package LinkET (version 0.0.7.2) was used to detect
the association between risk genes and immune activation/suppressor
genes.
Results
Identification of genes associated with hypoxia in PDAC
After excluding samples with normal tissues and lacking survival
information, 141 and 92 cases were retrieved from the TCGA-PDAC and
ICGC-PDAC database. Using the expression matrix comprised of
hypoxia-related genes, we performed unsupervised cluster analysis on
the PDAC samples. The CDF curves revealed that the optimal number of
clusters was achieved when k = 2 (Fig. [103]2A, B). The K-M curve
demonstrated a significant difference in OS between the two clusters
(Fig. [104]2C). Moreover, the analysis of pathway activation indicated
a significant disparity in the activation degree of the hypoxia pathway
between the two clusters, suggesting a significant correlation between
our clustering and hypoxia (Fig. S[105]1).
Fig. 2.
[106]Fig. 2
[107]Open in a new tab
Identification of genes associated with hypoxia in PDAC based on WGCNA.
A, B The Consensus–Matrix plot and CDF curves were used to confirm the
optimal cluster number. C Kaplan–Meier curve of OS of PDAC patients in
cluster_1 and cluster_2 groups. D The optimal evaluating results about
soft threshold β. E Clustering dendrograms showing genes with similar
expression patterns were clustered into co-expression modules. F
Module–trait relationships revealing the correlations between each gene
module eigengene and hypoxia cluster and survival status. G Results of
GO pathway analysis of candidate gene set
During the WGCNA procedure, the optimal soft threshold β was set to 12
(Fig. [108]2D) to provide an appropriate suitable power value for the
construction of the coexpression network (Fig. [109]2E); 4423 genes (8
modules) that exhibited significantly association with the hypoxia
cluster (Fig. [110]2F, Fig. S[111]2). GO and KEGG pathway enrichment
analysis showed that these genes were mainly related to immune response
(Fig. [112]2G, Fig. S[113]3).
Construction of prognostic signature related to hypoxia
Based on the expression profiles of the 4423 hypoxia-related genes, 28
prognostic genes were identified by univariate Cox analysis
(Fig. [114]3A) and following LASSO Cox analysis, 11 genes were
identified after 1000 iterations (Fig. [115]3B, C, D). Finally,
multivariate Cox regression analysis was used to obtain a 5-gene
hypoxia-related prognostic signature (Fig. [116]3E), and the risk score
was estimated as follows:
[MATH: Riskscore=-0.677*exprofARID5A+-0.104*exprofIGLV7-46
+-2.3*exprofFAM19A2<
mo>+-1.925*exprofICOSLG+-1.086*exprofSPRN. :MATH]
Fig. 3.
[117]Fig. 3
[118]Open in a new tab
Construction of prognostic signature related to hypoxia. A Forest plot
of hazard ratios of 28 prognostic genes correlated with OS. B Tuning
parameter selection in the LASSO model. C LASSO coefficient profiles of
the 11 genes. D 11 genes coefficient after LASSO analysis. E Final 5
genes forest plot after multivariate Cox regression
Validation of prognostic signature related to hypoxia
After dividing all patients into high- and low-risk groups based on the
median of risk scores, the K-M survival curve demonstrated that the OS
in the high-risk group was significantly lower than in the low-risk
group in both the training set (P < 0.001, Fig. [119]4A) and validation
set (P = 0.0083, Fig. [120]4B). Moreover, a higher frequency of death
events was observed in the high-risk group compared to the low-risk
group (Fig. [121]4G, H). The ROC curve analysis in training and
validation sets indicated the promising prognostic value of the
signature for predicting PDAC OS (1, 2, 3 years AUC > 0.75 in training
sets, Fig. [122]4C. 1, 2 years AUC > 0.65 in validation sets,
Fig. [123]4D). Additionally, as the risk score increased, we observed
that the patients’ prognosis worsened in both sets (Fig. [124]4E, G for
training sets, Fig. [125]4F, H for validation sets). Further, the
heatmap showed that the expression of risk genes was higher in the
low-risk group (Fig. [126]4I for training sets, Fig. [127]4J for
validation sets).
Fig. 4.
Fig. 4
[128]Open in a new tab
Five genes hypoxia status-related signature evaluation and validation.
A, B Kaplan–Meier curve of OS of PDAC patients in high- and low-risk
groups. C, D The 1-, 2-, and 3-year AUC in ROC analysis. E–H Risk curve
for the investigated patient. I, J Heatmap for the expression levels of
5 risk genes. K, L The C-index of our signature (Hypoxia_wgcna), 3
published hypoxia-related prognostic signatures, and 4 published cell
death signatures
To evaluate the prediction accuracy of the prognosis signature, we
calculated the C-index for all published hypoxia-related prognostic
signatures in PDAC, whereby a C-index greater than 0.7 is considered
indicative of an accurate model. Additionally, we compared the
predictive performance of our signature with cell death signatures
(autophagy, cuproptosis, ferroptosis, and pyroptosis signatures)
[[129]60–[130]66]. The results showed that the C-index of our signature
(0.7286131 in training sets, 0.6163404 in validation sets) surpassed
that of other models in both the training and validation sets
(Fig. [131]4K, L); our signature had better predictive performance for
the estimating the prognosis of PDAC patients compared to other models.
Prognostic independence analysis of hypoxia signature
To verify whether our signature can be used as an independent and
effective prognostic indicator, we used the TCGA cohort to perform
univariate and multivariate Cox regression analysis based on common
clinical indicators, including age, gender, TNM stages, tumor site,
malignancy, tumor stage, and tumor grade. The 1-, 2-, and 3-year AUC
values of our signature were greater than those of other clinical
predictors (Fig. [132]5A, B, C). Univariate and multivariate Cox
regression results indicated that our signature was an independent
prognostic indicator (Fig. [133]5D, E). We also analyzed the
correlation between the risk score and different clinical factors, and
the result indicated that our signature was statistically correlated
with tumor grade (Fig. [134]6).
Fig. 5.
[135]Fig. 5
[136]Open in a new tab
Clinical predictive evaluation of risk signature based on training sets
(A–C) AUCs for risk score and clinical symptoms at 1, 2, and 3 years.
D, E Univariate and multivariate Cox regression analysis of risk score
and clinical feature
Fig. 6.
[137]Fig. 6
[138]Open in a new tab
Association between prognostic risk signature and clinical symptoms The
association between risk score and A age, B gender, C tumor stage, D N
stage, E T stage, F M stage, G grade, H tumor site, and I malignancy.
The number on the line is the p-value of Wilcoxon signed-rank sum test
Activation pathway and differentially expressed genes analysis between risk
groups
We conducted a comprehensive analysis to investigate the factors
contributing to the different prognosis of patients in the high- and
low-risk groups within the TCGA-PDAC dataset. Pathway activation
analysis revealed significant activation of pathways associated with
hypoxia tolerance, cell proliferation, promotion, angiogenesis
promotion, and cell metastasis in the high-risk group (Fig. [139]7A)
(such as Hypoxia, MARK, PI3K, and VEGF). Conversely, pathways related
to immune response and surveillance, tumor inhibition and apoptosis
induction, and trail were significantly activated in the low-risk group
(Fig. [140]7A) (such as Estrogen, JAK-STAT, NFκB, P53, TGFβ, TGFα);
2057 DEGs between the high- and low-risk groups, with 547 downregulated
and 1510 upregulated (Fig. [141]7B). Pathway enrichment analysis
demonstrated that these DEGs were significantly enriched in the
adaptive immune response pathway (Fig. [142]7C), suggesting potential
differences in immune activation and response between the high- and
low-risk groups. To visualize the common mutation frequencies of key
genes, we presented a waterfall map depicting the top 20 genes with the
highest mutation frequencies in the high- (Fig. [143]7D) and low-risk
(Fig. [144]7E) groups. Notable, KRAS, TP53, SAMD4, CDKN2A, and TTN
exhibited the highest mutation frequencies in both the high- and
low-risk groups.
Fig. 7.
Fig. 7
[145]Open in a new tab
Analysis of differences in pathway activation and gene expression
between risk groups. A Wilcoxon signed-rank test results of 14 typical
pathway activation scores between high- and low-risk groups. B volcano
plot of DEGs. C Pathway analysis results based on Metascape. D, E The
mutation landscape of patients in the TCGA-PDAC cohort (show the top 20
genes) about the high- and low-risk groups (*p < 0.05, **p < 0.01,
***p < 0.001, ****p < 0.0001)
Relationship between prognostic signature and tumor immune microenvironment
The proliferation and metastasis of malignant tumors are known closely
influenced by the infiltration of immune cells within the tumor
microenvironment. Given the enrichment analysis results suggesting
potential differences in the activation of immune pathways between the
high- and low-risk groups, we further explore the cancer immune
microenvironment of the TCGA-PDAC dataset using various approaches.
The analysis results obtained using the R package estimate indicated
that the immune score, stromal score, and estimate score were
significantly lower in the high-risk group (Fig. [146]8A, Fig. S[147]4
A, B). Additionally, the tumor purity score was significantly higher in
the high-risk group than that in the low-risk group (Fig. S[148]4 C).
The results of 28 kinds of immune cell infiltration degrees based on
ssGSEA showed that the low-risk group had a more abundant immune cell
composition. Specifically, immune cell types such as B cells, CD8 + T
cells, eosinophil, macrophage, and monocyte exhibited significant
differences between the high- and low-risk groups (Fig. [149]8B). To
provide additional insights, we extensively analyzed the relationship
between immune cell types and risk scores by integrating data from 7
mainstream immunoinformatics algorithms, and the results revealed that
multiple immune cell types were negatively associated with risk scores
(Fig. S[150]5). We assessed the activation of the cancer immunity
cycle, which reflects the degree of immune activation within the tumor
microenvironment and predicts the functional performance of immune
modulators and the chemokine system [[151]45, [152]67]. The TIP results
showed that overall immune activation was significantly higher in the
low-risk group (Fig. [153]8C). Specifically, in steps such as priming
and activation (step 3) and multiple cell recruiting factors (step 4),
including T cell recruiting, B cell recruiting, CD4 T cell recruiting,
dendritic cell recruiting, and macrophage recruiting, significant
activation was observed in the low-risk group (Fig. [154]8D). However,
there were no significant differences between the high- and low-risk
groups in steps involving the release of cancer cell antigens (step 1),
cancer antigen presentation (step 2), and subsequent steps related to
the recognition and killing of cancer cells (steps 6 and 7)
(Fig. [155]8D). The high activation of immune cell recruiting factors
observed in the low-risk group may lead to increased immune cell
infiltration and the maintenance of antitumor activity within the tumor
microenvironment, thereby benefiting patients.
Fig. 8.
[156]Fig. 8
[157]Open in a new tab
Analysis of tumor immune microenvironment. A Comparison of the immune
scores of high- and low-risk groups in the TCGA-PDAC cohort. B Analysis
of immune cell infiltration differences between the high- and low-risk
groups. C Overall immune activation differences between high- and
low-risk groups based on TIP. D Differences in the cancer immunity
cycle activities between high- and low-risk groups (*p < 0.05,
**p < 0.01, ***p < 0.001, ****p < 0.0001)
Analysis of the predictive ability of hypoxia related signature for
immunotherapy and chemotherapy
To evaluate the predictive ability of our signature for immunotherapy
efficacy, we analyzed its association with several
immunotherapy-related factors from multiple perspectives. IPS and TIDE
have been verified to quantify the patients’ response to immune
checkpoint inhibitors (ICIs). The results indicated that IPS,
IPS-CTLA4- AND PD1/PDL1/PDL2 blocker, IPS-CTLA4 blocker, and
IPS-PD1/PDL1/PDL2 blocker scores showed no significant association with
the risk score (Fig. [158]9A). However, the TIDE analysis revealed that
the TIDE score was significantly higher in the low risk-group compared
to the high-risk group (Fig. [159]9B). It is worth noting that all
patients had TIDE score below 0, suggesting that patients in the
low-risk group may not respond significantly better to immunotherapy
than those in the high-risk group. TMB, neoantigens, and MSI are common
predictors of immunotherapy efficacy [[160]14, [161]68, [162]69]. The
results showed that TMB and neoantigens had no significant correlation
with risk scores (Fig. [163]9C, D). Then, we used mantis and PreMSIm to
predict MSI levels in TCGA-PDAC patients and analyzed their correlation
with risk scores. The results of mantis showed that MSI had no
significant correlation with risk scores (Fig. [164]9E). PreMSIm showed
that MSI status had a significant correlation with risk scores, and the
number of MSI-low patients in the low-risk group was significantly
higher than that in the high-risk group (Fig. [165]9F, G). These
results indicate that this signature is difficult to predict the
efficacy of patients’ ICIs, but the prediction bias caused by the poor
effect of current ICIs on PDAC is also noteworthy. So, we also
investigated the relationship between the expression of 20 ICIs with
therapeutic potential and risk scores and found that the risk score was
negatively correlated with most ICIs, including PD-L1, CTLA-4, LAG-3,
TIM-3, IDO1, TIGIT, CD86, LAIR1, CD200R1, and CD200, while it was
positively correlated with LGALS3 and CEACAM1 (Fig. [166]9H, Fig.
S[167]6). Although the expression of ICIs differs between the high- and
low-risk groups, the low expression of ICIs may also contribute to the
poor effectiveness of PDAC immunotherapy. Finally, we analyzed the
difference in chemosensitivity to 6 commonly used anticancer drugs
between the high- and low risk groups, and the results showed that the
IC50 values of paclitaxel and erlotinib were relatively higher in the
low-risk group (Fig. [168]9I).
Fig. 9.
[169]Fig. 9
[170]Open in a new tab
Analysis of the predictive ability of risk signature for immunotherapy
and chemotherapy. A–D Association between IPS, TIDE, TMB, neoantigens,
and the risk signature. E MSI difference analysis based on mantis
between high- and low-risk groups. F Association analysis between MSI
status (based on PreMSIm) and risk score. G Distribution analysis of
MSI status in high- and low-risk groups. H The boxplot represents
immune checkpoint expression in the high- and low-risk groups. I The
difference in chemosensitivity of 6 commonly used anticancer drugs in
the high- and low-risk groups (*p < 0.05, **p < 0.01, ***p < 0.001,
****p < 0.0001, NS. p > 0.05)
Functional analysis of risk genes
We conduced survival analysis of the five risk genes and found that
patients with high expression of these genes had a better prognosis
(Fig. [171]10A–E). Consistently, these genes were significantly
upregulated in the low-risk group (Fig. [172]10F), indicating that they
act as protective factors. We then analyzed the association of the risk
genes with the 28 immune cell types, and the results revealed that
ARID5A, IGLV7-46, FAM19A2, and ICOSLG were significantly positively
correlated with multiple immune cells, while SPRN showed a negative
correlation with multiple immune cells (Fig. [173]10G), suggesting a
correlation between our signature and the tumor immune
microenvironment. Single-cell level gene expression analysis of the HPA
database showed that ARID5A was significantly expressed in mixed immune
cells, macrophages, and endothelial cells, and the expression level of
ARID5A was generally higher than other risk genes in a various of cells
(Fig. [174]10H).
Fig. 10.
[175]Fig. 10
[176]Open in a new tab
Functional analysis of risk genes. A–E Kaplan–Meier curve of OS in PDAC
patients with high- and low-expression of 5 genes. F Expression
patterns of 5 genes in the high- and low-risk groups. G Correlation
matrix of the relationship between the expression of 5 risk genes and
the differential immune infiltration levels. H Expression of risk genes
in different cell types based on HPA database (*p < 0.05, **p < 0.01,
***p < 0.001, ****p < 0.0001)
The results of multivariate Cox regression analysis highlighted the
significant impact of ARIT5A on the prognosis of PDAC (p < 0.001,
hazard ratio < 1) (Fig. [177]3E). Subsequently, we conducted several
analyses to explore the function of ARID5A in PDAC. Pan-cancer
expression analysis revealed that ARID5A was significantly
overexpressed in tumor tissues of GBM, KIRC, LAML, and PAAD, while it
showed lower expression in tumor tissues of BLCA, BRCA, and CESC
(Fig. [178]11A). Survival analysis showed that high expression of
ARID5A was beneficial to the prognosis of BRCA, CESC, HNSC, LUAD, and
SKCM (Fig. S[179]7A-E), but unfavorable for GBMLGG, KIRC, KIRP, LUSC,
READ, STAD, and UCEC (Fig. S[180]7F-L). Furthermore, gene expression
analysis based on chip datasets in GEO demonstrated that ARID5A was
significantly highly expressed in tumor tissues (paired or unpaired
samples, Fig. [181]11B). Protein expression analysis also revealed
significantly higher levels of ARID5A in tumor tissues (Fig. [182]10C,
D). Single-cell analysis of the [183]GSE154778 dataset in the TISCH
database showed a predominant expression of ARID5A in CD8T cells
(Fig. [184]11E, F, G). Gene expression correlation analysis further
highlighted the closely relationship between ARID5A and various
immunoactivator or suppressor genes (Fig. [185]11H).
Fig. 11.
[186]Fig. 11
[187]Open in a new tab
Expression and function analysis of ARID5A. A Pan-cancer expression
analysis of ARID5A, red represents tumor samples and gray represents
normal samples. B Expression analysis of ARID5A based on GEO chip data,
the first and second are the expression level of ARID5A in unpaired
samples based on [188]GPL570 and [189]GPL96 platforms, and the last on
represents is the expression level of ARID5A based on paired samples.
C, D Immunohistochemical staining of ARID5A in pancreatic cancer tumor
tissues (C) and normal tissues (D). E–G Expression of ARID5A at the
cellular level based on the [190]GSE154778 dataset. H Correlation
between ARID5A and the immune activating genes or immunosuppressive
genes (*p < 0.05, **p < 0.01, ***p < 0.001, ****p < 0.0001)
Discussion
PDAC is a highly lethal malignancy [[191]1]. Due to rapid progression,
easy metastasis and high drug resistance of PDAC, little progress has
been made in its treatment [[192]3, [193]70, [194]71]. Therefore, there
is a critical need to identify effective prognostic biomarkers and
novel therapeutic targets for PDAC patients. With the increasing
utilization of high-throughput sequencing in cancer research, recent
studies have sought to construction gene expression-based signatures to
assess the prognosis of patients with various malignant tumors
[[195]72–[196]74].
Hypoxia plays a significant role in promoting the proliferation,
metastasis, and therapy resistance to radiotherapy and chemotherapy of
PDAC [[197]19, [198]23, [199]75]. However, direct measurement of tumor
hypoxia status is challenging, and researchers often rely on gene
expression profiles to predict hypoxia status in tumors [[200]76,
[201]77]. Our clustering analysis revealed a high mortality cohort
(cluster_2) associated with hypoxia in PDAC patients. Cluster_2
exhibited higher activation of hypoxia pathway and worse prognosis,
consistent with previous studies linking high hypoxia levels to poor
outcomes [[202]78]. The regulation of hypoxia on the tumor
microenvironment is complex, and genes differentially expressed in
relation to hypoxia may not fully capture the molecular mechanisms.
Therefore, we utilized WGCNA to analyze gene sets associated with
hypoxia, and our GO and KEGG pathway revealed the enrichment of several
immunoregulatory pathways. Previous research has suggested that hypoxia
is a barrier to PDAC immunotherapy [[203]79]. Our findings further
support the close relationship between hypoxia and immunity in PDAC.
Among the five signature genes (ARID5A, IGLV7-46, FAM19A2, ICOSLG,
SPRN) identified through LASSO and multivariate Cox regression
analysis, ARID5A stands out as a member of the ARID protein family with
crucial roles in development, cell growth regulation, and immune
regulation [[204]80, [205]81]. Recent studies highlighted the
regulatory role of ARID5A in various cancers, including lung, breast,
prostate, PDAC, and colorectal cancer [[206]82–[207]85]. Notably,
Parajuli et al. demonstrated that high expression of ARID5A contributes
to tumor immune evasion by recruiting immunosuppressive cells, thereby
reducing the recruitment and activation of antitumor effector T cell
within the tumor microenvironment [[208]85]. According to our results,
the expression of ARID5A is positively associated with most immune
cells (like CD8T, Mono/Macro cells) and was also significantly
associated with multiple immunomodulatory genes. The expression of
ARID5A is related to the infiltration of immune cells, and the highly
active immune pathway is conducive to the survival prognosis of
patients. Therefore, high expression of ARID5A in tumor tissue may
serve as a potential target for PDAC immunotherapy. Interestingly,
pan-cancer analysis revealed different prognostic implications of
ARID5A in distinct cancer subtypes within the same organ. While ARID5A
showed high expression in normal tissues of both LUAD and LUSC,
survival analysis indicated that high expression of this gene was
beneficial to the prognosis of LUAD but detrimental to LUSC. Thus, the
molecular mechanisms underlying the regulation of ARID5A in tumor could
be of interest in future studies.
IGLV7-46 is an immunoglobulin lambda variable gene. While the specific
role of IGLV7-46 in PDAC or tumor immunity remains to be elucidated,
Cai et al. have speculated on a potential correlation between FAM19A2
and immune response based on data mining [[209]86]. One the other hand,
ICOSLG has been established as an immune checkpoint molecule that is
closely associated with immune activation and tumor progression. It has
been implicated ICOSLG as as a predictor and therapeutic target in
acute lymphoblastic leukemia, and can promote the progression of
glioblastoma by mediating the regulatory T-cell expansion [[210]87,
[211]88]. As for SPRN, its function in the context of PDAC or tumor
immunity has not been reported, and further research is needed.
Overall, the risk genes identified in our signature exhibit close
associations with tumor immunity, highlighting the interplay between
tumor hypoxia and immune activation.
Several prognostic signatures for pancreatic cancer based on gene sets,
such as hypoxia, pyroptosis, ferroptosis, cuproptosis, and autophagy,
have been published recently [[212]60–[213]66]. Our analysis revealed
that the C-index value of our signature was higher than or comparable
to the published prognostic signatures based on the hypoxia gene set,
both in the training and validation sets. To our knowledge, this
present study is the first to utilize WGCNA to analyze and fully
incorporate gene sets that are significantly associated with hypoxia to
construct the prognostic model. Overall, our prognostic signature
demonstrates superior or comparable performance to previously defined
signatures in pancreatic cancer.
The pathway activation analysis demonstrated that pathways associated
with cell proliferation and angiogenesis were significantly activated
in the high-risk group, while pathways related to tumor suppression,
immune response, and apoptosis were significantly activated in the
low-risk group. These findings are consistent with the observed poor
prognosis in the high-risk group and better prognosis in the low-risk
group. The activity and infiltration of immune cells in the tumor
microenvironment play a critical role in tumor pathogenesis and
progression. PDAC is known for its immunosuppressive nature and low
immunogenicity, making it highly aggressive [[214]89]. The pathway
enrichment analysis of DEGs revealed significant differences in the
adaptive immune response pathway between the high- and low-risk groups.
The estimate analysis further supported these findings, showing higher
immune scores and stromal scores in the low-risk group, while the tumor
purity score showed the opposite trend. The TIP analysis indicated that
overall immune activation and infiltration of immune cells into tumors
were significantly higher in the low-risk group. Taking these results
together, it can be speculated that the poor prognosis in the high-risk
group may attribute to lower immune activation and infiltration in the
tumor microenvironment.
The immunocyte enrichment analysis revealed that the immune
microenvironment of the low-risk group is complex and exhibits
contradictory characteristics. CD8 + T cells were found to inhibit
tumor growth in different ways [[215]90] and significantly enriched in
low-risk groups. Dendritic cells, which play a role in inducing and
regulating antitumor immune responses [[216]91], were also
significantly recruited in the low-risk group. CD4 + T cells, which
have been associated associated with metastasis and reduced survival in
some studies [[217]92], were unexpectedly enriched in the low-risk
group. Myeloid derived suppressor cells (MDSCs), known for their
immunosuppression and immune evasion promoting properties [[218]93],
were significantly enriched in the low-risk group. Macrophages, which
have been linked to immune suppression and poor prognosis at high
levels [[219]94, [220]95], were also found to be significantly enriched
in the low-risk group. Eosinophils and mast cells, which have been
implicated pancreatic fibrosis and malignancy development [[221]96],
showed significant enrichment in the low-risk group. These findings
indicate that the tumor microenvironment heterogeneity among patients
in the low-risk group could be more pronounced compared to the
high-risk group.
Immune checkpoint blockade therapy is not effective in patients with
PDAC. Our analysis also found no significant difference in
immunotherapy outcomes between the high- and low-risk groups. Related
studies speculated that mainstream immune checkpoints such as PD1,
PDL1, and CTLA4 might not be the main immunosuppressive molecules of
PDAC [[222]97]. The combination of immunocheckpoint inhibitors and
chemotherapy showed good benefits for PDAC patients [[223]71].
Therefore, combined with the results of our analysis of immune
checkpoints and chemotherapy agents, our results can provide
referential treatment options for patients in different risk groups.
Early diagnosis plays a crucial role in improving cancer treatment
outcomes, and the risk genes identified in our study have the potential
to contribute to early diagnosis of PDAC. The analysis of ARID5A
expression in pan-cancers and immunohistochemical results indicate its
high expression in PDAC. Single-cell expression analysis further
reveals its high expression in CD8T cells. Additionally, its expression
level has been correlated with the prognosis of various cancers,
highlighting its potential as a diagnostic biomarker for PDAC.
However, it is important to acknowledge the limitations of our study.
Firstly, the lack of validation in real-world clinical cohorts and the
small number of patients in the validation study restrict the
generalizability of our findings. Secondly, the expression of risk
genes lacks experimental validation in clinical samples or at the
cellular level, such as through real-time PCR. These limitations should
be addressed in future research to further validate the diagnostic
potential of the identified risk genes in PDAC.
Conclusion
In conclusion, this study introduces a hypoxia-based prognostic
molecular signature for PDAC and explores the underlying factors
contributing to prognostic differences in high- and low-risk groups
based on this signature. The high-risk group, characterized by high
hypoxia and poor prognosis, exhibits a typical immune cold phenotype,
whereas the low-risk group demonstrates a favorable prognosis and
extensive immune cell infiltration, offering new insights into PDAC
treatment. The identified signature holds potential as a prognostic
marker for PDAC and can inform the combination of immunotherapy and
chemotherapy. Among the risk genes, ARID5A emerges as promising
therapeutic target and a potential molecular marker for early diagnosis
of PDAC.
Supplementary Information
[224]12957_2023_3142_MOESM1_ESM.tif^ (1.7MB, tif)
Additional file 1: Figure S1. Wilcoxon signed-rank test results of 14
typical pathway activation scores between cluster_1 and cluster_2
groups. (*p<0.05, **p<0.01, ***p<0.001, **** p<0.0001).
[225]12957_2023_3142_MOESM2_ESM.tif^ (4.1MB, tif)
Additional file 2: Figure S2. Scatter plots of the module membership
(MM) and gene significance (GS) of each gene in 8 modules which
significantly associated with hypoxia cluster.
[226]12957_2023_3142_MOESM3_ESM.tif^ (2.5MB, tif)
Additional file 3: Figure S3. Results of KEGG pathway analysis of
candidate gene set.
[227]12957_2023_3142_MOESM4_ESM.tif^ (1.1MB, tif)
Additional file 4: Figure S4. Comparison of the stromal, estimate and
tumor purity scores of high- and low-risk groups in the TCGA-PDAC
cohort. (*p<0.05, **p<0.01, ***p<0.001, **** p<0.0001).
[228]12957_2023_3142_MOESM5_ESM.tif^ (4.1MB, tif)
Additional file 5: Figure S5. Spearman correlation analysis about risk
score and tumor-infiltrating immune cells.
[229]12957_2023_3142_MOESM6_ESM.tif^ (2.2MB, tif)
Additional file 6: Figure S6. Correlation between risk score and 20
inhibitory immune checkpoints. The color and the values indicate the
Spearman correlation coefficient.
[230]12957_2023_3142_MOESM7_ESM.tif^ (7.6MB, tif)
Additional file 7: Figure S7. Pan-cancer survival analysis of ARID5A.
Acknowledgements