Abstract
Functional genomics studies have helped researchers annotate
differentially expressed gene lists, extract gene expression
signatures, and identify biological pathways from omics profiling
experiments conducted on biological samples. The current geneset,
network, and pathway analysis (GNPA) web servers, e.g., DAVID, EnrichR,
WebGestaltR, or PAGER, do not allow automated integrative functional
genomic downstream analysis. In this study, we developed a new
web-based interactive application, “PAGER Web APP”, which supports
online R scripting of integrative GNPA. In a case study of melanoma
drug resistance, we showed that the new PAGER Web APP enabled us to
discover highly relevant pathways and network modules, leading to novel
biological insights. We also compared PAGER Web APP’s pathway analysis
results retrieved among PAGER, EnrichR, and WebGestaltR to show its
advantages in integrative GNPA. The interactive online web APP is
publicly accessible from the link,
[32]https://aimed-lab.shinyapps.io/PAGERwebapp/.
Keywords: PAGER, melanoma, functional genomics, geneset analysis,
network visualization and analysis, PAGER Web APP, GNPA
Introduction
Functional genomics analysis is widely performed to characterize genes
and intergenic regulatory regions in the genome that contribute to
different biological processes ([33]Yang et al., 2020; [34]Angeloni et
al., 2021). Essentially, functional genomics provides a way to reveal
the molecules’ coordination in mechanisms due to a specific phenotype
([35]Raamsdonk et al., 2001; [36]Rahaman et al., 2015). By tracking the
molecular activities in the specific biological conditions, we could
identify those driver and passenger genes working in a model linking
genotype to phenotype. Numerous studies have shown that the molecules
working in pathways could help in disease diagnosis ([37]Zhang and
Chen, 2010; [38]Drier et al., 2013; [39]Livshits et al., 2015; [40]Bock
and Ortea, 2020; [41]Pian et al., 2021), cancer subtyping ([42]Zhang
and Chen, 2013; [43]Mallavarapu et al., 2020; [44]Lafferty et al.,
2021), and personalized medicine ([45]Chen et al., 2007; [46]Hamburg
and Collins, 2010; [47]Raghavan et al., 2017). Additionally,
multi-omics analysis provides a complex map linking transcriptomics,
proteomics, and metabolomics ([48]Subramanian et al., 2020;
[49]Andrieux and Chakraborty, 2021). In multi-omics studies, the
challenges for functional genomics are the coverage of contents, the
rendering of the complex network-based models, and the easy-to-use
software with advanced features. Therefore integrative geneset,
network, and pathway analysis (GNPA) have emerged in the past decade to
lessen the burden of multi-omics data analysis users ([50]Wu et al.,
2014). Pathway analysis, especially topology-based approaches that
exploit all the knowledge about how genes and proteins interact in a
pathway, have been developed to discover the mechanical changes through
pathway-level scoring and pathway significance assessment ([51]Draghici
et al., 2007; [52]Mitrea et al., 2013; [53]Nguyen et al., 2018). To
better understand the impact of perturbations or genetic modifications
in a system-level, System-level PAThway Impact AnaLysis using map
(SPATIAL), Signaling Pathway Impact Analysis - Global Perturbation
Factor (SPIA-GPF), and SPATIAL-GPF have been introduced ([54]Bokanizad
et al., 2016).
During the last decade, several GNPA web servers have been developed
([55]Subramanian et al., 2005; [56]Khatri et al., 2012), including
DAVID ([57]Jiao et al., 2012), EnrichR ([58]Kuleshov et al., 2016),
WebGestalt ([59]Liao et al., 2019), and pathways, annotated gene lists
and gene signatures electronic repository (PAGER) ([60]Yue et al.,
2018). The highlights of those webservers are interactive and
comprehensive data coverage. The first version of the DAVID tool was
published in 2003 ([61]Dennis et al., 2003), and it is one of the
earliest geneset enrichment analysis webservers. The most updated
version of DAVID implements many advanced features such as gene
ranking, which gives a quick focus on the most likely important
candidate genes, gene with annotation in each single view, and gene
extension to make functional inferences ([62]Jiao et al., 2012).
EnrichR was initially developed in 2013, and its merits come from
comprehensive data coverage and interactive visualization panel
([63]Chen et al., 2013). EnichR provides 190 libraries and adds Appyter
to visualize EnrichR results in different styles ([64]Kuleshov et al.,
2016). WebGestalt was introduced in 2005 ([65]Zhang et al., 2005), and
it highlights the visualization of gene ontology hierarchy structure
and pathway view of wikiPathway. WebGestaltR implemented with R
language in the recent updates ([66]Liao et al., 2019).
PAGER was initially conceived in 2014 ([67]Harini et al., 2008) and
subsequently developed in 2015 ([68]Yue et al., 2015) with a
standardized concept called “PAGs” (Pathways, Annotated gene lists, and
Gene signatures) that integrates different levels of gene-sets. PAGER
highlights the measurement of biological relevance using normalized
Cohesion Coefficient (nCoCo) and advances the network interpretation of
functional genomics results in several aspects. Additionally, PAGER
introduced the computational strategies in generating m-type
(co-membership) or r-type (regulatory) PAG-to-PAG relationships. PAGER
also provides gene prioritization in each PAG. For the intra-PAG
network construction, PAGER adopts the protein-protein interactions
from the HAPPI database ([69]Chen et al., 2017), a comprehensive and
high-quality map of Human annotated and predicted protein interactions,
and gene regulations validated in vitro experiment. Hence, PAGER
enables gene prioritization using the network topology in each PAG
([70]Yue et al., 2018). All four web servers support API (Application
Programming Interface) services.
In this study, we developed the PAGER Web APP, an interactive online
application to perform the gene set enrichment analysis and network
interpretation of the functional genomics result. PAGER Web APP
provides preprocessed RNA-seq data from UALCAN-processed TCGA data
([71]Chandrashekar et al., 2017) and a melanoma drug
resistant-sensitive case study ([72]Snyder et al., 2014) from
cBioPortal ([73]Gao et al., 2013). We illustrated how the PAGER Web APP
enhances the potential to discover biological insights using
network-based computational strategy by comparing the enriched pathways
from the three leading web servers using their application programming
interfaces (APIs). We performed three additional case studies, multiple
sclerosis (MS), colonic mucosa in Crohn’s disease (CD), and ulcerative
colitis (UC) study, to compare the three web server performances and
further validate the pathways using PubMed co-citations. We intend for
PAGER Web APP to become a popular application for researchers
interested in integrative GNPA.
Methods
Workflow and User Interface
We developed a four-step procedure in performing the functional
genomics analysis in PAGER Web APP for Human genomics results
([74]Figure 1). Firstly, users need to either load Demo data or upload
their data. In the Demo data, PAGER Web APP provides a melanoma
dataset, a multiple sclerosis dataset, a Crohn’s disease dataset, an
ulcerative colitis dataset and 16 cancer types collected from UALCAN
TCGA data ([75]Chandrashekar et al., 2017). If users need to upload the
data, we ask users to provide a tab-delimited.txt format file, check
the log2 fold change column and p-value column, and click on the
“proceed” button. Secondly, PAGER Web APP will generate a volcano plot
using the gene’s log2 fold changes and colors the over-expressed
candidate genes red and under-expressed candidate genes blue using the
default threshold p-value ≤ 0.05 and absolute log2 fold change ≥1.
PAGER Web APP allows users to adjust the log2foldchange and negative
log2-based p-value to optimize the candidate gene list. Users need to
click on the proceed button to the next step. Thirdly, PAGER Web APP
will perform the gene-set enrichment analysis with the pathway type
geneset sources (P-type PAGs) in default. Users can add or remove the
source name in the source multiple-choice field. PAGER Web APP also
allows users to change the minimum number of overlapped genes,
similarity score, and “-log2p-value” cutoff. The similarity score is
based on the combination of overlap coefficient and Jaccard index using
the methods described previously ([76]Huang et al., 2012). In the table
of enriched genesets results, users can use the column “PAGER link” to
navigate to the web-hosted PAGER entries of the given PAG, including
the metadata, gene members, and gene networks. PAGER Web APP offers two
additional leading gene set enrichment analysis tools (EnrichR and
WebGestaltR) using the API service. We didn’t include DAVID due to the
API failure. Lastly, PAGER Web APP summarizes the similarity of the
terms and displays a Venn diagram of the overlapped terms. PAGER Web
APP also provides the corresponding tables to deliver similar terms
with similarity scores by comparing the three tools. All the tables and
plots are downloadable.
FIGURE 1.
[77]FIGURE 1
[78]Open in a new tab
The PAGER Web APP data analysis workflow. The workflow consists of four
steps with visualization panels to help biologists quickly understand
the results.
Term to Term Distance-Based Similarity of Terms Enriched From the Three Tools
The term similarity is generated based on a string metric using the
Stringdist library
([79]https://cran.r-project.org/web/packages/stringdist/index.html). We
clean up the terms or names by removing irrelevant content, such as
species, identifier, etc., and making all the terms lower case. We also
remove the redundant terms enriched from different data sources, such
as “MAPK signaling pathway” may come from KEGG and wikiPathway at the
same time. Then we apply the string similarity using optimal string
alignment (OSA) distance ([80]Boytsov, 2011) to generate the similarity
matrix between two sets of terms, set A and set B.
Assume there are two terms regarded as two strings
[MATH: a :MATH]
and
[MATH: b :MATH]
, the restricted distance is defined as
[MATH:
da,b(i,j)<
/mrow> :MATH]
in a recursive calculation, the
[MATH: i :MATH]
is the prefix of string
[MATH: a :MATH]
, and the
[MATH: j :MATH]
is the prefix of string
[MATH: b :MATH]
.
[MATH: da,b(i,j)=min{0 if i=j=0da,b(i−1,j)+1 if i>0(deletion)
mtr>da,b(i,j−1)+1 if j>0(insertion)
mtr>da,b(i−1,j−1) if ai=bj(match)
mtr>da,b(i−1,j−1)+1 if ai≠bj(substitution)da,b(i−2,j−2)+1 if i,j>1 and ai=bj−1 and ai−1=bj(transposition) :MATH]
The string similarity is calculated by:
[MATH: 1−da,b(i,j)max(|a|,|<
/mo>b|) :MATH]
where
[MATH: |a| :MATH]
represents the length of string
[MATH: a :MATH]
, and
[MATH: |b| :MATH]
represents the length of string
[MATH: b :MATH]
.
After generating the similarity matrix between the two lists of terms,
we check each row (a term from the set A) and take the highest score
with the term as the most similar term. Therefore, we generate a list
of pairwise term-to-term similarities. Finally, we use the default or
customized similarity cutoff to filter low similar term-to-term pairs.
Apply Louvain Clustering in m-Type PAG-To-PAG Networks to Identify PAG
Communities
We apply the Louvain clustering function in the igraph library in R
([81]https://cran.r-project.org/web/packages/igraph/index.html) to find
the community structure in m-type PAG-to-PAG networks. The Louvain
clustering is based on the modularity in a scale between -0.5
(non-modular clustering) to 1 (fully modular clustering) described in
the paper ([82]Blondel et al., 2008).
Extract the Critical Concepts From Pathways and Show Them in Word-Clouds
We create bag-of-words from the space-separated PAG names to present
the frequently appearing words in each PAG for any enriched PAG set. We
create word corpus, remove the potential punctuation such as comma,
colon, etc., make all the words lower case, remove both irrelevant
words and common words, “pathway,” “signaling,” “human,” “homo,”
“sapiens,” “has,” “or,” and “and”. Finally, we apply wordcloud2
function in the wordcloud2 library in R
([83]https://cran.r-project.org/web/packages/wordcloud2/index.html) for
the visualization.
Implementing the Software
The PAGER Web Application user interface is designed using bs4Dash
([84]https://cran.r-project.org/web/packages/bs4Dash/index.html)
package in R. The application is supported by R Shiny
([85]https://shiny.rstudio.com/) framework. In addition to data
processing and statistical analysis, GNPA Analysis are implemented
using PAGER API, EnrichR API and WebGestaltR API. Graphing libraries
like Plotly ([86]https://plotly.com/r/), igraph in R
([87]https://igraph.org/r/), ggplot2, wordcloud2, and VennDiagram have
been used.
Prepare the Melanoma Drug Resistant-Sensitive Data From cBioPortal
We downloaded the melanoma dataset of 64 patient samples from
cBioPortal, and it was initially published in a paper in the New
England Journal of Medicine ([88]Snyder et al., 2014). We identified a
cohort from the patients who are in the metastasis stage (m1c) with
Neuroblastoma RAS Viral Oncogene Homolog (NRAS gene) or/and v-Raf
murine sarcoma viral oncogene homolog B (BRAF gene) mutations. Hence,
we obtained three drug response patients, two drug weakly response
patients and seven non-response patients. We applied the DEseq2 library
in R
([89]https://bioconductor.org/packages/release/bioc/html/DESeq2.html)
to generate the differentially expressed genes that compared
drug-resistant patients to drug-sensitive patients. The output file is
stored in the PAGER Web APP as a demo.
Prepare the Multiple Sclerosis Data From the EMBL-EBI Database
We loaded the differentially expressed gene table,
“extdata/E-GEOD-21942.topTable.RData”, preprocessed in the ROntoTools
library ([90]Ansari et al., 2016). This dataset contains a genome-wide
array expression study in peripheral blood mononuclear cells (PBMC)
from 12 multiple sclerosis (MS) patients and 15 controls ([91]Kemppinen
et al., 2011). We selected differentially expressed genes using
adjusted p-value ≤ 0.01 (2,864 genes) and saved their fold changes as
input of ROntoTools. We set the adjusted p-value ≤ 0.01 and the
absolute logFC >0.5 to get the 1,470 candidate genes as the input of
PAGER, EnrichR and WebGestaltR.
Prepare the Colonic Mucosa Data From the EMBL-EBI Database
We downloaded the transcription profiling by array of RNA from inflamed
and non-inflamed colonic mucosa (E-MTAB-2967). In Crohn’s disease,
there are 15 inflamed colonic mucosa and 15 controls. In ulcerative
colitis, there are 14 inflamed colonic mucosa and 14 controls. We
performed the normalization and linear regression using the limma
library in R
([92]https://bioconductor.org/packages/release/bioc/html/limma.html).
We set the cutoffs of adjusted p-value ≤ 0.05 and the absolute logFC
>0.5 to get the 518 candidate genes in Crohn’s disease and the 528
candidate genes in the ulcerative colitis study.
Validation of Pathways Using the Co-citations in PubMed Literature
To demonstrate the significance of the keywords in pathways related to
a disease, we applied a co-citation enrichment analysis using the
hypergeometric test and odds ratio. We applied the NCBI e-utils
application programming interface (API) that implements semantic
searches of PubMed abstracts to report biomedical literature citations
([93]Sayers, 2008). We implied that the likelihood of observing
articles co-mentioning disease names and the keywords from pathways is
statistically higher than random using the PubMed score ([94]Yue et
al., 2019a). In this study, the background citations using the word
“disease” denoted as
[MATH: N :MATH]
, the citations of the specific disease using the word “melanoma”
represented as
[MATH: K :MATH]
, the citations of the keywords from a pathway denoted as
[MATH: n :MATH]
, and the joint citations of “melanoma” and the keywords from a pathway
represented as
[MATH: k :MATH]
. We performed the co-citation enrichment analysis to generate
[MATH:
PubMed
score :MATH]
using the formula:
[MATH:
PubMed
score=−<
/mo>ln(∑t=kmin(n,
mo>K)<
mrow>(Kt)
(N−Kn−t)(<
mrow>Nn
)<
/mrow>) :MATH]
We calculated the odds ratio based on the formula
[MATH:
k/(K
−k)(n−k)/(N−K−n+k)
:MATH]
. We also manually checked the contents and subsequently confirmed them
using the PubTator annotation API ([95]Wei et al., 2019; [96]Wei et
al., 2013), i.e.,
[97]https://www.ncbi.nlm.nih.gov/research/pubtator-api/publications/exp
ort/pubtator?pmids=[PMID]. We took a sample list of PubMed IDs from
each retrieved entry. To remove biases and further confirm the
mentioned keywords, we applied the analysis described in the previously
developed tool called biomedical entity expansion ranking and
exploration (BEERE) ([98]Yue et al., 2019b) to extract those semantic
relationships that co-mention “melanoma” and the pathways’ keywords.
To evaluate how well the method can identify “correct” pathways, we
introduced a new hybrid validation technique. It involves first
defining the ground truth and subsequently developing a statistical
model to assess the significance of results retrieved using a receiver
operating characteristics (ROC) curve and the area-under-the-curve
(AUC) value. The hybrid technique also includes performing a literature
co-citation-based assessment. We constructed the ground truth using
ROntoTools, the best performing method reported in the review paper
([99]Nguyen et al., 2019), in three steps. Firstly, we took the
candidate genes from the differential expression analysis using the
adjusted p-value cutoff 0.05 and the absolute gene’s log fold-changes
larger than or equal to 0.5. Secondly, we performed pathway enrichment
analysis using the ROntoTools. Thirdly, we defined the “true” data set
as the significantly enriched pathways with adjusted combined p-values
≤ 0.05 (combined p-values were generated by the function “comb.pv.func”
([100]Kemppinen et al., 2011) in ROntoTools) and the “false” data set
as the retrieved pathways with adjusted combined p-values > 0.05 but
with at least one gene overlapping with the input gene list.
In the literature co-citation validation, we developed a t-test based
statistical model on evaluating how significant the p-value ranked
pathways can be supported by the PubMed scores. Particularly, we ranked
the PAGs based on adjusted p-values, and compared the top n% PAGs’
PubMed scores to the bottom (100-n)% PAGs’ PubMed scores for each
method, where n ranges from 10 to 90 with a step increment of 10. And
then, we reported their average p-values, respectively. The smaller
p-values are, the better performance the methods have.
Results
Comparison of Data Coverage and Features Among the Three Web Servers
Compared to EnrichR and WebGestalt, PAGER progresses the network
interpretation of functional genomics results. Although there are 35
unique geneset libraries reported in most updated PAGER, which are less
than EnrichR, each of PAG in PAGER contains metadata other than EnrichR
and WebGestalt, including PAG-type (pathways, annotated gene lists and
gene signatures), PAG descriptions, source link, publication reference,
curator, and nCoCo score (described in PAGER 2.0). In addition, PAGER
provides geneset intra-network views, including the protein-protein
interaction network and gene-gene regulation network members in each
geneset, while WebGestalt reports pathway maps in wikiPathway source
only. For the geneset’s inter-network, WebGestalt inherits the Gene
Ontology (GO) hieratical structure from the GO consortium. We extent
the relationship concepts by introducing m-type (co-membership)
PAG-to-PAG relationships and r-type (regulatory) PAG-to-PAG
relationships described in PAGER. The m-type PAG-to-PAG relationships
represent co-memberships between two PAGs, which reveals signaling
cross-talk between PAGs that share signaling components within signal
transduction pathways in response to external stimuli. The r-type
PAG-to-PAG relationships represent the PAG causal ordering inferred
from gene-to-gene regulations by adapting our method previously
described in PAGER ([101]Yue et al., 2018). The PAGER Web APP fulfills
all the additional features in [102]Table 1, such as term searching and
API service.
TABLE 1.
A comparison of data coverage and features among PAGER, EnrichR, and
WebGestalt web servers.
Webserver PAGER EnrichR Webgestalt
Data coverage (Human) Unique library 35 89 22
Metadata Yes Partial Partial
Gene prioritization Yes No No
Geneset intra-network Interactions Yes No Partial
Regulations Yes No Partial
Geneset inter-network m-type (co-membership) Yes Partial Partial
r-type (regulatory) Yes No No
Additional feature Term searching Yes Yes No
API Yes Yes Yes
[103]Open in a new tab
Melanoma Drug Resistant-Sensitive Patients Enriched Pathway Case Study in
Demo
To better identify the cohorts in melanoma cancer to improve the
treatment, functional genomics has been applied to the next-generation
sequencing data for an in-depth understanding of the molecular
mechanisms in the drug resistance cases. We collected the
transcriptomes from the cBioPortal database in this study. In the
result, we found 164 P-type PAGs (pathways) to be significantly
enriched.
In the 164 P-type PAGs, they are two PAGs that are derived from more
than one data source, i.e., “PI3K-Akt signaling pathway” and “Bladder
cancer”, each of which is simultaneously recorded in both
“WikiPathway_2021” and “KEGG_2021_HUMAN” data sources. Compared to the
results from EnrichR and WebGestaltR, PAGER had the greatest number of
enriched pathways, which is 164, EnrichR has 98, and WebGestaltR has 52
([104]Figure 2A). PAGER also had the greatest number of unique
pathways, which is 101 (48%). We found 33 (16%) overlapped pathways
among the three tools. In addition, 23 (11%) pathways were shared
between PAGER and EnrichR, 5 (2.4%) pathways were shared between PAGER
and WebGestaltR, and 6 (2.8%) pathways were shared between EnrichR and
WebgestaltR.
FIGURE 2.
[105]FIGURE 2
[106]Open in a new tab
The enriched pathway results for melanoma drug resistant-sensitive
patients. (A) The consensus pathways among PAGER, EnrichR and
WebGestaltR results. (B) The composition of P-type PAGs enriched in
PAGER. (C) The top-30 enriched P-type PAGs are ordered by FDR in PAGER.
In the 164 P-type PAGs reported by PAGER, there were 4 major sources
and 1 minor source ([107]Figure 2B). 50 (30.5%) are from wikiPathway,
46 (28%) are from Reactome, 37 (22.6%) are from Protein Lounge, 28
(17.1%) are from KEGG, and 3 (1.83%) are from Spike. We showed the
top-30 enriched P-type PAGs colored by the sources in the horizontal
bar-plot in [108]Figure 2C, and the details of the enriched PAGs are in
[109]Supplementary Table S1.
Critical Terms Extraction From the Louvain Clustered PAGs in the m-Type
PAG-To-PAG Networks
The 164 P-type PAGs form a densely connected m-type PAG-to-PAG network
(2,749 m-type PAG-to-PAG relationships) with an average degree of 18.
After the community detection using Louvain clustering, we found 5 PAG
clusters in the m-type PAG-to-PAG network ([110]Figure 3). The
extracted concepts reveal the general pathway functions in the
clusters. Cluster 1 consists of 3 pathways with represented terms
“FAM20C” protein (Golgi-associated secretory pathway kinase), “IGF”
protein (insulin-like growth factor). Cluster 2 has 7 pathways related
to the Gi-activating and ligand-receptor bindings. Cluster 3 is formed
by 43 pathways related to collagen formation and binding events.
Cluster 4 has 37 pathways related to inflammasome responses in cancer
or infection. Cluster 5 contains 66 pathways with the regulation of
several inflammatory and cytokine responses through the receptor
interactions. Hence, PAGER Web APP enables screening for the critical
terms and quickly identifying the specific molecular mechanism
communities in the m-type PAG-to-PAG network.
FIGURE 3.
[111]FIGURE 3
[112]Open in a new tab
The m-type PAG-to-PAG network of enriched P-type PAGs in PAGER for
melanoma drug resistant-sensitive patients. (A) The m-type PAG-to-PAG
network overview and (B) The extracted word clouds from the Louvain
clusters in the network.
Validation of the Enriched Pathways Using Literature Support in the Melanoma
Drug Resistant-Sensitive Study
We found all of them relevant to melanoma cancer for the 33 consensus
pathways among PAGER, EnrichR, and WebGestaltR results. We listed the
results using pathway name, the pairwise term similarities, keywords
used for co-citation retrieval, number of co-citations, odds ratio,
PubMed score, one of PubMed IDs, and BEERE validation (in [113]Table 2;
[114]Supplementary Table S2). All the pathways were determined to be
related to Melanoma with PubMed literature support. For the 23
consensus pathways between PAGER and EnrichR results, we found that all
of them have at least one literature support ([115]Table 3). We showed
all the BEERE-identified semantic relationships in [116]Supplementary
Table S3. We found that all the 5 consensus pathways between PAGER and
WebGestaltR results to be supported by PubMed literature citations, and
we ranked them based on the PubMed score ([117]Table 4;
[118]Supplementary Table S4). Each of the six consensus pathways
between EnrichR and WebGestaltR also had at least one literature
citation support ([119]Supplementary Table S5).
TABLE 2.
The 33 consensus pathways among PAGER, EnrichR, and WebGestaltR results
with PubMed literature support. W vs. P represents the term
similarities between WebGestaltR and PAGER results. P vs. E represents
the term similarities between PAGER and EnrichR results. W vs. E
represents the term similarities between WebGestaltR and EnrichR.
[MATH: k :MATH]
represents the citations of “melanoma” and the keywords from a pathway.
[MATH: OR :MATH]
represents the odds ratio. Score represents the
[MATH: PubMed score
:MATH]
. PMID represents one PubMed ID example from each entry. BEERE
validation represents the semantic relationships retrieved. 1 stands
for Yes, and 0 stands for No. All these abbreviations are applied to
[120]Table 3 and [121]Table 4.
Term W vs. P (%) P vs. E (%) W vs. E (%) Keywords k OR Score PMID BEERE
validation
Photodynamic therapy-induced ap-1 survival signaling. 100 100 100
Photodynamic therapy 1,076 1.150 1.20E+01 31378787 1
mir-509-3p alteration of yap1/ecm axis 100 100 100 mir-509-3p 3 2.376
1.94E+00 33968718 1
Transcriptional misregulation in cancer 100 100 100 Transcriptional
misregulation in cancer 10 1.261 1.27E+00 32079144 1
Photodynamic therapy-induced nf-kb survival signaling 100 100 100
Photodynamic, nf-kb 2 1.261 7.40E-01 16524427 1
Apoptosis-related network due to altered notch3 in ovarian cancer 100
100 100 Notch3 ovarian cancer 2 1.154 6.49E-01 28165469 1
Senescence and autophagy in cancer 100 100 100 Senescence and autophagy
in cancer 35 0.722 1.97E-02 12789281 1
Focal adhesion: pi3k-akt-mtor-signaling pathway 96 96 100
pi3k-akt-mtor-signaling pathway 36 0.699 1.04E-02 31370278 0
Cytokine-cytokine receptor interaction 100 100 100 Cytokine-cytokine
receptor 14 0.442 1.72E-04 34824546 1
il-18 signaling pathway 100 100 100 il-18 pathway 25 0.482 1.54E-05
31731729 1
Mirna targets in ecm and membrane receptors 100 100 100 mirna membrane
receptors 2 0.104 1.22E-07 34680340 0
c-type lectin receptor signaling pathway 100 100 100 c-type lectin
receptor signaling pathway 28 0.360 4.03E-11 29497419 1
Nod-like receptor signaling pathway 100 100 100 Nod-like receptor
signaling pathway 50 0.394 5.13E-15 34747716 0
il-17 signaling pathway 100 100 100 il-17 pathway 20 0.215 2.80E-20
30079767 1
Protein digestion and absorption 100 100 100 Protein digestion and
absorption 5 0.062 4.10E-29 30900145 0
Assembly of collagen fibrils and other multimeric structures 100 100
100 Collagen assembly 13 0.126 2.30E-29 29216889 1
Bladder cancer 100 100 100 Bladder cancer 1815 0.708 6.34E-53 35059301
1
Class a/1 (rhodopsin-like receptors) 100 100 100 Adenosine a1 receptor
10 0.056 8.79E-63 8463264 1
Legionellosis 100 100 100 Legionellosis 1 0.006 7.10E-77 17870669 0
Prostaglandin synthesis and regulation 100 100 100 Prostaglandin
synthesis and regulation 75 0.162 5.42E-110 3149408 1
Response to elevated platelet cytosolic ca2+ 100 100 100 Platelet,
calcium 44 0.105 1.95E-120 32562975 1
Hepatitis c and hepatocellular carcinoma 100 100 100 Hepatitis c and
hepatocellular carcinoma 20 0.054 4.73E-127 31538700 0
Interleukin-6 family signaling 100 100 100 il-6 signaling pathway 170
0.237 2.42E-131 22713796 1
tnf signaling pathway 100 100 100 tnf signaling pathway 246 0.260
1.65E-159 30591049 1
Inflammatory response pathway 100 100 100 Inflammatory response pathway
123 0.173 2.30E-161 32517213 1
Amoebiasis 100 100 100 Amoebiasis 5 0.012 2.14E-177 31173190 0
Pertussis 100 100 100 Pertussis 83 0.083 1.46E-303 23737697 1
Cytokines and inflammatory response 100 100 100 Cytokines, inflammatory
response 559 0.212 0.00E+00 31176707 0
Lung fibrosis 100 100 100 Lung fibrosis 118 0.057 0.00E+00 31249780 1
Malaria 100 100 100 Malaria 137 0.044 0.00E+00 14657217 1
Micrornas in cancer 100 100 100 Micrornas 1,211 0.342 0.00E+00 28118616
1
Rheumatoid arthritis 100 100 100 Rheumatoid arthritis 357 0.074
0.00E+00 27307502 0
salmonella infection 100 100 100 salmonella 168 0.057 0.00E+00 11773163
0
Spinal cord injury 100 100 100 Spinal cord injury 82 0.034 0.00E+00
30008656 0
[122]Open in a new tab
TABLE 3.
The 23 consensus pathways between PAGER, EnrichR results with PubMed
literature support.
Term P vs. E (%) Keywords k OR Score PMID BEERE validation
axl signaling pathway 86 axl signaling 45 1.533 5.30E+00 31871265 0
g alpha (i) signaling events 97 g protein alpha signaling events 15
0.699 6.33E-02 33588787 1
Vitamin d receptor pathway 100 Vitamin d receptor pathway 23 0.734
5.49E-02 28218743 0
Age-rage signaling pathway in diabetic complications 100 Age-rage
signaling pathway, diabetes 1 0.200 7.12E-03 25909054 0
Activation of nlrp3 inflammasome by sars-cov-2 100 Viral protein
interaction, cytokine receptor 154 0.580 8.48E-14 26920710 0
Viral protein interaction with cytokine and cytokine receptor 100 nlrp3
inflammasome 14 0.225 6.84E-14 33649199 1
pi3k-akt signaling pathway 100 pi3k-akt signaling pathway 475 0.693
2.33E-17 22453015 0
Jak-stat signaling pathway 100 Jak-stat signaling pathway 103 0.478
1.39E-17 32194688 0
Kaposi sarcoma-associated herpesvirus infection 100 Kaposi
sarcoma-associated herpesvirus infection 12 0.085 2.15E-45 16443048 0
Proteoglycans in cancer 100 Proteoglycans 766 0.554 8.83E-72 31140988 0
Hematopoietic cell lineage 100 Hematopoietic cell lineage 63 0.130
3.82E-128 26391013 0
Adipogenesis 100 Adipogenesis 25 0.060 1.48E-139 27216185 0
nf-kappa b signaling pathway 100 nf-kappa b signaling 432 0.341
1.46E-158 22433222 1
Glucocorticoid receptor pathway 100 Glucocorticoid receptor 131 0.168
1.91E-179 31911848 1
Gastrin signaling pathway 100 Gastrin 23 0.034 3.80E-246 1,6242076 1
Allograft rejection 100 Allograft rejection 76 0.081 2.85E-288 26951628
0
Human papillomavirus infection 100 Papillomavirus 405 0.237 6.17E-308
10767787 1
Selenium micronutrient network 100 Selenium 129 0.112 2.82e-318
23470450 1
Nanomaterial-induced inflammasome activation 100 Nanotechnology 563
0.160 0.00E+00 28303522 0
Covid-19 adverse outcome pathway 100 Covid-19 289 0.043 0.00E+00
32734626 0
Pathogenic Escherichia coli infection 100 Escherichia coli 431 0.033
0.00E+00 34912719 0
Lipid and atherosclerosis 100 Lipid, atherosclerosis 21 0.012 0.00E+00
29903879 1
Human cytomegalovirus infection 100 Cytomegalovirus 195 0.125 0.00E+00
15922119 1
[123]Open in a new tab
TABLE 4.
The 5 consensus pathways between PAGER, WebGestaltR results with PubMed
literature support.
Term W vs. P (%) Keywords k OR Score PMID BEERE validation
Binding and uptake of ligands by scavenger receptors 100 Ligands,
scavenger receptors 12 0.398 6.77E-05 31244937 0
Interleukin-4 and interleukin-13 signaling 100 Interleukin-4,
interleukin-13 9 0.114 5.68E-24 23972995 1
Collagen chain trimerization 100 Collagen chain 153 0.257 4.14E-102
21853302 0
Interleukin-10 signaling 100 Interleukin-10 349 0.332 1.05E-136 7852279
1
Post-translational protein phosphorylation 100 Protein phosphorylation
2,850 0.301 0 17973544 0
[124]Open in a new tab
We ranked the pathways based on the
[MATH:
PubMed
score :MATH]
([125]Yue et al., 2019a). As reported the highest PubMed score in
[126]Table 2, the photodynamic therapy has been frequently reported for
melanoma treatment in recent years ([127]Shivashankarappa and Sanjay,
2019; [128]Turkoglu et al., 2019; [129]Abramova et al., 2021;
[130]Yordi et al., 2021). We observed that in the overlapped genes
between differentially expressed gene candidates and the photodynamic
therapy-induced ap-1 survival signaling’s gene members, the five genes,
IL6 (Interleukin 6), CDKN1A (Cyclin Dependent Kinase Inhibitor 1A),
FGF7 (Fibroblast Growth Factor 7), BCL3 (B-Cell Lymphoma 3-Encoded
Protein) and PDGFRA (Platelet Derived Growth Factor Receptor Alpha)
were under-expressed in the drug-resistant patients, and the three
genes, IL6, CDKN1A and FGF7 were connected in the gene regulatory
network from PAGER ([131]de Waal Malefyt et al., 1991) ([132]Figure 4).
The pathway, “pi3k-akt signaling pathway” in [133]Table 3, contained
twelve overlapped genes. Similarly, among the under-expressed genes,
the six genes, IL6 (Interleukin 6), CDKN1A (Cyclin Dependent Kinase
Inhibitor 1A), FGF7 (Fibroblast Growth Factor 7), OSM (Oncostatin M),
COL1A1 (Collagen Type I Alpha 1 Chain) and COL1A2 (Collagen Type I
Alpha 2 Chain) were connected in the gene regulatory network
([134]Figure 4). OSM gene is upstream and stimulates the other five
genes. Since OSM is an interleukin-6 (IL-6) type cytokine to inhibit
melanoma proliferation, the loss of OSM gene expression in
drug-resistant patients may inhibit the activity of collagen
biosynthesis and interleukin-6 family signaling. Lacreusette A et al.
([135]Lacreusette et al., 2007) reported that the histone deacetylase
inhibitor (HDACi) Trichostatin A (TSA), increased OSM protein activity
and histone acetylation of the OSM receptor-beta (OSMRbeta) promoter as
well as expression of OSMRbeta mRNA and protein. Therefore,
Trichostatin A (TSA) allows the OSM protein to activate the signal
transducer and activator of transcription 3 (STAT3) and inhibit
proliferation. Thus, OSM/IL-6 resistance of melanoma cells in the
late-stage patients may benefit from histone deacetylase inhibitor
Trichostatin A.
FIGURE 4.
[136]FIGURE 4
[137]Open in a new tab
The top-ranked enriched pathways using the PubMed score and the
expression of those overlapped genes with gene regulatory networks for
melanoma drug resistant-sensitive patients. In the box plots, the
x-axis are the overlapped genes between differentially expressed gene
candidates and pathway gene members, and the y-axis are the gene
expression values. In the gene regulatory networks, a red arrow
indicates the direction of activation, and a green arrow indicates the
direction of inhibition. WAG002532 and WAG002805 are the PAG IDs of the
enriched pathways shown in (A) The pathway with the highest PubMed
score in [138]Table 2, and (B) one of the PubMed literature validated
pathways in [139]Table 3. The details of pathways shown can be
retrieved online from:
[140]http://discovery.informatics.uab.edu/PAGER/index.php/geneset/view/
[PAG ID].
Another intriguing pathway, the interleukin 10 (IL-10) signaling
pathway, reported in PAGER also shows how literature supports its
involvement in melanoma. IL-10’s role in immune system biology is that
it acts as an immunomodulator, which means that it regulates how the
immune system behaves ([141]Terai et al., 2012). Terai et al. found
that metastatic melanoma cells can produce IL-10 and that this product
can prevent the immune cells from attacking it ([142]Terai et al.,
2012). The group also found that IL-6 may play a role in the
stimulation of IL-10 production in melanoma cells ([143]Terai et al.,
2012). Thus, the PAGER analysis can help give hints to researchers as
far as finding potential disease mechanisms is concerned.
We applied precision to measure the performance among the three tools
using different cutoffs ([144]Table 5). To evaluate the co-citation
coverage in the literature, we tested the result’s precision using
different cutoffs. When we set the co-citation (
[MATH: k :MATH]
) cutoff to be 1, PAGER’s precision is 0.95 as a little lower than
WebGestalt’s precision is 0.99. When the odds ratio cutoff is set to be
0.1, PAGER has the best precision, which is 0.75, and when the PubMed
score cutoff is set to be 10e-5, PAGER still leads, giving precision to
be 0.30.
TABLE 5.
The performance of the three tools.
[MATH: k :MATH]
represents the citations of “melanoma” and the keywords from a pathway.
[MATH: OR :MATH]
represents the odds ratio.
[MATH: Score :MATH]
represents the
[MATH: PubMed score
:MATH]
.
Tool Precision
k > 0 OR > 0.1 score > 10e-5
PAGER 0.95 0.75 0.30
EnrichR 0.89 0.65 0.29
WebGestalt 0.99 0.70 0.24
[145]Open in a new tab
Validation of the Enriched Pathways Using the Topology-Based Method and
Literature Support in Multiple Sclerosis (MS), Colonic Mucosa in Crohn’s
Disease (CD), and Ulcerative Colitis (UC) Studies
In the sclerosis study, we found 20 pathways in the true set and 203
pathways in the false set using ground truth discovered by the
topology-based method, ROntoTools ([146]Figure 5A). The PAGER led by
giving the AUC 0.88, EnrichR came the next with AUC to be 0.87, and the
WebGestaltR’s AUC was 0.77. In the t-test curve, We found PAGER had the
lowest average p-value (0.318) compared with ROntoTools (0.319),
EnrichR (0.319) and WebgestaltR (0.319). In the inflamed colonic mucosa
vs. non-inflamed colonic mucosa in Crohn’s disease study ([147]Figure
5B), we found 40 pathways in the true set and 161 pathways in the false
set. Both EnrichR and PAGER had the highest AUC of 0.98, and the
WebGestaltR’s AUC was 0.87. We found EnrichR had the lowest average
p-value (0.316) compared with ROntoTools (0.317), PAGER (0.322) and
WebgestaltR (0.322). In the inflamed colonic mucosa vs. non-inflamed
colonic mucosa in the ulcerative colitis study ([148]Figure 5C), we
found 6 pathways in the true set and 199 pathways in the false set. The
EnrichR had the highest AUC of 0.99, PAGER came the next with AUC to be
0.98, and the WebGestaltR’s AUC was 0.85. We found PAGER and EnrichR
tied with the lowest average p-value (0.317) compared with ROntoTools
(0.321) and WebgestaltR (0.322). Overall, PAGER was among the best.
FIGURE 5.
[149]FIGURE 5
[150]Open in a new tab
The performance comparisons among PAGER, EnrichrR and WebGestaltR using
Receiver Operator Characteristic (ROC) curve and the t-test curve. The
pathways’ adjusted p-values were applied to generate the ROC curves.
The PubMed scores were used for the t-test curve. (A) The sclerosis
study (E-GEOD-21942). (B) The inflamed colonic mucosa vs. non-inflamed
colonic mucosa in Crohn’s disease study. (C) The inflamed colonic
mucosa vs. non-inflamed colonic mucosa in the ulcerative colitis study.
Discussion and Conclusion
To summarize, we developed an interactive online functional genomics
analysis tool, PAGER Web APP. The tool can provide new and significant
insights into functional genomics studies and may support precision
medicine in delivering the candidate targets. In the melanoma
drug-resistant-sensitive case study, we observed that the P-type PAGs
(pathways) reported in PAGER lead to insights into molecular mechanisms
validated in literature support. PAGER web server supports the feature
of r-type PAG-to-PAG network generation.
There are two potential explanations for the differences in the
enrichment results among the three tools. First, we noticed that the
database versions might vary. As reported in the EnrichR and PAGER Web
APP, the KEGG data was processed in 2021, and the WebgestaltR’s KEGG
data was processed in 2018. Newer database processing time may suggest
more freshly updated content of databases—variability that we couldn’t
control in this case study. Second, the enrichment algorithms used in
these three tools are different. PAGER adapts hypergeometric test to
perform the enrichment analysis and applies adjusted p-value using
[MATH:
p0∗ (<
/mo>m + n
pi≤p<
mn>0 − 1<
/mrow>) :MATH]
, where
[MATH: p0 :MATH]
is the original p-value,
[MATH: m :MATH]
is the number of p-value’s multiple tests from the PAGs under the
constraints of PAG source, overlaps, PAG size, and similarity score,
and the
[MATH:
npi≤
p0 :MATH]
is the number of p-values in the multiple test that has less than or
equal to the original p-value. EnrichR uses fuzzy enrichment analysis,
and applies Benjamini–Hochberg for FDR, according to the documentation
online. The WebgestaltR uses hypergeometric test to evaluate the
significance of enrichment and uses Bonferroni for p-value adjustment.
To construct the ground truth in assessing the performance of
functional genomics analysis tools, many data-driven approaches can be
applied, such as target pathway ([151]Tarca et al., 2012; [152]Tarca et
al., 2013) or gene knockout (KO) dataset ([153]Nguyen et al., 2019). In
the target pathway approach, the datasets from diseases have a pathway
describing the underlying mechanisms, and hence this pathway is
implicated in this phenotype. Therefore, a pathway analysis method is
assessed based on the ranking and significance of these target
pathways. We explored the feasibility of using either pathway ground
truth or gene knockout (KO) data sets for our case study, i.e., the
study of late-stage drug-resistant melanoma. However, we could not find
“target pathway” ([154]Tarca et al., 2012; [155]Tarca et al., 2013) as
ground truth or pathways that are not directly related to the dataset
to build all the counts in a confusion table. For genes discovered in
the enriched pathway, the “pi3k-akt signaling pathway”, we didn’t find
any OSM gene-KO Melanoma dataset in the GEO database. We also could not
use non-melanoma KO experiments for fear of introducing additional
noises. Thus, we used BEERE to extract those semantic relationships
that co-mention melanoma and the pathways’ keywords with a statistical
evaluation to assess the statistical significance of the PubMed
literature reference count above a random model. As for NCBI e-utils
literature retrieval, we also applied the PubMed score to evaluate the
statistical significance of the literature counts to conquer the
literature volume and breadth.
In the future, we expect to implement features to enhance the usage of
PAGER Web APP, which can be plugged in geneset, network, and pathway
analysis (GNPA) to improve the use. In the current version, we observed
that the user interface, especially, in the enriched results, the
enriched PAGs’ result is not that interactive enough for users to
select a certain number of PAGs or arbitrarily remove some of the
records to generate PAG-to-PAG networks. We will implement the
interactive panels in the future release. PAGER Web APP calls the PAGER
API, which implements an over-representation analysis (ORA) technique
by default. In general, advanced functional class scoring (FCS)
techniques, e.g., Gene Set Enrichment Analysis (GSEA) ([156]Subramanian
et al., 2005), Gene Set Analysis (GSA) ([157]Efron and Tibshirani,
2007) and Pathway Analysis with Down-weighting of Overlapping Genes
(PADOG) ([158]Tarca et al., 2012), can better detect the significant
effects on pathways led by large changes in individual genes and the
weaker coordinated. Other pathway analysis tools may also incorporate
network topology information to integrate signaling interactions among
genes in a pathway, e.g., Pathway-Express ([159]Khatri et al., 2007),
SPIA ([160]Tarca et al., 2009), Pathway-Guide (Advaita Bioinformatics,
[161]http://www.advaitabio.com), TopoGSA ([162]Glaab et al., 2010),
Bayesian Pathway Analysis (BPA) ([163]Isci et al., 2011), and PathNet
([164]Dutta et al., 2012), etc. We plan to implement additional
advanced topology-based pathway GSEA analysis techniques into the PAGER
APIs, and adopt comprehensive benchmark data sets ([165]Tarca et al.,
2012; [166]Tarca et al., 2013; [167]Nguyen et al., 2019) to guide users
in selecting the proper method for the right application scenario in
future releases. Thus, PAGER Web APP will offer users more expanded
analysis choices than today.
Acknowledgments