Abstract
Biological condition-responsive gene network analysis has attracted
considerable research attention because of its ability to identify
pathways or gene modules involved in the underlying mechanisms of
diseases. Although many condition-specific gene network identification
methods have been developed, they are based on partial or incomplete
gene regulatory network information, with most studies only considering
the differential expression levels or correlations among genes.
However, a single gene-based analysis cannot effectively identify the
molecular interactions involved in the mechanisms underlying diseases,
which reflect perturbations in specific molecular network functions
rather than disorders of a single gene. To comprehensively identify
differentially regulated gene networks, we propose a novel
computational strategy called comprehensive analysis of differential
gene regulatory networks (CIdrgn). Our strategy incorporates
comprehensive information on the networks between genes, including the
expression levels, edge structures and regulatory effects, to measure
the dissimilarity among networks. We extended the proposed CIdrgn to
cell line characteristic-specific gene network analysis. Monte Carlo
simulations showed the effectiveness of CIdrgn for identifying
differentially regulated gene networks with different network
structures and scales. Moreover, condition-responsive network
identification in cell line characteristic-specific gene network
analyses was verified. We applied CIdrgn to identify gastric cancer and
itsf chemotherapy (capecitabine and oxaliplatin) -responsive network
based on the Cancer Dependency Map. The CXC family of chemokines and
cadherin gene family networks were identified as gastric
cancer-specific gene regulatory networks, which was verified through a
literature survey. The networks of the olfactory receptor family with
the ASCL1/FOS family were identified as capecitabine- and oxaliplatin
sensitive -specific gene networks. We expect that the proposed CIdrgn
method will be a useful tool for identifying crucial molecular
interactions involved in the specific biological conditions of cancer
cell lines, such as the cancer stage or acquired anticancer drug
resistance.
Introduction
The complex mechanisms underlying some diseases states cannot be
understood by gene expression level analyses alone because some
diseases are caused by disturbances in the specific functions of
molecular networks rather than disorders in a single gene. One of the
most powerful emerging techniques for biomedical research is
heterogeneous gene regulatory network analysis, especially for
responsive gene sets and module identification, and such networks have
attracted considerable research attention because of their ability to
reveal the complex mechanisms of diseases. Many studies have focused on
the expression levels of genes and developed statistical methods for
identifying differentially expressed genes (DEGs) or gene sets
[[28]1–[29]3]. Computational methods based on correlations between
genes have also been proposed to identify condition-responsive
subnetworks and gene modules. Tesson et al. [[30]4] focused only on the
coexpression of genes and developed the method DiffCoEx to identify
correlation pattern changes between multiple conditions. Guo et al.
[[31]5] developed the responsive score based on covariance of genes and
then proposed the method for searching responsive networks. Although
various computational approaches have been developed to identify
significant gene sets describing phenotype-specific characteristics,
these methods are based only on partial or incomplete information
(e.g., correlation, covariance of expression levels of genes) on gene
regulatory networks. Thus, biological condition-responsive gene
networks could not be effectively identified.
Our study focuses on the drawbacks of the previous studies. In order to
consider various aspects of network structure and effectively identify
responsive networks, we propose a novel computational strategy that
incorporates comprehensive information of network structure unlike the
existing studies, called comprehensive information-based differential
gene regulatory network analysis (CIdrgn). We consider the gene
regulatory network structure based on not only expression levels of
genes but also edge structures and regulatory effects of regulator
genes to their target genes and incorporate the comprehensive
information into a dissimilarity measure to compare the gene regulatory
networks. We then proposed a novel statistic strategy to identify
differentially regulated gene networks across phenotypes or biological
conditions and extend the proposed CIdrgn to cell line
characteristic-specific gene regulatory network analysis. To the best
of our knowledge, this is the first study on responsive network
identification based on comprehensive information of network structure.
We demonstrate the effectiveness of the proposed CIdrgn scheme through
Monte Carlo simulations. Our strategy showed outstanding performance
for identifying differentially regulated gene networks with various
structures and scales. We applied our strategy to the Cancer Dependency
Map (DepMap) dataset and performed gastric cancer (GC) and chemotherapy
(capecitabine and oxaliplatin)-responsive gene network analyses. The
identified responsive networks and their members (genes) represent
candidate biomarkers for mechanisms related to gastric/colorectal
cancer and various chemotherapies. Although biological knowledge about
the identified subnetworks and markers was not incorporated into the
differential gene regulatory network analysis, our data-driven strategy
provides biologically reliable results for GC and its
chemotherapy-responsive gene network analysis. Our results suggested
that The CXC family of chemokines and cadherin (CDH) genes may play
crucial roles in tumor invasion, metastasis, and progression of GC. In
addition, the loss of activity of the olfactory receptor (OR) family
with the ASCL1/FOS family may lead to drug resistance in cell lines and
inducing these gene families may provide vital clues to enhance the
chemotherapy efficiency of capecitabine and oxaliplatin.
The remainder of this paper is organized as follows. In the Methods
section, we introduce a novel strategy for the differential gene
regulatory network analysis. In the Monte Carlo simulation section, we
present the results of the simulation studies. In the GC and
XELOX-responsive gene regulatory network analysis section, we describe
the results of GC and its chemotherapy-responsive network analysis. In
the Discussion section, we provide the conclusions.
Methods
Suppose that
[MATH: X=(<
mi
mathvariant="bold-italic">x1,…,xn)T
∈Rn×p<
/mrow> :MATH]
is an n × p data matrix that describes the expression of p possible
regulators that control target gene transcription
[MATH: yℓ∈Rn,ℓ
=1,…,q
:MATH]
for n cell lines. To estimate the gene regulatory network, we
considered the following linear regression model:
[MATH: yℓ=
∑j=1pβℓjxj+ϵℓ,ℓ=1,…
,q, :MATH]
(1)
where β[ℓj] is the regression coefficient that represents the effect of
each regulator x[j] on its target y[ℓ] and ϵ[ℓ] = (ϵ[ℓ1], …, ϵ[ℓn])^T
is a random error vector. The network can be represented by a weighted
graph G = (V, E, W), where V is the set of vertices corresponding to p
genes and E ∈ V × V is the set of edges, where (i, j) ∈ E indicates a
link between vertices i and j (i.e., genes i and j). W = (w[ij]) is the
edge weight between vertices i and j which can be estimated using
[MATH: β^ij :MATH]
and
[MATH: β^ji :MATH]
.
Previous studies
Significance analysis of microarray for gene sets (SAM-GS)
To identify statistically significant gene sets across phenotypes A and
B, Dinu et al. [[32]1] proposed the SAM-GS method by extending the SAM
t-statistic. The SAM-GS statistic is given as follows:
[MATH: DSAM-GS
mrow>=∑j∈V
(x¯Aj-x¯Bj)2s
j+s0
, :MATH]
(2)
where
[MATH: x¯Aj :MATH]
and
[MATH: x¯Bj :MATH]
are the averages of the expression levels of gene j in phenotypes A and
B, respectively; s[0] is a tuning parameter; and s[j] is the
gene-specific scatter. This scatter is given as follows:
[MATH: s(j)=a{∑i=1nA
(xi
mi>j-x¯Aj)2+
∑k=1nB<
/mi>(xkj-x¯Bj)2}<
mo>, :MATH]
(3)
where n[A] and n[B] are the numbers of cell lines in phenotypes A and
B, respectively, and a = (1/n[A] + 1/n[B])/(n[A] + n[B] − 2). The
SAM-GS statistic measures the difference in expression levels of a gene
set across phenotypes A and B.
Gene set co-expression analysis (GSCA)
Choi and Kendziorski [[33]6] proposed GSCA a statistical method to
identify differentially co-expressed genes. For genes in a subnetwork
G, GSCA computes pairwise correlations for all
[MATH: (|V|2) :MATH]
gene pairs, where |V| is the number of genes in V. The statistic of the
GSCA used to measure the dispersion of correlations between phenotypes
A and B is given as follows:
[MATH: DGSCA=1|V|(|V|-1)/2∑k
mi>=2|V|∑j=1k-1
(ckjA-ckjB)2
msqrt>, :MATH]
(4)
where
[MATH:
ckjA :MATH]
and
[MATH:
ckjB :MATH]
are the correlations between the k^th and j^th genes for the phenotypes
A and B, respectively.
Comprehensive information-based differentially regulated gene network
analysis
In this study, we proposed a novel differential gene regulatory network
analysis strategy that can effectively identify biological
condition-responsive subnetworks based on gene expression levels and
subnetwork structures. Network structures are evaluated based on
regulatory effects between genes and the edge structure of genes, and
the following statistics are proposed to measure the dissimilarity of
the subnetworks.
* Dissimilarity based on regulatory effects between genes
We define the following regulatory effect of the j^th regulator
gene on the ℓ^th target gene in phenotypes A and B, consistent with
a study conducted earlier [[34]7]:
[MATH: rℓj(A)=<
mover
accent="true">β^Aℓjx¯Ajandrℓj<
/msub>(B)=β^Bℓjx¯Bjforj=1,…
,p,ℓ=1,…
,q, :MATH]
(5)
where
[MATH: β^Aℓj :MATH]
and
[MATH: β^Bℓj :MATH]
are the regression coefficients estimated by cell lines of
phenotypes of A and B, respectively. Let R(A) = {r[1j](A), …,
r[qj](A), r[j1](A), …, r[jp](A)}^T and R(B) = {r[1j](B), …,
r[qj](B), r[j1](B), …, r[jp](B)}^T be regulatory effect vectors of
the j^th gene in phenotypes A and B, respectively, where the first
q elements indicate the regulator effects of j^th gene on its q
targets and the remaining p elements indicate the regulator effect
of the p regulators on the j^th gene. We propose the following
statistic to describe dissimilarity of the regulatory effect of the
j^th gene:
[MATH: γj(<
/mo>A,B)=1q+p|
mo>|R(A)-R(B)||22.
:MATH]
(6)
The large value γ[j](A, B) indicates the j^th gene has different
expression levels and size of edges connected its neighborhoods
between phenotypes A and B. We then propose the following statistic
to measure dissimilarity of regulatory effects of a subnetwork G:
[MATH: Γ=1|V|∑
j∈Vγj(A,B)
. :MATH]
(7)
The large value of Γ indicates that the genes consisting of G have
different regulatory effects on their target genes across the
phenotypes, thus the network corresponding the large Γ can be
considered as the responsive network having phenotype specific
regulatory effects.
* Dissimilarity of the edge structure of nodes
The genes linked in the networks may have similar biological
functions; thus, node (gene) similarity can be defined using the
Jaccard index. We propose the following statistic for the node
similarity of the j^th gene between phenotypes A and B in line with
Li et al. [[35]8]:
[MATH: Nj(<
/mo>A,B)=|NAj∩NBj<
/msub>||
NAj∪N
Bj|, :MATH]
(8)
where N[Aj] and N[Aj] are the sets of nodes that are directly
connected to the j^th gene within phenotypes A and B, respectively.
If the gene have similar target and regulator genes between
phenotypes A and B, than we consider that the gene is not phenotype
specific feature. Thus, we propose the following Jaccard distance
metric to measure the dissimilarity of the edge structure of the
j^th gene:
[MATH: λj(<
/mo>A,B)=1
-Nj(A,B). :MATH]
(9)
We then propose the following statistic to describe the
dissimilarity of edge structures in a subnetwork:
[MATH: Λ=1|V|∑
j∈Vλj(A,B)
. :MATH]
(10)
* Dissimilarity based on adjusted regulatory effects
We also consider the following statistic based on the adjusted
regulatory effect using the Jaccard distance:
[MATH: ΓΛ=1|V|∑j∈V{<
/mo>γj(A,B)+γj(A,B)·
mo>λj(A,B)}. :MATH]
(11)
To comprehensively measure the dissimilarity of a subnetwork across
biological conditions or phenotypes, we propose the following statistic
that incorporates not only expression levels of genes but also edge
structures and regulatory effects into the differential gene regulatory
network analysis,
[MATH: CIdrgn.1=ΓΛ+<
msub>DSAM-GS, :MATH]
(12)
[MATH: CIdrgn.2=Γ+Λ+DSAM-GS<
/msub>. :MATH]
(13)
The large values of the statistics CIdrgn.1 and CIdrgn.2 indicate that
the subnetworks show different regulatory structures for phenotypes A
and B.
Responsive subnetwork extract
To extract biological condition-responsive subnetworks, we computed the
significance of the proposed statistics based on the permutation
framework.
We first compute the statistics Γ, Λ, and Γ[Λ] for the subnetworks.
Then, we generate permutation samples
[MATH:
SA(pm)<
/mo> :MATH]
and
[MATH:
SB(pm)<
/mo> :MATH]
and estimate gene regulatory networks G^(pm)(A) and G^(pm)(B) based on
[MATH:
SA(pm)<
/mo> :MATH]
and
[MATH:
SB(pm)<
/mo> :MATH]
for pm = 1, …, T. The permutation statistics Γ^(pm), Λ^(pm), and
[MATH:
ΓΛ(pm) :MATH]
were computed for G^(pm)(A) and G^(pm)(B). Because the statistics have
different scales, we normalize each statistic Γ, Λ, and Γ[Λ] to prevent
that any one dominates CIdrgn.1 and CIdrgn.2, where
[MATH: Γ=(Γ
,Γ(1),…Γ(T))T,Λ=(Λ
,Λ(1),…Λ(T))T,ΓΛ=(
mo>ΓΛ,Γ
Λ(1)<
mo>,…ΓΛ
(T))T<
/msup>. :MATH]
(14)
For notational compactness, in the remainder of this section, we shall
write the notation in [36]Eq (14) as normalized statistics.
We then computed the similarity measures CIdrgn.1 and CIdrgn.2 based on
the normalized statistics and computed the following permutation
p-values:
[MATH: p.valueCIdrgn.1=<
mrow>∑pm=1<
/mrow>TI({CIdrgn.1≤CIdrgn(pm).
1)T,
mtd> :MATH]
(15)
[MATH: p.valueCIdrgn.2=<
mrow>∑pm=1<
/mrow>TI({
CIdrgn.2≤CIdrgn
(pm).<
mn>2})T, :MATH]
(16)
where CIdrgn^(pm).1 and CIdrgn^(pm).2 are the statistics computed by
the pm^th permutation sample and I(⋅) is an indicator function. The
small permutation p-value indicates that networks G(A) and G(B) show
significantly different regulatory structures.
The algorithm for CIdrgn is given in Algorithm1. For the dissimilarity
measure CIdrgn.2, similar procedures were applied to compute the
permutation p.value[CIdrgn.2] and identify the responsive subnetworks.
The proposed statistic is based on comprehensive information of the
gene regulatory network, that is, the dissimilarity of node edge
structures, regulatory effects, and expression levels; thus, we can
effectively extract biological condition-specific gene networks.
[37]Fig 1 shows overall framework of the proposed CIdrgn.
Fig 1. Overall framework of the proposed CIdrgn.
[38]Fig 1
[39]Open in a new tab
Algorithm 1 Algorithm for CIdrgn
1: Construct gene regulatory networks for phenotypes A and B: G(A),
G(B).
2: Compute the statistics Γ, Λ and Γ[Λ].
3: Construct permutation samples
[MATH:
SA(pm)<
/mo> :MATH]
and
[MATH:
SB(pm)<
/mo> :MATH]
, estimate gene networks based on the permutation samples G^pm(A) and
G^pm(B) and compute the statistics Γ^(pm), Λ^(pm), and
[MATH:
ΓΛ(pm) :MATH]
for pm = 1, …T.
4: Normalize Γ, Λ, Γ[Λ] and then compute CIdrgn.1 and CIdrgn^(pm).1 for
pm = 1, …, pm.
5: Compute permutation p-values of the statistic CIdrgn.1:
p.value[CIdrgn.1].
6: If p.value[CIdrgn.1] < κ, then select the subnetwork as a
differentially regulated gene networks, where κ is a threshold.
7: Output the list of subnetworks.
CIdrgn for cell line characteristic-specific gene networks
We consider cell line characteristic-specific gene regulatory networks
that can be estimated using the following varying coefficient model
[[40]9]:
[MATH: yℓ=
∑j=1pβjℓ
(mα)·xj+ε<
/mi>ℓ,ℓ=1,…
,q, :MATH]
(17)
where β[jℓ](m[α]) is the varying coefficient of the j^th regulator gene
x[j] on ℓ^th target gene y[ℓ] for the α^th target sample with modulator
M = m[α], such as the cancer characteristics of the cell lines. The
varying coefficient model enables us to construct the gene regulatory
networks under varying conditions of cell line characteristics M =
m[α]. The network of the α^th cell line can be represented using a
weighted graph G[α] = (V, E[α], W[α]), where W[α] = (w[αij]) is the
edge weight, which can be computed based on
[MATH: β^ij(mα)<
/mrow> :MATH]
and
[MATH: β^ji(mα)<
/mrow> :MATH]
. E[α] ∈ V × V is the set of edges, where (i, j) ∈ E[α] indicates the
link between vertices i and j in the network of the α^th cell line.
We also propose the method for identifying responsive subnetwork from
cell line characteristic-specific gene networks.
* Dissimilarity based on the regulatory effect of the sample-specific
gene network
For the sample-specific gene network, the regulatory effect of the
j^th regulator gene on ℓ^th target gene in the α^th cell line is
given as follows:
[MATH: rαℓj=β^ℓj(mα)<
/mrow>·xαj,forj=1,…
,p,ℓ=1,…
,q, :MATH]
(18)
where x[αj] is an expression level value of the j^th gene in the
α^th cell line. We define the regulatory effect of phenotypes A and
B as follows:
[MATH: rℓj
ss(A
)=median(rαlj;
mo>α∈A)andrℓjss(B
)=median(<
mi>rαlj;α∈B), :MATH]
(19)
where
[MATH: A :MATH]
and
[MATH: B :MATH]
are the set of cell lines of phenotypes A and B, respectively. Let
[MATH: Rss
msup>(A)={r1jss(A
),…,rqj(A)
,rj1
ss(<
mi>A),…,r
jp(A<
/mi>)}T
:MATH]
and
[MATH: Rss
msup>(B)={r1jss(B
),…,rqj(B)
,rj1
ss(<
mi>B),…,r
jp(B<
/mi>)}T
:MATH]
be regulatory effect vectors of the j^th gene in the network of
α^th cell line for phenotypes A and B, respectively. We consider
the statistic to measure the dissimilarity of the regulatory effect
of the j^th gene in sample-specific gene networks as follows
[MATH: γjs
s(A,B)=1q+p||Rss(A)-Rss(B)||22. :MATH]
(20)
We then propose the following statistic to measure dissimilarity of
the regulatory effects of genes in a subnetwork G[α],
[MATH: Γss=1|V|
mo>∑j∈VΓjss(A,B)
mo>. :MATH]
(21)
* Dissimilarity of the node edge structures for sample-specific gene
network
We consider
[MATH: ∪α∈ANαj :MATH]
and
[MATH: ∪α∈BNαj :MATH]
as the sets of edges of the j^th gene in phenotypes A and B,
respectively. We then define the following node similarity:
[MATH: Njs
s(A,B)=|
(∪α∈AN<
mi>αj)∩(
∪α∈BN<
mi>αj)||(∪<
mrow>α∈AN<
mi>αj)∪(
∪α∈BN<
mi>αj)|, :MATH]
(22)
where N[αj] is the sets of nodes that are connected to the j^th
gene in the network of the α^th cell line. The Jaccard distance of
the j^th node in the cell line characteristic-specific gene network
is given as follows:
[MATH: λjs
s(A,B)=1-Njss(A
,B).
:MATH]
(23)
Similar to ([41]10), the dissimilarity measure of the edge
structure is given as follows:
[MATH: Λss=1|V|
mo>∑j∈Vλjss(A,B)
mo>. :MATH]
(24)
We then proposed the following statistics to identify differentially
regulated gene networks in the cell line characteristic-specific gene
network analysis.
[MATH: CIdrgnss.1=ΓΛss+DSAM-G
S, :MATH]
(25)
[MATH: CIdrgnss.2=Γss+Λss+DSAM-GS, :MATH]
(26)
where
[MATH: ΓΛs
s=1|<
mi>V|∑j∈V{γjss(A<
mo>,B)+γj<
/mi>ss(A,B)·λjss(A
,B)}.
mtd> :MATH]
(27)
For the cell line characteristic-specific gene network, the
condition-responsive subnetworks are extracted based on the permutation
p-value of CIdrgn^ss.1 and CIdrgn^ss.2, which are consistent with
p.value[CIdrgn.1] and p.value[CIdrgn.2].
Monte Carlo simulations
Monte Carlo simulations were conducted to investigate the performance
of the proposed CIdrgn. Suppose that we have two phenotypes A and B and
10 subnetworks consisting of 10 genes, where the sample sizes of
phenotypes A and B are n[A] = 50 and n[B] = 50, respectively. We
consider four common subnetworks of the two phenotypes A and B, six
A-specific and six B-specific subnetworks. The expression levels of the
p genes in 100 (= n[A] + n[B]) cell lines were generated using the
following procedure. In scenario 1, we generated four precision
matrices
[MATH:
Ωcnw :MATH]
for nw = 1,.., 4 common subnetworks from “random’’ graph structure
using huge.generator in R package Huge, where the off-diagonal elements
of the precision matrix were set to 0.5 and the positive number 0.2 was
added to the diagonal elements of the precision matrix, which is
consistent with [[42]10]. For each 10 genes consisting of the 4 common
subnetworks, the expression levels of 100 cell lines were generated
from
[MATH: N(010,Ωc-1
) :MATH]
. For the six A-specific and B-specific subnetworks, the precision
matrices (
[MATH:
ΩAnw :MATH]
and
[MATH:
ΩBnw :MATH]
for nw = 1, …, 6) were generated from the “random” and “band” graph
structures, respectively. For each 10 genes consisting of the each six
A- and six B-specific subnetworks, we generated expression levels of
n[A] = n[B] = 50 cell lines from
[MATH: N(010,ΩA-1
) :MATH]
and
[MATH: N(μ10,ΩB-1) :MATH]
, respectively. The expression levels of the remaining p − 100 genes
were generated from N(0[p−100], I[p]). Scenarios 2, 3, and 4 were
similar to scenario 1 except that the precision matrices
[MATH:
Ωcnw :MATH]
for nw = 1,.., 4 and
[MATH:
ΩAnw :MATH]
for nw = 1, …, 6 have “cluster”, “scale-free”, and “hub” graph
structures for scenarios 2, 3, and 4, respectively. [43]Fig 2 shows the
graph structures of the common subnetworks and A- and B-specific
subnetworks.
Fig 2. Subnetwork structures.
[44]Fig 2
[45]Open in a new tab
We considered p = 1000 and μ[10] = (0.3, 0.3, …, 0.3)^T, and the gene
regulatory networks were estimated using the lasso method [[46]11],
where the response and predictor variables are the expression levels of
the target gene and regulator genes, respectively. We evaluated the
proposed CIdrgn by comparing it with the statistics SAM-GS and GSCA.
The permutation p-values of the statistics SAM-GS and GSCA were
computed, and responsive subnetworks were extracted by the permutation
p-values with a significance level of 0.05, such as in the case of
CIdrgn.
The recall, precision, true negative rate (TNR), F-measure, and
accuracy results are given in the column labeled “subnetwork size:10”
of [47]Table 1. CIdrgn provides effective results for responsive
subnetwork identification overall, although limited differences are
observed. In particular, CIdrgn.2 shows effective results in terms of
precision, true negative rate, F-measure, and accuracy. The methods
show similar results for the true positive rate (i.e., Recall) for six
responsive network identifications, while our method provides
outstanding performance for the true negative rate (TNR) for four
common subnetworks.
Table 1. Simulation results.
Methods Subnetwork size: 10 Subnetwork size: 50
SCN.1 SCN.2 SCN.3 SCN.4 SCN.1 SCN.2 SCN.3 SCN.4
Recall SAMGS 0.970 0963 0.950 0.933 1.000 1.000 1.000 1.000
GSCA 1.000 1.000 0.993 1.000 1.000 1.000 1.000 1.000
CIdrgn.1 1.000 0.997 0.993 0.997 1.000 1.000 1.000 1.000
CIdrgn.2 0.993 0.993 0.990 0.997 1.000 1.000 1.000 1.000
Precision SAMGS 0.986 0.988 0.982 0.985 0.991 0.986 0.974 0.961
GSCA 0.966 0.970 0.954 0.969 0.961 0.971 0.972 0.958
CIdrgn.1 0.989 0.994 0.984 0.991 0.994 0.989 0.991 0.969
CIdrgn.2 0.989 0.997 0.989 0.991 0.994 0.991 0.994 0.970
TNR SAMGS 0.975 0.980 0.970 0.975 0.985 0.975 0.955 0.930
GSCA 0.940 0.945 0.915 0.945 0.930 0.950 0.950 0.925
CIdrgn.1 0.980 0.990 0.970 0.985 0.990 0.980 0.985 0.945
CIdrgn.2 0.980 0.995 0.980 0.985 0.990 0.985 0.990 0.945
F.measure SAMGS 0.975 0.974 0.962 0.954 0.995 0.992 0.986 0.979
GSCA 0.982 0.984 0.971 0.983 0.979 0.985 0.985 0.977
CIdrgn.1 0.994 0.995 0.987 0.994 0.997 0.994 0.995 0.983
CIdrgn.2 0.990 0.995 0.988 0.994 0.997 0.995 0.997 0.984
Accuracy SAMGS 0.972 0.970 0.958 0.950 0.994 0.990 0.982 0.972
GSCA 0.976 0.978 0.962 0.978 0.972 0.980 0.980 0.970
CIdrgn.1 0.992 0.994 0.984 0.992 0.996 0.992 0.994 0.978
CIdrgn.2 0.988 0.994 0.986 0.992 0.996 0.994 0.996 0.978
[48]Open in a new tab
We also consider the scenarios for a large-scale subnetwork, that is,
each 10 subnetworks consist of 50 genes. The results for the
large-scale subnetwork scenarios are presented in the column labeled
“subnetwork size:50” in [49]Table 1. The results show that our strategy
also provides outstanding performance for large-scale subnetwork
situations, where CIdrgn.2 also shows the most effective results for
differentially regulated gene network identification.
Monte Carlo simulations: Sample-specific gene network analysis
For the cell line characteristic-specific gene regulatory networks, we
considered the same network structures given in [50]Fig 2. For
sample-specific gene regulatory strength, we generated precision
matrices as follows:
[MATH: Ωijα(A
mi>)=Ωij(A)×0.5+{Ωij(A)
mo>-Ωij
msub>(A)×0.5}δα
,α=1,…
,50,Ωijα(
B)=Ωij<
/mi>(B)×
0.5+{Ωij(N
)-Ωij(B)×0
.5}δα<
mo>,α=1,…
,50. :MATH]
(28)
The gene expression levels were determined using the same method
applied in the gene network analysis with bulk cell lines except that
the expression of genes in six A-specific and six B-specific
subnetworks was generated as x[α](A) ∼ N(0[10], Ω^α(A)) and x[α](B) ∼
N(μ[10], Ω^α(B)) for α = 1, …, 50. The modulator M = (m[1], …,
m[100])^T that describes the characteristics of cell lines was
generated from the uniform distribution U(−1, 1).m The cell line
characteristic-specific gene network was estimated using SiGN-L1
([51]https://sign.hgc.jp/signl1/) [[52]7].
[53]Table 2 shows the results for cell line characteristic-specific
gene networks.
Table 2. Simulation results: Cell line characteristic analysis.
Methods Subnetwork size: 10 Subnetwork size: 50
SCN.1 SCN.2 SCN.3 SCN.4 SCN.1 SCN.2 SCN.3 SCN.4
Recall SAMGS 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
GSCA 0.990 0.993 0.980 1.000 1.000 1.000 1.000 1.000
CIdrgn.1 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
CIdrgn.2 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
Precision SAMGS 0.959 0.966 0.980 0.964 0.976 0.956 0.969 0.986
GSCA 0.966 0.962 0.966 0.961 0.964 0.963 0.967 0.961
CIdrgn.1 0.980 0.983 0.974 0.969 0.978 0.969 0.980 0.994
CIdrgn.2 0.986 0.979 0.980 0.966 0.979 0.955 0.986 0.989
TNR SAMGS 0.925 0.940 0.965 0.935 0.955 0.920 0.945 0.975
GSCA 0.940 0.930 0.940 0.930 0.935 0.935 0.940 0.930
CIdrgn.1 0.965 0.970 0.955 0.945 0.960 0.945 0.965 0.990
CIdrgn.2 0.975 0.960 0.965 0.940 0.960 0.920 0.975 0.980
F.measure SAMGS 0.977 0.982 0.989 0.980 0.987 0.976 0.983 0.992
GSCA 0.976 0.975 0.971 0.979 0.980 0.980 0.982 0.979
CIdrgn.1 0.989 0.991 0.986 0.983 0.988 0.983 0.989 0.997
CIdrgn.2 0.992 0.988 0.989 0.982 0.988 0.976 0.992 0.994
Accuracy SAMGS 0.970 0.976 0.986 0.974 0.982 0.968 0.978 0.990
GSCA 0.970 0.968 0.964 0.972 0.974 0.974 0.976 0.972
CIdrgn.1 0.986 0.988 0.982 0.978 0.984 0.978 0.986 0.996
CIdrgn.2 0.990 0.984 0.986 0.976 0.984 0.968 0.990 0.992
[54]Open in a new tab
CIdrgn provides effective results for condition-responsive subnetwork
identification in cell line characteristic-specific gene network
analyses. CIdrgn.1 outperforms in scenarios 2 and 4, while CIdrgn.2
shows an effective performance for scenarios 1 and 3.
In short, our proposed strategy provides effective results for
differentially regulated gene network identification in bulk cell line
gene networks and cell line characteristic-specific gene networks. This
finding implies that the incorporation of comprehensive information on
gene regulatory systems is crucial for differential gene regulatory
network analyses.
Gastric cancer and XELOX-responsive gene regulatory network analysis
Gastric cancer (GC) is the one of the leading causes of cancer-related
death worldwide. Although chemotherapy represents the standard
treatment for GC, it is not always effective because of drug resistance
in cancer cell lines. Uncovering crucial markers and their regulatory
networks that play key roles in the mechanisms of acquired drug
resistance are a vital issue to enhance chemotherapy efficacy of
gastric cancer.
We applied the proposed strategy to identify GC and chemotherapy
-responsive gene regulatory networks. We used a publicly available
large-scale dataset from the Cancer Dependency Map (DepMap)
([55]https://depmap.org/portal/), where the RNA expression of 19,221
genes and 1,406 cell lines from the CCLE dataset (22Q2) and the drug
sensitivity of 4,686 compounds and 578 cell lines from the PRISM
repurposing primary screen can be downloaded. Because CIdrgn.2 and
CIdrgn^ss.2 show outstanding performance in simulation studies, we use
CIdrgn.2 and CIdrgn^ss.2 for GC and XELOX-responsive gene regulatory
network analysis.
GC-responsive gene regulatory network identification
To identify the GC-responsive gene regulatory networks, we considered
41 GC and 1365 non-gastric cancer (NGC) cell lines as the two
phenotypes and then estimated GC gene regulatory networks based on the
GC cell lines. The NGC gene network was estimated using 41 randomly
selected cell lines from 1365 NGC cell lines, and the construction
process was repeated 50 times based on 41 randomly selected NGC cell
lines. The permutation GC and NGC networks were also estimated using
the 41 cell lines randomly selected from among the 1406 cell lines, and
the permutation NGC network construction process was also repeated 50
times. We then extracted subnetworks from the estimated GC and NGC
networks and permutation GC and NGC networks. For NGC networks, only
subnetworks existing in all 50 iterations were considered as NGC
networks (permutation NGC networks). The permutation of network
estimation was repeated 500 times. We extracted edges with a size
greater than 0.7 and identified the GC/NGS-responsive network based on
a permutation p-value with a significance level of 0.01. A total of 106
differentially regulated subnetworks were identified, where the
identified 106 sub-networks are gastric cancer cell line specific gene
networks, i.e., the sub-networks are not existed in NGC gene networks.
To verify the efficiency of the method, we visualized the identified
GC-specific gene regulatory networks based on subnetworks with a size
greater than 3 (i.e., a subnetwork with more than three nodes), as
shown in [56]Fig 3. The chemokine (C-X-C motif) ligand (CXCL) family
(i.e., CXCL1, CXCL2, CXCL3 and CXCL8) is identified as a GC-responsive
gene regulatory system. Moreover, the CDH gene family (CDH2 and CDH17)
has GC-responsive regulatory structures. The identified GC-responsive
markers (i.e., CXCL and CDH gene families) have been studied as GC
markers, and their involvement in mechanisms related to GC have been
demonstrated in many studies.
Fig 3. Differentially regulated gene network of GC.
[57]Fig 3
[58]Open in a new tab
CXCL family
The CXCL family plays an important role in the control of inflammation
and angiogenesis because their specific receptors are expressed by both
leukocytes and endothelial cells [[59]12]. Chen et al. [[60]13]
revealed that the CXCL family represents markers for the development of
GC and demonstrated that it is involved in the pathogenesis of GC. In
particular, the mRNA expression levels of the CXCL family have been
identified as promising prognostic indicators of Epstein-Barr
virus-associated GC [[61]14]. Zhou et al. [[62]15] revealed that the
CXCL10/CXCR3 axis promotes the invasion of GC via the PI3K/AKT
pathway-dependent matrix metalloproteinase production.
CDH gene family
The CDH gene family has been identified as a marker of tissue
morphogenesis, sorting, and cell adhesion, which are important
processes for tissue formation and maintenance.
CDH2, encoded by N-cadherin, is involved in the transformation of human
GC cells and associated with intracellular events of gastric
carcinogenesis and high expression of corresponding transcription
factors [[63]16]. GC cell invasion is upregulated by Twist and
accompanied by CDH2 [[64]17]. GC invasiveness is inhibited by
microRNA-145-5p, which targets CDH2 to suppress epithelial-mesenchymal
transition (EMT) [[65]18]. Jiang et al. [[66]18] revealed that CDH2 is
involved in the molecular mechanisms that regulate GC cell invasion and
migration, especially EMT progression [[67]19]. Aberrant CDH2
expression is involved in the invasion and metastasis of GC [[68]20].
The expression of CDH17 is increased in GC, and its silencing
suppresses the metastasis and proliferation of GC cells [[69]21]. CDH17
has been identified as a useful target molecule, and its crucial roles
in the comprehensive detection and localization of GC metastases have
been revealed earlier [[70]22]. Mutation of E-cadherin (CDH1)
represents the most common germline mutation in GC [[71]23]. In
particular, mutations in CDH1 lead to hereditary diffuse GC [[72]23,
[73]24].
To identify the biological processes involved in the identified
GC-responsive subnetworks, we performed a Kyoto Encyclopedia of Genes
and Genome (KEGG) pathway enrichment analysis of the GC-responsive
markers. [74]Fig 4 shows the enriched pathways corresponding to the
p-value (i.e., -log(p.value)). As shown in [75]Fig 4, the most enriched
pathway is the “IL-17 signaling pathway” grouping of the CXCL family
(i.e., CXCL8, LCN2, CXCL1, CXCL3, and CXCL2). Furthermore, the “viral
protein interaction with cytokine and cytokine receptor” and
“cytokine-cytokine receptor interaction” pathways were also enriched
for GC-responsive markers. The pathway “epithelial cell signaling in
Helicobacter pylori infection” was also identified as an enriched
pathway. These findings imply that GC-responsive networks may be
involved in EMT-related mechanisms of GC.
Fig 4. Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis.
[76]Fig 4
[77]Open in a new tab
XELOX-responsive gene regulatory network identification
We also performed a GC chemotherapy-responsive gene regulatory network
identification. We focused on the anticancer drugs capecitabine and
oxaliplatin, which have been used as standard treatments for gastric
and colorectal cancer, and their combination is known as the XELOX
regimen. For XELOX, we extracted cell lines that had non-missing values
in sensitivities to both capecitabine and oxaliplatin and matched the
cell lines with the expression level dataset. In total, 520 cell lines
were extracted. We then estimated 520 capecitabine- and
capecitabine-sensitive gene networks using SiGN-L1 [[78]7]. We
considered the cell lines with the top 100 highest and lowest drug
sensitivity values as drug-sensitive and drug-resistant cell lines,
respectively. The sensitive and resistant cell lines for both
capecitabine and oxaliplatin were defined as XELOX-sensitive and
XELOX-resistant cell lines, and 18 XELOX-sensitive and 24
XELOX-resistant cell lines were considered. From the 520
capecitabine-sensitive gene networks, we extracted 18 and 24 networks
corresponding to the 18 and 24 XELOX-sensitive and XELOX-resistant cell
lines, respectively. The 18 sensitive and 24 resistance permutated
networks were also randomly selected from 520 capecitabine-sensitive
gene networks. By using the permutation p-value of CIdrgn^ss.2 with a
significance level of 0.01, we identified 62 capecitabine-responsive
subnetworks. For oxaliplatin, similar processes were performed, and
nine oxaliplatin-responsive subnetworks were identified. Among the
genes in the 62 capecitabine- and 9 oxaliplatin-responsive subnetworks,
the common genes in the sensitive (resistant) networks were defined as
XELOX-sensitive (resistant) markers. We then extracted edges of the
XELOX-sensitive (resistance) markers from the capecitabine and
oxaliplatin-sensitive (resistance) gene networks and constructed
XELOX-sensitive (resistant) networks, where the edge size was given as
an average value from the 18 XELOX-sensitive and 24 XELOX-resistant
networks.
[79]Fig 5 shows the XELOX-responsive gene regulatory networks for the
XELOX-sensitive and XELOX-resistant cell lines. The networks of
XELOX-sensitive and XELOX-resistant cell lines showed different gene
regulatory structures despite presenting common genes, and their
networks were observed for both sensitive and resistant cell lines. The
subnetwork of NCKAP1L was the largest XELOX-responsive gene regulatory
network. The regulatory structure between NCKAP1L/ASCL1 and the OR
family can be considered a XELOX-sensitive gene regulatory system.
ASCL1 is an important gene in XELOX-sensitive cell lines, whereas the
sub-network disappears from sensitive to resistance cell lines.
Furthermore, the subnetwork of the FOS gene family can also be
considered a XELOX-sensitive network.
Fig 5. XELOX-responsive gene regulatory networks.
Fig 5
[80]Open in a new tab
[81]Table 3 shows the identified XELOX-responsive markers and their
association with GC/colorectal cancer and chemotherapy.
Table 3. Identified markers and evidence.
Genes Gastric cancer Colorectal cancer Chemotherapy
NCKAP1L [[82]27, [83]28] immune checkpoint inhibitors [[84]29],
doxorubicin [[85]30]
cisplatin[[86]31]
ASCL1 [[87]32–[88]34] [[89]32] platinum [[90]35], cabozantinib [[91]36]
Oxaliplatin [[92]37]
doxorubicin, octreotide [[93]38]
Olfactory Receptor family OR10G2 [[94]39–[95]41] [[96]39, [97]42,
[98]43] anthraquinone [[99]44]
OR4E1
OR4E2
OR51E2
OR51T1
OR52R1
FOS gene family FOS [[100]46] [[101]45, [102]47] oxaliplatin [[103]48,
[104]49], eribulin[[105]50]
FOSB carboplatin and paclitaxel [[106]51]
[107]Open in a new tab
NCKAP1L
The Nck-associated protein 1-like (NCKAP1L) gene is known as a DNA
methylation-driven gene that promotes the malignancy of cancer cells
[[108]25, [109]26]. The expression of NCKAP1 is considered a marker of
colon cancer [[110]27]. NCKAP1 has been identified as a chemotherapy
marker of immune checkpoint inhibitors [[111]29]. NCKAP1 is associated
with long-term DOX-induced cognitive dysfunction [[112]30] Stark et al.
[[113]31] reported an association between NCKAP1 levels and
cisplatin-induced apoptosis and cytotoxicity.
ASCL1
Expression levels of ASCL1 were identified as crucial markers of
various GC subtypes, that is, lower expression of ASCL1 was observed in
gastric mixed adenocarcinoma, gastrointestinal stromal tumors, and
gastric intestinal-type adenocarcinoma [[114]32]. In addition,
overexpression of ASCL1 mRNA was observed in gastrointestinal carcinoid
tumor cells compared with that in normal tissue [[115]34]. ASCL1 was
identified as a DEG between GC and normal tissue and an upregulated
transcription factor in the gene network of the DEGs [[116]33].
Furthermore, the downregulated expression of ASCL1 has been identified
as a marker of GC and colon cancer [[117]32]. Augustyn et al. [[118]36]
revealed that the ASCL1 pathway is a promising therapeutic target of
cabozantinib for lung cancer. ASCL1 is involved in the anticancer
mechanisms of doxorubicin and octreotide in neuroendocrine cancer
[[119]38].
OR family
OR family activation is a crucial marker for cancer cell growth and
progression [[120]39]. The OR family promotes the invasiveness and
metastasis of colorectal cancer cells [[121]43]. OR2B6 was detectable
in colon and gastric carcinoma cell lines [[122]39]. The long noncoding
RNA of OR3A4 promotes metastasis and tumorigenicity in GC [[123]40].
OR51B5 has been identified as a candidate oncogene in scirrhous-type GC
[[124]41].
Morita et al. [[125]42] identified OR7C1 as a novel marker of colon
cancer-initiating cells and a potent target of immunotherapy.
Activation of the OR family members induces apoptosis and reduces cell
migration and proliferation in colorectal cancer cells [[126]39]. The
OR family (i.e., OR1D2, OR4F15, and OR1A1) was also disrupted in
colorectal cancer, and the importance of OR activity-associated genes
in colorectal cancer was identified by Li et al. [[127]43]. Choi et al.
[[128]44] found that the OR family is an important target for drug
discovery and represents a crucial anticancer target.
FOS family (AP-1 transcription factor family)
The FOS gene family is involved in malignant transformation in several
types of cancer, and its members (FOS, FOSB, FOSL1, and FOSL2) play
crucial roles in cell proliferation and cancer development [[129]45].
Abnormal expression of FOSB is a signature of tumor progression and
poor survival. Overexpression of FOSB suppresses cell proliferation,
clone formation, and migration, whereas silencing of FOSB leads to
proliferation, clone formation, and migration in GC cell lines
[[130]46]. Tang et al. [[131]46] suggested that FOSB could be a useful
biomarker for the diagnosis and prognosis of GC. FOS family members
have been identified as novel prognostic markers and therapeutic
targets in colorectal cancer [[132]45, [133]47]. The FOS family is
involved in the antitumor efficacy of oxaliplatin, and the importance
of the effect of oxaliplatin ‘on FOS signaling has been revealed
[[134]48, [135]49].
Dong et al. [[136]50] demonstrated that upregulated FOS expression is a
marker of low sensitivity to eribulin. Javellana et al. [[137]51]
revealed potentially clinically relevant roles of increased FOS family
expression in driving the initial responses of tumors to carboplatin
and paclitaxel and suggested that the FOS family will be a useful
druggable target to counteract chemotherapy resistance.
For the XELOX-sensitive and XELOX-resistant subnetworks in [138]Fig 5,
we performed a Gene Ontology (GO) enrichment analysis. [139]Fig 6 shows
the significant GO terms corresponding to a p-value less than 0.01,
where the enriched pathways of XELOX-sensitive and XELOX-resistant
subnetworks are shown in red and blue, respectively. As shown in
[140]Fig 6, the XELOX-sensitive (red) and XELOX-resistant (blue)
markers are enriched in different pathways, although the networks have
common genes.
Fig 6. Gene Ontology (GO) pathway analysis of XELOX-sensitive (red) and
XELOX-resistant (blue) specific markers.
[141]Fig 6
[142]Open in a new tab
The most enriched pathway of the XELOX-sensitive markers is the
“olfactory receptor activity” grouping of the OR family (i.e., OR4E2,
OR4E1, OR10G2, OR52R1, OR51E2, OR51T1). Furthermore, the pathways
related to “DNA binding”, “steroid hormone”, and “G protein-coupled
receptor” were also identified as XELOX-sensitive gene markers. These
findings imply that the XELOX-sensitive gene regulatory network is
dominated by G protein-coupled receptors because the OR family belongs
to the superfamily of G protein-coupled receptors [[143]52]. For
XELOX-resistance markers, the most enriched pathway is
“neurotransmitter transport”, and it consists SV2B and CHAT, which have
crucial molecular functions as drug targets [[144]53].
Our results and those of previous studies show that the CXC family of
chemokines may play a crucial role in the progression of GC.
Furthermore, identifying the EMT-related mechanism involving the CDH
gene family is essential for understanding the metastasis,
proliferation, and development of GC and revealing the mechanism of the
gene family in EMT-associated GC is vital to understanding the
progression of GC. The GS- and anti-cancer drugs response- related
mechanisms of ASCL1 that was identified as XELOX sensitive-specific hub
gene were revealed in previous studies. Although its mechanisms of
acquiring gastric cancer drug resistance have not yet been clearly
demonstrated, our results and literatures suggest that the loss of
activities and molecular interactions of the OR family with the ASCL1
and FOS families may represent a molecular mechanism underlying
acquired drug resistance in GC cell lines.
Conclusions
In this study, we introduce a novel strategy for differential gene
regulatory network analysis. To identify condition-responsive gene
regulatory networks describing biological condition-specific regulatory
characteristics, we propose a computational method that measures the
dissimilarity of subnetworks based on not only gene expression, but
also regulatory effects between genes and edge structures. We also
extended the proposed method to cell line characteristic-specific gene
network analysis. Our method performs differential gene regulatory
network analysis based on comprehensive information of the network
structure; thus, we can effectively identify biological
condition-responsive gene networks that cannot be detected using
methods based on differences in gene expression levels. Monte Carlo
simulations showed that our strategy demonstrates outstanding
performance for responsive gene network identification in various
network structure scenarios. It implies that incorporation of various
network structures not only expression levels of genes is crucial to
identify responsive networks.
We applied the proposed CIdrgn to the DepMap dataset and performed an
analysis of GC and chemotherapy (XELOX) -responsive gene regulatory
networks. Our strategy identified the molecular interplays of CXCL
family (i.e., CXCL1, CXCL2, CXCL3 and CXCL8) and CDH gene family as GS
responsive networks. For the XELOX-responsive gene networks, the
regulatory structure between NCKAP1L/ASCL1 and the OR family are
identified. The identified markers from the GC and XELOX-responsive
network analysis provide strong evidence for the mechanisms related to
GC, colorectal cancer, and various chemotherapies.
Our results suggest that the CXC family of chemokines and CDH genes may
play a crucial role in GC tumor invasion, metastasis, and progression.
Moreover, our findings indicate that revealing the mechanisms
associated with genes in EMT-associated GC will be key to understanding
GC progression. We also suggest that the loss of activity of the OR
family with the ASCL1/FOS family may lead to drug resistance in cell
lines, and inducers of this gene family can provide crucial information
to improve the chemotherapy efficiency of XELOX.
Acknowledgments