Abstract
A gene network associated with Alzheimer’s disease (AD) is constructed
from multiple data sources by considering gene co-expression and other
factors. The AD gene network is divided into modules by ClusterONE,
Markov Clustering (MCL), Community Clustering (Glay) and Molecular
Complex Detection (MCODE). These division methods are then evaluated by
network structure entropy, and MCODE is identified as the optimal
division method. Through functional enrichment analysis, the functional
modules are identified. Furthermore, we use network topology properties
to predict essential genes. In addition, a logistic regression algorithm
under a Bayesian framework is used to predict essential genes of AD.
Based on network pharmacology, the herb-active compound-active compound
target networks of four AD herbs and the AD common core network are
visualized, and the better herbs and herb compounds for AD are then
selected through enrichment analysis.
Keywords: Alzheimer’s disease, network pharmacology, network entropy,
network topology, Bayesian algorithm, logistic regression algorithm
1. Introduction
Alzheimer’s disease (AD) is a chronic age-associated neurodegenerative
disorder for which there are no definitive treatments or prophylactic
agents. Its pathological features include senile plaques,
neurofibrillary tangles, and massive loss of neurons [1]. Because its
pathogenesis is not clear, commonly used clinical drugs can only relieve
symptoms for a certain period of time and cannot fundamentally improve
the disease.
Network pharmacology relates drug targets to human disease genes. On
the basis of the “drug-target gene-disease gene” network, the effects of
different drugs on different target proteins are evaluated by using
network analysis methods [2,3].
Many computational methods have been employed in different application
fields. Gianni D’Angelo and Francesco Palmieri proposed a novel
autoencoder-based deep neural network architecture, in which multiple
autoencoders are embedded with convolutional and recurrent neural
networks to elicit relevant knowledge about the relations among the
basic features (spatial features) and their evolution over time [4].
The same authors described the use of Genetic Programming for the
diagnosis and modeling of aerospace structural defects; the resulting
approach aims at extracting such knowledge by providing a mathematical
model of the considered defects, which can be used for recognizing other
similar ones [5]. Zhang et al. proposed a Bayesian regression approach
to explain similarities of disease phenotypes by using diffusion kernels
of one or several protein-protein interaction (PPI) networks [6].
Chen et al. proposed two improved Markov random field (MRF) algorithms,
which can automatically assign weights to different data sources by
using Gibbs sampling [7,8]. Chen et al. also proposed a fast and
high-performance multiple data integration algorithm [9] for
identifying human disease genes, in which a logistic regression based
algorithm is extended to the multiple data integration case so that the
parameters (weights) of different data sources can be tuned
automatically.
In this paper, AD genes are collected from multiple databases, and the
gene network of AD is constructed by considering factors such as gene
co-expression and metabolic relationships. The gene network is divided
into modules by ClusterONE [10], MCL [11], Glay [11] and MCODE [11,12].
These division methods are then evaluated by network structure entropy,
and MCODE is identified as the optimal division method. Through
functional enrichment analysis, the functional modules are identified.
Furthermore, essential genes are predicted by analyzing the network
topology characteristics of these functional modules. In addition, the
integrated algorithm (a logistic regression algorithm under a Bayesian
framework) is used to predict AD’s essential genes. The final predicted
essential genes are obtained by combining these two results.
According to traditional Chinese medicine, AD is located in the brain
but is closely associated with the kidneys, liver, heart, spleen, and
other viscera [13,14]. Compound herbs are characterized by multiple
components and multiple targets. In this study, we screen out the
effective herb compounds for the treatment of AD by identifying the
essential genes of AD, the herb-active compound-active target gene
network, and the common core network of AD [15,16].
2. Materials and Methods
2.1. Data Preparation
Data Sources
Some common herbs for treating AD are KXS (Kaixinsan), DYSYS
(Dangguishaoyaosan), YGS (Yigansan) and YQTYT (Yiqitongyutang). The
compounds of these four herbs were obtained [17,18] (see Supplementary
Table S1). Their active targets were obtained from the Traditional
Chinese Medicine Systems Pharmacology (TCMSP) database [19]. The
AD-associated genes were collected from the National Center for
Biotechnology Information (NCBI) database [20], the Online Mendelian
Inheritance in Man (OMIM) database [21], and the Therapeutic Target
Database (TTD) [22]. The PPI dataset was derived from the IntAct
Molecular Interaction Database (IntAct) [23]. The human gene expression
profiles were obtained from the Gene Expression Omnibus (GEO) database
[24]. The pathway datasets were obtained from the Kyoto Encyclopedia of
Genes and Genomes (KEGG) database [25]. The human protein complexes
were obtained from the Comprehensive Resource of Mammalian protein
complexes (CORUM) database [26].
2.2. Methods
2.2.1. Prediction of Essential Genes based on Modular Network
Network Module Partition Method
According to how network nodes are distributed among modules, module
division methods can be divided into overlapping and non-overlapping
methods. Four common algorithms, MCODE, MCL, Glay, and ClusterONE, are
used to divide the network. The first three are non-overlapping
algorithms, while the last one is an overlapping algorithm. In this
paper, these four module partition methods are used to divide the AD
network.
Entropy
Recently, Shannon entropy has been introduced to measure certain
properties of networks, where it is known as “network entropy”. Its
value can effectively assess the stability of the network: the smaller
the value of network entropy, the more stable the network [27]. Network
structure entropy is used here as the evaluation method. Let $N$ and
$k_i$ denote the number of nodes and the degree of the $i$-th node,
respectively. The entropy of a network [28] is defined as follows:
$$E = -\sum_{i=1}^{N} I_i \ln I_i, \quad \text{where } I_i = \frac{k_i}{\sum_{j=1}^{N} k_j}$$
(1)
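Equation (1) can be evaluated directly from the node degrees of a module. The following is a minimal Python sketch, assuming an undirected edge list as input; the function names and the toy edge lists (with placeholder node names) are illustrative and are not taken from the paper.

```python
import math


def degrees_from_edges(edges):
    """Count node degrees from an undirected edge list."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    return degree


def network_structure_entropy(degrees):
    """Return E = -sum(I_i * ln I_i) with I_i = k_i / sum_j k_j (Equation (1))."""
    degrees = list(degrees)
    total = sum(degrees)
    if total == 0:
        raise ValueError("the network has no edges")
    entropy = 0.0
    for k in degrees:
        if k > 0:  # a zero-degree node contributes nothing (x ln x -> 0)
            importance = k / total
            entropy -= importance * math.log(importance)
    return entropy


# Hypothetical toy modules; node names are placeholders, not AD genes.
star = [("A", "B"), ("A", "C"), ("A", "D")]
ring = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]

star_entropy = network_structure_entropy(degrees_from_edges(star).values())
ring_entropy = network_structure_entropy(degrees_from_edges(ring).values())
```

In this toy case the ring, whose degrees are all equal, reaches the maximum value ln N for N = 4, while the star, dominated by a single hub, has a lower entropy.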
Prediction of Functional Gene Modules
The correlation between the original AD network and the divided module
networks is assessed based on gene function enrichment analysis and
association indices [29,30]. The Jaccard association index is often used
to evaluate the functional correlation between each module and the
original network [29]. In addition, Fuxman Bass et al. surveyed many
association indices, such as Simpson, Geometric, Cosine, and PCC
(Pearson Correlation Coefficient) [31]. Zhu and Qiao et al. further
extended the PCC association index to measure the correlation between
each module and the function of the original network [32], as shown in
Table 1.
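Table 1. Correlation index.
Correlation Index | Formula | Meaning
Jaccard | $J_{OC} = \frac{\lvert O \cap C_i \rvert}{\lvert O \cup C_i \rvert}$ | The range of values is [0, 1], and the closer it is to 1, the stronger the correlation.
Simpson | $S_{OC} = \frac{\lvert O \cap C_i \rvert}{\min(\lvert O \rvert, \lvert C_i \rvert)}$ |
Geometric | $G_{OC} = \frac{\lvert O \cap C_i \rvert^{2}}{\lvert O \rvert \cdot \lvert C_i \rvert}$ |
Cosine | $C_{OC} = \frac{\lvert O \cap C_i \rvert}{\sqrt{\lvert O \rvert \cdot \lvert C_i \rvert}}$ |
PCC | $PCC_{OC} = \frac{\lvert O \cap C_i \rvert \cdot n - \lvert O \rvert \cdot \lvert C_i \rvert}{\sqrt{\lvert O \rvert \cdot \lvert C_i \rvert \cdot (n - \lvert O \rvert) \cdot (n - \lvert C_i \rvert)}}$ |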
Here, $O$ represents the set of pathways of the original network, and
$C_i$ represents the set of pathways of the $i$-th module after
partition.
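The indices in Table 1 reduce to simple set arithmetic on the two pathway sets. The sketch below is one possible Python rendering, assuming the usual forms of these indices (square roots in the Cosine and PCC denominators, and the size of the smaller set in the Simpson denominator). The text does not define `n`; it is taken here to be the total number of pathways in the reference universe, which is an assumption. The pathway identifiers are placeholders.

```python
import math


def association_indices(original, module, n):
    """Compute the Table 1 indices between pathway sets O and C_i.

    original, module: iterables of pathway identifiers (O and C_i).
    n: assumed size of the pathway universe, used only by PCC.
    """
    o, c = set(original), set(module)
    if not o or not c:
        raise ValueError("both pathway sets must be non-empty")
    shared = len(o & c)
    size_o, size_c = len(o), len(c)
    pcc_denominator = math.sqrt(size_o * size_c * (n - size_o) * (n - size_c))
    # PCC is undefined when either set covers the whole universe.
    pcc = (shared * n - size_o * size_c) / pcc_denominator if pcc_denominator else float("nan")
    return {
        "jaccard": shared / len(o | c),
        "simpson": shared / min(size_o, size_c),
        "geometric": shared ** 2 / (size_o * size_c),
        "cosine": shared / math.sqrt(size_o * size_c),
        "pcc": pcc,
    }


# Hypothetical pathway sets; identifiers are placeholders.
original_pathways = {"p1", "p2", "p3", "p4"}
module_pathways = {"p3", "p4", "p5"}
scores = association_indices(original_pathways, module_pathways, n=10)
```

With these definitions the Geometric index is the square of the Cosine index, so the two always rank modules in the same order; PCC differs from the others in that it can be negative when the overlap is smaller than expected by chance.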
Screening of Essential Genes
Research on essential genes can help us to understand the biology of
the disease. Various tools have been developed to predict and assess
the essential genes in a network [33]. In this paper, the network
topology attributes of the functional modules are analyzed by 11 indexes
of cytoHubba [33]: degree centrality (DC), betweenness centrality (BC),
closeness centrality (CC), density of maximum neighborhood component
(DMNC), maximum neighborhood component (MNC), bottleneck (BN), edge
percolated component (EPC), maximum clique centrality (MCC), edge
clustering coefficient (ECC), radiality and clustering coefficient.
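To illustrate how topology indexes rank candidate genes, the sketch below computes two of the listed measures, degree centrality and closeness centrality, in plain Python. It uses the textbook definitions (neighbor count, and reachable nodes divided by the sum of shortest-path lengths), which may differ in normalization from the cytoHubba implementation; the edge list and gene names are hypothetical.

```python
from collections import deque


def build_adjacency(edges):
    """Build an undirected adjacency map from an edge list."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def degree_centrality(adjacency):
    """Degree centrality as the raw number of neighbors."""
    return {node: len(neighbors) for node, neighbors in adjacency.items()}


def closeness_centrality(adjacency):
    """Closeness as (reachable nodes) / (sum of shortest-path lengths)."""
    scores = {}
    for source in adjacency:
        distance = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search gives unweighted shortest paths
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in distance:
                    distance[neighbor] = distance[node] + 1
                    queue.append(neighbor)
        total = sum(distance.values())
        scores[source] = (len(distance) - 1) / total if total else 0.0
    return scores


def top_ranked(scores, k):
    """Return the k highest-scoring nodes, breaking ties by name."""
    return sorted(scores, key=lambda node: (-scores[node], node))[:k]


# Hypothetical functional module; gene names are placeholders.
edges = [("G1", "G2"), ("G1", "G3"), ("G1", "G4"), ("G4", "G5")]
adjacency = build_adjacency(edges)
degree = degree_centrality(adjacency)
closeness = closeness_centrality(adjacency)
hub_candidates = top_ranked(degree, 2)
```

Genes that rank highly under several indexes at once are the natural candidates for essential genes; combining the rankings is left out of this sketch.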
2.2.2. Integrated Algorithm for Predicting Essential Genes
Chen et al. proposed a fast and high-performance multiple data
integration algorithm for identifying human disease genes [9]. The
disease gene identification problem is first formulated as a binary
classification problem, and the feature vector of each gene is
extracted from the integrated network. By combining the binary logistic
regression model with maximum likelihood estimation and a Bayesian
approach, the model parameters are estimated and the posterior
probability of each gene is calculated. The final decision score is
obtained by calculating the percentage of each individual posterior
probability.
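The logistic regression step can be pictured with a small, generic example. The sketch below fits a binary logistic regression by maximum likelihood (batch gradient ascent) and returns a probability for each gene. It is not the algorithm of [9]: the Bayesian prior, the multi-source weighting and the final decision score are omitted, and the single feature (imagined here as the fraction of a gene's neighbors that are known disease genes) and all data values are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_probability(weights, x):
    """Probability of the positive class; weights[0] is the bias term."""
    z = weights[0] + sum(w * v for w, v in zip(weights[1:], x))
    return sigmoid(z)


def fit_logistic(features, labels, learning_rate=0.5, epochs=5000):
    """Maximum-likelihood fit by batch gradient ascent on the log-likelihood."""
    n_features = len(features[0])
    weights = [0.0] * (n_features + 1)
    for _ in range(epochs):
        gradient = [0.0] * (n_features + 1)
        for x, y in zip(features, labels):
            error = y - predict_probability(weights, x)
            gradient[0] += error
            for j, value in enumerate(x):
                gradient[j + 1] += error * value
        weights = [w + learning_rate * g / len(features)
                   for w, g in zip(weights, gradient)]
    return weights


# Hypothetical training data: one feature per gene, label 1 = disease gene.
features = [[0.1], [0.2], [0.3], [0.7], [0.8], [0.9]]
labels = [0, 0, 0, 1, 1, 1]
weights = fit_logistic(features, labels)
low_score = predict_probability(weights, [0.1])
high_score = predict_probability(weights, [0.9])
```

Genes can then be ranked by the returned probability; in the integrated algorithm this role is played by the posterior probability obtained under the Bayesian framework.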
Acquisition of Prior Probability of Genes
Suppose the integrated network contains genes $g_1, \ldots, g_{n+m}$,
in which $g_1, \ldots, g_n$ are the unknown ones and
$g_{n+1}, \ldots, g_{n+m}$ are the known ones in the OMIM database.
Similar to the method used in references [8,9], for