Abstract

   A gene network associated with Alzheimer’s disease (AD) is constructed
   from multiple data sources by considering gene co-expression and other
   factors. The AD gene network is divided into modules by ClusterONE,
   Markov Clustering (MCL), Community Clustering (Glay) and Molecular
   Complex Detection (MCODE). These division methods are then evaluated by
   network structure entropy, and MCODE is identified as the optimal
   division method. Through functional enrichment analysis, the
   functional modules are identified. Furthermore, we use network topology
   properties to predict essential genes. In addition, a logistic
   regression algorithm under a Bayesian framework is used to predict
   essential genes of AD. Based on network pharmacology, the herb-active
   compound-active compound target networks of four kinds of AD herbs and
   the AD common core network are visualized, and the better herbs and
   herb compounds for AD are then selected through enrichment analysis.

   Keywords: Alzheimer’s disease, network pharmacology, network entropy,
   network topology, Bayesian algorithm, logistic regression algorithm

1. Introduction

   Alzheimer’s disease (AD) is a chronic age-associated neurodegenerative
   disorder for which there are no definitive treatments or prophylactic
   agents. Its pathological features include senile plaques,
   neurofibrillary tangles, and massive loss of neurons [[28]1]. As its
   pathogenesis is not clear, commonly used clinical drugs can only
   relieve symptoms for a certain period of time and cannot fundamentally
   improve the disease.

   Network pharmacology relates drug targets to human disease genes. Based
   on an understanding of the “drug-target gene-disease gene” network, the
   effects of different drugs on different target proteins are evaluated
   using network analysis methods [[29]2,[30]3].

   Many different computational methods have been employed in different
   application fields. Gianni D’Angelo and Francesco Palmieri proposed a
   novel autoencoder-based deep neural network architecture, in which
   multiple autoencoders are embedded with convolutional and recurrent
   neural networks to elicit relevant knowledge about the relations among
   the basic features (spatial features) and their evolution over time
   [[31]4]. The same authors described the use of Genetic Programming for
   the diagnosis and modeling of aerospace structural defects; the
   resulting approach aims at extracting such knowledge by providing a
   mathematical model of the considered defects, which can be used for
   recognizing other similar ones [[32]5]. Zhang et al. proposed a
   Bayesian regression approach to explain similarities of disease
   phenotypes by using diffusion kernels of one or several
   protein-protein interaction (PPI) networks [[33]6]. Chen et al.
   proposed two improved Markov random field (MRF) algorithms, which can
   automatically assign weights to different data sources using Gibbs
   sampling processes [[34]7,[35]8]. Chen et al. also proposed a fast and
   high-performance multiple data integration algorithm [[36]9] for
   identifying human disease genes, in which a logistic regression based
   algorithm is extended to the multiple data integration case and the
   parameters (weights) of different data sources can be tuned
   automatically.

   In this paper, AD genes are collected from multiple databases, and the
   gene network of AD is constructed by considering factors such as gene
   co-expression and metabolic relationships. The gene network is divided
   into modules by ClusterONE [[37]10], MCL [[38]11], Glay [[39]11] and
   MCODE [[40]11,[41]12]. These division methods are then evaluated by
   network structure entropy, and MCODE is identified as the optimal
   division method. Through functional enrichment analysis, the functional
   modules are identified. Furthermore, essential genes are predicted by
   analyzing the network topology characteristics of these functional
   modules. In addition, an integrated algorithm (a logistic regression
   algorithm under a Bayesian framework) is used to predict AD’s essential
   genes. The final predicted essential genes are obtained by analyzing
   the two sets of results above.

   AD is located in the brain, but according to traditional Chinese
   medicine it is closely associated with the kidneys, liver, heart,
   spleen, and other viscera [[42]13,[43]14]. Compound herbs are
   characterized by multiple components and multiple targets. In this
   study, we screen out the effective herb compounds for the treatment of
   AD by identifying the essential genes of AD, the herb-active
   compound-active target gene network, and the common core network of AD
   [[44]15,[45]16].

2. Materials and Methods

2.1. Data Preparation

Data Sources

   Some common herbs for treating AD are KXS (Kaixinsan), DYSYS
   (Dangguishaoyaosan), YGS (Yigansan) and YQTYT (Yiqitongyutang). The
   compounds of these four herbs were obtained from [[46]17,[47]18] (see
   [48]Supplementary Table S1). Their active targets were obtained from
   the Traditional Chinese Medicine Systems Pharmacology (TCMSP) database
   [[49]19]. The AD-associated genes were collected from the National
   Center for Biotechnology Information (NCBI) database [[50]20], the
   Online Mendelian Inheritance in Man (OMIM) database [[51]21], and the
   Therapeutic Target Database (TTD) [[52]22]. The PPI dataset was derived
   from the IntAct Molecular Interaction Database (IntAct) [[53]23]. The
   human gene expression profiles were obtained from the Gene Expression
   Omnibus (GEO) database [[54]24]. The pathway datasets were obtained
   from the Kyoto Encyclopedia of Genes and Genomes (KEGG) database
   [[55]25]. The human protein complexes were taken from the Comprehensive
   Resource of Mammalian protein complexes (CORUM) database [[56]26].

2.2. Methods

2.2.1. Prediction of Essential Genes based on Modular Network

Network Module Partition Method

   According to whether a network node may belong to more than one
   module, module division methods can be classified as overlapping or
   non-overlapping. Four common algorithms, MCODE, MCL, Glay, and
   ClusterONE, are used to divide the network. The first three are
   non-overlapping algorithms, while the last one is an overlapping
   algorithm. In this paper, these four module partition methods are used
   to divide the AD network.
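
   The distinction between overlapping and non-overlapping partitions can
   be illustrated with a minimal Python sketch. The function names and the
   toy modules below are hypothetical and serve only as labels; the sketch
   does not reproduce the output of MCODE, MCL, Glay, or ClusterONE.

```python
from collections import Counter


def shared_nodes(modules):
    """Return the set of nodes that appear in more than one module."""
    counts = Counter(node for module in modules for node in set(module))
    return {node for node, c in counts.items() if c > 1}


def is_overlapping(modules):
    """A partition is overlapping if at least one node is shared."""
    return len(shared_nodes(modules)) > 0


# Hypothetical toy partitions (node labels are placeholders, not results).
non_overlapping = [{"n1", "n2"}, {"n3", "n4"}]
overlapping = [{"n1", "n2"}, {"n2", "n3"}]

print(is_overlapping(non_overlapping))  # no node is shared
print(shared_nodes(overlapping))        # node n2 sits in both modules
```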

Entropy

   Recently, “Shannon entropy” has been introduced to measure properties
   of networks, in which context it is also known as “network entropy”.
   Its value can effectively assess the stability of the network: the
   smaller the network entropy, the more stable the network [[57]27].
   Network structure entropy is used here as the evaluation method. Let
   $N$ and $k_i$ denote the number of nodes and the degree of the i-th
   node, respectively. The entropy of a network [[58]28] is defined as
   follows:

   $$E = -\sum_{i=1}^{N} I_i \ln I_i, \quad \text{where} \quad I_i = \frac{k_i}{\sum_{j=1}^{N} k_j}$$
   (1)
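
   As a worked illustration of Equation (1), the following Python sketch
   computes the network structure entropy from a list of node degrees. The
   function name and the two toy degree sequences are assumptions made for
   illustration; nodes of degree zero are skipped, since their term
   vanishes in the limit.

```python
import math


def network_structure_entropy(degrees):
    """E = -sum_i I_i * ln(I_i), with I_i = k_i / sum_j k_j (Equation (1))."""
    total = sum(degrees)
    if total == 0:
        raise ValueError("the network has no edges")
    entropy = 0.0
    for k in degrees:
        if k > 0:  # a zero-degree node contributes nothing to the sum
            importance = k / total
            entropy -= importance * math.log(importance)
    return entropy


# Toy degree sequences for two 4-node networks (illustrative only).
ring_degrees = [2, 2, 2, 2]  # every node has the same degree
star_degrees = [3, 1, 1, 1]  # one hub and three leaves

print(network_structure_entropy(ring_degrees))
print(network_structure_entropy(star_degrees))
```

   For the uniform degree sequence the entropy reaches its maximum, ln N,
   while the hub-dominated sequence gives a smaller value.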

Prediction of Functional Gene Modules

   The correlation between AD original network and divided module network
   is discussed based on gene function enrichment analysis and association
   indices [[59]29,[60]30]. Jaccard association index is often used to
   evaluate the functional correlation between each module and the
   original network [[61]29]. In addition, Fuxman Bass Juan et al. survey
   many association indices, such as Simpson, Geometric, Cosine, PCC
   (Pearson Correlation Coefficient) [[62]31]. Zhu and Qiao et al. further
   extend the PCC association index to measure the correlation between
   each module and the function of the original network [[63]32], as shown
   in [64]Table 1.

Table 1.

   Correlation index.
   | Correlation Index | Formula | Meaning |
   |---|---|---|
   | Jaccard | $J_{OC} = \frac{\lvert O \cap C_i \rvert}{\lvert O \cup C_i \rvert}$ | The range of values is [0, 1], and the closer it is to 1, the stronger the correlation. |
   | Simpson | $S_{OC} = \frac{\lvert O \cap C_i \rvert}{\min(\lvert O \rvert, \lvert C_i \rvert)}$ | |
   | Geometric | $G_{OC} = \frac{\lvert O \cap C_i \rvert^{2}}{\lvert O \rvert \cdot \lvert C_i \rvert}$ | |
   | Cosine | $C_{OC} = \frac{\lvert O \cap C_i \rvert}{\sqrt{\lvert O \rvert \cdot \lvert C_i \rvert}}$ | |
   | PCC | $PCC_{OC} = \frac{\lvert O \cap C_i \rvert \cdot n - \lvert O \rvert \cdot \lvert C_i \rvert}{\sqrt{\lvert O \rvert \cdot \lvert C_i \rvert \cdot (n - \lvert O \rvert) \cdot (n - \lvert C_i \rvert)}}$ | |

   Here, $O$ represents the set of pathways of the original network, and
   $C_i$ represents the set of pathways of the i-th module after
   partition.
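
   The five association indices of Table 1 can be sketched in Python as
   follows. This is an illustrative sketch under stated assumptions: the
   function name and pathway labels are hypothetical, the Simpson index is
   taken to use the smaller of the two set sizes in its denominator, and
   `n` is assumed to be the total number of pathways in the background
   set, which the text does not define explicitly.

```python
import math


def association_indices(original, module, n):
    """Association indices between pathway sets O (original) and C_i (module).

    n is ASSUMED to be the size of the background pathway universe.
    Both sets must be non-empty and smaller than n.
    """
    o, c = set(original), set(module)
    inter = len(o & c)
    size_o, size_c = len(o), len(c)
    return {
        "jaccard": inter / len(o | c),
        "simpson": inter / min(size_o, size_c),
        "geometric": inter ** 2 / (size_o * size_c),
        "cosine": inter / math.sqrt(size_o * size_c),
        "pcc": (inter * n - size_o * size_c)
        / math.sqrt(size_o * size_c * (n - size_o) * (n - size_c)),
    }


# Hypothetical pathway labels, not taken from the enrichment results.
O = {"p1", "p2", "p3", "p4"}
C_i = {"p3", "p4", "p5"}
indices = association_indices(O, C_i, n=10)
for name, value in indices.items():
    print(f"{name}: {value:.4f}")
```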

Screening of Essential Genes

   Research on essential genes can help us to understand the biology of
   the disease. Various tools have been developed to predict and assess
   the essential genes in a network [[66]33]. In this paper, the network
   topology attributes of the functional modules are analyzed by 11
   indexes of cytoHubba [[67]33]: degree centrality (DC), betweenness
   centrality (BC), closeness centrality (CC), density of maximum
   neighborhood component (DMNC), maximum neighborhood component (MNC),
   bottleneck (BN), edge percolated component (EPC), maximum clique
   centrality (MCC), edge clustering coefficient (ECC), radiality and
   clustering coefficient.
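
   To make the idea of topology-based ranking concrete, the sketch below
   computes three of the listed indexes (degree, closeness centrality, and
   clustering coefficient) from their textbook definitions on a toy
   undirected graph. It is not the cytoHubba implementation, whose scoring
   and normalization may differ; the function names and the edge list are
   hypothetical.

```python
from collections import deque


def build_adjacency(edges):
    """Undirected adjacency sets from an edge list (self-loops ignored)."""
    adj = {}
    for u, v in edges:
        if u != v:
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
    return adj


def closeness(adj):
    """(reachable nodes) / (sum of shortest-path lengths), via BFS."""
    result = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total = sum(dist.values())
        result[source] = (len(dist) - 1) / total if total else 0.0
    return result


def clustering_coefficient(adj):
    """Fraction of neighbor pairs of each node that are themselves linked."""
    result = {}
    for node, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            result[node] = 0.0
            continue
        links = sum(1 for u in nbrs for v in nbrs if u < v and v in adj[u])
        result[node] = 2 * links / (k * (k - 1))
    return result


# Hypothetical toy module: a triangle A-B-C with a pendant node D on C.
adj = build_adjacency([("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")])
deg = {node: len(nbrs) for node, nbrs in adj.items()}
close = closeness(adj)
clust = clustering_coefficient(adj)
ranking = sorted(adj, key=lambda node: (-deg[node], node))
print(ranking)  # nodes ordered by decreasing degree
```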

2.2.2. Integrated Algorithm for Predicting Essential Genes

   Chen et al. proposed a fast and high-performance multiple data
   integration algorithm for identifying human disease genes [[68]9]. The
   disease gene identification problem is first formulated as a binary
   classification problem, and the feature vector of each gene is
   extracted from the integrated network. By combining the binary
   logistic regression model with maximum likelihood estimation and a
   Bayesian approach, the model parameters are estimated and the
   posterior probability of each gene is calculated. The final decision
   score is obtained by calculating the percentage of the individual
   posterior probability.
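
   The general idea of estimating a binary logistic regression model under
   a Bayesian prior can be sketched as follows. This is not the algorithm
   of Chen et al.; it is a generic maximum a posteriori (MAP) fit with an
   assumed Gaussian prior on the weights, trained by plain gradient
   ascent. The single feature, the toy data, and all function and
   parameter names are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def posterior_probability(x, w, b):
    """P(gene is a disease gene | feature vector x) under the fitted model."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


def fit_map_logistic(X, y, prior_precision=0.01, lr=0.1, n_iter=5000):
    """MAP estimate: mean log-likelihood plus a Gaussian prior on w (assumed)."""
    n_features, m = len(X[0]), len(X)
    w, b = [0.0] * n_features, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = yi - posterior_probability(xi, w, b)
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        for j in range(n_features):
            # Likelihood gradient minus the pull of the prior toward zero.
            w[j] += lr * (grad_w[j] / m - prior_precision * w[j])
        b += lr * grad_b / m
    return w, b


# Hypothetical feature: number of known disease-gene neighbors of each gene.
X = [[0.0], [1.0], [3.0], [4.0]]
y = [0, 0, 1, 1]  # 1 = known disease gene, 0 = non-disease gene
w, b = fit_map_logistic(X, y)
print(posterior_probability([4.0], w, b))
print(posterior_probability([0.0], w, b))
```

   Genes with a higher posterior probability would be ranked as stronger
   candidates; how the probabilities are converted into the final decision
   score is left to the method of [[68]9].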

Acquisition of Prior Probability of Genes

   Suppose the integrated network contains genes $g_1, \ldots, g_{n+m}$,
   in which $g_1, \ldots, g_n$ are the unknown ones and
   $g_{n+1}, \ldots, g_{n+m}$ are the known ones in the OMIM database.
   Similar to the method used in references [[69]8,[70]9], for