Abstract

A gene network associated with Alzheimer’s disease (AD) is constructed from multiple data sources by considering gene co-expression and other factors. The AD gene network is divided into modules by ClusterONE, Markov Clustering (MCL), Community Clustering (Glay), and Molecular Complex Detection (MCODE). These division methods are then evaluated by network structure entropy, and MCODE is identified as the optimal division method. Through functional enrichment analysis, the functional module is identified. Furthermore, we use network topology properties to predict essential genes. In addition, a logistic regression algorithm under a Bayesian framework is used to predict the essential genes of AD. Based on network pharmacology, the herb-active compound-active compound target networks of four AD herbs and the AD common core network are visualized, and the better herbs and herb compounds for AD are then selected through enrichment analysis.

Keywords: Alzheimer’s disease, network pharmacology, network entropy, network topology, Bayesian algorithm, logistic regression algorithm

1. Introduction

Alzheimer’s disease (AD) is a chronic, age-associated neurodegenerative disorder for which there are no definitive treatments or prophylactic agents. Its pathological features include senile plaques, neurofibrillary tangles, and massive loss of neurons [1]. Because its pathogenesis is not clear, commonly used clinical drugs can only relieve symptoms for a limited period of time and cannot fundamentally improve the disease. Network pharmacology relates drug targets to human disease genes. On the basis of the “drug-target gene-disease gene” network, the effects of different drugs on different target proteins are evaluated by network analysis methods [2,3]. Many different computational methods have been employed in different application fields.
Gianni D’Angelo and Francesco Palmieri proposed a novel autoencoder-based deep neural network architecture in which multiple autoencoders are embedded with convolutional and recurrent neural networks to elicit relevant knowledge about the relations among the basic features (spatial features) and their evolution over time [4]. The same authors described the use of Genetic Programming for the diagnosis and modeling of aerospace structural defects; the resulting approach extracts such knowledge by providing a mathematical model of the considered defects, which can be used to recognize other similar ones [5]. Zhang et al. proposed a Bayesian regression approach to explain similarities of disease phenotypes by using diffusion kernels of one or several protein-protein interaction (PPI) networks [6]. Chen et al. proposed two improved Markov random field (MRF) algorithms, which can automatically assign weights to different data sources by using Gibbs sampling processes [7,8]. Chen et al. also proposed a fast and high-performance multiple data integration algorithm for identifying human disease genes [9], in which the logistic-regression-based algorithm is extended to the multiple data integration case and the parameters (weights) of the different data sources can be tuned automatically.

In this paper, AD genes are collected from multiple databases, and the gene network of AD is constructed by considering factors such as gene co-expression and metabolic relationships. The gene network is divided into modules by ClusterONE [10], MCL [11], Glay [11], and MCODE [11,12]. These division methods are then evaluated by network structure entropy, and MCODE is identified as the optimal division method. Through functional enrichment analysis, the functional modules are identified. Furthermore, essential genes can be predicted by analyzing the network topology characteristics of these functional modules.
In addition, the integrated algorithm (a logistic regression algorithm under a Bayesian framework) is used to predict the essential genes of AD. The final predicted essential genes are obtained by analyzing the two results above.

AD is located in the brain, but according to traditional Chinese medicine it is closely associated with the kidneys, liver, heart, spleen, and other viscera [13,14]. Compound herbs are characterized by multiple components and multiple targets. In this study, we screen out the effective herb compounds for the treatment of AD by identifying the essential genes of AD, the herb-active compound-active target gene network, and the common core network of AD [15,16].

2. Materials and Methods

2.1. Data Preparation

Data Sources

Some common herbs for treating AD are KXS (Kaixinsan), DYSYS (Dangguishaoyaosan), YGS (Yigansan), and YQTYT (Yiqitongyutang). The compounds of these four herbs were obtained [17,18] (see Supplementary Table S1). Their active targets were obtained from the Traditional Chinese Medicine Systems Pharmacology (TCMSP) database [19]. The AD-associated genes were collected from the National Center for Biotechnology Information (NCBI) database [20], the Online Mendelian Inheritance in Man (OMIM) database [21], and the Therapeutic Target Database (TTD) [22]. The PPI dataset is derived from the IntAct Molecular Interaction Database (IntAct) [23]. The human gene expression profiles are obtained from the Gene Expression Omnibus (GEO) database [24]. The pathway datasets are obtained from the Kyoto Encyclopedia of Genes and Genomes (KEGG) database [25]. The human protein complexes are from the Comprehensive Resource of Mammalian Protein Complexes (CORUM) database [26].

2.2. Methods

2.2.1. Prediction of Essential Genes Based on Modular Network

Network Module Partition Method

According to how network nodes are distributed among the modules, module division methods can be divided into overlapping and non-overlapping methods. Four common algorithms, MCODE, MCL, Glay, and ClusterONE, are used to divide the network. The first three are non-overlapping algorithms, while the last one is an overlapping algorithm. In this paper, these four module partition methods are used to divide the AD network.

Entropy

Recently, Shannon entropy has been introduced to measure some properties of networks, where it is also known as “network entropy”. Its value can effectively assess the stability of the network: the smaller the network entropy, the more stable the network [27]. Network structure entropy is used here as the evaluation method. Let $N$ and $k_i$ denote the number of nodes and the degree of the i-th node, respectively. The entropy of a network [28] is defined as follows:

$$E = -\sum_{i=1}^{N} I_i \ln I_i, \quad \text{where } I_i = \frac{k_i}{\sum_{j=1}^{N} k_j} \qquad (1)$$

Prediction of Functional Gene Modules

The correlation between the original AD network and the divided module networks is discussed based on gene function enrichment analysis and association indices [29,30]. The Jaccard association index is often used to evaluate the functional correlation between each module and the original network [29]. In addition, Fuxman Bass et al. surveyed many association indices, such as the Simpson, Geometric, Cosine, and PCC (Pearson correlation coefficient) indices [31]. Zhu and Qiao et al. further extended the PCC association index to measure the correlation between each module and the function of the original network [32], as shown in Table 1.

Table 1. Correlation index.

| Correlation Index | Formula | Meaning |
| --- | --- | --- |
| Jaccard | $J_{OC} = \frac{\lvert O \cap C_i \rvert}{\lvert O \cup C_i \rvert}$ | The range of values is [0, 1], and the closer it is to 1, the stronger the correlation. |
| Simpson | $S_{OC} = \frac{\lvert O \cap C_i \rvert}{\lvert \min(O, C_i) \rvert}$ | |
| Geometric | $G_{OC} = \frac{\lvert O \cap C_i \rvert^{2}}{\lvert O \rvert \, \lvert C_i \rvert}$ | |
| Cosine | $C_{OC} = \frac{\lvert O \cap C_i \rvert}{\sqrt{\lvert O \rvert \, \lvert C_i \rvert}}$ | |
| PCC | $PCC_{OC} = \frac{\lvert O \cap C_i \rvert \, n - \lvert O \rvert \, \lvert C_i \rvert}{\sqrt{\lvert O \rvert \, \lvert C_i \rvert \, (n - \lvert O \rvert)(n - \lvert C_i \rvert)}}$ | |

Here, $O$ represents the set of pathways of the original network, and $C_i$ represents the set of pathways of the i-th module after partition.

Screening of Essential Genes

Research on essential genes can help us understand the biology of the disease. Various tools have been developed to predict and assess the essential genes in a network [33]. In this paper, the network topology attributes of the functional modules are analyzed with 11 indexes of Cyto-Hubba [33], namely degree centrality (DC), betweenness centrality (BC), closeness centrality (CC), density of maximum neighborhood component (DMNC), maximum neighborhood component (MNC), bottleneck (BN), edge percolated component (EPC), maximum clique centrality (MCC), edge clustering coefficient (ECC), radiality, and clustering coefficient.

2.2.2. Integrated Algorithm for Predicting Essential Genes

Chen et al. proposed a fast and high-performance multiple data integration algorithm for identifying human disease genes [9]. The disease gene identification problem is first expressed as a two-class classification problem, and the feature vector of each gene is extracted from the integrated network. By combining the binary logistic regression model, maximum likelihood estimation, and the Bayesian idea, the model parameters are estimated and the posterior probability of each gene is calculated. The final decision score is obtained by calculating the percentage of the individual posterior probability.

Acquisition of Prior Probability of Genes

Suppose the integrated network contains genes $g_1, \ldots, g_{n+m}$, in which $g_1, \ldots, g_n$ are the unknown ones and $g_{n+1}, \ldots, g_{n+m}$ are the known ones in the OMIM database.
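The two-class formulation summarized at the start of Section 2.2.2 can be illustrated with a small, self-contained sketch. It is not the algorithm of [9]: it only fits a plain binary logistic regression by maximum likelihood (batch gradient ascent) and then ranks the genes by their fitted probability. The feature values, the labels, the learning rate, the helper names (`fit_logistic`, `predict_proba`, `percentile_scores`), the treatment of unknown genes as negative examples, and the reading of the decision score as a percentile rank are all assumptions made for illustration. The Bayesian prior step discussed in this section is omitted.

```python
import math

def sigmoid(z):
    """Logistic function mapping a real-valued score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(weights, x):
    """Probability of the positive class; weights[0] is the intercept."""
    return sigmoid(weights[0] + sum(w * v for w, v in zip(weights[1:], x)))

def fit_logistic(features, labels, lr=0.5, epochs=5000):
    """Maximum-likelihood fit by batch gradient ascent on the log-likelihood."""
    n, dim = len(features), len(features[0])
    weights = [0.0] * (dim + 1)
    for _ in range(epochs):
        grad = [0.0] * (dim + 1)
        for x, y in zip(features, labels):
            err = y - predict_proba(weights, x)  # gradient w.r.t. the linear score
            grad[0] += err
            for j, v in enumerate(x):
                grad[j + 1] += err * v
        weights = [w + lr * g / n for w, g in zip(weights, grad)]
    return weights

def percentile_scores(probs):
    """Assumed decision score: share of genes with probability <= this gene's."""
    n = len(probs)
    return [sum(1 for q in probs if q <= p) / n for p in probs]

# Hypothetical toy data: two made-up features per gene.
# Label 1 = known disease gene, 0 = unknown gene (treated as negative here,
# which is a simplification of this sketch, not a step taken from [9]).
features = [(0.1, 0.0), (0.2, 0.1), (0.0, 0.2), (0.6, 0.5), (0.7, 0.8), (0.9, 0.6)]
labels = [0, 0, 0, 1, 1, 1]

weights = fit_logistic(features, labels)
probs = [predict_proba(weights, x) for x in features]
scores = percentile_scores(probs)
for gene_index, (p, s) in enumerate(zip(probs, scores), start=1):
    print(f"g{gene_index}: probability={p:.3f}, decision score={s:.2f}")
```

In this toy setting the three genes labeled as known receive higher fitted probabilities than the three unlabeled ones, so they also receive the highest percentile scores. In a real application the feature vectors would come from the integrated network rather than from hand-picked numbers.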
Similar to the method used in references [8,9], for