Abstract

   Alzheimer's disease (AD) is one of the most common neurodegenerative
   diseases. This study aims to identify AD-related genes from
   transcriptomic data and thereby support the development of new drugs
   for AD. First, we obtained differentially expressed gene
   (DEG)-enriched co-expression networks between AD and normal samples
   in multiple transcriptomic datasets by weighted gene co-expression
   network analysis (WGCNA). Then, a convergent functional genomics (CFG)
   approach integrating multiple lines of AD-related evidence was used to
   prioritize potential genes from the DEG-enriched modules, and
   candidate genes were selected from the resulting list. Finally, we
   combined deepDTnet and SAveRUNNER to predict interactions among
   candidate genes, drugs and AD. Experiments on five datasets show that
   GJA1 has the highest CFG score among all potential driver genes of
   AD. Moreover, the target-drug-disease network prediction links GJA1
   to AD. GJA1 is therefore the candidate gene most likely to be a
   target for AD. In summary, identifying AD-related genes contributes to
   the understanding of AD pathophysiology and to the development of new
   drugs.

   Keywords: Alzheimer's disease, transcriptomics, drug repurposing, deep
   learning, drug-target interaction

Introduction

   Alzheimer's disease (AD) is one of the most common neurodegenerative
   diseases, accounting for the majority of dementia patients (Wood,
   2018; Darby et al., 2019). By 2050, AD is estimated to affect 13.8
   million individuals in the United States (US), 7.0 million of whom
   will be aged 85 years or older (Alzheimer's Association, 2018;
   Cummings et al., 2019). Genetic factors are currently believed to be
   partially responsible for AD (Xu et al., 2018). Genome-wide
   association studies (GWAS) have also revealed that some single
   nucleotide polymorphisms (SNPs) contribute to AD onset (Hao et al.,
   2019; Andrews et al., 2020). These include common variants in genes
   such as amyloid protein precursor (APP), presenilin-1 (PSEN1),
   presenilin-2 (PSEN2) and apolipoprotein E (APOE). PSEN1, PSEN2 and
   APP are established pathogenic genes of early-onset AD (Lanoiselée et
   al., 2017). APOE, the only identified risk gene for late-onset AD,
   can increase the rate of cognitive decline (Wijsman et al., 2011).
   Different microRNAs (miRNAs) are also involved in the pathophysiology
   of AD (Femminella et al., 2015). For example, miRNA-377 promotes cell
   proliferation and inhibits cell apoptosis by regulating the
   expression level of cadherin 13 (CDH13), thus participating in the
   occurrence and development of AD (Liu et al., 2018). Long non-coding
   RNAs (lncRNAs) have been widely reported to be associated with a
   variety of physiological and pathological processes, including AD.
   Brain cytoplasmic RNA is a lncRNA whose overexpression may lead to
   synaptic/dendritic degeneration in AD (Doxtater et al., 2020).
   Although remarkable advances have been made in understanding the
   genetic basis of AD, there is still no disease-modifying therapy for
   AD. Identifying AD-related genes from transcriptomic data has
   therefore become an attractive strategy for finding potential targets
   for drug therapy.

   Gene expression profiling of transcriptomic datasets of AD and normal
   brain samples has identified potential genes and contributed to the
   search for potential targets (Patel et al., 2019). Correlation
   networks are often used to analyze gene expression data and gather
   biologically relevant information from genes with similar
   co-expression patterns. At present, the two most commonly used gene
   co-expression network algorithms are SWItchMiner (SWIM) (Falcone et
   al., 2019) and Weighted Gene Co-expression Network Analysis (WGCNA)
   (Nangraj et al., 2020; Ren et al., 2020). SWIM constructs an
   unweighted correlation network and uses local and global graph
   attributes to mine so-called switch genes, which have been shown to be
   associated with drastic changes in cell phenotype, such as cancer
   development. WGCNA builds a correlation network that can be weighted
   or unweighted, and identifies related genes by measuring the
   centrality of a gene in the network. However, SWIM does not consider
   scale-free networks. The most notable characteristic of a scale-free
   network is the relative commonness of vertices with a degree that
   greatly exceeds the average. The highest-degree nodes are often
   referred to as "hubs" and are considered to serve a specific purpose
   in their network. WGCNA relies on a scale-free network to determine
   the relationships between genes, thereby enabling the identification
   of modules (clusters) of highly correlated genes and of the hub gene
   in each module. WGCNA is therefore well suited to identifying gene
   modules and key genes that contribute to phenotypic traits. Here, we
   used WGCNA to mine AD-specific modules from the DEGs between AD and
   normal samples and identified candidate genes from the AD-specific
   modules.

   Studying target-drug-disease networks has contributed to the search
   for candidate genes of AD. In recent years, deep learning has been
   applied in the biomedical and artificial intelligence fields, and many
   deep learning frameworks have been used to predict drug-target
   interactions (DTIs) (Xia et al., 2019). Öztürk et al. (2018) proposed
   a convolutional neural network (CNN)-based method that uses only
   sequence information and evaluated DTI prediction on the Davis and
   KIBA datasets. Rayhan et al. developed FRnet-DTI, which uses an
   autoencoder for feature extraction and a CNN for classification (Chu
   et al., 2021). Zeng et al. (2020a) utilized cascade deep forest and
   arbitrary-order neighboring algorithms to predict DTIs. Zeng et al.
   (2020b) developed deepDTnet, a deep learning methodology for new
   target identification and drug repurposing in a heterogeneous network
   embedding 15 types of chemical, genomic, phenotypic, and cellular
   network profiles. Many methods have also been proposed for drug
   repurposing. Zeng et al. (2019) presented deepDR (deep learning-based
   drug repositioning) to systematically infer new drug-disease
   relationships for in silico drug repurposing. Fiscon et al. (2021)
   proposed SAveRUNNER, which predicts drug-disease associations by
   quantifying the interplay between the drug targets and the
   disease-specific proteins in the human interactome via a novel
   network-based similarity measure that prioritizes associations
   between drugs and diseases located in the same network neighborhoods.
   Here, we combined deepDTnet and SAveRUNNER to predict interactions
   among candidate genes, drugs and AD.

   In this paper, we aimed to search for potential driver genes of AD
   among DEGs from multiple transcriptomic datasets. We hypothesized that
   the DEGs might be regulated by several candidate genes in the
   DEG-enriched co-expression modules/networks identified by WGCNA. We
   used the CFG score as a measure of the likelihood that a candidate
   gene is an AD target. Further, we combined deepDTnet and SAveRUNNER to
   predict interactions between candidate genes and AD based on the
   gene-drug-disease network (Figure 1).

Figure 1.


   A flowchart of the whole study. (1) Data collection from AlzData and
   ADNI; (2) data preprocessing (e.g., eliminating samples with missing
   data); (3) DEGs defined by |logFC| > 0.1 and FDR < 0.05; (4)
   enrichment analysis of biological processes with DAVID 6.8; (5) WGCNA
   to find AD-specific modules; (6) prioritization of driver genes of AD
   by CFG score; (7) identification of candidate genes with CFG ≥ 5; (8)
   collection of the target, drug and disease datasets; (9) combination
   of deepDTnet and SAveRUNNER to predict associations between candidate
   genes and AD.

Materials and Methods

AD Expression Data Collection and Preprocessing

   Our data came from the AlzData and ADNI databases. Xu et al.
   constructed the AlzData database (http://www.alzdata.org/), which
   covers four brain regions: hippocampus (HP), entorhinal cortex (EC),
   frontal cortex (FC), and temporal cortex (TC). The original microarray
   data for the four regions come from the Gene Expression Omnibus (GEO)
   (https://www.ncbi.nlm.nih.gov/geo) and were retrieved by searching
   with the keyword “Alzheimer.” Data retrieval followed these criteria:
   1) AD-related expression profiles in the ArrayExpress database
   (https://www.ebi.ac.uk/arrayexpress/) were checked to avoid potential
   omissions; 2) studies with no genome-wide probes or few probes were
   filtered out; 3) for GSE series with possibly duplicated samples or an
   identical sample source, the series with the larger sample size was
   retained and the other excluded; 4) only expression profiles of human
   postmortem brain tissues from HP, EC, FC, and TC, the main regions
   affected by AD, were included; 5) data retrieval and quality control
   were double-checked by two investigators. To ensure data quality,
   samples from individuals younger than 50 years, and samples that were
   outliers in our principal component analysis (PCA) of expression
   distribution, were excluded from this study.

   For the ADNI data (http://adni.loni.usc.edu), gene expression
   profiling of peripheral blood samples, collected using PAXgene tubes
   for RNA analysis, was performed on the Affymetrix Human Genome U219
   Array (www.affymetrix.com, Santa Clara, CA) for ADNI and on the
   Illumina Whole-Genome DASL assay (www.illumina.com, San Diego, CA) for
   AddNeuroMed and MCSA. All probe sets were mapped and annotated with
   reference to the human genome (hg19). Raw microarray expression values
   were pre-processed using the robust multi-chip average (RMA)
   normalization method, followed by standard quality control (QC)
   procedures on samples and probe sets. We checked for discrepancies
   between the reported sex and the sex determined from sex-specific gene
   expression data, including XIST and USP9Y, and also evaluated whether
   SNP genotypes matched the genotypes predicted from gene expression
   data.

   In this study, we consider only gene expression data and a binary
   classification problem (control vs. AD). After data processing (e.g.,
   eliminating samples with missing data), 467 controls and 309 AD
   samples from five datasets remained for subsequent analyses: EC (39
   vs. 39), HP (67 vs. 74), FC (128 vs. 104), TC (39 vs. 52) and ADNI
   (194 vs. 40). Detailed information on each dataset is shown in Table
   1.

Table 1.

   Brief descriptions for five datasets.
   Source                     AlzData      AlzData      AlzData        AlzData      ADNI
   Abbreviation               EC           HP           FC             TC           ADNI
   No. of genes               15361        16313        11779          15462        49387
   Sample size (Control/AD)   78 (39/39)   141 (67/74)  232 (128/104)  91 (39/52)   234 (194/40)
   Age                        80 (29.6)    81.7 (9.6)   83 (9.4)       81 (8.7)     74.3 (6.5)
   Male/Female/Unknown        35/43/0      68/73/0      99/111/22      32/41/18     116/118/0
   Aβ                         NA           NA           NA             NA           1142.9 (494.9)
   Tau                        NA           NA           NA             NA           25.4 (11.6)

   EC, entorhinal cortex; HP, hippocampus; FC, frontal cortex; TC,
   temporal cortex; ADNI, Alzheimer's Disease Neuroimaging Initiative.

   These datasets come from AlzData and ADNI, respectively. Each dataset
   has multiple features. SDs are given in parentheses.

Statistical Analysis

   Genes with an absolute log2 fold change greater than 0.1 (|logFC| >
   0.1) and a false discovery rate below 0.05 (FDR < 0.05) were defined
   as DEGs in AD patients in each dataset. Differential expression
   analysis was conducted with the R package limma, and the
   Benjamini-Hochberg method was used to correct for multiple
   comparisons (Xu et al., 2018). Functional enrichment of the DEGs was
   analyzed with the Database for Annotation, Visualization and
   Integrated Discovery (DAVID) 6.8, which provides a comprehensive set
   of functional annotation tools for understanding the biological
   meaning behind large lists of genes. For a given list of DEGs, DAVID
   6.8 identifies enriched biological themes, particularly KEGG pathways
   and GO terms (Huang et al., 2007).
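
   The DEG thresholds above can be illustrated with a minimal Python
   sketch. It is not the limma pipeline used in the study: the gene
   names, fold changes and p-values are made-up toy values, and the
   function names benjamini_hochberg and select_degs are hypothetical.
   It only shows how Benjamini-Hochberg adjustment and the |logFC| > 0.1,
   FDR < 0.05 filter fit together.

```python
import numpy as np

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)
    # p_(i) * n / i for the i-th smallest p-value
    ranked = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, starting from the largest p-value
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.clip(ranked, 0.0, 1.0)
    return adjusted

def select_degs(genes, log_fc, pvalues, fc_cut=0.1, fdr_cut=0.05):
    """Keep genes with |logFC| > fc_cut and FDR < fdr_cut."""
    fdr = benjamini_hochberg(pvalues)
    keep = (np.abs(np.asarray(log_fc, dtype=float)) > fc_cut) & (fdr < fdr_cut)
    return [g for g, k in zip(genes, keep) if k]

# Toy input (hypothetical values, not taken from the study)
genes = ["G1", "G2", "G3", "G4"]
log_fc = [0.50, -0.30, 0.05, 0.20]
pvals = [0.001, 0.010, 0.002, 0.900]
degs = select_degs(genes, log_fc, pvals)
```

   In this toy input, G3 fails the fold-change cutoff and G4 fails the
   FDR cutoff, so only G1 and G2 are retained.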

Weighted Gene Co-expression Network Analysis

   We used the R package WGCNA to perform the weighted correlation
   network analysis. For genes i and j with correlation coefficient
   $r_{ij}$, we define the correlation intensity as
   \[ a_{ij} = r_{ij}^{\beta} \]
   which depends on the choice of the power β (candidate values ranged
   from 1 to 20). The power was chosen as a value at which the
   scale-free topology fit (independence) exceeded 0.80, so that the
   resulting network is approximately scale-free. The adjacency matrix
   was then transformed into a topological overlap matrix (TOM). Once the
   network is built through the TOM, it is converted to a distance matrix
   (1-TOM), which serves as the basis for clustering. A dynamic
   tree-cutting algorithm is then applied to the dendrogram to generate a
   partition of disjoint sets of genes. In addition, we extracted the
   corresponding gene information of each module for further analysis
   (Botía et al., 2017).
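
   The adjacency, TOM and clustering steps can be sketched in Python as
   follows. This is an illustrative approximation, not the R WGCNA
   implementation: it assumes an unsigned network (|r| raised to β), uses
   the standard topological overlap formula, and replaces the dynamic
   tree cut with a plain fixed-number-of-clusters cut from SciPy. The
   function names and the simulated expression matrix are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def adjacency(expr, beta):
    """expr: samples x genes. Returns a_ij = |r_ij|**beta with zero diagonal."""
    r = np.corrcoef(expr, rowvar=False)
    a = np.abs(r) ** beta          # assumption: unsigned network
    np.fill_diagonal(a, 0.0)
    return a

def topological_overlap(a):
    """TOM_ij = (sum_u a_iu a_uj + a_ij) / (min(k_i, k_j) + 1 - a_ij)."""
    shared = a @ a                 # shared-neighbour term
    k = a.sum(axis=1)              # connectivity of each gene
    tom = (shared + a) / (np.minimum.outer(k, k) + 1.0 - a)
    np.fill_diagonal(tom, 1.0)
    return tom

def modules(expr, beta=6, n_modules=2):
    """Cluster genes on the 1 - TOM distance (stand-in for dynamic tree cut)."""
    dist = 1.0 - topological_overlap(adjacency(expr, beta))
    dist = (dist + dist.T) / 2.0   # remove floating-point asymmetry
    np.fill_diagonal(dist, 0.0)
    tree = linkage(squareform(dist, checks=False), method="average")
    return fcluster(tree, t=n_modules, criterion="maxclust")

# Simulated data: two groups of three co-expressed genes, 50 samples
rng = np.random.default_rng(0)
base = rng.normal(size=(50, 2))
expr = np.hstack([base[:, [0]] + 0.05 * rng.normal(size=(50, 3)),
                  base[:, [1]] + 0.05 * rng.normal(size=(50, 3))])
labels = modules(expr, beta=6, n_modules=2)
```

   On the simulated data, the first three genes and the last three genes
   fall into separate modules, mirroring how highly co-expressed genes
   are grouped before module-trait correlation is assessed.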

deepDTnet and SAveRUNNER

   In this study, we combined deepDTnet and SAveRUNNER to predict
   interaction between candidate genes and AD. deepDTnet and SAveRUNNER
   were applied to predict the interactions of candidate genes/targets and
   drugs and relationship drugs and diseases, respectively.

   First, deepDTnet uses a stacked denoising autoencoder (SDAE) to obtain
   low-dimensional embeddings for both drugs and targets. An SDAE model
   minimizes the reconstruction error together with a regularization
   term, defined as follows:
   \[ \min_{W_l, b_l} \lVert x - \hat{x} \rVert_F^2 + \lambda \sum_{l} \lVert W_l \rVert_F^2 \tag{1} \]

   where x is an input sample (a vector) and $\hat{x}$ is its
   reconstruction; L is the number of layers; $W_l$ is the weight matrix
   and $b_l$ is the bias vector of layer l ∈ {1, ..., L}; λ is a
   regularization parameter; and $\lVert \cdot \rVert_F$ denotes the
   Frobenius norm. The middle layer is the key that enables the SDAE to
   reduce dimensionality and extract effective representations of side
   information.
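
   As a rough illustration of Eq. 1, the following Python sketch
   evaluates the objective for a small fully connected autoencoder. The
   sigmoid activation, the masking-noise corruption and the function name
   sdae_objective are assumptions made for this example only; they are
   not taken from the deepDTnet implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sdae_objective(x, weights, biases, lam, corruption=0.0, seed=0):
    """Reconstruction error plus Frobenius penalty on the weights (Eq. 1)."""
    rng = np.random.default_rng(seed)
    # Denoising step (assumed masking noise): randomly zero input entries
    keep = rng.random(x.shape) >= corruption
    h = x * keep
    # Forward pass through all layers; the final output is the reconstruction
    for W, b in zip(weights, biases):
        h = sigmoid(h @ W + b)
    reconstruction_error = np.sum((x - h) ** 2)
    penalty = lam * sum(np.sum(W ** 2) for W in weights)
    return reconstruction_error + penalty

# Toy example: 4-dimensional input, 2-dimensional middle layer
x = np.array([[1.0, 0.0, 1.0, 0.0]])
weights = [np.zeros((4, 2)), np.zeros((2, 4))]
biases = [np.zeros(2), np.zeros(4)]
loss = sdae_objective(x, weights, biases, lam=0.1)
```

   With all-zero weights every output unit equals 0.5, so the toy loss is
   the pure reconstruction error; training would adjust the weights and
   biases to lower this value, and the activations of the 2-dimensional
   middle layer would serve as the embedding.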

   Subsequently, positive-unlabeled (PU) matrix completion is used to
   predict unknown drug-target pairs. Assume the drug-target interaction
   matrix is given as $P \in \mathbb{R}^{N_d \times N_t}$, where $N_d$ is
   the number of drugs and $N_t$ is the number of targets. $P_{ij} = 1$
   indicates that drug i is linked to target j, while $P_{ij} = 0$
   indicates that the relationship is unobserved. The optimization
   problem of our model is parameterized as:
   \[ \min_{W, H} \sum_{(i,j) \in \Omega^{+}} \left( P_{ij} - x_i W H^{T} y_j^{T} \right)^2 + \alpha \sum_{(i,j) \in \Omega^{-}} \left( P_{ij} - x_i W H^{T} y_j^{T} \right)^2 + \lambda \left( \lVert W \rVert_F^2 + \lVert H \rVert_F^2 \right) \tag{2} \]

   where $x_i$ and $y_j$ denote the feature vectors of drug i and target
   j, and the set $\Omega \subseteq N_d \times N_t$ contains the observed
   entries of the true underlying matrix, including both positive and
   negative entries, such that $\Omega = \Omega^{+} \cup \Omega^{-}$;
   $\Omega^{+}$ denotes the observed samples and $\Omega^{-}$ denotes the
   missing entries chosen as negatives. The matrix is assumed to be low
   rank, i.e., $W \in \mathbb{R}^{N_d \times k}$ and
   $H \in \mathbb{R}^{N_t \times k}$ share a low-dimensional latent space
   with $k \le N_d, N_t$. For biased inductive matrix completion, α is
   the key parameter (the weight placed on the entries in $\Omega^{-}$),
   and λ is a regularization parameter. Next, the likelihood of an
   interaction between drug i and target j is approximated by the score:
   \[ \mathrm{Score}(i, j) = x_i W H^{T} y_j^{T} \tag{3} \]

   where a higher score means a higher likelihood that drug i is
   associated with target j.
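
   Equations 2 and 3 can be made concrete with a small NumPy sketch. It
   is a simplified illustration under several assumptions: every zero
   entry of P is treated as a member of Ω^−, the drug and target
   features are identity matrices, plain gradient descent is used as the
   optimizer, and all hyperparameter values and function names are
   hypothetical. The actual deepDTnet solver may differ.

```python
import numpy as np

def scores(X, Y, W, H):
    """Score(i, j) = x_i W H^T y_j^T for all drug-target pairs (Eq. 3)."""
    return X @ W @ H.T @ Y.T

def pu_objective(P, X, Y, W, H, alpha, lam):
    """Biased squared loss over positives and negatives plus penalty (Eq. 2)."""
    err = (P - scores(X, Y, W, H)) ** 2
    positive = P == 1
    return (err[positive].sum() + alpha * err[~positive].sum()
            + lam * (np.sum(W ** 2) + np.sum(H ** 2)))

def fit(P, X, Y, k=2, alpha=0.2, lam=0.01, lr=0.01, steps=500, seed=0):
    """Minimize Eq. 2 by plain gradient descent (an illustrative choice)."""
    rng = np.random.default_rng(seed)
    W = 0.1 * rng.normal(size=(X.shape[1], k))
    H = 0.1 * rng.normal(size=(Y.shape[1], k))
    C = np.where(P == 1, 1.0, alpha)   # per-entry weights: 1 or alpha
    for _ in range(steps):
        G = -2.0 * C * (P - scores(X, Y, W, H))   # d(loss)/d(scores)
        grad_W = X.T @ G @ Y @ H + 2.0 * lam * W
        grad_H = Y.T @ G.T @ X @ W + 2.0 * lam * H
        W, H = W - lr * grad_W, H - lr * grad_H
    return W, H

# Toy interaction matrix: 3 drugs x 4 targets (hypothetical)
P = np.array([[1, 0, 0, 1],
              [0, 1, 0, 0],
              [1, 0, 0, 0]], dtype=float)
X, Y = np.eye(3), np.eye(4)            # assumed identity features
W, H = fit(P, X, Y)
predicted = scores(X, Y, W, H)
```

   Setting α below 1 down-weights the unobserved pairs, which reflects
   the positive-unlabeled assumption that a zero in P is a missing
   observation rather than a confirmed non-interaction.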

   Then, to quantify the vicinity between drug and disease modules,
   SAveRUNNER implements a novel network similarity measure:
   \[ f(p) = \frac{1}{1 + e^{-c \left[ \frac{(1 + QC)(m - p)}{m} - d \right]}} \tag{4} \]

   where p is the network proximity measure, defined as
   \[ p(T, S) = \frac{1}{\lVert T \rVert} \sum_{t \in T} \min_{s \in S} d(t, s) \]
   which represents the average shortest path length between the drug
   targets t in the drug module T and the nearest disease genes s in the
   disease module S; QC is the quality cluster score; m is max(p); and c
   and d are the steepness and the midpoint of f(p), respectively.
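
   The proximity and similarity measures can be sketched with a
   breadth-first search on a toy interactome. The graph, the module
   memberships and the values chosen for m, QC, c and d below are
   hypothetical; the sketch also assumes an unweighted, connected graph
   and subtracts the midpoint d inside the exponent, consistent with d
   being described as the midpoint of f(p).

```python
import math
from collections import deque

def shortest_path_lengths(graph, source):
    """Breadth-first search distances from source in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist

def proximity(graph, targets, disease_genes):
    """p(T, S): mean distance from each drug target to its nearest disease gene."""
    total = 0
    for t in targets:
        dist = shortest_path_lengths(graph, t)
        # Assumes at least one disease gene is reachable from every target
        total += min(dist[s] for s in disease_genes if s in dist)
    return total / len(targets)

def similarity(p, m, qc, c, d):
    """f(p): sigmoid of the QC-adjusted, normalized proximity (Eq. 4)."""
    return 1.0 / (1.0 + math.exp(-c * ((1.0 + qc) * (m - p) / m - d)))

# Toy interactome: a simple path A - B - C - D - E
graph = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"],
         "D": ["C", "E"], "E": ["D"]}
p = proximity(graph, targets=["A", "B"], disease_genes=["D", "E"])
f = similarity(p, m=5.0, qc=0.0, c=10.0, d=0.5)
```

   Here the two targets lie 3 and 2 steps from the nearest disease gene,
   giving p = 2.5; with m = 5 and QC = 0 the normalized term equals the
   midpoint d, so f(p) = 0.5. Smaller proximity values push f(p) toward
   1.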

   Finally, using deepDTnet and SAveRUNNER, we identified new
   relationships among candidate genes, drugs and neurodegenerative
   diseases, including AD.

   More details about deepDTnet and SAveRUNNER can be found in previous
   studies (Zeng et al., 2020b; Fiscon et al., 2021).

Convergent Functional Genomics

   The potential driver genes were prioritized from the AD-specific
   modules by the CFG method, which integrates various levels of
   AD-related evidence (Ayalew et al., 2012; Xu et al., 2018). The CFG
   score ranges from 0 to 5, with 5 indicating the highest priority.
   Five lines of AD-related evidence were considered, each contributing
   1 point if met and 0 points otherwise:

   1) Genetic association. The gene had at least one locus significantly
      associated with AD based on the summary statistics from the
      International Genomics of Alzheimer's Project (IGAP).
   2) Genetic regulation of gene expression. The gene was associated
      with expression quantitative trait loci (eQTLs) showing an AD risk
      in IGAP data.
   3) Protein-protein interaction. The gene physically interacted with
      any AD core gene (APP, PSEN1, PSEN2, APOE, or MAPT).
   4) Expression correlation with AD pathology. The expression level of
      the gene was correlated with AD pathology in AD mice.
   5) Early alteration in AD mouse brain. The gene showed differential
      expression in the hippocampus of 2-month-old AD mice compared with
      age-matched wild-type mice.
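
   The scoring rule amounts to counting how many of the five lines of
   evidence a gene satisfies. A minimal Python sketch follows; the
   Evidence class, the gene names and their evidence flags are
   hypothetical and serve only to illustrate the arithmetic.

```python
from dataclasses import dataclass

@dataclass
class Evidence:
    gwas: bool        # 1) genetic association (IGAP)
    eqtl: bool        # 2) genetic regulation of gene expression
    ppi: bool         # 3) interaction with an AD core gene
    pathology: bool   # 4) expression correlated with AD pathology
    early_deg: bool   # 5) early alteration in AD mouse brain

def cfg_score(e):
    """One point per satisfied line of evidence, giving a score from 0 to 5."""
    return sum([e.gwas, e.eqtl, e.ppi, e.pathology, e.early_deg])

def prioritize(evidence_by_gene, min_score):
    """Return (gene, score) pairs with score >= min_score, highest first."""
    scored = [(gene, cfg_score(e)) for gene, e in evidence_by_gene.items()]
    kept = [pair for pair in scored if pair[1] >= min_score]
    return sorted(kept, key=lambda pair: (-pair[1], pair[0]))

# Hypothetical genes and evidence flags
evidence = {
    "GENE_A": Evidence(True, True, True, True, True),
    "GENE_B": Evidence(False, True, True, True, True),
    "GENE_C": Evidence(False, False, True, False, True),
}
ranked = prioritize(evidence, min_score=4)
```

   In this toy input, GENE_A scores 5 and GENE_B scores 4, so both pass a
   cutoff of 4, while GENE_C (score 2) is dropped.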

Results

DEG Detection

   A total of 776 samples and 108,302 genes from multiple transcriptomic
   datasets were compiled for DEG detection. For the ADNI dataset, we
   randomly chose 40 control samples 10 times and retained the genes
   that were called as DEGs in at least 3 of the 10 runs. Each red node
   in Figure 2 represents a DEG in one of the five datasets. We
   identified 7,567 DEGs (2,166 in EC, 1,952 in HP, 949 in FC, 3,075 in
   TC and 3,204 in ADNI) for subsequent analyses. About 6–19% of the
   total genes could be identified as DEGs. Across the DEG lists of all
   five datasets, the expression patterns of well-known AD risk genes,
   such as APP, PSEN1, PSEN2, APOE and MAPT, were only slightly altered
   or unchanged in AD patients. In addition, 19 genes were consistently
   differentially expressed in EC, HP, FC, TC and ADNI (Figure 3). We
   then investigated the functional enrichment of the AD-related DEGs.
   The 7,567 genes were enriched in 324 KEGG pathways and 1,381 GO terms
   (Figure 4). At P < 0.005, we identified 61 KEGG pathways and 324 GO
   terms, respectively. As shown in Table 2, several of these pathways
   have been reported to be associated with AD, including the
   Alzheimer's disease pathway, the MAPK signaling pathway and the AMPK
   signaling pathway. The top 20 significant KEGG pathways for each
   dataset are shown in Figure 5. The GO terms are divided into
   ontologies based on hierarchical relations. Specifically, DEGs were
   significantly enriched in biological processes related to synaptic
   function (Table 3), such as chemical synaptic transmission;
   regulation of postsynaptic membrane potential; synaptic vesicle
   exocytosis; synaptic transmission, GABAergic; regulation of synaptic
   transmission, glutamatergic; synaptic vesicle endocytosis; and
   long-term synaptic potentiation. In addition, they were associated
   with neuron-related processes, including neurotransmitter secretion,
   neuron projection morphogenesis, negative regulation of neuron
   apoptotic process and negative regulation of neuron projection
   development.
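
   The repeated subsampling used for the ADNI dataset can be sketched as
   follows. The function stable_degs and the toy caller are hypothetical:
   in the study the DEG call in each round would come from the
   differential expression analysis, whereas here a deterministic
   stand-in is used so that the counting logic is easy to follow.

```python
import random
from collections import Counter

def stable_degs(controls, cases, call_degs, n_draw=40, n_rounds=10,
                min_hits=3, seed=0):
    """Keep genes called as DEGs in at least min_hits of n_rounds subsamples."""
    rng = random.Random(seed)
    hits = Counter()
    for _ in range(n_rounds):
        subset = rng.sample(controls, n_draw)      # balanced control subset
        hits.update(set(call_degs(subset, cases))) # count each gene once per round
    return sorted(g for g, n in hits.items() if n >= min_hits)

# Toy stand-in for a differential expression call (not a real test):
# G1 is reported in every round, G3 in the first 3 rounds, G2 in the first 2.
round_counter = {"n": 0}

def toy_caller(control_subset, cases):
    round_counter["n"] += 1
    found = ["G1"]
    if round_counter["n"] <= 3:
        found.append("G3")
    if round_counter["n"] <= 2:
        found.append("G2")
    return found

controls = list(range(194))   # 194 ADNI controls
cases = list(range(40))       # 40 ADNI AD samples
kept = stable_degs(controls, cases, toy_caller)
```

   Drawing 40 of the 194 controls balances the two groups in each round,
   and the frequency threshold filters out genes that appear as DEGs only
   for a particular random draw; in the toy run G2 is dropped because it
   is reported in only 2 rounds.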

Figure 2.


   Enhanced volcano plots illustrating DEGs in all datasets. Genes with
   |logFC| > 0.1 and FDR < 0.05 are defined as DEGs and shown as red
   nodes. (A) EC, (B) HP, (C) FC, (D) TC and (E) ADNI. Note: in the ADNI
   dataset, DEGs are genes identified in at least 3 of 10 runs.

Figure 3.


   Venn diagram showing the overlap of DEGs among EC (blue), HP (red),
   FC (green), TC (yellow) and ADNI (brown).

Figure 4.


   Venn diagrams showing the overlap among the datasets for (A) KEGG
   pathways and (B) GO terms.

Table 2.

   Significant KEGG pathways obtained from DAVID (P < 0.005).
   ID        Description
   hsa00020  Citrate cycle (TCA cycle)
   hsa00190  Oxidative phosphorylation
   hsa00260  Glycine, serine and threonine metabolism
   hsa00620  Pyruvate metabolism
   hsa01200  Carbon metabolism
   hsa01210  2-Oxocarboxylic acid metabolism
   hsa01230  Biosynthesis of amino acids
   hsa01522  Endocrine resistance
   hsa03050  Proteasome
   hsa04010  MAPK signaling pathway
   hsa04070  Phosphatidylinositol signaling system
   hsa04071  Sphingolipid signaling pathway
   hsa04110  Cell cycle
   hsa04120  Ubiquitin mediated proteolysis
   hsa04137  Mitophagy - animal
   hsa04140  Autophagy - animal
   hsa04144  Endocytosis
   hsa04145  Phagosome
   hsa04152  AMPK signaling pathway
   hsa04211  Longevity regulating pathway
   hsa04218  Cellular senescence
   hsa04260  Cardiac muscle contraction
   hsa04360  Axon guidance
   hsa04625  C-type lectin receptor signaling pathway
   hsa04666  Fc gamma R-mediated phagocytosis
   hsa04721  Synaptic vesicle cycle
   hsa04722  Neurotrophin signaling pathway
   hsa04723  Retrograde endocannabinoid signaling
   hsa04920  Adipocytokine signaling pathway
   hsa04932  Non-alcoholic fatty liver disease
   hsa04961  Endocrine and other factor-regulated calcium reabsorption
   hsa04966  Collecting duct acid secretion
   hsa05010  Alzheimer's disease
   hsa05012  Parkinson's disease
   hsa05014  Amyotrophic lateral sclerosis
   hsa05016  Huntington disease
   hsa05017  Spinocerebellar ataxia
   hsa05020  Prion disease
   hsa05022  Pathways of neurodegeneration - multiple diseases
   hsa05032  Morphine addiction
   hsa05033  Nicotine addiction
   hsa05110  Vibrio cholerae infection
   hsa05120  Epithelial cell signaling in Helicobacter pylori infection
   hsa05131  Shigellosis
   hsa05132  Salmonella infection
   hsa05140  Leishmaniasis
   hsa05145  Toxoplasmosis
   hsa05152  Tuberculosis
   hsa05163  Human cytomegalovirus infection
   hsa05167  Kaposi sarcoma-associated herpesvirus infection
   hsa05169  Epstein-Barr virus infection
   hsa05202  Transcriptional misregulation in cancer
   hsa05205  Proteoglycans in cancer
   hsa05212  Pancreatic cancer
   hsa05214  Glioma
   hsa05215  Prostate cancer
   hsa05219  Bladder cancer
   hsa05220  Chronic myeloid leukemia
   hsa05223  Non-small cell lung cancer
   hsa05225  Hepatocellular carcinoma
   hsa05235  PD-L1 expression and PD-1 checkpoint pathway in cancer

Figure 5.


   Top 20 KEGG pathways for the five datasets (P < 0.005). (A) EC, (B)
   HP, (C) FC, (D) TC, and (E) ADNI.

Table 3.

   Significant GO terms obtained from DAVID (P < 0.005).
   ID Term
   GO:0002223 Stimulatory C-type lectin receptor signaling pathway
   GO:0006888 ER to Golgi vesicle-mediated transport
   GO:0048015 Phosphatidylinositol-mediated signaling
   GO:0038128 ERBB2 signaling pathway
   GO:0007249 I-kappaB kinase/NF-kappaB signaling
   GO:0006672 Ceramide metabolic process
   GO:0000165 MAPK cascade
   GO:0045944 Positive regulation of transcription from RNA polymerase II promoter
   GO:0007269 Neurotransmitter secretion
   GO:0035329 Hippo signaling
   GO:0006120 Mitochondrial electron transport, NADH to ubiquinone
   GO:0042776 Mitochondrial ATP synthesis coupled proton transport
   GO:0070125 Mitochondrial translational elongation
   GO:0032981 Mitochondrial respiratory chain complex I assembly
   GO:0007409 Axonogenesis
   GO:0048812 Neuron projection morphogenesis
   GO:0043524 Negative regulation of neuron apoptotic process
   GO:0007268 Chemical synaptic transmission
   GO:0060078 Regulation of postsynaptic membrane potential
   GO:0016079 Synaptic vesicle exocytosis
   GO:0048813 Dendrite morphogenesis
   GO:0090263 Positive regulation of canonical Wnt signaling pathway
   GO:0009967 Positive regulation of signal transduction
   GO:0051932 Synaptic transmission, GABAergic
   GO:0046034 ATP metabolic process
   GO:0070933 Histone H4 deacetylation
   GO:0007420 Brain development
   GO:0007417 Central nervous system development
   GO:0035357 Peroxisome proliferator activated receptor signaling pathway
   GO:0015986 ATP synthesis coupled proton transport
   GO:0040029 Regulation of gene expression, epigenetic
   GO:0007399 Nervous system development
   GO:0051966 Regulation of synaptic transmission, glutamatergic
   GO:0048488 Synaptic vesicle endocytosis
   GO:0010977 Negative regulation of neuron projection development
   GO:0060071 Wnt signaling pathway, planar cell polarity pathway
   GO:0006521 Regulation of cellular amino acid metabolic process
   GO:2000310 Regulation of N-methyl-D-aspartate selective glutamate receptor activity
   GO:0038061 NIK/NF-kappaB signaling
   GO:0035418 Protein localization to synapse
   GO:0060291 Long-term synaptic potentiation

   The first column is GO terms ID; the second column is the name of GO
   terms.

   We used WGCNA to divide the DEGs into several modules of highly
   related genes. As shown in Figure 6, a highly significant positive
   correlation with AD was observed for one module in each of the five
   datasets. Module sizes ranged from 96 to 142 genes, which might
   reflect the different layers and complexity of gene regulation in the
   AD brain. These five AD-specific modules were used to identify
   potential driver genes for AD etiology and pathology. We obtained
   potential driver genes from the AD-specific module of every dataset.
   After removing overlapping genes, we had 602 candidate genes from the
   5 AD-specific modules in total, including EC (107), HP (140), FC
   (142), TC (136) and ADNI (96). We hypothesized that the higher the
   CFG score, the more likely a candidate gene is to be an AD target. We
   chose the 40 genes with CFG ≥ 4 for subsequent analyses.

Figure 6.


   Module-trait relationships for the five datasets. Each row represents
   a gene co-expression module, and each column represents a clinical
   phenotype. Numbers are correlation coefficients, with P-values in
   parentheses. Correlation strength is represented by a continuous
   color scale, with red being positive and blue being negative. (A) EC,
   (B) HP, (C) FC, (D) TC, and (E) ADNI.

Identification and Prioritization of Potential Driver Genes

   The 40 potential driver genes were prioritized by the CFG method
   based on the AlzData database, which integrates various levels of
   AD-related data (Table 4). For each gene with CFG ≥ 4, we show the
   eQTL, GWAS, PPI, Early_DEG and pathology correlation (Aβ and Tau)
   evidence, together with the CFG score. Several of these genes have
   been validated by previous studies. For example, GJA1, also known as
   connexin 43, shows upregulated mRNA and protein levels in AD (Ren et
   al., 2018). RPH3A immunoreactivity is specifically reduced in AD
   compared with aged controls, and RPH3A loss correlates with dementia
   severity, cholinergic deafferentation, and increased Aβ
   concentrations. Furthermore, RPH3A expression is selectively
   downregulated in cultured neurons treated with Aβ 25–35 peptides (Tan
   et al., 2014). CASP6 activity is intimately associated with the
   pathologies that define AD, correlates well with lower cognitive
   performance in aged individuals, and is involved in axonal
   degeneration in several cellular and in vivo animal models (LeBlanc,
   2013). The levels of angiotensinogen (AGT) are increased in the
   cerebrospinal fluid of patients with mild cognitive impairment and AD
   (Mateos et al., 2011). Stromal cell-derived factor 1 (SDF1), also
   known as chemokine CXCL12, is a proinflammatory chemokine that is
   highly expressed in the central nervous system. It may regulate
   synaptic transmission in excitable neurons and modulate neuroglial
   communication. CXCL12 was detected in the plasma and hippocampus of
   AD patients, where its levels were considerably decreased compared
   with the control group (Dulewicz et al., 2020). In summary, combining
   WGCNA with CFG offers a useful tool to prioritize potential genes for
   AD.

Table 4.

   The 40 potential driver genes prioritized by the CFG method based on
   the AlzData database.
   Gene      AD-related evidence                                                           CFG
             eQTL  GWAS  PPI                            Early_DEG  Pathology cor
                                                                   (Aβ)        (Tau)
   GJA1      2     2     PSEN1, MAPT, APOE              yes        0.388**     0.131 ns    5
   FOXO1     1     0     PSEN2                          yes        0.270 ns    0.526*      4
   PRKX      3     NA    PSEN1                          yes        0.352*      –0.023 ns   4
   RPH3A     5     2     -                              yes        –0.199 ns   –0.738**    4
   CASP6     5     0     APP, PSEN1, PSEN2, MAPT        yes        0.482***    0.738**     4
   CRMP1     1     3     MAPT                           NA         –0.304*     –0.506 ns   4
   RGS4      1     32    -                              yes        –0.419**    –0.579*     4
   NPTX2     1     1     -                              yes        –0.688***   –0.783***   4
   RPS27     1     0     PSEN2                          yes        0.503***    0.662**     4
   MEGF10    3     8     -                              yes        0.559***    0.120 ns    4
   AP2A1     1     0     APP, PSEN2, MAPT               yes        –0.277 ns   –0.585*     4
   PITPNC1   10    1     -                              yes        –0.128 ns   –0.638*     4
   AGT       1     0     APP, PSEN1, APOE               yes        –0.359*     0.002 ns    4
   AQP4      7     4     -                              yes        0.800***    0.275 ns    4
   MYT1L     3     12    -                              yes        –0.488***   –0.583*     4
   IQGAP1    1     0     PSEN1                          yes        0.310*      0.282 ns    4
   IGFBP7    8     0     MAPT, APOE                     yes        0.353*      0.510 ns    4
   CITED2    1     0     APP, PSEN1, APOE               yes        –0.433**    –0.772***   4
   SMAD1     16    1     APP, APOE                      NA         –0.332*     –0.497 ns   4
   CDH7      0     1     PSEN1                          yes        –0.345*     –0.691**    4
   MSRB2     5     2     -                              yes        0.32*       0.609*      4
   DBI       1     1     -                              yes        0.780***    0.718**     4
   PELI2     2     0     PSEN2                          yes        0.591***    –0.107 ns   4
   AVEN      1     1     -                              yes        0.525***    0.008 ns    4
   F13A1     7     3     APP, APOE                      NA         0.195 ns    0.623*      4
   SLA       1     0     PSEN1, MAPT                    yes        0.114 ns    0.662**     4
   ADAMTS20  2     17    -                              yes        0.085 ns    0.587*      4
   RARB      6     2     PSEN2                          yes        –0.064 ns   –0.387 ns   4
   SDC2      8     3     PSEN1, PSEN2, MAPT, APOE       yes        0.041 ns    0.086 ns    4
   DCN       8     0     APP, PSEN1, MAPT, APOE         yes        –0.416**    0.546*      4
   CCR5      1     0     APP                            yes        0.769***    0.616*      4
   GPRC5B    2     41    -                              yes        0.307*      –0.248 ns   4
   IRF5      1     0     APP, PSEN1, PSEN2, MAPT, APOE  yes        0.879***    0.839***    4
   IGFBP7    8     0     MAPT, APOE                     yes        0.353*      0.510 ns    4
   CXCL12    1     0     APP, PSEN2, MAPT, APOE         yes        0.432**     –0.069 ns   4
   CREM      1     0     PSEN1, MAPT, APOE              yes        –0.439**    –0.396 ns   4
   EHHADH    14    0     MAPT, APOE                     yes        0.438**     –0.022 ns   4
   SLC1A3    7     1     -                              yes        0.651***    0.494 ns    4
   VAV3      0     5     MAPT                           yes        0.319*      –0.284 ns   4
   IL15      2     18    -                              yes        0.623***    0.685**     4

   “NA,” not applicable owing to missing data for the target gene. AD,
   Alzheimer's disease; CFG, convergent functional genomics score based
   on the total number of lines of AD-related evidence; DEG,
   differentially expressed gene; eQTL, the total number of risk SNPs
   based on the IGAP data set that were able to regulate expression of
   the target gene; GWAS, the total number of risk SNPs within the target
   gene based on the IGAP data set; PPI, AD core genes (APP, PSEN1,
   PSEN2, MAPT, and APOE) that had a significant protein-protein
   interaction with the target gene; Early_DEG, target gene is
   differentially expressed in AD mouse models before AD pathology
   emergence; Pathology cor, correlation between expression of the target
   gene and AD pathology in AD mouse models, computed for the Aβ line
   (marked as Aβ) and the Tau line (marked as Tau). *P < 0.05; **P <
   0.01; ***P < 0.001; ns, not significant.
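
   The footnote defines the CFG score as the number of lines of
   AD-related evidence. One way such a count could be computed is
   sketched below. The rule used here, one point per evidence type with
   pathology correlation counted once if either the Aβ or the Tau
   correlation is significant, is an assumption; it reproduces the
   listed scores for the rows shown. The Evidence class and cfg_score
   function are illustrative names and are not part of AlzData.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Evidence:
    eqtl: Optional[int]        # number of regulating risk SNPs (None = NA)
    gwas: Optional[int]        # number of risk SNPs within the gene (None = NA)
    ppi: Tuple[str, ...]       # interacting AD core genes
    early_deg: Optional[bool]  # differentially expressed before pathology (None = NA)
    abeta_significant: bool    # correlation with Abeta pathology has P < 0.05
    tau_significant: bool      # correlation with Tau pathology has P < 0.05


def cfg_score(ev: Evidence) -> int:
    """Count supporting lines of evidence (0 to 5); assumed scoring rule."""
    return sum([
        bool(ev.eqtl),   # None or 0 contributes nothing
        bool(ev.gwas),
        len(ev.ppi) > 0,
        ev.early_deg is True,
        ev.abeta_significant or ev.tau_significant,
    ])


# Evidence values transcribed from Table 4.
table4 = {
    "GJA1": Evidence(2, 2, ("PSEN1", "MAPT", "APOE"), True, True, False),
    "PRKX": Evidence(3, None, ("PSEN1",), True, True, False),
    "CRMP1": Evidence(1, 3, ("MAPT",), None, True, False),
    "RARB": Evidence(6, 2, ("PSEN2",), True, False, False),
}

scores = {gene: cfg_score(ev) for gene, ev in table4.items()}
print(scores)
```

   Under this rule GJA1 is the only gene of the four with support from
   all five evidence types.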

Candidate Gene GJA1

   As shown in Table 4, GJA1 has the highest CFG score among all
   potential genes and was therefore regarded as the candidate gene. We
   combined deepDTnet and SAveRUNNER to search for associations between
   the candidate gene GJA1 and AD based on a target-drug-disease network.
   As shown in Figure 7, the network consists of 13 drugs, the candidate
   gene GJA1, and neurodegenerative diseases. Eleven new drug-target
   interactions and 13 new drug-disease associations were identified by
   deepDTnet and SAveRUNNER, respectively. In particular, the prediction
   for dopamine is supported by the literature. Dopamine, a compound of
   the catecholamine and phenethylamine families that plays important
   roles in the human brain, was predicted by deepDR to be associated
   with AD. This prediction is supported by a previous study indicating
   that a lack of dopamine in the brain may cause some of the earliest
   symptoms of AD (Zeng et al., 2019). Dysfunction of dopaminergic
   transmission has been hypothesized to be a new player in the
   pathophysiology of AD. Dopamine acts through five types of receptors,
   generally divided into two main subclasses: D1-like [comprising the
   dopamine 1 receptor (D1R) and the dopamine 5 receptor (D5R)] and
   D2-like [comprising the dopamine 2 receptor (D2R), the dopamine 3
   receptor (D3R), and the dopamine 4 receptor (D4R)]. Pan et al. found
   that dopamine, D1R, and D2R concentrations were decreased in patients
   with AD compared with controls. Moreover, decreased levels of dopamine
   and D2-like receptors were linked with the pathophysiology of AD
   because of their strong higher-rank correlations with AD (Pan et al.,
   2020). To conclude, the candidate gene GJA1 is the most likely AD
   target.
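
   The structure of such a target-drug-disease network, and the query
   for drugs that connect the target to the disease, can be sketched as
   follows. This is only an illustration of the data structure: the drug
   names and edge labels are placeholders rather than study results, and
   the sketch does not call deepDTnet or SAveRUNNER.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str, str]  # (drug, connected node, evidence label)


def drugs_linking(edges: List[Edge], target: str, disease: str) -> Dict[str, Set[str]]:
    """Return drugs connected to both the target and the disease, with edge labels."""
    to_target: Dict[str, Set[str]] = defaultdict(set)
    to_disease: Dict[str, Set[str]] = defaultdict(set)
    for drug, node, label in edges:
        if node == target:
            to_target[drug].add(label)
        elif node == disease:
            to_disease[drug].add(label)
    return {
        drug: to_target[drug] | to_disease[drug]
        for drug in to_target
        if drug in to_disease
    }


# Hypothetical edges; drug names and labels are placeholders, not study results.
edges = [
    ("drug_A", "GJA1", "known"),
    ("drug_A", "AD", "predicted_drug_disease"),
    ("drug_B", "GJA1", "predicted_drug_target"),
    ("drug_C", "AD", "known"),
]

links = drugs_linking(edges, target="GJA1", disease="AD")
print(sorted(links))
```

   Only a drug with an edge to both the target and the disease forms a
   complete target-drug-disease path, so drug_B and drug_C are excluded
   in this toy example.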

Figure 7.


   Drug-GJA1-disease interaction network. The network contains the
   candidate target GJA1 (green), neurodegenerative diseases (red), and
   13 drugs (yellow). Gray lines indicate known interactions. Green and
   red lines indicate interactions newly predicted by deepDTnet and
   SAveRUNNER, respectively.

Discussion

   Pathway enrichment analysis was performed to interpret the function of
   these DEGs. KEGG pathway analysis showed that the 7,567 DEGs were
   significantly enriched in one KEGG pathway, the “MAPK signaling
   pathway,” which is composed of ERK, P38, and JNK. In the adult nervous
   system, ERK activation is necessary for synaptic plasticity and memory
   formation (Du et al., 2019). In the brains of AD patients, P38 is
   highly expressed. Aβ-induced P38 activation increases tau
   phosphorylation and promotes the amyloidogenic processing of APP
   (Giraldo et al., 2014; Gourmaud et al., 2015). In a mouse model of AD,
   the JNK signaling pathway is overactivated in the spine before
   cognitive decline (Sclip et al., 2014). These studies indicate that
   overactivation of the MAPK signaling pathway could lead to AD.
   Therefore, preventing MAPK overactivation may be an effective strategy
   to reduce Aβ deposition, Tau hyperphosphorylation, neuronal apoptosis,
   and memory impairment, and MAPKs could be potential targets for novel
   and effective AD therapeutics (Yenki et al., 2013; Feld et al., 2014).
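
   The over-representation test behind a statement such as
   “significantly enriched in one KEGG pathway” can be sketched as
   follows. DAVID reports a modified Fisher exact P-value (the EASE
   score); the plain upper-tail hypergeometric probability below is a
   simpler stand-in, not DAVID's exact calculation. The numbers are toy
   values chosen so the result can be checked by hand, and enrichment_p
   is an illustrative name.

```python
from math import comb


def enrichment_p(background: int, in_pathway: int, deg_total: int, overlap: int) -> float:
    """Upper-tail hypergeometric P(X >= overlap) for pathway over-representation."""
    upper = min(in_pathway, deg_total)
    hits = sum(
        comb(in_pathway, k) * comb(background - in_pathway, deg_total - k)
        for k in range(overlap, upper + 1)
    )
    return hits / comb(background, deg_total)


# Toy numbers (not study data): 10 background genes, 4 in the pathway,
# 5 DEGs in total, 3 of which fall in the pathway.
p = enrichment_p(background=10, in_pathway=4, deg_total=5, overlap=3)
print(round(p, 4))
```

   For the toy input the probability is (4 × 15 + 1 × 6) / 252 = 66/252,
   about 0.26, so three pathway genes among five DEGs would not count as
   enriched at this scale.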

   GO term analysis indicated that the 7,567 DEGs were mainly involved in
   chemical synaptic transmission; regulation of postsynaptic membrane
   potential; synaptic vesicle exocytosis; GABAergic synaptic
   transmission; regulation of glutamatergic synaptic transmission;
   synaptic vesicle endocytosis; long-term synaptic potentiation;
   neurotransmitter secretion; neuron projection morphogenesis; negative
   regulation of neuron apoptotic process; and negative regulation of
   neuron projection development. Damage to neuronal and synaptic
   function has long been considered an important pathological feature of
   neurodegenerative diseases, and decreased synaptic activity is
   considered the pathological feature most relevant to cognitive
   impairment in AD (Wu et al., 2019). For example, the downregulation of
   GABAergic synapses is closely related to the loss of GABAergic
   inhibition (Kim et al., 2020). Studies have found that GABAergic
   neurotransmission is closely related to various aspects of AD
   pathology, including Aβ toxicity and Tau hyperphosphorylation
   (Kadoyama et al., 2021). The level of the inhibitory neurotransmitter
   GABA was significantly reduced in AD patients, suggesting insufficient
   synaptic function and neuronal transmission in AD (Schmitz et al.,
   2017). In addition, studies in a mouse model of AD indicate that the
   impairment of hippocampal neurogenesis may be mediated by GABAergic
   signaling dysfunction or an imbalance between excitatory and
   inhibitory synapses (Sun et al., 2009). Therefore, GABAergic synapses
   play an important role not only in the function of the hippocampus but
   also in the pathogenesis of AD.

Limitations

   This study has some limitations. First, although we identified 40
   potential driver genes of AD by the WGCNA and CFG methods, these
   approaches prioritize genes rather than identify true causal genes.
   Therefore, further biological validation of the identified genes is
   necessary in future studies. Second, 4 of the 5 datasets were
   downloaded from AlzData, which retained only the genes common to the
   different studies during cross-platform normalization. Third, the
   sample sizes of EC, HP, and TC available for analysis were still
   limited, and the larger sample sizes of FC and ADNI might have had a
   greater influence on the results. Fourth, the rapid development of
   various omics technologies provides new opportunities for
   understanding AD, but we used only transcriptomics datasets to
   identify potential driver genes of AD. Finally, other potential AD
   genes were not considered. Deep learning, which has the capacity to
   uncover more hidden genes in data, is a machine learning approach
   based on artificial neural networks, computational models inspired by
   the structure of the human brain. The main difference between deep
   learning and traditional artificial neural networks lies in the scale
   and complexity of the network structure: deep learning networks have a
   larger number of hidden layers, whereas traditional artificial neural
   networks usually have only one. This restriction was due to the lack
   of big data and GPU hardware support in the last century. With the
   emergence of more powerful CPU and GPU hardware, deep learning with
   more hidden layers has been built on the basis of artificial neural
   networks, and more nodes can be used in each hidden layer (Esteva et
   al., 2019; Zou et al., 2019).

Conclusions

   In this study, we identified potential driver genes from AD-specific
   modules using multiple transcriptomics datasets and observed, using
   DAVID 6.8, that the DEGs were significantly enriched in several
   pathways, consistent with observations from previous studies.
   Moreover, on the basis of WGCNA, CFG, and drug-target-disease network
   prediction, the candidate gene GJA1 is the most likely AD target, in
   line with a previous report. In summary, identification of AD-related
   genes contributes to the understanding of AD pathophysiology, the
   search for candidate drug targets, and the development of new drugs.

Data Availability Statement

   The original contributions presented in the study are publicly
   available. These data can be found here:
   https://github.com/Macau-LYXia/Transcriptomics-Data-for-AD. Data used
   in the preparation of this article were obtained from AlzData
   (http://www.alzdata.org/) and the Alzheimer's Disease Neuroimaging
   Initiative (ADNI) database (adni.loni.usc.edu).

Author Contributions

   L-YX and LT contributed to data collection and analysis. L-YX, LT, HH,
   and JL contributed to the interpretation of the results and revised
   the manuscript. L-YX took the lead in writing the manuscript. All
   authors contributed to the article and approved the submitted version.

Funding

   This work was supported by the China Postdoctoral Science Foundation
   (2020M671125) and a start-up grant from Shanghai Jiao Tong University
   (WF220408213).

Conflict of Interest

   The authors declare that the research was conducted in the absence of
   any commercial or financial relationships that could be construed as a
   potential conflict of interest.

Publisher's Note

   All claims expressed in this article are solely those of the authors
   and do not necessarily represent those of their affiliated
   organizations, or those of the publisher, the editors and the
   reviewers. Any product that may be evaluated in this article, or claim
   that may be made by its manufacturer, is not guaranteed or endorsed by
   the publisher.

References