Abstract

Background

   Major depressive disorder (MDD) is a serious mental health problem in
   modern society that is difficult to identify and diagnose in its
   early stages. Despite strong evidence supporting the heritability of
   MDD, progress in large‐scale and individual genetic studies remains
   preliminary.

Methods

   In this study, a multi‐data‐source based prioritization (MDSP) method
   was proposed, and an appropriate threshold was determined for
   selecting an optimized set of depression‐related genes (DEPgenes).
   Gene Ontology biological process, KEGG pathway and pathway crosstalk
   network analyses were then performed.

Results

   A total of 143 DEPgenes were identified, and an MDD‐specific network
   was constructed to support investigation of the pathogenesis of MDD
   and the development of therapeutic methods. Compared with existing
   research strategies, the gene optimization and analysis results were
   confirmed to be reliable. Finally, the pathway enrichment and crosstalk
   analyses revealed two unique pathway interaction modules that were
   significantly enriched with MDD genes. The core pathways of
   neuroactive ligand‐receptor interaction and dopaminergic synapse
   supported the neuropathology hypothesis of MDD, while the pathways of
   serotonergic synapse and morphine addiction suggested a mechanism for
   the drug addiction caused by serotonin used in treatment.

Conclusions

   This work provided a reference for the study of MDD, although future
   validation by extensive experimentation is still required.

   Keywords: gene ontology, KEGG pathway, major depressive disorder,
   multi‐data‐source based prioritization

1. INTRODUCTION

   Major depressive disorder (MDD) is a severe psychiatric disease with
   high morbidity and mortality worldwide (Culpepper, Lam, & McIntyre,
   2017). Growing recognition of this public health burden has driven
   the development of methods for detecting and treating depression.
   However, novel interventions for depression are still hindered by a
   limited understanding of its neurobiological mechanisms (Bayes &
   Parker, 2018). Efforts to clarify this biology through common or rare
   variant association studies have so far appeared unsuccessful, owing
   to poorly understood heterogeneity and the absence of a biological
   gold‐standard diagnosis (Krystal & State, 2014). Strong evidence for
   the heritability of mental diseases has recently been reported
   (Alnaes et al., 2018; Pain et al., 2018), which has motivated the
   generation of numerous genetic and genomic datasets for MDD.

   During the past decade, rapid advances in high‐throughput technologies
   have helped investigators who aim to uncover disease‐causing genes and
   their actions in complex diseases. In psychiatric genetics
   specifically, numerous datasets have been generated from different
   platforms and sources, including genome‐wide association studies,
   genome‐wide linkage scans, microarray gene expression, and copy number
   variation (Michaelson, 2017). Large‐scale and individual genetic
   studies have revealed various polymorphisms and the overexpression of
   certain genes in patients presenting with depressive symptoms
   (Lacerda‐Pinheiro et al., 2014; Milanesi et al., 2015). Zhang et al.
   found that increased 5‐HT1A expression was inversely correlated with
   5‐HT activity via a negative feedback mechanism (Zhang et al., 2014).
   Moreover, hyperactivity of the hypothalamic‐pituitary‐adrenal (HPA)
   axis was reported as a trigger of MDD, based on findings of
   glucocorticoid receptor (GR) and mineralocorticoid receptor
   dysfunction in depressed patients (Pariante & Lightman, 2008).
   However, a pervasive limitation of existing research is the inherent
   heterogeneity of MDD studies, which affects the validity of biomarker
   data (Young et al., 2016). It is therefore still necessary to reduce
   the depression‐related candidate genes to an optimal set for
   subsequent biological experiments. Moreover, the incompleteness of the
   information resources used in existing calculations and the fixed
   screening thresholds of the corresponding online tools lead to
   arbitrary results and lower reliability.

   In this study, gene information from multiple sources (OMIM,
   Phenolyzer, GeneCards and GLAD4U) was integrated and analyzed for MDD.
   A multi‐data‐source based prioritization (MDSP) method was proposed,
   and an appropriate threshold was determined for selecting an
   optimized set of depression‐related genes (DEPgenes). The resulting
   DEPgenes were then verified using the receiver operating
   characteristic (ROC) curve and functional and pathway enrichment
   analyses. Our work demonstrates a practical framework for candidate
   gene analysis in complex diseases, which is of value for the
   comprehensive functional assessment of optimized pathogenic genes.

2. MATERIALS AND METHODS

2.1. MDD candidate genes and optimizing process

   OMIM (www.omim.org), which provides a vast repository of rich
   clinical and genetic knowledge, was used as the core gene database
   in this study. For association studies, susceptibility genes were
   retrieved by searching all human genetic association studies deposited
   in Phenolyzer (phenolyzer.usc.edu), GeneCards (www.genecards.org) and
   GLAD4U (bioinfo.vanderbilt.edu/glad4u), which were used as training
   gene categories. Note that these databases do not provide background
   information on the patients underlying each dataset. For all the genes
   collected, a gene present in a given training category was assigned a
   score of 1 for that category; otherwise, it was assigned 0. Thus, each
   gene could be represented by a vector of three elements, each being 1
   or 0. A gene that appeared in all the training categories had a vector
   of all 1's, and every collected gene had at least one element equal
   to 1. For each training category, a weight was assigned to measure the
   category's reliability. A combined score, derived from the
   category‐specific weights and the gene's scores in the corresponding
   categories, was adopted to measure the correlation between a gene and
   the phenotype. All the candidate genes were ranked by their combined
   scores, which were calculated by equation 1:
   $$S_{\text{Combined}} = \sum_{i=1}^{N} w_i \times \text{Score}_i \qquad (1)$$

   where i is the training category index, N = 3, w_i is the weight of
   category i, and Score_i is equal to 1 when a gene appears in
   category i; otherwise, Score_i = 0.
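
   As a minimal sketch of equation 1, the combined score of a gene is the
   weighted sum of its three presence indicators. The gene names and the
   weight values below are hypothetical placeholders chosen for
   illustration; they are not the optimized weights of this study.

```python
def combined_score(presence, weights):
    """Return sum_i w_i * Score_i for one gene (equation 1)."""
    if len(presence) != len(weights):
        raise ValueError("presence and weights must have the same length")
    return sum(w * s for w, s in zip(weights, presence))

# Hypothetical weights for (Phenolyzer, GeneCards, GLAD4U); they sum to 1.
weights = (0.6, 0.3, 0.1)

# Hypothetical genes: 1 if the gene appears in a training category, else 0.
genes = {
    "GENE_A": (1, 1, 1),
    "GENE_B": (1, 0, 0),
    "GENE_C": (0, 1, 1),
}

# Rank genes from the highest to the lowest combined score.
ranked = sorted(genes, key=lambda g: combined_score(genes[g], weights),
                reverse=True)
```

   Because each indicator is 0 or 1 and the weights sum to 1, the combined
   score lies between 0 and 1, and only a gene present in all three
   categories reaches 1.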

   The combined score of a gene depends on its score in each training
   category and the corresponding weight. To prioritize the collected
   genes so that those more likely to be correlated with MDD are ranked
   higher in the list, a suitable weight must be determined for each
   training category. In this study, the following procedure was adopted:
    1. Randomly selecting a weight value between 0 and 1.0 for each
       training category and normalizing the weight matrix (consisting of
       the three weights) to have a sum of 1;
    2. Calculating the combined score S for all genes by equation 1
       and ranking all genes according to their combined scores;
    3. Calculating the ratio R: counting the number k of core genes
       (genes from OMIM known to be related to MDD) ranked in the top 3%
       of all candidate genes, and setting R = k/23;
    4. Reallocating values in the weight matrix while keeping the sum of
       the weights equal to 1;
    5. Calculating the ratio R after obtaining the new scores S and the
       new ranking of all candidate genes;
    6. Repeating steps 2–5 until no larger R can be found; the weight
       matrix obtained at that point is the optimal weight matrix.
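
   The procedure above can be sketched as a simple random search. This is
   an illustration under stated assumptions, not the authors'
   implementation: the text does not specify how weights are reallocated
   or when the search stops, so the sketch redraws all three weights at
   random and uses a fixed iteration budget (n_iter). The function names
   and the toy data are hypothetical.

```python
import random

def normalize(weights):
    """Scale the weights so that they sum to 1."""
    total = sum(weights)
    if total == 0:
        raise ValueError("at least one weight must be positive")
    return [w / total for w in weights]

def ratio_r(weights, genes, core, top_fraction=0.03):
    """Fraction of core genes ranked in the top `top_fraction` of all genes."""
    ranked = sorted(
        genes,
        key=lambda g: sum(w * s for w, s in zip(weights, genes[g])),
        reverse=True,
    )
    n_top = max(1, round(len(ranked) * top_fraction))
    return len(set(ranked[:n_top]) & core) / len(core)

def search_weights(genes, core, n_iter=200, seed=0):
    """Random search: keep the normalized weights with the largest R."""
    rng = random.Random(seed)
    best_w, best_r = None, -1.0
    for _ in range(n_iter):  # fixed budget: an assumed stopping rule
        w = normalize([rng.random() for _ in range(3)])
        r = ratio_r(w, genes, core)
        if r > best_r:
            best_w, best_r = w, r
    return best_w, best_r

# Toy data (hypothetical): 100 genes, 3 of which are core genes that
# appear in all three training categories.
toy_genes = {f"G{i}": (1, 0, 0) for i in range(100)}
toy_core = {"G0", "G1", "G2"}
for g in toy_core:
    toy_genes[g] = (1, 1, 1)

best_w, best_r = search_weights(toy_genes, toy_core)
```

   Since a gene can take only a few distinct presence vectors, many genes
   share the same combined score; how such ties are ordered affects R and
   would need to be fixed explicitly in a real analysis.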

2.2. Evaluation of genetic optimizing results

   The ROC curve was employed to assess the discrimination capability of
   the classifiers considered in this study. ROC curves represent the
   performance of a classifier without taking into consideration class
   distribution or error costs. Classification success is then measured
   by the area under the ROC curve (AUC) (Wray, Yang, Goddard, &
   Visscher, 2010). The further the ROC curve lay above the diagonal,
   i.e. the closer the AUC value was to 1, the more reliable the
   evaluated method was considered to be.
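
   For illustration, the AUC can be computed directly from its rank
   interpretation: it is the probability that a randomly chosen positive
   (here, a core gene) receives a higher score than a randomly chosen
   negative, with ties counted as one half. The sketch below is a generic
   implementation of that definition, not the software used in this
   study, and the scores and labels in the usage lines are hypothetical.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))

# Hypothetical scores: label 1 marks a core gene, label 0 any other gene.
perfect = auc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0])
chance = auc([0.9, 0.1, 0.8, 0.3], [1, 1, 0, 0])
```

   A ranking that places every core gene above every other gene gives an
   AUC of 1, whereas a ranking no better than chance gives about 0.5.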

2.3. Functional and pathway enrichment tests

   The relation of the prioritized genes to MDD was evaluated by
   analyzing the Gene Ontology (GO) biological processes and biochemical
   pathways enriched in these genes. The Database for Annotation,
   Visualization and Integrated Discovery (DAVID, david-d.ncifcrf.gov)
   was used for GO term enrichment analysis, followed by correction for
   multiple testing using the Benjamini & Hochberg (BH) method. A
   biological process (BP) term was considered significantly enriched
   at a cutoff of BH‐adjusted p‐value (PBH) < 0.01. In addition, KEGG
   pathway analysis was performed with the WebGestalt online tool
   (www.webgestalt.org) (Wang, Vasaikar, Shi, Greer, & Zhang, 2017),
   with PBH < 0.05 set as the cutoff criterion.
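
   The BH adjustment itself is simple to state: the p‐values are sorted,
   the p‐value of rank r among m tests is multiplied by m/r, and
   monotonicity is enforced from the largest p‐value downwards. The
   following sketch shows the standard procedure on hypothetical
   p‐values; DAVID and WebGestalt perform this step internally, so the
   code is only meant to make the cutoffs above concrete.

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that each
    # adjusted value never exceeds the one ranked after it.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw p-values for four terms.
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)

# Terms passing the GO cutoff used in this study (PBH < 0.01).
significant = [i for i, p in enumerate(adj) if p < 0.01]
```

   In this toy example the adjusted values are approximately 0.02, 0.04,
   0.04 and 0.02, so none of the four hypothetical terms passes the 0.01
   cutoff even though one raw p‐value is below it.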

2.4. Pathway crosstalk

   Pathway crosstalk analysis was performed to further investigate the
   interactions among the significantly enriched pathways of the
   optimized MDD‐related genes. Two pathways were considered to crosstalk
   if they shared a proportion of DEPgenes. Two measurements were
   introduced to quantify the overlap of a pair of pathways, the overlap
   coefficient (OC) and the Jaccard coefficient (JC):

   $$\mathrm{OC} = \frac{|A \cap B|}{\min(|A|, |B|)}, \qquad \mathrm{JC} = \frac{|A \cap B|}{|A \cup B|}$$

   where A and B denote the sets of DEPgenes in the two pathways,
   respectively. The average of OC and JC was calculated to reflect the
   degree of overlap between each pair of pathways. The crosstalk results
   were visualized with Cytoscape (Uzoma et al., 2018).
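
   As a short worked example of the two overlap measures, the sketch
   below computes OC, JC and their average for a pair of gene sets. The
   gene identifiers are hypothetical placeholders, and the function names
   are introduced here only for illustration.

```python
def overlap_coefficient(a, b):
    """|A intersect B| / min(|A|, |B|); 0.0 if either set is empty."""
    a, b = set(a), set(b)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))

def jaccard_coefficient(a, b):
    """|A intersect B| / |A union B|; 0.0 if both sets are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def crosstalk_score(a, b):
    """Average of OC and JC, used as the edge weight between two pathways."""
    return (overlap_coefficient(a, b) + jaccard_coefficient(a, b)) / 2

# Hypothetical DEPgene sets of two pathways sharing two genes.
pathway_a = {"G1", "G2", "G3", "G4"}
pathway_b = {"G3", "G4", "G5", "G6", "G7", "G8"}

oc = overlap_coefficient(pathway_a, pathway_b)   # 2 / 4
jc = jaccard_coefficient(pathway_a, pathway_b)   # 2 / 8
score = crosstalk_score(pathway_a, pathway_b)    # (0.5 + 0.25) / 2
```

   OC is normalized by the smaller pathway and therefore stays high when
   a small pathway is nested in a large one, whereas JC penalizes the
   size difference; averaging the two balances these behaviours.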

2.5. Depression‐specific network and cluster analysis by Cytoscape

   To construct a depression‐specific network, the DEPgenes were imported
   into STRING (string-db.org). The information on gene interactions was
   extracted and used to form a specific network. Module cluster analysis
   of the depression‐specific network was performed using the MCODE
   plug‐in in Cytoscape. In addition, to verify the nonrandomness of the
   obtained depression‐specific network, the following verification
   steps were performed:
    1. Random network generation: generating 1,000 random networks with
       the same numbers of nodes and interactions as the
       depression‐specific network, using the Erdős‐Rényi model in the
       igraph package of R;
    2. Calculating the average shortest path distance (SPD) and average
       clustering coefficient (CC) of each random network;
    3. Statistics: counting the number of random networks with a shorter
       SPD than the MDD‐specific network and the number of random
       networks with a higher CC than the MDD‐specific network, denoted
       as ND and NC, respectively;
    4. Calculating the empirical p‐values, PD = ND/1,000 and
       PC = NC/1,000, which reflect the significance of the
       nonrandomness of the MDD‐specific network.
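
   The verification steps can be sketched in plain Python as follows. The
   study used the igraph package of R; this stand‐in is only an
   illustration and makes several assumptions that the text does not
   confirm: random graphs are drawn with exactly n nodes and m edges, the
   SPD is averaged over reachable node pairs only, and the CC is the mean
   of the local clustering coefficients (nodes of degree below 2 count as
   0). The node and edge numbers in the usage line are hypothetical.

```python
import itertools
import random
from collections import deque

def random_gnm(n, m, rng):
    """Erdos-Renyi G(n, m): n nodes and m distinct edges chosen at random."""
    edges = rng.sample(list(itertools.combinations(range(n), 2)), m)
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj

def average_clustering(adj):
    """Mean local clustering coefficient over all nodes."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue  # local coefficient taken as 0
        links = sum(1 for a, b in itertools.combinations(nbrs, 2)
                    if b in adj[a])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)

def average_shortest_path(adj):
    """Mean BFS distance over ordered pairs of mutually reachable nodes."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else float("inf")

def empirical_p(observed_spd, observed_cc, n, m, n_random=1000, seed=0):
    """Return (PD, PC) = (ND / n_random, NC / n_random)."""
    rng = random.Random(seed)
    nd = nc = 0
    for _ in range(n_random):
        g = random_gnm(n, m, rng)
        if average_shortest_path(g) < observed_spd:
            nd += 1
        if average_clustering(g) > observed_cc:
            nc += 1
    return nd / n_random, nc / n_random

# Hypothetical network size and observed statistics, small random sample.
pd_value, pc_value = empirical_p(2.5, 0.5, n=30, m=60, n_random=50)
```

   With 1,000 random networks the smallest nonzero empirical p‐value is
   0.001, so a result in which no random network beats the observed
   network is reported as p < 0.001.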

3. RESULTS

3.1. Collection of MDD candidate and core genes

   A total of 23 genes were collected from OMIM (Table 1) and
   regarded as core genes. In addition, 14,144 genes from Phenolyzer,
   5,358 genes from GeneCards and 149 genes from GLAD4U were collected
   and regarded as MDD candidate genes. The overlap of the genes collected
   from these sources is shown in Figure 1a. MDSP was then applied, and an
   appropriate threshold was determined for the optimization of MDD
   candidate genes. As shown in the flow chart of the optimization
   algorithm (Figure 1b), when a gene appears in a given training
   category, a score of 1 is assigned; otherwise, 0 is assigned. Each of
   the three training categories has a weight, which is determined by the
   optimization algorithm described in the "Materials and Methods"
   section. The genes are ranked and prioritized by their combined
   scores, computed from their scores in the three training categories
   and the category weights, and further analysis is performed on the
   selected genes.

Table 1.

   Major depressive disorder core genes collected from OMIM
   Gene symbol MIM ID Gene symbol MIM ID
   MDD1        608516 DRD4        608516
   MDD2        608516 TPH1        608516
   FKBP5       608516 HTR2C       608516
   TPH2        608516 HTR1D       608516
   HTR2A       608516 HTR1B       608516
   CALCA       608516 MAOB        608516
   DUSP1       608516 SLC6A4      608516
   MTHFR       608516 BCR         608516
   CREB1       608516 PER3        608516
   HSP90AA1    608516 APAF1       608520
   CHRM2       608516 SLC6A15     608520
   TOR1A       608516

Figure 1.

   Overview of gene prioritization method. (a) Venn diagram of major
   depressive disorder (MDD)‐related candidate genes collected from
   different sources; (b) The flow chart for MDD‐related genes
   prioritization

3.2. Optimization and evaluation of MDD candidate genes

   The combined scores of all candidate genes were calculated from the
   optimal weight matrix and each candidate gene's score in each source,
   and the MDD candidate genes were ranked according to their combined
   scores. The distributions of the combined scores of the core genes and
   of all candidate genes are shown in Figure 2a. Most of the core genes
   had high combined scores and appeared near the top of the ranked list,
   and only a few appeared in lower positions, indicating that the
   distribution of the candidate genes' combined scores was in line with
   our expectations.

Figure 2.

   Optimization and evaluation of MDD candidate genes. (a) Distribution of
   the combined scores of all candidate genes and the core genes. The
   percentage of each histogram bin is measured by the genes with scores
   falling in the bin divided by the total number of candidate genes or
   the number of the core genes; (b) The distribution of the combined
   scores of the candidate genes. The genes are ranked by their combined
   scores. The x‐axis is the order of the candidate genes. The y‐axis on
   the left side is the combined score of the candidate genes, and the
   y‐axis on the right side is the number of core genes with higher
   combined score. (c) ROC curve of different prioritization tools. MDD:
   major depressive disorder; ROC: receiver operating characteristic

   Figure 2b shows that the combined score drops quickly from 1.0 to
   about 0.848 and then to about 0.604; after that, the combined scores
   decrease slowly. This distribution indicates that a relatively small
   number of genes have high combined scores, while the majority of genes
   have moderate or low scores. With a threshold of 0.848, 65.2% of the
   core genes (15/23) were included. With a threshold of 0.604, 95.7% of
   the core genes (22/23) could be included, but the number of selected
   candidate genes would increase dramatically to 4,105. Because a lower
   combined score implies a higher false positive rate among the
   prioritized genes, the threshold of 0.848 was adopted, which
   identified 143 DEPgenes (Table S1).
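
   The trade‐off described above, between core gene coverage and the size
   of the selected set, can be made concrete with a small helper. The
   function name and the toy scores are hypothetical; only the idea of
   comparing two candidate thresholds comes from the text.

```python
def select_by_threshold(scores, core_genes, threshold):
    """Return the genes scoring at least `threshold` and the core coverage."""
    selected = {g for g, s in scores.items() if s >= threshold}
    coverage = len(selected & core_genes) / len(core_genes)
    return selected, coverage

# Hypothetical combined scores; G1 and G3 play the role of core genes.
toy_scores = {"G1": 1.0, "G2": 0.9, "G3": 0.7, "G4": 0.5, "G5": 0.2}
toy_core = {"G1", "G3"}

strict_set, strict_cov = select_by_threshold(toy_scores, toy_core, 0.85)
loose_set, loose_cov = select_by_threshold(toy_scores, toy_core, 0.6)
```

   Lowering the threshold raises the coverage of core genes but also
   enlarges the selected set, which mirrors the choice reported above
   between 15/23 (65.2%) and 22/23 (95.7%) of the core genes.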

   Finally, the reliability of our method for prioritizing MDD candidate
   genes was compared with that of Phenolyzer, GeneCards and GLAD4U using
   ROC curves (Figure 2c). MDSP had the largest AUC (0.944), followed by
   GeneCards (0.893) and Phenolyzer (0.888), while GLAD4U had the smallest
   AUC (0.490), indicating that MDSP gave the best prioritization among
   the compared methods.

3.3. GO enrichment analysis

   To explore the specific functional features of the 143 DEPgenes, GO
   enrichment analysis was performed using DAVID. Seventy‐two biological
   process (BP) terms, related to synaptic transmission, neurodevelopment
   and drug reaction, were significantly enriched in the DEPgenes
   (Table 2). The GO terms related to synaptic transmission included
   synaptic transmission, regulation of synaptic transmission, positive
   regulation of synaptic transmission and negative regulation of
   synaptic transmission. The GO terms related to nerve signal
   transduction included second‐messenger‐mediated signaling, regulation
   of transmission of nerve impulse, cell surface receptor linked signal
   transduction, G‐protein coupled receptor protein signaling pathway and
   glutamate signaling pathway. GO terms related to neurotransmitters
   (regulation of neurotransmitter levels, regulation of neurotransmitter
   transport, regulation of neurotransmitter uptake, regulation of
   catecholamine secretion, regulation of dopamine secretion and
   regulation of glutamate secretion), to drug reaction (response to
   tropane, response to cocaine, response to amphetamine and response to
   histamine) and to learning or memory were also significantly enriched.

Table 2.

   Significantly enriched BP terms of the 143 DEPgenes
   GO term     No. of genes  p‐value   PBH       Biological process
   GO:0007268  36            1.24E‐32  1.02E‐29  Synaptic transmission
   GO:0019932  22            5.46E‐17  2.25E‐14  Second‐messenger‐mediated signaling
   GO:0030808  16            4.37E‐15  1.19E‐12  Regulation of nucleotide biosynthetic process
   GO:0050804  17            5.37E‐15  1.10E‐12  Regulation of synaptic transmission
   GO:0006140  16            9.79E‐15  1.61E‐12  Regulation of nucleotide metabolic process
   GO:0051969  17            1.89E‐14  2.60E‐12  Regulation of transmission of nerve impulse
   GO:0031644  17            3.56E‐14  4.21E‐12  Regulation of neurological system process
   GO:0007166  46            8.23E‐14  8.49E‐12  Cell surface receptor linked signal transduction
   GO:0045761  14            3.60E‐13  3.30E‐11  Regulation of adenylate cyclase activity
   GO:0007186  33            2.50E‐11  2.06E‐09  G‐protein coupled receptor protein signaling pathway
   GO:0051046  16            3.56E‐11  2.67E‐09  Regulation of secretion
   GO:0001505  11            8.09E‐11  5.57E‐09  Regulation of neurotransmitter levels
   GO:0051952  9             1.12E‐10  7.10E‐09  Regulation of amine transport
   GO:0031280  10            3.17E‐10  1.87E‐08  Negative regulation of cyclase activity
   GO:0051350  10            3.17E‐10  1.87E‐08  Negative regulation of lyase activity
   GO:0007611  12            8.15E‐10  4.49E‐08  Learning or memory
   GO:0051050  15            1.55E‐09  8.01E‐08  Positive regulation of transport
   GO:0014073  7             4.54E‐09  2.21E‐07  Response to tropane
   GO:0042220  7             4.54E‐09  2.21E‐07  Response to cocaine
   GO:0051940  5             1.66E‐08  7.64E‐07  Regulation of catecholamine uptake during transmission of nerve impulse
   GO:0051588  7             3.69E‐08  1.61E‐06  Regulation of neurotransmitter transport
   GO:0051580  5             4.96E‐08  2.05E‐06  Regulation of neurotransmitter uptake
   GO:0007242  29            1.45E‐07  5.70E‐06  Intracellular signaling cascade
   GO:0009712  7             2.05E‐07  7.70E‐06  Catechol metabolic process
   GO:0006584  7             2.05E‐07  7.70E‐06  Catecholamine metabolic process
   GO:0006576  9             7.77E‐07  2.79E‐05  Biogenic amine metabolic process
   GO:0014059  5             1.06E‐06  3.65E‐05  Regulation of dopamine secretion
   GO:0051047  9             1.89E‐06  6.25E‐05  Positive regulation of secretion
   GO:0051954  5             3.16E‐06  1.00E‐04  Positive regulation of amine transport
   GO:0030003  12            3.99E‐06  1.22E‐04  Cellular cation homeostasis
   GO:0001662  5             4.28E‐06  1.26E‐04  Behavioral fear response
   GO:0031281  7             4.80E‐06  1.37E‐04  Positive regulation of cyclase activity
   GO:0006939  6             4.96E‐06  1.37E‐04  Smooth muscle contraction
   GO:0001964  5             5.68E‐06  1.51E‐04  Startle response
   GO:0050806  6             5.78E‐06  1.49E‐04  Positive regulation of synaptic transmission
   GO:0051349  7             5.89E‐06  1.47E‐04  Positive regulation of lyase activity
   GO:0008306  5             7.38E‐06  1.79E‐04  Associative learning
   GO:0015844  5             7.38E‐06  1.79E‐04  Monoamine transport
   GO:0051971  6             8.89E‐06  2.10E‐04  Positive regulation of transmission of nerve impulse
   GO:0043269  8             1.10E‐05  2.52E‐04  Regulation of ion transport
   GO:0014075  6             1.16E‐05  2.59E‐04  Response to amine stimulus
   GO:0031646  6             1.16E‐05  2.59E‐04  Positive regulation of neurological system process
   GO:0008217  8             1.17E‐05  2.55E‐04  Regulation of blood pressure
   GO:0050433  5             1.19E‐05  2.51E‐04  Regulation of catecholamine secretion
   GO:0001975  5             1.19E‐05  2.51E‐04  Response to amphetamine
   GO:0050805  5             1.81E‐05  3.74E‐04  Negative regulation of synaptic transmission
   GO:0044106  12            2.24E‐05  4.52E‐04  Cellular amine metabolic process
   GO:0042053  4             2.44E‐05  4.79E‐04  Regulation of dopamine metabolic process
   GO:0055082  13            3.46E‐05  6.65E‐04  Cellular chemical homeostasis
   GO:0042069  4             3.63E‐05  6.82E‐04  Regulation of catecholamine metabolic process
   GO:0010959  7             3.70E‐05  6.79E‐04  Regulation of metal ion transport
   GO:0007215  5             3.74E‐05  6.71E‐04  Glutamate signaling pathway
   GO:0051970  5             3.74E‐05  6.71E‐04  Negative regulation of transmission of nerve impulse
   GO:0060134  4             5.16E‐05  9.06E‐04  Prepulse inhibition
   GO:0060191  7             5.55E‐05  9.54E‐04  Regulation of lipase activity
   GO:0031645  5             5.95E‐05  1.00E‐03  Negative regulation of neurological system process
   GO:0050801  13            7.05E‐05  1.16E‐03  Ion homeostasis
   GO:0032309  4             7.06E‐05  1.14E‐03  Icosanoid secretion
   GO:0050482  4             7.06E‐05  1.14E‐03  Arachidonic acid secretion
   GO:0007632  5             8.98E‐05  1.43E‐03  Visual behavior
   GO:0014048  4             1.21E‐04  1.88E‐03  Regulation of glutamate secretion
   GO:0033238  4             1.21E‐04  1.88E‐03  Regulation of cellular amine metabolic process
   GO:0034776  3             1.76E‐04  2.70E‐03  Response to histamine
   GO:0046717  4             3.35E‐04  5.03E‐03  Acid secretion
   GO:0048699  14            3.51E‐04  5.17E‐03  Generation of neurons
   GO:0015909  4             5.38E‐04  7.76E‐03  Long‐chain fatty acid transport
   GO:0019614  3             5.82E‐04  8.26E‐03  Catechol catabolic process
   GO:0015718  5             5.87E‐04  8.19E‐03  Monocarboxylic acid transport
   GO:0032102  5             5.87E‐04  8.19E‐03  Negative regulation of response to external stimulus
   GO:0010648  9             6.57E‐04  9.01E‐03  Negative regulation of cell communication
   GO:0022008  14            6.97E‐04  9.39E‐03  Neurogenesis
   GO:0043271  4             7.08E‐04  9.39E‐03  Negative regulation of ion transport

   DEPgenes: depression‐related genes; GO: gene ontology.

3.4. Crosstalk among significantly enriched pathways

   Since numerous genes and pathways appeared to be involved in MDD, a
   pathway crosstalk analysis was performed to investigate the
   relationships between the pathways in more depth. As shown in
   Figure 3a, 16 significantly enriched pathways were identified,
   including nervous system pathways such as dopaminergic synapse,
   serotonergic synapse, glutamatergic synapse, retrograde
   endocannabinoid signaling and GABAergic synapse. In addition, pathways
   related to drug addiction (cocaine addiction, amphetamine addiction,
   nicotine addiction, alcoholism and morphine addiction) and signal
   transduction (cAMP signaling pathway, taste transduction and calcium
   signaling pathway) were enriched. Interestingly, environmental
   adaptation pathways (circadian entrainment and circadian rhythm) were
   also among the pathways enriched in DEPgenes. Figure 3b shows that the
   significantly enriched pathways clustered into a module relevant to
   the pathogenesis of neurological diseases.

Figure 3.

   KEGG pathway enrichment analysis of DEPgenes. (a) Significantly
   enriched KEGG pathways of DEPgenes. The abscissa, GeneRatio, is the
   ratio of the number of DEPgenes mapped to a KEGG pathway to the total
   number of genes in the pathway; (b) Crosstalk network of KEGG
   pathways. Node size represents the number of DEPgenes contained in the
   pathway: the larger the node, the more DEPgenes it includes. Edge
   width indicates the degree of overlap between the genes contained in
   the two pathways. DEPgenes: depression‐related genes

3.5. MDD‐specific networks

   The information on gene interactions was extracted from the STRING
   database and used to form a specific network (Figure 4a). To test the
   nonrandomness of the MDD‐specific network, we generated 1,000 random
   networks with the same numbers of nodes and edges as the MDD‐specific
   network and compared their SPD and CC. The average SPD of these random
   networks was 3.4, which was significantly larger than that of the
   MDD‐specific network (SPD = 2.5, PD < 0.001). Meanwhile, the average CC
   of the random networks was 0.1, which was significantly smaller than
   that of the MDD‐specific network (CC = 0.5, PC < 0.001). The
   nonrandomness of the MDD‐specific network could therefore be inferred.
   Furthermore, two modules were identified by the module cluster
   analysis of the MDD‐specific network (Figure 4b,c). KEGG pathway
   analysis of the genes in the module shown in Figure 4b indicated
   significant enrichment of the neuroactive ligand‐receptor
   interaction, dopaminergic synapse and morphine addiction pathways. For
   the genes in the module shown in Figure 4c, serotonergic synapse was
   the most significantly enriched pathway.

Figure 4.

   MDD‐specific network analysis. (a) The specific network of MDD; (b and
   c) Module Cluster analyses by MCODE. MDD: major depressive disorder

4. DISCUSSION

   Drug therapy is still the preferred clinical treatment for MDD.
   The most widely used antidepressant drugs are selective serotonin
   reuptake inhibitors (SSRIs), including fluoxetine, citalopram, and
   sertraline, which can significantly improve the cognitive function of
   MDD patients (Jakubovski, Varigonda, Freemantle, Taylor, & Bloch,
   2016). However, the antidepressant drugs currently in clinical use
   cause many adverse reactions, such as xerostomia, constipation,
   drowsiness, obesity, cardiotoxicity, and drug withdrawal (Fava, Gatti,
   Belaise, Guidi, & Offidani, 2015; Hieronymus, Emilsson, Nilsson, &
   Eriksson, 2016). The lack of approaches for the early identification
   and intervention of MDD limits the establishment of safe and
   effective individualized treatment (Duman, Aghajanian, Sanacora, &
   Krystal, 2016). Although numerous susceptibility genes and loci for
   MDD have been reported previously, no disease‐causing genes or
   therapeutic target genes have been confirmed (Rao et al., 2016).
   Thus, it is important to reduce data noise, prioritize candidate
   genes from multiple datasets, and then explore their functional
   relationships for further validation (Jia, Kao, Kuo, & Zhao, 2011).

   In this study, we presented a complete process for collecting
   large‐scale genotypic data on MDD from different sources, and provided
   optimization and comprehensive analyses for exploring the pathogenesis
   and treatment of depression. Twenty‐three core genes from OMIM, 14,144
   candidate genes from Phenolyzer, 5,358 from GeneCards and 149 from
   GLAD4U were collected and optimized for further analysis. MDSP was
   proposed and an appropriate threshold was determined for the
   optimization of MDD‐related genes. One hundred and forty‐three
   DEPgenes were identified and used for additional functional and
   pathway enrichment analyses. Most of these genes, such as PCDH9,
   MDD1, MDD2, CREB1 and DISC1, have been reported to be associated with
   MDD (Cacabelos, Torrellas, & Fernandez‐Novoa, 2016; Xiao et al.,
   2018), and some of them (e.g. TPH1, GRIN2B and MAOA) are also
   related to other mental disorders (van Donkelaar et al., 2017;
   Perlis, 2016; Tovilla‐Zarate et al., 2014). This indicated that the
   proposed approach was able to yield the expected gene set.

   So far, studies of the pathogenesis of depression have mainly focused
   on biological mechanisms such as autophagy and apoptosis of nerve
   cells, neurotransmitter secretion disorders, immune inflammatory
   reactions, and dysfunction of the hypothalamic‐pituitary‐adrenal axis
   (Cattaneo et al., 2015; Menard, Hodes, & Russo, 2016; Smith, 2015).
   Functional enrichment analysis revealed a more specific functional
   pattern in the DEPgenes. In this study, 72 GO BP terms and 16 KEGG
   pathways were identified as significantly enriched. The terms
   related to synaptic transmission, nerve signal transduction,
   neurotransmitters and learning or memory reflected the pathogenesis of
   MDD, which was consistent with literature reports. Interestingly,
   BP terms related to drug reaction and KEGG pathways related to drug
   addiction were both enriched, indicating that avoiding drug
   dependence is a key requirement in MDD drug development and clinical
   treatment.

   The occurrence and development of MDD involve complex biological
   processes and result from a combination of multiple genes and
   environmental factors. Therefore, studying the interactions between
   DEPgenes from a network perspective can provide insights into the
   pathogenesis of depression and contribute to the discovery of new drug
   targets. For this analysis, the network information on MDD was mined
   from the STRING database, which contains experimental data, text
   mined from PubMed abstracts, and results predicted by bioinformatics
   methods. The bioinformatics methods applied include gene adjacency,
   gene fusion, phylogenetic profiles, and gene co‐expression based on
   chip data. A comprehensive score was calculated using a weight matrix
   for these different methods, determined by the scoring mechanism
   described above. Finally, the core pathways involved in MDD were
   revealed by the modules. The pathways of neuroactive ligand‐receptor
   interaction, dopaminergic synapse and morphine addiction are presented
   in Figure 4b. As shown in Figure 4c, the serotonergic synapse pathway
   appeared to be more specific than the other pathways. From these
   results, we inferred that the drug addiction caused by serotonin used
   in the treatment of MDD might be related to the mechanism of morphine
   addiction.

   The main problems that limit the development of a reliable MDD
   biomarker are the heterogeneity of the pathophysiology and etiology
   of depressive disorder and of study designs, which may produce
   conflicting data. In this study, a systems biology framework for
   genetic information collection and for function and pathway analyses
   of MDD was developed. A total of 143 DEPgenes were identified, and an
   MDD‐specific network was constructed to support investigation of the
   pathogenesis of MDD and the development of therapeutic methods.
   Compared with existing research strategies, the gene optimization and
   analysis results were confirmed to be reliable. As most studies
   collected data from small samples, often consisting of fewer than 100
   subjects, this study should help improve the precision and
   generalizability of the MDD‐related genes in these three databases.
   Although the results of this computational framework, which
   integrated a large quantity of valuable information, still require
   validation by extensive experimentation, the framework provides a
   reference for the study of other complex diseases.

5. ETHICS APPROVAL AND CONSENT TO PARTICIPATE

   Not applicable.

CONFLICT OF INTEREST

   The authors declare no conflict of interest.

AUTHORS' CONTRIBUTIONS

   Yi Liu and Shiyuan Zhang conceived and designed the project; Pengfei
   Fan acquired the data; Yi Liu, Pengfei Fan and Yidan Wang analyzed and
   interpreted the data; Yidan Wang and Dan Liu wrote the paper; Shiyuan
   Zhang approved the final version.

Supporting information

    
   Additional data file (docx, 16.8 KB).

   Liu Y, Fan P, Zhang S, Wang Y, Liu D. Prioritization and comprehensive
   analysis of genes related to major depressive disorder. Mol Genet
   Genomic Med. 2019;7:e659 10.1002/mgg3.659

   Funding information

   Not applicable.

REFERENCES