Abstract

   Huntington’s disease (HD) is a progressive and fatal neurodegenerative
   disorder caused by an expanded CAG repeat in the huntingtin gene.
   Although HD is monogenic, its molecular manifestation appears highly
   complex and involves multiple cellular processes. The recent
   application of high-throughput platforms such as microarrays and
   mass spectrometry has indicated multiple pathogenic routes. The massive
   amounts of data generated by these techniques, together with the
   complexity of the pathogenesis, however, pose considerable challenges
   to researchers.
   Network-based methods can provide valuable tools to consolidate newly
   generated data with existing knowledge, and to decipher the interwoven
   molecular mechanisms underlying HD. To facilitate research on HD in a
   network-oriented manner, we have developed HDNetDB, a database that
   integrates molecular interactions with many HD-relevant datasets. It
   allows users to obtain, visualize and prioritize molecular interaction
   networks using HD-relevant gene expression, phenotypic and other types
   of data obtained from human samples or model organisms. We illustrate
   several HDNetDB functionalities through a case study, in which we
   identified proteins that constitute potential cross-talk between HD and
   the unfolded protein response (UPR). HDNetDB is publicly accessible at
   http://hdnetdb.sysbiolab.eu.

Introduction

   Huntington’s disease is an inherited neurodegenerative disorder that
   results from a trinucleotide (CAG) repeat expansion (>35) in the first
   exon of the huntingtin (HTT, IT15) gene^[33]1. Human HTT codes for a
   large protein of 3144 amino acids, which is ubiquitously expressed in
   various tissues and is present in several sub-cellular locations.
   Studies indicate that both loss of function of normal HTT and gain of
   function of mutant HTT contribute to neuropathological
   alterations in distinct regions of the brain^[34]2, [35]3. In the
   initial stages of HD, degeneration is detectable mainly in the striatum
   and cortex, whereas in later stages degeneration is also observed in
   other brain regions, such as the hypothalamus and hippocampus^[36]4,
   [37]5. Clinically, the disease is characterized by complex and variable
   symptoms that include movement disorders, psychiatric problems and
   cognitive decline^[38]2. Though HD is caused by mutation of a single
   gene, the disease development might involve a plethora of genes and
   processes^[39]6. Indeed, large variations in onset, severity and
   progression of HD suggest the existence of other influential molecular
   factors besides the mutation of HTT ^[40]7–[41]11. In order to identify
   genes that may modify disease onset and progression, genome-wide
   association and gene expression studies have been performed^[42]12,
   [43]13. Additionally, a large number of genes and proteins have been
   catalogued based on different types of experimental evidence^[44]6.
   Research into new targets, drugs and therapeutic strategies that could
   ultimately delay the onset or ameliorate the progression of HD is
   currently at a crucial juncture.

   Despite considerable efforts, deciphering the precise pathological
   mechanisms underlying HD still requires further research. The large
   number of genes and the diversity of processes involved in the
   progression of neurological diseases in general, and of HD in
   particular, emphasize the need for comprehensive approaches in
   addition to studies of individual genes^[45]14. Integrative network
   models can provide powerful tools for this. Such models have
   previously been applied to analyze a wide range of human diseases and
   have rapidly gained popularity^[46]15–[47]17. The use of network-based
   approaches for examining HD is also motivated by the role of HTT.
   Several studies have indicated that HTT interacts with a diverse array
   of cellular
   proteins^[48]18–[49]21. These interacting partners play important roles
   in various biological processes such as transcriptional regulation,
   vesicular transport and apoptosis as well as in signaling pathways such
   as MAPK, mTOR signaling and NOD-like receptor signaling^[50]22, [51]23.
   Thus, it is not surprising that large HTT-focused interaction networks
   have been derived by independent groups using yeast two-hybrid (Y2H)
   screens or affinity purification mass spectrometry (AP-MS)^[52]20,
   [53]24–[54]26. Although these approaches are powerful, assembling and
   analyzing the resulting networks requires considerable expertise and is
   a challenging task. To assist researchers in their pursuit of
   understanding the disease mechanisms and identifying novel drug targets
   for HD, we have developed the Huntington’s Disease Network DataBase
   (HDNetDB). It constitutes a versatile platform that integrates several
   levels of data and information, ranging from protein-protein
   interactions, regulatory interactions (microRNA-target gene and
   transcription factor-target gene), gene expression and drug-target
   information to gene information, Gene Ontology, pathway and phenotype
   data pertaining to HD (Fig. [55]1). Besides being a central
   resource for integrated data and information, HDNetDB also equips users
   with several querying and visualization options for HD-related
   networks. HDNetDB is freely accessible at
   http://hdnetdb.sysbiolab.eu and requires no login. To illustrate
   the potential of HDNetDB for network-oriented investigations, we
   describe an exemplary case study focusing on the unfolded protein
   response (UPR) in the context of HD.

Figure 1.


   Data and information integrated in HDNetDB. Many types of complementary
   data and information can be accessed, analyzed and visualized in
   HDNetDB. While the incorporation of generic data like the human
   interactome provides a backbone for unbiased network construction, the
   inclusion of many HD-specific datasets empowers researchers to carry out
   network-oriented investigations targeting molecular processes in HD.

Results

Case study: a potential connection between unfolded protein response (UPR) and Huntington’s disease

   To illustrate the application of HDNetDB for network-based
   investigations, we examined the potential connection between HD and
   UPR, which is a complex intracellular pathway. UPR is activated upon
   accumulation of unfolded proteins in the endoplasmic reticulum (ER). In
   mammalian cells, the UPR consists of three principal branches defined
   by signaling components located in the ER membrane: (i) ERN1
   (Endoplasmic Reticulum To Nucleus Signaling 1) also referred to as IRE1
   (inositol requiring enzyme 1), (ii) EIF2AK3 (Eukaryotic Translation
   Initiation Factor 2-Alpha Kinase 3) also referred to as PERK (protein
   kinase R-like ER kinase), and (iii) ATF6 (activating transcription
   factor 6). The main role of the UPR is to ensure homeostasis by
   increasing protein folding capacity within the ER and by reducing
   protein synthesis. If homeostasis cannot be re-established, persistent
   UPR activation can trigger cell death^[58]27. Although the UPR is well
   studied, its role in many diseases warrants further elucidation. This
   is also the case for HD, where different lines of investigation
   have indicated a potential relevance of the UPR for the pathogenesis of
   HD^[59]28, [60]29.

   To start our network-based investigations, we collected a small set of
   six key proteins that were reported to be involved in the UPR signaling
   pathway triggered by ER stress. Besides the three key signaling
   components mentioned above (ERN1, EIF2AK3, ATF6), we selected
   transcription factors X-Box Binding Protein 1 (XBP1) and
   DNA-damage-inducible transcript 3 (DDIT3), also known as CHOP. Both are
   downstream of ERN1 and EIF2AK3. Additionally, BCL2-associated X protein
   (BAX) was included, which modulates UPR by a direct interaction with
   ERN1^[61]30, [62]31. It should be emphasized that numerous other
   proteins have been associated with the UPR, but we selected only a
   small set for clarity of illustration. Nevertheless, we aimed to
   obtain more comprehensive coverage of proteins associated with the UPR
   and, more importantly, of proteins that link the UPR to other
   processes.
   Thus, we queried HDNetDB with the six proteins and obtained a set of
   354 interacting proteins, which we refer to here as the UPR interactome
   (Table [63]S1). The workflow and the UPR interactome generated by
   HDNetDB are presented in Fig. [64]2. In the network, the six queried
   proteins serve as “anchor” nodes.
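   The retrieval step can be pictured as collecting every interaction that
   touches at least one anchor protein. The Python sketch below is a
   minimal illustration under stated assumptions: the interaction records
   are a hand-made toy list (not HDNetDB data), physical interactions are
   treated as undirected and regulatory interactions as directed
   source-to-target pairs, and the function name query_interactome is
   hypothetical rather than part of HDNetDB.

```python
from collections import Counter

# Toy interaction records (source, target, kind), for illustration only; they
# are NOT taken from HDNetDB. "ppi" = physical protein-protein interaction
# (undirected), "reg" = regulatory interaction (directed source -> target).
INTERACTIONS = [
    ("ERN1", "BAX", "ppi"),
    ("ERN1", "TRAF2", "ppi"),
    ("EIF2AK3", "EIF2S1", "ppi"),
    ("ATF6", "XBP1", "reg"),
    ("XBP1", "DNAJB9", "reg"),
    ("DDIT3", "CEBPB", "ppi"),
    ("TRAF2", "TNFRSF1A", "ppi"),  # touches no anchor, so it is not retrieved
]

# The six anchor (query) proteins of the case study.
ANCHORS = ["ERN1", "EIF2AK3", "ATF6", "XBP1", "DDIT3", "BAX"]


def query_interactome(anchors, interactions):
    """Hypothetical helper: return (interactors, edges) for all edges touching an anchor.

    Edges between two anchors are kept, but anchors themselves are not
    counted as interactors.
    """
    anchor_set = set(anchors)
    edges = [e for e in interactions if e[0] in anchor_set or e[1] in anchor_set]
    nodes = {node for src, tgt, _ in edges for node in (src, tgt)}
    return nodes - anchor_set, edges


if __name__ == "__main__":
    interactors, edges = query_interactome(ANCHORS, INTERACTIONS)
    print(sorted(interactors))  # ['CEBPB', 'DNAJB9', 'EIF2S1', 'TRAF2']
    print(Counter(kind for _, _, kind in edges))
```

   Only direct (first-neighbor) partners are collected here; with real
   interaction databases, well-studied anchors such as ERN1 contribute many
   partners each, which is why six anchors can yield a network of several
   hundred proteins.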

Figure 2.


   HDNetDB workflow. HDNetDB retrieves the physical and regulatory
   interactions found for the queried genes or proteins, and generates a
   network. In the visualization, larger grey nodes represent the query
   proteins and smaller yellow nodes their interaction partners, while red
   arrows and blue edges represent regulatory and protein-protein
   interactions, respectively. Subsequently, the network can be examined and filtered
   using various complementary datasets and tools integrated in HDNetDB.

   We note that this example shows a common characteristic of
   network-based investigations: despite starting with a small number of
   anchor proteins, the retrieved networks are fairly large, especially
   for well-studied anchor proteins. This makes individual inspection of
   their components a highly challenging and time-consuming task. To
   assist researchers here, HDNetDB offers a series of integrated tools,
   which enable rapid functional assessment of retrieved networks and
   prioritization of network components for further investigation.

KEGG pathway enrichment analysis in UPR interactome

   To gain insights into the functional composition of the retrieved
   networks, we performed statistical enrichment analysis based on KEGG
   pathway annotations using the tool implemented on the Network page of
   HDNetDB. This type of analysis can identify those pathways curated in
   KEGG whose components are significantly overrepresented in the UPR
   interactome. Thus, we can verify whether we indeed obtained more
   proteins associated with the UPR, and we can identify other processes
   that are linked to the UPR based on the extracted interactions. Results
   of the enrichment analysis are returned to the user as a table listing
   the detected pathways along with the number of corresponding network
   proteins and their statistical significance (Fig. [66]3). For the UPR
   interactome, the pathway “Protein processing in the ER” achieved, as
   expected, the highest significance for overrepresentation (n = 41,
   FDR = 4.88E-23). Notably, apoptosis (n = 23, FDR = 5.48E-13) and cell
   cycle (n = 23, FDR = 2.29E-10) were also among the most significant
   KEGG pathways, indicating a tight connection between the UPR and these
   processes within our network model of the UPR. Strikingly, we also
   found proteins associated with HD in the KEGG database (n = 20,
   FDR = 1.88E-5) to be strongly overrepresented within the UPR
   interactome, supporting the link between the UPR and HD. Besides
   statistical evaluation, HDNetDB enables the highlighting of proteins
   associated with the detected pathways and thereby facilitates the
   individual inspection of pathway components. Examples of this option
   are shown in Fig. [67]3. As an alternative to KEGG annotations, users
   can carry out functional enrichment analyses based on Gene Ontology
   (GO) categories for molecular functions, biological processes and
   cellular compartments.
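   The text reports FDR values but does not specify the statistical test
   behind the enrichment tool, so the following Python sketch rests on an
   assumption: it uses a one-sided hypergeometric test (equivalent to a
   one-sided Fisher's exact test), a common choice for overrepresentation
   analysis, followed by Benjamini-Hochberg correction to obtain FDR
   values. The function names, gene identifiers and pathway sets are
   hypothetical toy data, and only the Python standard library is used.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N population, K annotated, n drawn)."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (FDR) in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity
    # and capping adjusted values at 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def pathway_enrichment(network_genes, pathways, background_size):
    """Hypothetical helper: return (pathway, n_in_network, FDR) rows sorted by FDR."""
    network = set(network_genes)
    names = list(pathways)
    counts = [len(network & pathways[name]) for name in names]
    pvals = [
        hypergeom_sf(k, background_size, len(pathways[name]), len(network))
        for name, k in zip(names, counts)
    ]
    fdrs = benjamini_hochberg(pvals)
    return sorted(zip(names, counts, fdrs), key=lambda row: row[2])


if __name__ == "__main__":
    background = 100                          # toy "genome" of 100 genes
    network = {f"G{i}" for i in range(10)}    # toy network: G0..G9
    pathways = {
        "PathwayA": {f"G{i}" for i in range(8)} | {"G50", "G51"},
        "PathwayB": {f"G{i}" for i in range(50, 60)},
    }
    for name, n, fdr in pathway_enrichment(network, pathways, background):
        print(f"{name}: n = {n}, FDR = {fdr:.2E}")
```

   In practice, the choice of background (all annotated genes versus the
   whole interactome) strongly influences the resulting p-values, so it
   should match how the network itself was assembled.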

Figure 3.


   KEGG Pathway enrichment analysis. Results of enrichment analyses are
   returned as a table listing pathways with significant overrepresentation
   among network proteins (right side). Clicking a table row highlights
   the components of the selected pathway in the network, as shown here
   for “Protein processing in endoplasmic reticulum” and “Huntington’s
   Disease”.

Linking the UPR interactome to mammalian phenotypes

   Connecting molecular processes to phenotypes is a daunting task in
   biomedicine. Important help here comes from the extensive cataloging
   of phenotypes observed for gene knockouts in model organisms. For
   research into human diseases, the systematic phenotype annotations of
   murine genotypes provided in the Mouse Genome Informatics (MGI)
   database are a valuable resource. This is also the case in our study of
   the UPR interactome. Using the relevant tool implemented in HDNetDB,
   the network was examined for possible enrichment of proteins associated
   with HD-relevant phenotypes. Remarkably, we found that most of the
   selected phenotypes are highly overrepresented among the network
   components (Fig. [69]4). For instance, the most significant phenotype
   (n = 54; p = 5.8E-30) was “Decreased body weight”, which is a common
   characteristic of HD patients already in the early stage of
   disease^[70]32. Intriguingly, the UPR interactome is also strongly
   enriched in components linked to abnormal locomotor behavior (n = 31;
   p = 4.04E-14), which is a classical hallmark of HD. To our knowledge,
   such a connection between UPR and loss of motor control has not been
   put forward so far. All in all, the results of the phenotypic analysis
   carried out in HDNetDB suggest that the UPR interactome includes many
   genes whose knockout in mice leads to HD-relevant phenotypes. These
   genes can be readily identified interactively in HDNetDB (Fig. [71]4).
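   Before any phenotype statistics can be computed, network genes must be
   linked to phenotype annotations. Because MGI annotates mouse genes, a
   human network presumably has to be mapped to mouse orthologs first. The
   text does not describe how HDNetDB performs this step, so the Python
   sketch below is only an assumption-based illustration: the gene symbols
   and ortholog table are hypothetical placeholders, and the annotations
   are invented toy data that merely reuse two phenotype terms named above.

```python
from collections import defaultdict

# Hypothetical toy tables (NOT real MGI or orthology data).
HUMAN_TO_MOUSE = {"GENE1": "Gene1", "GENE2": "Gene2", "GENE3": "Gene3"}
MOUSE_PHENOTYPES = {
    "Gene1": {"decreased body weight", "abnormal locomotor behavior"},
    "Gene2": {"decreased body weight"},
    "Gene3": {"abnormal locomotor behavior"},
}


def genes_per_phenotype(network_genes, orthologs, phenotypes):
    """Map each phenotype term to the set of human network genes annotated with it.

    Genes without a mouse ortholog (or without annotations) are skipped.
    """
    by_term = defaultdict(set)
    for human in network_genes:
        mouse = orthologs.get(human)
        if mouse is None:
            continue
        for term in phenotypes.get(mouse, set()):
            by_term[term].add(human)
    return dict(by_term)


if __name__ == "__main__":
    network = ["GENE1", "GENE2", "GENE3", "GENE4"]  # GENE4 has no ortholog here
    counts = genes_per_phenotype(network, HUMAN_TO_MOUSE, MOUSE_PHENOTYPES)
    for term, genes in sorted(counts.items()):
        print(term, len(genes), sorted(genes))
```

   The size of each gene set corresponds to the per-phenotype count n, which
   would then be tested for overrepresentation against the set of all
   annotated genes.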

Figure 4.


   Phenotypic enrichment analysis. Listed are HD-relevant mammalian
   phenotypes for which a significant enrichment among components of the
   UPR interactome was detected. Highlighted phenotypes are “Abnormal
   Locomotor Behavior” and “Abnormal Learning Memory”. Red nodes represent
   the genes or proteins annotated with these HD-relevant phenotypes in
   MGI.

In silico screens with curated gene lists

   Besides pathway information and functional or phenotypic annotation,
   HDNetDB includes curated HD-relevant gene lists, which can be used for
   examination of networks. Overrepresentation of curated genes among
   network components can readily be assessed. For the UPR network,
   HDNetDB identified several gene sets as significantly overrepresented
   (Supplementary Fig. [73]S1). These include HD Therapeutic Target Genes
   (HDTTG), a curated set of genes that were previously identified as
   potential therapeutic targets in HD^[74]6. The network also contains 63
   HTT-interacting proteins, suggesting not only a functional but also a
   direct physical connection between the UPR and (mutant) HTT. This is in
   line with previous findings that wild-type HTT is crucial for the
   integrity of the ER^[75]33. Since the poly-Q expansion results in a
   distinct binding behavior of mutant compared to wild-type HTT^[76]23,
   [77]34–[78]36, the results of our analyses suggest that the HD-causing
   mutation might also have a direct impact on the functioning of the UPR
   through aberrant protein binding. In addition, we identified a large
   number of genes (n = 94) that have been genetically associated with
   neurological diseases, supporting a link between the UPR and
   neuropathology in general. Importantly, users can carry out in silico
   screens based on their own uploaded gene lists, so they are not limited
   to the curated gene lists provided in HDNetDB.
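   An in silico screen with an uploaded list essentially reduces to
   parsing the list, intersecting it with the network and comparing the
   observed overlap with the overlap expected by chance. The Python sketch
   below assumes a plain-text upload with symbols separated by newlines,
   commas or whitespace and optional "#" comments, and it normalizes
   symbols to upper case (appropriate for human gene symbols). The upload
   format, the function names and the use of fold enrichment as a summary
   are hypothetical and not taken from HDNetDB.

```python
def parse_gene_list(text):
    """Parse an uploaded gene list into a set of upper-case symbols.

    Symbols may be separated by newlines, commas or whitespace; anything after
    '#' on a line is treated as a comment.
    """
    genes = set()
    for line in text.splitlines():
        content = line.split("#", 1)[0]
        for token in content.replace(",", " ").split():
            genes.add(token.upper())
    return genes


def screen_network(network_genes, gene_list, background_size):
    """Return (sorted overlapping genes, fold enrichment of observed over expected overlap)."""
    network = {g.upper() for g in network_genes}
    hits = network & gene_list
    # Overlap expected if the list were a random draw from the background.
    expected = len(network) * len(gene_list) / background_size
    fold = len(hits) / expected if expected > 0 else float("nan")
    return sorted(hits), fold


if __name__ == "__main__":
    upload = "HDAC1\njun, SLC25A4\n# a comment line\nHMGCR  # trailing comment\nFOO1\n"
    network = ["HDAC1", "JUN", "XBP1", "ATF6", "ERN1"]  # toy network
    print(screen_network(network, parse_gene_list(upload), background_size=100))
    # (['HDAC1', 'JUN'], 8.0)
```

   Fold enrichment is only a descriptive summary; a significance test on
   the same overlap counts would be needed to judge whether the overlap is
   meaningful.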

Sequential filtering for prioritization of candidate genes

   Besides the elucidation of the relevance of molecular processes for HD,
   prioritization of candidate genes for further study and for therapeutic
   intervention can be carried out efficiently in HDNetDB. Every network
   produced by application of a filtering procedure can be used as input
   for another filtering step. In this way, users can define the order and
   criteria for a sequential filtering procedure in a flexible manner.
   Moreover, complementary data integrated in HDNetDB can be exploited for
   network-based gene selection – a strategy which has already been used
   effectively in molecular pharmacology^[79]37. For illustration, we
   carried out step-wise filtering to identify components in the UPR
   interactome that are (i) differentially expressed, (ii) previously
   associated with HD and (iii) known drug targets. The underlying
   motivation for these criteria was to discover proteins related to the
   UPR whose dysregulation can play a role in the pathogenesis of HD, and
   can be readily targeted by existing drugs. In the first step, the UPR
   interactome was filtered based on expression changes between human HD
   caudate nucleus and normal caudate nucleus, which are available in
   HDNetDB as one of many comparisons of HD-related gene expression data
   (Fig. [80]5a). This resulted in a network of 37 differentially
   regulated genes, of which 18 are up-regulated and 19 are down-regulated
   (Fig. [81]5b). In the next step, we filtered this network based on the
   criterion that the included components have been either directly or
   indirectly implicated in HD as described in Kalathur RK et al.^[82]6
   (Fig. [83]5c). This led to a network of 14 genes that are not only
   differentially expressed in HD but also implicated in HD, and thus may
   constitute a link between the UPR and HD. Finally, we further filtered
   this network based on known drug targets present in HDNetDB
   (Fig. [84]5d). Only four proteins remained after the sequential
   filtering; these could play key roles in linking HD and the UPR and
   can be targeted with existing drugs: histone deacetylase 1 (HDAC1), jun
   proto-oncogene (JUN), solute carrier family 25 member 4 (SLC25A4) and
   3-hydroxy-3-methylglutaryl-CoA reductase (HMGCR) (Fig. [85]5d).
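   Because every filtered network can serve as input for the next step,
   sequential filtering behaves like function composition. The Python
   sketch below illustrates this under stated assumptions: a network is
   reduced to its set of gene symbols (edges are ignored), the expression
   filter keeps genes with |log2 fold change| ≥ 0.25, the threshold given
   in the Figure 5 legend, and all gene names, fold changes and curated
   lists are hypothetical placeholders rather than HDNetDB data.

```python
from typing import Callable, Dict, Iterable, List, Set

Network = Set[str]                     # simplification: a network as its node set
Filter = Callable[[Network], Network]


def by_expression(log2fc: Dict[str, float], cutoff: float = 0.25) -> Filter:
    """Keep genes whose |log2 fold change| >= cutoff; genes without data are dropped."""
    return lambda net: {g for g in net if g in log2fc and abs(log2fc[g]) >= cutoff}


def by_membership(allowed: Iterable[str]) -> Filter:
    """Keep genes found in a curated list (e.g. HD-implicated genes or drug targets)."""
    allowed_set = frozenset(allowed)
    return lambda net: net & allowed_set


def apply_sequentially(net: Network, filters: Iterable[Filter]) -> List[Network]:
    """Apply filters in order, feeding each result into the next; return every stage."""
    stages = [net]
    for step in filters:
        net = step(net)
        stages.append(net)
    return stages


if __name__ == "__main__":
    # Hypothetical placeholder data, not values from HDNetDB.
    network = {"G1", "G2", "G3", "G4", "G5", "G6"}
    log2fc = {"G1": 0.4, "G2": -0.3, "G3": 0.1, "G4": -0.8, "G5": 0.25}
    hd_implicated = {"G1", "G2", "G3", "G5"}
    drug_targets = {"G2", "G5", "G9"}

    stages = apply_sequentially(
        network,
        [by_expression(log2fc), by_membership(hd_implicated), by_membership(drug_targets)],
    )
    for stage in stages:
        print(len(stage), sorted(stage))
```

   Since each of these filters keeps a gene based only on that gene's own
   properties, reordering them changes the intermediate networks but not
   the final set; the chosen order mainly determines what the user inspects
   along the way.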

Figure 5.


   Sequential filtering of UPR interactome. (a) Initial UPR network; (b)
   Network after filtering by differential expression in human HD caudate
   nucleus samples using the criterion log2 fold change ≥ +0.25 or ≤ −0.25;
   (c) Network of dysregulated components which have been indicated as
   potential therapeutic targets for HD; (d) Final network obtained after
   filtering using protein-drug target information (DrugBank). Red and
   green nodes indicate up- and down-regulated genes, respectively, and
   large grey nodes represent query proteins.

   HDAC1 is a component of the histone deacetylase complex and plays an
   important role in the regulation of gene expression. It can act as a
   molecular switch between neuronal survival and death by interacting
   with HDRP and HDAC3, respectively^[87]38. A recent study showed that
   targeting HDAC1 with HDAC inhibitors resulted in an improvement of
   HD-related phenotypes in different HD model systems^[88]39. The second
   gene, JUN, is a transcription factor and component of the AP-1
   transcription complex that plays a key role in neural development.
   Studies have also shown a strong induction of JUN at both the gene and
   protein level in several human neurodegenerative diseases such as
   Alzheimer’s dementia^[89]40, [90]41, Parkinson’s disease^[91]42,
   [92]43 and amyotrophic lateral sclerosis^[93]44. The third known drug
   target we identified was SLC25A4 (ANT1), a member of a subfamily of
   solute carrier proteins that translocate ADP from the cytoplasm into
   the mitochondrial matrix and ATP from the mitochondrial matrix into
   the cytoplasm. In addition, SLC25A4 regulates the mitochondrial
   permeability transition pore that initiates apoptosis. It has been
   speculated that increased expression of SLC25A4 following a brain
   injury has biphasic consequences: the initial repair of damaged cells
   and neurons by increasing ATP export, but eventual destruction of
   damaged cells by apoptosis if the damage is beyond repair^[94]45,
   [95]46. The final gene identified, HMGCR, encodes an ER-resident
   transmembrane glycoprotein that is the rate-limiting enzyme for
   cholesterol synthesis. Perturbations of cholesterol metabolism have
   been reported in HD models as well as in HD patients^[96]47. HMGCR
   levels are regulated in response to sterols by the
   ubiquitin-proteasome system through the ER-associated degradation
   (ERAD) pathway. As mutant HTT has been shown to impair ERAD in
   cellular models of HD, and thereby to interfere with protein
   homeostasis in the ER^[97]48, it is conceivable that an activated UPR
   might have an impact on cholesterol synthesis in HD patients. Taken
   together, the reported findings indicate that the identified genes can
   provide attractive targets in the context of the UPR and HD, although
   a more comprehensive evaluation is certainly warranted.

Discussion

   HD is a fatal neurodegenerative disease with no known cure. Although it
   is caused by mutation of a single gene, its molecular manifestation
   appears to be highly complex and includes numerous processes. To help
   researchers better cope with the molecular complexity of HD, we have
   developed HDNetDB. Its development was motivated by our own experiences
   in network-oriented analyses of HD. Although a large number of tools
   for the analysis of interaction networks exist (including our UniHI
   database^[98]49), they are typically generic and require laborious
   data handling to study selected aspects of a specific disease such as
   HD. HDNetDB can help to overcome these limitations. It is a flexible
   platform that is customized for HD research. It integrates different
   types of data, including molecular interactions, drug-target
   information, and HD-associated genes and their expression in different
   model organisms and in humans.

   HDNetDB was designed to provide easy access to the results of a query.
   The retrieved data are presented simultaneously on four pages
   (Proteins, Physical Interactions, Regulatory Interactions and Network)
   enabling the user to switch between different types of
   information (Fig. [99]6). The Proteins page gives an overview of the
   genes and proteins matching the query in the database. Proteins that
   should not be included as anchor nodes in the generated network can be
   excluded. In our case study, for instance, we excluded X-box binding
   protein 1 pseudogene 1 (XBP1P1), which was retrieved because one of its
   aliases is identical to the gene symbol of our query protein XBP1.
   Excluding a protein on the Proteins page automatically updates the
   network presented on the Network page. The Physical and Regulatory
   Interactions
   pages list all interaction partners found at the level of physical
   protein association or gene regulation for the queried genes and
   proteins. The sources from which each interaction has been retrieved
   are shown, and hyperlinks to these sources are provided, if available.
   In addition, different types of information regarding the individual
   interactions are given, including the methods that were used for
   identification of the interactions as well as quality scores such as
   functional co-annotation and co-expression in human tissues. On both
   Interactions pages, options to download the full set of interactions
   are provided. Finally, the Network page displays a graphical
   visualization of the retrieved network. In addition to simple network
   visualization, a battery of tools for interactive network analysis is
   available on the side bar of the Network page. First, filtering of
   interactions can be carried out based on source, type, topology,
   experimental derivation and number of PubMed references attributed to