Abstract

Huntington’s disease (HD) is a progressive and fatal neurodegenerative disorder caused by an expanded CAG repeat in the huntingtin gene. Although HD is monogenic, its molecular manifestation appears highly complex and involves multiple cellular processes. The recent application of high-throughput platforms such as microarrays and mass spectrometry has indicated multiple pathogenic routes. The massive data generated by these techniques, together with the complexity of the pathogenesis, however, pose considerable challenges to researchers. Network-based methods can provide valuable tools to consolidate newly generated data with existing knowledge, and to decipher the interwoven molecular mechanisms underlying HD. To facilitate research on HD in a network-oriented manner, we have developed HDNetDB, a database that integrates molecular interactions with many HD-relevant datasets. It allows users to obtain, visualize and prioritize molecular interaction networks using HD-relevant gene expression, phenotypic and other types of data obtained from human samples or model organisms. We illustrated several HDNetDB functionalities through a case study and identified proteins that constitute potential cross-talk between HD and the unfolded protein response (UPR). HDNetDB is publicly accessible at http://hdnetdb.sysbiolab.eu.

Introduction

Huntington’s disease is an inherited neurodegenerative disorder that results from a trinucleotide (CAG) repeat expansion (>35) in the first exon of the huntingtin (HTT, IT15) gene^1. Human HTT codes for a large protein of 3144 amino acids, which is ubiquitously expressed in various tissues and is present in several sub-cellular locations. Studies indicate that both loss of function of normal HTT and gain of function of mutant HTT contribute to neuropathological alterations in distinct regions of the brain^2,3.
In the initial stages of HD, degeneration is detectable mainly in the striatum and cortex, whereas in later stages degeneration is also observed in other brain regions, such as the hypothalamus and hippocampus^4,5. Clinically, the disease is characterized by complex and variable symptoms that include movement disorders, psychiatric problems and cognitive decline^2. Though HD is caused by mutation of a single gene, the disease development might involve a plethora of genes and processes^6. Indeed, large variations in onset, severity and progression of HD suggest the existence of other influential molecular factors besides the mutation of HTT^7–11. In order to identify genes that may modify disease onset and progression, genome-wide association and gene expression studies have been performed^12,13. Additionally, a large number of genes and proteins have been catalogued based on different types of experimental evidence^6. Currently, the identification of new targets, drugs and therapeutic strategies that could ultimately delay the onset or ameliorate the progression of HD is at a crucial juncture. Despite considerable efforts, deciphering the precise pathological mechanisms underlying HD still requires further research. The large number of genes and the diversity of processes involved in the progression of neurological diseases in general, and HD in particular, emphasize the need for comprehensive approaches in addition to studies of individual genes^14. Integrative network models can provide powerful tools for this. Such models have previously been applied to analyze a wide range of human diseases and have rapidly gained popularity^15–17. The use of network-based approaches for examining HD is also motivated by the role of HTT. Several studies indicated that HTT interacts with a diverse array of cellular proteins^18–21.
These interacting partners play important roles in various biological processes such as transcriptional regulation, vesicular transport and apoptosis, as well as in signaling pathways such as MAPK, mTOR signaling and NOD-like receptor signaling^22,23. Thus, it is not surprising that large HTT-focused interaction networks have been derived by independent groups using yeast-two-hybrid (Y2H) screens or affinity purification mass spectrometry (AP-MS)^20,24–26. Although these approaches are powerful, assembling and analyzing the resulting networks is a challenging task that requires considerable expertise. In order to assist researchers in their pursuit to understand the disease mechanisms and to identify novel drug targets for HD, we have developed the Huntington’s Disease Network DataBase (HDNetDB). It constitutes a versatile platform that integrates several levels of data and information: protein-protein interactions, regulatory interactions (microRNA-target gene and transcription factor-target gene), gene expression, drug-target information, gene, Gene Ontology and pathway information, and phenotype data pertaining to HD (Fig. 1). Besides being a central resource for integrated data and information, HDNetDB also equips users with several querying and visualization options for HD-related networks. HDNetDB is freely accessible at http://hdnetdb.sysbiolab.eu and requires no login. To illustrate the potential of HDNetDB for network-oriented investigations, we describe an exemplary case study focusing on the unfolded protein response (UPR) in the context of HD.

Figure 1. Data and information integrated in HDNetDB. Many types of complementary data and information can be accessed, analyzed and visualized in HDNetDB.
While the incorporation of generic data like the human interactome provides a backbone for unbiased network construction, the inclusion of many HD-specific datasets empowers researchers to carry out network-oriented investigations targeting molecular processes in HD.

Results

Case study: a potential connection between unfolded protein response (UPR) and Huntington’s disease

To illustrate the application of HDNetDB for network-based investigations, we examined the potential connection between HD and the UPR, which is a complex intracellular pathway. The UPR is activated upon accumulation of unfolded protein in the endoplasmic reticulum (ER). In mammalian cells, the UPR consists of three principal branches defined by signaling components located in the ER membrane: (i) ERN1 (Endoplasmic Reticulum To Nucleus Signaling 1), also referred to as IRE1 (inositol requiring enzyme 1), (ii) EIF2AK3 (Eukaryotic Translation Initiation Factor 2-Alpha Kinase 3), also referred to as PERK (protein kinase R-like ER kinase), and (iii) ATF6 (activating transcription factor 6). The main role of the UPR is to ensure homeostasis by increasing protein folding capacity within the ER and by reducing protein synthesis. If homeostasis cannot be re-established, persistent UPR activation can trigger cell death^27. Although the UPR is well studied, its role in many diseases warrants further elucidation. This is also the case for HD, where different lines of investigation indicated a potential relevance of the UPR for the pathogenesis of HD^28,29. To start our network-based investigations, we collected a small set of six key proteins that were reported to be involved in the UPR signaling pathway triggered by ER stress. Besides the three key signaling components mentioned above (ERN1, EIF2AK3, ATF6), we selected the transcription factors X-Box Binding Protein 1 (XBP1) and DNA-damage-inducible transcript 3 (DDIT3), also known as CHOP. Both are downstream of ERN1 and EIF2AK3.
Additionally, BCL2-associated X protein (BAX) was included, which modulates the UPR by a direct interaction with ERN1^30,31. It should be emphasized that numerous other proteins have been associated with the UPR, but we took only a small set for better illustration. Nevertheless, we wanted to obtain a more comprehensive coverage of proteins associated with the UPR and, more importantly, of proteins that link the UPR to other processes. Thus, we queried HDNetDB with the six proteins and obtained a set of 354 interacting proteins, which we refer to here as the UPR interactome (Table S1). The workflow and the UPR interactome generated by HDNetDB are presented in Fig. 2. In the network, the six queried proteins serve as “anchor” nodes.

Figure 2. HDNetDB workflow. HDNetDB retrieves the physical and regulatory interactions found for the queried genes or proteins, and generates a network. This is visualized by larger grey and smaller yellow nodes representing the input/query and interacting proteins, while red arrows and blue edges represent regulatory and protein-protein interactions, respectively. Subsequently, the network can be examined and filtered using various complementary datasets and tools integrated in HDNetDB.

We note that this example shows a common characteristic of network-based investigations: despite starting with a small number of anchor proteins, the retrieved networks are fairly large, especially for well-studied anchor proteins. This makes individual inspection of their components a highly challenging and time-consuming task. To assist researchers here, HDNetDB offers a series of integrated tools, which enable rapid functional assessment of retrieved networks and prioritization of network components for further investigation.
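
The retrieval step described above can be thought of as collecting the first neighbors of the anchor nodes in an interaction graph. The following is a minimal sketch of that idea, not HDNetDB's actual implementation, which is not described here. The function name `first_neighbor_network`, the placeholder nodes (`PROT_A`, `PROT_B`, `TF_X`) and all toy edges except the ERN1-BAX interaction mentioned in the text are assumptions made for illustration.

```python
def first_neighbor_network(edges, anchors):
    """Return the edges touching at least one anchor, plus all nodes they involve."""
    anchors = set(anchors)
    kept = [(a, b, kind) for a, b, kind in edges if a in anchors or b in anchors]
    nodes = set(anchors)
    for a, b, _ in kept:
        nodes.update((a, b))
    return kept, nodes


# Toy edge list (source, target, interaction type); placeholders are hypothetical.
TOY_EDGES = [
    ("ERN1", "BAX", "physical"),      # anchor-anchor interaction named in the text
    ("ATF6", "PROT_A", "physical"),   # anchor to hypothetical interactor
    ("PROT_A", "PROT_B", "physical"), # no anchor involved, so it is dropped
    ("TF_X", "XBP1", "regulatory"),   # hypothetical regulator of an anchor
]
ANCHORS = ["ERN1", "EIF2AK3", "ATF6", "XBP1", "DDIT3", "BAX"]

edges, nodes = first_neighbor_network(TOY_EDGES, ANCHORS)
interactors = nodes - set(ANCHORS)  # non-anchor proteins pulled into the network
```

Even in this toy case, the interactor set grows with every well-connected anchor, which is why the retrieved networks tend to be large and need downstream filtering.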
KEGG pathway enrichment analysis in UPR interactome

To gain insights into the functional composition of the retrieved networks, we performed statistical enrichment analysis based on KEGG pathway annotations using the tool implemented on the Network page of HDNetDB. This type of analysis can identify those pathways curated in KEGG whose components are significantly overrepresented in the UPR interactome. Thus, we can verify whether we indeed obtained more proteins associated with the UPR, and we can identify other processes that are linked to the UPR based on the extracted interactions. Results of enrichment analysis are returned to the user of HDNetDB as a table listing the detected pathways along with the number of corresponding network proteins and their statistical significance (Fig. 3). For the UPR interactome, the pathway “Protein processing in the ER” achieved, as expected, the highest significance for overrepresentation (n = 41, FDR = 4.88E-23). Notably, apoptosis (n = 23, FDR = 5.48E-13) and cell cycle (n = 23, FDR = 2.29E-10) were also among the most significant KEGG pathways, indicating a tight connection between the UPR and these processes within our network model of the UPR. Strikingly, we also found proteins associated with HD in the KEGG database (n = 20, FDR = 1.88E-5) to be strongly overrepresented within the UPR interactome, supporting the link between UPR and HD. Besides statistical evaluation, HDNetDB enables the highlighting of proteins associated with the detected pathways and thereby facilitates the individual inspection of pathway components. Examples of this option are shown in Fig. 3. As an alternative to KEGG annotations, users can carry out functional enrichment analyses based on Gene Ontology (GO) categories for molecular functions, biological processes and cellular compartments.

Figure 3. KEGG Pathway enrichment analysis.
Results of enrichment analyses are returned as a table listing pathways with significant overrepresentation among network proteins (right side). By a mouse click on a table row, components of the selected pathways are highlighted in the network, as shown here for “Protein processing in endoplasmic reticulum” and “Huntington’s Disease”.

Linking the UPR interactome to mammalian phenotypes

Connecting molecular processes to phenotypes is a daunting task in biomedicine. An important help here is provided by extensive cataloging of phenotypes observed for gene knockouts in model organisms. For research into human diseases, the systematic phenotype annotations of murine genotypes provided in the Mouse Genome Informatics (MGI) database are a valuable resource. This is also the case in our study of the UPR interactome. Using the relevant tool implemented in HDNetDB, the network was examined for possible enrichment of proteins associated with HD-relevant phenotypes. Remarkably, we found that most of the selected phenotypes are highly overrepresented among the network components (Fig. 4). For instance, the most significant phenotype (n = 54; p = 5.8E-30) was “Decreased body weight”, which is a common characteristic of HD patients already in the early stage of disease^32. Intriguingly, the UPR interactome is also strongly enriched in components linked to abnormal locomotor behavior (n = 31; p = 4.04E-14), which is a classical hallmark of HD. To our knowledge, such a connection between the UPR and loss of motor control has not been put forward so far. All in all, the results of the phenotypic analysis carried out in HDNetDB suggest that the UPR interactome includes many genes whose knockout in mice leads to HD-relevant phenotypes. These genes can be readily identified interactively in HDNetDB (Fig. 4).

Figure 4. Phenotypic enrichment analysis.
HD-relevant mammalian phenotypes are listed, for which a significant enrichment among components of the UPR interactome was detected. Highlighted phenotypes are “Abnormal Locomotor Behavior” and “Abnormal Learning Memory”. Red nodes represent the genes or proteins annotated with these HD-relevant phenotypes in MGI.

In silico screens with curated gene lists

Besides pathway information and functional or phenotypic annotation, HDNetDB includes curated HD-relevant gene lists, which can be used for examination of networks. Overrepresentation of curated genes among network components can readily be assessed. For the UPR network, HDNetDB identified several gene sets as significantly overrepresented (Supplementary Fig. S1). These include HD Therapeutic Target Genes (HDTTG), a curated set of genes that were previously identified as potential therapeutic targets in HD^6. Also, 63 HTT-interacting proteins were identified, suggesting not only a functional but also a direct physical connection between the UPR and (mutant) HTT. This is in line with previous findings that wild-type HTT is crucial for the integrity of the ER^33. Since the poly-Q expansion results in a distinct binding behavior of mutant compared to wild-type HTT^23,34–36, the results of our analyses suggest that the HD-causing mutation might also have a direct impact on the functioning of the UPR through aberrant protein binding. In addition, we identified a large number of genes (n = 94) that have been genetically associated with neurological diseases, supporting a link between the UPR and neuropathology in general. Importantly, users can carry out in silico screens based on their own uploaded gene lists, so they are not limited to the curated gene lists provided in HDNetDB.
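
The pathway, phenotype and gene-list analyses above all ask the same question: does a network contain more members of an annotated gene set than expected by chance? The text does not state which statistical test HDNetDB uses, so the sketch below assumes a common choice, a one-sided hypergeometric test followed by Benjamini-Hochberg FDR correction. The function names, the background size and the gene-set counts are hypothetical and are not values from HDNetDB.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) when n genes are drawn from N, of which K carry the annotation."""
    upper = min(K, n)
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value to the smallest
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical counts: (annotated genes in background, annotated genes in network)
BACKGROUND, NETWORK = 20000, 360
gene_sets = {"set_A": (150, 41), "set_B": (120, 3)}

pvals = [hypergeom_sf(k, BACKGROUND, K, NETWORK) for K, k in gene_sets.values()]
fdr = dict(zip(gene_sets, benjamini_hochberg(pvals)))
```

The choice of background matters: restricting it to genes present in the interactome rather than the whole genome would change N and K, and with them the p-values.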
Sequential filtering for prioritization of candidate genes

Besides the elucidation of the relevance of molecular processes for HD, prioritization of candidate genes for further study and for therapeutic intervention can be carried out efficiently in HDNetDB. Every network produced by application of a filtering procedure can be used as input for another filtering step. In this way, users can define the order and criteria for a sequential filtering procedure in a flexible manner. Moreover, complementary data integrated in HDNetDB can be exploited for network-based gene selection, a strategy which has already been used effectively in molecular pharmacology^37. For illustration, we carried out step-wise filtering to identify components in the UPR interactome that are (i) differentially expressed, (ii) associated previously with HD and (iii) known drug targets. The underlying motivation for these criteria was to discover proteins related to the UPR whose dysregulation can play a role in the pathogenesis of HD, and which can be readily targeted by existing drugs. In the first step, the UPR interactome was filtered based on expression changes between human HD caudate nucleus and normal caudate nucleus, which are available in HDNetDB as one of many comparisons of HD-related gene expression data (Fig. 5a). This resulted in a network of 37 differentially regulated genes, of which 18 are up-regulated and 19 are down-regulated (Fig. 5b). In the next step, we filtered this network based on the criterion that the included components have been either directly or indirectly implicated in HD as described in Kalathur RK et al.^6 (Fig. 5c). This led to the identification of a network with 14 genes that are not only differentially expressed in HD but are also implicated in HD and thus may constitute a link between the UPR and HD. Finally, we further filtered this network based on known drug targets present in HDNetDB (Fig. 5d).
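
The three filtering steps can be sketched as a chain of functions in which each step receives the node set produced by the previous one. This is only an illustration under stated assumptions: the function names, the placeholder genes and their log2 fold changes are hypothetical, and the 0.25 threshold is taken from the criterion given in the Figure 5 legend.

```python
def filter_by_expression(nodes, log2fc, threshold=0.25):
    """Keep nodes whose absolute log2 fold change reaches the threshold."""
    return {g for g in nodes if abs(log2fc.get(g, 0.0)) >= threshold}


def filter_by_gene_list(nodes, gene_list):
    """Keep nodes that are members of the given gene list."""
    return nodes & set(gene_list)


def sequential_filter(nodes, steps):
    """Apply each filter to the output of the previous one; record the sizes."""
    current = set(nodes)
    sizes = [len(current)]
    for step in steps:
        current = step(current)
        sizes.append(len(current))
    return current, sizes


# Hypothetical toy data; names and values are placeholders, not HDNetDB content.
network = {"GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_E"}
log2fc = {"GENE_A": 0.8, "GENE_B": -0.4, "GENE_C": 0.1, "GENE_D": -0.3}
hd_associated = {"GENE_A", "GENE_B", "GENE_C"}
drug_targets = {"GENE_A", "GENE_E"}

final, sizes = sequential_filter(
    network,
    [
        lambda n: filter_by_expression(n, log2fc),        # (i) differentially expressed
        lambda n: filter_by_gene_list(n, hd_associated),  # (ii) previously linked to HD
        lambda n: filter_by_gene_list(n, drug_targets),   # (iii) known drug targets
    ],
)
```

Because every step is a plain set-to-set function, the order of the steps can be changed or new criteria added without touching the others, which mirrors the flexibility described in the text.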
Only four proteins remained after the sequential filtering that could possibly play key roles in linking HD and the UPR, and that can be targeted with existing drugs: histone deacetylase 1 (HDAC1), jun proto-oncogene (JUN), solute carrier family 25 member 4 (SLC25A4) and 3-hydroxy-3-methylglutaryl-CoA reductase (HMGCR) (Fig. 5d).

Figure 5. Sequential filtering of UPR interactome. (a) Initial UPR network; (b) Network after filtering by differential expression in human HD caudate nucleus samples using the criterion log2 fold change ≥ +0.25 or ≤ −0.25; (c) Network of dysregulated components which have been indicated as potential therapeutic targets for HD; (d) Final network obtained after filtering using protein-drug target information (DrugBank). Red and green nodes indicate up- and down-regulated genes, respectively, and large grey nodes represent query proteins.

HDAC1 is a component of the histone deacetylase complex and plays an important role in regulation of gene expression. It can act as a molecular switch between neuronal survival and death by interacting with HDRP and HDAC3, respectively^38. A recent study showed that targeting HDAC1 with HDAC inhibitors resulted in an improvement in HD-related phenotypes in different HD model systems^39. The second gene, JUN, is a transcription factor and component of the AP-1 transcription complex that plays a key role in neural development. Studies have also shown that there is a strong induction of JUN at both the gene and protein level in several human neurodegenerative diseases such as Alzheimer’s dementia^40,41, Parkinson’s disease^42,43 and amyotrophic lateral sclerosis^44. The third known drug target we identified was SLC25A4 (ANT1), which is a member of a subfamily of solute carrier proteins that help in translocation of ADP from the cytoplasm to the mitochondrial matrix as well as of ATP from the mitochondrial matrix to the cytoplasm.
In addition, SLC25A4 also regulates the mitochondrial permeability transition pore that initiates apoptosis. It has been speculated that increased expression of SLC25A4 following a brain injury has biphasic consequences: the initial repair of damaged cells and neurons by increasing ATP export, but eventual destruction of damaged cells by apoptosis if the damage is beyond repair^45,46. The final gene identified (HMGCR) encodes an ER-resident transmembrane glycoprotein and the rate-limiting enzyme for cholesterol synthesis. Perturbations of the cholesterol metabolism have been reported in HD models as well as in HD patients^47. HMGCR levels are regulated in response to sterols by the ubiquitin-proteasome system through the ER-associated degradation (ERAD) pathway. As mutant HTT has been shown to impair ERAD in cellular models of HD, and thereby to interfere with protein homeostasis in the ER^48, it is conceivable that an activated UPR might have an impact on the cholesterol synthesis in HD patients. Taken together, the reported findings indicate that the identified genes can provide attractive targets in the context of UPR and HD, although a more comprehensive evaluation is certainly warranted.

Discussion

HD is a fatal neurodegenerative disease with no known cure. Although it is caused by mutation of a single gene, its molecular manifestation appears to be highly complex and includes numerous processes. To help researchers better cope with the molecular complexity of HD, we have developed HDNetDB. Its development was motivated by our own experiences in network-oriented analyses of HD. Although a large number of tools for analysis of interaction networks exist (including our UniHI database^49), they are generally generic and require laborious data handling to study selected aspects of a specific disease such as HD. HDNetDB can help to overcome these limitations. It is a flexible platform that is customized for HD research.
It integrates different types of data, including molecular interactions, drug-target information, and HD-associated genes and their expression in different model organisms and in humans. HDNetDB was designed to provide easy access to the results of a query. The retrieved data are presented simultaneously on four pages (Proteins, Physical Interactions, Regulatory Interactions and Network), enabling the user to switch between different types of information (Fig. 6). The Proteins page gives an overview of the genes and proteins matching the query in the database. Proteins that should not be included as anchor nodes in the generated network can be excluded. In our case study, for instance, we excluded X-box binding protein 1 pseudogene 1 (XBP1P1). It was found because one of its aliases is the same as the gene symbol of our query protein XBP1. Such an exclusion on the Proteins page will automatically update the network presented on the Network page. The Physical and Regulatory Interactions pages list all interaction partners found at the level of physical protein association or gene regulation for the queried genes and proteins. The sources from which each interaction has been retrieved are shown, and hyperlinks to these sources are provided, if available. In addition, different types of information regarding the individual interaction are given, including the methods that were used for identification of the interactions as well as quality scores such as functional co-annotation and co-expression in human tissues. On both Interactions pages, options to download the full set of interactions are provided. Finally, the Network page displays a graphical visualization of the retrieved network. In addition to simple network visualization, a battery of tools for interactive network analysis is available on the side bar of the Network page.
First, filtering of interactions can be carried out based on source, type, topology, experimental derivation and number of PubMed references attributed to