Abstract
The accumulation of misfolded and aggregated proteins is a hallmark of
neurodegenerative proteinopathies. Although multiple genetic loci have
been associated with specific neurodegenerative diseases (NDs),
molecular mechanisms that may have a broader relevance for most or all
proteinopathies remain poorly resolved. In this study, we developed a
multi‐layered network expansion (MLnet) model to predict protein
modifiers that are common to a group of diseases and, therefore, may
have broader pathophysiological relevance for that group. When applied
to the four NDs Alzheimer's disease (AD), Huntington's disease, and
spinocerebellar ataxia types 1 and 3, we predicted multiple members of
the insulin pathway, including PDK1, Akt1, InR, and sgg (GSK‐3β), as
common modifiers. We validated these modifiers with the help of four
Drosophila ND models. Further evaluation of Akt1 in human cell‐based ND
models revealed that activation of Akt1 signaling by the small molecule
SC79 increased cell viability in all models. Moreover, treatment of AD
model mice with SC79 enhanced their long‐term memory and ameliorated
dysregulated anxiety levels, which are commonly affected in AD
patients. These findings validate MLnet as a valuable tool to uncover
molecular pathways and proteins involved in the pathophysiology of
entire disease groups and identify potential therapeutic targets that
have relevance across disease boundaries. MLnet can be used for any
group of diseases and is available as a web tool at
http://ssbio.cau.ac.kr/software/mlnet.
Keywords: common modifier, insulin signaling pathway, multi‐layered
network expansion, neurodegenerative diseases, proteostasis
Subject Categories: Computational Biology, Molecular Biology of
Disease, Neuroscience
__________________________________________________________________
MLnet is a multi‐layered network expansion model that finds proteins
with pathophysiological relevance for groups of diseases. Application
to four neurodegenerative diseases predicts multiple members of the
insulin pathway as common modifiers.
Introduction
Neurodegenerative diseases (NDs) that cause reduced cognition and/or
motor function due to extensive loss of neuronal cells affect millions
of people worldwide (Erkkinen et al, [67]2018). The neuronal loss in
NDs, such as Alzheimer's disease (AD), Parkinson's disease (PD),
Huntington's disease (HD), and spinocerebellar ataxias (SCAs), is
believed to be caused by the abnormal accumulation of misfolded or
aggregated proteins (Ross & Poirier, [68]2004; Chiti & Dobson,
[69]2017; Calabrese et al, [70]2022). For all NDs, autosomal dominant
disease‐causing mutations have been identified (St George‐Hyslop
et al, [71]1987, 21; Campion et al, [72]1995; Roos, [73]2010; Klein &
Westenberger, [74]2012). However, with the exception of diseases caused
by CAG repeats (Gusella & MacDonald, [75]2006), familial forms with
disease‐causing mutations represent a small minority of all cases of a
given ND type (Bertram & Tanzi, [76]2005). A picture has emerged
whereby multiple genetic loci are associated with specific NDs,
consistent with a polygenic model in which multiple genes may interact
in a synergistic or additive way to promote disease development (Ridge
et al, [77]2016). Even for the case of familial NDs that are associated
with a high penetrance disease‐causing mutation, genetic variation has
been shown to affect the phenotype. Indeed, only between 40 and 70% of
the variance in the age of onset of HD and SCA can be accounted for by
the CAG repeat number in the disease‐causing allele (Wexler
et al, [78]2004; Tezenas du Montcel et al, [79]2014).
As a result of these findings, significant efforts have been undertaken
in the last two decades to identify genetic modifiers of NDs.
Classically, genetic modifiers are studied in the context of a
deterministic disease‐causing mutation and identified as those genes
that affect disease severity and/or age of disease onset (Rahit &
Tarailo‐Graovac, [80]2020). A powerful and systematic way of
identifying modifier genes and pathways that impact NDs is to perform
genetic screens in invertebrate models. Disease‐causing mutant genes
have been used to generate various ND models in D. melanogaster, C.
elegans, and S. cerevisiae, which have then enabled the identification
of hundreds of modifiers via high‐throughput genetic screens
(Fernandez‐Funez et al, [81]2000; Outeiro & Lindquist, [82]2003; Bilen
& Bonini, [83]2007; van Ham et al, [84]2008, [85]2009; Wang
et al, [86]2009; Moloney et al, [87]2010; Bloom, [88]2014; Shulman
et al, [89]2014; Lavoy et al, [90]2018). Mapping of these modifiers has
shed clear light on a broad range of processes that can modulate NDs,
including RNA metabolism, protein folding, autophagy, and apoptosis,
and has sparked hope for the identification of new targets for
therapeutic intervention.
NDs belong to the ever‐growing group of diseases called proteinopathies
(Hipp et al, [91]2014) because intracellular protein misfolding and
aggregation are common to these diseases. Protein homeostasis
(proteostasis) is crucial to the prevention of protein aggregation and
has been demonstrated to decline with age and in proteinopathies (Balch
et al, [92]2008; Labbadia & Morimoto, [93]2015; Hipp et al, [94]2019).
Given that protein misfolding and aggregation are common to
proteinopathies and that modifiers of one proteinopathy can influence
another (e.g., a significant fraction of SCA3 modifiers in Drosophila
had similar effects in AD models), we hypothesized that there may
exist a subset of genetic modifiers that has broader relevance and may
modify several or even all proteinopathies. Such common or generic
modifiers may be central hubs in proteostatic control or key regulators
of the cellular stress response. A bioinformatics analysis that we
carried out previously on existing modifier sets revealed, however,
only a small and incoherent set of modifiers that were identified in
multiple ND models (Na et al, [95]2013), which may be due to the
limited power and coverage of high‐throughput screens for modifiers.
Therefore, we set out to develop a robust computational framework that,
with the help of data integration, predicts protein modifiers common to
multiple diseases. We believe that identifying modifiers is not only
relevant for a better understanding of the pathophysiology of
proteinopathies but may also be useful from a disease monitoring and
therapeutic point of view. Common modifiers may serve as biomarkers,
and monitoring their activity may indicate disease risk across multiple
proteinopathies. Moreover, altering the activity of the common
modifiers directly or indirectly may slow disease progression or delay
the age of disease onset independent of the type of proteinopathy.
The multi‐layered network expansion model (MLnet), which we introduce
here, combines transcriptome, transcription‐target relationship,
protein–protein interaction (PPI), as well as meta‐data for the
reliable identification of proteins that commonly affect multiple
diseases. Using known AD, HD, and SCA modifiers as input, MLnet
identifies many proteins in the insulin pathway as common ND modifiers.
We validate predicted modifiers in Drosophila as well as mammalian cell
models of AD, HD and SCA (Fig [96]1A). Following up on these results,
we then show that activation of Akt1, a central hub in the insulin
pathway, alleviates long‐term memory decline and ameliorates altered
anxiety levels in the APP/PS1 transgenic AD mouse model. Our extensive
experimental testing validates the ability of MLnet to identify generic
modifier proteins that are common to a disease group.
Figure 1. Overview of the workflow and the multi‐layered network expansion
model.
A. Workflow of this study: from the identification of disease‐specific
modifiers to the testing of the activation of a common modifier in
an AD mouse model.
B. Overall architecture of MLnet. It consists of two modules. In the
first module (top), disease‐specific modifiers are predicted using
the well‐established guilt‐by‐association principle and available
annotations. In the second module, the top 100 predicted
disease‐specific modifiers are used as seed proteins to predict
common modifiers across multiple diseases. This prediction is done
by using individual protein–protein interaction disease layers
(bottom) and the idea that common protein modifiers should link
disease‐specific modifiers across the different layers.
Results
Motivated by our hypothesis that there may exist a subset of proteins
that modify the severity of multiple NDs, we aimed to develop a
computational framework that allows for the identification of modifiers
that are common to an entire disease group. Although computational
methods for the prediction of disease‐associated genes and proteins
have been developed before (Zolotareva & Kleine, [98]2019;
Le, [99]2020; Chen et al, [100]2021; Ruan & Wang, [101]2021; Binder
et al, [102]2022), no prediction methods exist, to the best of our
knowledge, for the identification of proteins that commonly affect
multiple diseases. Therefore, we developed the multi‐layered network
expansion model, MLnet, as a general framework for the identification
of modifier proteins common to a disease group and then used known ND
modifiers as MLnet input in order to find proteins that may have
broader relevance for proteinopathies.
MLnet model
MLnet consists of two modules (Fig [103]1B). The first module predicts
disease‐specific modifiers while the second integrates these
predictions in multi‐layered modifier networks. The former is necessary
because of imbalances in the knowledge of modifiers for different
diseases, i.e., there may exist specific disease types with very few
known modifiers, which will hamper any effort to identify common ones.
For example, though we could find more than 100 reliable modifiers for
AD and HD, only 36 and 59 modifiers for SCA1 and SCA3, respectively,
were available (see [104]Methods and Protocols, and Appendix
Fig [105]S1 for details).
Disease‐specific modifier predictions by the first module are made by
the well‐established guilt‐by‐association principle and gene
prioritization (Zolotareva & Kleine, [106]2019). Specifically, the
module predicts so‐far unknown disease‐specific modifiers based on the
similarity between a query gene and known genetic modifiers. The
following features are used for disease‐specific modifier
identification: GeneOntology (GO) annotations, InterPro domain content,
gene regulation relationships (Murali et al, [107]2010), gene
co‐expression data (GEO), KEGG pathway associations, and sequence
similarities, which are all well‐known features successfully used in
guilt‐by‐association approaches (Aerts et al, [108]2006; Zolotareva &
Kleine, [109]2019). Because six different features are used, a gene can
have up to six different scores, depending on which information is
available for it, and consequently up to six different ranks. To generate a
consensus list, we then integrated predicted ranks from each feature
into one single P‐value via prioritization (Aerts et al, [110]2006).
The detailed statistical calculations are explained in [111]Methods and
Protocols. From each list of disease‐specific modifiers, we then
selected the proteins encoded by the top‐ranked genes as “seed” inputs
for the second module.
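The idea of turning several per‐feature rankings into one consensus list can be illustrated with a minimal sketch. It is not the order‐statistics P‐value described in Methods and Protocols: as a simplifying assumption, each feature ranking is converted into rank ratios and the ratios available for a gene are averaged. The function names and the toy similarity scores are hypothetical.

```python
def rank_ratios(scores):
    """Convert one feature's similarity scores (higher = more similar to
    known modifiers) into rank ratios in (0, 1]; smaller is better."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    n = len(ordered)
    return {gene: (i + 1) / n for i, gene in enumerate(ordered)}


def consensus_ranking(feature_scores):
    """Combine per-feature rankings into one list (best gene first).

    feature_scores maps a feature name (e.g. "GO", "KEGG") to a dict of
    gene -> similarity score. A gene may be missing from some features,
    so it is averaged only over the features that scored it.
    ASSUMPTION: mean rank ratio stands in for the published statistic.
    """
    per_gene = {}
    for scores in feature_scores.values():
        for gene, ratio in rank_ratios(scores).items():
            per_gene.setdefault(gene, []).append(ratio)
    combined = {g: sum(r) / len(r) for g, r in per_gene.items()}
    # Sort by combined ratio, using the gene name to break ties.
    return sorted(combined, key=lambda g: (combined[g], g))
```

In this sketch a gene with only two available features is averaged over two ranks, which mirrors the statement that a gene can have fewer than six ranks when information is missing.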
The second module of MLnet generates disease‐specific modifier networks
by mapping “seed” proteins of each disease on the PPI network of the
model organism of interest. The module then identifies potential common
modifiers by finding proteins that interact with or are modifiers in
different disease‐specific modifier networks (Fig [112]1B). In the
simplified example provided in Appendix Fig [113]S2, only two layers
are used. These layers are created by assuming that PPIs are identical
in each disease and by mapping seeds predicted by the first module onto
the individual PPI networks. In the first integration step, MLnet finds
proteins that interact with at least one modifier in each layer, which
in the given example is realized by the green protein because it
interacts with the blue and the red seeds from the two layers. This
protein is marked as a candidate common modifier across the two
diseases and its score is calculated (Appendix Fig [114]S2A and B). The
common modifier score (c, Equation [115]2 in [116]Methods and
Protocols) takes into account the ranking of the connected modifiers
(provided by module 1), the reliability of the protein interaction
data, and the degree (number of connections) of all involved proteins.
The latter is used for normalization and aims to prevent a strong bias
toward interaction hubs as common modifiers. In the second step,
proteins are selected that interact with at least one known modifier or
candidate common modifier in each layer. In Appendix Fig [117]S2C, four
proteins (yellow, red, blue, and violet) are selected as next candidate
common modifiers and their scores are calculated. In this step, the
bottom right protein (cyan‐circled) is not selected because of the
constraint that proteins must interact with at least one known
modifier or candidate common modifier from every layer.
Finally, these steps are iterated until no more proteins are added
(Appendix Fig [118]S2D and E).
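The iterative selection described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the MLnet implementation: the PPI network is a plain adjacency dictionary shared by all layers, each layer is represented only by its seed set, and the common modifier score c (Equation 2) is omitted because it is defined in Methods and Protocols. The function name and data layout are hypothetical.

```python
def expand_common_modifiers(ppi, seeds_by_layer):
    """Return candidate common modifiers, grouped by the round in which
    they were added.

    ppi            : dict protein -> set of interaction partners
    seeds_by_layer : dict layer name -> set of seed proteins
                     (at least one layer is expected)
    A protein is accepted when, in every layer, it interacts with at
    least one seed of that layer or with an already accepted candidate.
    """
    candidates = set()
    rounds = []
    while True:
        new = set()
        for protein, partners in ppi.items():
            if protein in candidates:
                continue
            if all(partners & (seeds | candidates)
                   for seeds in seeds_by_layer.values()):
                new.add(protein)
        if not new:
            break  # stop once an iteration adds no protein
        candidates |= new
        rounds.append(sorted(new))
    return rounds
```

With two layers whose seeds are both partners of a single protein, that protein is accepted in the first round; in the second round the seeds themselves and any other partner of the accepted protein qualify, whereas a protein connected only to unrelated proteins is never added.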
Optimal seed number determination
We tested MLnet on its ability to predict modifiers that are common
across AD, HD, SCA1, and SCA3. Specifically, we used high‐confidence
modifiers identified in Drosophila disease models as inputs for MLnet:
113 modifiers for AD, 209 modifiers for HD, 36 modifiers for SCA1, and
59 modifiers for SCA3 (see [119]Methods and Protocols for details and
Dataset [120]EV1 for the list of modifiers). Using these modifiers as
input, MLnet outputs proteins ranked according to their likelihood of
being common modifiers. Before assessing specific predictions further
and validating them experimentally, we carried out several
computational tests of prediction robustness.
The second module of MLnet uses disease‐specific modifiers provided by
the first module as input. Therefore, we first tested how variations in
seed numbers affect predictions (Fig [121]2). We tested multiple seed
numbers for their ability to identify common modifiers in a
leave‐one‐out‐cross‐validation approach. In the cross‐validation, we
marked experimentally identified modifiers that are common to different
disease combinations as unknown and then tested how well they are
predicted. Specifically, we excluded one of them from the prediction
pipeline (both modules) and then calculated the rank of the excluded
common modifier within the predicted common modifiers. This process was
iterated for all experimentally determined common modifiers (numbers
are given in parentheses in Fig [122]2) in order to evaluate the
performance. As shown in Fig [123]2, using the top 100 seeds
consistently yielded the highest performance (Area Under the Receiver Operating
Characteristics, AUROC) in predicting experimentally validated
modifiers that are common to different NDs, and, thus, 100 seeds
(predicted disease‐specific modifiers) were used to run the second
module. The 100 seeds used to predict common modifiers across four NDs
are listed in Dataset [124]EV2.
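The leave‐one‐out procedure can be sketched in a few lines. The sketch assumes a hypothetical callable predict(held_out) that reruns the whole pipeline with one known common modifier masked and returns all scored proteins, best first. How the AUROC is derived from the held‐out ranks is not spelled out in this section, so the per‐fold value used here (the fraction of other proteins ranked below the held‐out one) is an assumption.

```python
def loo_auroc(known_common, predict):
    """Mean leave-one-out AUROC over experimentally known common modifiers.

    known_common : iterable of protein identifiers
    predict      : HYPOTHETICAL callable; predict(held_out) returns a
                   ranked list (best first) computed without using
                   held_out as a known modifier. The list is expected
                   to contain at least two proteins.
    """
    per_fold = []
    for held_out in known_common:
        ranking = predict(held_out)
        if held_out not in ranking:
            per_fold.append(0.0)  # never recovered: worst possible fold
            continue
        rank = ranking.index(held_out)   # 0-based position
        n_other = len(ranking) - 1       # all proteins except held_out
        # With a single positive, AUROC equals the share of the other
        # proteins that are ranked below it.
        per_fold.append((n_other - rank) / n_other)
    return sum(per_fold) / len(per_fold)
```

A value of 1.0 means that every held‐out modifier was ranked first, and 0.5 corresponds to random placement.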
Figure 2. Performance evaluation of MLnet.
The performance of MLnet in predicting modifiers common to
different disease groups was assessed using different numbers of seed
proteins. Diseases were grouped as indicated and common modifiers for
that group were predicted with MLnet. The intersection of the
high‐confidence genetic modifiers that were identified experimentally
for each disease in that group served as ground truth. The total numbers
of high‐confidence modifiers for each ND are: 113 for AD, 209 for HD, 36
for SCA1, and 59 for SCA3. The number of experimentally found common
modifiers for each group is given in parentheses. AUROC was calculated
by leave‐one‐out‐cross‐validation and the bars are mean ± SEM (n = 3).
The results for different numbers of seeds are shown as blue bars. As
controls, we also assessed the performance of a simple gene
prioritization approach (N), GeneMania (G), and Endeavour (E).
Robustness of MLnet output
Next, we tested the extent of MLnet output convergence toward a
consistent set of proteins when including more disease layers, using
different combinations of disease layers or layers with alternative
modifier seeds. To this end, we predicted common modifiers using
various alternative combinations of disease and seed data and compared
the resulting common modifiers with those predicted when using the
standard approach (Appendix Figs [126]S3–S5). We compared the outputs
by calculating Spearman's correlation coefficients of predicted common
modifiers, and by counting the number of overlapping proteins within
the top 100 predicted common modifiers (Dataset [127]EV3).
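The two comparison measures can be sketched as below. The sketch assumes that each prediction is available as a list of protein identifiers ordered from best to worst, restricts the correlation to proteins present in both lists, and uses the tie‐free Spearman formula; these choices and the function names are assumptions made for illustration.

```python
def spearman_from_rankings(list_a, list_b):
    """Spearman's rho between two ranked lists (best first), computed on
    the proteins they share. Rankings contain no ties by construction.
    At least two shared proteins are required."""
    in_b = set(list_b)
    shared_a = [p for p in list_a if p in in_b]
    rank_a = {p: i for i, p in enumerate(shared_a)}
    shared_b = [p for p in list_b if p in rank_a]
    rank_b = {p: i for i, p in enumerate(shared_b)}
    n = len(shared_a)
    d2 = sum((rank_a[p] - rank_b[p]) ** 2 for p in shared_a)
    return 1 - 6 * d2 / (n * (n * n - 1))


def top_k_overlap(list_a, list_b, k=100):
    """Number of proteins found in the top k of both ranked lists."""
    return len(set(list_a[:k]) & set(list_b[:k]))
```

Identical rankings give a coefficient of 1, fully reversed rankings give -1, and unrelated rankings give values near 0, which is the behaviour reported for the random‐seed controls.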
In the first set of robustness tests, we investigated whether the output
of MLnet is dominated by modifiers from one disease layer or a pair of
disease layers. In such a case, common modifiers of that pair of
diseases should be more correlated with the output of MLnet than common
modifiers of other disease pairs. Moreover, the top 100 common
modifiers should be dominated by common modifiers of these two
diseases. In other terms, leaving a disease or disease pair out should
lead to a major drop in the consistency of the data. To test for this
possibility, we predicted common modifiers for pairs of diseases such
as AD and HD (a), and SCA1 and SCA3 (b) – or other pairings – using
MLnet and then used the predicted common modifiers from (a) and (b) as
seeds in the final MLnet prediction of common modifiers (c) (Appendix
Fig [128]S3A). We compared these common modifiers with those predicted
by using data (seeds) from all four diseases concomitantly (d in
Appendix Fig [129]S3; the standard approach). None of the disease
pairings stands out or drops, either in terms of Spearman's correlation
coefficient or in the number of overlapping proteins within the top
100. Using randomly selected
proteins as seeds in this comparison resulted in predicted common
modifier lists that did not correlate at all (Spearman's correlation
coefficients around 0) nor overlap.
MLnet integrates data from all disease layers concomitantly. However,
it is not clear whether stepwise integration of disease layer
information leads to different results. This may be the case if certain
disease combinations have significantly different common modifiers. If
not, the gradual integration of disease layer information should
continuously increase the consistency (correlation) of the prediction.
To test these ideas, we predicted common modifiers in a stepwise manner
and compared correlation and overlap at each step (Appendix
Fig [130]S4A), i.e., first (a) with (d), then (b) with (d) and finally
(c) with (d). Moreover, we used different orders of disease layers in
this stepwise approach. In the majority of cases (8 of 12) gradual
integration increased correlation in ranking and overlap of predicted
common modifiers among the 100 top‐ranked proteins (Appendix
Fig [131]S4B and C). Interestingly, the overall number of overlapping
proteins drops when data from the HD layer is integrated last. This
suggests that HD may have quite different modifiers than the other
three diseases. Nevertheless, the tests show that the majority of
common modifiers that are top‐ranked by the standard data integration
of MLnet (> 50%) are also found top‐ranked in an alternative
integration approach where disease data is integrated gradually. Using
randomly selected proteins as seeds in this comparison resulted in
predicted common modifier lists that correlated negatively and had
minimal overlap among the top 100 ranked proteins.
MLnet uses predicted disease‐specific modifiers from the first module
as seeds. Predictions of disease‐specific modifiers are necessary for
certain diseases to compensate for the lack of sufficient
experimentally verified ones. We wanted to test whether using predicted
disease‐specific modifiers as seeds produces results that are
significantly different from those that are generated when sufficient
experimentally verified disease‐specific modifiers are available and
used as seeds. Therefore, we used a combination of known and predicted
modifiers as seeds for MLnet and compared it to the predictions made by
MLnet when using only predicted modifiers as seeds. Specifically, we
used the predicted disease‐specific modifiers for SCA1 and SCA3 but
known and experimentally established modifiers for AD and HD as seeds
for MLnet (Appendix Fig [132]S5A). We used this setup because the
numbers of known high‐confidence SCA1 and SCA3 modifiers are lower than
the optimal number of seeds for MLnet. The Spearman's correlation of
the list of common modifiers predicted by this approach (b) compared to
the standard one using only predicted modifiers (c) is 0.83, and the
number of overlapping proteins within the top 100 is 62 (Appendix
Fig [133]S5B and C). Using randomly selected proteins as seeds in this
comparison resulted in predicted common modifier lists that correlated
minimally and had no overlap in the top 100 ranked proteins.
Robustness analysis in terms of bias toward hub proteins
MLnet uses the number of interactions that a query protein has to
normalize the common modifier score c (Equation [134]2) in order to
reduce a potential bias toward hub proteins that are more connected in
the PPI network. It needs to be stressed that the normalization aims to
reduce a potential bias but not prevent hub proteins from being scored
high. We tested whether alternative normalizations in module 2 would
prevent any heavy bias toward interaction hubs as common modifiers and
provide better prediction results. To this end, we modified the MLnet
code and calculated z‐scores for each query protein. For the z‐score,
we randomly selected seed proteins with the same degree as the original
seed proteins (predicted disease‐specific modifiers) in 10,000
iterations (Seed randomization in Appendix Fig [135]S6). Alternatively,
we randomized the protein–protein interactions while maintaining
proteins' interaction degrees (Network randomization in Appendix
Fig [136]S6). As shown in Appendix Fig [137]S6, the integration of the
alternative normalizations using seed randomization or network
randomization did not improve performance when using a
leave‐one‐out‐cross‐validation.
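The seed randomization variant can be sketched as follows. The sketch assumes exact degree matching (each random seed has the same degree as the seed it replaces) and a hypothetical scoring callable score(query, seeds); the actual binning of degrees and the scoring function used in the modified MLnet code are not described in this section.

```python
import random
from collections import Counter, defaultdict
from statistics import mean, pstdev


def degree_matched_sample(seeds, degree, rng):
    """Draw a random seed set with the same degree distribution as seeds.

    degree maps every protein (including the seeds) to its PPI degree."""
    by_degree = defaultdict(list)
    for protein, d in degree.items():
        by_degree[d].append(protein)
    wanted = Counter(degree[s] for s in seeds)
    sample = set()
    for d, count in sorted(wanted.items()):
        sample.update(rng.sample(by_degree[d], count))
    return sample


def z_score(observed, null_scores):
    """Standardize an observed score against a null distribution."""
    sd = pstdev(null_scores)
    return 0.0 if sd == 0 else (observed - mean(null_scores)) / sd


def randomized_z(query, seeds, degree, score, n_iter=10000, rng_seed=0):
    """z-score of a query protein's score against degree-matched random
    seeds. `score` is a HYPOTHETICAL callable standing in for the common
    modifier score."""
    rng = random.Random(rng_seed)
    null = [score(query, degree_matched_sample(seeds, degree, rng))
            for _ in range(n_iter)]
    return z_score(score(query, seeds), null)
```

The network randomization variant would instead rewire interactions while keeping every protein's degree fixed and recompute the score on each rewired network.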
To confirm that the predicted common modifiers by the standard MLnet
model were not heavily biased toward high‐degree proteins, Pearson's
correlation coefficients between degrees (number of interacting
partners) and ranks of top 100 and 1,000 predicted common modifiers
were calculated. As shown in Appendix Fig [138]S7, there is a low level
of anti‐correlation between rank and network degree when looking at the
top 1,000 predicted common modifiers. Such a low level of correlation
should be expected as common modifiers are likely to play a central
role in the network. However, this analysis clearly shows that there is
no strong bias toward high degree proteins (hubs). Importantly, there
is no anti‐correlation between rank and network degree among the top
100 predicted common modifiers.
Comparison with simple prioritization methods and added value of module 2
Finally, we assessed whether the multi‐layered approach of module 2
truly improves prediction of common modifiers. To this end, we first
compared the performance of MLnet with a simple gene prioritization as
employed in the first module. When we used the simple prioritization
approach, the AUROCs (leave‐one‐out‐cross‐validation) of the prediction
of experimentally determined common modifiers for different
combinations of diseases vary between 0.5 and 0.8 (see bars N in
Fig [139]2), but are in all cases significantly lower than the AUROCs
that are achieved when using MLnet with various seed numbers (blue bars
in Fig [140]2, P < 0.005).
To our knowledge, there are no computational models to predict common
modifiers across multiple diseases, but there are some models that via
prioritization find new genes/proteins associated with a user‐specified
list of genes/proteins (Tranchevent et al, [141]2016; Zolotareva &
Kleine, [142]2019). Our disease‐specific modifier prediction module is
similar to the prioritization models, but we include a network
expansion part to find common modifiers. To compare MLnet's performance
further, we used GeneMania (Warde‐Farley et al, [143]2010) and
Endeavour (Tranchevent et al, [144]2016). Specifically, we used them to
predict disease‐associated proteins for each ND and identified the
overlapping proteins as common modifiers. The resulting AUROCs are
shown in Fig [145]2 (G and E in the graphs). The AUROCs achieved in
this way are always lower than those of MLnet. To confirm the advance
provided by MLnet further, we also calculated Areas under the
Precision‐Recall Curve (AUPRC) for the prediction of modifiers common
to different disease combinations using the optimal number of seeds
(Appendix Fig [146]S8). AUPRCs are lower than AUROCs due to the small
number of common modifiers. More importantly, MLnet mostly outperforms
the other methods, specifically when common modifiers across more than
two diseases are predicted. In addition, AUPRCs of GeneMania and
Endeavour are very low in 4–6 cases, while MLnet shows consistent
performance.
To investigate the added value of module 2 in common modifier
identification further, we carried out additional tests. First, we
assessed whether experimentally‐identified modifiers common to
different sets of disease combinations are highly ranked in the
prediction lists of the other diseases (module 1) not included in the
set. As shown in Dataset [147]EV4, these experimentally established
common modifiers of different disease combinations are generally not
top‐ranked in the predicted lists of the other diseases. Second, we
traced the disease‐specific ranks of the top 12 common modifiers
predicted by MLnet, which we will discuss and experimentally validate
in the following sections. Most of these genes are not ranked in the
top 200 for at least one of the four NDs (Dataset [148]EV5). Thus,
simply taking the 200 top‐ranked genes of each disease and selecting the
common ones would not provide the result we achieve with MLnet. Finally, we
color‐coded disease‐specific modifiers predicted by the first module
according to their rank in the final prediction, i.e., how they are
ranked as common modifiers (Appendix Fig [149]S9A). For instance,
violet‐colored disease‐specific modifiers are proteins that are found
within the top 100 of the common modifiers predicted by MLnet, while
red‐colored ones are found within the top 400–500 common modifiers. In
a similar manner, Appendix Fig [150]S9B shows, in color coding, how
predicted common modifiers are ranked in the individual diseases. The
figures show that only about 30–40% of the top 100 disease‐specific
modifiers are also within the top 100 of MLnet‐predicted common
modifiers. Moreover, there are no overlapping modifiers across the top
50 disease‐specific modifiers of the four diseases used here (Appendix
Fig [151]S9A). When the top 100 are considered, there are two proteins
that overlap: one involved in alternative mRNA splicing and another in
heat shock response.
Overall, these results demonstrate that MLnet outperforms the tested
existing methods in recovering experimentally identified common
modifiers of various ND combinations. Moreover, the robustness tests
demonstrate the convergence and superiority of the predictions made by MLnet.
Pathway analysis of predicted common modifiers
After the robustness tests, we performed KEGG pathway and GO annotation
enrichment analyses on the top 100 predicted common modifiers to get a
better understanding of the processes these proteins are involved in.
Predicted common modifiers are found with significant enrichment in the
KEGG pathways of apoptosis, autophagy, and mitophagy, as well as the
transduction pathways associated with FoxO, MAPK, mTOR, and Hippo
(Fig [152]3A). Consistent with this KEGG pathway analysis, the GO terms
autophagy and apoptosis, but also protein refolding, are
significantly enriched among the top‐ranked common modifiers
(Fig [153]3B). In addition, determination of adult life span and
long‐term memory are also captured, terms well known to be associated
with NDs (Branco et al, [154]2008; Doumanis et al, [155]2009; Zhang
et al, [156]2010; Cleret de Langavant et al, [157]2013; Nuzzo
et al, [158]2017; Fujikake et al, [159]2018). Most interestingly, both
KEGG pathway and GO enrichment analyses find common modifiers enriched
in the insulin receptor (InR) signaling pathway and/or its downstream
effector annotations. The insulin signaling pathway plays a pivotal
role in cell survival, cell growth, autophagy, and cytoskeleton
organization by regulating downstream factors such as BAD, mTOR, FoxO,
and GSK‐3β (Fig [160]3C) and has been linked to NDs in numerous studies
(de la Monte & Wands, [161]2008; Caberlotto et al, [162]2019; Akhtar &
Sah, [163]2020; Shaughness et al, [164]2020). Importantly, enrichment
in these key annotations does not appear “de novo”, meaning that
annotations related to downstream pathways of insulin signaling,
autophagy, and apoptosis are also enriched among predicted
disease‐specific modifiers (Appendix Fig [165]S10).
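For orientation, the kind of over‐representation test behind such enrichment analyses can be sketched as a one‐sided Fisher's exact test, which is equivalent to a hypergeometric tail probability. The enrichment values reported here were obtained with DAVID, whose implementation and background set may differ in detail; the function name and the toy numbers below are illustrative assumptions only.

```python
from math import comb


def enrichment_p(hits_in_list, list_size, hits_in_background, background_size):
    """One-sided Fisher's exact (hypergeometric) P-value.

    Probability of seeing at least `hits_in_list` annotated proteins in a
    list of `list_size` proteins drawn from a background of
    `background_size` proteins, of which `hits_in_background` carry the
    annotation (e.g. a KEGG pathway or GO term).
    """
    total = comb(background_size, list_size)
    upper = min(list_size, hits_in_background)
    tail = sum(
        comb(hits_in_background, k)
        * comb(background_size - hits_in_background, list_size - k)
        for k in range(hits_in_list, upper + 1)
    )
    return tail / total
```

For example, if 3 of 10 background proteins carry an annotation and all 3 proteins of a 3‐protein list carry it, the P‐value is 1/120.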
Figure 3. KEGG pathway and GO enrichment analyses with predicted common
modifiers.
A. KEGG pathway enrichment analysis of the top 100 predicted common
modifiers using DAVID (Fisher's exact test) (Sherman
et al, [167]2022).
B. GO enrichment analysis of the top 100 predicted common modifiers
using DAVID (Fisher's exact test) (Sherman et al, [168]2022).
C. A simplified schematic diagram of the insulin signaling pathway and
downstream functions.
To validate our specific findings, we first tested whether application
of MLnet to human data would result in predicted common modifier
proteins that are associated with the same pathways. Although
information on human genetic modifiers is not available, there are many
known ND‐associated proteins that have been discovered in genomic,
proteomic, or transcriptomic analyses of ND patients. Thus, we
investigated whether MLnet was able to predict common modifiers across
multiple human NDs using disease‐associated proteins, not genetic
modifiers. We obtained 359 AD‐associated proteins and 47 HD‐associated
proteins from Neurocarta (Portales‐Casamar et al, [169]2013). Since
proteins associated with SCA1 or SCA3 were not available, only
modifiers common to AD and HD were predicted by MLnet, and a KEGG
pathway enrichment analysis was performed on the predicted proteins.
Microarray data from human brain tissue were obtained from GEO, and PPI
data were filtered by using the human brain proteome obtained from the
Human Protein Atlas (Sjöstedt et al, [170]2020). As shown in Appendix
Fig [171]S11, the most highly enriched pathway is the PI3K‐Akt
signaling pathway, which is part of the insulin signaling cascade.
As the insulin signaling pathway plays a central role in metabolism and
the proteins that are part of it interact with many partners, we
investigated whether this pathway would automatically come up in our
network‐based approach even when using genes associated with diseases
not related to neurodegeneration. To this end, we collected genes
related to three inflammatory diseases (gastroenteritis, hepatitis, and
dermatitis) and tested for annotations enriched among the top‐ranked
proteins predicted by MLnet to be common to these diseases.
Specifically, we collected 265, 146 and 442 genes associated with
gastroenteritis, hepatitis, and dermatitis, respectively, from
Neurocarta (Portales‐Casamar et al, [172]2013) and submitted them to
MLnet. Among the top 100 proteins predicted to be associated with all
three inflammatory diseases, pathways related to the immune response
and inflammation are significantly enriched (Appendix Fig [173]S12 and
Dataset [174]EV6), but not insulin‐related or any other annotations
found enriched among the top 100 common ND modifiers.
Experimental validation of predicted common modifiers in Drosophila models
To experimentally validate our findings, we tested the top 12 candidate
common modifier proteins predicted by MLnet (Fig [175]4) with the help
of D. melanogaster disease models (Chen et al, [176]2001; Franke
et al, [177]2003). Drosophila compound eyes with a simple nervous
system are ideal for such a test (Castedo et al, [178]2002). The
severity of the eye phenotype, which is correlated with the degree of
neurodegeneration, provides an easily measurable readout in this model
system. To establish fly eye models for AD, HD, SCA1, and SCA3, we
expressed the respective disease‐causing genes in the developing eyes
using the GMR‐GAL4 driver: Aβ1‐42 for AD, Htt‐Q128 for HD,
Ataxin1‐Q82 for SCA1, and Ataxin3‐Q78 for SCA3. As observed previously
(Chan & Bonini, [179]2000; Nelson et al, [180]2005; Boland
et al, [181]2008; Wangler et al, [182]2015), all flies with the
eye‐specific expression of the disease‐causing gene showed rough eye
phenotypes with some variation in the severity of the phenotype
(Fig [183]4).
Figure 4. Changes in the rough eye phenotype of Drosophila models for AD, HD,
SCA1, and SCA3 due to knockdown of predicted common modifiers.
Representative bright‐field microscope images of fly eyes with
GMR‐GAL4‐driven misexpression of the ND‐causing genes, along with
GMR‐GAL4‐driven RNAi against each of the indicated genes in which
mCherry served as a control. Compared with the wild‐type (WT) compound
eye with the ordered structure of ommatidia, flies with misexpression
of individual ND‐causing genes and RNAi against mCherry under the
control of the GMR‐GAL4 driver had rough eyes with the variation in
phenotypic severity. Suppression or enhancement of these rough eye
phenotypes caused by RNAi‐mediated knockdown of the predicted common
modifiers is indicated with + or −, respectively. As a negative
control, two genes (Cyp6a18 and CG34372) randomly selected among
low‐ranked genes were also tested. Source data are available online for
this figure.
Four of the 12 tested common modifier proteins (Akt1, InR, Pdk1, and
sgg (GSK3β)) changed the eye phenotypes in all four ND models when
down‐regulated by RNAi. Interestingly, three of these (Akt1, InR, and
Pdk1) are directly involved in insulin signaling and one of them (sgg)
acts downstream of the insulin signaling pathway (Fig [185]3C). As a
negative control, we also evaluated the effect of two randomly selected
low‐ranked genes (Cyp6a18 and CG34372) and found them to have little to
no impact across the four Drosophila ND models (Fig [186]4). To
validate the top 12 ranked proteins further, we searched the literature
for evidence that supports their impact in specific NDs
(Dataset [187]EV5) and, indeed, could find evidence across diseases for
many of these proteins.
Given the positive testing of all four genes related to the insulin
pathway, we decided to assess the impact of another insulin pathway
protein on the disease phenotypes. We evaluated Pi3K92E (PI3K in
Fig [188]3C) because it is within the top 20 of the predicted common
modifiers (Dataset [189]EV3). Moreover, PI3K is of particular interest
because it is one of the key mediators of the insulin pathway's impact
on brain plasticity and neurogenesis. For instance, PI3‐kinase is
essential for glutamate receptor insertion at plasma membranes during
synaptic plasticity (Man et al, [190]2003). Downregulation of Pi3k92E
changed the eye phenotype in all four ND models (Appendix
Fig [191]S13), confirming the significance of insulin signaling for the
model phenotypes investigated here.
Experimental validation of Akt1 in AD cell and mouse models
Motivated by these findings, we aimed to test the disease‐modifying
impact of insulin signaling in mammalian models of ND. Since reduced
insulin signaling activity or insulin resistance is commonly found in
AD patients, we hypothesized that activation of the insulin
signaling pathway could alleviate neurodegenerative phenotypes. We
chose Akt1 as a target for insulin signaling modulation because of its
central position in this pathway. Akt1 is not ranked very high in the
disease‐specific modifier lists, with the exception of SCA1
(Dataset [192]EV5), but is second in the final ranking of common
modifiers due to its interaction with many proteins that are themselves
disease modifiers, i.e., partners that are highly ranked in the
disease‐specific modifier lists of module 1 (see Appendix Text [193]S1,
Appendix Fig [194]S14, and Datasets [195]EV7 and [196]EV8 for details).
Moreover, the availability of an activator of this kinase enables
induction of downstream insulin signaling (Jo et al, [197]2012).
In a first test, we constructed human cell‐based models for AD, HD,
SCA1, and SCA3 by expressing ND‐causing genes (Aβ[1–42], Htt‐Q74,
Atx1‐Q52, and Atx3‐Q84) in HEK293 cells and evaluated the impact of
Akt1 activation on cell phenotypes. HEK293 cells are generally not affected
by the wild‐type forms of the disease genes. However, their viability
is affected by the gene products of variants that have an increased
likelihood of aggregation (e.g., polyQ repeats in HD, SCA1, and
SCA3‐related genes), especially when expressed at high levels. Thus,
cell viability assays with HEK293 cells have extensively been used to
study disease mechanisms and test small compounds for their impact on
aggregation and cell viability (Wang et al, [198]2006, [199]2019;
Bartley et al, [200]2012; Pierzynowska et al, [201]2018; Shentu
et al, [202]2019; Hart et al, [203]2022; Niu et al, [204]2022). The
Akt1 activator SC79 that we used prevents the inhibitory intramolecular
interaction between the pleckstrin homology (PH) and catalytic domains
(Warrick et al, [205]1998; Gabbouj et al, [206]2019). To activate Akt1
signaling in the disease cell models, we treated cells with 1 or 10 μM
of SC79. 10 μM was the highest concentration of SC79 that showed no
significant toxicity (Fig [207]5A). As shown in Fig [208]5B–E, the
treatment of the cells with 10 μM of SC79 significantly increased cell
viability in all models when compared with the viability of cells that
were not treated with SC79.
Figure 5. Effect of Akt1 activation on disease symptoms in mammalian disease
models.
* A
Cell viability at the indicated concentrations of SC79
(0.1–100 μM), an Akt1 activator, in HEK293 cells. Data are the mean
and SEM of biological replicates (n = 4–10). * denotes P‐value
< 0.05 (Student's t‐test).
* B–E
Determination of the alleviating effect of SC79 (1 or 10 μM) on
cell death in human cell‐based models for AD (B), HD (C), SCA1 (D),
and SCA3 (E). Data are the mean and SEM of biological replicates
(n = 3–7). * denotes P‐value < 0.05 (Student's t‐test).
* F, G
SC79 was administered to 7‐month‐old AD mice for 1 month, and memory
recovery was assessed with the Barnes maze test. Over 5 days of
training, the time to find a target hole decreased due to learning.
After a period without testing, the mice were tested again on days 15
and 16 to assess long‐term memory. The time to find a target hole on
tested days (F)
and the average of latencies on days 15 and 16 (G) are shown. Data
are the mean and SEM of three biological replicates. * denotes
P‐value < 0.05 (Student's t‐test).
* H
The dysregulated anxiety levels in AD mice were investigated using
the elevated plus maze test. In (G) and (H), + and – denote the
treatment and nontreatment with SC79, respectively. Bar graphs were
drawn from at least three independent experiments (biological
replicates) and represent mean and SEM. * denotes P‐value < 0.05
(Student's t‐test).
Source data are available online for this figure.
In a second test, we investigated whether the activation of Akt1 can
alleviate the symptoms in an AD mouse model. To this end, we tested the
impact of SC79 in the 5xFAD mouse model. 5xFAD transgenic mice
overexpress mutant human APP with the K670N, M671L, I716V, V717I
familial AD mutations and human PS1 harboring the two mutations M146L
and L286V. We fed cohorts of 7‐month‐old WT and 5xFAD mice with SC79
for 1 month, while control WT and 5xFAD mice received no treatment. We
then used a Barnes maze test and an elevated plus‐maze test to
investigate memory deficits and anxiety levels, respectively
(Fig [210]5F–H). In the Barnes maze test, mice were trained to find a
target hole over 5 days. As shown in Fig [211]5F and G, the time needed
to find the hole (primary latency) decreased over this period due to
learning. There was no significant difference between the WT and AD
groups, which suggests that the AD mice do not show any defect in
short‐term memory. After a 10‐day period without testing, we
re‐evaluated their ability to find the hole on days 15 and 16.
Interestingly, untreated AD mice took significantly longer to find the
hole than SC79‐treated AD mice, which found the hole as quickly as WT
mice (Fig [212]5G).
AD patients also display dysregulated anxiety and 5xFAD transgenic mice
show reduced anxiety levels (Jawhar et al, [213]2012; Belaya
et al, [214]2020). Therefore, we employed an elevated plus‐maze test to
examine the effect of SC79 on anxiety levels. In this test, a longer
residence time in open‐end arms indicates a lower level of anxiety,
as observed for the untreated 5xFAD transgenic mice (Fig [215]5H).
SC79‐treated AD mice spent significantly less time in open‐end arms
than untreated 5xFAD mice. The time spent in open‐end arms by treated
AD mice was similar to that of WT mice. Overall, these tests with 5xFAD
transgenic mice suggest that activation of Akt1 in AD mice can recover
long‐term memory deficits and attenuate dysregulated anxiety levels.
Discussion
In this study, we introduce a computational model that predicts
modifier proteins common to multiple related diseases. Our approach
uses ideas from prioritization and network biology in order to be able
to integrate genomic, transcriptomic and proteomic data. Various
methods for gene and protein prioritization have been developed before
(Aerts et al, [216]2006; Tranchevent et al, [217]2016; Zolotareva &
Kleine, [218]2019; Ruan & Wang, [219]2021). Indeed, previous studies
have integrated genotype–phenotype association data with gene
annotations available in the public domain such as GO and knowledge
from biomolecular interaction networks to predict new associations. The
rationale for this approach is that genetic variations that are
associated with a specific disease should cluster in subnetworks of
physically and functionally interacting proteins (Califano
et al, [220]2012). Proof of principle for this approach has been
provided in the successful prediction of oncogenes for B‐cell lymphomas
(Basso et al, [221]2005) or genes that increase susceptibility for
obesity (Yang et al, [222]2009b). The idea of combining gene annotation
and PPI information has further been exploited in prioritization
methods, such as GeneMania and Endeavour, in order to predict functions
and phenotypes of non‐annotated genes. However, none of the existing
methods was specifically developed to identify proteins and genes that
may have broader pathophysiological relevance for an entire disease
group. Thus, MLnet is unique in that it creates different disease
layers and identifies those proteins as common modifiers that are most
connected across the different layers. The basic idea behind this
approach is that common modifiers are proteins that are at the
cross‐roads of pathways playing a role in the pathogenesis of the
different diseases.
Multiple robustness tests that we carried out suggest that MLnet
provides a consistent ranking of common modifier proteins, with
top‐ranked proteins that reappear regardless of the details of data
integration, namely the order in which disease layer data are
integrated and the use of high‐confidence, experimentally validated, or
predicted disease‐specific modifiers as inputs to module 2.
Disease‐specific modifiers that are highly ranked initially are not
necessarily among the highest ranked common modifiers, and those ranked
low for a specific disease may become highly ranked across diseases
because the encoded proteins connect modifiers across multiple layers.
For example, Akt1 was originally tested as a modifier in HD and SCA1
models, but not in AD and SCA3 models. In addition, the predicted ranks
of Akt1 in the respective NDs were 113th (AD), 383rd (HD), 15th (SCA1),
and 260th (SCA3). Our benchmarking also demonstrates that there is no
strong correlation between the network degree of a protein and its rank
in the common modifier prediction. Proteins with high degree are often
highly studied with links to many diseases and, therefore, appear high
in rankings generated by guilt‐by‐association approaches independent of
the disease in question (Gillis & Pavlidis, [223]2011). Most
importantly, MLnet performs consistently better than classical gene
prioritization and the established methods GeneMania (Warde‐Farley
et al, [224]2010) and Endeavour (Tranchevent et al, [225]2016) in the
identification of protein modifiers common to different ND
combinations, highlighting the validity of the implemented new
approach.
GO and KEGG enrichment analyses of MLnet predictions for four NDs
revealed that top‐ranked modifiers are significantly associated with
cellular mechanisms and pathways well‐known to modulate
neurodegeneration such as autophagy and mitophagy. Most prominent among
them is the insulin signaling pathway and its constituents. This
finding is consistent with numerous studies from the last two decades
that have demonstrated the relevance of insulin and its signaling in
the pathophysiology of NDs and aging (de la Monte & Wands, [226]2008;
van Heemst, [227]2010; Akintola & van Heemst, [228]2015; Caberlotto
et al, [229]2019; Akhtar & Sah, [230]2020; Shaughness
et al, [231]2020). To validate individual predictions, we tested the
effect of the 12 top‐ranked predicted modifiers in D. melanogaster
models for AD, HD, SCA1, and SCA3. While four of these top 12 are
involved in the extended insulin pathway, five others are part of the
proteostasis machinery (Droj2 and Hsc70cb are chaperones, UBA1 is the
E1 ubiquitin‐activating enzyme, Atg1 is an autophagy‐regulating
Ser/Thr kinase, and Mi‐2 is a chromatin remodeler required for heat shock
gene expression), and the three remaining genes are involved in
microtubule function (par‐1 and Lk6 – two kinases involved in
microtubule organization – and zip (zipper) a microtubule‐binding
protein). Most of these proteins have previously been found associated
with ND pathology (Nishimura et al, [232]2004; Ambegaokar &
Jackson, [233]2011; Kuo et al, [234]2013; Blázquez et al, [235]2014;
Groen & Gillingwater, [236]2015; Zhang et al, [237]2015; Kim
et al, [238]2017; Pomytkin et al, [239]2018; Shaughness
et al, [240]2020; Burillo et al, [241]2021; Yakubu & Morano, [242]2021;
Ring et al, [243]2022; Nowell et al, [244]2023). Consistent with these
previous studies, we find that all 12 top‐ranked proteins, when
suppressed in expression, modulate disease phenotype in at least two of
the tested models. However, only proteins that are part of the insulin
pathway affect phenotypes in all of the tested D. melanogaster disease
models. Interestingly, suppressing the expression of these modifiers
enhances the phenotype in some ND models, while reducing it in
others. These observations are consistent with previous studies in
which overexpression of the same gene had opposing effects on the
phenotypes of different NDs when tested in fly models (Branco
et al, [245]2008). These differences are explained by the fact that the
impact of genetic modulation on ND phenotypes is highly dependent on
the method and level of modulation and the complexity of the
pathophysiology of the individual ND (Na et al, [246]2013). Therefore,
the results of the D. melanogaster experiments (Fig [247]4) should be
interpreted as evidence for the ability of the positively tested genes
to act as disease modifiers rather than enhancers or suppressors.
Our findings suggest that members of the insulin pathway may have
pathophysiological relevance for proteinopathies in general. This
hypothesis is consistent with growing evidence for the association of
insulin signaling with the pathophysiology of NDs (Blázquez
et al, [248]2014; Pomytkin et al, [249]2018; Shaughness
et al, [250]2020; Burillo et al, [251]2021; Nowell et al, [252]2023).
Insulin and insulin‐like growth factor 1 (IGF‐1) play metabolic and
neuroprotective roles in the brain (Pomytkin et al, [253]2018; Burillo
et al, [254]2021). Specifically, insulin regulates glucose homeostasis
and maintains energy requirements for different neuronal functions. It
is vital for neuronal growth and differentiation as well as
neuroprotection by modulating autophagy, mitochondrial function, ER
stress, and apoptosis (Pomytkin et al, [255]2018; Burillo
et al, [256]2021). Thus, dysfunction of insulin signaling makes
neuronal cells vulnerable to metabolic and cellular stresses (Kim &
Feldman, [257]2015). Moreover, the insulin signaling pathway plays key
roles in brain plasticity, impacting cognitive functions such as
learning and memory (Spinelli et al, [258]2019). In the hippocampus,
for instance, insulin positively impacts synaptic and structural
plasticity. Recently, eight genes have been associated with human adult
cognitive function through rare coding variants with large effects.
Four of these eight genes had previously been shown to affect insulin
and the insulin pathway, although this link has been established with
peripheral and not cerebral insulin (Giovannone et al, [259]2003;
Hamming et al, [260]2010; Backe et al, [261]2019; González
et al, [262]2022; Chen et al, [263]2023). Resistance to insulin
compromises many of these regulatory aspects, which is believed to
promote the development of NDs. This disease mechanism has been
extensively investigated in the context of AD, where epidemiologic
studies have shown that type 2 diabetes and prediabetic states of
insulin resistance are risk factors for AD (Arvanitakis
et al, [264]2006).
Insulin exerts its regulatory role on cellular metabolism, nutrient
homeostasis and cognition mainly via the PI3K/Akt signaling cascade and
the downstream effectors, FoxO and mTOR (Fig [265]3). FoxO impacts cell
differentiation and proliferation, while mTOR regulates fatty acid and
protein synthesis, as well as mitochondrial metabolism (Du &
Zheng, [266]2021; Maiese, [267]2021; Querfurth & Lee, [268]2021). The
link between insulin and mitochondrial metabolism appears to play a
central role in ND pathology (Galizzi et al, [269]2021; Schell
et al, [270]2021; Galizzi & Di Carlo, [271]2022). Indeed, mitochondrial
dysfunction is a common feature of NDs, which results in ATP
deficiency, oxidative stress, inflammation, and consequently apoptotic
cell death (Galizzi & Di Carlo, [272]2022). It has been shown that
reduced Akt1 signaling, which occurs in insulin resistance conditions,
reduces mitochondrial respiration and increases mitochondrial
fission, eventually increasing oxidative stress (Miyamoto
et al, [273]2008; Yang et al, [274]2009a). In addition to its impact on
mitochondrial metabolism and stress response, altered insulin signaling
can directly influence cognition. Consistent with these important roles
of Akt1 in insulin signaling, our experiments demonstrate that
activation of Akt1 with the small molecule SC79 increases viability of
HEK293 cells expressing ND‐causing genes, and enhances long‐term memory
and ameliorates dysregulated anxiety levels in AD mice.
Interestingly, insulin signaling is Janus‐faced: while it promotes cell
survival, it also represses autophagy via the activation of an
autophagy‐inhibiting enzyme (mTOR) and inhibition of an
autophagy‐promoting enzyme (FoxO) (Fig [275]3). Autophagy impairment is
a hallmark of NDs characterized by the cellular accumulation of protein
aggregates (Subramanian et al, [276]2022). Many studies have reported
that recovery of autophagy activity by boosting a metabolite (NAD) or
suppressing autophagy‐inhibiting enzymes such as mTOR rescues the
viability of neuronal cells (Spilman et al, [277]2010; Heras‐Sandoval
et al, [278]2014; Sun et al, [279]2023). When we investigated the
levels of amyloid‐β in the brain of AD mice treated with or without
SC79, there were no significant changes in amyloid‐β levels whether
insulin signaling was activated or not (Appendix Fig [280]S15). As
activated insulin signaling improved cell viability in in vitro assays
(Fig [281]5B–E), this suggests that activation of Akt1 in our
experiments may have enhanced anti‐apoptotic effects while impacting
autophagy to a lesser extent. Indeed, insulin exerts anti‐apoptotic
effects via Akt1, which reduces the mitochondrial release of cytochrome
c (Kang et al, [282]2003; Li et al, [283]2009). Alternatively,
activation of Akt1 may be beneficial via its regulatory impact on
cognitive functions (Spinelli et al, [284]2019). In any case,
experiments carried out in this study are primarily meant to provide
evidence for the cross‐disease relevance of MLnet‐predicted modifiers
and not to elucidate the detailed molecular mechanisms that confer that
relevance. Moreover, although our experiments suggest a modulatory role
of Akt1 for multiple NDs, it may not be an ideal target for ND
treatment because of its involvement in numerous cellular processes and
the fact that its enhanced activation can lead to cancerous cell
transformation (Wang et al, [285]2017), which would require very close
monitoring for tumorigenic effects when activated via a therapeutic
agent. Other proteins in the insulin pathway such as the downstream
effector GSK3β are already actively targeted for ND therapy
development. GSK3β inhibitors showed positive improvement in animal
models, but unfortunately failed in AD patients (Rippin &
Eldar‐Finkelman, [286]2021; Arciniegas Ruiz &
Eldar‐Finkelman, [287]2022). Moreover, direct insulin administration to
healthy individuals and AD patients improved memory performance in
small studies, but mixed results were reported for larger clinical
trials (Morris & Burns, [288]2012; Hallschmid, [289]2021). It is clear
that more research is required to fully understand the roles of insulin
signaling in NDs and whether activation of specific elements of this
signaling pathway may benefit patients.
In summary, we introduce and benchmark MLnet, a computational model
that can predict modifiers common to multiple diseases. When used on
genetic modifiers of NDs, MLnet identifies the insulin signaling
pathway and its constituents as potential elements that have broader
relevance for proteinopathies. MLnet has limitations because it depends on
third‐party data. Most importantly, the network expansion approach
relies on accurate protein interaction data and a good coverage of the
“real” network present in cells. In addition, the protein interaction
network varies between cell types and tissues, which will affect
MLnet's output. However, efforts are under way to map cell‐ and
tissue‐specific interactomes (Huttlin et al, [290]2021; Skinnider
et al, [291]2021; Holguin‐Cruz et al, [292]2022), which will provide
more relevant data that can be used in the future.
Materials and Methods
Reagents and Tools table
Reagents/Resource | Reference or source | Identifier or catalog number
Drosophila disease models
AD model | Bloomington Drosophila Stock Center | BL33769
HD model | Bloomington Drosophila Stock Center | BL33808
SCA1 model | Bloomington Drosophila Stock Center | BL39740
SCA3 model | Bloomington Drosophila Stock Center | BL8150
Drosophila RNAi lines
Droj2 | Bloomington Drosophila Stock Center | BL36089
Akt1 | Bloomington Drosophila Stock Center | BL31701
Atg1 | Bloomington Drosophila Stock Center | BL26731
Uba1 | Bloomington Drosophila Stock Center | BL36307
InR | Bloomington Drosophila Stock Center | BL31037
Par‐1 | Bloomington Drosophila Stock Center | BL32410
Pdk1 | Bloomington Drosophila Stock Center | BL27725
Mi‐2 | Bloomington Drosophila Stock Center | BL33419
Lk6 | Bloomington Drosophila Stock Center | BL28357
Hsc70Cb | Bloomington Drosophila Stock Center | BL33742
Zip | Bloomington Drosophila Stock Center | BL36727
sgg | Bloomington Drosophila Stock Center | BL35364
Cyp6a18 | Bloomington Drosophila Stock Center | BL42824
CG34372 | Bloomington Drosophila Stock Center | BL51472
Pi3k92E | Bloomington Drosophila Stock Center | BL61182 (AD, HD, SCA1) and BL35798 (SCA3), since BL61182 was lethal in SCA3
Drosophila overexpression lines
Akt1 | Bloomington Drosophila Stock Center | BL8191
Plasmid DNAs
pEGFP‐C1‐Aβ1‐42 | constructed from pCAX‐FLAG‐APP | Addgene #30154
pEGFP‐Htt‐exon1‐Q74 | Addgene | #40262
pEGFP‐Ataxin1‐52Q | Addgene | #32492
pEGFP‐C1‐Ataxin3‐Q84 | Addgene | #22123
Human cell line
HEK293 | ATCC | CRL‐1573
Animals
5xFAD mice | The Jackson Laboratory | 034848‐JAX
Reagents
SC79 | Sigma‐Aldrich | SML0749
Databases
NeuroGeM | [293]https://neurogem.msl.ubc.ca/
GeneOntology | [294]http://geneontology.org/
KEGG | [295]https://www.genome.jp/kegg
InterPro | [296]https://www.ebi.ac.uk/interpro/
Gene Regulation | [297]http://droidb.org/
GEO | [298]https://www.ncbi.nlm.nih.gov/geo/
UniProt | [299]https://www.uniprot.org/
STRING | [300]https://string‐db.org/
Neurocarta | [301]https://gemma.msl.ubc.ca/phenotypes.html
GeneMania | [302]https://genemania.org/
Tools
DAVID | [303]https://david.ncifcrf.gov/
Methods and Protocols
Data preparation
Due to the complex pathophysiology of NDs and the use of very diverse
mutant genes, e.g., different lengths of polyQ in HD, SCA1, and SCA3,
there are significant inconsistencies in the experimental results of
different ND studies (Na et al, [305]2013). These inconsistencies are
particularly prominent when studies for genetic modifier identification
are compared. This fact motivated us to develop a confidence score that
considered different experimental results and provided a metric of the
likelihood of a gene to be a modifier or non‐modifier. The following
confidence score was calculated for and assigned to each genetic
modifier obtained from NeuroGeM (Na et al, [306]2013):
\[ S = S_m - S_n = \left[1 - \prod_i \left(1 - r_{m,i}\right)\right] - \left[1 - \prod_j \left(1 - r_{n,j}\right)\right] \qquad (1) \]
S_m and S_n denote the confidence scores for being a modifier or
non‐modifier, respectively. To provide a single confidence score, S was
defined as S = S_m − S_n. S is in the range of −1 to +1. A
positive value of S indicates that the gene is likely to be a modifier,
while a negative value indicates that a gene is likely to be a
non‐modifier. The larger the magnitude of the score S, the higher
the confidence. i and j denote experiments that identify genes as
modifiers or non‐modifiers, respectively, and r_{m,i} and r_{n,j}
denote the reliabilities of the modifier and non‐modifier findings,
respectively. Individual results could have different r_{m,i} and
r_{n,j} values depending on the specifics of experiments i and j, such
as the scale (primary high‐throughput screening (HTS), secondary HTS,
and low‐throughput screening (LTS)) or the method used to alter gene
expression (siRNA‐based interference, knockout, overexpression, etc.).
Since the details of the experimental differences and their impact on
the reliability of the findings are hard to quantify, we approximated
reliabilities r_{m,i} and r_{n,j} by assessing how reproducible, i.e.,
consistent, specific experimental findings are. To do so,
we compared findings from different experiments with each other and
assessed consistency (Appendix Fig [307]S1). Specifically, we compared
primary HTS results with LTS results. If a gene was consistently
identified as a modifier or non‐modifier in both HTS and LTS, it was
counted as consistent. Otherwise, it was counted as inconsistent. If
LTS data was not available, HTS results were compared with secondary
HTS results to calculate the consistency. Likewise, secondary HTS
results were compared with LTS results to calculate the consistency of
secondary HTS results. LTS results were compared with other LTS results
if two or more experiments were available. As a result, we obtained the
following values for both r_{m,i} and r_{n,j}: 0.194 for primary HTS,
0.594 for secondary HTS, and 0.737 for LTS. The scores indicate that
LTS‐scale experiments are more consistent than HTS experiments (Appendix
Fig [308]S1). We calculated confidence scores (S) for all genes
deposited in NeuroGeM (Na et al, [309]2013) and used the genes that had
a positive confidence score as modifiers for this study. Consequently,
111 modifiers of AD, 209 modifiers of HD, 36 modifiers of SCA1, and 59
modifiers of SCA3 were obtained.
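As an illustration of equation (1), the following minimal Python sketch computes S for a single gene from the reliabilities reported above. The function name, the dictionary keys, and the way experiments are passed in as lists of scale labels are assumptions made for this example and are not part of NeuroGeM or MLnet.

```python
from math import prod

# Reliabilities reported in the text, used for both r_m and r_n.
RELIABILITY = {"primary_hts": 0.194, "secondary_hts": 0.594, "lts": 0.737}


def confidence_score(modifier_experiments, non_modifier_experiments):
    """Return S = S_m - S_n as defined in equation (1).

    Each argument is a list of experiment scale labels (hypothetical keys
    of RELIABILITY), one entry per experiment reporting that outcome.
    """
    s_m = 1 - prod(1 - RELIABILITY[e] for e in modifier_experiments)
    s_n = 1 - prod(1 - RELIABILITY[e] for e in non_modifier_experiments)
    return s_m - s_n


# A gene found as a modifier in one primary HTS and one LTS experiment,
# and as a non-modifier in one secondary HTS experiment.
score = confidence_score(["primary_hts", "lts"], ["secondary_hts"])
print(round(score, 3))  # 0.194
```

In this toy case S is positive (about 0.194), so the gene would be retained as a modifier under the rule described above; a gene without any experimental evidence receives S = 0.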
Prediction of disease‐specific modifiers
The statistical approach for GO, KEGG pathways, InterPro domains, and
transcription regulations (transcription factor – target genes) is
identical. For a query gene, we calculated the P‐value for the
association of annotations to known genetic modifiers (those with
positive confidence scores) using a hypergeometric test. We then
calculated the score by summing the −log10(P‐value) of the terms
annotated to this gene. To determine the z‐score of a query gene, we
calculated the expected score and the standard deviation for randomly
selected genes in 10,000 iterations. Based on the z‐scores of all
genes, we obtained a ranked list of potential disease‐specific
modifiers.
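The annotation‐based scoring described above can be sketched as follows. This is a simplified illustration using only the Python standard library, not the MLnet implementation: the one‐sided (enrichment) hypergeometric test, the use of random gene sets of the same size as the modifier set as the null model, and all function and variable names are assumptions made for this example.

```python
import math
import random
from statistics import mean, stdev


def hypergeom_sf(k, M, n, N):
    """P(X >= k) when N genes are drawn from M genes, n of which are modifiers."""
    upper = min(n, N)
    hits = sum(math.comb(n, x) * math.comb(M - n, N - x) for x in range(k, upper + 1))
    return hits / math.comb(M, N)


def term_pvalue(term, modifiers, annotations):
    """Enrichment P-value of one annotation term among the known modifiers."""
    with_term = [g for g in annotations if term in annotations[g]]
    k = sum(1 for g in with_term if g in modifiers)
    return hypergeom_sf(k, len(annotations), len(modifiers), len(with_term))


def gene_score(gene, modifiers, annotations):
    """Sum of -log10(P-value) over all terms annotated to the query gene."""
    return sum(-math.log10(term_pvalue(t, modifiers, annotations))
               for t in annotations[gene])


def z_score(gene, modifiers, annotations, iterations=1000, seed=0):
    """z-score of the observed score against random modifier sets (assumed null model)."""
    rng = random.Random(seed)
    genes = sorted(annotations)
    observed = gene_score(gene, modifiers, annotations)
    null = [gene_score(gene, set(rng.sample(genes, len(modifiers))), annotations)
            for _ in range(iterations)]
    sd = stdev(null)
    return (observed - mean(null)) / sd if sd > 0 else 0.0


# Toy annotation table: gene -> set of annotation terms (hypothetical data).
annotations = {"g1": {"A"}, "g2": {"A"}, "g3": {"A", "B"},
               "g4": {"B"}, "g5": {"B"}, "g6": set()}
print(z_score("g3", {"g1", "g2"}, annotations, iterations=200))
```

The text specifies 10,000 iterations; the smaller default here only keeps the toy example fast.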
Regarding GO, we only used GO leaf nodes from the three categories
(biological processes, cellular components, and molecular functions)
for the score calculation. Identical to GO annotations, we used KEGG
pathway information to predict new genetic modifiers. To use protein
domain information, which represents protein functions, the domains of
genetic modifiers were analyzed with InterPro. The regulation
relationships of transcription factors and their target genes were also
used to predict new genetic modifiers. The relationships were obtained
from DroID (Murali et al, [310]2010).
To use gene expression correlation as a feature, Drosophila microarray
data were obtained from GEO (Edgar et al, [311]2002). With these data,
the sum of the absolute values of Pearson's correlation coefficients
between a query gene and the known modifiers was calculated. This summed
score was then converted to a z‐score using the scores obtained from
random models, as done for GO and the other annotation features.
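The co‐expression feature can be illustrated with a short Python sketch. The function names and the representation of expression profiles as plain lists of values across the same samples are assumptions for this example; the conversion to a z‐score is omitted.

```python
from math import sqrt


def pearson(x, y):
    """Pearson's correlation coefficient of two equally long profiles.

    Assumes both profiles have non-zero variance.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def coexpression_score(query_profile, modifier_profiles):
    """Sum of |r| between the query gene and each known modifier."""
    return sum(abs(pearson(query_profile, p)) for p in modifier_profiles)


# Hypothetical profiles: one perfectly correlated and one perfectly
# anti-correlated modifier both contribute |r| = 1.
query = [1.0, 2.0, 3.0, 4.0]
modifiers = [[2.0, 4.0, 6.0, 8.0], [4.0, 3.0, 2.0, 1.0]]
print(coexpression_score(query, modifiers))
```

Taking absolute values means that strongly anti‐correlated modifiers contribute as much to the score as strongly correlated ones.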
For the use of sequence similarity, protein sequences were compared
with those of known modifiers by using USearch (Edgar, [312]2010), a
faster algorithm than Blast. The highest bit score between a query gene
and known modifiers was used as a score, and then the score was also
converted to a z‐score as done for other features.
Finally, from each dataset, we ranked genes by their z‐scores. We then
converted the resulting six ranks to rank ratios (0 < rank ratio ≤ 1)
and used the rank ratios to calculate P‐values based on order
statistics. We then prioritized potential disease‐specific genetic
modifiers by their P‐values.
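The exact order‐statistics formula is not spelled out here. One common choice, shown below purely as an assumption, is the Q statistic used by Endeavour (Aerts et al, 2006), which gives the probability of obtaining rank ratios at least as small as the observed ones by chance. Function names are illustrative only.

```python
from math import factorial


def rank_ratios(ranks, n_genes):
    """Convert ranks (1 = best) to rank ratios in (0, 1]."""
    return [rank / n_genes for rank in ranks]


def q_statistic(ratios):
    """Joint cumulative probability of the N-dimensional order statistic.

    Uses the recursion V_k = sum_{i=1..k} (-1)^(i-1) * V_(k-i) / i! * r_(N-k+1)^i
    with V_0 = 1 and Q = N! * V_N, where r_1 <= ... <= r_N are the sorted ratios.
    """
    r = sorted(ratios)
    n = len(r)
    v = [1.0]  # V_0
    for k in range(1, n + 1):
        r_k = r[n - k]  # r_(N-k+1) in 1-based notation
        v.append(sum((-1) ** (i - 1) * v[k - i] / factorial(i) * r_k ** i
                     for i in range(1, k + 1)))
    return factorial(n) * v[n]


# A gene ranked near the top in all six feature rankings (hypothetical ranks
# out of 10,000 genes) obtains a much smaller Q than a gene with mid-range ranks.
top = q_statistic(rank_ratios([10, 25, 5, 40, 12, 8], 10_000))
mid = q_statistic(rank_ratios([5000, 4200, 6100, 5500, 4800, 5200], 10_000))
print(top < mid)
```

Genes would then be sorted in ascending order of this value, so that genes ranked consistently high across the six features come first.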
Prediction of common modifiers
Once disease‐specific modifiers were predicted, we used the top N
proteins encoded by the disease‐specific modifier genes as seeds in
MLnet for the prediction of common modifiers. The optimal number of
seeds (N) was determined before use. We selected the top N proteins and
used their ranks to compute seed scores, s = 1 − (rank − 1)/N. We mapped seeds
for each disease on PPI networks. Specifically, we obtained PPI data
from STRING (Franceschini et al, [313]2012) and constructed layered
networks, each layer representing a particular disease. We then mapped
disease‐specific seed modifiers onto the layered PPI networks. To
illustrate this and how common modifiers are calculated in MLnet in the
next steps, we provide fictitious networks in Appendix Fig [314]S2. The
score function is provided in Appendix Fig [315]S2A, and the stepwise
procedure shown in Appendix Fig [316]S2B–E.
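As a small worked example of the seed score s = 1 − (rank − 1)/N, the sketch below assigns scores to the top N entries of a ranked list. The function name and the protein names are placeholders, not taken from the study.

```python
def seed_scores(ranked_proteins, n):
    """Map the top-n proteins (best first) to s = 1 - (rank - 1) / n."""
    return {
        protein: 1 - (rank - 1) / n
        for rank, protein in enumerate(ranked_proteins[:n], start=1)
    }


# Hypothetical ranked list of disease-specific modifiers; N = 4 seeds.
scores = seed_scores(["a", "b", "c", "d", "e"], 4)
# scores == {"a": 1.0, "b": 0.75, "c": 0.5, "d": 0.25}
```

The top‐ranked seed always receives a score of 1, and the N‐th seed receives 1/N, so no seed has a score of zero.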
In the first round of MLnet calculations (Appendix Fig S2B),
proteins that are connected to one or more seed modifiers from each
layer are identified. In Appendix Fig S2B, there is only one
protein (p_1, green) that is linked to a seed modifier (p_2, red)
in the AD layer and another seed modifier (p_3, blue) in the HD
layer. Based on this topology, its score (c, common modifier score) is
calculated as provided below.
\[
c(p_k) = \frac{1}{W(p_k)} \prod_{d} \sum_{i} \frac{w(p_{i,k})}{W(p_i)} \times q(p_i^d) \qquad (2)
\]
\[
q(p_i^d) =
\begin{cases}
c(p_i) & \text{if } c(p_i) \text{ is available} \\
s(p_i^d) & \text{else if } s(p_i^d) \text{ is available}
\end{cases}
\]
w(p_{i,k}): interaction reliability of proteins p_i and p_k (0 < w < 1); w(p_{i,i}) = 1.
W(p_i): sum of interaction reliabilities of protein p_i.
c(p_i): common modifier score of protein p_i.
s(p_i^d): seed score of protein p_i in disease d.
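Equation (2) can be sketched in code as shown below. This is an illustration under stated assumptions rather than the MLnet implementation: the data layout and function names are hypothetical, every layer is assumed to share one PPI network, and W(p) is assumed to exclude the self‐interaction term.

```python
def common_modifier_score(k, neighbors, seeds, common):
    """Return c(p_k), or None if a layer has no scored partner for p_k.

    neighbors: {protein: {partner: reliability w}}, symmetric
    seeds:     {disease: {protein: seed score s}}
    common:    {protein: common modifier score c} from earlier rounds
    """
    def total_reliability(p):
        # W(p): sum of reliabilities over all interactions of p.
        # Assumption: the self term w(p, p) = 1 is not counted here.
        return sum(neighbors.get(p, {}).values())

    w_k = total_reliability(k)
    if w_k == 0:
        return None
    partners = dict(neighbors[k])
    partners[k] = 1.0  # self-interaction, w(p_i, p_i) = 1
    product = 1.0
    for seed_scores in seeds.values():  # one factor per disease layer
        layer_sum, found = 0.0, False
        for i, w in partners.items():
            q = common.get(i, seed_scores.get(i))  # c if available, else s
            w_i = total_reliability(i)
            if q is None or w_i == 0:
                continue
            layer_sum += w / w_i * q
            found = True
        if not found:
            return None
        product *= layer_sum
    return product / w_k


def add_edge(network, a, b, w):
    network.setdefault(a, {})[b] = w
    network.setdefault(b, {})[a] = w


# Fictitious network: p1 and a hub h both touch seed p2 (AD) and
# seed p3 (HD); the hub has three additional, unrelated partners.
net = {}
for a, b, w in [("p1", "p2", 0.9), ("p1", "p3", 0.8),
                ("h", "p2", 0.9), ("h", "p3", 0.8),
                ("h", "x1", 0.9), ("h", "x2", 0.9), ("h", "x3", 0.9)]:
    add_edge(net, a, b, w)
seeds = {"AD": {"p2": 1.0}, "HD": {"p3": 0.5}}
c_p1 = common_modifier_score("p1", net, seeds, {})
c_hub = common_modifier_score("h", net, seeds, {})
```

In this toy network p1 and the hub h have identical links to the seeds, yet the hub scores lower because its larger W(p_k) divides the same product, which illustrates the normalization against hub proteins.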
The red‐boxed equation in Appendix Fig S2B is for the contribution
of protein p_2 in the AD disease layer, and the blue‐boxed equation is
for protein p_3 in the HD layer. An important factor in the
calculation of c(p_k) is W(p_k), which is the sum of the interaction
reliability values w(p_{i,k}) for all interactions a protein has. This
term is included to normalize by the number of interactions a protein
has and thereby avoid predictions heavily biased toward hub proteins,
i.e., proteins with many interaction partners that have a higher
likelihood of interacting with seeds/modifiers. Interaction reliability
values are obtained from the STRING database (Franceschini
et al, 2012). In the given example, the calculation in this first
step results in a score of 0.00284 for p_1, and p_1 is then marked
as a potential common modifier in both the AD and HD layers. The first
step ends with the calculation of the score for p_1, since no other
proteins are connected to seeds in both layers. The next round of
calculation then begins.
In the next round, new proteins linked to a disease‐specific modifier
or common modifier from each layer are identified and their scores are
calculated. In Appendix Fig S2C, there are four proteins (p_2,
p_3, p_4, and p_5) that are linked to a seed from each layer
and/or a common modifier (by default in each layer). For example,
protein p_2 (red) can be selected because it is a (seed) modifier
(self‐interaction) in the AD layer and is connected to a potential
common modifier (p_1) in the HD layer (as well as to a potential
common modifier (p_1) in the AD layer). Thus, p_2 could be another
common modifier, and so its score is calculated as highlighted in
Appendix Fig S2C. Similarly, the other proteins (p_3, p_4, and
p_5) are connected to at least one modifier from each layer and their
scores are calculated (black arrows in Appendix Fig S2C). In the
next step, there is only one protein left that can be selected as a
common modifier (p_6) because it is linked to a common modifier
(p_5) in the AD and HD layers. Its common modifier score is then
calculated as shown in Appendix Fig S2D. As there are no more
proteins to be selected, the calculation round is terminated.
Consequently, each protein has a score, and they are ranked by their
common modifier score (Appendix Fig S2E). In the given example,
though p_6 (cyan) was selected last, it has the highest score due to
its high seed score in HD. Thus, p_6 is the most promising candidate
common modifier across the two diseases of our fictitious example.
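The round‐by‐round expansion described above can be sketched as a simple loop. This is one reading of the procedure, with several assumptions labeled in the code: rounds are synchronous (scores computed in a round only become visible in the next one), a score is fixed once assigned, and all layers share a single PPI network. The scoring step is restated compactly so that the block runs on its own, and all names are hypothetical.

```python
def score(k, nbrs, seeds, common):
    """Compact form of the common modifier score c(p_k)."""
    def total(p):
        return sum(nbrs[p].values())  # W(p), self term excluded (assumption)

    partners = {**nbrs[k], k: 1.0}  # include self-interaction, w = 1
    product = 1.0
    for layer in seeds.values():
        terms = [
            w / total(i) * common.get(i, layer.get(i))
            for i, w in partners.items()
            if i in common or i in layer
        ]
        if not terms:  # no seed or common modifier reachable in this layer
            return None
        product *= sum(terms)
    return product / total(k)


def expand(nbrs, seeds):
    """Repeat rounds until no new protein can be scored."""
    common, rounds, round_no = {}, {}, 0
    while True:
        round_no += 1
        # Assumption: synchronous rounds, using only scores from earlier rounds.
        new = {}
        for k in nbrs:
            if k in common or not nbrs[k]:
                continue
            c = score(k, nbrs, seeds, common)
            if c is not None:
                new[k] = c
        if not new:
            break
        common.update(new)  # assumption: scores are not recomputed later
        rounds.update({k: round_no for k in new})
    ranking = sorted(common.items(), key=lambda kv: kv[1], reverse=True)
    return ranking, rounds


# Fictitious two-layer example: p2 is an AD seed, p3 is an HD seed.
nbrs = {}
for a, b, w in [("p1", "p2", 0.9), ("p1", "p3", 0.8), ("p3", "p4", 0.5)]:
    nbrs.setdefault(a, {})[b] = w
    nbrs.setdefault(b, {})[a] = w
seeds = {"AD": {"p2": 1.0}, "HD": {"p3": 0.5}}
ranking, rounds = expand(nbrs, seeds)
```

In this toy network p1 is scored in round 1, the two seeds p2 and p3 in round 2 (through p1 and their own self‐interaction), and p4 in round 3, after which no unscored protein remains and the loop stops.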
Drosophila eye models for various NDs
To generate Drosophila eye models for AD, HD, SCA1, and SCA3 using the
GAL4/UAS transactivation system (Brand & Perrimon, 1993), the
GMR‐GAL4 driver line was crossed to the UAS‐transgene lines being
analyzed: UAS‐Aβ42 (BL33769) for AD, UAS‐HTT‐128Q (BL33808) for HD,
UAS‐ATX1‐82Q (BL39740) for SCA1 (Fernandez‐Funez et al, 2000), and
UAS‐MJDtr‐78Q (BL8150) for SCA3 (Warrick et al, 1998). All fly
lines were obtained from Bloomington Drosophila Stock Center (BDSC).
All stocks and crosses were reared on standard cornmeal/agar media
under noncrowded conditions at 25°C unless otherwise stated.
Evaluation of candidate modifier proteins for AD, HD, SCA1, and SCA3
To confirm MLnet‐predicted candidates for common modifier proteins, the
conditional knockdown or overexpression of specific proteins was
achieved with the GAL4/UAS system (Brand & Perrimon, 1993) in the
Drosophila models for AD, HD, SCA1, and SCA3 at 25°C (for SCA3) or at
29°C (for AD, HD, and SCA1), a temperature at which the GAL4/UAS
system shows enhanced activity (Seroude et al, 2002). The following RNAi lines
were used in this study: UAS‐Droj2‐RNAi ^TRiP (BL36089), UAS‐Akt1‐RNAi
^TRiP (BL31701), UAS‐Atg1‐RNAi ^TRiP (BL26731), UAS‐Uba1‐RNAi ^TRiP
(BL36307), UAS‐InR‐RNAi ^TRiP (BL31037), UAS‐par‐1‐RNAi ^TRiP
(BL32410), UAS‐Pdk1‐RNAi ^TRiP (BL27725), UAS‐Mi‐2‐RNAi ^TRiP
(BL33419), UAS‐Lk6‐RNAi ^TRiP (BL28357), UAS‐Hsc70Cb‐RNAi ^TRiP
(BL33742), UAS‐Zip‐RNAi ^TRiP (BL36727), and UAS‐sgg‐RNAi ^TRiP
(BL35364). Other lines included Canton‐S (BL64349) as a wild‐type
control, and UAS‐mCherry‐RNAi (BL35785) and UAS‐mCherry (BL35787),
which were used as controls for RNAi and overexpression, respectively.
All fly stocks were obtained from BDSC. Flies expressing
UAS‐Aβ42, UAS‐HTT‐128Q, UAS‐ATX1‐82Q, or UAS‐MJDtr‐78Q under the
control of the GMR‐GAL4 driver were crossed with flies carrying the
UAS‐RNAi or UAS‐overexpression transgene being analyzed. After
anesthetizing 5‐day‐old F1 females with CO2, eye images were acquired
using a stereomicroscope (Olympus) and a microscopic camera (Sentech
America). All fly eyes were photographed with the same image‐capture
settings in the I‐MEASURE software. The Drosophila
experiments were performed in a blinded manner.
Viability assays with HEK293 cells
HEK293 cells for viability assays were cultured in plating medium
(Dulbecco's modified Eagle's medium (DMEM, Welgene, South Korea) with
10% fetal bovine serum (FBS, Welgene, South Korea) and 50 μg/ml
gentamycin (Duchefa, Netherlands)) in a 5% CO2 humidified atmosphere
at 37°C.
HEK293 cells at 70–80% confluence were transiently transfected
with pEGFP‐C1‐Aβ1‐42, pEGFP‐Htt‐exon1‐Q74 (Addgene #40262),
pEGFP‐Ataxin1‐52Q (Addgene #32492), or pEGFP‐C1‐Ataxin3‐Q84 (Addgene
#22123) DNA using Lipofectamine 2000 (Invitrogen, CA, USA) following
the manufacturer's instructions. The pEGFP‐C1‐Aβ1‐42 plasmid was
constructed from pCAX‐FLAG‐APP (Addgene #30154).
Before drug treatment, HEK293 cells were washed with treatment medium
(Minimum essential medium (MEM, Gibco, MD, USA) with 1% FBS) and then
treated with the indicated concentration of SC79 (0.1–100 μM) (SML0749,
Sigma‐Aldrich) for 24 h.
Cell viability was measured using Cell Counting Kit‐8 (CK04, Dojindo,
MD, USA) according to the manufacturer's instructions.
The optical density (OD) of each well was measured at 450 nm using a
microplate reader (Molecular Devices, CA, USA), and the OD values were
reported as % cell viability (mean ± SEM, n = 4–8 per group). The in
vitro assays were performed in a blinded manner.
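The conversion of OD readings into % cell viability (mean ± SEM) can be sketched as below. The text does not state the reference used for normalization, so this sketch assumes that each well is expressed relative to the mean OD of untreated control wells and that no blank subtraction is applied; the function name and the OD values are made up for illustration.

```python
import math
import statistics


def percent_viability(od_treated, od_control):
    """Return (mean, SEM) of % viability relative to the control mean."""
    control_mean = statistics.fmean(od_control)
    percents = [100.0 * od / control_mean for od in od_treated]
    mean = statistics.fmean(percents)
    # SEM = sample standard deviation / sqrt(n)
    sem = statistics.stdev(percents) / math.sqrt(len(percents))
    return mean, sem


# Hypothetical OD450 readings, n = 4 wells per group.
mean, sem = percent_viability([0.5, 0.7, 0.5, 0.7], [1.0, 1.0, 1.0, 1.0])
```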
AD mice
Female and male 5xFAD mice overexpressing the mutant human APP (K670N,
M671L, I716V, V717I) and PS1 (M146L and L286V) (The Jackson Laboratory,
Stock No. 034848‐JAX) were treated with SC79 starting 1 month before
the behavioral test (7‐month‐old). Wild‐type (WT) littermates served as
age‐matched control animals. Mice were separated by sex and genotype
and housed in polyethylene cages (25 cm × 30 cm × 22 cm) with aspen
shaving bedding (DBL, Korea), 4–5 mice per cage. They were classified
into four groups (WT, WT + SC79, AD, and AD + SC79). The SC79 groups
received SC79 dissolved in 8.5% DMSO (D2438, Sigma‐Aldrich) in
corn oil 5 days per week via oral gavage (1.5 mg/kg/day in 100 μl).
Body weight was measured every week. All groups were kept under standard
conditions (23 ± 2°C, 50 ± 5% humidity, and a 12 h light/dark cycle
with lights on from 9:00 am to 9:00 pm). Mice had ad libitum access to
food (NIH‐31) and sterile water. All procedures were performed in
accordance with the Sejong University Institutional Animal Care and Use
Committee.
Barnes maze test
The Barnes maze test was performed to elucidate the effect of SC79
treatment on cognitive deficits in learning and memory as described
(Patil et al, 2009) with slight modifications. The Barnes maze
apparatus is a white acrylic circular disk, 92 cm in diameter, with 20
spaced holes, each 5 cm in diameter. The escape chamber was placed
under one of the holes, defined as the target hole. Because mice
sometimes lack the motivation to enter the escape chamber, they may
explore the maze after finding the target hole without descending into
it (Harrison et al, 2006). To motivate mice to enter the escape
chamber, the escape chamber contained some plastic steps, aspen
shavings, and six pieces of standard feed. The other holes were closed
with matte black plates. Mice were placed in a square styrofoam box
(20 cm × 20 cm) covered with an opaque lid for 20 s to randomize the
starting direction. Each trial, lasting 3 min, was started after
lifting the box. If a mouse did not find the escape chamber within
3 min, it was gently guided to the escape chamber and allowed to stay
there for 20 s before being returned to the waiting cage. The escape
chamber was kept at a fixed location for all trials. On days 15 and
16, the mice again received a test trial of 3 min to check long‐term
retention memory. Primary latency was defined as the time until a
mouse first poked its nose into the target hole. Mice were not tested
during the period between day 6 and day 15. All trials were recorded
and analyzed with ANY‐maze 6.0 software. All behavioral tests were
conducted in a blinded manner, and the ANY‐maze software was used to
avoid any bias in behavior analysis. All behavior data are expressed
as means ± SEM. Statistical significance was calculated by Student's
t‐test.
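For reference, the t statistic of an unpaired Student's t‐test with pooled variance can be computed as sketched below. Whether the comparisons in this study were paired or unpaired, one‐ or two‐tailed is not stated in the text, so the unpaired form is an assumption, and the latency values are made up. The P‐value would then be read from the t distribution with the returned degrees of freedom.

```python
import math
import statistics


def student_t(sample_a, sample_b):
    """Unpaired Student's t statistic (equal variances assumed) and df."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled_var = (
        (n_a - 1) * statistics.variance(sample_a)
        + (n_b - 1) * statistics.variance(sample_b)
    ) / df
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (statistics.fmean(sample_a) - statistics.fmean(sample_b)) / standard_error
    return t, df


# Hypothetical primary latencies (s) for two groups of three mice.
t_stat, dof = student_t([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
```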
Elevated plus maze test
An elevated plus maze test was performed to evaluate anxiety‐like
behavior. The apparatus comprised two closed arms with high
walls (30 cm × 5 cm × 16 cm), two open arms with small walls
(30 cm × 5 cm × 0.5 cm), and a center platform (5 cm × 5 cm). Each arm
had a 10 cm end zone measured from the end of the arm. The apparatus
was 40 cm above the floor. Mice were placed at the center facing a
closed arm and were allowed to move freely for 5 min. The time in the
open arms was measured and recorded with ANY‐maze 6.0 software. All
behavioral tests were conducted in a blinded manner.
Amyloid‐β western blot
Half of the mouse brain samples were homogenized in RIPA buffer with a
protease inhibitor cocktail (Thermo Scientific, USA). Homogenized
samples were centrifuged at 20,000 × g for 10 min at 4°C, and the
supernatant was collected and stored at −80°C until use. Protein
concentrations were quantified by a Bradford assay (Bio‐Rad, USA)
according to the manufacturer's instructions.
Protein samples were loaded onto sodium dodecyl sulfate‐polyacrylamide
gel electrophoresis (SDS‐PAGE) gel and transferred to polyvinylidene
difluoride (PVDF) membranes. PVDF membranes were blocked in 5% non‐fat
dry milk in tris‐buffered saline with 0.1% Tween 20 detergent (TBST).
PVDF membranes were washed and incubated at 4°C overnight with primary
anti‐Amyloid‐β antibody (BioLegend, USA) and anti‐β‐actin antibody
(Cell Signaling, USA). Membranes were washed and incubated with
horseradish peroxidase (HRP)‐conjugated secondary antibodies (Abcam,
USA) for 2 h at room temperature. Protein bands were detected by using
Fusion Solo (Vilber Lourmat, France) with Miracle‐Star™ Western Blot
Detection System (iNtRON Bio, Korea). The intensity of the protein
bands was normalized against β‐actin and quantified using the ImageJ
software (National Institutes of Health, USA).
Author contributions
Jörg Gsponer: Conceptualization; software; formal analysis;
supervision; funding acquisition; investigation; methodology; writing –
original draft; project administration; writing – review and editing.
Dokyun Na: Conceptualization; software; formal analysis; funding
acquisition; investigation; methodology; writing – original draft;
writing – review and editing. Do‐Hwan Lim: Conceptualization; formal
analysis; investigation; visualization; methodology; writing – original
draft. Jae‐Sang Hong: Formal analysis; investigation; visualization;
methodology; writing – original draft. Hyang‐Mi Lee: Resources; data
curation; software; formal analysis; investigation; visualization.
Daeahn Cho: Software; formal analysis. Myeong-Sang Yu: Software; formal
analysis; investigation; visualization. Bilal Shaker: Software; formal
analysis; visualization. Jun Ren: Resources; software; formal analysis;
visualization. Bomi Lee: Formal analysis; investigation; visualization.
Jae Gwang Song: Formal analysis; investigation; visualization. Yuna Oh:
Formal analysis; investigation; visualization. Kyungeun Lee: Formal
analysis; investigation; visualization. Kwang‐Seok Oh: Formal analysis;
investigation; visualization. Mi Young Lee: Formal analysis;
investigation; visualization. Min‐Seok Choi: Formal analysis;
investigation; visualization. Han Saem Choi: Formal analysis;
investigation; visualization. Yang‐Hee Kim: Formal analysis;
investigation; visualization. Jennifer M Bui: Conceptualization; formal
analysis; investigation; methodology. Kangseok Lee: Resources; formal
analysis; investigation. Hyung Wook Kim: Data curation; formal
analysis; investigation; visualization; writing – original draft. Young
Sik Lee: Formal analysis; supervision; funding acquisition;
investigation; writing – original draft; writing – review and editing.
Disclosure and competing interests statement
The authors declare that they have no conflict of interest.
Supporting information
Appendix (pdf, 2.9MB)
Dataset EV1 (xlsx, 14.6KB)
Dataset EV2 (xlsx, 13.8KB)
Dataset EV3 (xlsx, 11.3KB)
Dataset EV4 (xlsx, 10.5KB)
Dataset EV5 (xlsx, 11.6KB)
Dataset EV6 (xlsx, 20.4KB)
Dataset EV7 (xlsx, 10.1KB)
Dataset EV8 (xlsx, 10.2KB)
Source Data for Appendix (zip, 316.5KB)
Source Data for Figure 4 (zip, 4MB)
Source Data for Figure 5 (zip, 57.3KB)
Acknowledgements