Abstract
Despite years of intense investigation, the mechanisms underlying
neuronal death in Alzheimer’s disease, remain incompletely understood.
To define relevant pathways, we conducted an unbiased, genome-scale
forward genetic screen for age-associated neurodegeneration in
Drosophila. We also measured proteomics, phosphoproteomics, and
metabolomics in Drosophila models of Alzheimer’s disease and identified
Alzheimer’s genetic variants that modify gene expression in
disease-vulnerable neurons in humans. We then used a network model to
integrate these data with previously published Alzheimer’s disease
proteomics, lipidomics and genomics. Here, we computationally predict
and experimentally confirm how HNRNPA2B1 and MEPCE enhance toxicity of
the tau protein, a pathological feature of Alzheimer’s disease.
Furthermore, we demonstrated that the screen hits CSNK2A1 and NOTCH1
regulate DNA damage in Drosophila and human stem cell-derived neural
progenitor cells. Our study identifies candidate pathways that could be
targeted to ameliorate neurodegeneration in Alzheimer’s disease.
Subject terms: Neurodegeneration, Alzheimer's disease, Data
integration, RNAi
__________________________________________________________________
In this study, Leventhal et al. integrate multi-omic data from human
and Drosophila models of Alzheimer’s disease to define regulators of
age-associated neurodegeneration in Alzheimer’s disease and the
pathways through which they act.
Introduction
Neurodegenerative diseases are characterized by a progressive loss of
neurons and preferentially affect older individuals. As the global
population ages, there is an increasing imperative to understand and
design effective therapies for neurodegenerative disorders. Brains from
patients with Alzheimer’s disease, the most common neurodegenerative
disorder, show pathological aggregation and deposition of extracellular
amyloid β plaques and intracellular neurofibrillary tangles comprised
of tau protein^[59]1–[60]3. Amyloid β plaques are predominantly made up
of 42-amino acid amyloid β peptides (amyloid β[1-42]), which are
processed from the larger amyloid precursor protein (APP)^[61]4 through
the action of the gamma-secretase complex. The presenilin proteins
comprise the protease subunit of gamma-secretase. Mutations in APP and
in the genes encoding the presenilins give rise to fully penetrant,
though rare, familial Alzheimer’s disease. Mutations in MAPT, the gene
encoding the microtubule-associated protein tau, have not been
discovered in Alzheimer’s disease patients. However, wild-type tau is
the major aggregating protein in a group of sporadic neurodegenerative
disorders characterized by tau deposition in neurofibrillary
aggregates, termed tauopathies. Further, missense mutations in MAPT
cause fully penetrant, severe forms of familial neurodegeneration,
strongly linking tau to aging-dependent neurodegeneration^[62]5.
Discovery of single gene mutations leading to familial forms of
neurodegenerative disease provided a critical starting point for
developing animal models and understanding underlying pathophysiology.
More recently, data derived from high content approaches has added to
clues from classical genetics. Genome-wide association studies (GWAS),
transcriptomic analysis, and quantitative trait locus (QTL) analysis
have identified genetic risk factors and associated molecular changes
underlying Alzheimer’s disease in the brain at bulk and single-neuron
resolution^[63]6–[64]13.
Despite the wealth of human pathological and genetic data, the pathways
through which both single gene mutations and QTL-associated molecular
changes impact neuronal cell death remain incompletely defined.
Complementary approaches are thus needed to define the full set of
mechanisms mediating neurodegenerative disease pathogenesis. A more
complete understanding of cell death pathways should provide an
important new set of therapeutic targets in Alzheimer’s disease and
related age-dependent neurodegenerative disorders^[65]14.
Forward screens in organisms, like Drosophila, with relatively short
lifespan, well-developed experimental tools, and conserved neuronal
biology, represent a potentially valuable method for discovering new
pathways mediating neurodegeneration. Indeed, prior genetic screens in
tauopathy model flies have identified a number of pathways mediating
toxicity of pathological human tau that are conserved in vertebrate
systems^[66]15–[67]28. Similarly, genetic screens in otherwise wild
type flies have identified mutants causing progressive
neurodegeneration with relevance to human disease, but these efforts
have been relatively modest in scale^[68]29–[69]33. Here we have
employed unbiased forward genetic screening at genome-scale in
Drosophila to define mechanisms required for maintenance of aging adult
neurons.
We then used a multi-omic integration approach to relate the hits from
our model organism screen to human disease and identify the pathways
that influence age-associated neurodegeneration. We measured
proteomics, phosphoproteomics and metabolomics in transgenic Drosophila
models of human amyloid β and tau to identify molecular changes
associated with Alzheimer’s disease toxic proteins (Fig. [70]1). To
determine how our transgenic Drosophila RNAi screen and the other model
organism data were related to human Alzheimer’s disease patients, we
generated RNA-sequencing (RNA-seq) data from pyramidal neuron-enriched
populations from the temporal cortex using laser-capture
microdissection^[71]34–[72]38 (Fig. [73]1). We identified fine-mapped
expression QTLs (eQTLs) and the eQTL-associated genes (eGenes) in
neurons vulnerable to disease pathology to find patterns of gene
expression associated with human genetic risk factors of Alzheimer’s
disease. Next, we integrated these multi-species, multi-omic data with
a previously published genome-scale screen for tau-mediated
neurotoxicity^[74]15, existing human Alzheimer’s disease GWAS hits,
proteomics, and metabolomics^[75]7,[76]9,[77]15,[78]39,[79]40 using
advanced network modeling approaches (Fig. [80]1)^[81]41,[82]42.
Fig. 1. Overview of analytical framework for multi-omic integration to study
the biological processes underlying neurodegeneration.
[83]Fig. 1
[84]Open in a new tab
We performed a forward genetic screen for age-associated
neurodegeneration in Drosophila. We measured proteomics,
phosphoproteomics and metabolomics in amyloid β (gold) and tau (purple)
models of Alzheimer’s disease and performed an eQTL meta-analysis of
previous Alzheimer’s disease studies. We used a network integration
model to integrate these new data with previously published human
proteomics, human genetics, human lipidomics, and Drosophila modifiers
of tau-mediated neurotoxicity. We tested hypotheses inferred from
subnetwork analysis of the network model in Drosophila and human
iPSC-derived neural progenitor cells. Created in BioRender. Leventhal,
M. (2025) [85]https://BioRender.com/y28s838.
Based on our integrated model, we nominated genes and pathways that
contribute to age-associated neurodegeneration in Alzheimer’s disease.
We experimentally tested the predicted functional effects of knockdown
of proposed targets in flies and in human induced pluripotent stem
cells. Specifically, we demonstrate that the human Alzheimer’s disease
genetic risk factor MEPCE and neurodegeneration screen hit HNRNPA2B1
regulate tau-mediated neurotoxicity. Furthermore, we show in flies and
iPSC-derived neural progenitor cells that NOTCH1 and CSNK2A1 regulate
the DNA damage response, suggesting pathways through which these genes
enhance neurodegeneration.
Results
A genome-scale, forward genetic screen for neurodegeneration in Drosophila
We performed a genome-scale, forward genetic screen in Drosophila to
identify genes required for maintenance of viability of aging neurons
in vivo. We used transgenic RNAi to knock down 5261 fly genes in
neurons using the UAS-GAL4 bipartite expression system and the widely
used elav-GAL4 driver, which mediates expression in the pattern of the
pan-neuronal gene elav^[86]43,[87]44. Genes were knocked down based on
availability of transgenic RNAi lines from the public Bloomington
Drosophila Stock Center and were not otherwise preselected. Adult flies
with neuronal gene knockdown were aged for 30 days and brain integrity
was assessed on tissue sections representing the entire brain. Scoring
was performed in a blinded fashion. Genes were identified as hits in
the screen if there was neuronal loss or vacuolation in the context of
a properly developed brain. Neurodegeneration is frequently accompanied
by vacuolation of the brain in flies and is commonly viewed as a
sensitive and specific measure of neurodegeneration in the brain aging
and neurodegenerative disease
models^[88]20,[89]29,[90]33,[91]45–[92]57. Vacuoles often occur in a
range of human neurodegenerative disease as well, including Alzheimer’s
disease and related tauopathies^[93]58–[94]64.We identified 198 genes
that promoted age-associated neurodegeneration in Drosophila after
knockdown (Supplementary Data [95]1). Strikingly, we recovered
orthologs of both APP (Appl) and the presenilins (Psn), genes mutated
in the two monogenic causes of familial Alzheimer’s disease. Our
findings are consistent with demonstration of age-dependent
neurodegeneration in Appl and Psn mutants in other
studies^[96]28,[97]65. In addition to Drosophila APP and Psn we also
recovered fly orthologs of genes mutated in monogenic forms of
Parkinson’s disease (PLA2G6/iPLA2-VIA), amyotrophic lateral sclerosis
(SOD1/Sod1, VAPB/Vap33), hereditary spastic paraparesis
(VPS37A/Vps37A), and mitochondrial encephalomyopathy (COX10/Cox10,
NDUFS7/ND-20, NDUFV1/ND-51, MSTO1/Mst, TTC19/Ttc19). Further connecting
the results of our screen with neurodegeneration, we observed an
enrichment for ubiquitin-associated pathways (Benjamini-Hochberg
FDR-adjusted p-value = 1.48*10^−^3). We also found that a list of genes
with significant age-associated proteomic and transcriptomic changes in
the Drosophila brain and neurons from previous studies were enriched
for neurodegeneration screen hits^[98]66–[99]70 (hypergeometric
p-value = 2.13*10^−^3, Supplementary Data [100]2). This result suggests
that the hits from the screen have some association with chronological
age.
Gene expression of the human orthologs of the neurodegeneration screen hits
declines with age and Alzheimer’s disease in the human brain
To assess more broadly if Drosophila screen hits were associated with
human neurodegeneration we analyzed RNA-seq data from 2642 human
post-mortem brain tissues from the Genotype-Tissue Expression (GTEx)
project. We assessed whether there was a significant association
between the mean expression of the human orthologs of the
neurodegeneration screen hits and age in these human brains. We found
that the average expression of neurodegeneration screen hits was
negatively associated with chronological age (Fig. [101]2a,
p = 1.14*10^−^5). There was a stronger negative association between
average gene expression and age for the neurodegeneration screen hits
than the average expression of all protein-coding genes (Fig. [102]2a,
all protein-coding genes: R = 0.12, neurodegeneration screen hits:
R = 0.15). We found there was a significant difference in the slopes of
the regression lines showing the relationship between gene expression
and age for neurodegeneration screen hits compared to that of all
protein-coding genes (p = 7.38*10^−^6). We subsequently ranked all
genes by the regression coefficients measuring the relationship between
gene expression and age. We performed Gene Set Enrichment Analysis on
this ranked list to identify which pathways had significant changes in
gene expression with respect to age. Our analysis showed a negative
association between the expression of screen hits and age
(Fig. [103]2b, Benjamini-Hochberg FDR-adjusted p-value < 0.1). We note
that the average expression of neurodegeneration screen hits is
significantly greater than that of all protein-coding genes (Wilcoxon
rank-sum test, p < 1*10^−16). To assess the robustness of our results,
we performed permutation tests by randomly shuffling the patient ages.
Not a single permutation out of 10,000 iterations had a more
significant association between age and gene expression of the screen
hits, suggesting that this result is specific to chronological age in
humans.
Fig. 2. Gene expression of neurodegeneration screen hits declines with
chronological age in the human brain.
[104]Fig. 2
[105]Open in a new tab
a Geometric mean expression in transcripts per million (TPM) of
neurodegeneration screen hits (neurodegeneration genes, orange) and all
protein-coding genes in the Genotype-Tissue Expression (GTEx) shows
that the expression of neurodegeneration screen hits declines with age
in human brain tissues (all protein-coding genes: two-sided t-test
Benjamini-Hochberg FDR-adjusted p = 2.91*10^−4, neurodegeneration
screen hits: two-sided t-test Benjamini-Hochberg FDR-adjusted
p = 1.14*10^q5). There is a significant difference in the slopes of the
trends between age and gene expression for neurodegeneration screen
hits and all protein-coding genes (all protein-coding genes: R = 0.12,
neurodegeneration screen hits: R = 0.15, p = 7.38*10^−6). Regression
lines indicate the relationship between age and TPM with a 95%
confidence interval (standard error of the mean). The mixed effects
regression analysis controlled for post-mortem interval, sex,
ethnicity, and tissue of origin. b Gene set enrichment plot showing
that the set of age-associated neurodegeneration genes has reduced
expression with respect to age. Vertical lines indicate rank of
neurodegeneration screen hits by their association between gene
expression and age determined by mixed-effects regression analysis
coefficients. c Proportion of genes that have significant associations
between gene expression and age relative to the set of all
protein-coding genes (blue, n = 20438) or the set of age-associated
neurodegeneration genes (orange, n = 163). Data is presented as
proportion of differentially expressed genes relative to total with 95%
binomial confidence intervals of the estimated proportion of genes with
a significant association with age. Asterisk indicates tissues with a
FDR-adjusted one-tailed hypergeometric test p-value less than 0.01.
Binomial test p-values after FDR correction: Amygdala=7.61*10^−2,
Caudate=2.15*10^−2, Anterior Cingulate cortex=5.49*10^−1, Nucleus
Accumbens=3.32*10^−2, Cerebellar Hemisphere=1.48*10^−4,
Cerebellum=1.74*10^−4, Frontal Cortex=3.17*10^−4,
Hypothalamus=1.24*10^−2, Hippocampus=1.85*10^−3, Cortex=1.85*10^−3,
Substantia Nigra=1.30*10^−1. d Proportion of protein-coding genes
(blue, n = 17926) and age-associated neurodegeneration genes (orange,
n = 169) that are differentially expressed between Alzheimer’s disease
(AD) and control in excitatory neurons in single-nucleus RNA-seq. Data
is presented as proportion of differentially expressed genes relative
to total with 95% binomial confidence intervals. Source data for (a) is
provided in the source data file.
Next, we examined expression of the human orthologs of the screen hits
with respect to age across regions of the human brain (Fig. [106]2c).
Tissues enriched in age-associated changes of the screen hits include
Alzheimer’s disease-vulnerable regions such as the hippocampus and the
frontal cortex (Fig. [107]2c, hypergeometric test Benjamini-Hochberg
FDR-adjusted p-value < 0.01). In many cases, the same genes showed
significant age-associated changes in expression in several tissues
(Supplementary Fig. [108]1, mixed effect model Benjamini-Hochberg
FDR-adjusted p-value < 0.1, absolute value of regression
coefficient> 0.1). We observed that the Alzheimer’s disease-vulnerable
tissues clustered together and with the Parkinson’s disease-vulnerable
substantia nigra by hierarchical clustering (Supplementary
Fig. [109]1). These human results suggest that the hits from our screen
are associated with human aging in multiple regions of the brain, some
of which are affected by common neurodegenerative diseases.
We analyzed the single nuclear RNA-seq data of excitatory neurons from
a previously published single-nucleus RNA-seq study to examine cellular
specificity of the neurodegeneration screen hits^[110]71. We observed
that the average expression of screen hits was lower in Alzheimer’s
disease-associated excitatory neurons than in excitatory neurons from
healthy controls (Supplementary Fig. [111]2). We also found that the
genes differentially expressed in Alzheimer’s disease-associated
excitatory neurons in this dataset were enriched for neurodegeneration
screen hits (Fig. [112]2d, Benjamini-Hochberg FDR-adjusted
p-value < 0.1). These results show that gene expression of the
neurodegenerative screen hits declines with respect to age in human
brain tissues and human Alzheimer’s disease excitatory neurons,
suggesting their importance in human disease and aging.
Human genetic risk factors enriched in disease-associated neurons complement
results from the neurodegeneration screen
We collected human gene expression, human genetics, and proteomics,
phosphoproteomics and metabolomics from Drosophila models of
Alzheimer’s disease to systematically characterize omic perturbations
in Alzheimer’s disease and further explore the role of
neurodegeneration screen hits to human neurodegeneration (Fig. [113]1).
For the human data, we used laser-capture microdissection to obtain
pyramidal neurons from the human temporal cortex of 75 individuals,
including 42 Alzheimer’s disease and 33 control individuals
(Fig. [114]3a). We examined temporal pyramidal neurons because they are
preferentially vulnerable to the formation of neurofibrillary tangles
and neurodegeneration^[115]35. We performed RNA-seq and eQTL analysis
on laser-captured material to examine how the hits from our transgenic
Drosophila RNAi screen relate to genetic causes of Alzheimer’s disease
in human neurons (Fig. [116]3a, TCPY in Supplementary Data [117]3). The
RNA-seq from these pyramidal neurons demonstrated high expression of
the excitatory neuron marker SLC17A7, as expected (Supplementary
Fig. [118]3a)^[119]72.
Fig. 3. Multi-omic changes in human Alzheimer’s disease patients and model
systems.
[120]Fig. 3
[121]Open in a new tab
a Schematic depicting the identification of eGenes from laser-capture
microdissection of temporal cortex pyramidal neuron-enriched
populations from 75 individuals including 42 human Alzheimer’s disease
(AD) and 33 healthy control patients and identification of eGenes. b
The eQTL associated with the eGene HLA-DRB1 is highlighted in red and
overlaps with DNA binding motifs of MEF2B, CUX1 and ATF2 derived from
ENCODE ChIP-seq and FIMO-detected motifs. Grey horizontal bars indicate
ChIP-seq binding regions and the black horizontal bars indicate where
the DNA-binding motif is located. c Dot plots showing the negative
log[10] FDR-adjusted hypergeometric test p-values for enriched GO terms
in proteins that are significantly upregulated or significantly
downregulated in both Drosophila models of tau and amyloid β, only
differentially abundant in Drosophila models of amyloid β (Amyloid β
only), or only differentially abundant in Drosophila models of tau (Tau
only). d Heat maps depict the log2 fold changes between Aβ[1-42]
transgenic flies (Amyloid β) or tau^R406W (Tau) transgenic flies with
controls for (d) proteins or (e) phosphoproteins that were hits in the
age-associated neurodegeneration screen. An asterisk indicates whether
the comparison was significant at an FDR threshold of 0.1. The columns
of all heatmaps were clustered by hierarchical clustering.
We first performed an eQTL meta-analysis across 7 different bulk
RNA-seq and genomics studies in post-mortem brains (Supplementary
Data [122]3 and [123]4, Methods). The results from this meta-analysis
were then forwarded to the eQTL analysis in the newly collected
temporal cortex pyramidal neuron RNA-seq data to see which brain eQTLs
were enriched in Alzheimer’s disease-vulnerable neurons. We found
cis-regulatory effects in the pyramidal neuron-enriched transcriptomes
for 12 eGenes (Table [124]1). The enriched genes included C4A, EPHX2,
PRSS36, and multiple MHC class II genes (Table [125]1). Expression of
the eGenes was correlated with several known biological processes
previously associated with Alzheimer’s disease such as insulin
signaling, protein folding and lipid metabolism^[126]71,[127]73–[128]84
(Supplementary Fig. [129]3b). We incorporated the eGenes from the
temporal cortex pyramidal neurons and the meta-analysis in our analysis
of the fly screen hits.
Table 1.
eQTLs linked to Alzheimer’s disease GWAS loci
Chromosome:base pair position (hg19) Previously nominated GWAS
candidate eGene eQTL Ref/alt allele P Fixed effects regression
coefficient
6:32626139 HLA-DRB1 C4A rs6905975 C:G 1.53E-02 −0.319
8:27400592 CLU/PTK2B EPHX2 rs66924402 A:C 4.63E-02 0.170
6:32627485 HLA-DRB1 HLA-DQA1 rs9273432 T:C 3.69E-02 −0.414
6:32608251 HLA-DRB1 HLA-DQA2 rs28383408 C:G 9.15E-03 0.428
6:32628030 HLA-DRB1 HLA-DQB1 rs9273471 G:A 2.49E-03 −0.866
6:32608820 HLA-DRB1 HLA-DQB1-AS1 rs9272670 C:T 3.35E-02 −0.390
6:32663564 HLA-DRB1 HLA-DQB2 rs5000634 A:G 3.95E-05 0.690
6:32579035 HLA-DRB1 HLA-DRB1 rs9271209 G:A 6.94E-07 −0.686
6:32574990 HLA-DRB1 HLA-DRB5 rs9271025 T:C 5.80E-04 −0.791
16:31154146 KAT8 PRSS36 rs1549299 G:A 3.04E-02 −0.393
7:100190116 ZCWPW1 PVRIG rs2734895 T:C 2.99E-02 −0.446
6:47413226 CD2AP RP11-385F7.1 rs6934735 A:T 4.43E-02 −0.333
[130]Open in a new tab
eGenes and variants from an eQTL analysis of temporal cortex pyramidal
neuron-enriched population from human Alzheimer’s disease and control
patients. Chi-squared p-value from meta-analysis across 1087 human
Alzheimer’s disease and control patients across 7 previously published
studies is also reported. Beta coefficient indicates the association
between gene expression of the eGene and presence of Alzheimer’s
disease. Chromosomal coordinates are reported according to the human
genome reference hg19 and the hypothetical gene is the variant reported
in Jansen et al. 2019 for that particular locus^[131]11.
We hypothesized that some temporal cortex pyramidal neuron eQTLs
influence eGene expression by disrupting transcription factor binding.
We used the ENCODE 3 transcription factor ChIP-seq data to see which
eQTLs overlapped transcription factor peaks and DNA-binding motifs
(Fig. [132]3b). We found that the eQTL (rs9271209) for HLA-DRB1
overlapped with ChIP-seq peaks and DNA-binding motifs for the
transcription factors MEF2B, CUX1 and ATF2 (Fig. [133]3b). Patients
with the rs9271209 eQTL have reduced expression of HLA-DRB1, suggesting
that this Alzheimer’s disease-associated effect on gene expression
could be mediated through inhibition of transcription factor binding
(Fig. [134]3b and Table [135]1).
Proteomics, phosphoproteomics and metabolomics from Drosophila models of
tauopathy or amyloid β neurotoxicity show significant changes in
neurodegeneration screen hits in disease contexts
We generated proteomic, phosphoproteomic and metabolomic data from the
heads of established Drosophila models of amyloid β and tau toxicity to
broadly characterize molecular changes in models of Alzheimer’s disease
pathology^[136]85,[137]86 (Fig. [138]1, Supplementary Fig. [139]4, and
Supplementary Data [140]5–[141]7). Specifically, we modeled amyloid β
pathology using a transgenic fly line expressing the human amyloid
β[1-42] isoform (Aβ[1-42] transgenic flies)^[142]86. We modeled tau
pathology using a well-characterized transgenic fly line expressing
human MAPT with the neurodegenerative disease-associated R406W point
mutation (tau^R406W transgenic flies)^[143]85. We used tau^R406W
transgenic flies because these flies display a modest, but detectable
degree of neurodegeneration at 10 days of age^[144]85. We aged control
and experimental flies for 10 days and measured proteomics,
phosphoproteomics and metabolomics. Proteins downregulated in both
Aβ[1-42] transgenic flies and tau^R406W transgenic flies were enriched
for enzymes that metabolize carboxylic acids, amino acids, and lipids,
suggesting shared manifestations of molecular pathology (Fig. [145]3c,
Benjamini-Hochberg FDR-adjusted p-value < 0.1). Additionally, we found
that proteins that were upregulated in Aβ[1-42] transgenic flies and
tau^R406W transgenic flies were enriched for muscle development and
cell adhesion (Fig. [146]3c, g:SCS FDR-adjusted p-value < 0.1).
Proteins that were only differentially abundant in Aβ[1-42] transgenic
flies were enriched for Golgi-associated processes, while proteins that
were only differentially abundant in tau^R406W transgenic flies were
enriched for wound healing responses and amino acid synthesis pathways
(Fig. [147]3c, Benjamini-Hochberg FDR-adjusted p-value < 0.1). These
results suggest biological processes that are common among models of
Alzheimer’s disease, which are specific to amyloid β pathology, and
which are specific to tau-mediated pathology.
Few of the neurodegeneration screen hits were differentially abundant
in the proteomic and phosphoproteomic data, and the lack of overlap is
a phenomenon that has been noted previously in other comparisons of
genetic and expression data^[148]87. The screen hits that were
differentially abundant in the Aβ[1-42] transgenic fly proteomics were
enriched for biological processes pertaining to development and
cognition (Fig. [149]3d, Supplementary Fig. [150]4g, Benjamini-Hochberg
FDR-adjusted p-value < 0.1). None of the screen hits were
differentially phosphorylated in the tau^R406W transgenic flies, while
there were 11 phosphopeptides found in neurodegeneration screen hits
that were differentially phosphorylated in Aβ[1-42] transgenic flies
(Fig. [151]3e). Despite the small overlap, our results highlight
neurodegeneration screen hits that are differentially altered in models
of Alzheimer’s disease pathology. Furthermore, we observe that more
neurodegeneration screen hits show proteomic and phosphoproteomic
changes in Aβ[1-42] transgenic flies than in tau^R406W transgenic flies
(Fig. [152]3d, e).
Network integration of Alzheimer’s disease omics and novel genetic screening
data identifies subnetworks representing biological processes underlying
neurodegeneration in Alzheimer’s disease
We performed network integration of our Drosophila neurodegeneration
screen hits with Alzheimer’s disease multi-omics to determine how the
neurodegeneration screen hits contribute to human Alzheimer’s disease
(Fig. [153]1). We integrated the hits from the neurodegeneration screen
with our human eGenes and Drosophila proteomics, phosphoproteomics and
metabolomics, a previously published genome-scale screen for tau
mediated neurotoxicity tau^R406W flies, previously published human
Alzheimer’s disease proteomics, and previously published human
lipidomics using the Prize-collecting Steiner Forest algorithm (PCSF)
to build a protein-protein/protein-metabolite interaction network model
of Alzheimer’s disease^[154]15,[155]39,[156]40 (Figs. [157]1 and
[158]4a; Supplementary Data [159]8). Briefly, the method links as many
input nodes as possible while minimizing the number of low-confidence
edges in the final network. The result includes a mix of the input
nodes and new interactors of interest that were not part of the input
set, which we call “predicted nodes” (Fig. [160]4a, Methods). We
filtered out predicted nodes that did not show up in a sufficient
number of networks after 100 edge permutations and were found in too
many networks with randomized node weights (Supplementary Fig. [161]5,
Methods). We found that the majority of the nodes in our network were
observed in multiple hyperparameter selections, showing that these
results were robust to parameter choices (Supplementary Data [162]8).
This procedure filtered out 19102 nodes from the initial solution. We
compared our results to a previously published study in a different
disease indication, medulloblastoma, to emphasize the specificity of
our results to Alzheimer’s disease. We found that while both analyses
had subnetworks enriched for cell cycle regulation, none of the nodes
in these overlapping enrichments were shared. This result supported our
claim that our Alzheimer’s disease-associated network was distinct from
networks derived from the same method in different disease
indications^[163]88. The detailed results of this network are
visualized in an interactive website (Methods, Data availability).
Fig. 4. Network integration of Alzheimer’s disease multi-omics and novel
genetic screening data identifies subnetworks characterized by hallmarks of
neurodegeneration and processes previously not implicated in Alzheimer’s
disease.
[164]Fig. 4
[165]Open in a new tab
a Network integration of human and Drosophila multi-omics for
Alzheimer’s Disease highlights subnetworks enriched for proteins
belonging to known gene ontologies. Each subnetwork is represented by a
pie chart, which indicates the proportion of nodes represented by a
given data type. Edge width is determined by the number of interactions
between nodes within or with another subnetwork and colored by one of
the involved subnetworks. Each pie chart is labeled by the enriched
biological process by hypergeometric test (FDR-adjusted p-value less
than 0.1). b A subnetwork enriched for postsynaptic activity. Nodes
belonging to the annotated process are highlighted in yellow. Also in
this subnetwork are metabolites associated with postsynaptic activity
such as acetylcholine. c Phosphorylated tau, APOE, and APP-processing
proteins interact with each other and are in a subnetwork enriched for
NOTCH signaling-associated genes. Members of the NOTCH signaling
pathway are highlighted in yellow.
Louvain clustering of the network revealed subnetworks enriched for
biological processes associated with Alzheimer’s disease in previous
studies, such as insulin signaling, postsynaptic activity, and
double-strand break
repair^[166]76–[167]78,[168]83,[169]84,[170]89–[171]91 (Fig. [172]4a).
Subnetworks were also enriched for cell signaling pathways such as
NOTCH signaling and hedgehog signaling that have not been previously
characterized as hallmarks of neurodegeneration^[173]84 (Fig. [174]4a).
We found that the subnetworks associated with postsynaptic activity and
insulin signaling were enriched for genes that were differentially
expressed in excitatory neurons in a previously published study with
respect to Alzheimer’s disease status, Braak staging, and amyloid
plaque burden^[175]71 (Supplementary Data [176]8). The subnetworks
enriched for NOTCH signaling and condensation of prometaphase
chromosomes were enriched for differentially expressed genes in
excitatory neurons with respect to Alzheimer’s disease status and Braak
staging (Supplementary Data [177]8). We chose Louvain clustering
instead of Leiden clustering because the Louvain clusters had a higher
modularity score than the Leiden clusters (Louvain Q = 0.415, Leiden
Q = 0.412). We note that there is a significant overlap in the number
of biological processes enriched in the subnetworks derived from
Louvain clustering or Leiden clustering, suggesting that our findings
are robust to the clustering method of choice (hypergeometric test
p-value < 1*10^−16).
506 of the 1008 nodes in the network had no previously known
association with Alzheimer’s disease according to OpenTargets, which
supports the ability of the network-based approach to identify new
regulators of biological processes in Alzheimer’s disease. Out of the
690 input nodes in the network, only 60 were enriched in more than one
data type. The largest intersection was of 15 nodes shared between
Drosophila amyloid β proteomics and phosphoproteomics. These
overlapping nodes were enriched for ERBB4 signaling, NRIF-mediated cell
death and ion transport (Supplementary Data [178]8). These observations
highlight the contribution of network analysis in integrating findings
from multiple data sources to discover disease-associated biological
processes and to design experiments about specific nodes of interest.
We re-ran the Prize-Collecting Steiner Forest omitting one data type at
a time to determine the relative contribution of each data type to the
network solution. Most of the nodes that were missing from these new
network solutions were those contributed by the omitted data type
(Supplementary Data [179]8). Regardless of the data type removed, each
PCSF result had a subnetwork enriched for mitochondrial biogenesis and
at least one pathway involved in postsynaptic activity (Supplementary
Data [180]8). We note that the networks that lacked neurodegeneration
screen hits were not enriched for hedgehog signaling, insulin
signaling, autophagy, Gɑq signaling events, and condensation of
prometaphase chromosomes like our final network (Supplementary
Data [181]8). Overall, removing the neurodegeneration screen hits
removed the largest number of pathway enrichments compared to the final
network solution (Supplementary Data [182]8). In contrast, removing
Alzheimer’s disease GWAS hits removed pathway enrichments for Gαq
signaling events and NOTCH signaling (Supplementary Data [183]8). These
results underscore the importance of neurodegeneration screen hits in
our network analysis, and how each integrated datatype influenced the
enrichment of different biological processes in our final network
solution.
We inspected the nodes of our network communities to determine whether
the subnetworks represented expected or new relationships in the
context of Alzheimer’s disease. The subnetwork enriched for
postsynaptic activity showed expected protein-metabolite and
protein-protein interactions in choline metabolism^[184]92,[185]93
(Fig. [186]4b). We observed interactions involving the metabolite
acetylcholine with choline O-acetyl transferase (CHAT) and choline
transporter (SLC22A1) (Fig. [187]4b). Additionally, we saw interactions
between choline, CHAT and choline transporters SLC22A1 and SLC22A2
(Fig. [188]4b). This subnetwork illustrates the ability of our network
analysis to recover established biological processes in Alzheimer’s
disease.
A novel role of NOTCH signaling emerged in one subnetwork that linking
members of the pathway with phosphorylated tau, members of the gamma
secretase complex, the APOE protein (Fig. [189]4c). Each of these
proteins has been associated with hallmarks of Alzheimer’s
disease^[190]94–[191]98. This subnetwork suggests that Notch signaling
could influence the previously characterized relationship between
amyloid β and APOE^[192]99. These results suggest roles for NOTCH
signaling proteins in Alzheimer’s disease-mediated pathology.
We found strong overlaps for pathway enrichments between our network
and networks derived from comparable methods. We compared our result to
those of ROBUST and GNNExplainer (Supplementary Data [193]8). We found
a significant overlap in the number of enriched biological pathways in
subnetworks derived from the Prize-Collecting Steiner Forest and ROBUST
(hypergeometric p-value < 1*10^−16). We further found a significant
overlap between nodes in the Prize-Collecting Steiner Forest-derived
network and the ROBUST-derived network (hypergeometric
p-value < 1*10^−16). Some pathways were only enriched in one method:
viral response pathways were only enriched in the ROBUST-derived
network, inositol phosphate metabolism pathways were only enriched in
the GNNExplainer-derived network and respiratory electron transport
pathways were only enriched in the Prize-Collecting Steiner Forest
results. Overall, our network analysis findings were not highly
dependent on our network integration method of choice.
Network integration of Alzheimer’s disease Omics and genetic hits reveals
HNRNPA2B1 and MEPCE as targets that regulate tau-mediated neurotoxicity
We then experimentally tested implications of a subnetwork linking a
screen hit (HNRNPA2B1) and an eGene (MEPCE) with Drosophila modifiers
of tau toxicity^[194]15 (Fig. [195]5a). The HNRNPA2B1 protein is known
as an intronic splice suppressor, while MEPCE is known as a protein
that provides a methyl phosphate cap on 7SK non-coding RNA. HNRNPA2B1
was of particular interest because it was the only neurodegeneration
screen hit that interacted with phosphorylated tau in our network
(Supplementary Fig. [196]6). We selected MEPCE for follow-up
experimentation because we observed interactions between the MEPCE
protein and the HNRNPA2B1 protein, MEPCE did not previously have an
association with tau-mediated neurotoxicity, and MEPCE had the
highest-confidence Drosophila ortholog among the hits from our eGene
analysis. We knocked down the fly orthologs of HNRNPA2B1 or MEPCE in a
Drosophila model of tauopathy with two independent RNAi lines per gene
(Fig. [197]5b, c). To increase relevance to Alzheimer’s disease, in
which wild type human tau is deposited into neurofibrillary tangles, we
used transgenic flies expressing wild-type human tau (tau^WT) in the
fly retina^[198]85. We found that knockdown of fly orthologs of either
HNRNPA2B1 or MEPCE enhanced tau retinal toxicity, as quantified using a
previously described semi-quantitative rating scale based on
morphologic disruption and reduction in size of the adult fly eye
following expression of human tau^[199]100 (Fig. [200]5b, c, one-way
ANOVA with Tukey’s post-hoc correction p < 0.05). Enhancement of tau
toxicity in the fly retina is consistent with the human eQTL results,
which show that MEPCE expression is reduced in Alzheimer’s disease
patients with the eQTL rs7798226 (Supplementary Data [201]4) and
suggest a mechanism for effects of the GWAS variant in Alzheimer’s
disease.
Fig. 5. Network integration of Alzheimer’s disease multi-omics and novel
genetic screening data reveals biological processes associated with
tau-mediated neurotoxicity.
[202]Fig. 5
[203]Open in a new tab
a The proteins coded by the neurodegeneration modifier HNRNPA2B1 and
the eGene MEPCE interact with each other and have protein-protein
interactions with modifiers of tau neurotoxicity. The interaction
between HNRNPA2B1 and MEPCE is found in the subnetwork in Fig. [204]4
that is enriched for insulin signaling. b Knockdown of the Drosophila
orthologs of HNRNPA2B1 (Hrb98DE) and MEPCE (CG1293) shows enhancement
of the rough eye phenotype in flies expressing wild type human tau. c
Quantification of rough eye severity. The scale reflects the extent of
morphological disruption after human tau retinal expression (Methods).
Statistical significance was measured using a one-way ANOVA with
Tukey’s post-hoc correction and is indicated with an asterisk. Error
bars are the standard error of the mean. Two independent RNAi
constructs were used to knock down each gene. n = 8. Control is
GMR-GAL4/+. Flies are one day old. The boxplots are defined where the
line within the box depicts the median, the 25th-75th percentiles are
the boundaries of the box and the whiskers depict 1.5 times the
interquartile range. d Volcano plot depicting differential expression
analysis by DeSeq2 of bulk RNA-seq after HNRNPA2B1 CRISPRi knockdown in
NGN2 neural progenitor cells (Benjamini-Hochberg FDR-adjusted two-sided
Wald test p-value < 0.1, absolute log[2] fold change > 1). Each dot
represents a single gene. The horizontal dashed line indicates the
negative log[10] FDR-adjusted p-value significance cut-off of 0.1 and
the vertical dashed lines indicate the log[2] fold change cut-offs of 1
and −1. Red dots indicate genes that are significantly upregulated and
blue dots indicate genes that are significantly downregulated. e Dot
plot of the enriched pathways identified by gene set enrichment
analysis of the RNA-seq data. The 10 pathways with the highest negative
log[10] FDR-adjusted Gene Set Enrichment Analysis empirical p-value are
plotted. The size of the dot indicates the proportion of genes that are
part of the enriched pathway. The color of the dot represents the
normalized enrichment score (NES), where blue indicates downregulation
and red indicates upregulation. The x-position of the dot indicates the
negative log[10] FDR-adjusted Gene Set Enrichment Analysis empirical
p-value and the y-position is the corresponding, enriched pathway.
Source data for (c) is provided in the source data file.
To understand how HNRNPA2B1 contributes to age-associated
neurodegeneration in human systems, we performed RNA-seq after CRISPRi
knockdown of HNRNPA2B1 in human iPSC-derived, NGN2 neural progenitor
cells. Our knockdown achieved a partial reduction of HNRNPA2B1 gene
relative to control (Supplementary Fig. [205]7a, log[2](Fold
Change)=-0.60, Benjamini-Hochberg FDR-adjusted p-value < 0.1).
Differential expression after HNRNPA2B1 knockdown showed that the most
significantly downregulated genes involved those involved in neuronal
development or synaptic activity such as SCG2, FABP7, TENM1, and SIX3
(Fig. [206]5d and Supplementary Data [207]9, log[2](Fold Change)< −1,
Benjamini-Hochberg FDR-adjusted p-value < 0.1). No individual
subnetwork was enriched for differentially expressed genes after
HNRNPA2B1 knockdown. Gene Set Enrichment Analysis showed that the top
enriched pathways include downregulation of the electron transport
chain and of genes involved in postsynaptic events (Fig. [208]5e and
Supplementary Data [209]10, Benjamini-Hochberg FDR-adjusted
p-value < 0.1). Reduced postsynaptic activity and electron transport
chain activity have been previously associated with Alzheimer’s disease
and tau-mediated neurotoxicity^[210]83,[211]101–[212]106. These changes
suggest potential mechanisms by which HNRNPA2B1 contributes to
tau-mediated neurotoxicity and neurodegeneration in human aging.
Network analysis implicates CSNK2A1 and NOTCH1 as regulators of the
Alzheimer’s disease-associated biological process of DNA damage repair in
neurons
In addition to the network connections between NOTCH signaling proteins
and hallmark proteins of Alzheimer’s disease (Fig. [213]4c), we also
noted that NOTCH signaling proteins were linked with the Alzheimer’s
disease-associated process of DNA damage, a process also associated
with Alzheimer’s disease^[214]72,[215]89–[216]91,[217]107,[218]108
(Fig. [219]6a). We calculated the network betweenness of all
neurodegeneration screen hits with regulators of DNA damage repair
(Supplementary Data [220]11). We found that NOTCH1 and the
neurodegeneration screen hit CSNK2A1 were among the top 5
neurodegeneration genes in terms of network betweenness among nodes
associated with DNA damage repair (Supplementary Data [221]11). CSNK2A1
and NOTCH1 shared some interactors that regulated DNA damage repair
(Fig. [222]6a). NOTCH1 is a key receptor in NOTCH signaling that
regulates cell-fate decisions. CSNK2A1 is a kinase for many proteins
including casein; the activity of CSNK2A1 regulates processes such as
apoptosis and cellular proliferation. All the interacting DNA damage
repair-associated nodes that interact with CSNK2A1 and NOTCH1 except
for H2AFX and COPS2 regulate double-strand break repair, suggesting
that CSNK2A1 and NOTCH1 knockdown may disrupt the DNA damage response.
Fig. 6. Network analysis implicates neurodegeneration genes as regulators of
the AD-associated biological process of DNA damage repair.
[223]Fig. 6
[224]Open in a new tab
a NOTCH1 and CSNK2A1 interactors (highlighted in yellow) are involved
in DNA damage repair. b Knockdown of Drosophila orthologs for NOTCH1(N)
and CSNK2A1 (CKIIa and CKIIb lead to increased DNA damage in adult
neurons as measured by increased pH2Av-positive foci (red, arrowheads);
elav immunostaining (green) marks neurons; nuclei are identified with
DAPI (blue). Scale bar = 5 µm. c Percent of pH2Av foci in control and
knockdowns of CKIIa, CKIIb and N. Asterisks indicate p < 0.01 with a
one-way binomial test after Benjamini-Hochberg FDR correction with 95%
binomial confidence intervals (n = 6, N RNAi 1 p = 2.04*10^−6, N RNAi 2
p = 5.87*10^−12, CKIIa p = 1.34*10^−31, CKIIb p = 1.42*10^−29). d COMET
assay shows increased DNA damage in human iPSC neural progenitor cells
after inhibition of Casein Kinase 2 (CK2) by CX-4945 and inhibition of
NOTCH cleavage. e Quantification of mean tail moments from panel A in
arbitrary units. Asterisks indicate p < 0.01 by one-way ANOVA with
Tukey’s Post-Hoc correction (p = 6.01*10^−3, Compound E vs DMSO,
p = 3.95*10^−3, CX-4945 vs DMSO). Error bars = +/- SEM (DMSO n = 693,
CX-4945 n = 793, Compound E n = 860). f Dot plots showing the
normalized enrichment scores (NES) of selected, significantly enriched
pathways after CSNK2A1 and NOTCH1 knockdown in NGN2 neural progenitor
cells. Red and blue dots indicate positive (upregulation) and negative
(downregulation) NES, respectively. g Representative immunofluorescence
images of mature neurons in Drosophila brains show inappropriate cell
cycle re-entry in postmitotic neurons as indicated by PCNA expression
(red, arrow) following CKIIa knockdown. elav (green) identifies
neurons. h) Quantification of PCNA expression in control brains and
brains of Drosophila with knockdown of orthologs of CSNK2A1 (CKIIa and
CKIIb). Asterisks indicate p < 0.01 by one-way ANOVA with Tukey’s
Post-Hoc correction (p = 4.94*10^−5, CKIIa vs control, p = 3.92*10^−4,
CkIIb vs control). n = 6. Control is elav-GAL4/ + ; UAS-Dcr-2/+. Flies
are 10 days old. For the boxplots: the line marks the median, the
25th-75th percentiles are the boundaries of the box and the whiskers
depict 1.5 times the interquartile range. The source data for (c, e, h)
are provided in the source data file.
Next, we used RNAi to knock down Drosophila orthologs of NOTCH1 and
CSNK2A1 in a pan-neuronal pattern in the aging adult fly brain to
assess the relationship between these neurodegeneration screen hits and
DNA damage (Fig. [225]6b, c). To exclude off-target effects we used two
independent RNAi transgenes targeting the NOTCH1 ortholog N
(Fig. [226]6b, c). For CSNK2A1 we used one RNAi to target the CkIIa
subunit of the casein kinase holoenzyme and another RNAi to target the
CkIIb subunit of the casein kinase holoenzyme (Fig. [227]6b, c) because
other available transgenic RNAi lines were lethal with pan-neuronal
expression. We observed that knockdown of the Drosophila orthologs for
NOTCH1 and CSNK2A1 led to an increase in DNA damage, as measured by the
number of phospho-H2Av (Drosophila ortholog of mammalian phosphor-H2AX)
foci (Fig. [228]6b, arrowheads, c, One-Way Binomial Test p < 0.01).
We performed a COMET assay in wild-type human neuronal progenitor cells
treated with inhibitors for the Notch signaling pathway or the casein
kinase holoenzyme (CK2) to test if reduced CSNK2A1 or NOTCH1 function
leads to increased DNA damage in human cells (Fig. [229]6d, arrows, e).
We observed that treatment with the Notch inhibitor Compound E and the
CK2 inhibitor CX-4945 led to an increase in the tail moment of the
neural progenitor cells compared to DMSO treatment, showing an increase
in DNA damage after inhibitor treatment (Fig. [230]6d, arrows, e, ANOVA
with Tukey’s post-hoc correction p-value < 0.01, Methods). These
results show how the screen hits NOTCH1 and CSNK2A1 regulate DNA damage
in human and Drosophila neurons, as inferred by our computational
network analysis.
Transcriptomic analysis suggests how CSNK2A1 and NOTCH1 knockdown could lead
to age-associated neurodegeneration through distinct DNA-damaging pathways
We performed RNA-seq after CRISPRi knockdown of CSNK2A1 or NOTCH1 in
NGN2-expressing neural progenitor cells to broadly understand how human
cells respond to reduced CSNK2A1 and NOTCH1 expression (Fig. [231]6).
Expression of both target genes dropped significantly in the respective
knockdowns (CSNK2A1: log[2](fold change)<−1, FDR-adjusted
p-value < 0.1, Supplementary Fig. [232]7b; NOTCH1: log[2](fold
change)= −0.92, FDR-adjusted p-value < 0.1, Supplementary
Fig. [233]7c), with good clustering of replicates in PCA analysis
(Supplementary Fig. [234]7d). We found 145 significantly upregulated
and 282 significantly downregulated genes upon knocking down CSNK2A1,
while we found 15 significantly upregulated and 5 significantly
downregulated genes after knocking down NOTCH1 (Supplementary
Fig. [235]8a, b and Supplementary Data [236]9, absolute value of
log[2](fold change)>1, FDR-adjusted p-value < 0.1). The disparity in
the number of differentially expressed genes could be explained by how
the knockdown efficiency of NOTCH1 was less than that of CSNK2A1
(Supplementary Fig. [237]7a,b). We then used Gene Set Enrichment
Analysis (GSEA), which can identify coordinated expression changes,
even when some fall below univariate statistical thresholds. GSEA found
significant enrichment for DNA damage repair pathways. However, we were
surprised to find that these pathways were upregulated after CSNK2A1
knockdown but downregulated after NOTCH1 knockdown (Fig. [238]6f and
Supplementary Data [239]10). Interestingly, we also found an
upregulation for cell cycle regulating-genes after CSNK2A1 knockdown
(Fig. [240]6f).
Examining the expression changes in the context of the network, we
found that that the effects of CSNK2A1 knockdown decreased
significantly the greater the network distance from CSNK2A1
(Kruskal-Wallis test p = 4.77*10^−4, Supplementary Fig. [241]8c). There
was no significant variation in log[2] change between NOTCH1 knockdown
and control with respect to the degree of separation from NOTCH1, which
could also be a consequence of lower NOTCH1 knockdown efficiency than
CSNK2A1 knockdown efficiency (Kruskal-Wallis test p = 0.146,
Supplementary Fig. [242]8d).
To determine if CSNK2A1 knockdown could alter cell cycle in aging
postmitotic neurons in vivo, we knocked down the Drosophila the CkIIa
and CKIIb subunits of the casein kinase holoenzyme in adult Drosophila
brain neurons and assessed changes in proliferating cell nuclear
antigen (PCNA), a robust marker of cell cycle activation in Drosophila
and mammalian systems^[243]20,[244]109,[245]110 (Fig. [246]6g, arrow,
h). We found a significant increase in PCNA following CKIIa knockdown
of either CKIIa or CKIIb (Fig. [247]6h, ANOVA with Tukey’s post-hoc
correction p = 4.94*10^−5), supporting our hypothesis that knockdown of
CSNK2A1 promotes neuronal activation of cell cycle regulators. As
expected, there was no PCNA activation in control postmitotic neurons
(Fig. [248]6g, h). Activation of cell cycle proteins in mature neurons
is associated with Alzheimer’s disease, cell death, and double-strand
break-bearing neurons^[249]20,[250]111–[251]114. Our results
collectively suggest that CSNK2A1 knockdown may lead to
neurodegeneration through accumulation of DNA damage and subsequent
inappropriate activation of the cell cycle in postmitotic neurons
leading to cell death.
Discussion
We have integrated results from the largest reported forward genetic
screen for age-related neurodegeneration mutants with multi-omic data
from Alzheimer’s disease patients and Alzheimer’s disease relevant
models to computationally and experimentally define pathways
controlling neurodegeneration. One highlight of our work is the
demonstration that CSNK2A1 and NOTCH1 regulate age-associated
neurodegeneration through DNA damage response pathways (Figs. [252]5
and [253]6). We provide evidence that CSNK2A1 and NOTCH1 regulation of
the Alzheimer’s disease-associated process of DNA damage is a key
process for neurodegeneration among the many processes these genes
regulate. Previous studies showed that that HDAC inhibitors reduced DNA
damage burden and neuronal cell
death^[254]72,[255]89,[256]107,[257]108,[258]115–[259]117. Other
studies have proposed neuroprotective compounds that inhibit cell cycle
re-entry in postmitotic neurons like we observed upon CSNK2A1
knockdown^[260]118. Our study suggests that the CSNK2A1 and NOTCH1
pathways should also be explored for potential approaches to prevent
DNA damage-associated neurodegeneration.
Future work could explore cause-and-effect relationships between DNA
damage and activation of cell cycle genes in the context of CSNK2A1
knockdown. Currently, the relationship between cell cycle regulators
and DNA damage in neurodegeneration is unclear^[261]119. One hypothesis
supported by our results is that CSNK2A1 knockdown leads to
neurodegeneration by activating genes that promote DNA replication and
entry into the G[1] phase of the cell cycle, amplifying existing DNA
damage in the neuron (Fig. [262]6d, e). Alternatively, excess
accumulation of DNA damage upon CSNK2A1 knockdown could lead to
inappropriate activation of cell cycle regulators and DNA repair
proteins to fix DNA damage (Fig. [263]6d, e). Follow-up work can
determine the causes or consequences of DNA damage as regulated by
CSNK2A1. Such studies can help inform neuroprotective approaches for
limiting age-associated DNA damage.
In another advance from our study, we suggest how changes in MEPCE
expression contribute to neuronal death in Alzheimer’s disease
(Fig. [264]5). Our eQTL analysis showed that patients that inherited
the rs7798226 eQTL had reduced MEPCE expression and our experimental
data shows that reduced expression of MEPCE enhances tau toxicity in
the Drosophila nervous system (Fig. [265]5b,c and Supplementary
Data [266]3). Future studies could investigate whether the
downregulation of MEPCE in patients with the rs7798226 eQTL is strong
enough to induce tau-mediated neurotoxicity in humans. This example
illustrates how multi-omic network integration identified pathways
potentially downstream of a disease-causing variant. Our network
analysis work identified an eQTL that may play a role in Alzheimer’s
disease-mediated neurodegeneration, which is an inference that could
not be made from fine-mapping analysis alone.
Interestingly, some of our network findings differ from expectations in
the literature. We found from our network analysis and subsequent
experimentation in human tau transgenic flies that knockdown of
HNRNPA2B1 led to increased age-associated neurodegeneration and
increased tau-mediated neurotoxicity (Fig. [267]5). However, HNRNPA2B1
was upregulated in Alzheimer’s disease excitatory neurons in the
largest published single nucleus RNA-seq study in human Alzheimer’s
disease^[268]72. Another study showed that HNRNPA2B1 knockdown in
iPSC-derived neurons and mouse hippocampal neurons was protective
against oligomeric tau-mediated neurotoxicity^[269]120. Given these
prior observations, our results suggest that the HNRNPA2B1 is under
tight control; significant changes in HNRNPA2B1 homeostasis may have
consequences on tauopathy.
The network algorithms used in this study have previously been used to
identify biological processes in various disease consequences,
including Alexander disease, medulloblastoma, Parkinson’s disease in
Drosophila, amyotrophic lateral sclerosis, and an Appl model of
Alzheimer’s disease in Drosophila^[270]28,[271]88,[272]121–[273]123.
Related algorithms such as ROBUST, COSMOS, DOMINO and GNN-based tools
have demonstrated the broad applicability of such
approaches^[274]124–[275]127. In some cases, the goal of network
integration is to identify novel targets that were not found in the
input data, but are implicated by the network. Indeed, in prior work,
we knocked down a large number of nodes nominated by the network to
test for neurotoxicity in a Drosophila model relevant to Amyotrophic
Lateral Sclerosis and found that many of these targets had a
significant effect on neurotoxicity^[276]123. The goal of the current
study was different. Our genetic screen was intentionally broad, with
genes knocked down in a pan-neuronal pattern to maximize recovery of
neurodegeneration hits. The comprehensiveness of the screen reduced the
need for computational methods to expand the number of genes of
interest. Rather, the role of computational was to prioritize the genes
and to generate hypotheses explaining the mechanisms by which these
genes influenced neuronal health.
Many of the nodes in our network had only modest effect sizes in
comparisons between disease in control. Additionally, the genes we
studied in our follow-up experiments had only modest prior connections
with Alzheimer’s disease; in the OpenTargets list of Alzheimer’s
associated genes, the highest rank was 1256^th out of 6595. The
network-based multi-omic integration allowed us to focus on
disease-associated targets that would not be found by simpler
approaches.
The framework presented in this paper could be used to combine the
screen hits with appropriate disease-specific data to search for
disease-universal or disease-specific regulators across
neurodegenerative diseases. We observed that a significant proportion
of age-associated genes in multiple human brain tissues are enriched
for neurodegeneration screen hits. Given the diversity of brain regions
affected in aging-related disorders, some of the screen hits are likely
associated with diseases other than Alzheimer’s disease, and some may
influence more than one disease. We also note that while our genetic
screen data was neuron-specific, future work could use network analysis
approaches presented in this or other studies to screens in other
non-neuronal cell types^[277]10,[278]128. Pathways that influence
multiple diseases would be particularly important for therapeutic
strategies to prevent aging of the brain.
Methods
Declaration of consent for human subjects
We were granted access to the Genotype-Tissue Expression Project
datasets through dbGaP, following the policies of informed consent from
the original publication^[279]129. Human patient single-nucleus
RNA-sequencing data was acquired through data use agreements with the
AD Knowledge Portal confirming informed consent^[280]71. We confirmed
that all samples used from the Genotype-Tissue Expression Project and
AD knowledge portal were collected with ethics approval from the
relevant institutional review boards as part of the data use agreements
for the respective sources of data. We complied with the data use
agreements that ensured the data were collected with informed consent
from the patients in the studies for the human patient brains that we
analyzed with laser capture microdissection. Human patient brains for
laser capture microdissection were recruited with the oversight of the
Institutional Review Board of Brigham and Women’s hospital.
Drosophila stocks and genetics
All fly crosses and aging were performed at 25 °C. Equal numbers of
adult male and female flies were analyzed. For the genome-scale screen,
gene knockdown was mediated by the elav-GAL4 pan-neuronal driver and
brain histology was examined at 30 days post-eclosion. Transgenic RNAi
lines for genome-scale gene were obtained from the Bloomington
Drosophila Stock Center and from the Vienna Drosophila Resource Center
and are listed in Supplementary Data [281]1. We used all available
transgenic RNAi lines from the Bloomington Drosophila stock center when
the screen was performed. The UAS-tau wild type, UAS-tau^R406W and
UAS-Aβ^1–42 transgenic flies been described previously^[282]85,[283]86.
Expression of human tau or amyloid β was directed to neurons using the
pan-neuronal driver elav-GAL4 or to the retina using the GMR-GAL4
driver. Flies were aged to 10 days post-eclosion for brain proteomics,
metabolomics, and histology. Dcr-2 was expressed in some animals to
enhance RNAi-mediated gene knockdown. The following stocks were also
obtained from the Bloomington Drosophila Stock Center: elav-GAL4,
GMR-GAL4, UAS-CG1239 (MEPCE) RNAi HMC02896, UAS-CG1239 (MEPCE) RNAi
HMC04088, UAS-Hrb98DE (HNRNPA2B1) RNAi HMC00342, UAS-Hrb98DE
(HNRNPA2B1) RNAi JF01249, UAS-CkIIɑ RNAi JF01436, UAS-CkIIβ RNAi
JF01195, UAS-N RNAi 1 ([284]GLV21004), UAS-N RNAi 2 (GL0092),
UAS-Dcr-2.
Histology, immunostaining, and imaging
Flies were fixed in formalin and embedded in paraffin. 4 µm serial
frontal sections were prepared through the entire brain and placed on a
single glass slide. Hematoxylin and eosin staining was performed on
paraffin sections. For immunostaining of paraffin sections, slides were
processed through xylene, ethanol, and into water. Antigen retrieval by
boiling in sodium citrate, pH 6.0, was performed prior to blocking.
Blocking was performed in PBS containing 0.3% Triton X-100 and 2% milk
for 1 h and followed by incubation with appropriate primary antibodies
overnight. Primary antibodies to the following proteins were used at
the indicated concentrations: pH2Av (Rockland, 600-401-914, 1:100),
elav (Developmental Studies Hybridoma Bank, 9F8A9 at 1:20 and 7E8A10 at
1:5) and PCNA (DAKO, MO879, 1:500). For immunofluorescence studies,
Alexa 555- and Alexa 488-conjugated secondary antibodies (Invitrogen)
were used at 1:200. For quantification of pH2Av, a region of interest
comprised of approximately 100 Kenyon neurons was identified in
well-oriented sections of the mushroom body and the number of neurons
containing one or more than one immuno-positive foci was determined.
Images were taken on Zeiss LSM800 confocal microscope (Carl Zeiss, AG),
and quantification was performed using Image-J software. All
acquisition parameters were kept the same for all experimental groups.
Quantification for PCNA was assessed by counting the number of sections
containing PCNA immunopositivity in the entire brain. At least 6 brains
were analyzed per genotype and time point for pH2Av and PCNA
quantification. Histologic assessments were performed blinded to
genotype.
Quantitative mass spectrometry sample preparation for proteomics
Three control (genotype: elav-GAL4/+), three tau (genotype:
elav-GAL4/ + ; UAS-tau^R406W/+), and three Aβ[1-42] (genotype:
elav-GAL4/ + ; UAS-Aβ^1–42) samples of approximately 350 fly heads each
were used for proteomic analysis. Samples were prepared as previously
described (Paulo and Gygi 2018) with the following modifications. All
solutions are reported as final concentrations. Drosophila heads were
lysed by sonication and passaged through a 21-gauge needle in 8 M urea,
200 mM EPPS, pH 8.0, with protease and phosphatase inhibitors (Roche).
Protein concentration was determined with a micro-BCA assay (Pierce).
Proteins were reduced with 5 mM TCEP at room temperature for 15 minutes
and alkylated with 15 mM Iodoacetamide at room temperature for one hour
in the dark. The alkylation reaction was quenched with dithiothreitol.
Proteins were precipitated using the methanol/chloroform method. In
brief, four volumes of methanol, one volume of chloroform, and three
volumes of water were added to the lysate, which was then vortexed and
centrifuged to separate the chloroform phase from the aqueous phase.
The precipitated protein was washed with one volume of ice-cold
methanol. The protein pellet was allowed to air dry. Precipitated
protein was resuspended in 200 mM EPPS, pH 8. Proteins were digested
with LysC (1:50; enzyme:protein) overnight at 25 ^oC followed by
trypsin (1:100; enzyme:protein) for 6 hs at 37 ^oC. Peptide
quantification was performed using the micro-BCA assay (Pierce). Equal
amounts of peptide from each sample was labeled with tandem mass tag
(TMT10) reagents (1:4; peptide:TMT label) (Pierce). The 10-plex
labeling reactions were performed for 2 h at 25 ^oC. Modification of
tyrosine residues with TMT was reversed by the addition of 5% hydroxyl
amine for 15 minutes at 25 ^oC. The reaction was quenched with 0.5%
trifluoroacetic acid and samples were combined at a
1:1:1:1:1:1:1:1:1:1:1 ratio. Combined samples were desalted and offline
fractionated into 24 fractions as previously described.
Liquid chromatography-MS3 spectrometry (LC-MS/MS)
12 of the 24 peptide fractions from the basic reverse phase step (every
other fraction) were analyzed with an LC-MS3 data collection strategy
on an Orbitrap Lumos mass spectrometer (Thermo Fisher Scientific)
equipped with a Proxeon Easy nLC 1000 for online sample handling and
peptide separations^[285]130. Approximately 5 µg of peptide resuspended
in 5% formic acid + 5% acetonitrile was loaded onto a 100 µm inner
diameter fused-silica micro capillary with a needle tip pulled to an
internal diameter less than 5 µm. The column was packed in-house to a
length of 35 cm with a C[18] reverse phase resin (GP118 resin 1.8 μm,
120 Å, Sepax Technologies). The peptides were separated using a 180 min
linear gradient from 3% to 25% buffer B (100% acetonitrile + 0.125%
formic acid) equilibrated with buffer A (3% acetonitrile + 0.125%
formic acid) at a flow rate of 600 nL/min across the column. The scan
sequence began with an MS1 spectrum (Orbitrap analysis, resolution
120,000, 350 − 1350 m/z scan range, AGC target 1 × 10^6, maximum
injection time 100 ms, dynamic exclusion of 75 seconds). The “Top10”
precursors were selected for MS2 analysis, which consisted of CID
(quadrupole isolation set at 0.5 Da and ion trap analysis, AGC
1.5 × 10^4, NCE 35, maximum injection time 150 ms). The top ten
precursors from each MS2 scan were selected for MS3 analysis
(synchronous precursor selection), in which precursors were fragmented
by HCD prior to Orbitrap analysis (NCE 55, max AGC 1.5 × 10^5, maximum
injection time 150 ms, isolation window 2 Da, resolution 50,000).
LC-MS3 data analysis
A suite of in-house software tools was used for.RAW file processing and
controlling peptide and protein level false discovery rates, assembling
proteins from peptides, and protein quantification from peptides as
previously described^[286]130. MS/MS spectra were searched against a
Uniprot Drosophila reference database appended with common protein
contaminants and reverse sequences. Database search criteria were as
follows: tryptic with two missed cleavages, a precursor mass tolerance
of 50 ppm, fragment ion mass tolerance of 1.0 Da, static alkylation of
cysteine (57.02146 Da), static TMT labeling of lysine residues and
N-termini of peptides (229.162932 Da), and variable oxidation of
methionine (15.99491 Da). TMT reporter ion intensities were measured
using a 0.003 Da window around the theoretical m/z for each reporter
ion in the MS3 scan. Peptide spectral matches with poor quality MS3
spectra were excluded from quantitation ( < 200 summed signal-to-noise
across 10 channels and < 0.7 precursor isolation specificity).
Metabolomics
Three control (genotype: elav-GAL4/+), three tau (genotype:
elav-GAL4/ + ; UAS-tau^R406W/+), and three Aβ[1-42] (genotype:
elav-GAL4/ + ; UAS-Abeta^1–42) samples of 40 fly heads each were
collected and untargeted positively and negative charged polar and
non-polar metabolites were assessed using liquid chromatography-mass
spectrometry as described in detail previously^[287]131.
Identifying age-associated genes in RNA-seq data from the Genotype-Tissue
Expression (GTEx) project
To identify genes with significant associations between gene expression
in the brain and chronological age, we sought out RNA-seq data sets
with many individuals and a large dynamic range of ages. We analyzed
2642 samples from 382 individuals representing 13 different brain
tissues, using the measurements of transcripts per million (TPM)
available from the GTEx analysis version 8
([288]https://gtexportal.org/home/datasets). The age range of the
patients are from 20-70 years old with a median age of 58 years old. To
measure the effects of age on gene expression in the brain, we used a
mixed-effects model as implemented in lme4 version 1.1.27.1, treating
sex, ethnicity, patient identity and tissue of origin as covariates
with the following equation:
[MATH: Y~Age+Sex+PMI+Tissue+Ethnicity+SampleID :MATH]
1
Where “Sample ID” is treated as a random effect while all other
covariates are treated as fixed effects. We identify genes as
significantly associated with age if the FDR-adjusted p-value for the
age coefficient is less than 0.1 and if the absolute unstandardized
coefficient for age is greater than 0.1, which corresponds to a change
of 1 TPM per decade in this data set, assuming age is the only factor.
We used this equation to assess whether there was a significant effect
on gene expression with age given the mean expression of the screen
hits. To assess robustness of this test, we performed 10,000
permutations of either gene sets of the same size as the set of the
screen hits or over patient age. We computed an empirical p-value which
was the number of permutations with p-values smaller than the original
test divided by the number of permutations. When performing this
analysis for individual tissues, we used a generalized additive model
with the same formula but excluding the “Sample ID” and “Tissue”
variables.
To perform Gene Set Enrichment Analysis, we used the R package “fgsea”
version 1.14.0 using the Reactome 2022 library from Enrichr as the
reference set of pathways. We added the neurodegeneration screen hits
as a pathway term, for pathway enrichment analysis. We defined the gene
set of this new pathway using the human orthologs of the
neurodegeneration screen hits, which were mapped using the DRSC
integrative ortholog prediction tool (DIOPT)^[289]132. We used the
standardized regression coefficient to rank the genes^[290]73,[291]133.
Analysis of single-nuclear RNA-seq data
To identify cellular subtypes that were associated with disease, we
analyzed previously published single nuclear RNA-seq data^[292]71,
which included 70,000 cells from 24 Alzheimer’s disease patients and 24
age and sex-matched healthy controls. The data were preprocessed as in
previous work^[293]71. In short, for each of the previously defined
“broad cell types” (excitatory neurons, inhibitory neurons, astrocytes,
oligodendrocytes, microglia and oligodendrocyte progenitor cells), we
applied Seurat version 4.0.4’s implementations for log-normalizing the
data, detecting highly variable features, and standard scaling the
data. We used Seurat’s implementation of PCA reducing the data to 20
principal components. After applying PCA, we used Harmony version 0.1
to correct for the effects of sex, individual, sequencing batch and
post-mortem interval in our data. This correction was performed to
minimize the effects of confounders in our clustering analysis. We
further applied Scrublet to predict and remove doublet cells from the
population as implemented in Scanpy version 1.8.2. We used the Harmony
components for UMAP dimensionality reduction and Leiden clustering. To
determine the Leiden clustering resolution, we calculated the
silhouette coefficient after applying Leiden clustering on a range of
values (resolution = {0.1,0.2,0.3,0.4, 0.6, 0.8, 1.0, 1.4, 1.6,
2.0,2.1,2.2,2.3,2.4,2.5}). We selected the clustering resolution that
maximized the silhouette coefficient. To identify disease-associated
clusters, we applied a hypergeometric test to determine if a cluster
was over-represented by cells derived from Alzheimer’s disease patients
or healthy controls. We subsequently applied MAST as implemented in
Seurat to determine the differentially expressed genes between
Alzheimer’s disease-enriched clusters and the remaining sub-clusters
within a given cell type. We defined differentially expressed genes as
having an FDR-adjusted p-value less than 0.1 and an absolute log[2]
fold change greater than 1.
Analysis of drosophila multi-omics
We performed two-way t-tests to assess the significance of Drosophila
proteins, phosphoproteins and metabolites between Drosophila models of
amyloid β and control as well as significant proteins, phosphoproteins
and metabolites between Drosophila models of tau and control. We used
gProfiler with the g:SCS multiple hypothesis correction to identify
significant gene ontology terms using Drosophila pathways as a
reference^[294]134. We used PiuMet to map unannotated m/z peaks in the
metabolomic data to known compounds^[295]135.
Laser-capture RNA-seq
We used the laser-capture RNA-seq method to profile total RNA of brain
neurons similar to what we reported in previous
studies^[296]37,[297]38. In brief, laser-capture microdissection was
performed on human autopsy brain samples to extract neurons^[298]38.
For each temporal cortex (middle gyrus) about 300 pyramidal neurons
were outlined in layers V/VI by their characteristic size, shape, and
location in HistoGene-stained frozen sections and laser-captured using
the Arcturus Veritas Microdissection System (Applied Biosystems) as in
previous studies^[299]38. Linear amplification, construction,
quantification, and quality control of sequencing libraries,
fragmentation, and sequencing methods were described in earlier
studies^[300]38. RNA seq data processing and quality control was
performed similar to what we reported in previous
studies^[301]37,[302]38. In summary,
The RNA-sequencing data was aligned to the human genome reference
sequence hg19 using TopHat v2.0 and Cufflinks v1.3.0. To measure
RNA-sequencing quality control, we used FASTQC and RNA-SeQC. We blinded
ourselves to the disease status of the patient when preparing the
samples.
Data sets used for expression Quantitative Trait Locus (eQTL) analysis
eQTL analysis was performed using seven previously published bulk
cortex data sets and one new laser-captured pyramidal neuron data set.
ROSMAP, MayoRNAseq, MSBB, and HBTRC data were obtained from the AD
Knowledge Portal ([303]https://adknowledgeportal.org) on the Synapse
platform (Synapse ID: syn9702085). CommonMind was obtained from the
CommonMind Consortium Knowledge Portal (10.7303/syn2759792) also on the
Synapse platform (Synapse ID: syn2759792); GTEx was obtained from
[304]https://gtexportal.org/home/. UKBEC, was obtained from
[305]http://www.braineac.org/; BRAINCODE, was obtained from
[306]http://www.humanbraincode.org/. The data sets are described in
detail at each of the source portals and in the corresponding original
publications^[307]37,[308]38,[309]129,[310]136–[311]143.
We used a conservative four-stage design: 1, Cortex discovery stage:
eQTL analysis in four human cortex cohorts (stage D in Supplementary
Data [312]3). 2, Cortex replication stage: replication of findings from
the discovery stage in three independent human cortex cohorts (stage R
in Supplementary Data [313]3). 3, To further enhance statistical power,
we performed meta-analysis across all seven cohorts. This meta-analysis
highlighted an additional 17 suggestive eGenes with P-values < 5 *
10^−8 (Supplementary Data [314]3) which were not recovered in the
two-stage design. 4, We confirmed 12 suggestive eGenes in the
laser-captured pyramidal neuron data set with P-values < 0.05.
Gene expression data processing
For RNAseq data sets, the gene reads counts were normalized to TPM
(Transcripts Per Kilobase Million) by scaling gene length first and
followed by scaling sequencing depth. The gene length was considered as
the union of exon lengths. Consistent and stringent quality control and
normalization steps were applied for each of the cohorts: 1) For sample
quality control, we removed samples with poor alignment. We kept
samples with > 10 M mapped reads and > 70% mappability by considering
reads with mapping quality of Q20 or higher (the estimated read
alignment error rate was 0.01 or less). 2) Filtering sample mix-ups by
comparing the reported sex with the transcriptional sex determined by
the expression of female-specific XIST gene and male-specific RPS4Y1
gene. 3) Filtering sample outliers. Sample outliers with problematic
gene expression profiles were detected based on Relative Log Expression
(RLE) analysis, spearman correlation based hierarchical clustering,
D-statistics analysis^[315]144. 4) For normalization, gene expression
values were quantile normalized after log10 transformed by adding a
pseudo count of 1e-4. 5) SVA package was applied for removing batch
effects by using combat function and adjusting age, sex, RIN, PMI. We
accounted for latent covariates by performing fsva function. Residuals
were outputted for downstream analysis. For array-based gene expression
datasets, we directly used the downloaded, quality-controlled gene
expression profiles.
Genotype data processing for eQTL analyses
We applied PLINK2 (v1.9beta) and in house scripts to perform rigorous
subject and SNP quality control (QC) for each dataset in the following
order: 1) Set heterozygous haploid genotypes as missing; 2) remove
subjects with call rate < 95%; 3) remove subjects with gender
misidentification; 4) remove SNPs with genotype call rate < 95%; 5)
remove SNPs with Hardy-Weinberg Equilibrium testing P-value < 1 *
10^−6; 6) remove SNPs with informative missingness test (Test-mishap)
P-value < 1 * 10^−9; 7) remove SNPs with minor allele frequency (MAF) <
0.05; 8) remove subjects with outlying heterozygosity rate based on
heterozygosity F score (beyond 4*sd from the mean F score); 9) IBS/IBD
filtering: pairwise identity-by-state probabilities were computed for
removing both individuals in each pair with IBD > 0.98 and one subject
of each pair with IBD > 0.1875; (10) population substructure was tested
by performing principal component analysis (PCA) using smartPCA in
EIGENSOFT^[316]145. PCA outliers were excluded and the top 3 principal
components were used as covariates for adjusting population
substructures.
Imputation of genotypes for eQTL analyses
The array-based genotype datasets were enhanced by genotype imputation.
Genotype imputation for each dataset was performed separately on
Michigan Imputation Server, using 1000 G phase 3 reference panel. Eagle
v2.3 and Minimac3 were selected for phasing and imputing respectively,
and EUR population was selected for QC. Only variants with R^2
estimates less than or equal to 0.3 were excluded from further
analysis. And only variants with MAF > 5% were also included in
downstream eQTL analysis. Prior to imputation, pre-imputation checks
provided by Will Rayner performed external quality controls to fit the
requirements of the imputation server.
We used European population reference (EUR) haplotype data from the
1000 Genomes Project (2010 interim release based on sequence data
freeze from 4 August 2010 and phased haplotypes from December 2010) to
impute genotypes for up to 6,709,258 SNPs per data set. We excluded
SNPs that did not pass quality control in each study
eQTL analysis
The eQTL mapping was conducted using R Package Matrix EQTL using the
additive linear model on a high-performance Linux-based Orchestra
cluster at Harvard Medical School. For cis-eQTL analysis, SNPs were
included if their positions were within 1 Mb with the TSS of a gene.
And trans-eQTL analysis included SNP-gene association if their distance
was beyond this window. FDRs reported by MatrixEQTL were used to
estimate the association between SNPs and gene expression.
Meta eQTL analysis
We performed meta eQTL analysis using three separate effects model
implemented in METASOFT^[317]146, which took effect size and standard
error of SNP-gene pair in each dataset as input. Fixed effects model
(FE model) was based on inverse-variance-weighted effect sizes. Random
Effects model (RE model) was a very conservative model based on
inverse-variance-weighted effect size. Han and Eskin’s random effects
model (RE2 model) was optimized to detect associations under
heterogeneity. We reported statistics of the RE2 model in this study.
Identifying eGene-associated variants that associate with transcription
factor binding
We were interested in determining whether eGene-associated variants
overlapped with transcription factor binding sites. We used the optimal
hg19 ChIP-seq-derived transcription factor peak sets from ENCODE 3,
which we downloaded from the UCSC genome browser. To determine if the
eQTL of interest overlapped with a DNA-binding motif, we extracted the
sequence 50 base pairs upstream and 50 base pairs downstream of the
variant and used FIMO to detect the presence of an overlapping
DNA-binding motif^[318]147. We used the HOCOMOCO version 11 core motif
set as reference motifs. Correlations between HLA-DRB1 and CUX1
expression were performed using Pearson’s correlation test as
implemented in R version 4.0.2.
To identify correlations between eGenes and biological pathways, we
applied GSVA version 1.42.0 to the CPM-normalized temporal cortex
pyramidal neuron RNA-seq to identify the REACTOME pathway enrichments
per-sample. For this analysis we used the REACTOME pathways available
in GSVAdata version 1.30, which downloads the REACTOME pathways from
msigdb version 3.0 with the data set named “c2BroadSets”. We calculated
correlations between GSVA signatures and gene expression using the
Pearson correlation coefficient as implemented in R version 4.0.2,
considering correlations with an FDR-adjusted p-value less than 0.1.
Design of integration analyses
In order to identify the biological mechanisms through which human and
model organism genetic hits contribute to neurodegenerative disease, we
utilized the Prize-Collecting Steiner Forest algorithm (PCSF) as
implemented in OmicsIntegrator 2^[319]42 (OI v2.3.10,
[320]https://github.com/fraenkel-lab/OmicsIntegrator2). The PCSF
algorithm identifies disease-associated networks based on signals
derived from molecular data that are significantly altered in
individuals with the disease. The algorithm aims to retrieve as many
termini as possible while minimizing the number of low-confidence edges
in the final network. The resulting network includes a subset of the
input nodes and high-confidence interactors that were not part of the
input. We used OmicsIntegrator to map proteomic, phosphoproteomic,
metabolomic and genetic changes to a set of known protein-protein and
protein-metabolite interactions derived from physical protein-protein
interactions from iRefIndex version 17 and protein-metabolite
interactions described in the HMDB and Recon 2 databases. To add
brain-specific edges, we include the interactions derived from Affinity
Purification Mass Spectrometry (AP-MS) of mice in BraInMap^[321]148.
Additionally, we include previously published interactions between
proteins found in tau aggregates and phosphorylated tau derived from
AP-MS of neurofibrillary tangles^[322]149. The costs on the
protein-protein interactions were computed as 1 minus the edge score
reported by iRefIndex, while the cost of the protein-metabolite
interactions were calculated as in previous studies^[323]135,[324]150.
Given that these reference interactions were defined in human proteins
and metabolites, we mapped the Drosophila proteins and phosphoproteins
to their human orthologs using DIOPT version 8.0, choosing the human
orthologs that the tool determined were of “moderate” or “high”
confidence^[325]132
([326]https://www.flyrnai.org/cgi-bin/DRSC_orthologs.pl). Briefly,
DIOPT identifies gene-ortholog pairs from multiple databases and
computes a score based on how many databases report the gene-ortholog
pair. We chose the human ortholog most frequently reported by this
ensemble of databases. Orthologs reported in fewer than three reference
databases were also excluded. In order to comprehensively characterize
metabolomic changes, we used PiuMet to map uncharacterized metabolites
to compounds identified in HMDB^[327]135. In addition to integrating
the phosphoproteomic data, we included the predicted upstream kinases
from iProteinDB whose proteomic levels in Drosophila correlated with
its phosphoproteomic targets^[328]151 (Spearman correlation
coefficient> 0.4, [329]https://www.flyrnai.org/tools/iproteindb/web/).
For the Alzheimer’s disease-specific network, we integrated the screen
hits with genetic modifiers of disease severity from model organism
screens and available proteomics, phosphoproteomics and metabolomics
from the literature and generated data. These data are derived from
multiple brain tissues, and several brain regions are represented in
individual data sets as well (Supplementary Data [330]8). To make sure
no individual brain region or study was overrepresented in terms of the
number of weighted nodes, we applied different thresholds of
significance for each data source to generate the Alzheimer’s disease
network. The prizes of the proteomic, phosphoproteomic, and metabolomic
data are calculated as the negative base 10 logarithm of the
Benjamini-Hochberg FDR-corrected p-value calculated by a two-way
t-test. The Drosophila phosphoproteomic data and the metabolomic data
were subject to an FDR threshold of 0.1, while the Drosophila proteomic
and human proteomic data had more stringent cutoffs (FDR < 0.01 and
FDR < 0.0001 respectively). Additionally, the metabolomic data were
only assigned prizes if the absolute log[2] fold change was greater
than 1. The human lipidomic data were assigned prizes by their negative
log[10] nominal p-value and were included if their nominal p-value was
less than 0.05. The upstream kinases were assigned the same prizes as
the targeted phosphoproteins. Instead of assigning prizes based on a
two-way t-test, the genetic hits were assigned prizes differently. For
the human eGenes, prizes were assigned to all genes in the discovery
phase with a value equal to -log[10](genome-wide FDR). For genes found
in the Drosophila neurodegeneration and tau aggregation screens, prizes
were set to 1 = −log[10](0.1). For the human GWAS loci, the prizes were
set to the -log[10] Bonferroni corrected, genome-wide p-value for those
determined to be causal loci according to previous analyses and 1
otherwise^[331]9. After the initial prize assignments, the values are
minimum-maximum normalized to values between 0 and 1 within each data
type, weighing each prize by a scale factor reflecting our confidence
in the degree to which a given data type reflects Alzheimer’s disease
pathology (Supplementary Data [332]8). We further included previously
published Drosophila modifiers of tau toxicity^[333]15. If any nodes
overlapped between data sets, we attributed the node to the data set
for which it was assigned the greatest weight and dropped all entries
with smaller weights for the given node.
For each network, we performed 100 randomizations of the edges with
gaussian noise to assess the robustness of the nodes to perturbations
to edges and prize values. Additionally, we performed 100
randomizations of the prize values to assess the specificity of each
node to their assigned prizes. We filtered out nodes that did not have
a prize (Predicted nodes) if they appeared in fewer than 40 of the
robustness randomizations and more than 40 of the specificity
randomizations. We then performed Louvain clustering with a resolution
of 1.0 for community detection in the networks. We applied Leiden
clustering with the same resolution parameter and calculated the
modularity scores for Louvain and Leiden clustering results. Louvain
clustering had a slightly higher Q-value than Leiden clustering, so we
reported the Louvain clustering results for our network analysis
(Louvain Q = 0.415, Leiden Q = 0.412).
OmicsIntegrator hyperparameters control the weights on prizes (β), the
weight of the edges on the dummy node for network size (ω) and the edge
penalty for highly connected “hub” nodes (γ). In order to select
hyperparameters for OmicsIntegrator, we evaluated a range of
hyperparameters for each network: β = {2,5,10}, ω = {1,3,6} and
γ = {2,5,6}. We chose networks based on minimizing the mean
specificity, maximizing the mean robustness and minimizing the KS
statistic between node degree of the prizes as compared to those of the
predicted nodes. Our final selection was β = 5, ω = 3 and γ = 5.
Networks were visualized using Cytoscape version 3.8.0.
Implementation of ROBUST to compare to the result of OmicsIntegrator
We applied ROBUST to the same interactome and initial node list as we
did for OmicsIntegrator. We used the version of ROBUST from
[334]https://github.com/bionetslab/robust_bias_aware/tree/main. We used
the default parameters for ROBUST.
Comparison of OmicsIntegrator to a graph neural network and GNNExplainer
We trained a graph neural network with two graph convolutional layers
using a mean squared error loss function to predict the node weights
supplied to OmicsIntegrator using the same interactome as we used for
OmicsIntegrator. The graph neural network was implemented in Pytorch
Geometric version 2.5.2. To select hyperparameters, we trained the
algorithm on 80% of the nodes and tested the model on 20% of the data
selecting the combination of learning rate, weight decay and number of
epochs minimized the loss between the predicted and observed values. We
found that the best learning rate was 0.01, the optimal number of
epochs was 100 and the best weight decay was 1*10^−4. We re-ran the
model on our whole data set using these hyperparameters and applied
GNNExplainer as implemented in Pytorch Geometric version 2.5.2. To
establish which node attributions were most important for prediction,
we re-ran the model after re-distributing node weights to establish a
null distribution of node attributions. We kept the nodes that were
within the top 99^th quantile of the empirical distribution.
Semiquantitative scale for assaying the Drosophila rough eye phenotype
The approach for evaluating the Drosophila rough eye phenotype is
described in ref. ^[335]24. We used a semiquantitative scale ranging
from 0-5. 0 represents a wild-type eye, 1 refers to less than 50% of
fly eye ommatidia displaying morphological disruption. A score of 2
indicates that there is morphological disruption in 50-100% of the fly
eye ommatidia and 0-25% reduction in eye size. A score of 3 indicates
100% ommatidial disruption and a 25–50% in fly eye size. A score of 4
describes fly eyes with 100% ommatidial disruption and one of the
following features: ommatidial fusions, darkened or discolored areas,
or over 50% reduction in eye size. A score of 5 refers to eyes with at
least two of the features that can be observed in an eye with a rough
eye score of 4.
Betweenness analysis of neurodegeneration screen hits with regulators of DNA
damage
We used the R package igraph version 1.2.6 to compute the subnetwork of
neurodegeneration screen hits, and regulators of DNA damage. We used
the package’s implementation of betweenness centrality to compute each
node’s betweenness in this subnetwork.
COMET assay for DNA damage in human neural progenitor cells
For the alkaline COMET assay, we applied inhibitors of CK2 (CX-4945)
and the NOTCH signaling pathway (Compound E) to human iPSC-derived
neural progenitor cells. iPS cells AG06840 was obtained from the
Coriell Institute as a fibroblast then converted to iPS cells under
guidelines and approval of MIT institutional review board using human
materials, confirming that it was obtained under informed donor
consent. Cells were treated with 5 µM of the inhibitor overnight and
harvested. Comets were selected using the OpenComet plugin in
ImageJ^[336]116,[337]152. The extent of DNA damage was measured by the
tail moment and proportion of intensity between the tail and the head
of the comet. The tail represents single-stranded DNA that trails off
from the nucleus due to DNA damage burden. Longer tails indicate a
greater extent of DNA damage. DMSO and etoposide were included as
negative and positive controls respectively.
Human iPSC culture
Human iPSCs (male WTC11 background from the Coriell Institute #GM25256,
gift from the lab of Michael Ward) harboring a single-copy of
doxycycline-inducible (DOX) mouse NGN2 at the AAVS1 locus and
pC13N-dCas9-BFP-KRAB at human CLYBL intragenic safe harbor locus
(between exons 2 and 3) were cultured in mTeSR Medium (Stemcell
Technologies; Cat. No. 85850) on Corning Tissue Culture Treated Dishes
(VWR International LLC; Cat. No. 62406-161) coated with Geltrex (Life
Technologies Corporation; Cat. No. A1413301). Briefly, mTESR medium
supplemented with mTESR supplement (Stemcell Technologies; Cat. No.
85850) and antibiotic Normocin (Invivogen; Cat. No. Ant-nr-2) was
replenished every day until 50% confluent^[338]153. Stem cells were
transduced with lentivirus packaged with CROPseq-Guide-Puro vectors and
selected with Puromycin (Gibco; Cat. No. A11138-03) until confluent.
When cells were 80%-90% confluent, cells were passaged, which entailed
the following: dissociating in Accutase (Stemcell Technologies; Cat.
No. 7920) at 37 °C for 5 minutes, diluting Accutase 1:5 with mTeSR
medium, collecting in conicals and centrifuging at 300 g for 5 min,
asipirating supernatant, resuspending in mTESR supplemented with 10uM
Y-27632 dihydrochloride ROCK inhibitor (Stemcell Technologies; Cat. No.
72302) and plating in Geltrex-coated plates.
NGN2 neuronal differentiation and RNA extraction
Neuronal differentiation was performed as described in previous work
with a few modifications^[339]154. On day 1, iPSCs transduced with
CROPseq-Guide-Puro were plated at 40,000 cells/cm^2 density in
Geltrex-coated tissue culture plates in mTESR medium supplemented with
ROCK inhibitor and 2 µg/ml Doxycycline hyclate (Sigma; Cat. No.
D9891-25G). On Day 2, Medium was replaced with Neuronal Induction media
containing the following: DMEM/F12 (Gibco; Cat. No 11320-033), N2
supplement (Life Technologies Corporation; Cat. No. 17502048), 20%
Dextrose (VWR; Cat. No. BDH9230-500G), Glutamax (Life Technologies
Corporation; Cat. No. 35050079), Normocin (Invivogen; Cat. No.
Ant-nr-2), 100 nM LDN-193189 (Stemcell Technologies; Cat. No. 72147),
10uM SB431542 (Stemcell Technologies; Cat. No. 72234) and 2uM XAV
(Stemcell Technologies; Cat. No. 72674) and 2 µg/ml DOX. The Neuronal
Induction Media was replenished on day 3. On day 4, the medium was
replaced with Neurobasal Media (Life Technologies Corporation; Cat. No.
21103049) containing B27 supplement (Gibco; Cat. No. 17504044), MEM
Non-Essential Amino Acids (Life Technologies Corporation; Cat. No.
11140076) Glutamax, 20% Dextrose, 2 µg/ml DOX, Normocin, 10 ng/ml BDNF
(R&D Systems; Cat. No. 11166-BD), 10 ng/ml CNTF (R&D Systems; Cat. No.
257-NT/CF), 10 ng/ml GDNF (R&D Systems; Cat. No. 212-GD/CF) and
cultured for 2 days. At day 6, cells were dissociated with Accutase and
resuspended with Trizol (Thermofisher Scientific; Cat. No.15596018).
RNA was extracted following manufacturer’s manual, using Direct-zol RNA
Miniprep kit (Zymo Research, R2050). The sequences for each guide is as
follows: Non-targeting guide 1: GTAGCCTTAGGTAGAGGGGG; Non-targeting
guide 2: GTACCACCCAAACGATAACG; HNRNPA2B1 guide 1: CTGCGAGGAGCACCTCCGCA;
HNRNPA2B1 guide 2: GCTCGAGAAACAACTCTGCG; CSNK2A1 guide 1:
GCGATGGAGGAGGAGACACA; CSNK2A1 guide 2: AGACAATATGGCGGCGATGG; NOTCH1
guide 1: GGCCGCGGGAGGGAGCGCAA, NOTCH1 guide 2: CGCGGGCGCGAGCGCAGCGA.
Bulk RNA-seq analysis of CRISPRi knockdowns in neural progenitor cells
We analyzed the RNA-seq data after CRISPRi knockdown as performed in
previous CRISPRi studies^[340]155. In summary, we mapped the raw
sequencing reads to the hg38 reference transcriptome with salmon
version 1.10.1. We used the ‘-noLengthCorrection’ option to generate
transcript abundance counts. We generated gene-level count estimates
with tximport version 1.16.1. To account for the effects of different
guides, we performed differential expression analysis between knockdown
and control with DESeq2 version 1.28.1 treating guide identity as a
covariate. We applied the apelgm package version 1.10.0 to shrink the
log[2] fold changes. We applied Gene Set Enrichment Analysis to the
ranked, shrunk log[2] fold changes using the fgsea package version
1.14.0 and the Reactome 2022 library as the reference pathway
set^[341]73,[342]133.
Sample size determination for experiments
All sample sizes were determined according to standards in the field.
No power calculations were performed before performing experiments.
Ethical acquisition of human patient data
Human patient data was acquired through data use agreements with the AD
Knowledge Portal and the Genotype-Tissue Expression Project. We
complied with the data use agreements that ensured the data were
collected with informed consent from the patients in the studies. Human
patient brains for laser capture microdissection were recruited with
the oversight of the Institutional Review Board of Brigham and Women’s
hospital.
Reporting summary
Further information on research design is available in the [343]Nature
Portfolio Reporting Summary linked to this article.
Supplementary information
[344]Supplementary Information^ (3.2MB, pdf)
[345]41467_2025_59654_MOESM2_ESM.pdf^ (85.7KB, pdf)
Description of Additional Supplementary Files
[346]Supplementary Data 1^ (364.9KB, xlsx)
[347]Supplementary Data 2^ (16MB, xlsx)
[348]Supplementary Data 3^ (9.6KB, xlsx)
[349]Supplementary Data 4^ (16.8KB, xlsx)
[350]Supplementary Data 5^ (1.5MB, xlsx)
[351]Supplementary Data 6^ (1.3MB, xlsx)
[352]Supplementary Data 7^ (3.1MB, xlsx)
[353]Supplementary Data 8^ (165.3KB, xlsx)
[354]Supplementary Data 9^ (3.5MB, xlsx)
[355]Supplementary Data 10^ (625.7KB, xlsx)
[356]Supplementary Data 11^ (11.9KB, xlsx)
[357]Reporting Summary^ (74.7KB, pdf)
[358]Transparent Peer Review file^ (161.8KB, pdf)
Source data
[359]Source Data^ (165.3KB, xlsx)
Acknowledgements