Abstract
Progressive supranuclear palsy (PSP) is a neurodegenerative
parkinsonian disorder characterized by cell-type-specific tau lesions
in neurons and glia. Prior work uncovered transcriptome changes in
human PSP brains, although their cell-specificity is unknown. Further,
systematic data integration and experimental validation platforms to
prioritize brain transcriptional perturbations as therapeutic targets
in PSP are currently lacking. In this study, we combine bulk tissue
(n = 408) and single nucleus RNAseq (n = 34) data from PSP and control
brains with transcriptome data from a mouse tauopathy and experimental
validations in Drosophila tau models for systematic discovery of
high-confidence expression changes in PSP with therapeutic potential.
We discover, replicate, and annotate thousands of differentially
expressed genes in PSP, many of which reside in glia-enriched
co-expression modules and cells. We prioritize DDR2, STOM, and KANK2 as
promising therapeutic targets in PSP with striking cross-species
validations. We share our findings and data via our interactive
application tool PSP RNAseq Atlas
([64]https://rtools.mayo.edu/PSP_RNAseq_Atlas/). Our findings reveal
robust glial transcriptome changes in PSP, provide a cross-species
systems biology approach, and a tool for therapeutic target discoveries
in PSP with potential application in other neurodegenerative diseases.
Subject terms: Neurodegeneration, Genetics of the nervous system
__________________________________________________________________
Progressive supranuclear palsy is a devastating neurological disorder
without treatment. Here, the authors leveraged omics data and model
organisms to nominate, prioritize, and validate high-confidence
candidate genes as therapeutic targets.
Introduction
Progressive supranuclear palsy (PSP) is a neurodegenerative disorder
with a relatively early age of onset and rapid progression to
death^[65]1. PSP is a primary tauopathy characterized by the
overexpression of 4-repeat tau isoform in both neuronal and glial
cells, leading to cell-specific tau lesions including neurofibrillary
tangles (NFT) in the neurons, tufted astrocytes (TA), coiled bodies
(CB) in oligodendrocytes, and tau threads (TauTh) in white matter.
Multiple genetic risk factors have been associated with PSP, including
variants in or proximal to microtubule associated protein tau (MAPT),
myelin associated oligodendrocyte basic protein (MOBP) and RUNX family
transcription factor 2 (RUNX2)^[66]2,[67]3. Some of these loci are also
associated with brain gene expression levels, suggesting that genetic
variants may confer PSP risk via transcriptional
regulation^[68]4,[69]5. We previously showed that PSP brains have vast
transcriptional perturbations^[70]6,[71]7. PSP brain gene expression
networks have distinct association patterns with the different
cell-specific tau lesions, highlighting cell-specificity of these
transcriptional changes in PSP^[72]7,[73]8.
While these findings nominate brain perturbations in gene expression
and transcriptional networks as potential culprits in PSP
pathophysiology, they are primarily based on microarray expression
measures of bulk brain tissue from human cohorts or do not
systematically investigate PSP brain RNAseq changes in independent
cohorts. Additionally, complementary data from single cell/single
nucleus RNAseq (sc/snRNAseq) is necessary to assess cell-type-specific
brain gene expression changes in PSP. Further, although investigating
these transcriptome changes in model systems across different species
can increase confidence in the findings and enable experimental
validations, such studies are lacking in neurodegenerative diseases
with few exceptions^[74]9,[75]10. Consequently, the exact disease
mechanism(s) and key molecular players that underlie PSP pathogenesis
remain unclear, creating a barrier to nominating effective targets for
disease-modifying treatment.
To enable the translational discovery of potential therapeutic targets
with mechanistic insights, the myriad brain transcriptional changes in
PSP must be narrowed down by systematic prioritization platforms that
integrate multiple data modalities and experimental validations. In
this study we integrated bulk brain RNAseq data across two large,
independent human cohorts, with complementary human brain snRNAseq and
tau mouse model data to discover and prioritize high-confidence
cell-specific gene expression and network perturbations in PSP followed
by in vivo validations in a Drosophila tau model. We built an
interactive web application PSP RNAseq Atlas to broadly serve the
research community as a facile tool to access and utilize these rich,
complex datasets ([76]https://rtools.mayo.edu/PSP_RNAseq_Atlas/). Our
study applies robust systems biology approaches to multi-omics data
across species to nominate high-priority genes with therapeutic
potential in tauopathies, specifically in PSP.
Results
PSP brains have vast and replicable transcriptome perturbations in
glia-enriched genes
To identify perturbed genes in PSP, we first collected and analyzed the
bulk gene expression profile from the superior temporal gyrus of
temporal cortex (TCX) tissue for 281 neuropathologically confirmed PSP
cases and 127 controls that lack significant pathology. Due to the
relatively low degree of gross tau pathology in TCX^[77]11,[78]12, we
hypothesized that this region would be less susceptible to confounding
factors associated with downstream consequences of the disease, such as
neuronal loss^[79]6, making the expression changes less likely to be
secondary to these confounds. TCX is also a more easily accessible
region, where there is typically more sample availability and less
susceptibility to variations in acquisition of gray matter. Expression
data was collected as part of two independent studies (studies 1 and 2,
Table [80]S1). All PSP cases have detailed measures of cell-specific
tau (CB, NFT, TA, TauTh) and overall degree of neuropathology
quantified from multiple brain regions as previously
described^[81]4,[82]8.
After quality control (QC), a total of 22,560 unique genes were
detected in both studies, the majority (67%) of which are
protein-coding (Table [83]S2, Figure [84]S1). To identify the gene(s)
associated with PSP diagnosis, we compared the brain gene expression
levels between PSP cases and controls using multiple linear regression
adjusting for relevant covariates in each dataset separately and
combined the results using an inverse-variance meta-analysis model.
Using a similar approach, we also analyzed the association of brain
gene expression levels with the severity of tau neuropathologies
measured within the 281 PSP cases. Compared to controls, 2,528 genes
were differentially expressed (DEGs) in PSP brains at an FDR-adjusted
p-value of 0.05 (Fig. [85]1a), suggesting extensive transcriptional
dysregulations in PSP brains at the bulk tissue level, even in a brain
region relatively spared from gross tau pathology. Importantly, a
secondary model that adjusted for cell proportions (Figure [86]S2)
showed congruent expression dysregulation, indicating that the
expression perturbations are not confounded by neuropathology-induced
differences in cell populations. Neuropathology association among the
PSP cases indicated the greatest number of associations with NFT (134
genes), while 8 gene levels were associated with TauTh (Fig. [87]1b).
Using less stringent significance thresholds (unadjusted p
value < 0.05), associations were also observed between gene expression
levels with other tau neuropathologic lesions in PSP (Fig. [88]1a). A
complete list of all expression associations is included in
Supplementary Data [89]1.
Fig. 1. Bulk RNAseq analysis identifies vast and replicable differentially
expressed genes in PSP brains.
[90]Fig. 1
[91]Open in a new tab
a Number of differentially expressed genes with respect to different
phenotypes at different threshold. b Heatmap of logFC (for PSPvsCtrl)
and association beta (for neuropathology) of the top genes defined as
lowest 5% FDR and top 5% |logFC|. Top genes that are also marker genes
for astrocyte, endothelia, and oligodendrocyte are color-coded. c
Volcano plot for all DEGs with FDR < 0.05 comparing gene expression
levels between PSP and control. The color of the dots represents cell
type enrichment of the DEGs: blue = astrocyte, orange = endothelia, red
= microglia, yellow = neuron, green = oligodendrocyte, gray = no cell
type specificity. d Distribution of the expression levels of MOBP,
KANSL1-AS1, YAP1, and CLU between PSP and control in each study cohort.
We highlighted these genes given their known association with PSP risk
or other neurodegenerative diseases. Statistics: differential
expression analysis using linear regression adjusting for covariates.
FDR-adjusted p values are presented. Unadjusted p values are provided
in Supplementary Data [92]1 and source data. Study 1: N = 257 PSP and
Control individuals; Study 2: N = 151 PSP and Control individuals;
Meta: N = 408 PSP and Control individuals. Also see Supplementary
Table [93]1. e Heatmap for the 4 genes highlighted in d also showed
consistent expression changes with PSP diagnosis and neuropathology.
Source data are provided as a Source Data file.
We next focused on the top gene expression changes in PSP brains
compared to controls. We defined top expression changes as a PSP vs
control DEG with the bottom 5% FDR and top 5% |logFC| (Fig. [94]1b). We
observed biologically congruent gene expression associations with both
diagnosis and neuropathology such that genes that are higher in PSP
than controls are also positively correlated with tau neuropathologies
and those that are down in PSP have negative neuropathology
correlations (Fig. [95]1b, [96]S3). Among all DEGs (FDR < 0.05), there
is also a striking and statistically significant cell-type specificity
as many of the PSP up-regulated genes are also marker genes^[97]13 for
astrocytes (p < 0.001, 229 genes) or endothelia (p = 0.013, 63 genes).
In contrast, many of the PSP down-regulated genes are
oligodendrocyte-specific (p < 0.001, 77 genes) (Figs. [98]1b, c,
[99]S4).
Our review of the DEGs revealed genes previously implicated in PSP or
Alzheimer’s disease (AD) (Fig. [100]1d, e). One of the top perturbed
genes is the antisense RNA KANSL1-AS1 which is significantly lower in
PSP brains (logFC = −1.00, FDR = 8.790E-5, Figs. [101]1d, e, [102]S5,
Table [103]S3). Both KANSL1-AS1 and its sense gene KANSL1 are located
within the PSP risk genome-wide association study (GWAS) locus near
tau-encoding MAPT^[104]2,[105]4. Another top perturbed gene,
astrocyte-enriched YAP1, which is significantly higher in PSP brains,
was identified as a regulatory network hub gene that is higher in AD
brains^[106]14. We previously showed the astrocyte-enriched CLU, an AD
risk GWAS gene, to be higher in both AD and PSP brains^[107]6, whereas
the oligodendrocyte-enriched MOBP, a PSP risk GWAS gene, is lower in
both diseases^[108]7. Our current DEG results confirm prior work and
demonstrate remarkable replication between the two brain cohorts
(Fig. [109]1d). Collectively, these findings highlight robust
transcriptome perturbations in PSP brains in glia-enriched genes.
Glial cell-enriched gene co-expression network modules are associated with
PSP
To identify and characterize groups of co-expressed genes associated
with PSP we performed harmonized weighted gene co-expression network
analysis (WGCNA)^[110]15 across both studies. A total of 16 modules
were detected using the consensus network construction algorithm after
harmonized gene assignments (Fig. [111]2, Supplementary Data [112]2).
These consensus network modules were robust with high agreement between
the gene expression clusters and module assignments (Figure [113]S6a)
and high preservation of modules between the two studies
(Figure [114]S6b). We correlated the module eigengenes (ME), a summary
measure of all gene expression values in the module, with PSP diagnosis
and each neuropathology phenotype (Fig. [115]2a). We identified 6
modules that are significantly (Bonferroni adjusted p < 0.05)
associated with PSP diagnosis. Among them, 3 modules have higher (M4,
M6, and M12) and 3 modules have lower (M3, M10, M11) expression in PSP
cases. These associations were consistent across both studies
(Figure [116]S7).
Fig. 2. Glial cell-enriched gene co-expression network modules are associated
with PSP.
[117]Fig. 2
[118]Open in a new tab
a The associations between the module eigengenes (ME) and PSP diagnosis
or quantitative tau neuropathology were assessed. b Enrichment of cell
type specific genes was detected in seven modules, three of which
(M3=oligodendrocyte, M4=astrocyte, and M6=microglia/endothelia) were
significantly associated with PSP. Statistics: one-sided Fisher’s Exact
Test. Unadjusted p values are provided in Supplementary Data [119]2 and
source data. c Associations were robust and consistent across both
studies and significant in the consensus network modules M3, M4, M6.
Statistics: two-sided t-test comparing PSP and control module
eigengenes. Bonferroni-adjusted p values are presented. Study 1:
N = 257 PSP and Control individuals; Study 2: N = 151 PSP and Control
individuals; Combined: N = 408 PSP and Control individuals. Also see
Supplementary Table [120]1. d Top enriched gene ontology biological
process terms in the PSP-associated glial cell-enriched modules. Source
data are provided as a Source Data file.
We hypothesized that these modules represent gene expression
perturbations in distinct cell types and biological pathways
(Fig. [121]2b–d). Indeed, three of the PSP-associated modules were
enriched for glial genes. We found an enrichment of oligodendrocyte
genes (Bonferroni-adjusted p < 2.22E-16) in the down-regulated M3,
consistent with our prior work^[122]8. M6, which is up in PSP, is
enriched for endothelial (Bonferroni-adjusted p < 2.22E-16) and
microglial (Bonferroni-adjusted p = 5.29E-4) marker genes, whereas M4,
also up in PSP, is an astrocyte-enriched module (Bonferroni-adjusted
p < 2.22E-16). Gene ontology (GO) terms enriched in co-expression
modules were broadly consistent with their cell types (Fig. [123]2d,
[124]S8, Supplementary Data [125]3). Microglial/endothelial M6 is
enriched for immunity- and vascular-related GO terms, while astrocytic
M4 has metabolic, and oligodendroglial M3 has RNA processing/splicing
GO term enrichment. These findings suggest that cell-type specific
transcriptional changes in PSP may be indicators of disrupted
biological processes including immune activation, angiogenesis, cell
energetics and RNA metabolism.
Single-nucleus RNAseq captures glial expression changes in PSP brains
We sought to further replicate the glial gene expression changes from
bulk brain RNAseq detected in our two independent studies by conducting
snRNAseq experiments in brain tissue samples from the TCX of PSP and
control patients (Table [126]S4). After QC (Figure [127]S9), we
obtained 26,241 nuclei from 34 brain samples comprising 18 PSP and 16
controls. To ensure cell populations can be reliably identified across
each sample, we integrated the snRNAseq data, treating each sample as a
batch variable. Nuclei were clustered based on the integrated PCA
embedding using the Louvain algorithm implemented in Seurat^[128]16,
which yielded 28 nuclei clusters. (Fig. [129]3a). There was no
statistically significant enrichment of nuclei from either sex or
diagnosis in any cluster (Figures [130]S10,[131]11). Although a few
samples were statistically over-represented in 8 clusters, 7 of these
clusters were small, and each contributed to <1% of the total nuclei in
the snRNAseq dataset, indicating homogeneity for most of the nuclear
clusters with respect to sex, diagnosis, and samples.
Fig. 3. Single-nucleus RNAseq captures glial expression changes in PSP
brains.
[132]Fig. 3
[133]Open in a new tab
a Nuclei in UMAP space colored by cluster number and grouped by its
cell type. b Average expression and percent nuclei expressed of the
cell type marker genes used for cluster type assignment in each
snRNAseq cluster. c Count and proportions of each nuclei type. d
Expression levels of module genes from the three cell-type-enriched,
PSP-associated expression modules (M3, M4, M6) in the snRNAseq clusters
for each cell type. Boxplots indicating the distribution of the
expression scores and the first quartile, the median, and the third
quartile. The upper whisker indicates the maximum value no further than
1.5 times the inter-quartile range from the third quartile. The lower
whisker indicates the minimum value no further than 1.5 times the
inter-quartile range from the first quartile. The expression was
derived using 26,241 nuclei from 34 PSP and control samples.
Statistics: One-way Analysis of Variance. Exact p values are provided
in source data. e The number of overlapping genes between snRNAseq DEGs
in each cluster and the three modules. P value: Fisher’s exact test for
the enrichment of snRNAseq DEGs in the expression module. Pie chart
demonstrates proportion of up- and down-regulated genes. The radii of
the pie charts reflect the significance of the overlap, where
significant overlaps (p < 0.05) have solid outline. Transparency of pie
chart reflects the size of the overlap between module genes and cluster
DEGs (more opaque=higher number of overlapping genes). Statistics:
one-sided Fisher’s Exact Tests. Source data are provided as a Source
Data file.
We annotated each cluster based on the overlap of their highly
overexpressed cluster-marker genes and a list of (see Methods)
well-known cell-type marker genes curated from the literature
(Fig. [134]3b) or via published databases (Figure [135]S12,
[136]13)^[137]13,[138]17, which yielded consistent annotations. All
major brain cell types were identified (Fig. [139]3c). All clusters
could be annotated according to their cell type except three small
clusters (CL23, CL24, CL26), which constitute <1% of total nuclei.
Upon comparison of the expression levels between PSP and control nuclei
in each cluster, we identified significant DEGs at FDR < 0.05 for genes
expressed in >10% of the nuclei in the cluster analyzed. We detected
significantly up- or down-regulated DEGs in both neuronal and glial
clusters (Table [140]S5, Supplementary Data [141]4). To provide
additional context and annotations for the genes from the three
glial-marker enriched WGCNA co-expression modules, M3, M4 and M6, that
are replicably associated with PSP in bulk RNAseq from two studies,
expression scores, calculated based on the average expression levels of
these modules’ genes for each nucleus in the snRNAseq, were analyzed
across clusters for each cell type (Fig. [142]3d). For completeness,
all clusters were included in the analysis, even for those that
accounted for less than 1% of the total nuclei. Module genes have high
expressions in the nuclei corresponding to the enriched cell type of
each module. Oligodendrocyte-enriched module M3, astrocyte-enriched M4
and microglia/endothelia-enriched M6 genes have the highest expression
in oligodendrocyte, astrocyte, and endothelia snRNAseq clusters,
respectively, thus validating module cell-type annotations. There was
expression of these module genes in other cellular clusters,
underscoring that they are enriched in but not exclusive to specific
cell types.
We subsequently evaluated the overlap between the snRNAseq DEGs in each
nuclear cluster with genes from the three modules (Fig. [143]3e,
[144]S14). Oligodendrocyte-enriched M3 genes have the most significant
overlap (p = 2.74E-10) with DEGs of the oligodendrocyte cluster CL0.
Further, 86.76% of these overlapping snRNAseq genes were down in PSP,
consistent with the negative association between M3 eigengene and PSP
(Fig. [145]2a). Astrocyte-enriched M4 genes overlap mostly with
astrocyte CL2 DEGs (p = 4.48E-49). These overlapping genes have a
slightly higher proportion of up DEGs (54.52%), consistent with the
positive association between M4 eigengene and PSP (Fig. [146]2a).
Microglia/endothelia-enriched M6 genes significantly overlap with
microglia CL3 DEGs (p = 2.34E-10), pericytes CL12 DEGs (p = 3.90E-4)
and astrocytes CL2 DEGs (p = 3.35E-20). Their direction of change is
mostly up in astrocytes (82.36% up) and mixed in the other clusters
(microglia: 49.57% up, pericytes: 53.85% up).
In summary, snRNAseq data corroborates the cell-enrichment annotations
for the PSP-associated bulk RNAseq modules M3, M4, and M6.
Directionality of gene expression changes for these modules’ genes is
highly consistent between bulk RNAseq and snRNAseq oligodendrocyte
cluster, although other glial clusters have DEGs that change in both
directions, suggesting that single nucleus data may capture more subtle
gene expression changes that may be missed in bulk data.
Cross-species prioritization and screening of glial gene expression changes
in PSP
Our bulk RNAseq data analyses yielded expression perturbations,
highlighting glial cell-enriched co-expression modules (M3, M4, M6)
comprising 4,969 genes. The subset of these module genes that are also
significant snRNAseq DEGs in the same glial cell clusters still
constitute a large number for experimental validations
(Figure [147]S14). We, therefore, applied a systematic data-driven
prioritization approach to further narrow down and select genes using
the human brain transcriptome data from this study and a published tau
mouse model^[148]18.
Our prioritization approach is schematized in Fig. [149]4a. Among the
4969 genes from modules M3, M4 and M6, we focused on those that are
central and highly connected within each module, defined as hub genes
with module membership (MM) > 0.7. There are 550 hub genes that are
also DEGs (FDR < 0.05) based on the meta-analysis of two bulk brain
RNAseq studies. We further filtered these genes by selecting those that
are also significant DEGs in the snRNAseq data cluster corresponding to
the bulk module cell type and that have a concordant direction of
change between bulk and snRNAseq. For oligodendrocyte-enriched M3 and
astrocyte-enriched M4 hub genes, we filtered for snRNAseq DEGs that are
down in PSP in oligodendrocyte and up in astrocyte clusters,
respectively, resulting in 56 and 59 genes, respectively
(Fig. [150]4a). For the microglia/endothelia-enriched M6 hub genes,
given the enrichment of snRNAseq DEGs from this module genes within
these cell types (Fig. [151]3e), we selected those that are
up-regulated snRNAseq in microglia, endothelia, pericytes or
astrocytes, resulting in 40 genes. In total, there are 155 such genes
which reflect robust glial gene expression changes in PSP brains.
Fig. 4. Prioritization and experimental validation strategy for PSP brain
transcriptome changes.
[152]Fig. 4
[153]Open in a new tab
a Prioritization approach using bulk brain human RNAseq, snRNAseq and
rTg4510 tau mouse brain transcriptome data which led to 21 high
confidence glial perturbed genes in PSP. Of these, 11 had available
Drosophila ortholog genes with RNAi for in vivo screening. b Upset Plot
showing the overlap of different validation methods and the number of
genes selected at each filtering stage from each module. c Summary
results of the Drosophila screen of the 11 genes in using the GMR-Gal4
system. d Circos plot visualization of genes from the three glial
cell-enriched, PSP-associated gene expression modules. Modules M3, M4,
M6 (outer-most = first ring) and their bulk DEG directions of PSP
associations (second ring, red = up and blue = down-regulated modules),
module membership (third ring, module membership scores are
color-coded), validation in snRNAseq (fourth ring, validated = green),
tau mouse model rTg4510 brain RNAseq (fifth ring, validated=green), and
Drosophila tau model RNAi in vivo screening (sixth ring, Drosophila eye
neurodegeneration morphology scores are color-coded) are shown. Red
text indicates the 11 high confidence PSP glial perturbed hub genes
that have available Drosophila orthologs and were screened
experimentally in vivo. Source data are provided as a Source Data file.
To determine the gene expression changes that are preserved in a mouse
model of tauopathy, we utilized brain transcriptome data from a mouse
model that overexpresses a mutant form of human tau encoding MAPT and
develops tau neuropathology by 4 months of age^[154]18. We analyzed
brain gene expression data from 24 transgenic (rTg4510) mice and their
wild-type control littermates sacrificed at 4.5 or 6 months of age
(Table [155]S6), available on the AD Knowledge Portal^[156]19.
Among the 155 genes with robust glial expression changes in PSP brains,
21 were also significant DEGs perturbed in the same direction as humans
in either 4.5- or 6-month rTg4510 mouse brains (Fig. [157]4a, b). The
rTg4510-validated genes are significantly over-represented in the
astrocyte-enriched module M4 (p = 1.97E−3) and
microglia/endothelia-enriched module M6 (p = 4.63E−18), whereas only
one DEG from the oligodendrocyte-enriched M3 was validated in the mouse
data. The cell-type-specific overrepresentation of validated genes
suggests that the rTg4510 mouse model may best recapitulate
tauopathy-related glial expression changes in astrocytes, endothelia,
and microglia but not oligodendrocytes. The 21 genes that passed
through the human and mouse model filters represent high-confidence PSP
glial DEGs with cross-species validation.
To determine whether experimentally perturbing levels of
high-confidence PSP glial genes would impact tau-related
neurodegeneration, we carried out a knockdown screening experiment with
the 21 genes using a Drosophila model of tauopathy with GMR-Gal4
driver^[158]20. Of the 21 high-confidence PSP glial DEGs, 11 have a
Drosophila ortholog and available RNAi stocks (Tables [159]S7,[160]8).
Using a semi-quantitative scoring system, where negative scores reflect
suppression and positive scores reflect enhancement of the
neurodegenerative eye morphology, we determined that RNAi suppression
of 10 of the 11 tested genes suppressed the neurodegenerative eye
morphology (Fig. [161]4c). The prioritization and gene information is
summarized in Fig. [162]4d.
Validation of top tau toxicity suppressing PSP glial perturbed genes
Three of the 11 screened genes, DDR2, KANK2, and STOM, have the
strongest suppressive effect on the Drosophila tau eye phenotype
(Score < −1.5) when down-regulated with RNAi (Fig. [163]4c, d). DDR2 is
a hub gene in the astrocyte-enriched M4, whereas KANK2 and STOM are
hubs in the microglia/endothelia-enriched M6 (Fig. [164]5,
Figures [165]S15,[166]16). We first examine the gene expression
perturbation of these three genes in our human brain transcriptome
data. In alignment with the prioritization paradigm, all three genes
have higher expression levels (FDR < 0.05) in PSP brains based on bulk
RNAseq (Fig. [167]5a). The up-regulation in bulk tissue is also
consistent across both studies, suggesting the results are not driven
by a particular cohort. The three genes are up-regulated even after
adjusting for differences in cell proportions (Table [168]S7),
indicating that the expression perturbations are not driven by this but
are likely associated with disease pathogenesis. In snRNAseq
(Fig. [169]5b), all three genes are significantly upregulated in
astrocytes. Pseudobulk DEG analysis using the snRNAseq datasets also
supported the up-regulation of the three genes in astrocytes
(Table [170]S8). Interestingly, the subclustering analysis
(Figure [171]S17) indicated DDR2 and KANK2 are up-regulated in specific
populations of astrocytes including reactive astrocytes, whereas the
up-regulation of STOM is universal to all astrocytes. Our transcriptome
data demonstrate that STOM, KANK2, and DDR2 are up-regulated in glial
cells in PSP.
Fig. 5. Functional validation of the top tau-mediated toxicity suppressing
PSP genes.
[172]Fig. 5
[173]Open in a new tab
a Expression levels of the top in vivo validated genes in bulk RNAseq
dataset. Horizontal bar indicates median expression levels. Statistics:
differential expression analysis using linear regression adjusting for
covariates. FDR-adjusted p values are presented. Unadjusted p values
are provided in Supplementary Data [174]1 and source data. Study 1:
N = 257 PSP and Control individuals; Study 2: N = 151 PSP and Control
individuals; Meta: N = 408 PSP and Control individuals. Also see
Supplementary Table [175]1. Red text indicates up-regulation in PSP.
Blue text indicates down-regulation in PSP. b Expression levels of the
top in vivo validated genes in the snRNAseq dataset. Statistics:
Zero-inflated generalized linear regression comparing PSP versus
Control nuclei adjusting for sex and age, implemented by the MAST
package. FDR-adjusted (FDR) or unadjusted (P) p values as indicated.
Red FDR/p indicates up-regulation in PSP. Blue FDR/p indicates
down-regulation in PSP. Expression changes with unadjusted p values
greater than 0.05 were not annotated for clarity. c Statistical tests
(two-sided Wilcoxon rank sum tests) showed a significant reduction in
eye degeneration (n = 10 biologically independent flies per genotype
were assessed). Boxplots indicating the distribution of the fly eye
degeneration scores and the first quartile, the median, and the third
quartile. The upper whisker indicates the maximum value no further than
1.5 times the inter-quartile range from the third quartile. The lower
whisker indicates the minimum value no further than 1.5 times the
inter-quartile range from the first quartile. d Representative
Drosophila eye pictures indicate a robust rescue of tau-mediated
toxicity when inhibiting the expression of the fly ortholog of STOM,
KANK2, or DDR2 with RNAi. Source data are provided as a Source Data
file.
Consistent with the results that DDR2, KANK2, and STOM are all
upregulated in human PSP and rTg4510 mouse brain transcriptome, our
initial tau Drosophila screening experiment (Fig. [176]4c) demonstrated
that RNAi inhibition of these genes suppressed tau-mediated cell
toxicity in this model. To further validate and confirm these
Drosophila screening results, we repeated the RNAi experiments for
these three genes, and quantified the severity of tau-mediated toxicity
using two blinded independent evaluators (Fig. [177]5c, d). As
expected, the expression of tau in Drosophila under the GMR-Gal4 driver
led to significant eye degeneration compared to wild type (WT) flies
(two-sided Wilcoxon rank sum test, p = 6.16E-5). Importantly,
expression of RNAi against DDR2, KANK2, or STOM in tau Drosophila
significantly reduced the tau-mediated cell toxicity in fly eyes. These
results suggest that these three genes can be targeted in PSP or other
tauopathies as a potential therapeutic avenue to ameliorate
tau-mediated neurodegeneration.
We built an interactive web application tool PSP RNAseq Atlas
([178]https://rtools.mayo.edu/PSP_RNAseq_Atlas/), which houses human
bulk and snRNASeq transcriptome, mouse and Drosophila results. Our
application can quickly report the key statistics for queried genes and
is built to facilitate the dissemination of our results to the broader
research community.
Discussion
To translate the growing amount of multi-omics data into
high-confidence candidate therapeutic targets with mechanistic
implications, systematic prioritization platforms that integrate
large-scale, multi-modal data and experimental validations are
required. In this study, we applied a cross-species systems biology
approach to large-scale human brain transcriptome data at bulk tissue
and single nucleus resolutions, a tau mouse model^[179]18 and
experimental validations in a Drosophila tau model^[180]20, which
revealed robust glial transcriptome perturbations and nominated
therapeutic targets in PSP.
Using bulk brain transcriptome, we discovered 2528 differentially
expressed genes (DEGs) that are robustly detected across two studies of
408 donors, comprised of PSP cases, and controls without
neurodegenerative diseases. PSP up-regulated DEGs are enriched for
astrocyte- or endothelia-specific genes, whereas down-regulated genes
are enriched for oligodendrocyte markers. This human bulk brain
transcriptome is organized into 16 co-expression network modules, three
of which have both robust PSP association and cell-type enrichment.
Consistent with the DEG findings, the PSP down-regulated module M3 is
enriched for oligodendrocyte genes, whereas the up-regulated M4 and M6
are astrocyte and microglia/endothelia enriched modules, respectively.
In our prioritization paradigm, we first focused on 550 bulk DEGs that
are also highly connected hub genes in these glial modules, based on
the rationale that these centrally linked genes with perturbed
expressions may represent targetable disease pathway molecules.
With brain snRNAseq of PSP and control donors, we further filtered down
these 550 hub bulk DEGs by selecting the subset of 155 that are
snRNAseq DEGs with concordant expression changes in the cell cluster
corresponding to the bulk module cell type (Figure [181]S18). Of these
155 human bulk brain and snRNAseq glial DEGs, 21 had consistent changes
in the rTg4510^[182]18 tau mouse model brain RNAseq. Eleven of these
had Drosophila orthologues and when tested in an RNAi-based screening
of Drosophila tau model^[183]20, they significantly influenced
tau-related neurodegeneration. We prioritize STOM, and KANK2 from the
microglia/endothelia enriched (M6) and DDR2 from the astrocyte-enriched
(M4) modules, as potential therapeutic targets with both multi-omics
and model system validation in our study.
STOM encodes stomatin, which belongs to an evolutionarily conserved
superfamily of proteins, is oligomeric, localizes to membrane lipid
rafts, functions as a membrane scaffold through actin binding and
regulates activities of ion channels and receptors^[184]21. Stomatin
has anti-cancer properties^[185]22, is upregulated with hypoxia in the
rat brain^[186]23 and has known chemical ligands, highlighting its
potential as a druggable target^[187]24. KANK2 is a primarily
mesenchymal member of an evolutionarily conserved gene
family^[188]25,[189]26, interacts with actin and serves in multiple
roles acting as a scaffold within the cortical microtubule stabilizing
complexes^[190]25 including cell migration^[191]27.
The emergence of two microtubule-associated membrane scaffold proteins,
STOM and KANK2, as top prioritized potential therapeutic targets for
PSP, a primary tauopathy characterized by microtubule-associated
protein tau neuropathology, is noteworthy. Both genes are enriched in
biological processes involved in the immune response (Figs. [192]2,
[193]3) within module M6, which has greatest overlap of bulk DEGs with
astrocyte, microglial, and endothelial snRNAseq DEGs. Collectively,
these results suggest that these proteins may be involved in the
pathogenesis of PSP by promoting glial tau neuropathology via
disrupting microtubule structure and function.
DDR2, which encodes discoidin domain receptor, is a collagen-activated
receptor tyrosine kinase that regulates multiple functions including
cell migration, proliferation and tissue modelling^[194]28 through MAP
kinase pathway, leading to the activation of transcriptional factor
RUNX2^[195]29, which itself is a PSP risk gene^[196]30. Besides its
functions in bone modelling^[197]29 and cancer^[198]28, DDR2 has also
been implicated in neurodegeneration^[199]31. Consistent with our
findings, it was shown that DDR2 levels are upregulated in the
astrocytes from MAPT mutant organoids^[200]32. DDR2 knockdown in vitro
and in vivo led to increased clearance of neurodegenerative proteins
including tau, reduced cell loss and attenuated immune response
including TREM2 positive microgliosis^[201]31. DDR2 was previously
shown to oppose the osteoclastogenic effect of TREM2 by sequestering
its signalling partner PlexinA1^[202]33. Intriguingly, DDR2 shares
signalling partners with the neurodegenerative disease risk gene,
TREM2^[203]34, and both genes have functions in bone metabolism and
neurodegeneration. Our findings add DDR2 to the growing list of
neurodegenerative disease genes also involved in bone modelling.
Potent small molecule inhibitor of DDRs, nilotinib replicated the
experimental outcomes of DDR knockout models^[204]31. Nilotinib was
shown to be safe in randomized clinical trials for Parkinson’s
disease^[205]35–[206]37 and Alzheimer’s disease^[207]38, with a
reduction of cerebrospinal fluid (CSF) tau reported in some
studies^[208]35,[209]38. Our findings suggest that nilotinib or other
small molecules inhibiting DDR2 may also be viable therapeutic
candidates in PSP. We provide a list of small molecule inhibitors
retrieved from the Drug Gene Interaction database (DGIdb) in
Supplementary Data [210]5.
Our collective findings demonstrate glial expression changes in PSP in
a brain region relatively spared from gross neuropathological changes
in this condition^[211]11,[212]12, suggesting these changes are less
likely to be driven by neuropathology including neuronal loss or
gliosis. We detected a smaller number of genes significantly associated
with neuropathology than with PSP diagnosis. This is likely because the
analysis was carried out only in PSP cases, which has lower power both
owing to smaller sample size and smaller effect size. Furthermore, the
neuropathology phenotypes represent specific cellular tau pathology in
PSP, which may be associated with less transcriptional perturbations
than the binary overall disease phenotype. Nevertheless, our
cross-species multimodal systems biology approach prioritized three
genes as potential therapeutic targets in PSP. Additionally, the
internal replication of the PSP bulk transcriptome changes in two
independent studies with confirmation in snRNAseq provides the field
with hundreds of high-confidence genes well-annotated for their module
membership, cell-type enrichment, and biological processes.
Our findings also replicate prior transcriptome changes reported in PSP
including down-regulation of oligodendrocyte-enriched myelination gene
networks^[213]7 and up-regulation of immune networks with astrocytic
tau pathology^[214]8. Some of the most perturbed genes in PSP brains
have been implicated in the risk of PSP and shown to have regulatory
changes, namely oligodendrocytic MOBP^[215]2,[216]4,[217]7,
KANSL-AS1^[218]2,[219]4,[220]10 and astrocyte-enriched YAP1^[221]14.
Another astrocyte-enriched AD risk gene, CLU, is also significantly
higher in PSP brains and AD^[222]6, underscoring common transcriptome
changes in these two neurodegenerative diseases^[223]6 that may point
to shared disease mechanisms. Our findings and web tool are therefore
expected to be useful across different neurodegenerative disorders.
Despite these strengths and insights, there are limitations to our
study. We acknowledge that single-nucleus RNAseq captures only the
transcriptional activity within the nucleus and might potentially miss
biological changes relevant to PSP, which may be detectable by single
cell RNAseq (scRNAseq). Nevertheless, we were still able to validate in
snRNAseq the bulk RNAseq expression changes in the corresponding nuclei
type that led to the discovery of DDR2, KANK2, and STOM as potential
therapeutic targets for PSP. Moreover, snRNAseq allowed the use of
frozen brain tissue, which made it possible to include as many PSP
samples as possible. There are no PSP-specific mouse models. We used a
single mouse model of tauopathy rTg4510^[224]18 that despite
age-dependent brain deposition of human 4-repeat tau, neuronal, and
synaptic loss, is not strictly a model of PSP. Further, factors other
than human tau overexpression may be contributing to these mouse
phenotypes^[225]39. However, this model was still valuable for the
purpose of screening human DEGs with concordant changes in a mouse
model of tauopathy. We note that although many of the astrocytic,
microglial, and endothelia genes from modules M4 or M6 were validated
in rTg4510, this was not the case for oligodendrocytic DEGs. This
suggests that the rTg4510 tauopathy model may be capturing some but not
all aspects of the glial perturbations in PSP. Our available data will
enable future studies for conducting systematic cross-species
comparisons to identify models that recapitulate different aspects of
human brain molecular perturbations. This will be useful in selecting
appropriate models for experimental validations and pre-clinical
therapeutic trials. Moreover, carrying out cell-type-specific
overexpression or knockdown of the target genes in mouse models and
assessing the clinical impact will be informative to validate the
therapeutic target prior to clinical applications.
In our study, we elected to use a well-established Drosophila model of
tau-mediated neurodegeneration^[226]20,[227]40 for experimental
validations. Although this model has the advantage of facilitating a
relatively rapid and cost-effective medium-throughput experimental
screen, it is not a one-to-one translation to human brain disease.
Additionally, unlike the repo-Gal4 driver^[228]41, the Drosophila
GMR-Gal4 driver is not glia specific as it drives the expression of tau
in all cell types. Nevertheless one of the top prioritized genes DDR2,
has congruent findings in Drosophila when suppressed with RNAi as in
other in vitro and in vivo models^[229]31 and human clinical
trials^[230]35,[231]38. This provides proof-of-principle for the
utility of the fly as a facile screening tool to prioritize findings
for further validations by more cumbersome though also more complex
models.
In summary, our systems biology approach that integrates multimodal
omics and phenotype data across species with experimental validations
discovered DDR2, STOM and KANK2 as potential therapeutic targets in
PSP. Our findings demonstrate robust transcriptome changes in glia that
may underlie the striking glial tau-pathology in this disease. Our
findings and data we make available in the interactive web application
PSP RNAseq Atlas ([232]https://rtools.mayo.edu/PSP_RNAseq_Atlas/)
highlight the complex pathophysiology of PSP. Our web application, data
and results are also expected to serve as a resource for the research
community to apply their own paradigms for therapeutic target or
biomarker prioritizations in PSP or other neurodegenerative diseases.
Methods
Sample information
This study was approved by the Mayo Clinic Institutional Review Board
(IRB). The use of samples from the University of Kentucky and Banner
Sun Health Research Institute has been approved by the appropriate
review boards. Additional data used in this study from the AD Knowledge
Portal ([233]https://adknowledgeportal.synapse.org) were accessed under
the data usage agreement
([234]https://adknowledgeportal.synapse.org/DataAccess/DataUseCertifica
tes). All personally identifiable information has been removed. Written
informed consent was obtained from all participants, their qualified
caregivers or next of kin.
We have collected RNAseq from bulk brain tissue of 408 frozen,
post-mortem, temporal cortex (superior temporal gyrus) tissue samples
consisting of 127 control and 281 PSP patients from two independent
study cohorts (Table [235]S1). All PSP cases were obtained from the
Mayo Clinic Brain Bank and received a neuropathologic diagnosis of
PSP^[236]11,[237]12 by a single neuropathologist (DWD). All PSP samples
also underwent neuropathological evaluation for the overall and
cell-specific tau lesions (TA, CB, NFT, TauTh). The semi-quantitative
tau pathology scores were transformed and normalized as previously
described^[238]4,[239]8. Amongst brain donors with PSP, there was only
one patient with a MAPT mutation, namely the A152T variant which is
considered a rare risk factor for tauopathies and not a fully penetrant
pathogenic mutation^[240]42,[241]43.
Control samples were obtained from Banner Health Research Institute
(Banner), Mayo Clinic Brain Bank (Mayo), and University of Kentucky
(UKy). Controls were defined as those brain samples with Braak^[242]44
NFT stage of 3.0 or less; CERAD^[243]45 neuritic and cortical plaque
densities of 0 (none) or 1 (sparse) and lacked any of the following
pathologic diagnosis: AD, Parkinson’s disease (PD), DLB, VaD, PSP,
motor neuron disease (MND), CBD, Pick’s disease (PiD), Huntington’s
disease (HD), FTLD, hippocampal sclerosis (HipScl) or dementia lacking
distinctive histology (DLDH), as previously described^[244]46.
Study 1 comprises 199 PSP and 58 control samples (43 UKy, 15 Mayo),
whereas study 2 has 82 PSP and 69 control samples (44 Banner, 25 Mayo).
RNAseq and DEG from study 2 were previously reported^[245]6,[246]46.
Within each study cohort, there was well-balanced sex distribution
between PSP and controls. However, there were significant differences
in donor age at death, RNA integrity number (RIN) and brain bank
source, all of which were adjusted for in the analyses.
For the snRNAseq, all samples were obtained from the Mayo Brain Bank.
The PSP cases (N = 18) received a neuropathologic diagnosis of
PSP^[247]11,[248]12 by a single neuropathologist (DWD). The control
samples (N = 16) are defined as those brain samples that lack any of
the following pathologic diagnosis: AD, Parkinson’s disease (PD), DLB,
VaD, PSP, motor neuron disease (MND), CBD, Pick’s disease (PiD),
Huntington’s disease (HD), FTLD, hippocampal sclerosis (HipScl) or
dementia lacking distinctive histology (DLDH), as previously
described^[249]4,[250]8. Samples for snRNAseq were selected to ensure
that the PSP and control donors had comparable sex and age at death.
Additionally, we prioritized samples with available bulk tissue RNAseq
and/or whole genome sequencing data and for those that had sufficient
amount of available frozen tissue. Lastly, for the PSP samples, we
selected samples that represented the quantitative neuropathology score
distribution of the larger pool of available PSP samples.
RNAseq of the bulk human brain
Generation of raw bulk RNAseq data in the study 2 cohort was previously
described^[251]6,[252]46. Briefly, libraries were prepared from total
RNA using the TruSeq RNA Sample Prep Kit (Illumina, San Diego, CA) and
sequenced on Illumina HiSeq 2000 instruments. For study 1, cDNA
libraries were prepared using 200 ng of total RNA according to the
manufacturer’s instructions for the TruSeq RNA Sample Prep Kit v2
(Illumina, San Diego, CA). The concentration and size distribution of
the completed libraries were determined using an Agilent Bioanalyzer
DNA 1000 chip (Santa Clara, CA) and Qubit fluorometry (Invitrogen,
Carlsbad, CA). Libraries were sequenced at six samples per lane,
following Illumina’s standard protocol using the HiSeq 3000/4000 PE
Cluster Kit. The flow cells were sequenced as 100 ×2 paired-end reads
on an Illumina HiSeq 4000 using the HiSeq 3000/4000 sequencing kit and
HCS v3.3.20 collection software. Base-calling was performed using
Illumina’s RTA version 2.5.2.
Single nucleus RNAseq of human brains
Frozen temporal cortex tissue samples were obtained from the Mayo
Clinic Brain Bank. Total RNA from ~20 mg collected tissue was isolated
to evaluate the quality of tissue. RNA integrity number (RIN) was
determined via Agilent 2100 Bioanalyzer using RNA Pico Chip assay, and
tissues that have RIN > 6.0 were utilized in nuclei isolation and
single nucleus RNA sequencing (snRNAseq).
For each participant, 100 mg tissue sample was used for nuclei
isolation using a modified protocol^[253]47,[254]48. Samples were
homogenized with 25 strokes of loose and tight pestle sequentially
using dounce homogenizer in homogenization buffer (0.25 M sucrose,
25 mM KCl, 5 mM MgCl2, 20 mM tricine-KOH, pH 7.8, 1 mM DTT, 0.15 mM
spermine, 0.5 mM spermidine, protease inhibitors, 5 μ g/mL actinomycin,
5 u/ μL recombinant RNAase inhibitor, and 0.04% BSA). IGEPAL (5%,
Sigma, I8896) solution was added following stroke with the tight pestle
to a final concentration of 0.32%. After 10 additional strokes, the
tissue homogenate was filtered using a 30 μm cell strainer. Debris was
pelleted by centrifugation at 500 g for 5 minutes and washed with Wash
and Storage Buffer (WSB, 1XPBS with 2%BSA and 5 u/μL recombinant RNAase
inhibitor (Takara Bio, 2313 A)). The nuclei-containing supernatant was
filtered again with a 30 μm cell strainer, followed by centrifugation
at 500 g for 10 minutes. After re-suspending the pallet in 700 μl cold
PBS with 5 U/μl RNAse inhibitors, 300 μl debris removal solution
(Miltenyi Biotech) was added, and the solution was gently mixed.
Another 1 mL WSB was carefully overlaid on top of the nuclei solution.
The supernatant was removed after centrifugation at 3000 g for
10 minutes. The nuclei were washed once with WSB and pelleted after
centrifugation for 10 minutes at 1000 g.
Isolated nuclei were sorted using fluorescence-activated nuclei sorting
(FANS). Human Nuclear Antigen [235-1] (ab191181, Abcam) antibody was
applied to the nuclei at 1:50 and incubated for 1 hour on ice.
Concurrently, mouse IgG1, kappa monoclonal [15-6E10A7] isotype was
included as controls (ab170190, Abcam, 1:50). Goat anti-mouse Alexa488
secondary antibodies (ab150113, Abcam, 1:200) were incubated with the
nuclei for 30 minutes on ice. The stained nuclei were reconstituted in
WSB and sorted using BD FACSAria II sorter using the 70-micron nozzle
with 70 psi sheath pressure and 1.5 ND filter. Examples of the gating
strategy are shown in Figure [255]S19. After quantifying the sorted
nuclei using a hemocytometer in 0.04% trypan blue, a total of 1000
estimated nuclei at 700 nuclei/μl were loaded on the 10x Chromium
microchip. Single cell gel beads-in-emulsion (GEMs) were generated by
running Single Cell Instrument (10X Genomics) on the sorted nuclei.
Chromium Single Cell 3’ Gel Bead and Library Kit v3.1 (10X Genomics,
No. 120237) and the Single Index Kit T Set A (10X Genomics, No.
1000213) were used to prepare the single nucleus RNAseq libraries
according to the manufacturer’s instructions. Qualities of libraries
were checked using Agilent High Sensitivity DNA Kit (Agilent
Technologies, 5067-1504) via Agilent 2100 Bioanalyzer. DNA libraries
were sequenced at the Mayo Clinic Genome Analysis Core (GAC) using the
Illumina NovaSeq 6000 sequencer.
Bulk human brain RNAseq analyses and differential gene expression
Alignment and processing of the bulk RNAseq data in study 2 were
previously described^[256]46. For study 1, raw paired-end reads were
processed through MAP-RSeq pipeline v2.0^[257]49. MAP-RSeq removes
reads of low base-calling Phred scores, aligns remaining reads to
reference genome hg19 using TopHat aligner v2.0^[258]50, and counts
reads in genes and exons using subread. It obtains QC measures from
pre- and post-alignment reads using the RSeQC toolkit and
fastQC^[259]51. Subsequently, we identify samples for exclusion defined
as those samples with high RNA degradation, low mappability,
discordance between recorded sex and estimated sex, or based on
principal component analysis (PCA) such that samples whose PC1 or PC2
are outside mean +/− 3*SD.
After excluding samples that fail QC, raw RNA read counts from
remaining samples were normalized using R package CQN, which gives
library size, gene length, and GC content adjusted expression values in
the log2 scale. Based on the bimodal expression distribution, genes
with low CQN values (CQN < 2 for study 1, CQN < −1 for study 2) were
filtered out (Figure [260]S1). Only genes that are expressed in both
cohorts are used in the following analysis. Batch effect due to the
source of the samples (the brain bank from which the sample was
obtained) was corrected using the combat function from R package
sva^[261]52. The associations between the batch-corrected, normalized,
bulk gene expression and different clinicopathological traits (PSP
diagnosis, TA, CB, NFT, TauTh, and overall pathology) were assessed for
study 1 and study 2 separately using a multiple linear model adjusting
for technical and biological covariates including sex, age at death,
RIN, and sequencing flowcell. Gene expression levels were treated as a
continuous dependent variable, while the traits of interests were
treated as independent variables, coded as binary (PSP = 1, control =
0) or continuous (quantitative pathology scores) variables. For the
association of gene expression levels with quantitative pathology
scores, the tests were performed within PSP cases only, as controls do
not have pathology scores. The association effect sizes (regression
beta coefficients) in studies 1 and 2 were combined using an
inverse-variance weighting meta-analysis, implemented in R package
‘meta‘^[262]53. A fixed-effect model was used for gene-trait
associations when Higgin’s and Thompson’s I^2 heterogeneity
value^[263]54 was less than 0.3, while a random-effect model was used
otherwise. Lastly, we adjusted for multiple testing using the
Benjamini-Hochberg false discovery rate^[264]55 (FDR).
Cell proportions were estimated based on the expression values of the
top 100 BRETIGEA cell type marker genes^[265]13 for astrocyte,
endothelia, microglia, neuron, and oligodendrocyte using the Digital
Sorting Algorithm (DSA)^[266]56. The association between PSP vs Control
diagnosis and gene expression was assessed in a comprehensive model
that adjusted for key covariates (sex, age at death, RIN, sequencing
flowcell) and the estimated cell proportions in each study.
Meta-analysis was carried out using the same strategy outlined for the
main model. Multiple testing was adjusted using FDR.
Bulk human brain RNAseq co-expression network analyses
Gene expression networks were constructed using the R package
WGCNA^[267]15. Before the network analysis, residuals from the
batch-corrected, normalized gene expression data after adjusting for
technical and biological covariates (sex, age, RIN, flowcell number)
were generated for study 1 and study 2 separately. We calculated the
adjacency matrices
[MATH:
Aij
:MATH]
using the pairwise biweighted midcorrelation (bicor), whose element
[MATH:
aij=bicor(
mo>gi,gj) :MATH]
, for study 1 and study 2 separately. Topological overlap matrices
(TOMs) were calculated based on their adjacency matrices as a signed
network with a soft power threshold of 12 for studies 1 and 2
separately. Subsequently, after applying quantile scaling, we
calculated the consensus TOM as the component-wise minimum of the TOMs
from study 1 and study 2. The modules were identified using
hierarchical clustering and dynamic treecut algorithm provided in the
WGCNA package from consensus TOM.
From these initial modules, we identified the final modules as follows.
We calculated the module eigengenes (ME) as the signed first principal
component using gene expression of initial modules. The module
memberships (MM) were calculated as the biweighted midcorrelation of
the ME and gene expression. Closely related modules, defined as
biweighted midcorrelation of ME higher than 0.8, were merged. We
further examined and reassigned genes in each module based on their MM.
Specifically, genes with negative MM were moved to the background
module (module 0). In contrast, genes in the initial background modules
were reassigned to the module with maximum MM if the maximum MM was
greater than 0.5. We recalculated the ME and MM values using the final
module definitions.
A t-distributed Stochastic Neighbor Embedding (t-SNE) plot
(Figure [268]S6) was made for final modules using gene expression of
the combined cohort of study 1 and 2 through R package Rtsne as
described in other studies^[269]57. In addition, statistics were
calculated regarding the preservedness of final modules in studies 1
and 2 separately (Figure [270]S6). We assessed the module preservations
using the WGCNA::modulePreservation() function^[271]58, and calculated
preservation Z statistics with 100 permutations.
Biweighted midcorrelation between the ME and traits (PSP diagnosis, TA,
CB, NFT, TauTh, and overall pathology) were calculated for each module.
The association significance p values were adjusted to the number of
modules using Bonferroni correction. Based on the module assignment, we
tested if there were any significant enrichment of the top 100
BRETIGEA^[272]13 cell type marker genes in any of the modules using a
one-sided version of Fisher’s exact test and a Bonferroni adjusted
p-value cut-off of 0.05. Additionally, we calculated the gene ontology
(GO) terms enriched in the modules using the R package
anRichment^[273]15, which calculates the statistical significance of
the overrepresentation of GO terms based on hypergeometric distribution
with an FDR-adjusted p-value of 0.05. Modules were manually annotated
with the top GO terms from the list. (Figures [274]S7,[275]8).
Human Brain Single Nucleus RNAseq (snRNAseq) Validations
Raw snRNAseq data were processed and aligned using Cell Ranger version
4.0 (10X Genomics). Raw reads were mapped to the human reference genome
hg38 using the STAR aligner. We obtained an average of 924 nuclei
(standard deviation: 469, N = 36) per sample and performed quality
control for each individual (Figure [276]S9). First, nuclei with an
extreme number of mapped UMIs (lower bound 1000, upper bound 98th
percentile, or >=85,906.5, Figures [277]S9a, b) or detected genes
(lower bound 500, upper bound 98^th percentile >=9,648.8,
Figures [278]S9c, d) were removed. Nuclei with more than 10%
mitochondrial genes were also excluded (Figure [279]S9e, f). At a
sample level, PSP sample 12117 was removed from the analysis because it
has an abnormal distribution in terms of the number of mapped UMIs and
the number of detected genes (Figure [280]S9g). PSP sample 12051 was
removed due to the low number of nuclei (N = 117, Figure [281]S9h).
Genes that are only expressed (defined as having a count of greater
than 0) in less than 6 nuclei are excluded (Figure [282]S9i). The
recorded sex for each individual was compared against that of the
median expressions of 4 chromosome Y genes RPS4Y1, EIF1AY, DDX3Y, and
KDM5D^[283]59 to identify any potential mislabeled samples
(Figure [284]S9j).
Using Seurat R package^[285]16, we merged UMI counts from each sample,
performed library size normalization, and log transformation. We
integrated the dataset using the Harmony package^[286]60, treating each
sample as its own batch. Nuclei were clustered using a Shared Nearest
Neighbor (SNN) Graph implemented in Seurat^[287]16 at a resolution of
0.4 with default parameters. We checked if there was significant
enrichment (fold-enriched > 5, unadjusted p-value < 0.05) of nuclei
from any diagnosis, sex, or sample in each cluster (Figures [288]S10,
[289]11).
To elucidate cluster cell type, we first identified cluster marker
genes that are significantly (FDR < 0.05) over-expressed (logFC > 0.5)
and universally expressed (percent nuclei > 70%) in each cluster
compared to other clusters. We annotated each cluster based on the
overlap of their highly overexpressed cluster-marker genes and known
cell-type marker genes^[290]61–[291]66 (Fig. [292]3b): GFAP and AQP4
for astrocyte, VWF, PECAM1, FLT1 for endothelia, NRGN and SLC17A7 for
excitatory neurons, GAD1 and GAD2 for inhibitory neuron, C3, CD74, and
CSF1R for microglia, GRIN1, SNAP25, and SYT1 for neurons, MBP, MOBP,
and PLP1 for oligodendrocyte, CSPG4, PDGFRA, and VCAN for
oligodendrocyte progenitor cells (opc), and PDE5A and PDGFRB for
pericyte. Additional cluster cell type assignment was performed with
published databases (Figure [293]S12, [294]13)^[295]13,[296]17, which
yielded consistent annotations.
Differentially expressed genes (DEG) between PSP and control nuclei
were identified for each cluster through a hurdle model implemented in
MAST R package^[297]67, with adjustment of sex and age. We required DEG
to be detected (UMI > 0) in at least 20% of the nuclei in a cluster.
Multiple testing was adjusted for using FDR < 0.05. We performed
pathways enrichment analysis using FUMA GWAS web service
(v1.4.0)^[298]68 using all 22,431 expressed snRNAseq genes as
background. Multiple testing was adjusted for using the
Benjamini-Hochberg method. MigSigDB v7.0 is used. A minimal overlap of
2 genes was required.
Pseudobulk DEG comparing the expression between PSP and control
participants was performed for each cell type separately (Supplementary
Data [299]4, Table [300]S9). Briefly, the pseudo-count matrix was
obtained by adding the read count from all the nuclei of the same
individual. The pseudo count between PSP and control individuals was
compared using a negative binomial generalized linear model implemented
in R package edgeR^[301]69, with adjustment of sex and age. P-values
were corrected for multiple testing using FDR.
For each nucleus in the snRNAseq analysis, its expression scores, which
reflect the expression levels of a selected set of genes, were
calculated as the average expression levels of the module genes based
on the module definition from the 16 WGCNA modules using the
AddModuleScore() function implemented in Seurat^[302]16. We assessed
the overlap between module genes and PSP vs Control DEGs in all
clusters. Significance was calculated using a one-sided Fisher’s exact
test. Only single-nucleus DEGs that are also detected in bulk RNAseq
are used in the analysis.
We utilized the snRNAseq data to validate and filter the bulk RNASeq
DEGs based on our prioritization approach (Fig. [303]4a). We focused on
the significant bulk DEGs (FDR < 0.05 in the meta-analysis of study
1 + 2) from the PSP-associated co-expression modules that are also
enriched in brain cell types (i.e., modules 3, 4 and 6). There were
4969 such DEGs, 550 of which were also highly-connected network hub
genes with module membership (MM) > 0.7. We filtered these 550
significant PSP DEG hub genes by selecting those that are also
significant DEGs in the snRNAseq data cluster corresponding to the bulk
module cell type and that have a concordant direction of change between
bulk and snRNAseq. Specifically, for M3 candidate genes, we selected
those that are significantly down-regulated in any of the
oligodendrocyte clusters (CL0 or CL26). For the M4 candidate genes, we
selected those that are upregulated in the astrocyte cluster (CL2).
Lastly, for the M6 candidate genes, we selected those that are
upregulated in any of the microglia (CL3, CL27), endothelia (CL19),
pericytes (CL12, CL17), or astrocytes (CL2) clusters.
To elucidate additional cell population, the astrocyte nuclei were
subjected to subclustering analysis (Figure [304]S17). Briefly, read
counts of nuclei from astrocyte cluster CL2 were normalized,
log-transformed, and harmonized using the same procedures for the full
dataset. Clustering was carried out at a resolution of 0.3 with default
parameters. Subcluster marker genes were identified using the Wilcoxon
Rank Sum test. Differentially expressed genes between PSP and Control
nuclei in each subcluster were identified with the same approach used
for the full dataset.
Validation with rTg4510 Tauopathy Mouse Model Brain RNAseq
Bulk brain RNAseq data of the tauopathy mouse model rTg4510^[305]18 was
retrieved from the AD Knowledge Portal (syn3157182). Briefly, the
dataset consists of bulk RNAseq from forebrain samples of rTg4510 mice
overexpressing human P301L tau (4R0N) and wild-type non-transgenic
(nonTg) littermate control mice. Using the RNAseq data from the 4.5-
and 6-month-old (Table [306]S6) mice, we normalized the gene expression
values using CQN, followed by QC to check for outliers or mismatched
sex (Figures [307]S20,[308]21). Read counts between rTg4510 and nonTg
mice were compared in each dataset within each age group using a
negative binomial generalized linear model implemented in R package
edgeR^[309]69, with adjustment of sex and RIN. DEGs were identified
based on an FDR-adjusted p-value of 0.05. Mouse genes were mapped to
their human orthologs using R package biomaRt^[310]70 and ensemble
version 105.
Using the mouse DEG information, we investigated the 155 human DEGs
(Fig. [311]4a) that were significant and had congruent changes in both
bulk and snRNAseq and were also hubs in the prioritized modules M3, M4,
and M6. These human DEGs were considered validated in the mouse model
if significant at an FDR-adjusted level of 0.05 in either 4.5- or
6-month rTg4510 mouse brains and had the same direction of change as in
humans.
Validation with Drosophila Tau Model Experiments
High confidence PSP glial DEGs with significant and congruent changes
in human bulk and snRNAseq and rTG4510 mouse model brains were
experimentally validated in a Drosophila melanogaster model expressing
human wild-type Tau protein^[312]20. Briefly, Drosophila orthologs of
the human gene are obtained by querying DIOPT^[313]71 version 8. We
obtained all available RNAi lines against the Drosophila homologues of
the high confidence PSP glial DEGs (Fig. [314]4a) from the Bloomington
Drosophila Stock Center (BDSC), Transgenic RNAi Project (TRiP)
collection or requested from laboratories within the fly research
community^[315]72 (Tables [316]S9, [317]10). We crossed the
GMR-GAL4/CyO; UAS-hTau/TM3, to RNAi lines and selected progeny that
co-expressed both human Tau and RNAi (GMR-GAL4/+; UAS-hTau/UAS-RNAi,
where the UAS-RNAi can be on any chromosome). We first compared the
morphology of their eyes with the control flies expressing only the tau
protein using a −4 to 4 semi-quantitative scale where 0 indicated no
change compared to the tau-expressing control. A positive score
indicates enhancement/exacerbation of the well-described eye
neurodegeneration pathology, whereas a negative score indicates
suppression/rescue of this pathology. A score of 4 indicated that the
flies had no eyes, whereas a score of −4 indicated that the eyes were
indistinguishable from that of the wild-type control. If the flies fail
to eclose, we indicate the phenotype as being lethal as previously
described^[318]40.
To assess the impact of the three top tau-toxicity suppressor genes, we
aged the progenies for 5 days at 25 °C before pictures of the fly left
eyes were taken. The severity of the tau-induced eye degeneration was
assessed blindly based on the following categories: loss of bristle
(0-1), size (0-1), color (0-2), the presence of necrotic patterns
(0-2), the collapse of the eye (0-2), and the loss of ommatidia (0-2),
where a higher score indicates a more severe phenotype. Scores were
provided by two independent evaluators separately, and the average
scores were used for analysis.
Data sharing and interactive web tool
We built a web application to enable the interactive exploration of our
data called the PSP RNAseq Atlas
([319]https://rtools.mayo.edu/PSP_RNAseq_Atlas/), which is freely
available to the broad research community. The PSP RNAseq Atlas is
searchable by gene name and provides results for human bulk RNAseq PSP
DEG, neuropathology associations, snRNAseq, rTG4510 brain associations,
Drosophila tau model results and any available therapies against the
prioritized genes based on the Drug Gene Interaction Database (DGIdb)
v4.0^[320]73. In addition, all human and mouse RNAseq data in this
manuscript is available via the AD Knowledge Portal
([321]https://adknowledgeportal.synapse.org). The AD Knowledge Portal
is a platform for accessing data, analyses and tools generated by the
Accelerating Medicines Partnership (AMP AD) Target Discovery Program
and other National Institute on Aging (NIA)-supported programs to
enable open- science practices and accelerate translational learning.
Data is available for general research use according to the following
requirements for data access and data attribution
([322]https://adknowledgeportal.synapse.org/DataAccess/Instructions).
Reporting summary
Further information on research design is available in the [323]Nature
Portfolio Reporting Summary linked to this article.
Supplementary information
[324]Supplementary Information^ (24.3MB, pdf)
[325]Peer Review File^ (5.4MB, pdf)
[326]41467_2023_42626_MOESM3_ESM.pdf^ (86.4KB, pdf)
Description of Additional Supplementary Files
[327]Supplementary Data 1^ (16.3MB, xlsx)
[328]Supplementary Data 2^ (860.4KB, xlsx)
[329]Supplementary Data 3^ (817.9KB, xlsx)
[330]Supplementary Data 4^ (12.9MB, xlsx)
[331]Supplementary Data 5^ (1.1MB, xlsx)
[332]Reporting Summary^ (3.8MB, pdf)
Source data
[333]Source Data^ (42.6MB, xlsx)
Acknowledgements