Abstract

Introduction

   Recent advances in generating massive single‐cell/nucleus
   transcriptomic data have shown great potential for facilitating the
   identification of cell type–specific Alzheimer's disease (AD)
   pathobiology and drug‐target discovery for therapeutic development.

Methods

   We developed The Alzheimer's Cell Atlas (TACA) by compiling an AD brain
   cell atlas consisting of over 1.1 million cells/nuclei across 26 data
   sets, covering major brain regions (hippocampus, cerebellum, prefrontal
   cortex, and so on) and cell types (astrocyte, microglia, neuron,
   oligodendrocytes, and so on). We conducted nearly 1400 differential
   expression comparisons to identify cell type–specific molecular
   alterations (e.g., case vs healthy control, sex‐specific,
   apolipoprotein E (APOE) ε4/ε4, and TREM2 mutations). Each comparison
   was followed by protein‐protein interaction module detection,
   functional enrichment analysis, and omics‐informed target and drug
   (over 700,000 perturbation profiles) screening. Over 400 cell‐cell
   interaction analyses using 6000 ligand‐receptor interactions were
   conducted to identify the cell‐cell communication networks in AD.

Results

   All results are integrated into TACA
   ([38]https://taca.lerner.ccf.org/), a new web portal with cell
   type–specific, abundant transcriptomic information, and 12 interactive
   visualization tools for AD.

Discussion

   We envision that TACA will be a highly valuable resource for both basic
   and translational research in AD, as it provides abundant information
   for AD pathobiology and actionable systems biology tools for drug
   discovery.

Highlights

     * We compiled an Alzheimer's disease (AD) brain cell atlas consisting
       of more than 1.1 million cells/nuclei transcriptomes from 26 data
       sets, covering major brain regions (cortex, hippocampus,
       cerebellum) and cell types (e.g., neuron, oligodendrocyte,
       astrocyte, and microglia).
     * We conducted over 1400 differential expression (DE) comparisons to
       identify cell type–specific gene expression alterations. Major
       comparison types are (1) AD versus healthy control; (2)
       sex‐specific DE, (3) genotype‐driven DE (i.e., apolipoprotein E
       [APOE] ε4/ε4 vs APOE ε3/ε3; TREM2^R47H vs common variants)
       analysis; and (4) others. Each comparison was further followed by
       (1) human protein‐protein interactome network module analysis, (2)
       pathway enrichment analysis, and (3) gene‐set enrichment analysis.
     * For drug screening, we conducted gene set enrichment analysis for
       all the comparisons with over 700,000 drug perturbation profiles
       connecting more than 10,000 human genes and 13,000 drugs/compounds.
     * A total of over 400 analyses of cell‐cell interactions against 6000
       experimentally validated ligand‐receptor interactions were
       conducted to reveal the disease‐relevant cell‐cell communications
       in AD.

   Keywords: Alzheimer's disease, database, drug repurposing, network
   pathobiology, single‐cell, single‐nucleus, target identification,
   transcriptomics

1. INTRODUCTION

   Alzheimer's disease (AD) is a devastating neurodegenerative disease now
   affecting 6.5 million Americans aged 65 and older, a number projected to
   double to 13.8 million by 2060.[39] ^1 More than 11 million family
   members and unpaid caregivers provided an estimated $271.6 billion in care
   to people with AD and other dementias in 2021,[40] ^1 while the
   attrition rate for AD clinical trials (2002–2012) is estimated at over
   99%.[41] ^2 The underlying disease etiology and molecular mechanisms of
   AD are under investigation.[42] ^3 , [43]^4 , [44]^5 , [45]^6 The
   genetic predisposition to AD involves a complex, polygenic, and
   pleiotropic genetic architecture.[46] ^7 , [47]^8 The traditional
   reductionist paradigm overlooks the inherent complexity of AD and often
   leads to incomplete evidence on disease initiation, progression, or
   modification.[48] ^9 Existing multi‐omics data resources, including
   genomics, transcriptomics, proteomics, and interactomics
   (protein‐protein interactions [PPIs]), have not been fully utilized and
   integrated to identify pathobiology and support therapeutic development
   for AD and AD‐related dementias (ADRDs). For example, tools such as
   Single Cell Portal
   ([49]https://singlecell.broadinstitute.org/single_cell) and CELLxGENE
   ([50]https://cellxgene.cziscience.com/) have an extensive number of
   single‐cell/nucleus (sc/sn) omic data sets. These tools focus on
   visualizing cells (annotations) and gene expressions, but have not
   utilized resources such as PPIs to reveal underlying disease
   pathobiology and actionable targets, or utilized drug perturbation
   profiles for therapeutic discoveries. There is an urgent need to develop
   genome‐wide, systems approaches or resources to identify likely
   molecular drivers and disease networks, which will enable a more
   complete mechanistic understanding of AD/ADRDs and assist in
   identifying effective treatments.[51] ^10 , [52]^11 , [53]^12

   Recent breakthroughs in sc/sn RNA‐sequencing (RNA‐seq) technologies
   have advanced our understanding of AD/ADRDs.[54] ^13 , [55]^14 For
   example, using 5XFAD mouse model scRNA‐seq data, a novel microglia
   subtype termed disease‐associated microglia (DAM) was discovered that
   co‐localized with amyloid beta (Aβ) plaques.[56] ^13
   Disease‐associated astrocytes (DAAs) were also discovered using a
   snRNA‐seq data set; they occurred in AD mouse models and increased
   with disease progression.[57] ^14 Using a large‐scale human snRNA‐seq
   data set, researchers discovered two distinct microglial subclusters in
   patients with AD that correlated with Aβ plaques and tau pathology,
   respectively.[58] ^15 Marked disease heterogeneity of AD may have been
   one of the leading causes of the high failure rate of AD clinical
   trials.[59] ^16 These sc/sn studies advance our understanding of the
   heterogeneity of AD and offer cell type–specific actionable targets
   and, therefore, have great potential in target identification and
   precision‐medicine drug repurposing for AD.[60] ^10 , [61]^12 , [62]^17
   For example, using endophenotype network and population‐based
   validation, we identified that sildenafil use was associated
   significantly with a 69% reduced likelihood of AD, potentially by
   promoting neurite growth and decreasing phospho‐tau expression in
   patients with AD.[63] ^10 Using sc/sn RNA‐seq data and network‐based
   methodologies, our team identified both unique and shared immune
   pathways between DAM and astrocytes, and performed network‐based
   predictions that identified fluticasone as a potential treatment for
   AD.[64] ^17

   Although there has been a surge of new AD‐related sc/sn RNA‐seq data
   sets in the past few years,[65] ^13 , [66]^14 , [67]^15 , [68]^18 ,
   [69]^19 , [70]^20 , [71]^21 , [72]^22 , [73]^23 , [74]^24 , [75]^25 ,
   [76]^26 , [77]^27 , [78]^28 , [79]^29 , [80]^30 , [81]^31 extracting the
   insights embedded in these data remains difficult for several reasons. The
   majority of the original studies of these heterogeneous data sets focus
   on specific aspects of AD, although some studies such as Mathys
   et al.[82] ^32 and Grubman et al.[83] ^19 provide a comprehensive view
   of AD biology in a cell type–specific manner. Researchers frequently
   need to re‐run the single‐cell analysis pipelines for their tasks due
   to limited access to processed data and results, and such analyses
   require a large amount of computing resources. The application of
   state‐of‐the‐art techniques, such as network pathobiology mapping, has
   been lacking for these data sets. To overcome these limitations, we
   built The Alzheimer's Cell Atlas (TACA), which contains abundant
   AD‐related sc/sn transcriptomic information and various types of
   large‐scale transcriptomic and systems biology analysis results for the
   identification of cell type–specific AD pathobiology and target
   discovery for rapid translational therapeutic development (e.g., drug
   repurposing).

RESEARCH IN CONTEXT

    1. Systematic review: We reviewed the literature using traditional
       sources (i.e., PubMed) and found a surge in the number of
       Alzheimer's disease (AD) single‐cell/nucleus multi‐omics data sets
       in the past few years. Yet, genome‐wide, systems biology approaches
       or resources that utilize these large‐scale data to identify likely
       molecular drivers, disease networks, and drug targets are still
       lacking. The development of a portal for these analysis results
       will enable a more complete mechanistic understanding of AD and
       assist in identifying treatments.
    2. Interpretation: We compiled an AD brain cell atlas (termed The
       Alzheimer's Cell Atlas [TACA], [84]https://taca.lerner.ccf.org/)
       consisting of more than 1 million cells/nuclei from 26 data sets,
       covering major brain regions (cortex, hippocampus, cerebellum, and
       so on) and cell types (neuron, oligodendrocyte, astrocyte,
       microglia, and so on). We developed a web portal with 12
       interactive visualization tools (including cells, targets, drugs,
       and networks) and databases incorporating large‐scale single
       cell/nucleus transcriptomic data, various biological networks, and
       analysis results to facilitate the identification of cell
       type–specific AD pathobiology and drug‐target identification for
       therapeutic discovery.
    3. Future directions: We envision that TACA will be a highly valuable
       resource for both basic and translational research for AD, owing to
       the abundant information it contains for the AD pathobiology and
       the actionable systems biology tools it is equipped with for
       therapeutic discovery. We will continue to bring more single
       cell/nucleus transcriptomic data and more types of analysis results
       and visualizations into TACA.

2. METHODS

   The construction of TACA involved three steps: data collection
   (Figure [85]1A), data analysis (Figure [86]1B), and construction and
   implementation of the database and web portal (Figures [87]1C
   and [88]2A). Detailed methods can be found in the Supplementary
   Methods. In this initial version of TACA, three interactive explorers
   were implemented for genes (Figure [89]2B), drugs (Figure [90]2C), and
   sc/sn data sets (Figures [91]1C and [92]2D), respectively. A total of
   12 visualization tools were implemented, among which seven are for
   different types of network visualizations.

FIGURE 1.

   Overview of the information architecture and functions of The
   Alzheimer's Cell Atlas. (A) We have collected and assembled multiple
   types of data and networks, including single‐cell/nucleus (sc/sn)
   RNA‐sequencing (RNA‐seq) data sets, ligand‐receptor interactions
   (LRIs), protein‐protein interactions (PPIs), drug‐target interactions,
   and gene‐quantitative trait locus (QTL) associations. (See Table [94]S1
   and Supplementary Methods for more details of the data sources and
   preprocessing steps.) In total, we obtained over 1.1 million
   cells/nuclei from the transcriptomic data sets. We curated the metadata
   of the samples in the data sets from the GEO database and original
   publications, which enabled a comprehensive analysis of differential
   expression (DE) comparisons and cell‐cell interactions (CCIs). AD,
   Alzheimer's disease; CV, common variant; MCI, mild cognitive
   impairment. (B) The analysis pipeline of TACA. We adopted a standard
   sc/sn RNA‐seq processing pipeline as shown. We referred to the original
   publications of these data sets for cutoffs for gene and cell
   filtering, dimensional reduction technique selection (i.e., UMAP or
   tSNE), marker genes for detecting cell types, and other additional
   processing steps if used in the original publication. Otherwise, we
   integrated the quality‐controlled data sets (cells filtered by
   mitochondrial gene expression, number of features detected, etc.), performed
   dimensional reduction and clustering to annotate cell types, and
   exported the processed data for use in downstream analyses and the TACA
   webserver. For DE and CCI analyses, we defined possible analysis
   strategies and our pipeline conducted these analyses systematically
   (see Supplementary Methods). The differentially expressed genes (DEGs)
   were analyzed subsequently for PPIs, functional enrichment analysis,
   and virtual drug screening against over 700,000 chemical perturbation
   profiles. (C) Overview of the main tools (indicated by the tabs) and
   visualization types (indicated by the sample charts) available in the
   gene, drug, and sc/sn RNA‐seq data set explorers in TACA. The tools in
   the data set explorer are organized as trees corresponding to the
   analysis pipeline. For example, drug‐screening results can be accessed
   from the DE tool, when a specific DE comparison is selected. TACA
   incorporated several types of network visualizations for various types
   of biological relationships. These tools and visualizations are
   explained in more detail in Figures [95]2, [96]3, [97]4 and the Results
   section. MOA, mechanism of action.

FIGURE 2.

   Drug, gene/target, and data set explorers in TACA. (A) The home page
   provides search tools for genes (B) and drugs (C) that direct users to
   the gene and drug explorers. All data sets in TACA can be listed by
   clicking the “human” or “mouse” buttons, and each data set has its own
   data set explorer page (D). (B) A gene explorer page shows the basic
   gene information, gene‐quantitative trait locus (QTL) associations, and
   protein‐protein interaction (PPI) neighbors of the gene of interest.
   (C) A drug explorer page shows the basic drug information, the
   structure, and the drug‐target network of the drug of interest. PPIs
   among the targets are shown as gray edges. (D) A data set explorer page
   that currently displays the dimensional reduction plot colored by cell
   types. Various tools can be accessed from the navigation panel on the
   left side of the page. Several tools are grayed out upon page loading,
   indicating that they are downstream analyses whose results become
   available to view only when the upstream analysis is selected. The help
   information for each tool can be accessed using the “help” button in
   the top header.

3. RESULTS

3.1. Overall design of TACA

   In this study, we compiled an AD brain sc/sn atlas consisting of more
   than 1.1 million cells/nuclei from over 400 human/mouse samples across
   26 data sets ([99]Tables S1 and [100]S2). All data sets were processed
   in consistent pipelines. We exhaustively compared gene expression among
   groups by automating the differential expression (DE) analyses using
   metadata that we curated from the original studies, reaching 1400
   comparisons (Table [101]S3). Major comparison types are (1) case versus
   healthy control, (2) sex‐specific DE, (3) genotype‐driven DE (i.e.,
   apolipoprotein E (APOE) ε4/ε4 vs APOE ε3/ε3; TREM2^R47H vs common
   variants), and (4) others. Each comparison was accompanied by network
   analysis to reveal PPI modules, functional analyses to reveal the
   enriched pathways and biological processes, and gene set enrichment
   analyses for target and drug screening for more than 700,000 chemical
   perturbation profiles. We performed an exhaustive search of cell‐cell
   interactions (CCIs) using a comprehensive ligand‐receptor interaction
   (LRI) network that we have compiled (Table [102]S4), achieving over 400
   CCI analyses.
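
   As an illustration of how DE comparisons might be enumerated
   automatically from curated metadata, the following minimal Python
   sketch lists every pair of groups within each subset of samples. The
   field names REGION and GROUP follow the strategy labels quoted in
   Section 3.4; the sample records and the function name
   enumerate_comparisons are hypothetical, and the actual strategy
   definitions used by TACA are those given in the Supplementary Methods.

```python
from itertools import combinations

def enumerate_comparisons(samples, subset_field, group_field):
    """Yield (subset value, group A, group B) for every pair of groups
    that both have at least one sample within the same subset."""
    groups_per_subset = {}
    for sample in samples:
        groups_per_subset.setdefault(sample[subset_field], set()).add(sample[group_field])
    for subset_value in sorted(groups_per_subset):
        for a, b in combinations(sorted(groups_per_subset[subset_value]), 2):
            yield subset_value, a, b

# Hypothetical sample metadata (not taken from any TACA data set).
samples = [
    {"id": "s1", "REGION": "OC", "GROUP": "AD"},
    {"id": "s2", "REGION": "OC", "GROUP": "CTR"},
    {"id": "s3", "REGION": "OC", "GROUP": "CTR+"},
    {"id": "s4", "REGION": "OTC", "GROUP": "AD"},
    {"id": "s5", "REGION": "OTC", "GROUP": "CTR"},
]
comparisons = list(enumerate_comparisons(samples, "REGION", "GROUP"))
```

   Each resulting tuple would correspond to one DE comparison to be run
   per cell type; covariate adjustment (e.g., for age and sex) is outside
   the scope of this sketch.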

   All results were integrated into a new web service, TACA, with
   interactive data set, gene, and drug explorers, and a wide range of
   visualization tools, such as dimensional reduction plots for cell types
   and gene expression, volcano plots and PPI networks for the
   differentially expressed genes (DEGs), LRI networks for CCIs, and
   mechanism‐of‐action plots for chemical perturbation profiles against the
   DEGs. All visualized networks can be modified interactively, downloaded
   as images, and exported for use on the users’ own computers. All other
   types of visualization tools provide panning, scaling, selecting, and
   downloading as images, and offer informative messages when users hover
   over data points.

3.2. Interface and main functions of TACA

   On the home page of TACA (Figure [103]2A), users can search for genes
   and drugs, which will lead users to their respective explorer pages. In
   the gene explorer (Figure [104]2B), the basic information, gene‐xQTL
   associations (including expression quantitative trait locus [eQTL] and
   protein quantitative trait locus [pQTL]) (Table [105]S5), and the PPI
   network centered on the selected gene are shown. Gene‐xQTL
   associations are categorized as positive (β > 0) and negative (β < 0)
   associations in two separate tables. The PPI network can help identify
   important neighbor genes (blue nodes) that may serve as targets of
   drugs to indirectly affect the gene of interest (yellow node). In the
   drug explorer (Figure [106]2C), basic drug information and the
   drug‐target network of the selected drug are shown.
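
   The split of gene‐xQTL associations by effect direction can be
   sketched in a few lines of Python. This is a minimal illustration, not
   the portal's code: the record fields, the placeholder gene and variant
   names, and the decision to leave associations with β = 0 out of both
   tables are assumptions.

```python
from typing import NamedTuple

class XqtlAssociation(NamedTuple):
    gene: str
    variant: str    # hypothetical variant identifier
    qtl_type: str   # "eQTL" or "pQTL"
    beta: float     # effect size of the association

def split_by_effect_direction(associations):
    """Return (positive, negative) lists; beta == 0 goes into neither."""
    positive = [a for a in associations if a.beta > 0]
    negative = [a for a in associations if a.beta < 0]
    return positive, negative

# Placeholder records for illustration only.
example = [
    XqtlAssociation("GENE_A", "variant_1", "eQTL", 0.31),
    XqtlAssociation("GENE_A", "variant_2", "pQTL", -0.12),
    XqtlAssociation("GENE_A", "variant_3", "eQTL", -0.40),
]
positive, negative = split_by_effect_direction(example)
```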

   The home page lists all the sc/sn data sets and serves as their entry
   points. Each data set is shown in a dedicated data set explorer page.
   The data set explorer (Figure [107]2D) is composed of a navigation
   panel on the left (Figure [108]3A,B) and a shared space on the right
   for the currently selected tool from the navigation panel
   (Figure [109]3C–G). The tools in the navigation panel are organized in
   a tree format corresponding to the analysis pipeline. For example, DE
   is the upstream analysis of drug screening (downstream analysis) that
   utilized the DEGs, whereas drug screening is the upstream analysis of
   drug‐perturbation network (downstream analysis) (Figure [110]3A). The
   downstream tool buttons are grayed out initially, and are (re)enabled
   when an upstream analysis is selected. Selecting a different upstream
   analysis will reset its downstream tools. For example, when a DE
   comparison is selected, its associated PPI network, functional
   enrichment analysis, and drug screening results may become available to
   view using the buttons on the DE tool page initially and can be
   accessed later from the navigation panel until another DE selection is
   made. The details of the currently selected DE comparison are shown
   below the DE navigation button; the same applies to the drug screening
   and CCI analysis tools.
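
   The enable/reset behavior of the navigation tree can be modeled as a
   small dependency map. The Python sketch below is only an illustration
   of the logic described above; the tool identifiers, the UPSTREAM map,
   and the ToolTree class are hypothetical and do not reflect the actual
   TACA implementation.

```python
# Hypothetical dependency map: each tool names its upstream tool
# (None means the tool is always enabled).
UPSTREAM = {
    "differential_expression": None,
    "ppi_network": "differential_expression",
    "functional_enrichment": "differential_expression",
    "drug_screen": "differential_expression",
    "perturbation_network": "drug_screen",
}

class ToolTree:
    def __init__(self, upstream):
        self.upstream = upstream
        self.selection = {}  # tool -> currently selected analysis

    def is_enabled(self, tool):
        parent = self.upstream[tool]
        return parent is None or parent in self.selection

    def select(self, tool, analysis):
        if not self.is_enabled(tool):
            raise ValueError(f"{tool} requires an upstream selection first")
        # A new upstream selection invalidates everything downstream of it.
        self._reset_downstream(tool)
        self.selection[tool] = analysis

    def _reset_downstream(self, tool):
        for child, parent in self.upstream.items():
            if parent == tool:
                self.selection.pop(child, None)
                self._reset_downstream(child)

tree = ToolTree(UPSTREAM)
```

   In this model, selecting a different DE comparison clears the drug
   screening selection, which in turn disables the perturbation network
   until a new perturbation is chosen.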

FIGURE 3.

   The data set page in TACA. (A) The main functions of the data set page
   are organized corresponding to the analysis pipeline. (B) A closer view
   of the navigation panel. Information regarding the currently selected
   analyses is shown in the navigation panel. (C) Basic data set
   information and a list of samples with metadata can be accessed by
   corresponding buttons. (D) In “cell viewer,” the dimensional reduction
   plot has three coloring modes, by cell types, by gene expression (color
   gradient), and by samples or sample metadata. A full‐featured cell
   selection tool based on cell type or sample (metadata) is offered. (E)
   In the “differential expression” tool, all analyses are organized by
   “strategy,” “comparison,” and “cell type” (see Supplementary Methods).
   Once a comparison is selected, its volcano plot is shown, with
   significantly differentially expressed genes colored in red. Two tables
   show the up‐ and down‐expressed genes, respectively. The downstream
   analyses can be accessed from this tool (indicated by arrows pointing
   back to the navigation panel). (F) Drug‐screening results are
   categorized as either inversely related (i.e., the perturbation leads
   to opposite gene expression pattern to that of the selected DE
   comparison) or positively related (i.e., the perturbation leads to
   similar gene expression pattern to that of the selected DE comparison).
   Gene expression patterns in both the DE comparison (dots whose colors
   and y positions indicate expression fold change) and the selected
   chemical perturbation (blue line for gene expressions in ascending
   order) are plotted. (G) Analyses of cell‐cell interactions (CCIs; i.e.,
   cellular communications among cell types) are organized similarly to those
   of the “differential expression” tool. Once an analysis is selected,
   the numbers of significant ligand‐receptor interactions (LRIs) in all
   cell type pairs are visualized in a heatmap. The results are organized
   in two tables, showing the significant LRIs in the selected CCI and the
   number of CCIs in which a certain LRI is significant, respectively.

   In the navigation panel, the first two buttons provide access to basic
   data set information and a table for sample metadata (Figure [112]3C).
   “Cell Viewer” offers a versatile dimensional reduction plot that has
   three coloring modes (Figure [113]3D): by cell types (e.g., microglia),
   by sample identities or sample metadata fields (e.g., TREM2 variants),
   and by gene expressions (e.g., APOE) in which cells are colored by a
   gradient. It is notable that “cell viewer” comes with a full‐featured
   sample and/or cell type selector. For samples, users can show or hide
   each sample individually, or filter all samples by one or more
   metadata fields (e.g., selecting all male mild cognitive impairment
   [MCI] samples).
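
   Filtering samples by one or more metadata fields, as in the male MCI
   example, amounts to a conjunction of field matches. A minimal Python
   sketch follows; the field names, sample records, and the
   filter_samples helper are hypothetical.

```python
def filter_samples(samples, **criteria):
    """Keep samples whose metadata match every given field. A criterion
    may be a single value or a collection of accepted values."""
    def matches(sample):
        for field, wanted in criteria.items():
            accepted = wanted if isinstance(wanted, (set, list, tuple)) else {wanted}
            if sample.get(field) not in accepted:
                return False
        return True
    return [s for s in samples if matches(s)]

# Hypothetical sample metadata.
samples = [
    {"id": "S1", "sex": "male", "diagnosis": "MCI"},
    {"id": "S2", "sex": "female", "diagnosis": "MCI"},
    {"id": "S3", "sex": "male", "diagnosis": "AD"},
    {"id": "S4", "sex": "male", "diagnosis": "MCI"},
]
male_mci = filter_samples(samples, sex="male", diagnosis="MCI")
```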

   In the “differential expression” tool, all DE comparisons can be found
   by selecting “strategy,” “comparison,” and “cell type.” Once selected,
   the DE comparison's description, volcano plot, number of DEGs, and
   downstream analysis availabilities are shown, together with two tables
   at the page bottom for up‐ and down‐expressed genes (Figure [114]3E).
   The gene names of the top DEGs with the smallest false discovery rates
   (FDRs) are shown in the volcano plot and can be hidden by clicking. In
   the two data tables, users can click the genes to open corresponding
   gene explorer pages. In TACA, we predefined nine sets of DEG cutoffs
   using fold change (FC) and FDR. These cutoffs can be selected in the
   table that shows the number of DEGs. The initial access points to the
   downstream tools are three buttons also found on this page.
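
   To make the idea of predefined cutoff sets concrete, the Python sketch
   below counts up‐ and down‐expressed genes over a grid of FC and FDR
   thresholds. The threshold values and the 3 × 3 grid layout are
   assumptions chosen for illustration (the nine cutoff sets used in TACA
   are not listed here), and GENE_X and GENE_Y are placeholder genes. The
   CLU and TRPM3 values are those reported in Section 3.4.

```python
import math
from itertools import product

# Assumed thresholds for illustration only.
FC_CUTOFFS = (1.25, 1.5, 2.0)
FDR_CUTOFFS = (0.05, 0.01, 0.001)

def count_degs(results, fc_cutoff, fdr_cutoff):
    """Count (up, down) genes passing the |log2FC| and FDR thresholds."""
    min_abs_log2fc = math.log2(fc_cutoff)
    up = sum(1 for r in results
             if r["fdr"] < fdr_cutoff and r["log2fc"] >= min_abs_log2fc)
    down = sum(1 for r in results
               if r["fdr"] < fdr_cutoff and r["log2fc"] <= -min_abs_log2fc)
    return up, down

def deg_count_table(results):
    """Map every (FC cutoff, FDR cutoff) pair to its (up, down) counts."""
    return {(fc, fdr): count_degs(results, fc, fdr)
            for fc, fdr in product(FC_CUTOFFS, FDR_CUTOFFS)}

results = [
    {"gene": "CLU", "log2fc": 0.453, "fdr": 0.000006},
    {"gene": "TRPM3", "log2fc": 0.977, "fdr": 0.0002},
    {"gene": "GENE_X", "log2fc": -1.2, "fdr": 0.03},  # placeholder
    {"gene": "GENE_Y", "log2fc": 0.1, "fdr": 0.5},    # placeholder
]
table = deg_count_table(results)
```

   Precomputing such a table lets a user switch between cutoff sets
   without re‐running the DE analysis.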

   In the “drug screen” tool (Figure 3F), all significant inversely and
   positively related perturbations are shown in two separate tables. For
   readability we display the compound name instead of the IDs of the
   perturbations (referred to as “signature” ID in Connectivity Map [CMap]
   L1000) in the tables. Once a perturbation is selected, its details are
   shown below, with two buttons for opening the drug target and
   perturbation network tools. These networks along with other ones are
   explained in the next section. At the bottom of the page is a
   scatter/line hybrid plot that visualizes the relationship between the
   perturbation profile and the selected DE comparison. The perturbation
   profile is shown as a blue line, in which genes (x‐axis) are always in
   ascending order by their Z scores (y‐axis) in the profile. The DEGs
   (dots) are x‐positioned according to the genes in the perturbation
   profile, and are y‐positioned and colored by their log[2]FC. For
   inversely related perturbations, the up‐DEGs (warm color, above x‐axis)
   tend to be located on the left, indicating that they are downregulated by
   the perturbation, and the down‐DEGs (cold color, below x‐axis) tend to
   be located on the right, indicating that they are upregulated by the
   perturbation. This pattern is reversed for the positively related
   perturbations.
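
   The geometry of this hybrid plot can be reproduced with a short Python
   sketch: genes are ranked by their Z scores in the perturbation profile,
   and each DEG is placed at the rank of the same gene. The gene names and
   values are placeholders, and the mean‐rank comparison at the end is
   only a rough heuristic for reading the plot, not the enrichment
   statistic used for drug screening in TACA.

```python
def hybrid_plot_points(perturbation_z, deg_log2fc):
    """Return the line (genes sorted ascending by Z) and the DEG dots.

    perturbation_z: gene -> Z score in the perturbation profile
    deg_log2fc:     gene -> log2 fold change in the DE comparison
    """
    ordered = sorted(perturbation_z, key=perturbation_z.get)
    rank = {gene: i for i, gene in enumerate(ordered)}
    line = [(rank[g], perturbation_z[g]) for g in ordered]
    # DEGs absent from the perturbation profile cannot be positioned.
    dots = sorted((rank[g], fc) for g, fc in deg_log2fc.items() if g in rank)
    return line, dots

def mean_rank(dots, up=True):
    """Average x position of up-DEGs (up=True) or down-DEGs (up=False)."""
    xs = [x for x, fc in dots if (fc > 0) == up]
    return sum(xs) / len(xs)

# Placeholder profile and DE results.
perturbation_z = {"A": -2.0, "B": -1.0, "C": 0.1, "D": 1.5, "E": 2.2}
deg_log2fc = {"A": 0.8, "B": 0.5, "D": -0.6, "E": -0.9, "Q": 0.3}
line, dots = hybrid_plot_points(perturbation_z, deg_log2fc)
looks_inverse = mean_rank(dots, up=True) < mean_rank(dots, up=False)
```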

   In the “cell interactions” tool (Figure 3G), all CCI analyses are found
   by selecting “strategy” and “analysis.” Once a CCI analysis is
   selected, a heatmap is shown for the number of significant LRIs in all
   cell type pairs. The grids in the heatmap can be selected, and the
   significant LRIs for the selected cell‐cell pair are populated in a
   table below. In another table, all LRIs that are significant in at
   least one cell‐cell pair are listed in descending order by the number
   of significant cell‐cell pairs. Two network visualizations can be
   accessed from this tool page for these two tables, respectively.
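
   Both tables in this tool can be derived from a single list of
   significant LRIs. The Python sketch below is a minimal illustration
   under assumptions: the record layout (sender cell type, receiver cell
   type, ligand, receptor) is hypothetical, and LIG_X/REC_X and
   LIG_Y/REC_Y are placeholders. The APOE‐TREM2 entries mirror the
   DAM‐DAM and DAM‐microglia example discussed in Section 3.3.

```python
from collections import Counter

# (sender cell type, receiver cell type, ligand, receptor)
significant = [
    ("DAM", "DAM", "APOE", "TREM2"),
    ("DAM", "Microglia", "APOE", "TREM2"),
    ("DAM", "Microglia", "LIG_X", "REC_X"),   # placeholder
    ("Astrocyte", "Neuron", "LIG_Y", "REC_Y"),  # placeholder
]

def heatmap_counts(records):
    """Number of significant LRIs for each cell type pair."""
    return Counter((sender, receiver) for sender, receiver, _, _ in records)

def lri_ranking(records):
    """LRIs sorted in descending order by the number of cell type pairs
    in which they are significant (ties broken alphabetically)."""
    pairs_per_lri = {}
    for sender, receiver, ligand, receptor in records:
        pairs_per_lri.setdefault((ligand, receptor), set()).add((sender, receiver))
    return sorted(((lri, len(pairs)) for lri, pairs in pairs_per_lri.items()),
                  key=lambda item: (-item[1], item[0]))

counts = heatmap_counts(significant)
ranking = lri_ranking(significant)
```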

3.3. Drug/gene/cell network visualizations in TACA

   TACA offers seven types of network visualization tools, among which
   five (Figure [115]4) are found in the data set explorer.

FIGURE 4.

   The network visualizations in TACA. (A) Protein‐protein interaction
   (PPI) network of the differentially expressed genes. Node colors
   indicate log[2] fold change (log[2]FC), and node sizes indicate false
   discovery rate (FDR). (B) Drug target network of the selected drug.
   Differentially expressed drug targets are colored by log[2]FC. PPIs
   among the targets are shown. (C) Perturbation network that visualizes
   the inverse or positive relation between the differential
   expression results and the gene profile of a chemical perturbation. A
   maximum of 50 differentially expressed genes (DEGs) with the lowest Z
   scores and 50 with the highest Z scores in the perturbation profile are
   shown in the network. Gene nodes are colored and sized by their
   log[2]FC and FDR, respectively, whereas their border colors and edge
   (to the compound) colors indicate the Z scores in the perturbation
   profile. As a result, the plot for an inversely related perturbation
   shows opposite node and edge colors, whereas the plot for a positively
   related perturbation shows similar node and edge colors. PPIs among
   the DEGs are shown as gray edges. (D) Ligand‐receptor interaction (LRI)
   network for the selected cell‐cell interaction. (E) Cell‐cell
   interaction network for the selected LRI.

   In the “differential expression” tool, when a DE comparison is selected,
   the “PPI Network” becomes accessible; it shows the PPIs among the top
   200 DEGs with the smallest FDRs (Figure [117]4A). Node colors and sizes indicate
   log[2]FC and FDR, respectively.
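
   Restricting the interactome to the top DEGs is an induced‐subgraph
   operation. The following Python sketch shows one way to do this under
   assumptions: the input layouts and the top_deg_ppi helper are
   hypothetical, and the genes are placeholders. The default of 200 genes
   follows the description above.

```python
def top_deg_ppi(degs, ppi_edges, max_genes=200):
    """Select the DEGs with the smallest FDRs and keep only the PPIs
    whose two endpoints are both among them.

    degs:      list of dicts with 'gene' and 'fdr'
    ppi_edges: iterable of (gene, gene) pairs, treated as undirected
    """
    top = {d["gene"] for d in sorted(degs, key=lambda d: d["fdr"])[:max_genes]}
    edges = {tuple(sorted((a, b))) for a, b in ppi_edges
             if a in top and b in top and a != b}
    return top, sorted(edges)

# Placeholder DEGs and interactions.
degs = [
    {"gene": "G1", "fdr": 0.001},
    {"gene": "G2", "fdr": 0.01},
    {"gene": "G3", "fdr": 0.04},
]
ppi_edges = [("G2", "G1"), ("G1", "G3"), ("G1", "G2")]
nodes, edges = top_deg_ppi(degs, ppi_edges, max_genes=2)
```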

   In the “drug screen” tool, when a perturbation is selected, its “drug
   target network” and “perturbation network” may become accessible. In
   the “drug target network” (Figure [118]4B), the targets of the drug are
   shown with the PPIs among them. Targets are colored by log[2]FC if they
   are also DEGs. This network shows the DEGs from the selected DE
   comparison that can be targeted directly by the selected drug, or
   targeted indirectly through PPIs with the drug's targets. In the
   “perturbation network” (Figure [119]4C), the inverse or positive
   relations of the DE and perturbation are visualized. Figure [120]4C
   shows an example of inverse relation, in which the up‐DEGs
   (warm‐colored nodes) are downregulated by the perturbation
   (cold‐colored edges and borders), whereas the down‐DEGs (cold‐colored
   nodes) are upregulated by the perturbation (warm‐colored edges and
   borders).
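
   The gene selection for the perturbation network (at most 50 DEGs with
   the lowest Z scores and 50 with the highest, per the legend of Figure
   4C) can be sketched as follows. The function name and input layout are
   hypothetical, and how TACA handles comparisons with fewer than 100 DEGs
   is not specified here; this sketch simply avoids listing a gene twice.

```python
def perturbation_network_genes(deg_z, n_each=50):
    """Pick the DEGs at both extremes of the perturbation profile.

    deg_z: DEG -> Z score in the perturbation profile
    Returns (lowest, highest); a gene never appears in both lists.
    """
    ordered = sorted(deg_z, key=deg_z.get)
    lowest = ordered[:n_each]
    already_chosen = set(lowest)
    highest = [g for g in ordered[::-1][:n_each] if g not in already_chosen]
    return lowest, highest

# Placeholder Z scores.
deg_z = {"G1": -2.1, "G2": -0.4, "G3": 0.2, "G4": 1.3, "G5": 2.5}
lowest, highest = perturbation_network_genes(deg_z, n_each=2)
```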

   In the “cell interactions” tool, when a CCI is selected, “LRI Network”
   becomes available when a specific pair of cell types is selected from
   the heatmap (Figure [121]4D), and “CCI Network” becomes available when
   a specific LRI is selected from the table (Figure [122]4E). “LRI
   Network” shows the significant LRIs in the selected pair of cell types.
   Ligands and receptors are denoted by different colors. In “CCI
   Network,” cell types are displayed instead, showing the cell types
   hosting the ligand that interact with cell types hosting the receptor.
   For example, using the data set (GEO ID: [123]GSE98969) that led to the
   original discovery of DAM,[124] ^13 we found that the APOE‐TREM2
   interaction was one of the top significant LRIs in multiple CCIs
   (Figure [125]4E) in the 5XFAD mouse, including DAM‐DAM and
   DAM‐microglia (Figure [126]4D). This observation is consistent with
   those of previous studies that demonstrated the important roles of
   APOE‐TREM2 interaction in modulating phagocytosis and mediating the
   transition from homeostatic microglia to DAM.[127] ^13 , [128]^33 ,
   [129]^34 , [130]^35

3.4. Discovery of repurposable drugs for AD using TACA

   In this example, we selected data set “[131]GSE148822.” In the
   “differential expression” tool, we selected the strategy ‘SUBSET by
   “REGION” – GROUP by “GROUP” – ADJUST by “AGE,SEX”’, comparison ‘SUBSET
   = “OC” – COMPARE GROUPs “AD” versus “CTR”’, and cell type “Neuron.” In
   other words, here we are exploring the DE results of comparing
   occipital cortex (OC) samples in AD patients versus those in
   non‐demented controls (CTR) for the cell type neuron. The DE comparison
   resulted in 79 DEGs, such as SLC1A3, SLC1A2, SPRED1, GPC5, MBP, and
   DDX24. It has been reported that members from the solute carriers
   (SLCs) family may be associated with neurodegenerative diseases.[132]
   ^36 SPRED1 may be involved in tauopathy.[133] ^37 By clicking “drug
   screen” below the “Number of DEGs” table, the page is switched to the
   “drug screen” tool. As the comparison is AD versus CTR, the desired
   relationship is, therefore, “inversely‐related” (such that up‐DEGs in
   AD are downregulated by the drug perturbation and the down‐DEGs in AD
   are upregulated by the drug perturbation to achieve a “rescued”
   effect). In this table, one perturbation (troglitazone) has a
   significant enrichment score. By clicking this perturbation, we see
   that most of the up‐DEGs are downregulated in the perturbation profile,
   and most of the down‐DEGs are upregulated by the drug
   (Figure [134]5A,B). By comparing the strongly perturbed genes (e.g.,
   STAT1, CLU, GPM6A, CST3, SLC1A2) (Figure [135]5A,B) with a list of
   AD‐associated risk genes that were compiled in a previous study,[136]
   ^10 , [137]^12 we found that clusterin (CLU),[138] ^38 which is
   significantly up‐expressed (log[2]FC = 0.453, FDR = 0.000006) in AD
   versus CTR, is strongly downregulated by troglitazone (Z score =
   −1.767). We found that one of troglitazone's physical interacting
   targets (Figure [139]5C), transient receptor potential cation channel
   subfamily M member 3 (TRPM3), is significantly overexpressed (log[2]FC
   = 0.977, FDR = 0.0002) in AD versus CTR. Troglitazone is a TRPM3
   inhibitor (IC[50] = 12 μM).[140] ^39 It also downregulated TRPM3 in
   this perturbation (Z score = −0.562). These results suggest that
   troglitazone may have a beneficial effect for AD neurons by reducing
   the levels of two up‐expressed genes in AD. It is possible that other
   inversely perturbed genes in Figure [141]5A and [142]B can explain the
   beneficial effect.
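
   The reasoning in this case study reduces to a sign check: a gene is
   “rescued” when its change in AD and its Z score under the perturbation
   have opposite signs. The Python sketch below applies that check to the
   two genes and the values reported above; the is_reversed helper is a
   hypothetical name, and a sign check alone says nothing about
   statistical significance.

```python
# Values reported in the text for the troglitazone example
# (neurons, occipital cortex, AD versus CTR).
reported = {
    "CLU":   {"log2fc": 0.453, "fdr": 0.000006, "z": -1.767},
    "TRPM3": {"log2fc": 0.977, "fdr": 0.0002,   "z": -0.562},
}

def is_reversed(log2fc, z):
    """True when the perturbation moves the gene in the direction
    opposite to its change in disease."""
    return log2fc * z < 0

reversed_genes = sorted(
    gene for gene, v in reported.items() if is_reversed(v["log2fc"], v["z"])
)
```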

FIGURE 5.

   Case study: single‐cell transcriptomics‐based drug screening. (A) This
   plot shows the inverse relationship between the selected drug
   perturbation (blue line, genes ordered in ascending order by their
   expression Z scores) and the differential expression (DE) (colored
   dots, x‐positioned according to the perturbation profile) profiles. The
   up‐differentially expressed genes (DEGs) (warm color) are downregulated
   by the perturbation, whereas the down‐DEGs (cold color) are upregulated
   by the perturbation. (B) A drug perturbation network that shows a
   maximum of 50 DEGs with the lowest Z scores and 50 with the highest Z
   scores in the perturbation profile. For
   inversely related drug perturbation and DE profiles, the node color
   (indicating the DE profile) and the edge/border color (indicating the
   drug perturbation profile) of the majority of the nodes are opposite. (C)
   A drug target network colored by the DEG profiles. Non‐DEG targets are
   shown as gray circles.
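
   The gene selection described for panel B can be sketched as follows.
   This is an assumption about one reasonable way to pick the nodes, not
   the portal's actual code; the function name select_network_genes and
   the synthetic Z scores are hypothetical.

```python
def select_network_genes(deg_z_scores, n=50):
    """Pick up to n DEGs with the lowest Z scores and up to n with the
    highest, without selecting any gene twice."""
    ranked = sorted(deg_z_scores, key=deg_z_scores.get)  # ascending by Z
    lowest = ranked[:n]
    remaining = ranked[n:]
    highest = remaining[-n:] if n > 0 else []
    return lowest, highest


# Synthetic Z scores for 120 placeholder genes (not real data).
deg_z_scores = {f"gene_{i}": (i - 60) / 20 for i in range(120)}
lowest, highest = select_network_genes(deg_z_scores)
print(len(lowest), len(highest))
```

   When fewer than 2n DEGs are available, the sketch fills the "lowest"
   group first, which matches the "maximum of" wording but is only one
   of several possible tie-breaking choices.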

3.5. Discovery of potential pathobiology of AD using TACA

   Here we show how TACA can be used to identify potential pathobiology
   of AD in a cell type–specific manner. Previous studies have shown that
   the transcription factor EB (TFEB) may have a protective role against
   AD because the upregulation of TFEB alleviated AD pathologies in mice
   and cells.^40 TFEB is a master regulator of lysosomal biogenesis
   and plays important roles in autophagy and mitophagy,^40, ^41
   which were shown to be associated with AD pathology.^42, ^43
   Using three data sets (GSE147528_EC, GSE147528_SFG, and GSE148822)
   from two studies,^15, ^26 we found that TFEB was significantly
   downregulated when we compared AD patients with healthy individuals
   or patients with less‐severe AD (Figure 6). The cell dimensional
   reduction plots, gene expression plots (Figure 6A–C), and DE analysis
   results (Figure 6D–F) can be found in TACA as explained in previous
   sections.

FIGURE 6.


   Case study: discovery of potential pathobiology of AD using TACA. (A–C)
   Dimensional reduction plots and expression plots for transcription
   factor EB (TFEB) from three data sets in TACA. (D–F) TFEB was
   significantly downregulated when we compared AD patients with healthy
   or less‐severe AD patients. CTR, non‐demented controls; CTR+, 
   non‐demented controls with mild amyloid beta pathology; DAA,
   disease‐associated astrocyte; EC, entorhinal cortex; FC, fold change;
   OC, occipital cortex; OPC, oligodendrocyte progenitor cell; OTC,
   occipitotemporal cortex; SFG, superior frontal gyrus.

   In GSE147528_EC and GSE147528_SFG (Figure 6A,B), we found that as
   Braak stage increases, the expression of TFEB significantly decreases
   (|log2FC| > 0.25 and FDR < 0.05) in both the entorhinal cortex (EC)
   and the superior frontal gyrus (SFG) in post‐mortem brain
   tissue of male donors (Figure 6D,E). In GSE148822, we found
   that the expression of TFEB was inversely associated with AD
   progression in male samples from the occipital cortex (OC) and
   occipitotemporal cortex (OTC) regions (Figure 6C,F). However, this
   effect was not observed in female patients (|log2FC| < 0.25 or
   FDR > 0.05). In addition, TFEB is highly expressed in
   oligodendrocytes (Figure 6A–C), consistent with a previous study
   showing that TFEB plays important roles in myelination in
   oligodendrocytes.^44 These
   observations illustrate that TACA offers a useful tool for identifying
   potential pathobiology of AD in a cell type–specific manner.
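
   The significance rule used in this case study (|log2FC| > 0.25 and
   FDR < 0.05) can be written as a short Python sketch. The function name
   classify_deg is hypothetical. Because the TFEB values themselves are
   not reported in the text, the example reuses the CLU and TRPM3 values
   quoted in the drug-screening case study, plus one clearly labeled
   placeholder gene; whether that earlier comparison used the same
   cutoffs is not stated here.

```python
def classify_deg(log2fc, fdr, fc_cutoff=0.25, fdr_cutoff=0.05):
    """Label a gene as 'up', 'down', or 'not significant' using the
    |log2FC| > fc_cutoff and FDR < fdr_cutoff rule."""
    if fdr < fdr_cutoff and abs(log2fc) > fc_cutoff:
        return "up" if log2fc > 0 else "down"
    return "not significant"


examples = {
    "CLU": (0.453, 0.000006),            # quoted in the troglitazone case study
    "TRPM3": (0.977, 0.0002),            # quoted in the troglitazone case study
    "hypothetical_gene": (-0.10, 0.01),  # placeholder, not real data
}
labels = {gene: classify_deg(fc, fdr) for gene, (fc, fdr) in examples.items()}
print(labels)
```

   Note that a gene fails the rule if either condition is unmet, which is
   why the absence of an effect in female patients is stated as
   |log2FC| < 0.25 or FDR > 0.05.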

4. DISCUSSION

   We present TACA, a web portal and database with strong potential for
   the identification of cell type–specific AD pathobiology as well as
   target discovery for drug repurposing. We collected and processed a
   large amount of data, including sc/sn RNA‐seq transcriptomic data sets
   and many types of networks. The first version of TACA comprises over 1.1
   million cells/nuclei, ≈1400 differential expression comparisons, and 400
   cell‐cell interaction analyses, along with various downstream analyses. We
   will continue to expand TACA by adding new sc/sn RNA‐seq data sets and
   new types of visualizations and analyses.

   TACA offers a highly organized and interactive interface. Currently,
   there are 12 types of visualization tools throughout the data set,
   gene, and drug explorers. TACA's network visualizations help to show
   PPIs among DEGs, to interpret cell type communications through LRIs,
   and to reveal potential mechanisms of action of chemical perturbations
   against the DE comparisons.

   As an example, using the data and tools provided in TACA, we
   identified that troglitazone may have a protective effect on AD
   neurons. We found that it can lower the expression levels of CLU (a
   known AD risk–associated gene)^38 and TRPM3 (a direct target of
   troglitazone), both of which are significant up‐DEGs in AD neurons.
   Previous studies have reported that troglitazone has a protective
   effect in neurodegenerative disorders, such as AD.^45 Yet, the
   underlying molecular mechanisms are not fully understood. A potential
   explanation is that inhibition of cyclin‐dependent kinase 5 (CDK5)
   activity by troglitazone represses tau‐Thr231 phosphorylation.^45 Our
   case study shows that the virtual drug screening in TACA identified
   troglitazone for AD without this prior knowledge and suggested two
   additional potential mechanisms of action for the beneficial effect.
   Our second case study, of TFEB, shows that TACA can be validated at the
   mechanistic level, and we further found a male‐specific protective
   effect of TFEB.

   We envision that TACA will be a highly valuable resource for both basic
   and translational research in AD, as it provides abundant information
   for AD pathobiology and actionable systems biology tools for
   therapeutic discovery. Our framework can guide future AD sc/sn analyses
   and cell type–specific pathobiology and target discovery by providing
   numerous examples of data processing, analysis, and interpretation.
   Moreover, our framework can be broadly applied to other diseases. TACA
   will be regularly updated to include up‐to‐date sc/sn RNA‐seq AD data
   sets.

4.1. Collaborative interactions with other sc/sn RNA‐seq and AD resources

   To date, several useful bioinformatics tools have been developed for a
   broader range of sc/sn data set exploration, such as Single Cell Portal
   (https://singlecell.broadinstitute.org/single_cell) and CELLxGENE
   (https://cellxgene.cziscience.com/), and for AD studies, such as
   Agora (https://agora.adknowledgeportal.org/genes) and the
   Alzheimer's Disease Atlas
   (https://adatlas.helmholtz-muenchen.de/).^46 We envision
   that it would be beneficial to the AD research community if TACA could
   establish collaborative work with these resources in the future. For
   example, a pipeline may be implemented to automatically import the
   annotated AD sc/sn data sets from Single Cell Portal, and our analysis
   pipeline will conduct analyses such as DE, CCI, and drug screening. The
   analysis outputs can be integrated into (or linked from) tools such as
   Agora for a more comprehensive view of genes and networks in a cell
   type–specific manner and for rapid data sharing.

4.2. Limitations and future directions

   We acknowledge several limitations. First, although we included 26 AD
   data sets, more data sets have become available during the development
   of TACA. We will expand TACA in the following directions. (1) We will
   continue to process the sc/sn RNA‐seq data sets as we did for the first
   phase of the data sets in TACA, as well as allowing user‐supplied
   processed data sets in .rds format to be added using a pipeline that we
   have developed for this purpose. (2) We will focus on adding data sets
   from more diverse populations (e.g., African American, Asian
   populations,[170] ^47 and other minority populations), brain regions,
   and other AD tissue types (e.g., peripheral blood mononuclear cell
   [PBMC] and cerebrospinal fluid [CSF]). (3) We will integrate other
   types of omics data, such as scATAC‐seq, and offer multi‐omics
   integration analyses.[171] ^48 , [172]^49 We will add additional tables
   on the gene page to show other omic layers, such as proteomic and
   metabolomic data from the AD knowledge portal and The Alzheimer's
   Disease Metabolomics Consortium (ADMC).[173] ^50 (4) We will expand
   TACA for other neurodegenerative diseases, such as Parkinson disease
   (PD) and amyotrophic lateral sclerosis (ALS). Second, although we have
   curated the metadata from the GEO database and original publications,
   the availability of metadata varies among the data sets, and those with
   limited metadata have, therefore, limited DE comparisons and CCI
   analysis results. We recommend that researchers make their sample
   metadata as complete as possible, since these metadata can
   significantly improve the reusability of the data sets. We will add
   more analysis results for existing data sets if these metadata become
   available. Third, although we integrated data from many sources to
   generate the human protein interactome, drug‐target network, and
   ligand‐receptor network, these networks are still incomplete and will
   be expanded. Fourth, the current “cell viewer” is optimized for showing
   large numbers of cells, but a “subset” function that loads only a
   subsetted data set with fewer cells may be useful to accelerate the
   performance on older generation computers. Fifth, we predefined nine
   sets of DE cutoffs for generating DEGs for downstream analyses, such
   that drug screening can be pre‐calculated. In future updates, we will
   further improve the drug‐screening computational efficiency to allow
   user‐defined DE cutoffs. Finally, advanced artificial
   intelligence/machine‐learning techniques, such as deep generative model
   and transfer‐learning approaches, can be applied for sc/sn data
   integration (among the AD data sets or with non–disease‐centric
   datasets such as the Tabula Sapiens[174] ^51 ) and analysis to identify
   novel/rare cell types and states.[175] ^52

AUTHOR CONTRIBUTIONS

   Feixiong Cheng conceived the study. Yadi Zhou implemented the pipeline,
   constructed the databases, and developed the website. Jielin Xu, Yuan
   Hou, and Yadi Zhou collected the data sets and performed all analyses.
   Lynn Bekris, James B. Leverenz, Andrew A. Pieper, and Jeffrey Cummings
   discussed and interpreted all results. Yadi Zhou, Jielin Xu, Yuan Hou,
   Feixiong Cheng, and Jeffrey Cummings wrote the manuscript. Yadi Zhou,
   Feixiong Cheng, and Jeffrey Cummings revised the manuscript. All
   authors critically revised the manuscript and gave final approval.

CONFLICTS OF INTEREST

   Dr. Cummings has provided consultation to AB Science, Acadia, Alkahest,
   AlphaCognition, ALZPathFinder, Annovis, AriBio, Artery, Avanir, Biogen,
   Cerevel, Clinilabs, Cortexyme, Diadem, EIP Pharma, Eisai, Genentech,
   Green Valley, Grifols, Janssen, Karuna, Lexeo, Lilly, Lundbeck, LSP,
   Merck, NervGen, Novo Nordisk, Oligomerix, Otsuka, PharmatrophiX,
   PRODEO, Prothena, ReMYND, Resverlogix, Roche, Signant Health, Suven,
   Unlearn AI, Vaxxinity pharmaceutical, assessment, and investment
   companies. Dr. Leverenz has received consulting fees from Vaxxinity
   and grant support from GE Healthcare, and serves on a
   Data Safety Monitoring Board for Eisai. The other authors have declared
   no competing interests. [176]Author disclosures are available in the
   supporting information.

Supporting information

   SUPPORTING INFORMATION
   [177]Click here for additional data file.^ (121KB, pdf)

   SUPPORTING INFORMATION
   [178]Click here for additional data file.^ (34KB, xlsx)

   SUPPORTING INFORMATION
   [179]Click here for additional data file.^ (119.6KB, pdf)

ACKNOWLEDGMENTS