Abstract
Type 2 diabetes (T2D) is a significant risk factor for Alzheimer’s
disease (AD). Despite multiple studies reporting this connection, the
mechanism by which T2D exacerbates AD is poorly understood. It is
challenging to design studies that address co-occurring and comorbid
diseases, limiting the number of existing evidence bases. To address
this challenge, we expanded the applications of a computational
framework called Translatable Components Regression (TransComp-R),
initially designed for cross-species translation modeling, to perform
cross-disease modeling to identify biological programs of T2D that may
exacerbate AD pathology. Using TransComp-R, we combined peripheral
blood-derived T2D and AD human transcriptomic data to identify T2D
principal components predictive of AD status. Our model revealed genes
enriched for biological pathways associated with inflammation,
metabolism, and signaling pathways from T2D principal components
predictive of AD. The same T2D PC predictive of AD outcomes unveiled
sex-based differences across the AD datasets. We performed a gene
expression correlational analysis to identify therapeutic hypotheses
tailored to the T2D-AD axis. We identified six T2D and two dementia
medications that induced gene expression profiles associated with a
non-T2D or non-AD state. We next assessed our blood-based T2DxAD
biomarker signature in post-mortem human AD and control brain gene
expression data from the hippocampus, entorhinal cortex, superior
frontal gyrus, and postcentral gyrus. Using partial least squares
discriminant analysis, we identified a subset of genes from our
cross-disease blood-based biomarker panel that significantly separated
AD and control brain samples. Finally, we validated our findings using
single cell RNA-sequencing blood data of AD and healthy individuals and
found erythroid cells contained the most gene expression signatures to
the T2D PC. Our methodological advance in cross-disease modeling
identified biological programs in T2D that may predict the future onset
of AD in this population. This, paired with our therapeutic gene
expression correlational analysis, also revealed alogliptin, a T2D
medication that may help prevent the onset of AD in T2D patients.
Subject terms: Systems biology, Biomarkers
Introduction
Type 2 diabetes (T2D) is a metabolic disease characterized by chronic
hyperglycemia and insulin dysregulation that significantly elevates the
risk for Alzheimer’s disease (AD) by more than 60%^[33]1–[34]3. AD is
an irreversible neurodegenerative disorder that gradually impairs
memory and cognitive function. A recent large-scale longitudinal study
found that individuals with an earlier onset of T2D were at higher risk
of developing AD^[35]4. Other cohort studies^[36]5,[37]6 reported
similar results. In addition to the elevated risk of AD, T2D also
contributes to other conditions such as hypertension^[38]7,
neuroinflammation^[39]8, heart disease^[40]9, stroke^[41]10, and kidney
disease^[42]11. As a result, the influence of T2D on other
comorbidities further complicates our understanding of its impact on
human health and the development of potential therapeutics for such
conditions.
To understand this T2D-AD axis, previous studies examined how the onset
of T2D influences the progression of AD^[43]12. Multiple studies
reported insulin signaling impairment in T2D and AD^[44]13,[45]14. The
metabolic connection to AD^[46]15 also carries the T2D risk factor and
is further amplified by the age^[47]16. Systemic low-grade inflammation
in T2D progressively leads to downstream neuroinflammation and neuronal
cell death, increasing the risk of AD^[48]17–[49]19. Another study
revealed altered gene expression levels in neurons, astrocytes, and
endothelial cells in post-mortem brain tissue of T2D subjects, showing
alterations to brain cells under diabetic conditions^[50]20.
Previous work from other groups implicates the blood-brain barrier
(BBB) as a potential route that connects T2D^[51]21 and AD^[52]22. The
BBB is a selective semipermeable membrane consisting of endothelial
cells, pericytes, and astrocytes, which protects the brain from harmful
substances and regulates the passage of immune cells and nutrients into
the brain^[53]23,[54]24. One large clinical study observed heightened
BBB permeability in people with T2D and AD^[55]25. This progressive
breakdown of the BBB in T2D and AD is associated with irregular
vascular endothelial growth factor production, resulting in increased
permeability across the BBB^[56]25,[57]26. Other reports suggested that
damage to endothelial cells in the cerebral blood vessels, indicated by
elevated adhesion molecules, may contribute to this
breakdown^[58]25,[59]27,[60]28. Therefore, chronic circulation of
molecules produced under T2D conditions in the bloodstream may
contribute to BBB breakdown and eventually enter the brain,
contributing to the development of dementia and cognitive dysfunction.
A barrier to understanding how one disease influences another is that
studies that simultaneously investigate multiple health conditions in
humans are rare and difficult^[61]29. This challenge is compounded in
chronic disorders like T2D and AD, where pathogenesis can precede
diagnosis by decades^[62]30. To overcome this barrier, other groups
have used differential expression analysis of transcriptomic data
between T2D and AD but have fallen short in considering human
heterogeneity, such as sex and age^[63]31,[64]32. Another group
integrated T2D and AD data using non-negative matrix factorization to
identify shared genes across the blood of T2D and AD. While they
identified dysregulated transcription factors shared across both
diseases, they also did not account for confounding variables such as
sex and age^[65]33. To overcome this challenge, we adapted Translatable
Components Regression (TransComp-R), a computational approach initially
developed to translate observations from pre-clinical animal disease
models to human contexts^[66]34–[67]39, to perform cross-disease
modeling of human datasets to identify T2D biology predictive of AD.
In this work, we hypothesized that gene transcripts in T2D blood may
predict and inform AD pathology. We tested this hypothesis via
computational modeling of publicly available peripheral blood
transcriptomics data of T2D and AD patients to determine if biomarkers
in T2D blood could distinguish blood signatures in AD versus
cognitively normal control groups. To identify potential therapeutics
tailored to the T2D-AD axis, we employed a correlational analysis to
identify candidate drugs that may impact AD development. Lastly, we
assessed whether the blood-based biomarkers from our T2D-AD
computational models could differentiate between AD and control samples
in brain tissue transcriptomics data.
Results
TransComp-R modeling separates AD and control subjects in T2D PC space
We acquired bulk-RNA seq T2D and microarray AD peripheral whole blood
data from Gene Expression Omnibus (GEO). For the T2D dataset
([68]GSE184050)^[69]40, we used the longitudinal baseline sample
collection and information, including demographic variables of sex and
age. Two separate cohorts of AD data were used in the model to test the
predictability of T2D for AD. In both AD cohort 1 ([70]GSE63060)^[71]41
and AD cohort 2 ([72]GSE63061)^[73]41, we used AD and healthy control
subjects. Using two separate cohorts ensured that the selected T2D PC’s
would be robust (Table [74]1).
Table 1.
Demographics of processed human transcriptomic blood data across each
data set
GEO dataset (accession) Condition Age (years) Mean ± SD Sex (%) Total
sample size (n)
Male Female
T2D ([75]GSE184050) Control 64.4 ± 9.6 3 (19%) 13 (81%) 16
T2D 64.1 ± 2.8 3 (30%) 7 (70%) 10
AD Cohort 1 ([76]GSE63060) Control 72.8 ± 5.8 42 (41%) 60 (59%) 102
AD 75.4 ± 6.6 46 (32%) 99 (68%) 145
AD Cohort 2 ([77]GSE63061) Control 75.3 ± 6.0 53 (40%) 81 (60%) 134
AD 77.9 ± 6.7 54 (39%) 85 (61%) 139
[78]Open in a new tab
We repurposed the TransComp-R to identify biological pathways
dysregulated in T2D predictive of AD status. Cross-disease TransComp-R
begins by matching shared genes across all datasets (Fig. [79]1a). We
then projected the AD human samples into a principal component analysis
(PCA) space constructed from the T2D data. We evaluate predictive power
of T2D PCs for outcomes in AD by Least Absolute Shrinkage and Selection
Operator (LASSO) feature selection and generalized linear model (GLM)
regression (Fig. [80]1b). Using GSEA, we annotated the biological and
therapeutic interpretations of the significant T2D PCs predictive of AD
biology (Fig. [81]1c). We correlated differentially expressed genes
from the drug list containing consensus signatures from the Library of
Integrated Network-based Cellular Signatures (LINCS) database to the
loadings of the T2D PCs predictive of AD. This method links drug
regulation of genes associated with healthy states vs AD or T2D with
drug response signatures to identify therapeutic hypotheses.
Fig. 1. Workflow of TransComp-R.
[82]Fig. 1
[83]Open in a new tab
a Genes across T2D and AD are selected for analysis. Each AD cohort is
individually projected into the T2D PCA space to combine the two
diseases. b PC translatability from T2D to AD is determined by running
a GLM regression against AD outcomes using PCs consistently selected
across each AD cohort. c Pathway enrichment analysis is performed on
the loadings of significant PCs to identify enriched biological
pathways. Potential therapeutic candidates are then identified using a
correlation analysis framework.
We matched 11,455 genes across the T2D and AD datasets and constructed
the PCA space of the T2D and control samples. To prevent overfitting,
we selected thirteen PCs for a cumulative explained variance of 80% for
the TransComp-R model (Supplementary Fig. [84]1). Each AD cohort was
separately projected onto the T2D PCs, such that we constructed two
cross-disease models: T2D with AD cohort 1 and T2D with AD cohort 2.
We quantified how the variance captured by the T2D PCs explained the
variation in human AD. To determine the cross-disease relevance of the
T2D PCs to the variance of the AD data, we visualized each of the
thirteen T2D PCs, comparing the variance explained in the T2D and AD
data (Fig. [85]2a). When comparing the translatability of T2D PCs in AD
cohort 1 and 2, we found T2D PC1, PC2, and PC3 had higher explained
variance in AD data relative to the other T2D PCs 4-13, showing that
T2D PCs1-3 have highest potential for translation of biology between
T2D and AD.
Fig. 2. TransComp-R identifies T2D PCs predictive of AD outcomes.
[86]Fig. 2
[87]Open in a new tab
a AD PCs were separated by cohort, with variance explained in AD. b
Selection of PCs using a LASSO model incorporating sex and age
demographics from the AD datasets. The model was run across twenty
random rounds of ten-fold cross-validation. PCs consistently determined
significant across both AD cohorts from the GLM regression were further
analyzed. c Principal component plots of AD scores on selected T2D PCs
separating AD and control outcomes in AD cohort 1 and d AD cohort 2.
Each T2D PC is represented by the percent variance explained in AD.
We used LASSO to select the most relevant T2D PCs for predicting AD by
regressing AD projections on T2D PCs, sex, and age from the AD cohort,
with interaction effects of T2D PC with sex and age. From the LASSO
model, several PCs (PC2, PC5-6, PC9-13) were selected across both AD
cohorts (Fig. [88]2b). Despite the multiple number of PCs being
consistently selected from LASSO, only T2D PC2, PC5, PC6, and PC11
fulfilled the selection criteria and discerned between AD and control
groups in the GLM. The T2D PCs predictive of AD conditions were
visualized for both AD cohort 1 (Fig. [89]2c) and AD cohort 2 (Fig.
[90]2d). While the transcriptomic variation encoded on T2D PC2 and PC5
were able to distinguish between human AD and control groups, there was
less distinguishable separation made by T2D PC6 and PC11. Among T2D PC2
and PC5, we selected T2D PC2 for deeper downstream interrogation due to
the higher potential for T2D-to-AD translatability as quantified by the
percentage of variance explained in AD (Fig. [91]2a).
T2D and AD share pathways associated with metabolism, signaling pathways, and
cellular processes
We employed GSEA to interpret the T2D PC2 gene loadings, which encoded
transcriptomic variation between healthy and T2D subjects that
predicted AD outcomes using both KEGG (Fig. [92]3a) and Hallmark (Fig.
[93]3b) databases to gain a holistic insight into the genes loaded on
T2D PC2.
Fig. 3. Pathway enrichment analysis.
[94]Fig. 3
[95]Open in a new tab
The transcriptomic variance separating AD and control subjects on T2D
PC2 was interpreted with GSEA using the a KEGG and b Hallmark
databases. Significantly enriched pathways were determined with a
Benjamini–Hochberg adjusted p value less than 0.01. c Shared leading
edge genes between biological pathways in the KEGG and d Hallmark
pathways. The node size represents the number of genes contributing to
the pathway from GSEA, whereas the edge size is the number of shared
genes between each biological pathway. Missing pathways signified that
there were no shared genes with other pathways.
We organized the enriched pathways into themes to determine if
neighboring pathways were due to the overrepresentation of shared genes
for both the KEGG (Fig. [96]3c) and Hallmark (Fig. [97]3d) databases.
In the AD-associated pathways from KEGG, we identified enriched pathway
themes, such as the cardiovascular system, signaling pathways, cellular
processes and metabolism, and cancer pathways. In the control group, we
found pathways associated with neurodegenerative diseases and
metabolism. From Hallmark, pathways enriched in AD associations
included signaling pathways, cellular processes, metabolism, and stress
response, with metabolism and cell cycle pathways enriched in controls.
T2D PC2 identifies gene expression changes with predictive ability across sex
and disease conditions in two AD cohorts
We compared the average log[2] fold change of the 11,455 shared genes
for disease and control groups to identify trends in the regulation of
genes across diseases. In both AD cohorts and T2D, there were decreases
in gene expression including COX7C, NDUSF5, NDUFA1, RPL17, RPL23,
RPL26, RPL31, and TOMM7 (Fig. [98]4a), genes responsible for
mitochondrial and ribosomal functions. COX7C, NDUSF5, and NDUFA1 are
active in the electron transport chain function in the inner
mitochondrial membrane and TOMM7 encodes for a subunit of the
translocase of the outer mitochondrial membrane. Ribosomal protein L
genes such as RPL17, RPL23, RPL26, and RPL31 play a role in forming
structures of ribosomes and regulating ribosome function.
Fig. 4. Comparison of global gene expression and AD-predictive T2D PCs.
[99]Fig. 4
[100]Open in a new tab
a AD and T2D log[2] fold change plot of all shared 11,455 genes, b AD
and T2D log[2] fold change plot filtered by gene expressions with the
top 50 and bottom 50 loadings of T2D PC2. c Scores of T2D PC2 separated
by sex and disease condition. A Mann–Whitney test adjusted by
Benjamini–Hochberg was used to determine statistical significance. The
distribution of the data is annotated by the mean and interquartile
ranges.
We next tested to see if the top 50 and bottom 50 gene loadings from
T2D PC2 could capture the cross-disease trends of the total
transcriptome. We visualized the filtered gene with AD and T2D fold
changes and observed a similar trend such that multiple genes were
downregulated in both AD and T2D conditions (Fig. [101]4b). Among those
consistently downregulated in AD and T2D, genes related to ribosomal
proteins (RPL and RPS) were present. These 100 genes also distinguished
between control and AD subjects (Supplementary Fig. [102]2).
Finally, we evaluated T2D PC2’s ability to stratify sex and disease
characteristics in AD. We identified significant sex-based differences
across AD and control in both cohorts. In AD cohort 1, we found that
the female and male groups, each separated by AD and control, were
significantly different by the variation captured by T2D PC2, with
adjusted p values of 0.0002 and 0.0013, respectively (Fig. [103]4c).
Similarly, in AD cohort 2, there was significance in disease separation
for both females and males from the AD datasets, with adjusted p values
of 0.0073 and 0.0033, respectively (Fig. [104]4c). Comparing the scores
of T2D PC2 by disease condition only, we found significance in both AD
cohort 1 (p = 2.000 × 10^−7) and AD cohort 2 (p = 9.078 × 10^−5).
Identification of drug perturbation signatures associated with PC2 T2D-AD
signatures
We developed a correlation analysis to identify therapeutic candidates
associated with the T2D PC2 predictive of AD. We used the Library of
Integrated Network-Based Cellular Signatures (LINCS) Consensus
Signatures, a dataset containing 33,609 drugs with their respective
post-treatment gene expression profiles summarized as a “characteristic
direction” (CD) coefficient^[105]42. Of the 33,609 drugs in the LINCS
database, 2558 remained after we filtered out duplicates and drugs
without known targets. We compared the CD coefficient values of genes
affected by each drug to the gene loadings on T2D PC2 using Spearman’s
correlation. We hypothesized a drug could be therapeutic for T2D/AD
risk based on the correlation directionality, where negative ρ values
were interpreted as inducing profiles associated with a non-T2D or
non-AD state and positive ρ values associated with a T2D or AD disease
state.
We identified 1262 drugs significantly correlated with the loadings in
T2D PC2 (Fig. [106]5a). Drugs associated with a non-T2D and non-AD gene
expression profile included dienestrol, BW-180C, T-0156, alogliptin,
and roflumilast (Supplementary Data [107]1). Dienestrol had the most
negative correlation coefficient of −0.5059 and is an estrogen receptor
agonist used to treat vaginal pain by targeting ESR1. T-0156 (PDE5A)
and roflumilast (PDE4A, PDE4B, PDE4C and PDE4D) are both
phosphodiesterase inhibitors. We also identified a prototypical delta
opioid receptor agonist (BW-180C) and a T2D prescription medication
(alogliptin), which targets OPRD1 and DPP4, respectively. Conversely,
drugs associated with gene expression of a T2D or AD disease state
included antagonists such as wortmannin (PI3K inhibitor), proglumide
(CCK receptor antagonist), GR-127935 (serotonin receptor antagonist),
homatropine-methylbromide (acetylcholine receptor antagonist), and
phenacemide (sodium channel blocker). These medications were correlated
to both AD and T2D signatures with therapeutic potential.
Fig. 5. Computational gene expression correlational analysis.
[108]Fig. 5
[109]Open in a new tab
a All significant drugs identified from the LINCS database. Drugs
filtered by b FDA approval status and c over-the-counter drugs. d
FDA-approved T2D drugs (alogliptin and glipizide) associated with
control group signatures. e FDA-approved T2D drug (orlistat) associated
with genes upregulated in AD. f FDA-approved medications for
cognitive-enhancement (galantamine and donepezil). g FDA-approved drug
(brexpiprazole) with signatures correlated to genes elevated in AD.
To filter drugs tested for safety and efficacy, we referenced the Food
and Drug Administration (FDA) Orange Book for FDA-approved and
over-the-counter drugs (June 2024 version)^[110]43. We identified 301
FDA-approved drugs in our original significant 1262 (Fig. [111]5b), and
of these, 23 were approved for over-the-counter use (Fig. [112]5c).
Among the FDA-approved drugs, alogliptin and roflumilast were among the
most negative correlation coefficients. Other medications with negative
coefficients associated with a non-T2D or AD state were isradipine,
used for hypertension (CACNA1S, CACNA1C, CACMA1F, CACMA1D, and CACMA2D1
targets), niacin used for vitamin B (HCAR2 and HCAR3 targets), and
disopyramide used for irregular heartbeats (SCN5A gene target)
(Supplementary Data [113]2). Among medications with top positive
coefficients associated with AD and T2D, we identified two anti-cancer
drugs (pacritinib and lenvatinib), a blood thinner (ticagrelor), and
two anti-arrhythmic drugs (adenosine and flecainide).
The most negative coefficients for over-the-counter drugs were
vasodilators, opioid receptor targets, and histamine receptor drugs
(Supplementary Data [114]3). Minoxidil had the most negative
correlation coefficient (−0.3101) and is a hypertension medication that
targets KCNJ8, KCNJ11, and ABCC9. Loperamide (opioid receptor agonist),
used for diarrhea, targets OPRM1 and OPRD1, while naloxone (opioid
receptor antagonist), used for opioid overdose, affects OPRK1, OPRM1,
and OPRD1. We also identified two histamine receptor antagonists,
cimetidine and doxylamine, which targeted HRH2 and HRH1, respectively.
The most positively correlated medications that induced disease gene
signatures included orlistat, a lipase inhibitor used for weight loss
and T2D, had the greatest coefficient of 0.3104 (LIPF, PNLIP, DAGLA,
and FASN targets). Other positive correlation, T2D-AD associated drugs
included budesonide (corticosteroid for Crohn’s disease) and mometasone
(steroid for skin discomfort), both of which are glucocorticoid
receptor agonists with the target of NR3C1. Other medications among the
most positively correlated included clotrimazole (cytochrome p450
inhibitor) and pheniramine (histamine receptor antagonist), which
targeted KCNN4 and HRH1 respectively.
We compared the FDA-approved drugs to MedlinePlus and First Databank
for any medication currently used to treat T2D or cognitive-associated
symptoms (Supplementary Data [115]4). Of the 301 FDA-approved drugs
identified, we found ten medications for T2D and three with cognitive
function associations (Supplementary Data [116]5). Among the
medications used for T2D, glipizide (sulfonylurea), repaglinide
(insulin secretagogue), and nateglinide (insulin secretagogue) targeted
KCNJ11 and ABCC8. The diabetes dipeptidyl peptidase inhibitors that
target DPP4, included alogliptin, sitagliptin, and linagliptin. We also
identified sodium/glucose co-transporter inhibitor empagliflozin
(SLC5A2), the PPAR receptor antagonist pioglitazone, glucosidase
inhibitor acarbose (AMY2A, MGAM, and GAA), and lipase inhibitor
orlistat (LIPF, PNLIP, DAGLA, and FASN). Among medications commonly
prescribed to improve cognitive function, we identified donepezil and
galantamine, acetylcholinesterase inhibitors that target ACHE and
ACHE/BCHE and brexpiprazole (HTR2A, DRD2, HTR1A), a dopamine receptor
partial agonist used for AD-associated agitation. Of these thirteen
medications, empagliflozin, linagliptin, brexpiprazole, acarbose, and
orlistat contained gene expression responses correlated to an AD or T2D
condition. Nine medications were associated with a non-AD or non-T2D
condition, which included alogliptin, glipizide, repaglinide,
sitagliptin, pioglitazone, galantamine, nateglinide, and donepezil.
We selected the top two medications that associated with a non-disease
state (T2D and cognitive-enhancing medication) and those associated
with a disease state to compare the relationship of the drug DEGs and
T2D PC2 scores. We found that alogliptin and glipizide, anti-T2D drugs
had the most significant correlation magnitude among the six drugs,
with a coefficient of −0.5 (p < 2.2 × 10^−16) and −0.42
(p < 2.2 × 10^−16), respectively (Fig. [117]5d). Orlistat had gene
signatures most positively correlated with disease states (rho = 0.31,
p = 2.9 × 10^−10) (Fig. [118]5e). The signatures affected by cognitive
medications galantamine (rho = −0.13 p = 0.0028) and donepezil
(rho = −0.1 p = 0.024) had weaker correlations than the anti-T2D
medication (Fig. [119]5f). Finally, we identified brexpiprazole, an
anti-psychotic drug with a low positive correlation coefficient of 0.22
(p = 2.6 × 10^−7) associated with T2D and AD disease status (Fig.
[120]5g). Other FDA-approved T2D medications, with weaker correlations
to a non-T2D or non-AD state included repaglinide, sitagliptin,
pioglitazone, and nateglinide (Supplementary Fig. [121]3).
Translation of T2D PC2 gene loadings to from AD blood to AD brain
transcriptomics
Having identified biomarkers in T2D blood predictive of AD status, we
assessed if the identified signature stratified AD from control
patients in brain tissues. We acquired a human microarray dataset
([122]GSE48350)^[123]44,[124]45 profiling AD and control samples in
multiple brain regions: hippocampus, entorhinal cortex (EC), superior
frontal gyrus (SFG), and postcentral gyrus (PoCG). Potential age bias
was reduced by excluding subjects younger than 55. The post-processed
demographics separated by their respective brain region were summarized
(Table [125]2).
Table 2.
Demographic summary across four different processed human brain regions
GEO dataset ([126]GSE48350) Condition Age (years) Mean ± SD Sex (%)
Total sample size (n)
Male Female
Hippocampus Control 82.0 ± 10.0 13 (52%) 12 (48%) 25
AD 83.1 ± 8.5 9 (47%) 10 (53%) 19
Entorhinal cortex Control 80.7 ± 10.3 9 (50%) 9 (50%) 18
AD 86.5 ± 5.5 7 (47%) 8 (53%) 15
Superior frontal Control 80.8 ± 10.3 12 (46%) 14 (54%) 26
Gyrus AD 87.1 ± 6.2 7 (33%) 14 (67%) 21
Postcentral gyrus Control 81.5 ± 10.4 11 (46%) 13 (54%) 24
AD 85.0 ± 8.2 10 (40%) 15 (60%) 25
[127]Open in a new tab
We matched genes in the AD brain dataset to the top 50 and bottom 50
genes from T2D PC2 (Fig. [128]6a) and matched 88 genes. We determined
AD status-associated genes in each brain region via differential
expression analysis (Benjamini–Hochberg adjusted Mann–Whitney test, p
adjusted <0.20). We first investigated the hippocampus brain tissue to
identify genes from T2D-blood PC2 that could stratify AD and control
groups in the brain. We identified 25 significant genes (adjusted p
value < 0.20) and hierarchical clustering showed these 25 genes
separated AD and control conditions in the hippocampus gene expression
data (Fig. [129]6b). We used these genes to construct PLS-DA models to
identify genes driving separation across the brain tissue samples of AD
and control groups (Fig. [130]6c and Supplementary Fig. [131]4).
Fig. 6. Translating blood-predictable signatures to the brain.
[132]Fig. 6
[133]Open in a new tab
a Method of testing blood-derived data predictability in the brain
(Illustration from biorender.com). b Z-score of significant
AD-associated genes identified in the human hippocampal dataset
(Mann–Whitney adjusted by Benjamini–Hochberg, p adjusted <0.20). c
PLS-DA model using significant genes to predict AD status. AD groups
are labeled by APOE genotype, Braak stage, and MMSE. d Loading
variables LV1 and LV2 for the model are presented. A VIP > 1 is
annotated with a star, and the color of the loading bar represents the
highest contribution to the specific class by the respective gene.
We annotated the subjects within the PLS-DA plot by their respective
apolipoprotein E (APOE) genotype, Braak stage, and mini-mental state
examination (MMSE) scores (Fig. [134]6d). These were used since APOE e4
is the greatest genetic risk factor for AD^[135]46, Braak stage
assesses neurofibrillary tangle pathology^[136]47, and MMSE for
cognitive impairment screening^[137]48. There was clear separation
between AD and control groups in our PLS-DA model and we identified a
subset of genes loaded in the latent variables (LVs) most predictive of
disease status (Fig. [138]6d). On LV1, we identified genes with
variable importance of projection (VIP) greater than 1 associated with
the control group, including SNRPD2, POLR2K, ATP6V0C, NDUFB1, COX6C,
COX7C, and CHGA. For the AD group, we found BNC1, WDR38, SLC9A1, ALB,
and TNRC18 with a VIP > 1. Although there was no separation across the
disease classes on LV2, we found NDUFB1, ATP6V0C, COX7C, COX6C, and
CHGA contributed greater than average (VIP > 1) to the control group,
whereas ALB, TNRC18, SLC9A1, BNC1, BCORL1, and ZNF467 had a VIP > 1 for
AD.
After observing separation across disease classes in the hippocampus
brain data, we next determined if the T2D blood biomarkers able to
stratify AD conditions in blood were reflective in other parts of the
brain. We built PLS-DA models for the EC, SFG, and PoCG. Of the 88
genes that matched in the human brain tissue data, five genes were
significant across AD and control groups in the EC (Fig. [139]7a).
Using these genes for the PLS-DA model, we found distinct separation
across LV1, and identified RIN3, RPL36A, and POLR2K as genes with a VIP
greater than 1 (Fig. [140]7b). In the SFG brain region, we identified
four significant genes: RIN3, CSTA, RCN3, and RPL36A (Fig. [141]7c). In
the SFG model, RIN3 and RPL36A contributed most to separation between
the AD and control groups (Fig. [142]7d). In the PoCG region, three
genes significantly separated AD and control, including PRAM1, RCN3,
and RPL36A (Fig. [143]7e, f). For each of these three brain regions,
additional annotation on the PLS-DA subjects by APOE genotype, Braak
stage, and MMSE were visualized for the EC, SFG, and PoCG PLS-DA models
(Supplementary Fig. [144]5).
Fig. 7. PLS-DA models using blood biomarkers to predict AD status in other
brain regions.
[145]Fig. 7
[146]Open in a new tab
a Z-score of significant genes identified in the human EC dataset. b
PLS-DA using the significant genes on the EC data with loadings on LV1
and LV2. c Z-score of significant genes identified in the human SFG
dataset. d PLS-DA using the significant genes on the SFG data with
loadings on LV1 and LV2. e Z-score of significant genes identified in
the human PoCG dataset. f PLS-DA using the significant genes on the
PoCG data with loadings on LV1 and LV2. For all brain regions, the
significance of the genes was determined by a Mann–Whitney adjusted by
Benjamini–Hochberg (p adjusted <0.20) across AD and control groups.
Single cell transcriptomics biomarkers from erythroid cells contributed to
the T2D PC2 separation of AD and control patients
After demonstrating biomarkers in T2D blood that could differentiate
individuals with AD or control in both blood and the brain, we sought
to identify cell types expressing our signature genes using single-cell
RNA-sequencing (scRNA-seq) data. We identified a scRNA-seq data from
GEO that compared peripheral blood mononuclear cells across AD and
control with 10x Genomics Chromium single cell
([147]GSE226602)^[148]49. The GEO dataset contained 10 females (mean
age: 70.4 ± 7.1) and 12 males (mean age: 75.0 ± 8.8) for control, and
14 females (mean age: 72.4 ± 9.8) and 14 males (mean age: 72.7 ± 11.3)
for AD.
In our workflow, we processed the data containing 270,884 cells using
Seurat and visualized differentiated cell clusters using a uniform
manifold approximation and projection (UMAP) (Fig. [149]8a). We took
the top and bottom 50 genes from the T2D PC2 and identified if these
signatures were differentially expressed in each cell type. Our UMAP
displayed 12 different cell types, including CD4+ T, CD8+ T, TRB7-2+ T,
exhausted T, B, natural killer (NK), classical monocyte, non-classical
monocyte, plasmacytoid dendritic cell (pDC), erythroid, progenitor, and
platelet cells (Fig. [150]8b). These cells are associated with the
adaptive, innate, and other hematopoietic cells.
Fig. 8. Validation of TransComp-R findings with single-cell RNA-seq analysis.
[151]Fig. 8
[152]Open in a new tab
a Analysis pipeline with the scRNA-seq data to identify cell types and
differentially expressed genes. (Illustration from biorender.com). b
UMAP and projected clusters of the 12 clustered scRNA cell types from.
c UMAPs annotated by how well the top and bottom 50 genes from T2D PC2
are expressed in each cell type. Quantification performed by the module
score (left) and percentage of the total genes of the top and bottom 50
genes from T2D PC2 (right).
We sought to identify which of these cell types expressed gene
expression signatures encoded on the T2D PC2. We first quantified the
gene set activity of the top and bottom 50 score-ranked genes in T2D
PC2 and found that all cell types except exhausted T cells exhibited
elevated gene expression levels (Fig. [153]8b). As another approach, we
took the top and bottom 50 genes by their score in T2D PC2 and
determined which cell types contained the greatest number of signature
genes. The cells were also consistent with the findings with the module
score, showing that the genes loaded on T2D PC2 may be expressed in
human blood.
To identify potential differences of expressed genes across the cell
types, we compared genes that had a log[2] fold change greater than 0.5
compared to respective control groups (Fig. [154]9a). Of the twelve
different cell types, eight cells (TRB7-2+ T, B, classical monocyte
non-classical monocyte, pDC, erythroid, progenitor, and platelet)
contained at least one gene. Among the eight, erythroid, platelets, and
progenitors shared the greatest number of genetic signatures of our T2D
PC2.
Fig. 9. Comparison and differential expression analysis of cell types.
[155]Fig. 9
[156]Open in a new tab
a All genes in each cell type with a log[2] fold change greater than
0.5 or less than −0.5 were compared for cell types. Significance was
not considered to identify which cells expressed genes from the top 50
and bottom 50 in T2D PC2. Differential gene expression analysis of b
erythroid, c platelet, and d progenitor cells. A p value < 0.05 and
|log[2] fold change | > 0.5 was considered significant.
Having found a majority of the top and bottom 50 genes from T2D PC2 in
erythroid, platelets, and progenitor cells, we performed differential
expression analysis to understand which genes were significantly
different across AD and control populations in blood. We found 15
differentially expressed genes in erythroid cells (Fig. [157]9b), none
in platelets (Fig. [158]9c), and one in progenitor cells (Fig.
[159]9d). The erythroid cell differentially downregulated RPS27, RPS20,
RPS10, NDUFB1, MYH9, TGFB1, RPL36A, COMMD6, SNRPD2, CD52, COX6C, ZYX,
RPL39, RPS26, and RPS18. Additionally, ATXN2L was the only gene found
differentially upregulated in the progenitor cells. Several of these
ribosomal and mitochondrial functions were also found in our
blood-based analysis with bulk RNA-sequenced data.
Discussion
In this study, we used blood transcriptomics data from human T2D and AD
studies to understand the potential pathways by which T2D affects AD
pathology. Our cross-disease model identified a T2D-derived blood gene
signature predictive of AD status and therapeutic candidates associated
with non-T2D and AD status. A subset of genes in the T2D blood were
predictive of AD status in four brain regions, showing the
cross-disease model’s significance and implications. We then validated
our findings using scRNA-seq blood data from individuals with and
without AD.
Chemokine signaling pathways were involved in patients of T2D^[160]50
by routes of downstream inflammation^[161]51 and AD^[162]52 with
connections to cognitive decline. Wnt signaling also played a role in
metabolic dysregulation^[163]53 and loss of synaptic integrity^[164]54.
Insulin pathways were enriched in AD conditions, consistent with prior
literature showing insulin resistance^[165]55 is associated with an
increased risk for AD development^[166]56. Pathways, such as MAPK and
NOTCH, were enriched in AD conditions, with MAPK-p38 phosphorylation
associated with both T2D and AD^[167]57,[168]58. Notch1 expression
decreases beta cell masses and insulin secretion in rodents^[169]59 and
was significantly different across control and AD groups in our
analysis^[170]60. FC epsilon RI is also altered in T2D and AD cases,
such that downstream mast cells are affected^[171]61.
We also identified cellular processes and metabolism pathways on the AD
predictive T2D PC2. Elevated neutrophil activation to chemokines and
transendothelial migration is associated with T2D^[172]62. In AD,
monocytes and human brain microvascular endothelial cells expressing
CXCL1 are associated with amyloid-beta-induced migration from the blood
to the brain^[173]63. FC gamma receptor-mediated phagocytosis is
observed in T2D in compromised monocyte phagocytosis^[174]64. PRKCD is
associated with amyloid-beta significantly triggered neurodegeneration
in AD^[175]65. In blood, coagulation is active in hyperglycemia^[176]66
and factor XIII Val34Leu gene polymorphism is associated with sporadic
AD^[177]67. Lastly, heme metabolism was associated with T2D and AD. A
T2D-based study reported that increased dietary heme iron intake
increased the risk of T2D^[178]68. In an AD study, altered heme
metabolism was noted in AD brain samples^[179]69 (Supplementary Data
[180]6).
From our drug screening analysis, we identified T2D and AD medications
whose perturbed gene signatures significantly associated with the
healthy state on the cross-disease predictive T2D PC2. The T2D
(alogliptin and glipizide) and AD (galantamine and donepezil)
medications that induced gene signatures correlated with T2D PC2 are
current therapies for T2D and AD^[181]70. Alogliptin, an FDA-approved
T2D, has been shown to reduce hippocampal insulin resistance in
amyloid-beta-induced AD rodent models^[182]71. Glipizide has
conflicting findings, with one study showed improved glycemic control
and memory^[183]72 and another reported the drug be associated with
higher risk of AD than metformin, another T2D medication^[184]73.
Therefore, medications that have therapeutic potential for people with
T2D while simultaneously elevating the risk for AD are possible drugs
to prioritize away from patients with a history or risk for AD.
Overall, the identification of these medications in our analysis shows
promise for high-throughput drug screening integrated in a
cross-disease modeling framework for comorbid conditions.
Our PLS-DA models identified signatures encoded in the T2D PC2
predicted AD status in brain tissue and many genes from our blood-based
signature have associations with AD pathology in the brain. Individuals
with MCI and AD show decreased SNRPD2 expression levels in the
hippocampus^[185]74–[186]76, as well as decreased
POLR2K^[187]77,[188]78. COX deficiency has been reported in both AD
brain and blood samples^[189]79. CHGA was associated with senile and
pre-amyloid plaques^[190]80 and linked to AD compared to control groups
in cerebrospinal fluid^[191]81. Our findings in literature show that
ALB may differ across blood and brain^[192]82,[193]83. While others
reported decreased serum ALB levels increased the risk of AD, our
findings in the hippocampus showed the opposite effects.
In the EC, SFG, and PoCG brain regions, RIN3 was reported to have
significantly elevated mRNA levels in the hippocampus and cortex of
APP/PS1 mouse models for AD^[194]84 and is a signature gene expressed
in peripheral blood and the brain^[195]84,[196]85. In a metformin
response, drug-naïve T2D study, RPL36A correlated with a change in
hemoglobin A1c levels^[197]86. In AD, RPL36A was found to be
downregulated in cells stimulated by amyloid-beta^[198]87. This
downregulation was consistent with our findings in the AD groups
(Supplementary Data [199]7). These findings suggest that some gene
signatures in T2D blood predictive of AD are present in the brain,
linking blood-based biomarkers to primary tissue pathobiology.
Our comparison with scRNA-seq analysis demonstrated that gene
expression signatures in erythroid cells strongly contributed to the
model separations. Erythroid cells are among the red blood cell lineage
and perform essential functions such as oxygen transport, carbon
dioxide removal, and pH balancing. Within this lineage, studies have
demonstrated that there are morphological and membrane changes of
erythrocytes among people with T2D compared to healthy individuals,
such that there are abundant distorted forms^[200]88–[201]90.
Interestingly, there are also morphological changes to erythrocytes in
cases of AD^[202]91. Such changes to red blood cells can impact an
individual’s ability to carry oxygen and nutrients^[203]92, alter the
immune system^[204]93, and affect other health conditions^[205]94.
Thus, these alterations to the quality of blood-based cells may affect
downstream pathways and contribute to the eventual development of
conditions such as T2D^[206]95 and AD^[207]96. Therefore, disruption to
erythrocytes and precursor cells may be a potential route for further
investigation between T2D and AD.
A limitation to our study is that that data from large-scale human
studies simultaneously studying the relationship between T2D and AD are
still rare, meaning sample sizes and demographic representation of the
human population across sex, age, and other variables is limited.
Therefore, biomarkers associated with such demographic information
should be interpreted with caution. Additionally, our cross-disease
TransComp-R model selects for matching gene expression markers present
in all datasets, thus there is a possibility that some informative
genes may have been omitted. Addressing this gap in the AD-T2D axis
would improve opportunities to integrate other clinical variables, such
as hemoglobin A1c for T2D, pathological results of amyloid-beta
quantification for AD, and other human demographic variables known to
be linked to AD and T2D pathology.
Our work introduced a new application for cross-disease modeling using
TransComp-R to identify significantly relevant shared pathways by which
T2D influences AD development. We found gene signatures in the
peripheral blood of T2D subjects predictive of AD pathology, and
identified a subset of genes in the blood that significantly predicted
AD status in four brain regions. These findings shed insight into the
shared comorbidity between T2D and AD and encourage future applications
of TransComp-R for cross-disease modeling.
Materials and methods
Data selection
Human AD and T2D transcriptomic datasets were selected on GEO with the
requirements that samples were collected from similar blood sample
collection processes, a sample size of 10 or greater per condition, and
demographic information containing sex and age. The datasets on GEO
were scanned by using combinations of phrases, including “Alzheimer’s
disease,” “diabetes,” “blood,” and “gene expression.” Like the blood
data, post-mortem human brain tissue gene expression was identified
using the information criteria containing human data with a cohort size
greater than 10 per condition. Terms used to identify data on GEO
included “brain,” “Alzheimer’s disease,” “human,” and “gene
expression.”
Pre-processing and normalization
Transcriptomic AD and T2D human data were acquired from GEO using
Bioconductor tools in R (GEOquery ver. 2.70.0, limma ver. 3.58.1, and
Biobase ver. 2.62.0)^[208]97–[209]99. To reduce potential bias from
younger age participants in the data, we removed all subjects 55 years
old or below from the study in both the AD and T2D datasets with the
justification of balancing the established age of late onset of AD (65
years). The T2D baseline group was used. For the AD cohorts, conditions
that were not AD or control were excluded from the study. The datasets
were then log[2] transformed and matched for the same gene overlap. The
genes shared across all AD and T2D datasets were normalized by z-score
before computational modeling with TransComp-R.
Cross-disease modeling with TransComp-R
We conducted TransComp-R by applying PCA on the T2D data with both
disease and control groups. The number of PCs that encoded
transcriptomic variation between healthy and T2D subjects was limited
to a total explained cumulative variance of 80%. The two AD datasets
were individually projected into the T2D PCA space, such that there
were two separate models: T2D with AD cohort 1 and T2D with AD cohort
2. The projection of AD data into the T2D PCA space can be described by
matrix multiplication:
[MATH: PAD,
T2DsxPC=<
mi>XADsxgQT2DgxPC :MATH]
where matrix P^s x PC, the projection of AD data onto the T2D space,
defined by columns of T2D PCs and rows of AD subjects, is represented
by the product of matrix X^s x g and Q^g x PC. Here, s is represented
by AD subjects, g is represented by the gene list shared by AD and T2D,
and PC is the principal components from the T2D space.
Variance explained in Alzheimer’s disease by principal components of type 2
diabetes
To determine the translatability of T2D variance onto the AD data, we
quantified the percent variability that is explained in AD by the T2D
PCs with the following equation:
[MATH: VarianceExplainedinADbyT2D=qiT[XTX]qi∑diag(QTXTXQ) :MATH]
where AD data matrix X, projected onto a matrix Q containing columns of
T2D PCs by matrix multiplication (T representing a matrix transpose).
The percent variance of AD in X explained by a PC (q[i]) of Q was then
calculated.
Variable selection of T2D PCs
The T2D PCs predictive of AD outcomes were identified by employing
LASSO across twenty random rounds of ten-fold cross-validations
regressing the AD positions in T2D PC space against AD disease status.
Demographic sex and age variables describing the subjects from the AD
datasets were included in the GLM:
[MATH:
Y~β0+β1PC+β2
Sex+β3Age
+β4SexPC+β5AgePC :MATH]
PCs with a coefficient frequency greater than 4 of the 20 rounds (25%
selection frequency) in at least two of the three PC terms (PC, Sex*PC,
or Age*PC) were selected for GLMs with individual PCs regressed against
AD outcomes. T2D PCs that were consistently significant in both AD
cohorts (p value < 0.05) were selected for further biological
interpretation.
Gene set enrichment analysis
Loadings of the PCs selected by the GLM were analyzed with GSEA in R
(msigdbr ver. 7.5.1, fgsea ver. 1.28.0, and clusterProfiler ver.
4.10.1)^[210]100–[211]102. Two data collections (KEGG and Hallmark)
were downloaded from the Molecular Signatures Database to identify
enriched biological pathways. Identified pathways were determined to be
significant, with a Benjamini–Hochberg adjusted p value of less than
0.01 to account for multiple hypothesis testing. The imputed parameters
to run GSEA included a minimum gene size of 5, a maximum gene size of
500, and epsilon, the tuning constant of 0. The default setting of 1000
permutations was used.
Identifying shared genes across enriched biological pathways
We used igraph (ver. 2.0.3)^[212]103 in R to identify overlapping genes
that may be commonly enriched across multiple biological pathways
identified from GSEA. We then processed the R-generated data in
Cytoscape (ver. 3.10.2)^[213]104 to enhance pathway visualization. We
established the nodes representing different biological pathways and
the edge thickness by the number of overlapping genes between the two
biological pathways. Additionally, the node size was determined by the
number of total enriched genes contributing to the biological pathway
as determined by GSEA, with the node colors red and blue used to
discern pathway associations with AD or control groups, respectively.
Cross-disease fold-change comparison
The relationship of different gene expression across AD and T2D
conditions was compared using the log[2] fold change of each gene
shared across the AD and T2D blood data. For each dataset (T2D and AD),
the log[2] fold change of each gene expression was calculated by taking
the log[2] of the average gene expression of the disease groups divided
by the average gene expression of the control groups. Different gene
expression relationships were compared across the T2D and AD datasets.
Sex-based comparison across type 2 diabetes principal component scores
PC scores were compared across sex and disease conditions to compare PC
predictability across sex demographics. A Mann–Whitney pair-wise test
was used to compare AD females to control females and AD males to
control males. To account for multiple hypothesis testing, a
Benjamini–Hochberg adjusted p value less than 0.05 was determined
significant for the analysis.
Computational gene expression correlational analysis
Potentially therapeutic drugs correlated with T2D PCs predictive of AD
were screened using publicly available data from the L1000 Consensus
Signatures Coefficient Tables (Level 5) from the LINCS database. Before
screening, the LINCS drug data was pre-processed by excluding all drugs
with no known targets based on the LINCS small molecules metadata.
To identify candidate drugs associated with T2D and AD, two data
sources were compiled: DEGs from each respective drug from LINCS and
the loadings from the T2D PCs predictive of AD. DEGs for each drug were
determined through the following: The characteristic direction values,
which signified the drug’s up- or down-regulation of a gene, were
scaled to obtain their z-score values^[214]42. The list of DEGs for
each drug was then identified if the gene’s z-score value presented
with a p value less than 0.05. The original characteristic direction
values for the selected genes for each respective drug were then
isolated. For each T2D PC that was able to stratify transcriptomic
variance between control and AD subjects, differentially expressed drug
genes and PC gene loadings were matched. A Spearman correlation was
calculated to determine the correlation between PC loadings and the
DEGs’ characteristic direction coefficients for each drug. For a given
T2D PC of interest, drugs were ranked by their respective Spearman’s ρ
values. The correlations’ p values were corrected by Benjamini–Hochberg
before visualizing the drugs’ ranks against their ρ values (adjusted p
value < 0.05).
Filtering genetic blood biomarkers for computational modeling of brain tissue
data
The top 50 and bottom 50 genes, ranked by their respective scores on
the T2D PC predictive of AD in blood, were used to filter genes of AD
brain tissue data. After filtering for matching genes, a
Benjamini–Hochberg adjusted Mann–Whitney test was performed to
determine significant genes. An adjusted p value of less than 0.20 was
deemed significant to allow for a more permissible list of potential
genes that relate the blood to the brain. The significant genes were
then used for PLS-DA modeling.
Partial least squares discriminant analysis
Using R (mixOmics ver. 6.26.0)^[215]105, we constructed a PLS-DA model
to determine the predictability of blood-based gene expression markers
in the human brain. Specifically, we used PCs derived from T2D blood
transcriptomic data predictive of AD outcomes in blood profiles and
selected the top 50 and bottom 50 gene loadings as a filter for
hippocampal tissue transcriptomic data in human subjects. A PLS-DA
model screening for the 100 genes was used to determine if all genes
driving the transcriptomic variation in the T2D PC could stratify AD
and control in brain tissue. As an additional follow-up, the 100
filtered genes selected by the blood data significantly distinguishable
among AD and control in human blood were also used to construct the
PLS-DA model. The number of latent variables used for the model was
determined by 100 randomly repeated three-fold cross-validation based
on the model with the lowest cross-validation error rate.
As a way to determine the most important predictors driving separation
and predictive accuracy in the PLS-DA model, we calculated the VIP
score for each gene. For a given number of PLS-DA components A, the VIP
for each gene predictor, k, is calculated by:
[MATH:
VIPk=K·∑a=1A<
/mrow>wak2·SSA
aA·SSYtotal1
/2 :MATH]
where K is the total number of gene predictors, w[ak] is the weight of
predictor k in the a^th LV component. The total sum of squares
explained in all LV components is represented by SSY[total].
A calculated VIP score greater than 1 signifies that a given gene is an
important variable for a specific LV in the PLS-DA model.
AD subjects were annotated by their APOE genotype, Braak stage, and
MMSE score among each PLS-DA model. The MMSE numerical scores, which
evaluate cognitive impairment, were aggregated based on standardized
scoring metrics such that 30–26 was normal, 25–20 was mild, 19–10 was
moderate, and 9–0 was severe^[216]106. The control groups did not have
any clinical records.
Single-cell RNA-sequence validation analysis
We screened for single-cell blood-derived transcriptomics data on GEO
with similar searching criteria on the bulk RNA-seq data. We processed
the scRNA-seq data using the Seurat package (ver. 5.2.1)^[217]107 in R.
We removed cells with less than 200 mapped features and data with
mitochondrial gene content 10% or greater for quality control. We
normalized, scaled, and centered the data using SCTransform with
cell-cycle scores (S score and G2M score) regressed out.
Uniform manifold approximation and approximation analysis
We calculated 3000 variable features using SelectIntegrationFeatures.
Next, we performed PCA and batch corrected for sample variation using
harmony (ver. 1.2.3)^[218]108. For UMAP visualization, we generated
clusters using the first 25 PCs. We used the Louvain algorithm with a
resolution of 0.6 and 30 nearest neighbors for clustering. We also
identified subclusters to discern clusters into individual cell types
by using FindSubCluster in Seurat by using the Louvain algorithm with a
resolution of 0.1 and 30 nearest neighbors for clustering.
Differential expression analysis of TransComp-R PCs
We identified signature gene markers by randomly sampling 500 cells
from each cluster with a Wilcoxon rank sum test to compare the
expression across all clusters. We identified blood marker clusters of
AD by referencing to the Human Protein Atlas. We performed differential
expression analysis using MAST (ver. 1.32.0)^[219]109 for each cell
type with patient AD and control conditions as the covariate. To
validate findings found from the TransComp-R PCs driving separation of
AD and control groups, we filtered the gene list by the top and bottom
50 gene scores to identify differentially expressed genes across the
different cell types. To identify meaningful changes between AD and
control groups from our scRNA-seq analysis, we filtered out genes that
had an absolute log[2] fold change magnitude less than 0.5. For
statistical significance, remaining genes with p value < 0.05 was
considered differentially expressed. Also overlapped any genes across
groups to identify potential shared markers across cell types.
Quantifying gene expressions levels from TransComp-R PCs in cell types
To determine cell types that were enriched for the top and bottom 50
genes on the PC identified by the TransComp-R pipeline, we merged the
genes into a singular module and scored each cell using the
AddModuleScore function in Seurat. We calculated this expression level
of a gene module for each cell to compare the cell-type expression of
the genes compared to the human control group. This approach allows us
to quantify how well the set of genes in the T2D PC selected by
TransComp-R is expressed in individual cell types.
Supplementary information
[220]Supplementary materials^ (2.8MB, pdf)
[221]Supplementary Data 1^ (11.5KB, xlsx)
[222]Supplementary Data 2^ (11.5KB, xlsx)
[223]Supplementary Data 3^ (11.5KB, xlsx)
[224]Supplementary Data 4^ (31.4KB, xlsx)
[225]Supplementary Data 5^ (11.6KB, xlsx)
[226]Supplementary Data 6^ (13.4KB, xlsx)
[227]Supplementary Data 7^ (13.6KB, xlsx)
Acknowledgements