Abstract

   Alzheimer’s disease (AD) is associated with heterogeneous atrophy
   patterns. We employed a semi-supervised representation learning
   technique known as Surreal-GAN, through which we identified two latent
   dimensional representations of brain atrophy in symptomatic mild
   cognitive impairment (MCI) and AD patients: the “diffuse-AD” (R1)
   dimension shows widespread brain atrophy, and the “MTL-AD” (R2)
   dimension displays focal medial temporal lobe (MTL) atrophy.
   Critically, only R2 was associated with widely known sporadic AD
   genetic risk factors (e.g., APOE ε4) in MCI and AD patients at
   baseline. We then independently detected the presence of the two
   dimensions in the early stages by deploying the trained model in the
   general population and two cognitively unimpaired cohorts of
   asymptomatic participants. In the general population, genome-wide
   association studies found 77 genes unrelated to APOE differentially
   associated with R1 and R2. Functional analyses revealed that these
   genes were overrepresented in differentially expressed gene sets in
   organs beyond the brain (R1 and R2), including the heart (R1) and the
   pituitary gland, muscle, and kidney (R2). These genes were enriched in
   biological pathways implicated in dendritic cells (R2), macrophage
   functions (R1), and cancer (R1 and R2). Several of them were “druggable
   genes” for cancer (R1), inflammation (R1), cardiovascular diseases
   (R1), and diseases of the nervous system (R2). Longitudinal analyses
   showed that APOE ε4, amyloid, and tau were associated with R2 at
   early asymptomatic stages, whereas for R1 this association emerged
   only at late symptomatic stages. Our findings deepen our
   understanding of the multifaceted pathogenesis of AD beyond the brain.
   In early asymptomatic stages, the two dimensions are associated with
   diverse pathological mechanisms, including cardiovascular diseases,
   inflammation, and hormonal dysfunction—driven by genes different from
   APOE—which may collectively contribute to the early pathogenesis of AD.
   All results are publicly available at
   [122]https://labs-laboratory.com/medicine/.

   Subject terms: Personalized medicine, Predictive markers

Introduction

   Alzheimer’s disease (AD) is the most common cause of dementia in older
   adults and remains incurable despite many pharmacotherapeutic clinical
   trials, including anti-amyloid drugs [[123]1, [124]2] and anti-tau
   drugs [[125]3]. This is largely due to the complexity and multifaceted
   nature of the underlying neuropathological processes leading to
   dementia. The research community has embraced several mechanistic
   hypotheses to elucidate AD pathogenesis [[126]4–[127]6]. Among these,
   the amyloid hypothesis has been dominant over the past decades and has
   proposed a dynamic biomarker chain: extracellular beta-amyloid (Aβ)
   triggers a cascade that leads to subsequent intracellular
   neurofibrillary tangles, including hyperphosphorylated tau protein (tau
   and p-tau), neurodegeneration, including medial temporal lobe atrophy,
   and cognitive decline [[128]7, [129]8]. However, the amyloid hypothesis
   has been re-examined and revised due to substantial evidence that
   questions its current form [[130]8–[131]10]. While amyloid remains
   critical to AD development, the amyloid cascade model has been
   continually refined as other biological factors are discovered to
   influence the pathway from its accumulation to cell death.

   Cardiovascular dysfunction has been widely associated with an increased
   risk for AD [[132]11]. There is also growing evidence that inflammatory
   [[133]10–[134]12] and neuroendocrine processes [[135]5, [136]13]
   influence pathways of amyloid accumulation and neuronal death. The
   inflammation hypothesis claims that microglia and astrocytes release
   pro-inflammatory cytokines as drivers, by-products, or beneficial
   responses associated with AD progression and severity
   [[137]12–[138]14]. The neuroendocrine hypothesis, first introduced in
   the context of aging [[139]15], has been extended to AD [[140]16],
   where it proposes that neurohormones secreted by the pituitary and
   other essential endocrine glands can affect the central nervous system
   (CNS) and thereby contribute to the development of AD. For example,
   Xiong and colleagues [[141]17] recently found that blocking the action
   of follicle-stimulating hormone in mice abrogates the AD-like phenotype
   (e.g., cognitive decline) by inhibiting the neuronal C/EBPβ–δ-secretase
   pathway. These findings emphasize the need to further elucidate early
   brain and body changes well before they lead to irreversible clinical
   progression [[142]18].

   Recent advances in artificial intelligence (AI), especially deep
   learning (DL), applied to magnetic resonance imaging (MRI), have shown
   great promise in biomedical applications [[143]19, [144]20]. DL models
   can discover complex non-linear relationships between phenotypic and
   genetic features and clinical outcomes, thereby providing informative
   imaging-derived endophenotypes [[145]21]. In particular, AI has been
   applied to MRI to disentangle the neuroanatomical heterogeneity of AD
   with categorical disease subtypes [[146]22–[147]26]. The genetic
   underpinnings [[148]17, [149]27, [150]28] of this neuroanatomical
   heterogeneity in AD are also complex and heterogeneous. The most recent
   large-scale genome-wide association study [[151]28] (GWAS: 111,326 AD
   vs. 677,633 controls) has identified 75 genomic loci, including APOE
   genes, associated with AD. However, such case-control group comparisons
   conceal genetic factors that might contribute differentially to
   different dimensions of AD-related brain change. More importantly, the
   genetic variants that contribute to the initiation and early
   progression of brain change in younger and asymptomatic individuals are
   poorly understood.

   In this study, we utilize a novel semi-supervised deep learning
   approach, Surreal-GAN, to characterize the neuroanatomical
   heterogeneity of the disease. Unlike our previous model, Smile-GAN
   [[152]22], which categorized subtypes, Surreal-GAN generates multiple
   continuous latent dimensional representations, simultaneously
   accounting for spatial and temporal disease heterogeneity, similar to
   what was accomplished in a previous unsupervised clustering model known
   as Sustain [[153]24]. These multi-dimensional scores reflect the
   co-expression level of respective brain atrophy dimensions in the same
   patient; this is biologically plausible, as brain diseases like AD
   often progress continuously over a long disease trajectory. Refer to
   the Methods (Surreal-GAN for disease heterogeneity) and Supplementary
   eMethod [154]1 for methodological details of Surreal-GAN, comparisons
   to other subtyping methods, and strengths of semi-supervised
   representation learning. We hypothesized that genetic variants,
   potentially unrelated to APOE genes, contribute to the manifestation
   of multiple dimensions of brain atrophy in early asymptomatic stages.
   To test this hypothesis, we first trained the Surreal-GAN model to
   define the AD dimensions in the late symptomatic stages. We then
   traced their expression back to early asymptomatic stages. In our previous
   study [[155]29], we derived two neuroanatomical dimensions (R1 and R2)
   by applying Surreal-GAN to the MCI/AD participants (target population)
   and cognitively unimpaired (CU) participants (reference population)
   from the Alzheimer’s Disease Neuroimaging Initiative study (ADNI
   [[156]30]). Herein, we applied the trained model to three asymptomatic
   populations and one symptomatic population: the general population
   (N = 39,575; age: 64.12 ± 7.54 years) from the UK Biobank (UKBB
   [[157]31]) excluding demented individuals; the cognitively unimpaired
   population (N = 1658; age: 65.75 ± 10.90 years) from ADNI and the
   Baltimore Longitudinal Study of Aging study (BLSA [[158]32]); the
   cognitively unimpaired population with a family risk (N = 343; age:
   63.63 ± 5.05 years) from the Pre-symptomatic Evaluation of Experimental
   or Novel Treatments for Alzheimer’s Disease (PREVENT-AD [[159]33]); and
   the MCI/AD population (N = 1534; age: 73.45 ± 7.69 years) from ADNI and
   BLSA. Refer to the Methods (Study design and populations) and Table
   [160]1 for details of the definition of these populations.

Table 1.

   Study characteristics.
   Population             Study        Participant (N)  Scan (N)  Age (year)            Sex/female  CU        AD   MCI   proxy-AD
   MCI/AD                 ADNI & BLSA  1534             7019      73.45 (54.27, 93.00)  888/58%     0         424  1110  NA
   General                UKBB         39,575           40,981    64.12 (44.56, 82.27)  18,625/47%  39,574^b  1    NA    10,189^c
   CU                     ADNI & BLSA  1658             6143      65.75 (22.00, 80.00)  939/57%     1658      0    0     NA
   CU with a family risk  PREVENT-AD   343              1215      63.63 (55.13, 84.22)  243/71%     343^a     NA   NA    343^a

   Age is reported as the mean (min, max) in each population. The
   definitions of cognitively unimpaired (CU)^a in PREVENT-AD, asymptomatic
   participants^b in UKBB, and proxy-AD^c in UKBB are as follows: (a)
   Participants (proxy-AD and CU with a family risk) from the PREVENT-AD
   study were recruited with the following criteria: (i) being cognitively
   normal, (ii) having a family history of AD, (iii) being within 15 years
   of the age of disease onset of their youngest relative, and (iv) no
   history of neurological or psychiatric diseases; (b) The UKBB
   participants (the general population) represent a general population
   with healthy aging and diseases (not AD, specifically). We excluded
   those diagnosed with all sources of dementia (G30 in ICD-10 diagnoses,
   see below). However, these asymptomatic participants might have
   diagnoses of other illnesses or comorbidities based on ICD-10; (c)
   Participants with proxy-AD in UKBB are defined by a family history of
   AD with the following criteria: (i) illnesses_of_father_f20107 and (ii)
   illnesses_of_mother_f20110.

Materials and methods

Study design and populations

   The current study consists of four main populations (Table [162]1),
   which were jointly consolidated by the iSTAGING [[163]34] and the AI4AD
   consortia ([164]http://ai4ad.org/): the iSTAGING consortium
   consolidated all imaging and clinical data; imputed genotyping data
   were downloaded from UKBB; the AI4AD consortium consolidated the
   whole-genome sequencing (WGS) data for the ADNI study. The iSTAGING
   consortium is an NIH-funded effort that systematically and
   statistically consolidates and harmonizes brain imaging data for the
   study of aging and AD, including different ethnicity groups and
   demographics, and covers the entire lifespan. The AI4AD consortium aims
   to leverage the power of AI to study AD and aging and also
   consolidates WGS data across the USA. Supplementary eMethod [165]2
   details each population’s definition and inclusion criteria. Our goal
   is to consolidate and harmonize large-scale lifespan imaging data to
   model the full spectrum of Alzheimer’s disease and assess how the
   identified AD dimensions are expressed across various stages of the
   disease, especially at the early stages.

Image preprocessing

   All T1-weighted MR images were first corrected for magnetic field
   intensity inhomogeneity [[166]35]. A deep learning-based skull
   stripping algorithm was applied for the removal of extra-cranial
   material. In total, 145 anatomical regions of interest (ROIs) were
   generated in gray matter (GM, 119 ROIs), white matter (WM, 20 ROIs),
   and ventricles (6 ROIs) using a multi‐atlas label fusion method, MUSE
   [[167]36] (Supplementary eMethod [168]3). The 119 GM ROIs were
   statistically harmonized by an extensively validated approach, i.e.,
   ComBat-GAM [[169]37], using the entire imaging data of iSTAGING.
   Supplementary eFig. [170]1 demonstrates the normality check for one
   MUSE ROI (right accumbens area) before and after statistical
   harmonization, illustrating that our statistical harmonization enhanced
   the normality of ROIs across various studies. The harmonized MUSE ROIs
   were then input to Surreal-GAN to derive the dimensions.

Surreal-GAN for disease heterogeneity

   Surreal-GAN [[171]29] (Supplementary eFig. [172]2) dissects underlying
   disease-related heterogeneity via a deep representation learning
   approach under the principle of semi-supervised learning [[173]22,
   [174]23]. At a high level, its most fundamental novelty is that it
   provides a continuous representation of the presence of multiple,
   non-exclusive abnormal brain patterns in each individual, rather than
   clustering individuals into one of many clusters, i.e., disease
   subtypes. More specifically, several methodological advancements were
   considered compared to its predecessor, Smile-GAN [[175]22]. First,
   Surreal-GAN models neuroanatomical heterogeneity by considering
   both spatial and temporal (i.e., disease severity) variation using only
   cross-sectional MRI data. Secondly, Surreal-GAN disentangles the
   neuroanatomical heterogeneity of AD by enabling patients to
   simultaneously exhibit multiple distinct imaging patterns (i.e., high
   scores for expressing all these patterns), resulting in
   high-dimensional scores across multiple dimensions. Lastly, in contrast
   to prior probability-based clustering methods like Smile-GAN,
   Surreal-GAN operates without the constraint that all dimensional scores
   must sum to 1. This allows for a more normal distribution of
   dimensional scores suited for GWAS (Supplementary eFig. [176]3).
   Further methodological details are elaborated upon in Supplementary
   eMethod [177]1.

   Alternative clustering techniques, such as Sustain [[178]24] and
   Bayesian latent [[179]38] methods, are available for deciphering the
   neuroanatomical heterogeneity in AD [[180]22–[181]26, [182]39].
   Surreal-GAN distinguishes itself from these approaches based on
   fundamental methodological distinctions, such as its utilization of
   semi-supervised deep learning compared to the unsupervised approach of
   others [[183]40]. Additionally, Surreal-GAN generates continuous
   dimensions associated with distinct phenotypic outcomes, allowing the
   simultaneous co-expression of multiple patterns instead of categorizing
   patients into a single dominant subtype or stage, as seen in other
   methods. Notably, the two dimensions, R1 and R2, were correlated
   with the four subtypes generated by Smile-GAN: R2 was correlated
   with P3 (reflecting medial temporal lobe atrophy), and both R1 and
   R2 were correlated with P4 (representing global atrophy), as
   depicted in Supplementary eFig.
   [184]4. These two dimensions capture the individual-level manifestation
   of the two distinct imaging atrophy patterns, contrasting them with
   healthy control groups within the ADNI study across the AD spectrum.

Brain and clinical variable associations

   We performed brain-wide associations for the 119 GM ROIs. For baseline
   brain-wide associations, linear regression models were fitted with R1
   and R2 dimensions as independent variables, with each ROI as the
   dependent variable, controlling for age, sex, intracranial volume
   (ICV), and/or diagnosis as covariates.
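
   As an illustration only, the baseline association for a single ROI
   could be sketched as an ordinary least-squares fit. The sketch assumes
   one dimension is tested at a time, uses synthetic data rather than
   study data, and approximates the p value with a normal distribution
   (reasonable only for large samples); the function name
   ols_association and all variable names are hypothetical and do not
   come from the study's pipeline.

```python
import math
import numpy as np


def ols_association(y, predictor, covariates):
    """Regress y on one predictor plus covariates.

    Returns (beta, t statistic, approximate two-sided p value) for the
    predictor. The p value uses a normal approximation (an assumption).
    """
    n = len(y)
    # Design matrix: intercept, predictor of interest, then covariates.
    X = np.column_stack([np.ones(n), predictor, covariates])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    dof = n - X.shape[1]
    sigma2 = residuals @ residuals / dof
    # Standard errors come from the diagonal of sigma^2 * (X'X)^-1.
    cov_beta = sigma2 * np.linalg.pinv(X.T @ X)
    t_stat = beta[1] / math.sqrt(cov_beta[1, 1])
    p_value = math.erfc(abs(t_stat) / math.sqrt(2.0))
    return float(beta[1]), float(t_stat), p_value


# Synthetic example: one ROI volume that shrinks as the R1 score increases.
rng = np.random.default_rng(0)
n = 500
age = rng.normal(70, 7, n)
sex = rng.integers(0, 2, n).astype(float)
icv = rng.normal(1500, 100, n)
r1 = rng.uniform(0, 1, n)
roi = 10.0 - 2.0 * r1 - 0.02 * age + 0.001 * icv + rng.normal(0, 0.5, n)

beta, t_stat, p_value = ols_association(roi, r1, np.column_stack([age, sex, icv]))
significant = p_value < 0.05 / 119  # Bonferroni correction for 119 GM ROIs
```

   A negative coefficient for the dimension corresponds to atrophy in
   that ROI; repeating the fit for each of the 119 GM ROIs gives the
   brain-wide map.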

   We performed a two-step linear regression for longitudinal brain-wide
   associations. First, we derived the individual-level rate of change
   with age using a linear mixed-effects model that included a
   participant-specific random slope for age and a random intercept; age
   and sex were treated as fixed effects. Second, the same linear
   regression model as in the baseline brain-wide associations was
   fitted with the rate of change as the independent variable.
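
   To make the first step concrete, the sketch below computes an
   individual-level rate of change from repeated visits. It is a
   deliberately simplified stand-in: it fits a separate least-squares
   slope per participant rather than the linear mixed-effects model
   used in the study, so it does not pool information across
   participants or shrink slopes toward the population mean. The
   function name and the example records are hypothetical.

```python
from collections import defaultdict


def per_participant_slopes(records):
    """Return {participant_id: least-squares slope of value on age}.

    records is an iterable of (participant_id, age, value) tuples.
    Participants without variation in age are skipped.
    """
    visits = defaultdict(list)
    for pid, age, value in records:
        visits[pid].append((age, value))

    slopes = {}
    for pid, points in visits.items():
        n = len(points)
        mean_age = sum(a for a, _ in points) / n
        mean_val = sum(v for _, v in points) / n
        sxx = sum((a - mean_age) ** 2 for a, _ in points)
        if sxx == 0:
            continue  # slope is undefined for a single time point
        sxy = sum((a - mean_age) * (v - mean_val) for a, v in points)
        slopes[pid] = sxy / sxx
    return slopes


# Hypothetical visits: (participant, age in years, measured value).
records = [
    ("p1", 70.0, 5.0), ("p1", 71.0, 4.8), ("p1", 72.0, 4.6),
    ("p2", 65.0, 6.0), ("p2", 67.0, 6.0),
    ("p3", 80.0, 4.0),
]
slopes = per_participant_slopes(records)
```

   The resulting per-participant rates would then enter the second-step
   regression in place of the baseline values.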

   We also performed clinical variable associations for all clinical
   variables and neuropsychological tests available for each population,
   using the same model as in the baseline brain-wide associations.
   Bonferroni correction was performed to adjust for multiple
   comparisons (119 GM ROIs for the brain-wide associations and the
   number of clinical variables for the clinical associations). We
   included various clinical variables across
   different studies, including AI-derived imaging signatures, such as
   SPARE-AD [[185]41], an imaging surrogate for AD atrophy patterns, and
   SPARE-BA [[186]42] for brain aging-related atrophy. Other clinical
   variables included cognitive scores (e.g., the Rey Auditory Verbal
   Learning Test (RAVLT)), modifiable risk factors (e.g., BMI), and CSF
   biomarkers (Aβ42). Details of the 45 clinical variables are presented
   in Supplementary eTable [187]3.
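
   The significance thresholds quoted in the figure legends can be
   reproduced with a short calculation. The sketch assumes a
   family-wise alpha of 0.05, which is not stated explicitly in this
   section but is consistent with the reported values; the helper name
   is hypothetical.

```python
import math


def neg_log10_threshold(n_tests, alpha=0.05):
    """Return -log10 of the Bonferroni-corrected p value threshold."""
    return -math.log10(alpha / n_tests)


thresholds = {
    "119 GM ROIs": neg_log10_threshold(119),
    "45 clinical variables (MCI/AD)": neg_log10_threshold(45),
    "61 clinical variables (general population)": neg_log10_threshold(61),
}
gwas_threshold = -math.log10(5e-8)  # genome-wide significance threshold

for label, value in thresholds.items():
    print(f"{label}: {value:.3f}")
print(f"genome-wide: {gwas_threshold:.3f}")
```

   Under this assumption, the computed values agree closely with the
   thresholds given in the legends (3.38, 2.95, 3.08, and 7.30).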

Genetic analyses

   Genetic analyses were performed for the WGS data from ADNI and the
   imputed genotype data from UKBB [[188]43]. Our quality check protocol
   is detailed in Supplementary eMethod [189]4. This resulted in 1487
   participants and 24,194,338 SNPs in ADNI WGS data. For UKBB, we limited
   our analysis to European ancestry participants, resulting in 33,541
   participants and 8,469,833 SNPs [[190]44–[191]47].

   Using UKBB data, we first estimated the SNP-based heritability using
   GCTA-GREML [[192]48], controlling for confounders of age (at imaging),
   age-squared, sex, age-sex interaction, age-squared-sex interaction,
   ICV, and the first 40 genetic principal components, following a
   previous pioneering study [[193]49]. In GWAS, we performed a linear
   regression for each neuroanatomical dimension and included the same
   covariates as in the heritability estimates. We adopted the genome-wide
   P-value threshold (5 ×10^−8) in all GWAS. The annotation of genomic
   loci (displayed by its top lead SNP) and gene mappings, prioritized
   gene set enrichment, and tissue specificity analyses were performed
   using FUMA ([194]https://fuma.ctglab.nl/, version: v1.3.8)
   (Supplementary eMethod [195]5 and [196]6). A two-step procedure
   (Supplementary eMethod [197]7) was performed to determine if an
   annotated genomic locus or gene was associated with AD-related clinical
   traits. We calculated the polygenic risk scores (PRS) [[198]50] using
   both ADNI and UKBB genetic data (Supplementary eMethod [199]8).
   Finally, we constructed a target-drug-disease network for the genes
   associated with R1 and R2 to identify “druggable genes”
   (Supplementary eMethod [200]9).
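
   For readers who want to see how the GWAS covariate set translates
   into a model, the following is a minimal sketch of a single-variant
   additive linear regression with the covariates listed above. It is
   not the software used in the study: the data are synthetic, the
   function names are hypothetical, age and ICV are standardized purely
   for numerical stability (which leaves the variant's coefficient
   unchanged because the model includes an intercept), and the p value
   relies on a normal approximation.

```python
import math
import numpy as np

GENOME_WIDE_P = 5e-8


def build_covariates(age, sex, icv, genetic_pcs):
    """Stack age, age^2, sex, age x sex, age^2 x sex, ICV, and genetic PCs."""
    age_z = (age - age.mean()) / age.std()
    icv_z = (icv - icv.mean()) / icv.std()
    age_z2 = age_z ** 2
    return np.column_stack(
        [age_z, age_z2, sex, age_z * sex, age_z2 * sex, icv_z, genetic_pcs]
    )


def snp_association(phenotype, dosage, covariates):
    """Regress a phenotype on one variant dosage (0/1/2) plus covariates.

    Returns (beta, approximate two-sided p value) for the variant.
    """
    n = len(phenotype)
    X = np.column_stack([np.ones(n), dosage, covariates])
    beta, *_ = np.linalg.lstsq(X, phenotype, rcond=None)
    residuals = phenotype - X @ beta
    sigma2 = residuals @ residuals / (n - X.shape[1])
    se = math.sqrt(sigma2 * np.linalg.pinv(X.T @ X)[1, 1])
    z = beta[1] / se
    return float(beta[1]), math.erfc(abs(z) / math.sqrt(2.0))


# Synthetic cohort: one variant with a true effect and one without.
rng = np.random.default_rng(1)
n = 2000
age = rng.normal(64, 7.5, n)
sex = rng.integers(0, 2, n).astype(float)
icv = rng.normal(1500, 120, n)
pcs = rng.normal(0, 1, (n, 40))  # stand-in for 40 genetic principal components
covariates = build_covariates(age, sex, icv, pcs)

causal_snp = rng.binomial(2, 0.3, n).astype(float)
null_snp = rng.binomial(2, 0.3, n).astype(float)
phenotype = 0.4 * causal_snp + 0.01 * (age - 64) + rng.normal(0, 1, n)

beta_causal, p_causal = snp_association(phenotype, causal_snp, covariates)
beta_null, p_null = snp_association(phenotype, null_snp, covariates)
```

   In a real analysis the same regression would be repeated for every
   variant, with R1 or R2 as the phenotype, and variants with p values
   below GENOME_WIDE_P would be carried forward for annotation.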

Results

Two dominant dimensions of brain atrophy found in MCI and AD

   In MCI/AD patients, the “diffuse-AD” dimension (R1) showed widespread
   brain atrophy without an exclusive focus on the medial temporal lobe
   (Fig. [201]1A and Supplementary eTable [202]1 for p values and effect
   sizes). In contrast, the “MTL-AD” dimension (R2) displayed more focal
   medial temporal lobe atrophy, prominent in the bilateral
   parahippocampal gyrus, hippocampus, and entorhinal cortex (Fig.
   [203]1A). All results, including p values and effect sizes (Pearson’s
   correlation coefficient r), are presented in Supplementary eTable
   [204]1. The atrophy patterns of the two dimensions defined in the
   symptomatic MCI/AD population (Fig. [205]1A) were present in the
   asymptomatic populations, albeit with a smaller magnitude of r
   (Supplementary eTables [206]1, [207]4 and [208]8). We presented the age
   distribution (Supplementary eFig. [209]5A), as well as the expression
   of R1 and R2, along with the population-level difference for the four
   populations in Supplementary eFig. [210]5B–D.

Fig. 1. The manifestation of the R1 and R2 dimensions of brain atrophy in the
MCI/AD population.

   A Brain association studies reveal two dominant brain atrophy
   dimensions. A linear regression model was fit to the 119 GM ROIs at
   baseline for the R1 and R2 dimensions. The −log[10](p value) of each
   significant ROI (Bonferroni correction for the number of 119 ROIs:
   −log[10](p value) >3.38) is shown. A negative value denotes brain
   atrophy with a negative coefficient in the linear regression model. All
   the statistics (r, Pearson’s correlation coefficient) are presented in
   Supplementary eTable [213]1. The brain maps denote the signed p value,
   and the range of r for each dimension is also shown. Of note, the
   sample size (N) for R1 and R2 is the same for each ROI. B Genome-wide
   association studies demonstrate that the R2, but not R1, dimension is
   associated with variants related to APOE genes (genome-wide p value
   threshold with the red line: −log[10](p value) >7.30). We associated
   each common variant with R1 and R2 using the whole-genome sequencing
   data from ADNI. Gene annotations were performed via positional,
   expression quantitative trait loci, and chromatin interaction mappings
   using FUMA [[214]58]. We then manually queried whether they were
   previously associated with AD-related traits in the GWAS Catalog
   [[215]55]. Red-colored loci/genes indicate variants associated with
   AD-related traits in previous literature. C Clinical association
   studies show that the R2 dimension is associated to a larger extent
   with AD-specific biomarkers, including SPARE-AD [[216]41], an imaging
   surrogate for AD atrophy patterns, and APOE ε4, the well-established
   risk allele in sporadic AD. The R1 dimension is associated to a larger
   extent with aging (e.g., SPARE-BA [[217]42], an imaging surrogate for
   brain aging) and vascular-related biomarkers (e.g., white matter
   lesions, WML). The same linear regression model was used to associate the R1
   and R2 dimensions with the 45 clinical variables, including cognitive
   scores, modifiable risk factors, CSF biomarkers, disease/condition
   labels, demographic variables, and imaging-derived phenotypes. The
   radar plot shows representative clinical variables; results for all 45
   clinical variables are presented in Supplementary eTable [218]3. The
   SPARE-AD and SPARE-BA scores are rescaled for visualization purposes.
   The gray-colored circle lines indicate the p value threshold in both
   directions (Bonferroni correction for the 45 variables: −log[10](p
   value) >2.95). A positive/negative −log[10](p value) value indicates a
   positive/negative correlation (beta). The transparent dots represent
   the associations that do not pass the Bonferroni correction; the
   blue-colored dots and red-colored dots indicate significant
   associations for the R1 and R2 dimensions, respectively.

   At baseline, the R1-dominant group had 25.72% AD patients (N = 222 out
   of 863); the R2-dominant group consisted of 30.10% AD patients (202 out
   of 671). Within a 7-year follow-up period, MCI participants from both
   the R1-dominant and R2-dominant groups progressed to AD, with the
   R2-dominant group exhibiting a higher proportion of AD patients (40%
   vs. 25%) (Supplementary eFig. [219]6A, B); the two dominant dimensions
   developed independently throughout the 7-year follow-up period
   (Supplementary eFig. [220]6C, D).

APOE genes are associated with R2 but not with R1 in the MCI/AD population

   In GWAS, the R2 dimension, but not R1, was associated with
   well-established AD genomic loci (rs429358, chromosome: 19, 45411941;
   minor allele: C, p value: 1.05 × 10^−11) and genes (APOE, PVRL2,
   TOMM40, and APOC1) (Fig. [221]1B). The details of the identified
   genomic locus and annotated genes are presented in Supplementary eTable
   [222]2. The PRS of AD showed a slightly stronger positive association
   with the R2 dimension [r = 0.11, −log[10](p value) = 3.14] than with
   the R1 dimension [r = 0.09, −log[10](p value) = 2.31, Supplementary
   eFig. [223]7]. The QQ plots of the baseline GWAS are presented in
   Supplementary eFig. [224]8.

Clinical profiles of the R1 and R2 dimensions in the MCI/AD population

   Clinical association studies correlated the two dimensions with 45
   clinical variables and biomarkers. The R2 dimension was associated,
   to a larger extent than R1, with SPARE-AD and RAVLT. SPARE-AD
   quantifies the presence of a typical imaging signature
   of AD-related brain atrophy, which has been previously shown to predict
   clinical progression in both CU and MCI individuals [[225]41]. RAVLT
   measures episodic memory, a reliable neuropsychological phenotype in
   AD, which is also correlated with medial temporal lobe atrophy
   [[226]51, [227]52]. The R1 dimension was associated to a greater extent
   with (1) SPARE-BA, which captures the individualized expression of
   advanced brain age from MRI [[228]42]; (2) white matter lesions (WML),
   which are commonly associated with vascular risk factors and cognitive
   decline [[229]53]; and (3) whole-brain uptake of 18F-fluorodeoxyglucose
   (FDG) PET, which is a biomarker of brain metabolic function and
   atrophy. Both dimensions were positively associated with cerebrospinal
   fluid (CSF) levels of tau and p-tau and negatively associated with the
   CSF level of Aβ42 [[230]54] (Fig. [231]1C), as well as the whole-brain
   standardized uptake value ratio of 18F-AV-45 PET (Supplementary eTable
   [232]3). Results for all 45 clinical variables, including cognitive
   scores, modifiable risk factors, CSF biomarkers, disease/condition
   labels, demographic variables, and imaging-derived phenotypes, are
   presented in Supplementary eTable [233]3 for p values and effect sizes
   (i.e., beta coefficients).

Clinical profiles of the R1 and R2 dimensions in the general population

   Brain association studies confirmed the presence of the two atrophy
   patterns in the general population (Fig. [234]2A and Supplementary
   eTable [235]4 for p values and effect sizes). In clinical association
   studies, the R1 dimension was significantly associated, to a larger
   extent than R2, with cardiovascular (e.g., triglycerides) and diabetes
   factors (e.g., HbA1c and glucose), executive function (TMT-B),
   intelligence, physical measures (e.g., diastolic blood pressure),
   SPARE-BA [−log[10](p value) = 236.89 for R1 and −46.35 for R2] and WML
   [−log[10](p value) = 120.24 for R1 and 2.06 for R2]. In contrast, the
   R2 dimension was more significantly associated with SPARE-AD
   [−log[10](p value) = 136.01 for R1 and 250.41 for R2] and prospective
   memory (Fig. [236]2B). Results for all 61 clinical variables, including
   cardiovascular factors, diabetic blood markers, social demographics,
   lifestyle, physical measures, cognitive scores, and imaging-derived
   phenotypes, are presented in Supplementary eTable [237]5 for p values
   and effect sizes.

Fig. 2. The expression of the R1 and R2 dimensions in the general population.

   A Brain association studies confirm the presence of the two dimensions
   in the general population: the R1 dimension shows widespread brain
   atrophy, whereas the R2 dimension displays focal medial temporal lobe
   atrophy. p value and effect sizes (r, Pearson’s correlation
   coefficient) are presented in Supplementary eTable [240]4. The brain
   maps denote the signed p value, and the range of r for each dimension
   is also shown. Of note, the sample size (N) for R1 and R2 is the same
   for each ROI. B Clinical association studies further show that the R2
   dimension is associated with prospective memory, and the R1 dimension
   is associated with several cognitive dysfunctions, cardiovascular risk
   factors (e.g., triglycerides), and diabetes (e.g., HbA1c). The same
   linear regression models were used to associate the R1 and R2
   dimensions with the 61 clinical variables, including cardiovascular
   factors, diabetic blood markers, social demographics, lifestyle,
   physical measures, cognitive scores, and imaging-derived phenotypes.
   The radar plot shows representative clinical variables; all other
   results are presented in Supplementary eTable [241]5. The gray circle
   lines indicate the p value threshold in both directions (Bonferroni
   correction for the 61 variables: −log[10](p value) >3.08). A
   positive/negative −log[10](p value) value indicates a positive/negative
   correlation (beta). Transparent dots represent the associations that do
   not pass the Bonferroni correction; the blue-colored dots and
   red-colored dots indicate significant associations for the R1 and R2
   dimensions, respectively. C Genome-wide association studies demonstrate
   that the R2 dimension is associated to a larger extent with genomic
   loci and genes previously associated with AD-related traits in the
   literature (genome-wide p value threshold with the red line: −log[10](p
   value) >7.30). Each genomic locus is represented by its top lead SNP.
   The R1 dimension identified 8 (blue-colored in bold) out of the 49
   mapped genes associated with AD-related traits. The R2 dimension
   identified 13 (red-colored in bold) out of 40 mapped genes associated
   with AD-related traits. Gene annotations were performed via positional,
   expression quantitative trait loci, and chromatin interaction mappings
   using FUMA (Supplementary eTable [242]6 for all mapped genes)
   [[243]58]. The genomic loci and mapped genes were manually queried in
   the GWAS Catalog [[244]55] to determine whether they were previously
   associated with AD (newly identified or not). * denotes that the
   genomic locus is newly identified. D Besides AD-related traits, the
   genes and genomic loci in the two dimensions were also associated with
   other clinical traits, including inflammation, neurohormones, and
   imaging-derived phenotypes, as shown in the literature from the GWAS
   Catalog [[245]55]. The flowchart first maps the genomic loci and genes
   (left) identified in the two dimensions onto the human genome (middle).
   It then links these variants to any clinical traits identified in
   previous literature from the GWAS Catalog (right). In the middle of the
   human genome, we show chromosomes 1 to 22 (above to below); the blue
   and red-colored genes are AD-related for the R1 and R2 dimensions,
   respectively. The black-colored genes (C) are not annotated. INF
   inflammation, PD psychiatric disorder, PM physical measure; “New”
   (corresponding to the newly identified loci/genes in C) indicates that
   the locus or gene was not associated with any traits in the literature.
   DSST Digit Symbol Substitution Test, TMT Trail Making Test, CRP
   C-reactive protein, AD Alzheimer’s disease, PD Parkinson’s disease,
   IDP imaging-derived phenotype.

Twenty-four genomic loci and 77 genes unrelated to APOE are associated with
the R1 and R2 dimensions in the general population

   GWAS identified 24 genomic loci, 14 of which are newly identified (not
   previously associated with any traits in GWAS Catalog), and 77
   positionally and functionally mapped genes unrelated to APOE associated
   with R1 or R2. In particular, the R1 dimension was significantly
   associated with 11 genomic loci and 49 genes. Eight genes (blue-colored
   genes in Fig. [246]2D) were previously associated with AD-related
   traits; 12 newly identified loci/genes have not been previously
   associated with any clinical traits. The R2 dimension was significantly
   associated with 13 genomic loci and 40 annotated genes. Thirteen genes
   (red-colored genes in Fig. [247]2D) were associated with AD-related
   traits; 8 loci/genes were newly identified (Fig. [248]2C and
   Supplementary eTable [249]6). These genomic loci and genes were also
   associated with many clinical traits in the literature from the GWAS
   Catalog [[250]55]. These included hormones (e.g., sex hormone-binding
   globulin measurement vs. CCKN2C), inflammatory factors (e.g.,
   macrophage inflammatory protein 1b measurement vs. CDC25A),
   imaging-derived phenotypes (e.g., cerebellar volume measurement from
   MRIs vs. DMRTA2), and psychiatric disorders (e.g., unipolar depression
   vs. ASTN2) (Fig. [251]2D). Details of the GWAS Catalog results are
   presented in Supplementary eFile [252]1. The Manhattan and QQ plots of
   the baseline GWAS are presented in Supplementary eFig. [253]9. The LDSC
   [[254]56] intercept of the two GWASs was close to 1, indicating no
   substantial genomic inflation (R1 = 1.0032 ± 0.0084;
   R2 = 1.023 ± 0.0084). Furthermore, our main GWASs using European
   ancestry were robust in three sensitivity analyses: split-sample,
   sex-stratified, and mixed-effect [[255]57] linear model analyses.
   Detailed results are presented in Supplementary eText [256]1 and
   Supplementary eFile [257]2–[258]4.

   The two dimensions were significantly heritable in the general
   population based on the SNP-based heritability estimates (R1:
   h^2 = 0.49 ± 0.02; R2: h^2 = 0.55 ± 0.02). The PRS of AD showed a
   marginally positive association with the
   R2 dimension [−log[10](p value) = 1.42], but not with the R1 dimension
   [−log[10](p value) = 0.47 < 1.31] in this population.

Genes associated with the R1 and R2 dimensions are overrepresented in organs
beyond the brain in the general population

   Tissue specificity analyses test whether the mapped genes are
   overrepresented in differentially expressed gene sets (DEG) in one
   organ/tissue compared to all other organs/tissues using different gene
   expression data [[259]58]. The genes associated with the R1 dimension
   were overrepresented in the caudate, hippocampus, putamen, amygdala,
   substantia nigra, liver, heart, and pancreas; the genes associated with
   the R2 dimension were overrepresented in the caudate, hippocampus,
   putamen, amygdala, anterior cingulate, pituitary, liver, muscle,
   kidney, and pancreas (Fig. [260]3A and Supplementary eFig. [261]10).
   Genes in DEG sets over-expressed in the heart were associated only with
   R1, while those in DEG sets over-expressed in the pituitary gland,
   muscle, and kidney were unique to R2. The expression values of every
   single gene
   for all tissues are presented in Supplementary eFig. [262]11.
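
   The overrepresentation test behind these tissue specificity results
   (a hypergeometric test with Bonferroni correction, as stated in the
   Fig. 3 legend) can be illustrated with a minimal Python sketch. It is
   not the FUMA GENE2FUNC implementation. The background size, DEG set
   sizes, overlap counts, and tissue names are hypothetical values chosen
   for illustration; only the count of 49 R1-mapped genes comes from the
   text.

```python
from math import comb


def hypergeom_upper_tail(n_background, n_deg, n_input, n_overlap):
    """One-sided P(X >= n_overlap) for a hypergeometric draw.

    n_background: all genes considered (hypothetical here)
    n_deg:        genes in the tissue's DEG set
    n_input:      mapped genes being tested
    n_overlap:    mapped genes that fall in the DEG set
    """
    total = comb(n_background, n_input)
    upper = min(n_deg, n_input)
    favourable = sum(
        comb(n_deg, i) * comb(n_background - n_deg, n_input - i)
        for i in range(n_overlap, upper + 1)
    )
    return favourable / total


def bonferroni_significant(p_values, alpha=0.05):
    """Flag tests whose p value passes alpha divided by the number of tests."""
    threshold = alpha / len(p_values)
    return {name: p <= threshold for name, p in p_values.items()}


# Hypothetical inputs: background of 20,000 genes, 49 input genes (R1),
# and two made-up tissues given as (DEG set size, observed overlap).
N_BACKGROUND = 20000
N_INPUT = 49
deg_sets = {"tissue_A": (1500, 12), "tissue_B": (1500, 4)}

p_values = {
    tissue: hypergeom_upper_tail(N_BACKGROUND, size, N_INPUT, overlap)
    for tissue, (size, overlap) in deg_sets.items()
}
significant = bonferroni_significant(p_values)
```

   With these made-up numbers, about 3.7 overlapping genes are expected
   by chance per tissue, so an overlap of 12 passes the corrected
   threshold while an overlap of 4 does not.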

Fig. 3. Tissue specificity and biological pathway enrichment analysis of the
R1 and R2 dimensions in the general population.

   A Tissue specificity analyses show that genes associated with the two
   dimensions of neurodegeneration are overrepresented in organs/tissues
   beyond the human brain (R1 and R2). The unique overrepresentation of
   genes in differentially expressed gene sets (DEG) in the heart (R1) and
   the pituitary gland, muscle, and kidney (R2) may imply the involvement
   of inflammation [[265]12, [266]75, [267]76] and neurohormone
   dysfunction [[268]15–[269]17], respectively. The GENE2FUNC [[270]58]
   pipeline from FUMA was performed to examine the overrepresentation of
   prioritized genes (Fig. [271]2C) in pre-defined DEGs (up-regulated,
   down-regulated, and both-side DEGs) from different gene expression
   data. The input genes (Fig. [272]2C) were tested against each DEG using
   the hypergeometric test. We present only the organs/tissues that passed
   the Bonferroni correction for multiple comparisons. B Gene set
   enrichment analysis shows that genes associated with the two dimensions
   are enriched in different biological pathways. For example, genes
   associated with the R1 dimension are implicated in down-regulated
   macrophage functions, which have been shown to be associated with
   inflammation [[273]13]. In contrast, the R2 dimension is enriched in AD
   hallmarks (e.g., hippocampus), AD-related gene sets, and the pathway
   involved in dendritic cells, which may regulate amyloid-β-specific
   T-cell entry into the brain [[274]60]. Both dimensions are enriched in
   gene sets involved in cancer, which may indicate overlapped genetic
   underpinnings between AD and cancer [[275]59]. The GENE2FUNC [[276]58]
   pipeline from FUMA was performed to examine the enrichment of
   prioritized genes (Fig. [277]2C) in pre-defined gene sets.
   Hypergeometric tests were performed to test whether the input genes
   were overrepresented in any pre-defined gene sets. Gene sets were
   obtained from different sources, including MsigDB [[278]95] and GWAS
   Catalog [[279]55]. We show the significant results from gene sets
   defined in the GWAS Catalog, curated gene sets, and immunologic
   signature gene sets. All results are shown in Supplementary eTable
   [280]7. C The target-drug-disease network for R1 and R2-associated
   genes provides great potential for drug discovery and repurposing.
   R1-annotated “druggable genes” were developed for cardiovascular
   diseases, various cancers, and inflammation, whereas R2-annotated
   “druggable genes” were developed for diseases of the nervous system
   (e.g., Parkinson’s disease). For the target-drug-disease network, the
   fifth level of the Anatomical Therapeutic Chemical (ATC) code is
   displayed for the DrugBank database [[281]96], and the disease name
   defined by the International Classification of Diseases (ICD-11) code
   is shown for the Therapeutic Target Database [[282]97]. The human
   anatomy was created with [283]https://www.biorender.com/.

Genes associated with the R1 and R2 dimensions are enriched in key biological
pathways in the general population

   Genes associated with the two dimensions were enriched in different
   biological pathways. Genes associated with both dimensions were
   implicated in several types of cancer, including up-regulation of
   fibroblast, breast cancer, and neuroblastoma tumors (Fig. [284]3B),
   which indicates a degree of genetic overlap and shared pathways
   that may explain the intriguing inverse relationship between AD and
   cancer [[285]59]. Genes associated with the R1 dimension were
   implicated in pathways involved in the down-regulation of macrophages
   (Fig. [286]3B), which are involved in the initiation and progression of
   various inflammatory processes, including neuroinflammation and AD
   [[287]13]. Inflammation is also known to be associated with vascular
   compromise and dysfunction. This further concurs with the stronger
   cardiovascular profile of R1, especially with increased WML and
   predominant SPARE-BA increases. Genes associated with the R2 dimension
   were enriched in pathways involved in AD onset, hippocampus-related
   brain volumes, and dendritic cells (Fig. [288]3B). In particular,
   dendritic cells may regulate amyloid-β-specific T-cell entry into the
   brain [[289]60], as well as the inflammatory status of the brain
   [[290]61]. The gene set enrichment analysis results are presented in
   Supplementary eTable [291]7.

Genes associated with the R1 and R2 dimensions show potential for drug
discovery and repurposing

   We queried whether these 77 genes associated with R1 and R2 are
   “druggable genes” in the constructed target-drug-disease network; that
   is, the target genes express proteins that bind drug-like molecules,
   and the drug is at any stage of clinical trials. For the 49
   R1-annotated genes, 9 genes were targets for 15 drugs and drug-like
   molecules for treating various cancers, inflammation, and
   cardiovascular dysfunctions. For the 40 R2-annotated genes, 6 genes
   were targets for 7 drugs developed for diseases of the nervous system,
   such as Parkinson’s disease (Fig. [292]3C). The
   pharmacological mechanisms targeted by these identified drugs are
   largely related to the pathogenesis of AD in previous literature. For
   example, FDA-approved Niacin [R1; target gene: NNMT; Anatomical
   Therapeutic Chemical (ATC) code: C10AD02] is a B vitamin used to treat
   various deficiencies and diseases in the cardiovascular system,
   including myocardial infarctions [[293]62], hyperlipidemia [[294]63],
   and coronary artery disease [[295]64]. Interestingly, a recent study
   [[296]65] showed that niacin slowed AD progression in a 5xFAD mouse
   model. The niacin receptor HCAR2 modulates microglial response to
   amyloid deposition, ultimately alleviating neuronal loss and cognitive
   decline. Other drugs for potential drug repurposing of AD are the
   FDA-approved Docetaxel (R1; target gene: MAP4; ATC: L01CD02) and
   Paclitaxel (R1; target gene: MAP4; ATC: L01CD01), which both target
   various cancers, including breast cancer and metastatic prostate
   cancer. The intriguing inverse relationship between AD and cancer has
   long been established, but the underlying shared etiology remains
   unclear [[297]43]. One hypothesis was that microtubule-associated
   protein tau—a pathological biomarker of AD—was associated with
   resistance to Docetaxel in certain cancer treatments [[298]66]. In
   addition, Docetaxel impacted the blood-brain barrier function of breast
   cancer brain metastases [[299]67]. Another drug called KM-819 (R2;
   target gene: FAF1) is currently in a Phase 1 clinical trial for
   Parkinson’s disease [[300]68], which aims to suppress
   α-synuclein-induced mitochondrial dysfunction [[301]69], consistent
   with the mitochondrial hypothesis [[302]70] of AD. To sum up, R1 and R2
   show distinct landscapes of the “druggable genome” [[303]71] on drug
   discovery and repurposing [[304]72] for future clinical translation.

The longitudinal rate of change in the R2 dimension, but not R1, is
marginally associated with the APOE ε4 allele and tau in cognitively unimpaired
individuals

   Using cognitively unimpaired participants from ADNI and BLSA,
   longitudinal brain association studies showed that the rate of change
   in the R1 dimension was associated with the change of brain volume in
   widespread brain regions. In contrast, the rate of change in the R2
   dimension was associated with the change of brain volume in the focal
   medial temporal lobe (Fig. [305]4A and Supplementary eTable [306]8 for
   p values and effect sizes). This further indicates that the two
   dominant patterns discovered cross-sectionally also progress in
   consistent directions longitudinally. The two dimensions were not
   associated with CSF biomarkers (Aβ42, tau, and p-tau) or the APOE ε4
   allele (rs429358) at baseline [−log[10](p value) < 1.31]. The rate of
   change of the R2 dimension, but not R1, was marginally [nominal
   threshold: −log[10](p value) >1.31] associated with the APOE ε4 allele,
   the CSF level of tau, and p-tau (Fig. [307]4B and Supplementary eTable
   [308]9 for p values and effect sizes), but they did not survive the
   Bonferroni correction [−log[10](p value) = 2.95]. The longitudinal rate
   of change of both dimensions was negatively associated [−log[10](p
   value) >2.95] with the total CSF level of Aβ42.
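
   The two significance thresholds used in this section follow from a
   short calculation, sketched below in Python. The sketch assumes a
   significance level of 0.05 and the 45 tested variables stated in the
   Fig. 4 legend; the function names are illustrative. Under these
   assumptions, the nominal threshold evaluates to about 1.30 (reported
   as 1.31 in the text) and the Bonferroni threshold to about 2.95.

```python
from math import log10

ALPHA = 0.05        # assumed significance level
N_VARIABLES = 45    # number of variables stated in the Fig. 4 legend

# Thresholds on the -log10(p value) scale.
nominal_threshold = -log10(ALPHA)                   # about 1.30
bonferroni_threshold = -log10(ALPHA / N_VARIABLES)  # about 2.95


def classify(neg_log10_p):
    """Label an association by the strictest threshold it exceeds."""
    if neg_log10_p > bonferroni_threshold:
        return "bonferroni"
    if neg_log10_p > nominal_threshold:
        return "nominal"
    return "not significant"


def to_p_value(neg_log10_p):
    """Convert a -log10(p value) back to the p value itself."""
    return 10 ** (-neg_log10_p)
```

   For example, a −log10(p value) of 1.92 corresponds to a p value of
   about 0.012, which passes the nominal threshold but not the Bonferroni
   threshold.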

Fig. 4. The longitudinal rate of change in R1 and R2 in the cognitively
unimpaired population.

   A Longitudinal brain association studies show that the R1 dimension
   exhibits longitudinal brain volume decrease in widespread brain
   regions, whereas the R2 dimension displays longitudinal brain volume
   decrease in the focal medial temporal lobe. We first derived the rate
   of change of the 119 GM ROIs and the R1 and R2 dimensions using a
   linear mixed effect model; a linear regression model was then fit to
   the rate of change of the ROIs, R1, and R2 to derive the beta
   coefficient value of each ROI. A negative value denotes longitudinal
   brain changes with a negative coefficient of the rate of change in the
   linear regression model. p values and effect sizes (r, Pearson’s
   correlation coefficient) are presented in Supplementary eTable [311]8.
   The brain maps denote the signed p value, and the range of r for each
   dimension is also shown. Of note, the sample size (N) for R1 and R2 is
   the same for each ROI. B The rate of change, not the baseline
   measurement, in the two dimensions is negatively associated with the
   CSF level of Aβ42 (Bonferroni correction for the 45 variables:
   −log[10](p value) >2.95). The rate of change in the R2 dimension, not
   the R1 dimension, was marginally (−log[10](p value) >1.31) associated
   with the CSF level of tau and p-tau, and APOE ε4. All other clinical
   associations are presented in Supplementary eTable [312]9. The
   gray-colored circle lines indicate different p value thresholds in both
   directions (Bonferroni correction for the 45 variables: −log[10](p
   value) >2.95 and the nominal p value threshold: −log[10](p value)
   >1.31). A positive/negative −log[10](p value) value indicates a
   positive/negative correlation (beta). Transparent dots represent the
   associations that do not pass the nominal p value threshold [−log[10](p
   value) = 1.31]; the blue-colored dots and red-colored dots indicate
   significant associations [−log[10](p value) >1.31] for the R1 and R2
   dimensions, respectively.
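
   The two-stage logic in panel A (first derive a rate of change per
   participant, then relate the rates of an ROI to those of a dimension)
   can be sketched in Python as follows. This is a simplified stand-in,
   not the authors' analysis: the first stage uses an ordinary
   least-squares slope per participant instead of a linear mixed effect
   model, the second stage reports only Pearson's r, and all participant
   IDs and measurements are hypothetical.

```python
from math import sqrt


def ols_slope(times, values):
    """Least-squares slope of values over time (change per unit of time)."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    return sxy / sxx


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical data: participant -> (years, ROI volume, dimension score).
visits = {
    "p1": ([0, 1, 2], [10.0, 9.8, 9.6], [0.10, 0.15, 0.20]),
    "p2": ([0, 1, 2], [10.0, 9.5, 9.0], [0.10, 0.22, 0.34]),
    "p3": ([0, 1, 2], [10.0, 9.9, 9.8], [0.10, 0.12, 0.14]),
}

# Stage 1: per-participant rate of change for the ROI and the dimension.
roi_rates = [ols_slope(t, roi) for t, roi, _ in visits.values()]
dim_rates = [ols_slope(t, dim) for t, _, dim in visits.values()]

# Stage 2: association between the two sets of rates across participants.
r = pearson_r(roi_rates, dim_rates)
```

   In this toy data set, participants whose dimension score rises faster
   also lose ROI volume faster, so r is strongly negative, which matches
   the sign convention described in the legend.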

   We tested these associations using cognitively unimpaired individuals
   with a high risk of AD based on their family history from the
   PREVENT-AD cohort. Similarly, at baseline, the two dimensions were not
   associated with CSF biomarkers or the APOE ε4 allele (rs429358). The
   longitudinal rate of change in the R2 dimension, but not R1, was
   marginally [nominal threshold: −log[10](p value) >1.31] associated with
   the APOE ε4 allele [−log[10](p value) = 1.92], the CSF level of tau
   [−log[10](p value) = 1.65], and p-tau [−log[10](p value) = 1.66].

   Longitudinal brain association studies also confirmed the longitudinal
   progression of the two dimensions in the MCI/AD population
   (Supplementary eFig. [313]12A). The rates of change in the two
   dimensions were both associated with APOE ε4 [−log[10](p value) = 12.54
   for R1 and 9.05 for R2] in GWAS (Supplementary eFig. [314]12B), and
   related to CSF levels of tau [−log[10](p value) = 16.47 for R1 and 9.73
   for R2], p-tau [−log[10](p value) = 19.13 for R1 and 10.81 for R2], and
   Aβ42 [−log[10](p value) = 13.64 for R1 and 13.55 for R2] (Supplementary
   eFig. [315]12C).

Discussion

   The current study leveraged a deep semi-supervised representation
   learning method to establish two predominant dimensions in the
   symptomatic MCI/AD population, which were independently found to be
   expressed, to a lesser degree, in three asymptomatic populations. In
   particular, the R1 dimension represented a “diffuse-AD” atrophy
   pattern: varying degrees of brain atrophy throughout the entire brain.
   In contrast, the R2 dimension showed an “MTL-AD” atrophy pattern: brain
   atrophy predominantly concentrated in the medial temporal lobe (Fig.
   [316]1A). Importantly, only R2 was found to be significantly associated
   with genetic variants of the APOE genes in MCI/AD patients.
   Furthermore, our study examined early manifestations of the R1 and R2
   dimensions in asymptomatic populations with varying levels of AD risks
   and their associations with genetics, amyloid plaques and tau tangles,
   biological pathways, and body organs. We identified 24 genomic loci,
   14 of which are newly identified, and 77 annotated genes that
   contribute to early manifestations of the two dimensions. Functional
   analyses showed that
   genes unrelated to APOE were overrepresented in DEG sets in organs
   beyond the brain (R1 and R2), including the heart (R1) and the
   pituitary gland (R2), and enriched in several biological pathways
   involved in dendritic cells (R2), macrophage functions (R1), and cancer
   (R1 and R2). Several of these genes were “druggable genes” for cancer
   (R1), inflammation (R1), cardiovascular diseases (R1), and diseases of
   the nervous system (R2). Longitudinal findings in the cognitively
   unimpaired populations showed that the rate of change of the R2
   dimension, but not R1, was marginally associated with the APOE ε4
   allele and the CSF level of tau, whereas the rates of change of both
   dimensions were associated with the CSF level of Aβ42. Our findings
   suggested that diverse pathologic processes, including cardiovascular
   risk factors, neurohormone dysfunction, and inflammation, might occur
   in the early asymptomatic stages, supporting and expanding the current
   amyloid cascade hypothesis (Fig. [317]5) [[318]7, [319]8].

Fig. 5. Genes unrelated to APOE influence early manifestations of R1 and R2.

   Genes unrelated to APOE and overrepresented in organs beyond the human
   brain are associated with early manifestations of the R1 (diffuse-AD)
   and R2 (MTL-AD) dimensions, which capture the heterogeneity of
   AD-related brain atrophy. For visualization purposes, we display the
   two genes with the highest expression values in the tissue specificity
   analyses for each organ/tissue. The black arrow line emulates the
   longitudinal progression trajectory along these two dimensions. The
   positions of beta-amyloid, tau, and APOE (increasing APOE-mediated
   progression) indicate the time point when they are associated with the
   two dimensions. The blue/red gradient-color background indicates a
   higher influence of APOE-related genes (left to right; early to late
   stages). The brain atrophy patterns are presented in the 3D view. In
   early asymptomatic stages, the R1-related genes are implicated in
   cardiovascular diseases and inflammation; the R2-related genes are
   involved in hormone-related dysfunction. Critically, longitudinal
   progression of the dimension demonstrates an impact of the APOE genes
   in early asymptomatic stages in R2, but this longitudinal effect occurs
   only in late symptomatic stages in R1. These results suggest that
   comorbidities (e.g., cardiovascular conditions) or normal aging in R1
   may alter or delay the trajectory of neurodegeneration in early
   asymptomatic stages; APOE-related genes may play a pronounced role in
   the acceleration and progression in late symptomatic stages for both
   dimensions. Of note, the underlying pathological processes that
   initiate and drive the progression of the two dimensions are not
   mutually exclusive. Hence, both R1 and R2 can be co-expressed in the
   same individual. In addition, the two dimensions can also be affected
   by other AD hypotheses, such as the mitochondrial hypothesis [[322]70]
   and the metabolic hypothesis [[323]86]. MTL medial temporal lobe. The
   human anatomy was created with [324]https://www.biorender.com/.

   AD has been regarded as a CNS disorder. However, increasing evidence
   has indicated that the origins or facilitators of the pathogenesis of
   AD might involve processes outside the brain [[325]6]. For example,
   recent findings revealed that gut microbiota disturbances might
   influence the brain through the immune and endocrine system and the
   bacteria-derived metabolites [[326]73, [327]74]. Our findings support
   the view that multiple pathological processes might contribute to early
   AD pathogenesis and identify non-APOE genes in the two dimensions
   overrepresented in tissues beyond the brain (e.g., the heart, pituitary
   gland, muscle, and kidney). Pathological processes may be involved in
   different cells, molecular functions, and biological pathways,
   exaggerating amyloid plaque and tau tangle accumulation and leading to
   the downstream manifestation of neurodegeneration and cognitive
   decline.

   The genetic and clinical underpinnings of the R1 dimension support
   inflammation, as well as cardiovascular diseases, as a core pathology
   contributing to AD [[328]12, [329]75, [330]76]. Genes associated with
   the R1 dimension were previously associated with various
   inflammation-related clinical traits (Fig. [331]2D), and enriched in
   biological pathways involved in immunological response (e.g.,
   up-regulation in macrophages [[332]77], Fig. [333]3B). In addition,
   genes in this dimension were overrepresented in DEG sets in the heart
   (Fig. [334]3A). Previous literature indicated that inflammation is
   likely an early step that initiates the amyloidogenic pathway—the
   expression of inflammatory cytokines leads to the production of
   β-amyloid plaques [[335]13]. Several markers of inflammation are also
   present in serum and CSF before any indications of Aβ or tau tangles
   [[336]78]. For example, clusterin, a glycoprotein involved in many
   processes and conditions (e.g., inflammation, proliferation, and AD)
   induced by tumor necrosis factor (TNF), was present ten years earlier
   than Aβ deposition [[337]79]. In addition, the R1 dimension was also
   strongly associated with cardiovascular and diabetes biomarkers (Fig.
   [338]2B). Inflammatory processes have been critical, well-established
   risk factors for compromised cardiovascular function [[339]80], such as
   coronary artery disease and the breakdown of the blood-brain barrier.
   Our results corroborated the close relationships between AD,
   cardiovascular diseases, and inflammation.

   The genetic and clinical underpinnings of the R2 dimension support the
   view that neuroendocrine dysfunction might be an early event
   contributing to the
   pathogenesis of AD [[340]16, [341]17]. Genes in the R2 dimension were
   previously associated with different hormone and pancreas-related
   traits from the GWAS Catalog (Fig. [342]2D); they were also
   overrepresented in DEG sets in the pituitary gland, pancreas, muscle,
   and kidney (Fig.
   [343]3A), which are master glands or key organs in the endocrine system
   [[344]81]. Previous literature suggested that neuroendocrine
   dysfunction might contribute to AD development by secreting
   neurohormonal analogs and affecting CNS function [[345]16]. For
   example, luteinizing hormone-releasing hormone and follicle-stimulating
   hormone in serum or neurons were associated with the accumulation of Aβ
   plaques in the brain [[346]17, [347]82, [348]83]. However, early
   experimental studies on antagonists of luteinizing hormone-releasing
   hormone and growth hormone-releasing hormone in animal models of AD
   have shown promising but not entirely convincing evidence [[349]16].
   Taken together, neurodegeneration in the R2 dimension represents an
   AD-specific phenotype that might be driven by hormonal dysfunction,
   leading to rapid accumulation of amyloid plaques, and was potentially
   accelerated by the APOE ε4 allele—the rate of change in R2, but not R1,
   was associated with the APOE ε4 allele in cognitively unimpaired
   individuals (Fig. [350]4B).

   The hypothesized implications above of the R1 and R2 dimensions on
   inflammation, cardiovascular functions, and neuroendocrine dysfunctions
   are not mutually exclusive and may collectively contribute to AD
   pathogenesis. It has been shown that dysregulation of the
   hypothalamic-pituitary-gonadal axis is associated with dyotic
   signaling, modulating the expression of TNF and related cytokines in
   systemic inflammation, and the induction of downstream
   neurodegenerative cascades within the brain [[351]84, [352]85]. These
   studies hypothesized that the neuroendocrine dysfunction and the
   inflammation mechanism might be the upstream and downstream
   neuropathological processes along the disease course of AD [[353]16].
   That is, the loss of sex steroids and the elevation of gonadotropins
   might lead to a higher level of inflammatory factors in the brain.
   Finally, other competing hypotheses may also play a role in developing
   AD in early asymptomatic stages, including the mitochondrial hypothesis
   [[354]70], the metabolic hypothesis [[355]86], and the tau hypothesis
   [[356]3].

   The NIA-AA framework [[357]87] claims that AD is a continuum in which
   AD pathogenesis is initiated in early asymptomatic cognitively
   unimpaired stages and progresses to amyloid-positive and tau-positive
   (A+T+) in late symptomatic stages [[358]87]. Our findings are
   consistent with this framework and elucidate the cross-sectional and
   longitudinal associations of the two dimensions with genetic and
   clinical markers from early asymptomatic to late symptomatic stages. In
   early asymptomatic stages, the rates of change in the two dimensions
   are both associated with amyloid. However, only the R2 dimension, not
   R1, is marginally associated with the APOE ε4 allele and the CSF level
   of tau (Fig. [359]4B). In contrast, in late symptomatic stages, the
   rates of change in the two dimensions are both associated with the APOE
   ε4 allele, CSF levels of tau, p-tau, and amyloid. Our findings suggest
   that comorbidities or normal aging in R1 may alter the rate or
   trajectory of neurodegeneration at early asymptomatic stages, but
   APOE-related genes might play a more pronounced role in the
   acceleration and progression during late symptomatic stages for both
   dimensions (Fig. [360]5).

   Several recent studies [[361]88–[362]92], as detailed in an insightful
   overview by Luo et al., collectively provide a comprehensive
   transcriptomics and epigenomics atlas depicting AD progression at the
   single-cell level [[363]93]. Similarly, researchers have also proposed
   a new theory suggesting that Alzheimer’s disease may not only be a
   brain disorder but could also be considered an autoimmune disease
   [[364]94]. These studies highlight the involvement of microglia-related
   inflammation, lipid metabolism, and mitochondrial dysfunction. This
   substantiates the primary hypothesis in our study: the two dimensions
   are linked to diverse pathological mechanisms, encompassing
   cardiovascular diseases, inflammation, and hormonal dysfunction,
   potentially driven by genes beyond APOE.

Limitations

   This study has several limitations. Firstly, there is a need for
   longitudinal data from the general population, as exemplified by the UK
   Biobank, to provide further validation for the hypotheses proposed to
   cover the entire AD spectrum in the same population. Secondly, it is
   essential to extend the generalization of the current GWAS findings to
   include underrepresented ethnic groups, going beyond the European
   ancestry populations.

Outlook

   In conclusion, the current study used a novel deep semi-supervised
   representation learning method to establish two AD dimensions. Our
   findings support the view that diverse pathological mechanisms
   involving multiple organs, including cardiovascular diseases,
   inflammation, and hormonal dysfunction, collectively affect AD
   pathogenesis in
   asymptomatic stages. These novel biomarkers may serve as instrumental
   variables to guide future treatments in the early asymptomatic stages
   of AD, targeting multi-organ dysfunction beyond the brain.

Supplementary information

   [365]Supplementary eFiles^ (334.6KB, xlsx)
   [366]Supplementary Materials^ (7.5MB, docx)

Acknowledgements