Graphical abstract

Highlights

• The early-stage protein panel distinguishes amyloid PET-positive and negative cases
• The late-stage protein panel identifies tau PET status in amyloid-positive individuals
• The protein panels predict dementia progression and cognitive decline over 10 years
• Identified proteins are implicated in synaptic plasticity and metabolic dysfunction

Wang et al. apply machine learning to proteomic profiles to identify proteins predictive of Alzheimer’s disease pathology stages. They develop two models: one for early-stage amyloid positivity and another for assessing tau pathology severity in amyloid-positive patients. Both models can predict dementia progression and cognitive decline over 10 years.

Introduction

Alzheimer’s disease (AD) is the primary cause of dementia, characterized by accumulation of aggregated amyloid-β (Aβ) peptide and tau neurofibrillary tangles (NFTs) in the brain, with or without cognitive impairment.[46]^1 Currently, assessing Aβ plaques and NFT neuropathological burden relies on positron emission tomography (PET) imaging with radioactive tracers or molecular protein biomarkers in cerebrospinal fluid (CSF) and, recently, blood, available at clinical or research levels.[47]^1 PET imaging, the most accurate AD pathology diagnostic marker validated by autopsy data,[48]^2^,[49]^3 is restricted to specialized centers by its high cost and complex infrastructure. Conversely, fluid biomarkers are more affordable and clinically accessible; however, current established AD biomarkers (i.e., Aβ42, phosphorylated tau [p-tau]181, and total tau [t-tau]) mainly focus on early AD pathology, responding to Aβ plaques but lacking sensitivity for moderate to late tau pathology.[50]^4^,[51]^5

Staging the severity of Aβ and tau pathology is crucial for several reasons.
Firstly, accurately discriminating the severity of AD pathology better reflects future cognitive function. Evidence suggests that cognitively normal individuals who are both amyloid PET-positive (A+) and tau PET-positive (T+) face a higher risk of cognitive decline than those who are A+T−.[52]^6 Secondly, precise AD pathological staging aids in establishing inclusion and exclusion criteria; for instance, the Trailblazer-Alz study used amyloid PET stage as an inclusion criterion and adjusted doses based on tau PET stage.[53]^7 Moreover, the results of the Trailblazer-Alz study revealed benefits in A+ individuals with low/medium tau, not high tau pathology,[54]^7 suggesting that capturing tau pathology stage can offer clinical guidance for improved treatment responses in patients.

CSF proteins, directly linked to the brain and less affected by peripheral factors, offer unique insights into normal and abnormal brain pathophysiology compared to plasma biomarkers. Moreover, CSF proteomics, spanning various biological processes, holds promise for advancing etiological understanding and refining promising biomarkers. Most previous CSF proteomic studies have focused on asymptomatic and symptomatic AD[55]^8^,[56]^9 or CSF-based biologically defined AD[57]^10 or sporadic and genetic AD[58]^11; however, few studies have investigated potential CSF biomarkers for identifying specific pathological stages of AD based on PET scans. In addition, with the advent of modified nucleic acid aptamer (SomaScan)-targeted assay platforms,[59]^12 large-scale CSF proteomic profiling (>7,000 proteins) at a population scale has become feasible; however, few studies have explored proteomic profiling in AD on this platform.

Here, we employed large-scale CSF proteomic profiling combined with machine learning to identify candidate biomarkers for discriminating amyloid PET and tau PET stages and compared their performance with established AD biomarkers.
We developed two protein panels to differentiate early and late disease stages as defined by PET imaging. These models outperformed established CSF AD biomarkers (i.e., Aβ42 and p-tau181) in distinguishing disease stages. Furthermore, they were associated with dementia progression and cognitive decline over the subsequent decade. The proteins in the early-stage panel were enriched for synaptic damage and compensatory processes, while those in the late-stage panel were linked to metabolic dysfunction pathways. These findings not only provide more sensitive and specific protein markers for tracking disease progression but also offer insights into the temporal evolution of pathophysiological mechanisms underlying AD progression.

Results

Study design and participants

The study design is depicted in [60]Figure 1. The primary objective was to identify CSF proteins exhibiting changes in early and late stages of AD as assessed by amyloid PET and tau PET. A total of 6,164 proteins meeting quality control criteria were included in the analysis ([61]STAR Methods). Established AD biomarkers, including CSF Aβ42, p-tau181, t-tau, and the p-tau181/Aβ42 ratio, were included for comparison. Machine learning techniques were employed to pinpoint a subset of proteins capable of accurately distinguishing between early and late AD stages. These proteins were further analyzed in relation to dementia progression and cognitive decline. Validation was performed using both external and internal cohorts.

Figure 1. Study workflow
This workflow outlines the primary analytical process of the study. Proteomic analysis identified proteins associated with amyloid PET or tau PET. LASSO models integrated with machine learning selected target proteins. The performance of protein panels in differentiating AT stages was compared with established CSF AD biomarkers.
Additionally, the relationship between protein panels and dementia progression and cognitive decline was evaluated. Finally, findings were validated in external and internal cohorts.

The primary cohort included 136 non-demented individuals ([64]Table 1), with a mean age of 71.5 ± 6.9 years, 41.20% female, and 34.60% carrying at least one APOE ε4 allele. Participants were followed for an average of 4.4 years (SD: 2.98). During this period, 14 participants (11.80%) transitioned from amyloid PET negative (A−) to positive (A+), and 31 participants (22.80%) progressed to dementia. For autopsy validation, an independent cohort of 54 participants was analyzed, including 11 with none/low AD neuropathological changes (ADNCs) and 43 with intermediate/high ADNCs ([65]Table S1). Additionally, a longitudinal population of 354 non-demented individuals was evaluated for clinical progression over a mean duration of 3.70 years (SD: 2.25) ([66]Table S2).

Table 1. Study populations

| | Overall | A−T− | A+T− | A+T+ | A−T+ |
|---|---|---|---|---|---|
| N | 136 | 66 | 30 | 33 | 7 |
| Age, mean (SD) | 71.5 (6.9) | 69.7 (6.8) | 73.5 (6.4) | 73.6 (7.0) | 70.0 (5.6) |
| Gender, n (%) female | 56 (41.2) | 30 (45.5) | 10 (33.3) | 12 (36.4) | 4 (57.1) |
| APOE ε4 carriers (%) | 47 (34.6) | 11 (16.7) | 9 (30.0) | 24 (72.7) | 3 (42.9) |
| Years of education, mean (SD) | 16.5 (2.6) | 16.8 (2.7) | 16.9 (2.7) | 15.6 (2.2) | 17.0 (4.2) |
| MMSE score at baseline, mean (SD) | 28.5 (1.6) | 28.9 (1.3) | 28.2 (2.1) | 28.1 (1.6) | 28.9 (1.3) |
| CDRSB score at baseline, mean (SD) | 0.7 (0.9) | 0.6 (0.8) | 0.6 (0.9) | 1.1 (1.1) | 0.6 (0.6) |
| ADAS13 score at baseline, mean (SD) | 11.8 (6.2) | 9.8 (4.9) | 10.3 (4.2) | 16.8 (7.4) | 13.6 (5.7) |
| Diagnosis, n (%) MCI | 69 (50.7) | 30 (45.5) | 12 (40.0) | 22 (66.7) | 5 (71.4) |
| Centiloid, mean (SD) | 33.8 (47.9) | −2.8 (10.1) | 66.5 (38.4) | 84.7 (37.2) | −1.7 (11.2) |
| Tau PET Braak I-IV SUVR, mean (SD) | 1.3 (0.3) | 1.1 (0.1) | 1.2 (0.1) | 1.7 (0.3) | 1.4 (0.2) |
| Follow-up months, mean (SD) | 52.7 (35.8) | 57.3 (34.5) | 46.2 (35.7) | 48.3 (35.8) | 58.1 (48.8) |
| Progression from A− to A+, n (%)[67]^a | 14 (11.8%) | 0 (0%) | 12 (46.2%) | 2 (7.7%) | 0 (0%) |
| Progression to dementia, n (%) | 31 (22.8) | 4 (6.1) | 4 (13.3) | 21 (63.6) | 2 (28.6) |

AD, Alzheimer’s disease; ADAS13, Alzheimer’s Disease Assessment Scale-Cognitive Subscale 13; CDRSB, Clinical Dementia Rating Scale-Sum of Boxes; CSF, cerebrospinal fluid; MCI, mild cognitive impairment; MMSE, Mini-Mental State Examination; PET, positron emission tomography.
^a Of the participants, 119 had available amyloid PET scans within 1 year of CSF proteomic assessments and had two or more amyloid PET scans, with some transitioning from amyloid PET-negative (A−) to amyloid PET-positive (A+) between the initial and final PET examination.

Proteomic profiling associated with amyloid PET

We identified proteins correlated with amyloid PET, quantified using Centiloids instead of standardized uptake value ratios (SUVRs) ([69]STAR Methods). We conducted a multivariate linear regression analysis of 6,164 proteins and CSF AD biomarkers (Aβ42, p-tau181, and t-tau), adjusted for age, gender, and time interval between CSF collection and PET scanning. This analysis identified 262 proteins significantly associated with Centiloid values (p < 0.05). After false discovery rate (FDR) correction, 10 proteins remained significant (see also [70]Figure 2A and [71]Table S3). Notably, CSF p-tau181 showed the strongest positive correlation with Centiloid (β = 20.67, FDR p = 6.91 × 10^−4),[72]^13 followed by C-C motif chemokine 25 (CCL25) (β = 20.01, FDR p = 7.24 × 10^−4), kinetochore protein Spc25 (SPC25) (β = 19.01, FDR p = 2.00 × 10^−3), coiled-coil-helix-coiled-coil-helix domain-containing protein 7 (CHCHD7) (β = 18.69, FDR p = 4.00 × 10^−3), and SPARC-related modular calcium-binding protein 1 (SMOC1) (β = 17.87, FDR p = 0.01).
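The FDR step described above can be illustrated with a minimal Python sketch. It assumes the Benjamini-Hochberg procedure, which this section does not name explicitly, and it uses a hypothetical function name and made-up p values rather than any study data.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the original order.

    Assumption: BH is used here only as a common choice of FDR method.
    """
    m = len(p_values)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy example (hypothetical p values, one per protein).
raw_p = [0.001, 0.008, 0.039, 0.041, 0.60]
adj_p = benjamini_hochberg(raw_p)
# Proteins that stay below 0.05 after adjustment.
significant = [i for i, q in enumerate(adj_p) if q < 0.05]
```

In this toy case four of the five raw p values fall below 0.05, but only the first two remain below 0.05 after adjustment, which mirrors how 262 nominal associations can shrink to 10 after correction.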
SMOC1, a protein that co-localizes with Aβ plaques, has been shown to be an early biomarker of AD, and its CSF levels are elevated many years prior to the onset of symptoms.[73]^14^,[74]^15^,[75]^16 CCL25 and SPC25 are associated with neuroinflammation and have been less frequently reported in AD.[76]^17^,[77]^18 CHCHD7 is associated with mitochondrial function, and its dysfunction may contribute to neurodegenerative diseases.[78]^19^,[79]^20 Conversely, CSF Aβ42 was negatively associated with Centiloid,[80]^13 alongside tubulin-specific chaperone A (TBCA) (β = −17.41, FDR p = 0.02), a protein involved in the microtubule protein folding pathway.

Figure 2. Proteomic profiling associated with amyloid PET
(A) Volcano plot illustrating the association between log2 protein abundance and amyloid PET, with beta values on the x axis and two-sided p values on the y axis. The black line denotes the uncorrected p value threshold, and the yellow line represents the FDR-corrected threshold. p and beta values were obtained from multiple linear regression, adjusted for age, gender, and time interval between proteomic and amyloid PET assessments. The data underlying this figure are found in [83]Table S3. Lists of all proteins used for analyses are provided in [84]Table S21. (B) Receiver operating characteristic (ROC) curves assessing biomarker accuracy in distinguishing A+ from A− individuals. Area under the curve (AUC) values for each protein are reported. (C) Left: odds ratios (ORs) from binomial logistic regression evaluating the effect of individual biomarkers on amyloid PET positivity, adjusted for age, gender, and time interval between assessments. Right: heatmap showing the abundance of 10 selected proteins across participants, sorted by amyloid PET level, with the bottom bar indicating amyloid PET levels. (D) Protein trajectories relative to amyloid PET load.
Each line represents a protein, with amyloid PET level on the x axis and Z score values on the y axis, fitted using LOESS. (E) Gene Ontology (GO) term enrichment for identified proteins. Bar graphs display GO terms with p < 0.05, a minimum count of 5, and the top five pathways for each GO term.

The 10 significant proteins demonstrated moderate to strong discriminatory power between A+ and A− statuses ([85]Figure 2B). CSF Aβ42 exhibited the highest area under the curve (AUC) of 0.90, followed by SMOC1 (AUC = 0.76) and p-tau181 (AUC = 0.74). Logistic regression indicated that CSF Aβ42, SMOC1, and p-tau181 were the strongest predictors of A+ status ([86]Figure 2C). We next used locally estimated scatterplot smoothing (LOESS) models to track protein trajectories along the amyloid PET scale ([87]Figure 2D). CSF Aβ42 levels decreased sharply up to 100 Centiloid units before plateauing, while TBCA levels declined beyond 20 Centiloid units. The remaining eight proteins showed gradual increases until 100 Centiloid units, followed by steeper rises. Gene Ontology (GO) enrichment analysis indicated significant involvement of these proteins in chemokine receptor binding and oxidoreductase activity-related processes, particularly those utilizing NAD/NADP as acceptors in oxidation-reduction reactions ([88]Figure 2E). Additionally, enrichment was observed in cellular metabolic pathways, including nucleoside and glycosyl compound metabolism, glycolysis, and vesicular transport-related pathways such as the secretory granule lumen and cytoplasmic vesicle lumen.

Proteomic profiling associated with tau PET

Using multivariate linear regression, we identified 949 proteins significantly associated with tau PET (p < 0.05), with 101 proteins remaining significant after FDR correction (see also [89]Figure 3A and [90]Table S4).
Key proteins included p-tau181 (β = 0.17, FDR p = 1.01 × 10^−9), t-tau (β = 0.15, FDR p = 6.57 × 10^−8), 14-3-3 protein gamma (YWHAG) (β = 0.15, FDR p = 4.19 × 10^−7), and glucose-6-phosphate isomerase (GPI) (β = 0.14, FDR p = 6.12 × 10^−6). YWHAG, a member of the 14-3-3 protein family, is strongly correlated with AD pathology and cognitive function,[91]^21^,[92]^22 while GPI is involved in brain energy metabolism and has been implicated in mild cognitive impairment and AD.[93]^23

Figure 3. Proteomic profiling associated with tau PET
(A) Volcano plot depicting the association between log2 protein abundance and tau PET, with beta values on the x axis and two-sided p values on the y axis. The black line indicates the uncorrected p value threshold, and the yellow line represents the FDR-corrected threshold. p and beta values were derived from multiple linear regression, adjusted for age, gender, and time interval between proteomic and tau PET assessments. The data underlying this figure are found in [96]Table S4. Lists of all proteins used for analyses are provided in [97]Table S21. (B) Receiver operating characteristic (ROC) curves evaluating biomarker accuracy in distinguishing T+ from T− individuals. Area under the curve (AUC) values for each protein are reported. (C) Left: odds ratios (ORs) from binomial logistic regression assessing the impact of individual biomarkers on tau PET positivity, adjusted for age, gender, and time interval between assessments. Right: heatmap displaying the abundance of 10 selected proteins across participants, sorted by tau PET level, with the bottom bar indicating tau PET levels. (D) Protein trajectories relative to tau PET load. Each line represents a protein, with tau PET level on the x axis and Z score values on the y axis, fitted using LOESS. (E) Gene Ontology (GO) term enrichment for identified proteins.
Bar graphs present GO terms with p < 0.05, a minimum count of 5, and the top five pathways for each GO term.

Receiver operating characteristic analysis demonstrated that HCLS1-associated protein X-1 (HAX1) (AUC = 0.78), YWHAG (AUC = 0.77), and N-chimerin (CHN1) (AUC = 0.76) were the strongest predictors for distinguishing tau PET-positive (T+) from negative (T−) statuses ([98]Figure 3B). HAX1 is a multifunctional protein involved in apoptosis regulation, mitochondrial membrane potential maintenance, and calcium homeostasis.[99]^24 CHN1 is a GTPase-activating protein related to neuronal axon guidance.[100]^25 However, the role of these two proteins in AD remains unclear. Moreover, logistic regression revealed that each standard deviation increase in HAX1 and YWHAG levels was associated with 22% higher odds of T+ status ([101]Figure 3C). Protein trajectories indicated progressive increases in these proteins with rising tau PET SUVR, particularly for p-tau181 and calcineurin (PPP3CA|PPP3R1) ([102]Figure 3D). Calcineurin, a calcium/calmodulin-dependent protein phosphatase, has been implicated in AD through its role in tau hyperphosphorylation and NFT formation.[103]^26 GO enrichment analysis of tau PET-associated proteins highlighted significant enrichment in kinase-related pathways (e.g., calcium-dependent protein kinase activity, serine/threonine kinase activity), protein degradation pathways (e.g., proteasome core complex, ubiquitin-like protein conjugating enzyme activity), and glucose metabolic processes ([104]Figure 3E). These findings suggest a complex interplay between protein post-translational modifications, protein quality control, and energy metabolism in tau-mediated neurodegeneration.

Proteomic signatures for distinguishing amyloid and tau PET stages

Among the 111 proteins associated with amyloid and tau PET, including three CSF AD-specific biomarkers (i.e., Aβ42, p-tau181, and t-tau), 99 distinct proteins were selected for feature selection.
We employed a robust feature selection process involving 100 iterations of random sampling, extracting 80% of the data in each iteration. Utilizing the least absolute shrinkage and selection operator (LASSO) model, we identified the most informative proteins across iterations. Proteins consistently selected in over 40% of bootstrap samples (16 proteins) constituted the early-stage discriminative panel (A+ versus A−T−) ([105]Figure S1A), while those selected in more than 20% of samples (9 proteins) formed the late-stage panel (A+T+ versus A+T−) ([106]Figure S1B). Notably, 13 proteins were exclusive to the early-stage panel, six to the late-stage panel, and three were common to both, highlighting distinct proteomic signatures at different AD stages. To ensure the robustness of our predictive models and mitigate overfitting, we compared multiple machine learning algorithms, including Random Forest (RF), XGBoost (XGB), Support Vector Machine, LASSO, and generalized linear model (GLM) (A+ versus A−T−: [107]Table S5; A+T+ versus A+T−: [108]Table S6). Although RF and XGB achieved perfect discrimination (AUC = 1.0), their performance suggested potential overfitting given the current sample size.[109]^27 Therefore, we selected the GLM for its optimal balance of predictive performance, interpretability, and ability to avoid overfitting. In the early-stage protein panel, synaptotagmin-12 (SYT12), tropomodulin-2 (TMOD2), calcium/calmodulin-dependent protein kinase type II subunit beta (CAMK2B), and microtubule-associated proteins 1A/1B light chain 3A (MAP1LC3A) were the most important proteins in the models ([110]Figure S1C). 
These proteins are primarily involved in synaptic function and plasticity, with SYT12 serving as a presynaptic protein and CAMK2B playing a critical role in calcium signaling and synaptic plasticity.[111]^28^,[112]^29 Additionally, TMOD2 and MAP1LC3A are associated with cytoskeletal regulation and autophagy pathways, respectively,[113]^30^,[114]^31 indicating multifaceted cellular maintenance mechanisms in AD pathology. For the late-stage panel, TMOD2, BLOC-1-related complex subunit 5 (BORCS5), and CAMK2B were identified as key proteins ([115]Figure S1D). BORCS5 is integral to cellular degradation and autophagic processes,[116]^32 although its specific association with AD requires further investigation. The selection of these proteins underscores the progression of AD-related proteomic alterations from synaptic and structural dysfunctions to impaired cellular degradation mechanisms. To elucidate the biological functions of the identified protein panels, we performed enrichment analysis ([117]Figure S1E). Our enrichment analysis revealed that, during the early stages, there is a significant upregulation of proteins involved in synaptic function and plasticity pathways. This suggests a compensatory mechanism by which neurons enhance synaptic transmission and plasticity to counteract early Aβ toxicity. In contrast, the late stages of AD show protein enrichment in downregulated pathways related to energy metabolism, calcium homeostasis, and insulin signaling, indicating metabolic dysfunction and cellular stress contributing to neurodegeneration. These findings offer insights into the temporal progression of AD pathology, emphasizing early neuronal compensatory upregulation of synaptic function in response to Aβ accumulation and the subsequent metabolic dysfunction and calcium homeostasis disturbances associated with advanced stages characterized by both Aβ and tau pathologies. 
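The resampling-based feature selection described in this section (100 iterations, 80% of the data per iteration, retaining proteins by selection frequency) can be sketched as follows. This is a simplified illustration under stated assumptions, not the study's pipeline: it fits a linear LASSO by coordinate descent to a 0/1 outcome, whereas the exact LASSO variant, penalty value, and sampling scheme are not given here. All function names, the penalty of 0.1, and the toy data are hypothetical.

```python
import random


def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty; return exactly 0 inside the band."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_coordinate_descent(X, y, penalty, n_sweeps=25):
    """Linear LASSO on centered data (assumption: linear, not logistic)."""
    n, p = len(X), len(X[0])
    y_mean = sum(y) / n
    residual = [v - y_mean for v in y]  # residual when all coefficients are 0
    columns = []
    for j in range(p):
        mean_j = sum(row[j] for row in X) / n
        columns.append([row[j] - mean_j for row in X])
    scales = [sum(v * v for v in col) / n for col in columns]
    beta = [0.0] * p
    for _ in range(n_sweeps):
        for j in range(p):
            if scales[j] == 0.0:
                continue
            col = columns[j]
            # Correlation of feature j with the partial residual.
            rho = sum(col[i] * residual[i] for i in range(n)) / n + scales[j] * beta[j]
            new_beta = soft_threshold(rho, penalty) / scales[j]
            delta = new_beta - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= col[i] * delta
                beta[j] = new_beta
    return beta


def selection_frequencies(X, y, penalty, n_iter=100, fraction=0.8, seed=0):
    """Fraction of random subsamples in which each feature gets a nonzero weight."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    size = int(fraction * n)
    for _ in range(n_iter):
        idx = rng.sample(range(n), size)  # subsample without replacement
        beta = lasso_coordinate_descent([X[i] for i in idx], [y[i] for i in idx], penalty)
        for j, b in enumerate(beta):
            if b != 0.0:
                counts[j] += 1
    return [c / n_iter for c in counts]


# Toy data (hypothetical): feature 0 drives the binary label, the rest are noise.
toy_rng = random.Random(1)
toy_X = [[toy_rng.gauss(0, 1) for _ in range(4)] for _ in range(60)]
toy_y = [1.0 if row[0] > 0 else 0.0 for row in toy_X]
toy_freq = selection_frequencies(toy_X, toy_y, penalty=0.1)
# Keep features selected in more than 40% of iterations, as for the early-stage panel.
early_like_panel = [j for j, f in enumerate(toy_freq) if f > 0.40]
```

A frequency threshold of this kind favors features that are selected regardless of which participants happen to be drawn, which is one way to reduce the dependence of a panel on a single train/test split.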
Predictive performance for early-stage and late-stage protein panels

Next, we compared the accuracy of protein panels, established AD biomarkers, and baseline models (including age, gender, and APOE ε4 status) in distinguishing AT stages. The early-stage protein panel demonstrated superior accuracy (AUC = 0.95) in distinguishing A+ from A−T−, outperforming CSF Aβ42 (AUC = 0.90, p = 0.08) and baseline models (AUC = 0.78, p < 1.78 × 10^−5) ([118]Figure 4A). Similarly, the late-stage protein panel achieved an AUC of 0.92 in distinguishing A+T+ from A+T−, exceeding baseline models (AUC = 0.74, p = 0.001) and CSF Aβ42 (AUC = 0.70, p = 0.005) ([119]Figure 4B). Sensitivity analyses excluding participants with missing proteomic data confirmed the consistency of these AUC values ([120]Figure S2).

Figure 4. Predictive performance for early-stage and late-stage protein panels
(A) Comparison of CSF AD biomarkers (Aβ42, CSF p-tau181, and CSF t-tau), baseline models (age, gender, and APOE ε4 status), and two protein panels in distinguishing A+ from A−T−. The data underlying this figure are found in [123]Table S5. (B) Comparison of the same biomarkers and models in distinguishing A+T+ from A+T−. The data underlying this figure are found in [124]Table S6. (C) Discrimination ability of the top four proteins in protein panels and CSF p-tau181/Aβ42 alone, as well as CSF AD biomarkers and the baseline model with or without the addition of the four proteins in distinguishing A+ from A−T−. (D) Similar comparison for distinguishing A+T+ from A+T−. (E) Two-sample t tests comparing levels of CSF AD biomarkers and 22 proteins in two protein panels across different AT stages, reporting T values and FDR-adjusted p values. (F) AUC values for each protein in distinguishing A+ from A−T− based on ROC curves. The data underlying this figure are found in [125]Table S5. (G) AUC values for each protein in distinguishing A+T+ from A+T− based on ROC curves.
The data underlying this figure are found in [126]Table S6. Color coding: purple, CSF AD biomarkers; green, proteins selected in early and late stages; blue, early-stage proteins; red, late-stage proteins.

To evaluate the relative performance of our CSF-based predictive models against plasma-based AD biomarkers, we analyzed a subset of 57 participants from the ADNI cohort with available plasma data ([127]STAR Methods). The early-stage protein panel achieved an AUC of 0.94 for distinguishing A+ from A−T−, and the late-stage panel reached an AUC of 1.00 for differentiating A+T+ from A+T−. These results were superior to plasma %p-tau217 (the ratio between phosphorylated and non-phosphorylated tau217)[128]^33 (AUC = 0.85 and 0.88, respectively) and plasma p-tau217 alone (AUC = 0.82 and 0.87, respectively) ([129]Figure S3).

We further investigated whether incorporating additional proteins could enhance the performance of existing AD biomarkers and baseline models. The p-tau181/Aβ42 ratio achieved an AUC of 0.958 in distinguishing A+ from A−T−, while adding four proteins (neurofilament heavy chain [NEFH], CCL25, TMOD2, and SMOC1) to CSF Aβ42 yielded a comparable AUC of 0.95 (p = 0.42) ([130]Figure 4C). NEFH, a marker of neurodegeneration, was closely related to cognitive function.[131]^34 For differentiating A+T+ from A+T−, combining CSF Aβ42 with four proteins (HAX1, BORCS5, CAMK2B, and GPI) resulted in an AUC of 0.91, which outperformed baseline models and the CSF p-tau181/Aβ42 ratio ([132]Figure 4D).

Comprehensive analysis of 22 CSF proteins alongside AD biomarkers across AT stages revealed significant alterations in CSF Aβ42 and SMOC1 across all comparisons. Additionally, levels of CSF p-tau181, t-tau, CCL25, and NEFH varied within the AD continuum, although no significant differences were observed between A+ and A−T+ stages ([133]Figure 4E).
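For readers less familiar with the AUC values reported in this section, the short Python sketch below computes an AUC directly from its rank-based definition: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The function name and the toy scores are hypothetical and are not study data.

```python
def auc_from_scores(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0   # positive case scored higher: correct ordering
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos_scores) * len(neg_scores))


# Toy example (hypothetical panel scores for three A+ and three A-T- cases).
toy_pos = [0.9, 0.8, 0.4]
toy_neg = [0.7, 0.3, 0.2]
toy_auc = auc_from_scores(toy_pos, toy_neg)  # 8 of 9 pairs ordered correctly
```

Under this reading, an AUC of 0.5 corresponds to chance-level ordering and an AUC of 1.0 means every positive case scores above every negative case, which is why perfect AUCs in small samples are usually interpreted with caution.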
In addition, we compared the accuracy of 22 proteins and AD biomarkers individually in distinguishing AT status. The results showed that CSF Aβ42 (AUC = 0.895) performed best in distinguishing A+ from A−T−, followed by NEFH (AUC = 0.758), CSF p-tau181 (AUC = 0.752), and SMOC1 (AUC = 0.747) ([134]Figure 4F). While HAX1 (AUC = 0.777), GPI (AUC = 0.739), and BORCS5 (AUC = 0.725) showed superior performance in distinguishing A+T+ from A+T−, the established AD biomarkers showed only modest performance (AUC = 0.67–0.704) ([135]Figure 4G).

To validate the robustness and generalizability of our findings, we conducted multiple sensitivity analyses. First, we examined subpopulations with assessment intervals between CSF proteomics and PET scans within 6 ([136]Figure S4), 8 ([137]Figure S5), and 10 years ([138]Figure S6). In these subgroups, both protein panels and the 22 proteins maintained discriminative performance consistent with our primary results. Second, excluding 14 participants who transitioned from A− to A+ between initial and final PET assessments did not alter the outcomes of the protein panels and individual proteins ([139]Figure S7). Third, restricting amyloid PET assessments to within 1 year of CSF proteomics yielded 471 participants ([140]Table S7), among whom the protein panels and all proteins remained significantly associated with amyloid PET after FDR correction ([141]Table S8). Additionally, the early protein panels continued to accurately discriminate A+ from A− ([142]Table S9). Lastly, in the subset of 324 participants with available repeated amyloid PET measurements, we examined the longitudinal association between proteins and amyloid PET ([143]Table S10). Utilizing a linear mixed model to explore longitudinal changes in amyloid PET, 12 of the 22 proteins and both protein panels were significantly associated with these changes ([144]Table S11).
These consistent results across various analyses underscore the robustness and clinical relevance of our CSF proteomic signatures in predicting AD PET staging.

Association with progression to dementia and rate of cognitive decline

We conducted a time-to-event analysis to evaluate the predictive capacity of two CSF protein panels for dementia onset in non-demented individuals over a 10-year follow-up period. Participants were categorized into high and low expression groups based on the median panel scores. Kaplan-Meier survival analysis demonstrated significantly different dementia-free survival probabilities between the high and low expression groups (see also [145]Figure 5A and [146]Table S12). Cox proportional hazards models revealed that individuals with high panel scores had a markedly increased risk of developing dementia compared to those with low scores. Specifically, the early-stage panel exhibited a hazard ratio (HR) of 4.992 (95% confidence interval [CI]: 2.0–12.4, p = 5.49 × 10^−4) and the late-stage panel an HR of 5.965 (95% CI: 2.3–15.6, p = 2.80 × 10^−4) in model 1, which adjusted for age, gender, and education. These associations remained robust after further adjustment for APOE genotype (model 2) and baseline clinical diagnosis (model 3) ([147]Table S12).

Figure 5. Association with dementia progression and cognitive decline at 10-year follow-up
(A) Kaplan-Meier survival curves comparing dementia progression between low and high scores for the early-stage and late-stage protein panels. p values and hazard ratios (HRs) were obtained from Cox proportional hazards regression models, adjusted for age, gender, and years of education. The data underlying this figure are found in [150]Table S12. (B) Cognitive trajectories of MMSE scores between low and high scores for the early-stage and late-stage protein panels. Average regression lines for each group were fitted, with error bars representing 95% confidence intervals (CIs).
Interaction p values for time were derived from linear mixed-effects models with random effects, adjusted for age, gender, and years of education. The data underlying this figure are found in [151]Table S14. (C) Kaplan-Meier survival curves comparing dementia progression between low and high levels of five specific proteins. p values and HRs were obtained from Cox proportional hazards regression models, adjusted for age, gender, and years of education. The data underlying this figure are found in [152]Table S12. (D) Association between 22 proteins and cognitive scores. FDR-adjusted p values and beta values were derived from multiple linear regression models, adjusted for age, gender, and years of education. Green, proteins selected in early and late stages; blue, early-stage proteins; red, late-stage proteins. (E) Cognitive trajectories of MMSE scores between low and high levels of five proteins. Average regression lines for each group were fitted, with error bars representing 95% CIs. Interaction p values for time were obtained from linear mixed-effects models with random effects, adjusted for age, gender, and years of education. The data underlying this figure are found in [153]Table S14. To determine whether the identified protein panels could differentiate between slow and rapid cognitive decline, we employed linear mixed-effects models adjusted for age, gender, and education. Participants were stratified into high and low protein expression groups based on median levels. Cognitive decline was assessed using the Mini-Mental State Examination (MMSE), Clinical Dementia Rating Scale-Sum of Boxes (CDRSB), and Alzheimer’s Disease Assessment Scale-Cognitive Subscale 13 (ADAS13). Significant differences in the regression slopes were observed between high and low expression groups for all cognitive measures (MMSE: p < 0.001, [154]Figure 5B; CDRSB: p < 0.001, [155]Figure S8; ADAS13: p < 0.001, [156]Figure S9). 
Individuals with high protein panel scores exhibited accelerated cognitive decline over the 10-year period. These findings persisted across models 2 and 3, which included adjustments for APOE genotype and baseline diagnosis (ADAS13: [157]Table S13; MMSE: [158]Table S14; CDRSB: [159]Table S15).

Further analysis focused on individual proteins within the panels to identify specific markers associated with dementia risk. Elevated levels of NEFH, YWHAG, SMOC1, CHN1, and CAMK2B were significantly linked to an increased risk of dementia, with HRs ranging from 3.271 to 5.131 (p < 0.005) ([160]Figure 5C). Additionally, six of the remaining 17 proteins demonstrated significant risk differences between high and low expression groups ([161]Table S12). We then used multiple regression, adjusted for age, gender, and years of education, to assess the cross-sectional relationship between these 22 proteins and cognitive function ([162]Figure 5D). Notably, five proteins (NEFH, YWHAG, SMOC1, CHN1, and CAMK2B) and nine of the remaining 17 proteins were significantly associated with the rate of cognitive decline (see also [163]Figure 5E and [164]Table S14). Interestingly, SMOC1 and TMOD2 did not correlate with baseline MMSE scores but were linked to longitudinal declines in MMSE scores ([165]Table S14). Consistent associations were observed when cognitive decline was measured using ADAS13 scores ([166]Table S13) and CDRSB ([167]Table S15), reinforcing the relevance of these proteins as biomarkers for AD progression.

Validation in external cohort and internal cohorts

To validate the predictive performance of our identified CSF protein panels, we analyzed data from the external Knight Alzheimer Disease Research Center (ADRC) cohort (N = 146) ([168]Table S16). In this cohort, the interval between CSF collection and PET scan assessment was restricted to less than 6 months ([169]STAR Methods).
As expected, the early protein panel showed an AUC of 0.99 for distinguishing A+ individuals from A−T− controls. This performance significantly surpassed that of established CSF AD biomarkers (AUC range: 0.91–0.97, p < 0.002) and baseline models (AUC = 0.84, p = 3.26 × 10^−5) ([170]Figure 6A). Similarly, the late-stage protein panel effectively classified A+T+ individuals from A+T− subjects, attaining an AUC that outperformed both CSF AD biomarkers (AUC range: 0.55–0.69, p < 0.001 for p-tau181 and t-tau; p = 6.22 × 10^−2 for Aβ42) and baseline models (AUC = 0.83, p = 3.26 × 10^−1) ([171]Figure 6B). These findings underscore the robustness of our protein panels in external validation settings, highlighting their superior performance over conventional biomarkers. Figure 6. Validation in external cohort and internal cohorts (A and B) Performance of CSF AD biomarkers (Aβ42, CSF p-tau181, and CSF t-tau), baseline models (age, gender, and APOE ε4 status), and two protein panels in distinguishing A+ from A−T− and A+T+ from A+T− in an external Knight ADRC cohort (N = 146). (C and D) Performance of protein panels, CSF p-tau181/Aβ42, the top four proteins in the protein panels, CSF AD biomarkers, and baseline models, with or without the top four proteins, in distinguishing ADNC intermediate/high from ADNC none/low in an internal cohort with an independent autopsy population (N = 54). (E) Kaplan-Meier survival curves comparing dementia progression between low and high levels of early-stage and late-stage protein panels, as well as NEFH and YWHAG, in an internal cohort with an independent longitudinal population (N = 354). p values and hazard ratios (HRs) were derived from Cox proportional hazards regression models, adjusted for age, gender, and years of education. The data underlying this figure are found in [174]Table S17.
(F) Cognitive trajectories of MMSE scores between low and high levels of early-stage and late-stage protein panels, as well as NEFH and YWHAG, in an independent longitudinal population (N = 354). Average regression lines were fitted for each group, with error bars representing 95% confidence intervals (CIs). Interaction p values for time were derived from linear mixed-effects models with random effects, adjusted for age, gender, and years of education. The data underlying this figure are found in [175]Table S19. To further assess the clinical utility of our proteomic signatures, we validated our findings within an internal autopsy-confirmed cohort (N = 54). This cohort allowed for the evaluation of protein panel accuracy in differentiating stages of autopsy-confirmed AD pathology, the gold standard for AD diagnosis. ADNC was categorized as none/low (A−T−), intermediate/high (A+), intermediate (A+T−), and high (A+T+), focusing primarily on intermediate/high stages pertinent to AD pathology diagnosis.[176]^2 The early-stage protein panel achieved perfect discrimination between ADNC none/low and ADNC intermediate/high stages (AUC = 1), outperforming the CSF p-tau181/Aβ42 ratio (AUC = 0.97) and a combination of the top four proteins from the early-stage panel (AUC = 0.96) ([177]Figure 6C). Incorporating the top four proteins significantly enhanced the discriminative accuracy of CSF Aβ42, raising its AUC from 0.89 to 1, compared to AD biomarkers and baseline models. Similarly, the late-stage protein panel excelled in distinguishing between ADNC intermediate and high stages, with significant improvements observed upon adding the top four proteins from this panel, particularly for CSF Aβ42 and baseline models ([178]Figure 6D). Comprehensive correlation analyses revealed significant associations between individual proteins and the severity of AD neuropathological scores, including Thal, Braak, CERAD, and overall ADNC, after FDR correction.
Specifically, CCL25, YWHAG, CHCHD7, HAX1, and TBCA were significantly correlated with all four neuropathological scores. Notably, translin (TSN) and CHN1 from the late-stage panel were exclusively associated with Braak score severity ([179]Figure S8). TSN, a multifunctional protein involved in nucleic acid metabolism such as DNA repair and RNA processing, may contribute to neurodegeneration underlying AD pathogenesis.[180]^35 To evaluate the prognostic value of our protein panels, we examined an independent non-demented cohort (N = 354) for dementia progression and cognitive decline. Individuals with elevated scores in either the early or late-stage protein panels exhibited a significantly higher risk of progression to dementia (see also [181]Figure 6E and [182]Table S17) and demonstrated accelerated cognitive decline over time (ADAS13: [183]Table S18; MMSE: see also [184]Figure 6F and [185]Table S19; CDRSB: [186]Table S20). Analysis of individual proteins within both panels indicated that participants with protein levels above the median had an increased risk of dementia progression, with NEFH and YWHAG showing the highest HR among the 22 proteins assessed (see also [187]Figure 6E and [188]Table S17). Additionally, in contrast to prior findings where 12 out of 22 proteins were associated with at least one cognitive scale, the larger sample size revealed that 21 out of 22 proteins correlated with at least one baseline cognitive score ([189]Figure S9). Longitudinal analyses further demonstrated that individuals with high levels of 20 out of these 22 proteins experienced faster cognitive decline over a decade compared to those with lower levels (ADAS13: [190]Table S18; MMSE: see also [191]Figure 6F and [192]Table S19; CDRSB: [193]Table S20). 
These associations remained robust after adjusting for APOE genotype (model 2) and clinical diagnosis (model 3), underscoring the independent predictive value of both the protein panels and the individual proteins for dementia progression ([194]Table S17) and cognitive decline (ADAS13: [195]Table S18; MMSE: [196]Table S19; CDRSB: [197]Table S20). Discussion In this study, we integrated comprehensive CSF proteomic profiling with advanced machine learning techniques to identify two distinct protein panels capable of accurately discriminating the pathological stages of AD as determined by PET imaging. Specifically, we developed an early-stage protein panel to assess Aβ deposition and a late-stage panel to evaluate tau pathology severity. These protein signatures not only enhance the stratification of participants in clinical trials but also demonstrate potential as predictive markers for dementia progression and cognitive decline over a decade. Our proteomic analysis revealed 262 proteins associated with amyloid PET and 949 proteins linked to tau PET, such as HAX1 and CCL25.[198]^36^,[199]^37^,[200]^38^,[201]^39 Pathway enrichment analysis highlighted distinct molecular signatures for amyloid and tau pathologies. Proteins associated with amyloid PET were predominantly involved in neuroinflammatory pathways, particularly chemokine receptor binding and cytokine activity, aligning with the current understanding of neuroinflammation’s role in Aβ-mediated AD pathology.[202]^40^,[203]^41 Additionally, the enrichment of oxidoreductase activity and nucleoside metabolic processes suggests disruptions in cellular energy homeostasis and mitochondrial function, which are critical aspects of AD pathogenesis.[204]^19 Conversely, proteins linked to tau PET were enriched in pathways related to protein quality control and post-translational modifications, including ubiquitin-like protein conjugating enzyme activity and proteasomal degradation. 
These findings are consistent with the established mechanisms of tau protein clearance and hyperphosphorylation.[205]^42 Furthermore, the involvement of calcium-dependent protein kinase activity and mitogen-activated protein kinase signaling pathways in tau pathology indicates potential regulatory mechanisms that contribute to synaptic dysfunction and neuronal loss.[206]^43 Notably, both Aβ- and tau-associated proteins showed convergent enrichment in glucose metabolic processes, highlighting a shared metabolic vulnerability that may represent a critical point in AD pathogenesis.[207]^44 Leveraging these proteomic findings, we developed two discrete predictive models characterizing distinct stages of AD progression. Through machine learning approaches, we derived a 16-protein signature for early-stage prediction (A−T− vs. A+) and a 9-protein signature for late-stage differentiation (A+T− vs. A+T+), revealing stage-specific molecular landscapes that extend our understanding of AD pathogenesis. The early-stage signature exhibited significant enrichment in pathways governing synaptic function and plasticity, indicating that synaptic dysfunction may serve as an early response to the pathological changes occurring in AD.[208]^45 Notably, we observed a significant involvement of microtubule-based transport processes, suggesting the activation of compensatory mechanisms in the early stages of the disease.[209]^45 This compensatory response may help explain the preservation of cognitive function in some individuals despite the presence of underlying Aβ pathology.[210]^46 Conversely, the late-stage signature demonstrated enrichment in pathways characteristic of advanced AD, encompassing disruptions in energy metabolism, calcium homeostasis, and insulin signaling. 
The coordinated alterations in these metabolic, calcium, and insulin pathways provide molecular evidence for the multi-factorial deterioration seen in advanced AD.[211]^47^,[212]^48^,[213]^49 This complex pathway perturbation may underlie increased neuronal vulnerability at this stage, where compensatory mechanisms become overwhelmed, culminating in cognitive decline. These findings yield several key insights: (1) a comprehensive molecular map of disease progression, (2) potential mechanisms governing the transition from compensation to pathological deterioration, and (3) the significance of system-wide alterations in AD. The stage-specific molecular signatures suggest that therapeutic strategies should be tailored according to disease stage—targeting compensatory mechanisms early while employing multi-target approaches in advanced disease. A key methodological strength of our study lies in utilizing PET staging as the primary outcome measure. PET and CSF biomarkers represent distinct approaches to measuring AD pathology, with PET providing direct visualization of aggregated amyloid plaques and tau deposits, while CSF biomarkers (e.g., Aβ42, p-tau181, and t-tau) reflect soluble forms of these pathologies. Autopsy studies have demonstrated that although CSF biomarkers effectively discriminate AD pathological stages, their accuracy remains lower than PET imaging.[214]^2 Therefore, our study is based on amyloid and tau PET stages as the study outcome and head-to-head comparison with CSF AD biomarkers and final autopsy pathology validation to improve the robustness and reliability of the findings. The accuracy of the two protein panels identified in this study surpasses that of existing CSF AD biomarkers, and we further demonstrate that four proteins can significantly improve established CSF AD biomarkers and baseline models. 
Notably, the late-stage protein panel we discovered accurately captures amyloid and tau pathologies detected by PET, addressing a key limitation of current CSF biomarkers, which primarily reflect amyloid plaque changes but show reduced sensitivity to tau accumulation.[215]^50 This limitation is further evidenced by recent phase 3 trials of gantenerumab, where anti-Aβ treatment affected amyloid PET and CSF/plasma biomarkers but not tau PET signals.[216]^51 Furthermore, the latest diagnostic criteria for AD proposed by the National Institute on Aging and the Alzheimer’s Association (NIA-AA) working group classify AD into four pathological stages based on Aβ and tau PET.[217]^13 However, corresponding CSF biomarkers for advanced PET-based disease stages are currently lacking. Previous studies have indicated that the microtubule-binding region of tau containing the residue 243 is a promising biomarker for tau PET with higher correlation than other established AD biomarkers,[218]^4 yet its accuracy in AT status discrimination requires further investigation. Moreover, plasma p-tau217 is another promising biomarker for identifying tau pathology stages,[219]^52^,[220]^53^,[221]^54 with studies showing that plasma %p-tau217 performs comparably or even superior to CSF p-tau181/Aβ42.[222]^54 In the present study, the protein panel we identified outperformed plasma %p-tau217 and p-tau217 in distinguishing PET stages. However, the limited number of participants in this subgroup (n = 57) may have reduced the statistical power, limiting the robustness of this conclusion. Overall, our findings suggest that the protein panels identified for late-stage disease hold promise in overcoming the limitations of existing AD biomarkers in identifying high tau pathology in amyloid-positive individuals. The identified protein panels demonstrate significant potential for advancing clinical trials and patient care in AD. 
Their implementation in CSF offers distinct advantages, as CSF directly interfaces with brain pathophysiology and provides real-time insights into disease progression with minimal interference from peripheral factors.[223]^55^,[224]^56 In early disease stages, the panel enables sensitive detection of Aβ deposition, facilitating timely intervention for individuals at the onset of AD pathology. For later stages, the panel provides assessment of tau pathology as a cost-effective alternative to PET imaging, increasing accessibility for broader patient populations. Given the established relationship between anti-Aβ treatment efficacy and tau burden,[225]^7^,[226]^57 these panels can enhance clinical trial stratification by identifying participants more likely to benefit from anti-Aβ therapeutics, thereby optimizing intervention outcomes. Moreover, the protein markers serve as reliable pharmacodynamic indicators, making them particularly valuable for assessing drug target engagement and therapeutic efficacy in clinical trials. This application is especially crucial given CSF biomarkers’ ability to reflect subtle changes in brain protein homeostasis that may not be detectable through other means.[227]^58 Furthermore, the panels’ validated ability to predict cognitive decline and dementia progression over a decade positions them as valuable prognostic tools for individual risk stratification. These protein signatures could enable clinicians to identify high-risk individuals who may benefit from more intensive monitoring and early intervention strategies before significant symptoms manifest. In addition, the panels could facilitate clinical trial recruitment by identifying individuals at higher risk of cognitive decline and dementia progression, potentially reducing trial costs and increasing success rates through enriched participant selection. 
These characteristics, combined with the panel’s robust performance in distinguishing pathological stages, suggest significant potential for broader implementation in both clinical practice and therapeutic development programs. Limitations of the study Several limitations of this study warrant consideration. First, the median interval of 6.2 years between CSF sampling and PET assessment may affect protein panel performance due to potential transitions in amyloid and tau PET status. Although we addressed this through methodological approaches and sensitivity analyses, including validation in an independent cohort with CSF-PET intervals under 6 months, this temporal gap remains a consideration for interpretation. Second, while our analyses demonstrated robust protein associations with cognitive decline and dementia progression over 10 years, the predictive validity beyond this time frame requires investigation. Third, the promising comparison with plasma p-tau217 is limited by sample size (n = 57), necessitating validation in larger cohorts. Fourth, while blood-based tests would offer significant clinical advantages over CSF sampling due to their minimal invasiveness and greater feasibility for widespread implementation, the predictive performance of our identified CSF protein signatures when translated to blood markers was not investigated in this study. Fifth, our machine learning approach utilized the entire dataset for training due to sample size constraints, potentially affecting model generalizability. We implemented bootstrapping (100 iterations) and nested 10-fold cross-validation to mitigate overfitting risks. 
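As a rough illustration of the bootstrapping step mentioned above, the sketch below resamples participants with replacement 100 times and recomputes a rank-based AUC on each resample, which is one common way to gauge how stable a classifier's discrimination is. It is a minimal example under stated assumptions: the function names, the percentile-style interval, and the fixed seed are hypothetical, the nested 10-fold cross-validation is not shown, and nothing here is taken from the authors' published code.

```python
import random


def auc(scores, labels):
    """AUC as the probability that a positive case outranks a negative one.

    Ties between a positive and a negative score count as 0.5.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc(scores, labels, n_iter=100, seed=0):
    """Mean AUC and a rough 95% percentile interval over bootstrap resamples."""
    rng = random.Random(seed)  # fixed seed (assumption) for reproducibility
    n = len(scores)
    estimates = []
    while len(estimates) < n_iter:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # resample lacked one class; draw again
        estimates.append(auc([scores[i] for i in idx], ys))
    estimates.sort()
    lower = estimates[int(0.025 * (n_iter - 1))]
    upper = estimates[int(0.975 * (n_iter - 1))]
    return sum(estimates) / n_iter, (lower, upper)
```

Because every resample is drawn from the same participants used for training, an interval obtained this way describes sampling variability rather than performance on unseen data, which is why external validation remains necessary.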
Sixth, our proteomics profiling relied solely on the SomaScan platform, lacking orthogonal validation with alternative proteomic technologies, which is important given potential measurement biases from varying aptamer reagent affinities.[228]^59 Seventh, our current analysis used a median-based dichotomization to determine protein thresholds in a single cohort. While this approach effectively demonstrated the proteins’ prognostic potential, establishing clinically applicable thresholds will require validation in larger independent cohorts to ensure result robustness and generalizability. Eighth, the specificity of our panels for AD versus other neurodegenerative conditions remains to be established. Finally, validation across diverse ethnicities using various measurement technologies in large independent cohorts is necessary to enhance result robustness. Future studies incorporating larger samples and longitudinal biomarker measurements will be crucial for understanding the temporal dynamics and clinical utility of these proteins. Resource availability Lead contact Further information and requests for resources should be directed to and will be fulfilled by the lead contact, Jianping Jia (jjp@ccmu.edu.cn). Materials availability This study did not generate new materials. Data and code availability * • This study does not generate new original data. All data used in this study were obtained from the ADNI and Knight ADRC cohorts. For ADNI, all variables used in the tables of this study are provided at [229]https://zenodo.org/records/14866417, and the full datasets are available through ADNI upon reasonable request. For ADNI data access, please visit the ADNI website at [230]www.adni-info.org, and request approval for access.
* • For the Knight ADRC cohort used in the validation of this study, individual-level data and proteomics can be requested from the NeuroGenomics and Informatics Center (NGI) at [231]https://neurogenomics.wustl.edu/open-science/resource-sharing-hub/ or from Database: [232]NIAGADS, NG00130. * • This paper does not report the original code. The code for analyses conducted in this study was derived from two previously published studies.[233]^60^,[234]^61 The machine learning code employed is publicly accessible at [235]https://zenodo.org/records/14866417. * • All software utilized in this study is publicly available. Summary results generated from this study are provided in [236]Tables S3 and [237]S4. A full list of open-source software used is available in the [238]key resources table. * • Any additional information required to reanalyze the data reported in this paper is available from the [239]lead contact upon request. Acknowledgments