Abstract

Diabetic kidney disease (DKD) progression is not well understood. Using high-throughput proteomics, biostatistical, pathway and machine learning tools, we examine the urinary Complement proteome in two prospective cohorts with type 1 or type 2 diabetes and advanced DKD followed for 1,804 person-years. The top 5% of urinary proteins, representing multiple components of the Complement system (C2, C5a, CL-K1, C6, CFH and C7), are robustly associated with 10-year kidney failure risk, independent of clinical covariates. We confirm the top proteins in three early-to-moderate DKD cohorts (2,982 person-years). Associations are especially pronounced in advanced kidney disease stages, similar between the two diabetes types, and far stronger for urinary than for circulating proteins. We also observe increased Complement protein and single-cell/spatial RNA expression in diabetic kidney tissue. Our study shows Complement engagement in DKD progression and lays the groundwork for developing biomarker-guided treatments.

Subject terms: Chronic kidney disease, Prognostic markers, Data mining, Proteomics

Complement proteome engagement is strongly linked to kidney outcomes in diabetes. This translational study leveraged five cohorts of over 4,500 person-years and high-throughput proteomics to enable potential biomarker-guided drug development.

Introduction

Diabetes is the leading cause of kidney failure. Over the last two decades, the prevalence of kidney failure due to diabetes has more than doubled in the United States^1. National databases mainly reflect kidney disease due to type 2 diabetes, but similarly unfavorable patterns are seen for type 1 diabetes^2. Despite recent advances^3,4, the residual risk of kidney failure^1,2 remains high, emphasizing the need for research on novel molecular mechanisms of diabetic kidney disease (DKD) and for the identification of new drug targets.
It is plausible that the urinary proteome offers meaningful insights into the mechanisms underlying DKD. Earlier studies of urinary proteins in DKD relied on targeted biomarker assays^5–7 or mass spectrometry^8,9. High-throughput protein testing in large cohort studies has become possible only recently. Robust affinity methods such as the SomaScan aptamer proteomics platform have so far been used to measure protein levels in serum or plasma in multiple studies, but only a few studies have applied this technology to the urinary proteome. Those studies were all cross-sectional^10–12, mostly small^10,11, and focused on distant clinical phenotypes. To date, no prospective study of the urinary proteome has been performed in diabetic kidney disease or in any other chronic disease.

Ours was a prospective two-cohort study of subjects with type 1 or type 2 diabetes and advanced DKD at baseline, in whom high-throughput urinary proteomics was evaluated for associations with the development of kidney failure over 10 years. Advanced computational techniques were used to explore the multi-dimensional links of the most enriched pathway proteome (the Complement system) with clinical and molecular indices of diabetic kidney injury. Subsequently, in a three-cohort study of type 1 or type 2 diabetes and early-to-moderate DKD, we evaluated whether the associations of the top Complement proteins with clinically recognized kidney outcomes extend to earlier disease stages. We explored selected features of Complement proteins as potential biomarkers. To gain insight into the possible source of these proteins, we further examined the Complement proteome across three matrices (plasma, urine, and kidney), supported by single-cell and spatial RNA data from kidney tissue.

Results

Prospective study cohorts

Figure 1 outlines the study framework.
At baseline, subjects in the two-cohort study of advanced DKD with type 1 (189 subjects) or type 2 diabetes (115 subjects) had impaired kidney function, with glomerular filtration rate (GFR) predominantly in the G3 category, and more than half had severely increased albuminuria (A3 category) (Supplementary Data 1). Kidney failure developed within 10 years in 53% of subjects with type 1 and in 23% of subjects with type 2 diabetes. The three-cohort study included 652 subjects with early-to-moderate DKD and either diabetes type. These subjects had mostly normal or mildly reduced kidney function (G1 or G2) and moderately increased albuminuria (A2). Kidney failure (the clinical endpoint) was rare in this group; the composite outcome of a 30% or greater decline in GFR and/or kidney failure occurred in 18%. The study cohorts were predominantly white (73%), and Black subjects accounted for 21%. The inclusion criteria and subject selection for the current study are detailed in Supplementary Fig. 1a, b; see also Supplementary Methods and Supplementary Data 2.

Fig. 1. Schematic representation of the study framework. Our comprehensive study of the urinary Complement proteome comprised a two-cohort study of subjects with type 1 (n = 189) or type 2 diabetes (n = 115) and advanced DKD followed for kidney failure over 10 years. We employed advanced molecular phenotyping technologies to establish proteomic associations with prospective kidney outcomes and detailed the biological relationships of our high-throughput proteomics data with the clinical and molecular phenotypes of diabetic kidney disease progression; we evaluated whether the associations extend to earlier DKD stages in a three-cohort study (n = 652) followed for up to 5 years. We investigated potential sources of increased Complement proteins in the urine by proteomics across three biofluid/tissue matrices and by single-cell or spatial transcriptomics.
DKD, diabetic kidney disease; T1D, type 1 diabetes; T2D, type 2 diabetes; GFR, estimated creatinine-based glomerular filtration rate; AA, African American; sc/snRNAseq, single-cell/single-nucleus RNA sequencing.

Enriched pathways in urinary proteome and kidney failure risk in advanced DKD

Pathway-driven analyses (one pathway at a time) of the 1305 measured urinary proteins, tested for associations with the kidney outcome in the advanced DKD two-cohort study, identified the most enriched pathways as belonging to the Complement system: Complement and coagulation cascades (hsa04610), regulation of Complement cascade (R-HSA-977606), and activation of C3 and C5 (R-HSA-174577) (Fig. 2a). With the significance threshold set at the 95th percentile of the P-value distribution (equivalent to α = 10^−16), 66 of the 1305 measured proteins were significant, including 15 of the 110 Complement system proteins. The subsequent data-driven, pathway-enhanced approach (all Complement pathways combined) further confirmed a significant enrichment in Complement (2.8-fold enrichment, P = 2.1 × 10^−4; Fig. 2b).

Fig. 2. Urinary proteome associated with prospective kidney outcome is enriched in the Complement system. Analysis is performed in the advanced DKD two-cohort study. a Pathway enrichment analysis performed using the DAVID Gene Functional Classification Tool shows the statistically significant pathways (P < 0.05). b Over-representation analysis using Fisher's exact test detected enrichment in Complement proteins. The cross within each plot represents the median, 75th, and 25th percentile values. All P-values are two-sided. Source data are provided as a Source Data file.

Complement proteome and development of kidney failure within 10 years in advanced DKD

Associations of the urinary Complement proteome with progression to kidney failure in advanced DKD, adjusted for diabetes type, were highly robust, and all pointed toward increased risk (Fig. 3a).
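The enrichment test in Fig. 2b and the two selection rules used for Fig. 3a (the top 5% of proteins by P-value, and a Bonferroni threshold for 110 tests) can be sketched in standard-library Python. This is a minimal illustration, not the authors' code: it assumes a plain two-sided Fisher's exact test on the 2 × 2 table implied by the counts reported above (15 of 66 significant proteins, versus 110 of 1305 measured proteins, belong to the Complement system), and all helper names are hypothetical. The simple fold ratio from these counts is about 2.7, slightly below the reported 2.8, so the published analysis presumably used a somewhat different background or counting.

```python
from math import ceil, comb


def hypergeom_pmf(k: int, total: int, in_pathway: int, drawn: int) -> float:
    """Probability that k of the `drawn` significant proteins fall in the pathway."""
    return comb(in_pathway, k) * comb(total - in_pathway, drawn - k) / comb(total, drawn)


def fisher_exact_two_sided(sig_in: int, sig_out: int, rest_in: int, rest_out: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table
    [[sig_in, sig_out], [rest_in, rest_out]]
    (rows: significant / not significant; columns: in pathway / not in pathway).
    Sums the probabilities of all tables with the same margins that are
    no more likely than the observed table."""
    total = sig_in + sig_out + rest_in + rest_out
    drawn = sig_in + sig_out        # significant proteins
    in_pathway = sig_in + rest_in   # pathway proteins
    p_obs = hypergeom_pmf(sig_in, total, in_pathway, drawn)
    lo, hi = max(0, drawn + in_pathway - total), min(drawn, in_pathway)
    p_value = 0.0
    for k in range(lo, hi + 1):
        p_k = hypergeom_pmf(k, total, in_pathway, drawn)
        if p_k <= p_obs * (1 + 1e-7):  # relative tolerance for floating-point ties
            p_value += p_k
    return min(p_value, 1.0)


def fold_enrichment(sig_in: int, drawn: int, in_pathway: int, total: int) -> float:
    """Share of pathway proteins among significant hits divided by their share overall."""
    return (sig_in / drawn) / (in_pathway / total)


def top_fraction_count(n_items: int, fraction: float = 0.05) -> int:
    """Number of items in the top `fraction`, rounded up."""
    return ceil(n_items * fraction)


def bonferroni_alpha(alpha: float, n_tests: int) -> float:
    """Per-test threshold controlling the family-wise error rate at `alpha`."""
    return alpha / n_tests


# Counts reported in the text: 1305 proteins, 66 significant, 110 Complement, 15 overlap.
total, significant, complement, overlap = 1305, 66, 110, 15
table = (overlap, significant - overlap,
         complement - overlap, total - significant - (complement - overlap))
print("fold enrichment:", round(fold_enrichment(overlap, significant, complement, total), 2))
print("Fisher P:", fisher_exact_two_sided(*table))
print("top 5% of 1305 proteins:", top_fraction_count(total))               # 66
print("top 5% of 110 Complement proteins:", top_fraction_count(complement))  # 6
print("Bonferroni alpha for 110 tests:", bonferroni_alpha(0.05, complement))
```

Rounding the top-5% count up reproduces both the 66 significant proteins and the six top Complement proteins named in the text, and 0.05/110 gives the 4.5 × 10^−4 line drawn in Fig. 3a.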
Among the 110 Complement proteins, C2, C5 anaphylatoxin (C5a), collectin kidney 1 (CL-K1), Complement factor H (CFH), C6, and C7 represented the top 5% of most significant proteins, with association strengths of P < 2.4 × 10^−23. Hazard ratios (HRs) for all of these proteins were very high. Kidney failure risk was over 4 times higher per one-tertile increase in C2 (HR, 4.27; 95% confidence interval (CI), 3.24, 5.64; P = 1.4 × 10^−24; Fig. 4a, base model; see Supplementary Data 3 for the formal protein nomenclature).

Fig. 3. Urinary Complement proteome and kidney failure risk by pathway in the advanced DKD two-cohort study. a A comprehensive view of all proteins of the most enriched pathway (Complement system, 110 proteins) from among 1305 proteins measured by high-throughput proteomics. The needle plot depicts the strengths of associations, representing P-values from the Cox proportional hazards models for developing kidney failure within 10 years in the two-cohort advanced DKD study. Each vertical needle represents the base-10 logarithm-transformed P-value from the diabetes type-adjusted model (base model), evaluating one creatinine-adjusted Complement protein at a time. Proteins are ordered on the x-axis by pathway, UniProt identifier (ID), and protein name (to account for the fact that some proteins share the same UniProt ID, e.g., C5 and C5a). The top 5% Complement proteins are labeled. The gray line marks the threshold of significance at the Bonferroni-corrected α = 4.5 × 10^−4. b A map of the Complement system pathways and vertical bar graphs of association strengths (as described in a) for pathways that include the top 5% Complement proteins. The top Complement proteins are indicated in red font on the bar graphs and on the pathway scheme. For a full list of the Complement system proteins and data-driven associations, please refer to Supplementary Data 5. All P-values are two-sided.
Figure 3b was created in BioRender. Md Dom, ZI. (2024) https://BioRender.com/l33k779. DKD, diabetic kidney disease. Source data are provided as a Source Data file.

Fig. 4. Complement proteome associations with kidney failure development, their prognostic measures, and biological insights into diabetic kidney disease progression in the advanced DKD two-cohort study. a Forest plots and tabular data showing the top 5% Complement proteins associated with 10-year kidney failure development in the two cohorts (n = 304) for base and clinically adjusted Cox proportional hazards models. The base model is controlled for diabetes type. Effect sizes (closed diamond symbols) with corresponding 95% confidence intervals (horizontal bars) are shown per one-tertile change in the urinary creatinine-adjusted Complement protein distribution. b Correlations of the urinary Complement proteome with clinical legacy measures (needle plot). The order and colors of the needles follow the pathway annotation of Fig. 3a. The height of each needle represents the correlation coefficient between one Complement protein and the clinical legacy measure named in the panel title. P-values are two-sided. c Spaghetti plot displaying P-values for all 110 Complement protein associations with 10-year kidney failure risk in 5 different partially adjusted models. The black and gray dashed lines denote the Bonferroni-corrected and nominal significance thresholds, respectively. P-values are two-sided. d Correlations of the urinary Complement proteome with molecular kidney injury indices (needle plot). The order and colors of the needles follow the pathway annotation of Fig. 3a. e A chord diagram of relationships between the urinary Complement proteome and circulating KRIS proteins. Upper sectors represent TNFRSF members of KRIS (green) and non-TNFRSF members of KRIS (purple), arranged clockwise in order of decreasing strength of association with the kidney outcome.
Lower sectors are arranged counterclockwise by the strength of the Complement associations with the kidney outcome. The top 5% Complement proteins are marked with asterisks. The length of each circular sector indicates the cumulative strength of the associations for a given protein. Links corresponding to the 75th percentile of the distribution of correlation coefficients are shown. HR, hazard ratio; CI, confidence interval. Source data are provided as a Source Data file.

These strong relationships are illustrated by large differences in the 10-year cumulative incidence of kidney failure according to tertiles of baseline urinary concentration of each of the top Complement proteins. For the top tertile of C2, the incidence reached 82% at the end of 10 years, compared with 23% for the lowest tertile (Fig. 5a–f). Only 14 all-cause and cardiovascular mortality events (4.5%) occurred in the advanced DKD cohorts; competing-risk analyses were therefore deemed unnecessary.

Fig. 5. Urinary top 5% Complement proteins and cumulative incidence of kidney failure in the advanced DKD two-cohort study. The proportions of subjects with type 1 (n = 189) or type 2 (n = 115) diabetes and advanced DKD (two-cohort study) who developed kidney failure within 10 years of follow-up, by tertile of the distribution of baseline urinary creatinine-adjusted Complement proteins measured with high-throughput aptamer proteomics. a Complement C2. b Complement C5a. c Collectin kidney 1 (CL-K1). d Complement C6. e Complement factor H (CFH). f Complement C7. Solid lines represent Kaplan-Meier curves, and the surrounding shaded areas represent the corresponding 95% confidence intervals. The log-rank test compares the tertiles treated as a three-level categorical variable. All P-values are two-sided. T1 (bottom), tertile 1; T2, tertile 2; T3 (top), tertile 3. Source data are provided as a Source Data file.
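The tertile-specific curves in Fig. 5 are, in essence, one minus the Kaplan-Meier survival estimate computed within each tertile. The sketch below shows that calculation in plain Python under simplifying assumptions (rank-based tertiles with ties broken by input order; no confidence bands or log-rank test); the function names and toy data are hypothetical.

```python
def tertiles(values):
    """Assign each value to tertile 0 (T1, bottom), 1 (T2) or 2 (T3, top) by rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    groups = [0] * n
    for rank, idx in enumerate(order):
        groups[idx] = rank * 3 // n
    return groups


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.
    events[i] is 1 if kidney failure occurred at times[i], 0 if censored."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        failures = leaving = 0
        while i < len(data) and data[i][0] == t:  # group tied follow-up times
            failures += data[i][1]
            leaving += 1
            i += 1
        if failures:
            surv *= 1 - failures / at_risk
            curve.append((t, surv))
        at_risk -= leaving  # both events and censorings leave the risk set
    return curve


def cumulative_incidence(curve, horizon):
    """1 - S(horizon) read from a step-function Kaplan-Meier curve."""
    surv = 1.0
    for t, s_t in curve:
        if t > horizon:
            break
        surv = s_t
    return 1 - surv


# Hypothetical toy data: follow-up in years, kidney failure flag, urinary protein level.
years = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
failed = [1, 1, 1, 0, 0, 1, 0, 0, 0]
protein = [9.0, 8.0, 7.0, 3.0, 2.0, 6.0, 1.0, 4.0, 5.0]

groups = tertiles(protein)
for g in range(3):
    idx = [i for i in range(len(years)) if groups[i] == g]
    curve = kaplan_meier([years[i] for i in idx], [failed[i] for i in idx])
    print(f"T{g + 1}: 10-year cumulative incidence = {cumulative_incidence(curve, 10):.2f}")
```

With real data the same per-tertile loop would be run on each of the six top proteins; a dedicated survival package would normally also supply the confidence bands and the log-rank test shown in the figure.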
To determine whether the associations of the Complement proteins with the kidney outcome were independent of clinical covariates, we returned to the Cox model. After adjustment for clinical covariates (age, sex, race, diabetes type and duration, body-mass index, systolic and diastolic blood pressure, hemoglobin A1c (HbA1c), GFR, albuminuria, cholesterol, smoking status, insulin use, and renoprotective/other antihypertensive and lipid-lowering treatments), each top protein remained significantly associated with kidney failure, with risks ranging from 2.0-fold to over 3.7-fold. In the adjusted model, kidney failure risk remained over 3 times higher per one-tertile increase in C2 (HR, 3.30; 95% CI, 2.10, 5.20; P = 2.4 × 10^−7; Fig. 4a). When the confounding effects of key covariates were evaluated in detail, age, sex, and HbA1c did not confound the associations for the top proteins (changes in β effect estimates of less than 6%). Confounding by GFR was moderate (changes in β from 10% to 15%). The analysis using the new CKD-EPI 2021 GFR equation^13 (GFR_new) yielded similar results. Confounding by albuminuria was substantial and protein-specific (changes in β from 4% to 26%; Supplementary Fig. 2a and Supplementary Data 4). For the top 5% Complement proteins, we found no interactions with sex (P_interaction > 0.28 for each; Supplementary Fig. 2b) and no interactions with diabetes type. In the analysis of the entire Complement proteome, we first examined the correlations of the urinary Complement proteome with clinical covariates (Fig. 4b). Overall, correlations with albuminuria were stronger than those with kidney function or glycemic control. Fifty-nine of the 110 (54%) proteins were associated with kidney failure independently of key clinical covariates (Fig. 4c).
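Two pieces of arithmetic underlie the paragraph above: the percent change in the Cox coefficient β = ln(HR) when covariates are added (the change-in-estimate criterion for confounding), and the correspondence between an HR, its 95% CI, and its P-value. The sketch assumes Wald-type intervals on the log scale, which the text does not state, and the helper names are hypothetical. Applied to the C2 estimates, the base-to-adjusted change in β is about 18%; that comparison adds all covariates at once, so it is only an arithmetic illustration and not one of the single-covariate confounding estimates reported. Back-calculating the P-value from the adjusted HR and CI gives a value of the same magnitude as the reported 2.4 × 10^−7.

```python
from math import erfc, log, sqrt
from statistics import NormalDist

Z_975 = NormalDist().inv_cdf(0.975)  # about 1.96 for a 95% two-sided interval


def percent_change_in_beta(hr_crude: float, hr_adjusted: float) -> float:
    """Change-in-estimate criterion: |beta_adj - beta_crude| / |beta_crude| * 100,
    with beta = ln(HR) from the Cox model."""
    b_crude, b_adj = log(hr_crude), log(hr_adjusted)
    return abs(b_adj - b_crude) / abs(b_crude) * 100


def wald_p_from_hr_ci(hr: float, lower: float, upper: float) -> float:
    """Two-sided Wald P-value recovered from an HR and its 95% CI,
    assuming the CI was computed as exp(beta +/- 1.96 * SE)."""
    se = (log(upper) - log(lower)) / (2 * Z_975)
    z = log(hr) / se
    return erfc(abs(z) / sqrt(2))  # erfc keeps precision for very small P-values


# C2 per one-tertile increase, as reported in the text.
print(f"beta change, base -> adjusted: {percent_change_in_beta(4.27, 3.30):.1f}%")
print(f"P (base model):     {wald_p_from_hr_ci(4.27, 3.24, 5.64):.1e}")
print(f"P (adjusted model): {wald_p_from_hr_ci(3.30, 2.10, 5.20):.1e}")
```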
Urinary Complement proteome in advanced DKD cohorts by pathway

The data-driven associations between the top Complement proteins and kidney failure risk were integrated into current biological knowledge about the Complement system^14–16. The top 5% proteins represented multiple components of the system (Fig. 3b). The top protein, Complement C2, is a downstream effector resulting from activation of either the lectin or the classical pathway. The lectin pathway itself is activated upstream by proteins recognizing specific carbohydrate groups, including CL-K1 (the third-ranked protein in our study), mannose-binding lectin (MBL), and ficolins 1, 2, and 3 (FCN1-3). The alternative pathway is regulated by a number of Complement factors, including CFH (the fifth-ranked protein). All three upstream pathways converge upon the formation of C3 and C5 convertases. C5 convertase cleaves intact Complement C5 into the Complement 5 anaphylatoxin (C5a, the second-ranked protein) and C5b. The terminal Complement pathway starts with C5b binding to Complement C6 (the fourth-ranked protein) and then to Complement C7 (the sixth-ranked protein) to form C5b-7, which anchors into the cell membrane; subsequent binding of C8 and C9 creates the terminal form of the membrane attack complex, MAC (C5b-9), which is responsible for cell lysis. Proteins of the classical pathway, opsonins, and regulatory proteins had weaker associations.

Urinary Complement proteome and other molecular indices in advanced DKD

Among urinary biomarkers of DKD (Fig. 4d), Complement proteins correlated markedly with monocyte chemoattractant protein 1 (MCP1) and noticeably with liver-type fatty acid binding protein (LFABP). Among urinary immunoglobulins, correlations with immunoglobulin M (IgM) and IgG were substantially stronger than those with IgA. Among circulating biomarkers, the kidney risk inflammatory signature (KRIS) proteins previously reported by us were examined for association with urinary Complement proteins.
About one-third of the circulating KRIS proteins showed strong connections with the urinary Complement proteome. Seven of the 17 KRIS proteins, TNFRSF1A (also known as TNFR1), TNFRSF1B (also known as TNFR2), TNFRSF19, IL15RA, IL17F, CD55, and CD300C, were each strongly connected with over 40 Complement proteins. Globally, however, about three-quarters of the Complement proteome (77%) did not have strong connections (7 links or fewer) with KRIS. All top Complement proteins remained significant after adjustment for TNFR1 (Fig. 4c, e).

Urinary Complement proteins and kidney outcomes by diabetes type and kidney disease stage

To assess the similarity of disease progression according to diabetes type, we compared high-throughput Complement proteomes and kidney failure risks between the two advanced DKD cohorts. Both the association strengths (P-values) and the effect sizes (hazard ratios) showed strong and uniform agreement between type 1 and type 2 diabetes (concordance: 87%; κ_w = 0.67; Fig. 6a; see Supplementary Data 5 for Complement proteome associations with kidney failure by diabetes type).

Fig. 6. Urinary Complement proteins and kidney disease progression across five cohorts, by kidney disease stage and by diabetes type. a The plot of association strengths (P-values transformed to base-10 logarithms) and effect sizes per one-tertile change of each of the 110 Complement proteins on the 10-year risk of developing kidney failure in the two advanced DKD cohorts, with type 1 (n = 189) or type 2 diabetes (n = 115). The top Complement proteins are marked with red dots. A weighted Cohen's kappa coefficient (κ_w) and corresponding P-values reflect the test of agreement for the strength of Complement associations between type 1 and type 2 diabetes. Measurements in advanced DKD were performed with aptamer proteomics and in early-to-moderate DKD with targeted assays. All P-values are two-sided.
b Associations of the top 5% Complement proteins with the prospective continuous kidney outcome in all 956 subjects and 4629 person-years, by cohort. Color-annotated, cohort-specific effect estimates (diamond symbols) and 95% confidence intervals (horizontal bars) represent changes in kidney function over time per one-tertile increase in the distribution of a urinary creatinine-adjusted Complement protein. Please see Supplementary Data 6 for all results on GFR-based outcomes. c, d Associations of the top 5% Complement proteins with the prospective continuous kidney outcome (c) and the binary composite kidney outcome (d) in study cohorts combined by DKD stage. Color-annotated, model-specific effect estimates (crude model: closed circle symbols; adjusted model: open circles) and 95% confidence intervals (horizontal bars) represent changes in kidney function over time per one-tertile increase in the distribution of a urinary creatinine-adjusted Complement protein. Please see Supplementary Data 6 for all results on GFR-based outcomes. DKD, diabetic kidney disease; CI, confidence interval; GFR, estimated creatinine-based glomerular filtration rate; OR, odds ratio; AA, African American. Source data are provided as a Source Data file.

The three early-to-moderate DKD cohorts were examined together with the advanced DKD cohorts in subsequent analyses of the secondary kidney outcomes. Analyses of the five cohorts (Fig. 6b) revealed that all top 5% proteins in advanced DKD, and three to five of the six proteins in early-to-moderate DKD, were associated with a steeper GFR slope (faster decline). The confidence intervals for C5a and CFH crossed zero in the African American cohort with type 2 diabetes. The effect estimates were largest in advanced DKD with type 1 diabetes, substantial in advanced DKD with type 2 diabetes, and moderate to weak, but quite comparable, across the three cohorts with earlier disease.
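The weighted Cohen's kappa reported for the type 1 versus type 2 comparison (Fig. 6a) rewards exact agreement and partially credits near-misses between ordered categories. The text does not say how association strengths were categorized or which weights were used, so the sketch below assumes ordinal bins of −log10 P and linear disagreement weights; the bin edges, values, and helper names are hypothetical.

```python
def to_ordinal(values, cutpoints):
    """Map each value to the number of cutpoints it reaches (0 .. len(cutpoints))."""
    return [sum(v >= c for c in cutpoints) for v in values]


def percent_agreement(r1, r2):
    """Share of items placed in the same category by both ratings."""
    return sum(a == b for a, b in zip(r1, r2)) / len(r1)


def weighted_kappa(r1, r2, n_cat, quadratic=False):
    """Cohen's weighted kappa for two ordinal ratings coded 0..n_cat-1, using
    disagreement weights |i - j| / (n_cat - 1) (linear) or their squares."""
    n = len(r1)
    obs = [[0] * n_cat for _ in range(n_cat)]
    for a, b in zip(r1, r2):
        obs[a][b] += 1
    row = [sum(obs[i]) for i in range(n_cat)]
    col = [sum(obs[i][j] for i in range(n_cat)) for j in range(n_cat)]

    def w(i, j):
        d = abs(i - j) / (n_cat - 1)
        return d * d if quadratic else d

    observed = sum(w(i, j) * obs[i][j] for i in range(n_cat) for j in range(n_cat)) / n
    expected = sum(w(i, j) * row[i] * col[j]
                   for i in range(n_cat) for j in range(n_cat)) / (n * n)
    return 1 - observed / expected


# Hypothetical -log10(P) values for the same proteins in the T1D and T2D cohorts.
t1d = [25.0, 12.0, 3.5, 1.0, 0.2, 8.0]
t2d = [18.0, 9.0, 2.5, 2.0, 0.1, 4.0]
cuts = [1.3, 5.0, 15.0]  # hypothetical bin edges (1.3 corresponds to P of about 0.05)
r1, r2 = to_ordinal(t1d, cuts), to_ordinal(t2d, cuts)
print("agreement:", round(percent_agreement(r1, r2), 2))
print("weighted kappa:", round(weighted_kappa(r1, r2, n_cat=len(cuts) + 1), 2))
```

Quadratic weights, a common alternative, penalize large disagreements more heavily; which variant underlies κ_w = 0.67 would need to be checked against the Methods.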
In the crude analyses of intermediate kidney outcomes (GFR slope and the composite binary outcome) in combined cohorts (the two advanced DKD cohorts combined, and the three early-to-moderate DKD cohorts combined), the results were consistent with those observed within individual cohorts. The findings in the combined advanced DKD cohorts were similar and remained significant for all top proteins (C2, C5a, CL-K1, C6, CFH, C7) after adjustment for clinical variables, including albuminuria. In the combined early-to-moderate DKD cohorts, the findings remained significant for five of the top proteins (all but CL-K1) after adjustment for clinical variables such as age, sex, HbA1c, GFR, and diabetes type. Three of these five proteins (C2, C5a, C7) remained significant in the three early-to-moderate cohorts after further adjustment for albuminuria (Fig. 6c, d and Supplementary Data 6).

Complement proteins in urine, circulation, and kidney tissue

To gain insight into the source of these proteins, we evaluated Complement in paired plasma and urine specimens from a subset of 97 subjects with advanced DKD and type 1 diabetes (for clinical characteristics, see Supplementary Data 7). The association of the GFR slope with proteins in the urine was much stronger than with the corresponding proteins in the circulation (Fig. 7a, c and Supplementary Data 5). Interestingly, complement decay-accelerating factor (DAF), a previously reported KRIS member^17, was also the top plasma protein correlated with the GFR slope in our study ("proof of concept", Fig. 7a, b). Notably, even the strongest associations of the circulating proteins were much weaker than the urinary associations.
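The GFR slope used as the continuous outcome, and its correlation with a baseline protein level, can be illustrated with a per-subject least-squares slope of serial eGFR against time. This is an assumption for illustration: the text does not specify how slopes were estimated (mixed-effects models are a common alternative), and the P-value below uses the Fisher z approximation rather than an exact t test. The composite-outcome helper encodes the "30% or greater decline in GFR and/or kidney failure" definition given for the early-to-moderate cohorts. All data and names are hypothetical.

```python
from math import atanh, erfc, sqrt


def ols_slope(x, y):
    """Least-squares slope of y on x (e.g., eGFR in mL/min/1.73 m² per year)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_p_fisher_z(r, n):
    """Approximate two-sided P for H0: rho = 0 via Fisher's z (needs |r| < 1, n > 3)."""
    z = atanh(r) * sqrt(n - 3)
    return erfc(abs(z) / sqrt(2))


def composite_outcome(baseline_gfr, last_gfr, kidney_failure):
    """True if GFR fell by 30% or more from baseline, or kidney failure occurred."""
    return kidney_failure or (baseline_gfr - last_gfr) / baseline_gfr >= 0.30


# Hypothetical serial eGFR (years, values) per subject and baseline urinary C2 levels.
visits = {
    "s1": ([0, 1, 2, 3], [62, 58, 55, 50]),
    "s2": ([0, 1, 2, 3], [45, 38, 30, 24]),
    "s3": ([0, 1, 2, 3], [70, 69, 69, 68]),
    "s4": ([0, 1, 2, 3], [55, 50, 47, 41]),
    "s5": ([0, 1, 2, 3], [48, 44, 37, 33]),
}
c2 = {"s1": 1.2, "s2": 3.1, "s3": 0.6, "s4": 1.9, "s5": 2.4}

ids = sorted(visits)
slopes = [ols_slope(*visits[s]) for s in ids]
r = pearson_r([c2[s] for s in ids], slopes)
composite = [composite_outcome(visits[s][1][0], visits[s][1][-1], kidney_failure=False)
             for s in ids]
print("slopes:", [round(s, 1) for s in slopes])
print(f"r = {r:.2f}, approx P = {correlation_p_fisher_z(r, len(ids)):.3f}")
print("composite outcome:", composite)
```

The same correlation would be computed separately for the urinary and plasma measurement of each protein to obtain the paired needles of Fig. 7a.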
Further support for the hypothesis that kidney tissue, rather than protein leakage through the glomerular basement membrane, is the source of the increased urinary concentrations of Complement proteins comes from the absence of a correlation between urinary Complement protein levels and their molecular weight (P = 0.54, Supplementary Fig. 3); size-selective glomerular leakage would be expected to favor smaller proteins.

Fig. 7. Complement proteomes and diabetic kidney disease progression across three matrices. a The two-needle plot of the correlation strengths of subject- and timepoint-level pairs of urinary and circulating Complement proteins with a prospective GFR slope in the subset of the advanced DKD cohort with type 1 diabetes (n = 97). Proteins are ordered on the x-axis by matrix, UniProt identifier (ID), and protein name. b The volcano plot, in which ratios of mean values of Complement proteins in the kidney tissue of DKD cases to controls (n = 31, independent cross-sectional study group; see Supplementary Data 8 for clinical characteristics) are plotted against the strengths of the associations (P-values transformed to their base-10 logarithm). The gray dashed lines indicate the false discovery rate (top) and the nominal (bottom) significance thresholds. c The box plots show in parallel the relative concentrations of the top 5% Complement proteins in urine and plasma at baseline in subjects who developed kidney failure compared with those who did not within 10 years of follow-up in the advanced DKD subset (n = 97). d The box plots show kidney tissue protein expression of the top 5% Complement proteins in controls (n = 9) and in subjects with advanced diabetic kidney disease (n = 22). For all box plots, the horizontal center line within each box represents the median, the top and bottom of each box indicate the 75th and 25th percentiles, and the whiskers indicate the range. Data presented as dots beyond the whiskers are outliers.
Association strengths that did not reach significance (α = 0.05) are not shown. All P-values are two-sided. GFR, estimated creatinine-based glomerular filtration rate; DKD, diabetic kidney disease; RFU, relative fluorescence units. Source data are provided as a Source Data file.

To obtain more direct evidence supporting this hypothesis, we compared Complement proteomes in kidney tissue in an independent case-control study group (clinical characteristics of the 31 subjects in this study are provided in Supplementary Data 8). Sixteen of the 110 proteins were differentially expressed between DKD cases and controls. These included C2, alternative pathway proteins (CFH, CFD, CFP), and terminal complex proteins (C7 and C9). C5a did not differ between the groups, but intact C5 did. Another anaphylatoxin, C3a, was increased in diabetic kidneys, with the highest subject-level heterogeneity (dispersion value: 140%). Interestingly, expression of the lectin pathway members, including CL-K1 and others, did not differ (Fig. 7b, d and Supplementary Data 5). Analyses of the associations between tissue expression of the top Complement proteins and histological indices of kidney injury revealed that C7 expression in kidney tissue was strongly correlated with interstitial fibrosis and immune cell infiltrates. In contrast, C5, C3, and their derived anaphylatoxins correlated markedly with intimal fibrosis (Supplementary Data 9). In kidney tissue analyses by single-cell/single-nucleus RNA sequencing (sc/snRNAseq), the CFH and C7 genes were highly abundant in glomerular parietal epithelial cells and collagen-expressing interstitial fibroblasts, respectively (Fig. 8a). The other four genes had low overall expression. Nevertheless, expression in diabetic kidney was increased in resident cells for C5, C6, and C7, and in infiltrating cells for COLEC11 (encoding the CL-K1 protein) (Fig. 8b).
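The case-control tissue comparison summarized in the Fig. 7b volcano plot combines a case-to-control ratio of means with a false discovery rate threshold. The text does not name the FDR procedure or the per-protein test, so the sketch below assumes the Benjamini-Hochberg step-up rule and takes P-values as given; the protein values and helper names are hypothetical.

```python
from math import log2


def ratio_of_means(cases, controls):
    """Mean expression in DKD cases divided by mean expression in controls."""
    return (sum(cases) / len(cases)) / (sum(controls) / len(controls))


def benjamini_hochberg(p_values, q=0.05):
    """Return booleans: True where H0 is rejected at FDR level q.
    Step-up rule: find the largest rank k with p_(k) <= k / m * q, reject ranks 1..k."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            k_max = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        rejected[idx] = rank <= k_max
    return rejected


# Hypothetical per-protein tissue results: (case values, control values, P-value).
proteins = {
    "C7": ([4.1, 5.0, 4.6], [2.0, 2.2, 1.9], 0.001),
    "CFH": ([3.3, 3.6, 3.1], [2.4, 2.6, 2.5], 0.008),
    "CL-K1": ([1.1, 1.0, 1.2], [1.0, 1.1, 1.1], 0.60),
}
names = list(proteins)
calls = benjamini_hochberg([proteins[name][2] for name in names])
for name, significant in zip(names, calls):
    cases, controls, p = proteins[name]
    print(f"{name}: log2 ratio = {log2(ratio_of_means(cases, controls)):.2f}, "
          f"P = {p}, FDR-significant = {significant}")
```

Because the rule is step-up, a P-value that misses its own rank threshold is still rejected if a larger-ranked P-value meets its threshold.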
In addition, spatial transcriptomics analysis revealed distinct spatial expression patterns for these genes in control and DKD samples, further highlighting differential expression across tissue regions (Fig. 8c and Supplementary Fig. 4a, b). Integrated evidence from the current study of protein profiles in urine, plasma, and tissue reinforces the notion that the kidney is the source of the increased urinary concentrations of Complement proteins associated with DKD progression.

Fig. 8. Sc/snRNA-seq expression of genes corresponding to the top 5% Complement proteins in kidney tissue. a Overall expression and proportions on a uniform scale across the genes. b Gene-specific expression for controls (n = 17) and DKD cases (n = 8). Expression of genes corresponding to the top 5% Complement proteins from our study is shown as bubble plots for all kidney tissue examined (a) and by case status, comparing diabetic kidney tissue (red) with control tissue (purple) (b). Of note, CL-K1 protein is a product of the COLEC11 gene. Dot size in both panels represents the percentage of cells expressing the gene of interest, and color intensity corresponds to the expression level. c Spatial transcriptomics of genes corresponding to Complement C2 and C7 proteins. This panel illustrates the spatial distribution of 2 genes (C2 and C7) in human kidney tissue from a control sample and a DKD sample. The data were obtained using the Visium spatial transcriptomics platform. Each panel represents the log-transformed expression counts for one gene, visualized across the tissue sample. Color gradients indicate levels of gene expression, with blue representing lower and red representing higher expression. Scale bar = 250 μm. Source data are provided as a Source Data file.
DKD, diabetic kidney disease; RBC, red blood cells; Baso/Mast, basophils or mast cells; Mac, macrophages; CD16_Mono, CD16+ monocytes; CD14_Mono, CD14+ monocytes; pDC, plasmacytoid dendritic cells; cDC, classical dendritic cells; NK, natural killer cells; CD8T, CD8+ T lymphocytes; CD4T, CD4+ T lymphocytes; B_memory, memory B lymphocytes; B_Naiive, naïve B lymphocytes; IC_B, type beta intercalated cells; IC_A, type alpha intercalated cells; PC, principal cells of the collecting duct; CNT, connecting tubule cells; DCT, distal convoluted tubule cells; M_TAL, medullary thick ascending limb of the loop of Henle; C_TAL, cortical thick ascending limb of the loop of Henle; DLOH, thin descending limb of the loop of Henle; Injured_PT, injured proximal tubule cells; PT_S3, proximal tubule segment 3; PT_S2, proximal tubule segment 2; PT_S1, proximal tubule segment 1; Podo, podocytes; PEC, parietal epithelial cells; Mes, mesangial cells; GS_Stromal, glomerulosclerosis-specific stromal cells; Myofib, myofibroblasts; Fibroblast_2, fibroblasts expressing insulin-like growth factor-binding protein 7, vimentin, and beta-2-microglobulin; Fibroblast_1, fibroblasts expressing collagen type I alpha 1 and 2 chains; Endo_lymphatic, endothelial cells of lymphatic vessels; Endo_peritubular, endothelial cells of peritubular vessels; Endo_GC, endothelial cells of the glomerular capillary tuft.

Urinary Complement proteins as potential biomarkers of kidney failure risk

Adding the top Complement proteins to the key clinical covariates (age, sex, diabetes type, hemoglobin A1c, GFR, and albuminuria) significantly improved model discrimination (concordance index from c = 0.810 to c = 0.836; P = 0.021). In addition to the associations with kidney failure risk over 10 years, we examined these associations for outcomes over intermediate follow-up periods. Fifty-three subjects (17%) with advanced DKD developed kidney failure within 3 years, and 79 subjects (25%) within 5 years of follow-up.
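The concordance index (c) quoted above measures how often, among usable pairs of subjects, the subject with the higher predicted risk develops kidney failure first. A minimal Harrell's C sketch follows; it assumes risk scores are already available (for example, linear predictors from a fitted Cox model), skips pairs tied on follow-up time, and counts tied risk scores as one-half. The text does not say which c-index variant or which test produced P = 0.021, so neither is reproduced here; the data and names are hypothetical.

```python
def harrell_c(times, events, risk):
    """Harrell's concordance index for right-censored data.
    A pair (i, j) is usable when subject i had the event and times[i] < times[j];
    it is concordant when risk[i] > risk[j]. Tied risk scores count 0.5."""
    concordant = 0.0
    usable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # the earlier time in a usable pair must be an observed event
        for j in range(len(times)):
            if times[i] < times[j]:
                usable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / usable


# Hypothetical follow-up (years), kidney failure flags, and two sets of risk scores.
years = [1.5, 3.0, 4.2, 6.0, 8.5, 10.0]
failed = [1, 1, 0, 1, 0, 0]
clinical_only = [2.1, 1.4, 1.9, 1.0, 0.7, 0.9]
clinical_plus_complement = [2.6, 1.8, 1.5, 1.2, 0.6, 0.8]
print("c (clinical):", round(harrell_c(years, failed, clinical_only), 3))
print("c (clinical + Complement):",
      round(harrell_c(years, failed, clinical_plus_complement), 3))
```

The double loop is O(n²), which is unproblematic for cohorts of a few hundred subjects.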
With shorter follow-up, the associations were weaker and the confidence intervals around the effect estimates were wider; nevertheless, they remained significant for each comparison (Supplementary Data 10). This evidence supports Complement proteins as potential prognostic biomarkers of kidney failure risk and DKD progression (as per the FDA-NIH nomenclature^18). The proportional hazards test indicated that the effects of the variables on the hazard were constant over time (P > 0.05). We also explored the relationships among the Complement proteins. All top proteins except C7 were correlated with one another. In the global evaluation, the alternative pathway Complement factors and the coagulation proteins were strongly correlated with one another (Supplementary Data 11).
Next, we evaluated whether clustering informed by Complement would differentiate subjects by future kidney risk. Based on the clinical legacy measures (albuminuria and GFR) alone, the cumulative incidence of kidney failure among subjects with type 1 diabetes was 37% overall (Fig. 9a). In striking contrast, unsupervised clustering based on the top 5% Complement proteins yielded three clusters of subjects with markedly different proportions of kidney outcomes (Fig. 9b). Within 5 years of follow-up, 78% of the advanced DKD subjects with type 1 diabetes in Cluster 3 progressed to kidney failure, compared with only 46% and 13% in Clusters 1 and 2, respectively (Fig. 9c). Overall risks were lower in type 2 diabetes, but the cluster-based differences were similar (Fig. 9d–f). Moreover, in the early-to-moderate DKD subjects, clusters informed by the top proteins also markedly discriminated the odds of prospective kidney outcomes (Supplementary Fig. 5a–d).
Fig. 9. Complement protein-informed clustering and prospective kidney failure in the advanced DKD cohorts with type 1 and type 2 diabetes.
a Kaplan-Meier curve of the cumulative incidence of kidney failure within 5 years of follow-up among 189 subjects with type 1 diabetes and advanced diabetic kidney disease. b Unsupervised hierarchical clustering built on the top urinary Complement proteins. Subjects with type 1 diabetes were clustered on Complement protein levels without regard to the outcome. Vertical colored bars show cluster groups and prospective case status. c Proportions of subjects with type 1 diabetes who developed kidney failure within 5 years, by cluster. Solid lines represent Kaplan-Meier curves. The log-rank test compares the cumulative incidence of kidney failure over time across clusters. d Kaplan-Meier curve of the cumulative incidence of kidney failure within 5 years of follow-up among 115 subjects with type 2 diabetes and advanced diabetic kidney disease. e Unsupervised hierarchical clustering built on the top urinary Complement proteins. Subjects with type 2 diabetes were clustered on Complement protein levels without regard to the outcome. Vertical colored bars show cluster groups and prospective case status. f Proportions of subjects with type 2 diabetes who developed kidney failure within 5 years, by cluster. Solid lines represent Kaplan-Meier curves. The log-rank test compares the cumulative incidence of kidney failure over time across clusters. P-values are two-sided. Source data are provided as a Source Data file.
In addition to the biostatistical models, we tested the prognostic accuracy of six machine learning approaches (Fig. 10). The logistic model with the top protein, C2, had robust prognostic accuracy (c = 0.838), which improved further when the top 5% proteins were examined together (c = 0.854) or combined into a principal component (c = 0.853). Within the decision tree family, the gradient boosting method performed best (c = 0.809 with the top 5% proteins; c = 0.823 with 100% of the proteins).
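The per-cluster risks in Fig. 9 are Kaplan-Meier cumulative incidences. Below is a minimal standard-library sketch, assuming follow-up in years and a 0/1 kidney failure indicator (the function name and the example data are hypothetical). Applied separately to each cluster, it returns that cluster's cumulative incidence at a chosen horizon such as 5 years; the log-rank comparison is not shown.

```python
def km_cumulative_incidence(times, events, horizon):
    """Kaplan-Meier cumulative incidence, 1 - S(horizon).

    times: follow-up (years); events: 1 = kidney failure, 0 = censored.
    """
    survival = 1.0
    # Distinct event times up to the horizon, in increasing order.
    event_times = sorted({t for t, e in zip(times, events) if e and t <= horizon})
    for t in event_times:
        at_risk = sum(1 for u in times if u >= t)  # still under observation just before t
        failures = sum(1 for u, e in zip(times, events) if u == t and e)
        survival *= 1.0 - failures / at_risk
    return 1.0 - survival


# Illustrative data for one hypothetical cluster (not study data).
times = [1.0, 2.0, 3.0, 4.0, 5.0]
events = [1, 0, 1, 0, 1]
print(round(km_cumulative_incidence(times, events, horizon=4.0), 4))  # 0.4667
```

Unlike a raw proportion, the estimate uses censored subjects for as long as they were observed, which matters when follow-up lengths differ between clusters.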
Overall, the classical biostatistical methods performed best.
Fig. 10. Biostatistical and machine learning models to evaluate the prognostic role of the Complement proteome for kidney failure in the advanced DKD cohorts (n = 304).
Penalized, penalized regression; Dec tree, decision tree; Deep, deep learning; LR, logistic regression; PCA, principal component; EN, elastic net; RD, ridge; LS, lasso; RF, random forest; GBM, generalized boosting method; NNET, neural network. Source data are provided as a Source Data file.
Because the Complement measurements were performed with high-throughput proteomics in advanced DKD and with targeted immunoassays in early-to-moderate DKD, we validated the measurements of the top proteins between the two methods. Correlations between aptamer- and antibody-based measurements were excellent for C2, C5a, C6, CFH, and C7 (Pearson r = 0.80 to 0.98; P from < 10^−8 to < 10^−27), whereas the correlation for CL-K1 was only modest (r = 0.51). For details, see Supplementary Fig. 6a, b and Methods. We expanded the targeted measurements in the advanced DKD cohorts for two selected proteins, C5a and CFH; their associations with kidney outcomes were again highly similar between the two methods (Supplementary Fig. 7).
Discussion
To address the need to understand the mechanisms that underlie DKD development, this study comprehensively examined the association between urinary Complement and DKD progression. The top urinary Complement proteins most strongly associated with the development of kidney failure within 10 years, in a two-cohort study of subjects with advanced DKD at baseline, were C2, C5a, CL-K1, C6, CFH, and C7. These proteins point to the engagement of multiple pathways of the Complement system.
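The method comparison above rests on Pearson correlations between paired aptamer (RFU) and immunoassay measurements. The text does not say whether values were log-transformed; the sketch below assumes the log scale, a common choice for right-skewed protein data, and the helper names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def method_agreement(aptamer_rfu, immunoassay_conc):
    """Correlate paired measurements on the log scale (an assumption here)."""
    log_aptamer = [math.log(v) for v in aptamer_rfu]
    log_immunoassay = [math.log(v) for v in immunoassay_conc]
    return pearson_r(log_aptamer, log_immunoassay)
```

With real data, the paired values from the validation subset would be passed to `method_agreement`, one call per protein. Note that correlation measures linear association, not agreement in absolute units, which is appropriate here because RFU and concentrations are on different scales.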
Among subjects with high levels of C2, our top protein, 4 out of 5 developed kidney failure, compared with only 1 out of 5 among those with low levels. In crude analyses, kidney failure risk was increased 3.5-fold or more for the top proteins, reflecting substantial strengths of association. The top proteins remained independently associated with kidney failure after comprehensive adjustment for clinical covariates. The associations of increased urinary Complement protein concentrations were similar in type 1 and type 2 diabetes but much stronger in advanced than in early-to-moderate DKD; the latter was examined for the top proteins in a three-cohort study of subjects with both diabetes types followed for up to 5 years. This study also provided evidence that kidney tissue is the most likely source of the excess urinary levels of the top Complement proteins. Below, we discuss the study findings in relation to the existing literature, implications for the mechanisms of DKD development, possible therapeutic targets, and the potential development of prognostic tests to identify subjects with diabetes at risk of DKD progression.
The top proteins associated with prospective kidney failure point to multiple specific components of the Complement system. In DKD, the roles of the lectin pathway and anaphylatoxins are well established^14,19–21. Our study confirms these findings and extends them by implicating previously unreported roles for C2, the alternative pathway, and components of the MAC. Urinary and kidney tissue data in our study concordantly indicate a role for C2 in DKD tissue. Most DKD studies to date have not measured C2, so little is known about it; one mass spectrometry study reported an association of C2 with kidney failure^20. The anaphylatoxins C5a (our second-ranked protein) and C3a have potent inflammatory properties.
Previous small-scale studies reported elevated C5a and C3a levels in urine, or increased C5a and C3a immunostaining in kidney tissue, in DKD^21–24, and our single-cell analyses similarly showed differential C5 increases in proximal tubules. Subject-level C3a heterogeneity in our tissue data aligns with an earlier C3 immunostaining study^25, and animal studies have demonstrated metabolic and renal benefits of C5a and C3a inhibition^15,19,23. The third-ranked protein, CL-K1, a newly described lectin pathway member, has been strongly implicated in tubulointerstitial injury^26. However, unlike other Complement components, the lectin members (CL-K1 and mannose-binding lectin (MBL)) may not originate from the kidney. Concordantly, CL-K1 transcripts were present only in infiltrating cells. These observations align with weak MBL immunostaining in the kidney^27 and with evidence for circulating MBL^28. CFH, an alternative pathway member, ranked fifth, with the other members CFB and CFP nearly as strongly associated. The alternative pathway has likewise received little attention so far; recent studies reported associations of select members (CFH, CFB) with kidney failure in type 2 diabetes^20,29,30. Among our top proteins, CFH had the most increased expression in the diabetic kidney, and CFH RNA was highly abundant in our and other studies^25,31. In contrast, genetic or acquired CFH deficiencies underlie the biology of other immune-mediated glomerulonephritides^32,33. A better understanding of these seemingly divergent relationships is needed. All three upstream pathways ultimately lead to the formation of the MAC. Urinary C6 and C7 (the fourth- and sixth-ranked proteins, respectively) were more strongly associated with kidney outcomes than other MAC components. Levels of C6, C7, or the MAC have been associated with DKD progression in focused biomarker studies^20,27,29.
Our data show especially abundant C7 in kidney fibroblasts, consistent with other transcriptomics studies^25,34. C6 contributes to tubulointerstitial injury in animal models^35,36, and our study likewise localized C6 to the proximal tubular region. Among other Complement components, we observed increases in a considerable number of closely related coagulation and kallikrein-kinin proteins. Cross-talk between the Complement and coagulation systems has been implicated previously^15,37.
Our study offers exploratory insights into the relationships of Complement with other molecular indices. Urinary Complement correlated substantially with an inflammatory biomarker, MCP1. Indeed, anaphylatoxins, collectins, and the MAC are recognized for their pro-inflammatory properties^19,38. Correlations with a tubular biomarker (LFABP) were marked and aligned with our expression data and with the literature pointing to tubulointerstitial injury^19,35,36. Correlations of Complement with IgA and IgM, biomarkers of kidney filtration barrier damage^39,40 that are also implicated in other Complement diseases^33,38,41, were strong. TNFR1 is an established systemic biomarker of DKD progression^5,22,23, and a circulating TNFR-enriched signature, KRIS, was recently reported^17. The chord diagram shows that only select circulating KRIS proteins, including DAF (a Complement regulatory protein), correlated with urinary Complement. Only limited evidence linking the two exists in the literature^42–44.
Because kidney failure is rare in earlier DKD^45,46, we used clinically recognized surrogate endpoints: continuous GFR slope and a binary 30% or greater decline in GFR. GFR slope is an attractive outcome because it translates across disease stages, accommodates varying lengths of observation, and maximizes study power. Except for CL-K1, the top proteins were associated with accelerated early kidney function loss.
These associations remained significant after adjustment for a limited number of clinical covariates. Effect sizes were smaller than in advanced DKD but still remarkably concordant across the three independent cohorts, which differed in study design, recruitment sites, and racial/ethnic background. This observation partially aligns with the previously proposed concept that Complement is more relevant in advanced disease^40.
Complement associations in our study were remarkably concordant between type 1 and type 2 diabetes. This was evident in comparisons of the Complement proteomes in advanced DKD as well as of the top proteins in the early-to-moderate DKD cohorts. Most existing Complement research was conducted in type 2 diabetes, except for circulating lectin pathway biomarkers, which were largely investigated in type 1 diabetes^14,15,47. It is becoming evident, from this study and from reports by others^17,48,49, that the proteomic phenotypes of DKD often overlap between type 1 and type 2 diabetes. Our study design prioritized DKD stage over diabetes type; thus, Complement proteins specific to type 1 or type 2 diabetes may not have been identified, and further evaluation of Complement in both diabetes types is recommended. Of note, most interventional studies in DKD have focused on just one diabetes type. Moreover, although there are sex differences in prospective kidney risks in DKD^50, as well as in immune and Complement responses^51,52, we did not observe interactions between sex and Complement proteins.
Although the presence of Complement in the diabetic kidney was previously attributed to non-specific deposition of circulating proteins^14,41, our integrated evidence across three matrices strengthens the notion that Complement is likely produced in the kidney. Urinary proteins were far more strongly associated with kidney outcomes than circulating ones.
Although urinary Complement proteins did correlate with albuminuria, the strength of their associations did not track with protein molecular weight, which argues against the Complement increase being the result of simple leakage. Moreover, our tissue proteomics and single-cell and spatial transcriptomics data indicated an increased capacity of the kidney to produce select Complement molecules, as previously implied for some of them^25,31,34.
Our study used a discovery approach to identify pathway enrichment in two advanced DKD cohorts, followed by a focus on the top pathway. The associations were remarkably strong in analyses adjusted for multiple testing and clinical covariates. Internal validation included examination of concordant signals between type 1 and type 2 diabetes, across more than one kidney outcome, and across more than one computational approach, further supported by analytical method validation. Kidney tissue data, comprising the Complement proteome and single-cell and spatial transcriptomics for the top signals together with other molecular indices of the disease, provide biological insights and a molecular validation of our findings. Subsequently, we evaluated whether the Complement associations extend^53 to earlier disease stages in a three-cohort study. The use of well-characterized cohorts in this translational project is a considerable study strength. Such a comprehensive evaluation in biofluids such as serum or urine has only recently been enabled by innovative proteomics. Our study is the first to evaluate the urinary proteome with high-throughput aptamer proteomics in any prospective study of a chronic disease, and it offers the highest-resolution evaluation of the Complement proteome performed to date. Proteins are close to the disease phenotype and, as such, often serve as biomarkers and drug targets; albuminuria is currently the only protein used in clinical care.
The SomaScan platform uses aptamers, which allow high throughput, high sensitivity, a broad dynamic range, and the ability to distinguish intact proteins from their cleavage products (for example, intact C5 from C5a); together, these features allow it to outperform other protein techniques^54,55. Our study expands on earlier targeted biomarker and mass spectrometry approaches to Complement^20–22,29,30,47. It did not confirm select proteins reported elsewhere, such as urinary DAF or CD59, likely because of differences in technology or an inability to measure a glycated form^20,30,47. Measurements in advanced DKD were performed with aptamer proteomics, whereas those in early-to-moderate DKD used targeted assays; our extensive orthogonal assay validation showed excellent agreement between the two for all top proteins except CL-K1. We supported these high-throughput data with advanced computational analyses. Although machine learning and artificial intelligence-based approaches are becoming important partners in big data analyses^56,57, our biostatistical models performed best.
Notably, Complement therapies that target the top proteins reported in our study are available for other kidney diseases. These include therapies targeting C5, C5a, or C5aR1 (ranging from preclinical development to FDA-approved therapies)^15,32, alternative pathway factors (up to phase III), and C6 (preclinical). Our study may spark interest in developing DKD therapies targeting Complement. The therapeutic landscape of DKD differs substantially from that of diseases targeted with Complement inhibitors^3,4,14,45. DKD therapies are applied to large segments of the population, whereas Complement-mediated glomerulonephritides are classified as rare diseases and have a more acute course^33,58. Complement inhibitors often have narrow safety margins and substantial costs^15.
Thus, a potential DKD therapy will likely focus on specific subpopulations, highlighting the need for biomarker guidance. The knowledge gained in our study could substantially inform such precision medicine approaches. Our study offers an evaluation across kidney disease stages and diabetes types, in the context of well-established clinical outcomes^3,59. It points to urinary Complement proteins as biomarkers of overall kidney risk, particularly in advanced DKD, where the associations were independent of clinical covariates and improved prognostic discrimination. Our cluster analyses across advanced and early stages, an approach also used in another Complement disease^59, show differential risks beyond the classical stratification by DKD stage.
Our work has the following limitations. Our findings may not be generalizable to DKD without albuminuria, to non-diabetic kidney disease, or to other diabetic complications such as cardiovascular mortality (for details on study generalizability, see Supplementary Data 12). We did not perform repeated Complement measurements over time; however, we do offer insights across kidney disease stages. We evaluated Complement associations with the ultimate outcome (prospective kidney failure) in the advanced DKD cohorts but could not do so in the early-to-moderate DKD cohorts, because kidney failure was rare in those subjects. We identified Complement proteins as risk factors; other studies may be better powered to identify protective patterns (downregulated proteins)^49,60. Our study included cohorts of Caucasian subjects and one cohort of African American subjects^61,62; future studies in other populations at increased kidney risk, such as Pima Indians^63, are needed to complete the picture. Our proteomics technology did not allow us to separate glomerular and tubular compartments, and we did not perform tissue immunostaining.
To gain insights into the cellular origin, we used high-resolution single-cell and spatial transcriptomics instead. Lastly, although we cannot formally distinguish whether our results reflect Complement activation or accelerated turnover, our data provide strong evidence of Complement engagement in disease progression.
In summary, our findings provide robust evidence of a role for the Complement proteome in progressive diabetic kidney disease in type 1 and type 2 diabetes. This role is particularly pronounced in advanced disease and likely reflects local kidney involvement. Our study provides important biological insights and solid biomarker guidance to inform drug development strategies targeting Complement in diabetic kidney disease.
Methods
Study oversight
The current study adhered to all relevant ethical regulations, and all protocols were approved by the Committee on Human Studies at Joslin Diabetes Center. The institutional review board at each site approved the protocols for the respective parent studies: the Joslin Diabetes Center Committee on Human Studies for the Joslin Kidney Studies, the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK)-appointed data and safety monitoring board for the Preventing Early Renal Loss in Diabetes (PERL) study, the Wake Forest University School of Medicine Institutional Review Board for the Diabetes Heart Studies, and the University of Pennsylvania Institutional Review Board for the kidney tissue study. All patients provided written informed consent before enrollment.
Advanced DKD – two-cohort study
Study population
The advanced DKD study included one type 1 diabetes cohort of 189 subjects (37% female) and one type 2 diabetes cohort of 115 subjects (36% female) from the Joslin Kidney Study, a prospective, observational investigation of the natural history and molecular determinants of DKD progression^2,17,49,64,65.
Advanced DKD was defined as impaired kidney function (glomerular filtration rate (GFR) category G3: 30–59 or G4: 15–29 ml/min/1.73 m^2) and moderately or severely increased albuminuria (albuminuria category A2: 30–299 or A3: ≥ 300 mg/g creatinine) at baseline. Subjects were followed for 7–15 years.
Early/Moderate DKD – three-cohort study
Study population
The type 1 diabetes cohort comprised 207 subjects (26% female) from the PERL trial with A2 or A3 albuminuria and GFR of 40 to 99.9 ml/min/1.73 m^2; participants were followed for 3 years and 2 months^66. The type 2 diabetes cohort comprised 322 subjects (33% female) from the Joslin Kidney Study with normal kidney function (GFR category G1: ≥ 90 or G2: 60–89 ml/min/1.73 m^2) and A2 or A3 albuminuria; participants were followed for 5 years. The African American type 2 diabetes cohort comprised 123 subjects (52% female) from the Diabetes Heart Studies^61,62 with normal kidney function (93% of subjects in G1 or G2) and A2 or A3 albuminuria; participants were followed for 5 years.
Study outcomes
Incident kidney failure – primary outcome
The primary outcome in the two advanced DKD cohorts, kidney failure, was ascertained from national registries. The United States Renal Data System (USRDS) maintains a roster of patients receiving kidney replacement therapy, including the dates of therapy initiation. The National Death Index (NDI) is a database of dates and causes of death. Incident kidney failure was ascertained from the USRDS for subjects who remained alive, or from the NDI if kidney failure was listed as a cause of death. Subjects were censored at death unrelated to kidney failure, at the date of the last GFR measurement, or at 10 years.
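The censoring rule above can be expressed as a small function that returns a (time, event) pair for a Cox or Kaplan-Meier analysis. This is a sketch under one reading of the text: kidney failure ascertained from USRDS or NDI within 10 years counts as an event, and all other subjects are censored at the earliest of unrelated death, last GFR measurement, or 10 years. The function and argument names are hypothetical.

```python
from typing import Optional, Tuple


def time_to_event(
    kidney_failure_years: Optional[float],
    unrelated_death_years: Optional[float],
    last_gfr_years: float,
    horizon_years: float = 10.0,
) -> Tuple[float, int]:
    """Return (time, event) with event = 1 for kidney failure, 0 if censored."""
    # Kidney failure from USRDS (alive) or NDI (listed cause of death).
    if kidney_failure_years is not None and kidney_failure_years <= horizon_years:
        return kidney_failure_years, 1
    # Otherwise censor at the earliest of unrelated death, last GFR, or the horizon.
    candidates = [t for t in (unrelated_death_years, last_gfr_years) if t is not None]
    return min(candidates + [horizon_years]), 0


print(time_to_event(None, 4.2, 6.0))  # (4.2, 0)
```

Setting `horizon_years=5.0` reproduces the 5-year censoring used later to harmonize the advanced DKD cohorts with the early-to-moderate cohorts.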
Because the primary outcome was rare in the early-to-moderate DKD cohorts, we used the following secondary outcomes to allow comparisons across four cohorts: continuous GFR slope and a binary composite of ≥ 30% GFR decline or incident kidney failure. The kidney endpoint definitions and strength of evidence used here are based on the scientific workshop co-sponsored by the National Kidney Foundation and the US Food and Drug Administration in 2020^45.
GFR slope – continuous outcome
GFR slope is recognized as a clinically valid surrogate outcome of DKD progression^45,67. Several studies of Joslin Kidney Study participants showed that the vast majority of subjects have linear or almost linear GFR trajectories over the course of DKD^2. In this study, the annual rate of kidney function decline (GFR slope, in ml/min/1.73 m^2 per year) was estimated from subject-specific trajectories fitted to serial GFR values, including the baseline value, calculated from serum creatinine with the CKD-EPI equation.
GFR-based binary outcomes
Kidney failure and 40% or greater decline in GFR were scarce in the early-to-moderate DKD cohorts. Thus, we used the composite outcome of 30% or greater decline in GFR or kidney failure. To harmonize the length of observation, the advanced DKD cohorts were censored at 5 years to align with the early-to-moderate DKD cohorts with type 2 diabetes. The maximum length of observation in the early-to-moderate DKD cohort with type 1 diabetes was 3 years and 2 months.
Complement proteome determinations
High-throughput proteomics
All specimens were stored at −80 °C until proteomics analysis. High-throughput proteomics profiling was performed at the Proteomics Core, Beth Israel Deaconess Medical Center in Boston, MA, using the SomaScan platform^17,54,55,68 (see Supplementary Data 13 for the list of proteins measured).
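A minimal sketch of the two secondary outcomes follows. It assumes GFR values are already computed (the CKD-EPI step is omitted), that the subject-specific slope is an ordinary least-squares fit, and that a single measurement at least 30% below baseline qualifies as the decline; many trials require a confirmed decline, and the text does not specify. Function names are hypothetical.

```python
def gfr_slope(years, gfr):
    """Ordinary least-squares slope of GFR on time (ml/min/1.73 m^2 per year)."""
    n = len(years)
    if n < 2:
        raise ValueError("need at least two GFR measurements")
    mean_t = sum(years) / n
    mean_g = sum(gfr) / n
    sxx = sum((t - mean_t) ** 2 for t in years)
    sxy = sum((t - mean_t) * (g - mean_g) for t, g in zip(years, gfr))
    return sxy / sxx


def composite_outcome(years, gfr, kidney_failure, horizon=5.0, threshold=0.30):
    """1 if kidney failure, or GFR fell >= threshold below baseline, within horizon.

    years[0] / gfr[0] is the baseline visit; kidney_failure is True if it
    occurred within the horizon.
    """
    baseline = gfr[0]
    declined = any(
        t <= horizon and (baseline - g) / baseline >= threshold
        for t, g in zip(years, gfr)
    )
    return int(declined or kidney_failure)


print(gfr_slope([0, 1, 2, 3], [60, 57, 54, 51]))  # -3.0
```

A negative slope indicates loss of kidney function; because it uses every available GFR value, it can be computed over follow-up of different lengths, which is the property the Discussion highlights.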
We used an aptamer platform for our proteomics determinations. Aptamers are single-stranded nucleic acid sequences that recognize folded, three-dimensional protein epitopes with high affinity and specificity, a property further enhanced in Slow Off-rate Modified Aptamers (SOMAmers). The platform converts each protein concentration into a corresponding amount of bound SOMAmer reagent, so that the readout is directly proportional to the amount of target protein in the original sample. Samples are incubated with aptamers to form aptamer-protein complexes, and subsequent washing steps remove non-specifically bound and unbound aptamers and proteins. The aptamers are then quantified by hybridization with probes complementary to the aptamer sequences (Agilent Technologies, Santa Clara, CA), and the readout is reported in relative fluorescence units^12,54.
Proteomics profiling in urine was performed with the Cells and Tissue Lysate 1.3 k kit (SomaLogic, Boulder, CO) according to the manufacturer's recommended protocol. Urine samples from Joslin Kidney Study subjects with advanced DKD were assayed in batches of 26 samples each. Samples were balanced across plates by prospective case status, to which laboratory personnel were blinded. Instead of the manufacturer's calibration controls, we created a custom in-house urine pool from 121 subjects whose baseline phenotype and composition reflected the advanced DKD cohorts used for proteomics. Five replicates of this pool were run in each batch and used for inter-run calibration. First, the data were normalized to remove hybridization variation within a run. Scaling was then performed per batch to remove intensity differences between runs. The plate scale factor, derived from the calibrator sample values, ranged from 0.9 to 1.32 (accepted range: 0.4–2.5).
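Plate scaling can be illustrated with a simplified sketch. It assumes, as an approximation rather than the vendor's exact algorithm, that each plate's scale factor is the median, across aptamers, of the ratio between a reference value for the pooled-urine calibrator and the median of that calibrator's replicates on the plate. The factor is then applied to every sample on the plate and checked against the accepted 0.4–2.5 range. All names are hypothetical.

```python
from statistics import median


def plate_scale_factor(calibrator_replicates, reference):
    """Median over aptamers of reference / on-plate calibrator median.

    calibrator_replicates: {aptamer: [RFU of each pooled-urine replicate on the plate]}
    reference: {aptamer: reference RFU for the same pooled urine}
    """
    ratios = [reference[apt] / median(rfus) for apt, rfus in calibrator_replicates.items()]
    return median(ratios)


def scale_plate(sample_rfu, factor, accepted=(0.4, 2.5)):
    """Apply the plate scale factor to every aptamer of every sample on the plate."""
    low, high = accepted
    if not low <= factor <= high:
        raise ValueError(f"plate scale factor {factor:.2f} outside accepted range {accepted}")
    return [{apt: rfu * factor for apt, rfu in sample.items()} for sample in sample_rfu]
```

In this study's design, the five in-house pool replicates per batch would play the role of `calibrator_replicates`; using a median over many aptamers makes the factor robust to a few aberrant signals.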
All subject-level and protein-level urine data passed the SOMAscan assay quality-control criteria and were fit for analysis. In addition, we included an internal urine control from four subjects with a comparable DKD phenotype, run on every other batch, which we used to determine coefficients of variation (CV). The distribution of inter-assay CVs of all Complement proteins measured in urine on the aptamer platform is shown in Supplementary Fig. 6a. To evaluate detectability, we determined background noise from the 21 buffer replicates measured on the array and defined the limit of detection as the mean buffer value plus 2 standard deviations. A protein was defined as having very good detectability if it was detected in ≥ 70% of samples from the two-cohort advanced DKD study; that is, it was detected in more than two tertiles of its distribution, allowing analyses per tertile change. A protein was defined as having good detectability if it was detected in ≥ 50% of samples; that is, it was detected in at least half of the study population, allowing analyses above versus below the median. Proteins that were not well detectable were analyzed after categorization by the detection threshold (Supplementary Data 5).
Proteomics profiling of the Complement proteins in plasma was performed with the Human Plasma SOMAscan 1.3 k kit (SomaLogic, Boulder, CO) according to the manufacturer's standardized protocol, as described elsewhere^17. Data were normalized according to the SOMAscan assay quality-control procedures described above for urine. In addition, median signal normalization was applied to account for sample-to-sample differences in total protein concentration and other systematic variation within a plate run.
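The detectability rules above reduce to a few lines of arithmetic. In this sketch, "detected" is assumed to mean strictly above the limit of detection, and the CV is the sample standard deviation over the mean, in percent; function names are hypothetical.

```python
from statistics import mean, stdev


def limit_of_detection(buffer_rfu):
    """Mean of buffer-only replicates plus 2 standard deviations."""
    return mean(buffer_rfu) + 2 * stdev(buffer_rfu)


def detectability(sample_rfu, lod):
    """Classify a protein by the fraction of samples above the limit of detection."""
    fraction = sum(1 for v in sample_rfu if v > lod) / len(sample_rfu)
    if fraction >= 0.70:
        return "very good"  # enough for tertile-based analyses
    if fraction >= 0.50:
        return "good"  # enough for above/below-median analyses
    return "not well detectable"  # analysed above/below the detection threshold


def cv_percent(replicates):
    """Coefficient of variation (%) of repeated measurements of one control."""
    return 100 * stdev(replicates) / mean(replicates)
```

The 70% cut-off is what makes tertile analyses possible: if at least 70% of values exceed the limit of detection, the upper two tertiles (about 67% of samples) consist entirely of detected values.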
Proteomics profiling in kidney tissue was performed with the Cells and Tissue Lysate 1.3 k kit (SomaLogic, Boulder, CO) according to the manufacturer's protocol. Kidney tissue acquisition and specimen processing for proteomics were described previously^68 (see also Supplementary Methods). Hybridization, normalization, and plate scaling were applied. A principal component score plot revealed one subject-level outlier, which was removed from the study; all other data were fit for analysis.
Complement proteome annotations
The portfolio of 110 Complement proteins was assembled from the Kyoto Encyclopedia of Genes and Genomes^69 (KEGG pathway hsa04610: Complement and coagulation cascades; hsa01002: peptidases and inhibitors, peptidase inhibitors, Family I4: serpin family), Reactome^70 (R-HSA-166658: Complement cascade; R-HSA-140877: Formation of Fibrin Clot; R-HSA-75205: Dissolution of Fibrin Clot), and UniProt^71 (family: peptidase S1 family, Kallikrein subfamily). We also included the Complement component C1q receptor (C1QR1) and Complement component 1q subcomponent-binding protein (C1QBP) in our final Complement roster. The molecular weight of each Complement protein was curated from the UniProt database.
Targeted measurements of Complement proteins
High-throughput aptamer proteomics, as performed in the advanced DKD cohorts, is an excellent discovery tool but an unlikely strategy for focused biomarker studies. Thus, we sought to orthogonally validate our proteomics measurements with targeted, antibody-based, single or low-multiplex assays^29,72,73. The MicroVue Complement Panel 2 was used to measure C2, and Panel 1 was used to quantify C5a desArg (Cat. No. A916 and A900, respectively; Quidel, San Diego, CA). C5a desArg (C5a lacking its C-terminal arginine), a stable product of C5a, was measured as a surrogate for C5a.
The Quansys platform used for the readout is a chemiluminescent imager that supports quantitative analysis of 96-well plate-based immunoassays; each well contains nanospots coated with protein-specific capture antibodies. The Q-View Imager LS features 18-megapixel resolution and a read time of 270 s (Quansys Biosciences, Logan, UT). The remaining four Complement proteins were measured with enzyme-linked immunosorbent assays (ELISAs): CL-K1 (Cat. No. LS-F35879, LifeSpan BioSciences, Seattle, WA), CFH (Cat. No. A039, Quidel, San Diego, CA), and C6 and C7 (Cat. No. ab125965 and ab125964, respectively; Abcam, Cambridge, UK). All assays were performed according to the manufacturers' instructions. Urine specimens were diluted 1:2, except for CL-K1 and C7, which were diluted 1:5. All ELISA measurements were analyzed with 5-parameter logistic (5PL) curve fitting. On the low-multiplex platform, the auto-fit function of the Q-View Software v3.13 was used to choose between 4-parameter and 5-parameter logistic curve fitting; our initial analyses showed that the vendor-provided auto-fit algorithm outperformed fixed 5PL curve fitting.
Orthogonal method validation was performed in baseline urine from 37 subjects with type 1 diabetes from the advanced DKD cohort. Because of limited sample volume, we performed targeted measurements for only two of the six proteins (C5a and CFH) in 165 subjects with type 1 diabetes and 107 subjects with type 2 diabetes from the advanced DKD cohorts. We then performed targeted biomarker measurements in urine specimens from the three early-to-moderate DKD cohorts. Protein detectability was excellent across the cohorts (98–100%), except for CFH (88%). Samples below the limit of detection were assigned a cohort-specific half-minimum value.
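The ELISA readout step can be sketched as inverting a fitted 5PL standard curve, y = d + (a − d) / (1 + (x/c)^b)^g, to convert a signal back to a concentration, then multiplying by the dilution factor. The parameter values below are made up for illustration, "1:2" is assumed to mean a 2-fold dilution, and the half-minimum fill is assumed to use half of the lowest detected value in the cohort.

```python
def five_pl(x, a, b, c, d, g):
    """5-parameter logistic curve: signal as a function of concentration x."""
    return d + (a - d) / (1 + (x / c) ** b) ** g


def inverse_five_pl(y, a, b, c, d, g):
    """Back-calculate concentration from signal y on a fitted 5PL curve."""
    low, high = sorted((a, d))
    if not low < y < high:
        raise ValueError("signal outside the range of the standard curve")
    inner = ((a - d) / (y - d)) ** (1 / g) - 1
    return c * inner ** (1 / b)


def urine_concentration(signal, params, dilution_factor):
    """Concentration in neat urine = curve estimate x dilution factor."""
    return inverse_five_pl(signal, **params) * dilution_factor


def impute_half_minimum(values):
    """Replace below-LOD entries (None) with half the cohort's lowest detected value."""
    detected = [v for v in values if v is not None]
    fill = min(detected) / 2
    return [fill if v is None else v for v in values]


# Made-up standard-curve parameters, for illustration only.
params = {"a": 0.05, "b": 1.5, "c": 100.0, "d": 3.0, "g": 0.8}
signal = five_pl(50.0, **params)
print(round(urine_concentration(signal, params, dilution_factor=2), 6))  # 100.0
```

The asymmetry parameter g is what distinguishes 5PL from 4PL (g = 1), which is why an auto-fit routine can choose between the two on the same data.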
Our in-house controls, pools of cohort-specific samples, were used to evaluate inter-assay CVs, which were less than 14% except for C5a (32%).
Statistical analysis
Continuous variables are presented as means and standard deviations or medians (25th and 75th percentiles), as applicable; categorical variables are presented as counts and percentages. Pathway enrichment analysis was performed with the Database for Annotation, Visualization and Integrated Discovery (DAVID), using the full set of proteins (n = 1305) measured on the SomaScan platform as background. Class enrichment by over-representation was tested with two-sided Fisher's exact tests, comparing the number of significant proteins within each class with that among the rest of the proteins (significant and non-significant) measured on the high-throughput platform.
In the two-cohort advanced DKD study, we used Cox proportional hazards models to test associations of the urinary proteome with the primary outcome, handling tied failure times with the exact method. Effect sizes were expressed as hazard ratios, with 95% confidence intervals, per one-tertile change in the urinary creatinine-adjusted protein distribution. Plots of the martingale residuals against the covariates were within the distribution of the observed curves, indicating acceptable model fit. Proportionality tests using time-dependent covariates (interactions between covariates and a function of survival time) were not significant, indicating that the proportional hazards assumption was not violated. Measured confounding was evaluated with changes in β effect estimates, defined as the difference between the partially adjusted and base model β estimates divided by the base model estimate, where β is the natural logarithm of the hazard ratio.
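The over-representation test can be written directly from the hypergeometric distribution that underlies Fisher's exact test. This is a minimal sketch, assuming the two-sided P-value sums all tables no more probable than the observed one (the convention used by common implementations); argument names are hypothetical.

```python
from math import comb


def fisher_enrichment(sig_in_class, n_significant, class_size, n_background):
    """Two-sided Fisher's exact test for over-representation of a class.

    Returns (fold_enrichment, p_value). Under the null, the number of
    significant proteins falling in the class is hypergeometric.
    """
    total = comb(n_background, n_significant)

    def prob(x):
        # P(X = x): x significant proteins from the class, the rest from outside it
        return comb(class_size, x) * comb(n_background - class_size, n_significant - x) / total

    lo = max(0, n_significant + class_size - n_background)
    hi = min(class_size, n_significant)
    p_observed = prob(sig_in_class)
    # Sum all tables no more probable than the observed one; the small relative
    # tolerance guards against floating-point ties.
    p_value = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_observed * (1 + 1e-7))
    fold = (sig_in_class / n_significant) / (class_size / n_background)
    return fold, min(p_value, 1.0)
```

For the analysis described above, `n_background` would be the 1305 proteins on the platform and `class_size` the number of measured proteins annotated to a given class. Python integers are arbitrary precision, so the large binomial coefficients involved do not overflow.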
A change in the β effect estimate of 20% or more was considered non-negligible. Unmeasured confounding was assessed with the E-value of VanderWeele and Ding^[281]74, a sensitivity analysis tool intended for observational studies. The E-value is the minimum strength of association, on the risk ratio scale, that a hypothetical unmeasured confounder would need to have with both the exposure and the outcome to fully explain away the observed exposure-outcome association. A large E-value therefore suggests that the exposure-outcome relationship is unlikely to be explained by unmeasured confounding alone. There were no missing data in the key clinical covariates or in the urinary proteome measurements, so the univariable and partially adjusted Cox models used all available data. The fully adjusted Cox models included age, sex, race, diabetes type, diabetes duration, body mass index, systolic and diastolic blood pressures, serum cholesterol, hemoglobin A1c, GFR, albuminuria, smoking status, insulin use, renoprotective/other antihypertensive treatments, and lipid-lowering treatments. Small amounts of data were missing for select clinical variables used in the fully adjusted Cox models: systolic (n = 1, 0.3%) and diastolic (n = 1, 0.3%) blood pressures, total cholesterol (n = 14, 4%), high-density lipoprotein (n = 15, 5%), smoking (n = 7, 2%), and lipid-lowering treatment (n = 2, 0.6%). Missing data were handled by multiple imputation under a missing-at-random assumption^[282]75, in three phases: imputation, analysis, and pooling. We used the fully conditional specification method^[283]76, with a discriminant function method for binary/categorical variables (smoking and lipid-lowering treatment) and a linear regression method for continuous variables (systolic and diastolic blood pressure, total cholesterol, and high-density lipoprotein). We created 10 imputed datasets and analyzed each of them with the Cox proportional hazards regression model.
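The E-value has a closed form for a risk ratio RR > 1: E = RR + sqrt(RR × (RR - 1)); protective estimates are inverted first, and for a confidence interval the limit closest to the null is used. The Python sketch below implements this formula. The optional hazard-ratio-to-risk-ratio conversion for common outcomes follows VanderWeele's approximation and is included only as an assumption; this section does not state whether the study applied it.

```python
import math


def hr_to_rr_common_outcome(hr):
    """Approximate RR from an HR when the outcome is common (VanderWeele's conversion)."""
    return (1 - 0.5 ** math.sqrt(hr)) / (1 - 0.5 ** math.sqrt(1 / hr))


def e_value(rr):
    """E-value for a point estimate on the risk ratio scale."""
    if rr <= 0:
        raise ValueError("risk ratio must be positive")
    rr = 1 / rr if rr < 1 else rr  # protective estimates are inverted first
    return rr + math.sqrt(rr * (rr - 1))


def e_value_for_ci(lower, upper):
    """E-value for the confidence limit closest to the null (1 if the CI spans 1)."""
    if lower <= 1 <= upper:
        return 1.0
    return e_value(lower if lower > 1 else upper)


if __name__ == "__main__":
    # Illustrative numbers only, not results from the study
    print(round(e_value(2.0), 3))              # 3.414
    print(round(e_value_for_ci(1.5, 3.0), 3))  # 2.366
```

Read this way, an observed risk ratio of 2.0 could be explained away only by a confounder associated with both exposure and outcome by a risk ratio of at least about 3.4 each.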
The imputation-specific parameter estimates (e.g., coefficients and standard errors) from the 10 imputations were pooled into a single set of results for inference using Rubin's rules^[284]77. In addition, analyses restricted to participants of the advanced DKD study with complete data (93%) yielded highly comparable results. Twenty-seven of the 110 proteins were not well detectable and were analyzed using two-sided Fisher's exact tests: 13 of these, with better detectability, were dichotomized at the median, while the remaining 14 poorly detectable proteins were dichotomized at their detection threshold. Life tables of the 10-year cumulative incidence of kidney failure were generated using the Kaplan-Meier method, and differences across tertiles, treated as a three-level categorical variable, were evaluated with a log-rank test. We used a weighted Cohen's kappa test^[285]78 to evaluate the agreement of Complement associations with kidney outcomes between the type 1 and type 2 diabetes advanced DKD cohorts. Association strengths (P-values) from the cohort-specific crude Cox proportional hazards models were categorized into quintile ranks, and weights were computed using the equal-spacing method. Weighted kappa coefficients closer to 1 indicate greater agreement, with 1 denoting perfect agreement and 0 agreement no better than chance^[286]79. Spearman rank-order correlations were used as a non-parametric measure of association to evaluate urinary/circulating Complement pairs with respect to prospective kidney outcomes, and urinary Complement proteins with clinical covariates and with molecular indices of diabetic kidney injury. Continuous variables in two-group comparisons were compared using analysis of variance. The dispersion of each protein in kidney tissue was calculated as the ratio of the standard deviation to the mean, expressed as a percentage (i.e., the coefficient of variation).
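Rubin's rules and an equal-spacing weighted kappa can be sketched in Python as follows. Pooling uses the usual total variance T = U + (1 + 1/m)B, where U is the mean within-imputation variance and B the between-imputation variance; equal-spacing weights are taken as w_ij = 1 - |i - j|/(k - 1). The `quintile_ranks` helper is hypothetical (it ranks by sorted position and ignores ties), and none of this reproduces the SAS or vcd code used in the study.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, std_errors):
    """Rubin's rules: return the pooled estimate and its standard error."""
    m = len(estimates)
    q_bar = mean(estimates)                      # pooled point estimate
    within = mean(se ** 2 for se in std_errors)  # mean within-imputation variance
    between = variance(estimates)                # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return q_bar, math.sqrt(total)


def quintile_ranks(values):
    """Hypothetical helper: rank 0-4 by position in sorted order (ties not handled)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for position, i in enumerate(order):
        ranks[i] = 5 * position // len(values)
    return ranks


def weighted_kappa_equal_spacing(ratings1, ratings2, k=5):
    """Cohen's weighted kappa with equal-spacing weights w_ij = 1 - |i - j| / (k - 1)."""
    n = len(ratings1)
    observed = [[0.0] * k for _ in range(k)]
    for r1, r2 in zip(ratings1, ratings2):
        observed[r1][r2] += 1 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def weight(i, j):
        return 1 - abs(i - j) / (k - 1)

    cells = [(i, j) for i in range(k) for j in range(k)]
    p_obs = sum(weight(i, j) * observed[i][j] for i, j in cells)
    p_exp = sum(weight(i, j) * row[i] * col[j] for i, j in cells)
    return (p_obs - p_exp) / (1 - p_exp)
```

In use, each cohort's crude Cox P-values would be passed through `quintile_ranks` separately, and the two rank lists (aligned by protein) would then be compared with `weighted_kappa_equal_spacing`.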
We used Pearson correlations for orthogonal validation between high-throughput and targeted protein measurements. Prognostic model performance in the two-cohort advanced DKD study was evaluated using Cox proportional hazards regression. The clinical model (Model 1) included age, sex, diabetes type, hemoglobin A1c, GFR, and albuminuria; Model 2 added the top 5% Complement proteins to these clinical covariates. Risk discrimination was evaluated with Uno's concordance statistic (C-statistic). Uno's method estimates the concordance probability by modeling the censoring distribution and using it to weight the uncensored observations, yielding estimates that do not depend on the study-specific censoring distribution. Unsupervised hierarchical clustering was performed using Ward's method with Euclidean distances. We also assessed the prognostic accuracy of the Complement proteome for a long-term kidney outcome, the development of kidney failure within 10 years, using machine learning algorithms in the advanced DKD cohorts. We tested different numbers of proteins: the top protein, the top 5% (n = 6), and the entire Complement proteome (n = 110), and compared model performance using the concordance statistic. We evaluated biostatistical logistic regression (LR) and six machine learning models: principal component analysis (PCA) with LR; penalized regression, namely elastic net (EN), lasso (LS), and ridge (RD); tree-based methods, namely random forest (RF) and generalized boosting method (GBM); and neural network (NNET). The advanced DKD cohorts were randomly split into a training dataset (60% of the full cohort) and a testing dataset (the remaining 40%, to ensure a sufficient sample size for testing). Model accuracy was estimated using repeated 10-fold cross-validation.
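Uno's C-statistic weights each comparable pair (i, j), with an observed event for i before τ and T_i < T_j, by the inverse squared probability of remaining uncensored, estimated with a Kaplan-Meier fit to the censoring times. The Python sketch below assumes a truncation time τ (for example 10 years), evaluates the censoring survival just before each event time, and counts tied risk scores as half-concordant. The split and cross-validation helpers use an arbitrary seed and an assumed number of repeats; all names are hypothetical, and this is not the SAS or caret implementation used in the study.

```python
import random


def censoring_survival_before(times, events, t):
    """Kaplan-Meier estimate of the censoring survival just before t, G(t-)."""
    g = 1.0
    for s in sorted({ti for ti, ei in zip(times, events) if ei == 0}):
        if s >= t:
            break
        at_risk = sum(1 for ti in times if ti >= s)
        censored = sum(1 for ti, ei in zip(times, events) if ti == s and ei == 0)
        g *= 1 - censored / at_risk
    return g


def uno_c(times, events, risk, tau):
    """Uno's IPCW C-statistic; a higher risk score should mean earlier kidney failure."""
    num = den = 0.0
    for i, (ti, ei) in enumerate(zip(times, events)):
        if ei != 1 or ti >= tau:
            continue  # only observed events before the truncation time anchor pairs
        w = censoring_survival_before(times, events, ti) ** -2
        for j, tj in enumerate(times):
            if ti < tj:
                den += w
                if risk[i] > risk[j]:
                    num += w
                elif risk[i] == risk[j]:
                    num += 0.5 * w
    return num / den


def train_test_split(ids, train_fraction=0.6, seed=2024):
    """Random 60/40 split; the seed is arbitrary."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def repeated_kfold(n, k=10, repeats=3, seed=2024):
    """Yield (fit, validation) index lists for repeated k-fold CV (repeats assumed)."""
    rng = random.Random(seed)
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        for fold in range(k):
            validation = idx[fold::k]
            held_out = set(validation)
            yield [i for i in idx if i not in held_out], validation
```

Without censoring every weight equals 1 and `uno_c` reduces to Harrell's C over pairs anchored by events before τ; censoring is what makes the inverse-probability weights differ.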
We confirmed that our data contained no covariates with near-zero variance, high mutual correlation, or linear dependence on other covariates. Complement protein levels were normalized to urinary creatinine and transformed to their base 10 logarithms or to cohort-specific percentile ranks. The final algorithm for each model was determined in the training dataset and then applied to the testing dataset to calculate the prognostic accuracy of the Complement proteome for kidney failure, yielding the final performance accuracy of each computational model. All graphical displays were generated using GraphPad Prism v8.3.1 (GraphPad Software, San Diego, CA), with the following exceptions. The needle plot was generated with the R package ggplot2 in R version 3.5.0 (R Core Team, 2018)^[287]80. The weighted Cohen's kappa test was performed in R version 4.2.2 using the vcd package^[288]81. The schematic representation of the study workflow and of the major pathways of the Complement system was created with BioRender.com. Hierarchical clustering was performed with JMP 16.0.0 software (SAS, Cary, NC). The chord diagram was generated with the R package circlize (version 0.4.8) in R (R Core Team, 2018)^[289]82. Machine learning models were built with the R package caret (short for Classification And REgression Training; version 6.0-86) in R (R Core Team, 2018)^[290]83. All statistical tests were two-sided. Evaluations of the Complement proteome in the advanced DKD study focused on the top 5% of the distribution of P-values, which corresponded to a significance threshold of α = 2.4 × 10^−23. For comparison, Bonferroni correction would have yielded a less stringent threshold of α = 4.5 × 10^−4 (0.05/110). The kidney tissue proteomics data were analyzed with a Benjamini-Hochberg false discovery rate threshold of < 0.05. Other significance tests used α = 0.05. Analyses were performed in SAS v9.4 (SAS, Cary, NC) unless otherwise indicated.
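The thresholds and the creatinine normalization above can be reproduced in a few lines of Python. The `top_fraction_threshold` helper is hypothetical and assumes the cut-off is the largest P-value among the ceil(0.05 × m) smallest, which gives 6 of 110 proteins and matches the reported n = 6; the Benjamini-Hochberg function returns step-up adjusted P-values to compare against 0.05, and the normalization assumes protein and creatinine concentrations are in compatible units.

```python
import math


def bonferroni_alpha(alpha, m):
    """Per-test Bonferroni threshold for m tests."""
    return alpha / m


def top_fraction_threshold(pvalues, fraction=0.05):
    """Hypothetical helper: largest P-value among the ceil(fraction * m) smallest."""
    k = math.ceil(fraction * len(pvalues))
    return sorted(pvalues)[k - 1]


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg step-up adjusted P-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def log10_creatinine_normalized(protein, creatinine):
    """log10 of the protein-to-creatinine ratio (units must be compatible)."""
    return math.log10(protein / creatinine)
```

For 110 proteins, `bonferroni_alpha(0.05, 110)` gives about 4.5 × 10^−4, far above the top-5% cut-off, which is why the Bonferroni threshold is described as less stringent.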
The Supplementary Information provides detailed descriptions of the five cohorts, additional study groups, single-cell/single-nucleus RNA sequencing (sc/snRNA-seq), and spatial transcriptomics.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Supplementary information
Supplementary Information; Description of Additional Supplementary Files; Supplementary Data 1-13; Reporting Summary; Transparent Peer Review file.
Source data
Source Data.
Acknowledgements