Abstract

   Diabetic kidney disease (DKD) progression is not well understood. Using
   high-throughput proteomics, biostatistical, pathway and machine
   learning tools, we examine the urinary Complement proteome in two
   prospective cohorts with type 1 or 2 diabetes and advanced DKD followed
   for 1,804 person-years. The top 5% urinary proteins representing
   multiple components of the Complement system (C2, C5a, CL-K1, C6, CFH
   and C7) are robustly associated with 10-year kidney failure risk,
   independent of clinical covariates. We confirm the top proteins in
   three early-to-moderate DKD cohorts (2,982 person-years). Associations
   are especially pronounced in advanced kidney disease stages, similar
   between the two diabetes types and far stronger for urinary than
   circulating proteins. We also observe increased Complement protein and
   single cell/spatial RNA expressions in diabetic kidney tissue. Here,
   our study shows Complement engagement in DKD progression and lays the
   groundwork for developing biomarker-guided treatments.

   Subject terms: Chronic kidney disease, Prognostic markers, Data mining,
   Proteomics

   Complement proteome engagement is strongly linked to kidney outcomes in
   diabetes. This translational study leveraged five cohorts of over 4,500
   person-years and high-throughput proteomics to enable potential
   biomarker-guided drug development.

Introduction

   Diabetes is the leading cause of kidney failure. Over the last two
   decades, prevalence of kidney failure due to diabetes has more than
   doubled in the United States^[64]1. National databases mainly reflect
   kidney disease due to type 2 diabetes, but similarly unfavorable
   patterns are seen for type 1 diabetes^[65]2. Despite recent
   advances^[66]3,[67]4, the residual risk of kidney failure^[68]1,[69]2
   is high, emphasizing the need for research on novel diabetic kidney
   disease (DKD) molecular mechanisms and the identification of new drug
   targets.

   It is plausible to assume that the urinary proteome will offer
   meaningful insights into the mechanisms which underlie DKD. Earlier
   studies of urinary proteins in DKD were pursued with targeted biomarker
   studies^[70]5–[71]7 or mass spectrometry^[72]8,[73]9. High-throughput
   protein testing in large cohort studies has only recently become
   possible. These robust affinity methods, like the aptamer proteomics
   SomaScan platform, have been used so far to measure protein levels in
   serum or plasma in multiple studies. By contrast, this technology has
   only been utilized in a few studies of the urinary proteome. Those
   studies were all cross-sectional^[74]10–[75]12, mostly
   small^[76]10,[77]11 and focused on distant clinical phenotypes. To
   date, no prospective study has been performed on the urinary proteome
   in diabetic kidney disease or any other chronic disease.

   Ours was a prospective two-cohort study of subjects with type 1 or type
   2 diabetes and advanced DKD at baseline, in whom high-throughput
   proteomics of urine was evaluated for associations with 10-year kidney
   failure development. Advanced computational techniques were used to
   explore multi-dimensional links of the most enriched pathway proteome
   (Complement system) with clinical and molecular indices of diabetic
   kidney injury. Subsequently, we evaluated whether the associations of
   the top Complement proteins extend to earlier disease stages in a
   three-cohort study in type 1 or type 2 diabetes and early-to-moderate
   DKD for clinically recognized kidney outcomes. We explored select
   features of Complement proteins as potential biomarkers. In order to
   gain insights into the possible source of these proteins, we further
   examined the Complement proteome across three matrices (plasma, urine,
   and kidney), supported further by single-cell and spatial RNA kidney
   data.

Results

Prospective study cohorts

   Figure [78]1 outlines the study framework. Subjects in the two-cohort
   study of advanced DKD, with type 1 (189 subjects) or type 2 diabetes
   (115 subjects), had impaired kidney function at baseline: glomerular
   filtration rate (GFR) was predominantly in the G3 category, and
   albuminuria was in the A3 category in more than half of the subjects
   (Supplementary Data [79]1). Development of kidney failure within 10
   years occurred in 53% of subjects with type 1 and 23% of subjects with
   type 2 diabetes. The three-cohort study included 652 subjects with
   early-to-moderate DKD and of both diabetes types. These subjects had
   mostly normal kidney function (G1 or G2) and moderately increased
   albuminuria (A2). Kidney failure (clinical endpoint) was rare in this
   group. The composite outcome of 30% or more decline in GFR and/or
   kidney failure occurred in 18%. The study cohorts were predominantly
   white (73%), whereas black race accounted for 21% of the subjects. The
   inclusion criteria and subject selection for the current study are
   detailed in Supplementary Fig. [80]1a, b. See also Supplementary
   Methods, Supplementary Data [81]2.

Fig. 1. Schematic representation of the study framework.


   Our comprehensive study of the urinary Complement proteome comprised a
   two-cohort study of subjects with type 1 (n = 189) or type 2 diabetes
   (n = 115) and advanced DKD, followed for kidney failure over 10 years.
   We employed advanced molecular phenotyping technologies to establish
   proteomics associations with prospective kidney outcomes and detailed
   the biological relationships of our high-throughput proteomics data
   with the clinical and molecular phenotypes of diabetic kidney disease
   progression. We then evaluated whether the associations extend to
   earlier DKD stages in a three-cohort study (n = 652) followed for up to
   5 years. We investigated potential sources of increased Complement
   proteins in the urine by proteomics studies across three
   biofluid/tissue matrices and single-cell or spatial transcriptomics.
   DKD, diabetic kidney disease; T1D, type 1 diabetes; T2D, type 2
   diabetes; GFR, estimated creatinine-based glomerular filtration rate;
   AA, African American; sc/snRNAseq, single-cell/single-nucleus RNA
   sequencing.

Enriched pathways in urinary proteome and kidney failure risk in advanced DKD

   Pathway-driven analyses (one pathway at a time) of the 1305 urinary
   proteins measured for associations with kidney outcome in the advanced
   DKD two-cohort study identified the most enriched pathways belonging to
   the Complement system (Complement and coagulation cascades (hsa04610),
   regulation of Complement cascade (R-HSA-977606) and activation of C3
   and C5 (R-HSA-174577); Fig. [84]2a). Our analysis yielded 66
   significant proteins out of 1305 measured, including 15 out of 110
   Complement system proteins, with the threshold set at the 95th
   percentile of the P-value distribution, equivalent to α = 10^−16. The
   subsequent data-driven, pathway-enhanced approach (all Complement
   pathways combined) further confirmed a significant enrichment in
   Complement (2.8-fold enrichment, P = 2.1 × 10^−4; Fig. [85]2b).
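
   The over-representation test described above can be sketched as
   follows. This is a minimal illustration, not the authors' pipeline: it
   assumes a single 2x2 table built from the counts quoted in the text
   (1305 measured, 110 Complement, 66 significant, 15 overlapping) and a
   two-sided Fisher's exact test computed from the hypergeometric
   distribution. The function and argument names are hypothetical, and
   because the exact table behind the reported 2.8-fold enrichment is not
   given in the text, the output need not match the published values.

```python
from math import comb


def hypergeom_pmf(k, total, marked, drawn):
    """P(X = k) when `drawn` proteins are picked from `total`, of which `marked` are in the pathway."""
    return comb(marked, k) * comb(total - marked, drawn - k) / comb(total, drawn)


def fisher_enrichment(total, marked, drawn, overlap):
    """Return (fold enrichment, two-sided Fisher's exact P) for one pathway."""
    expected = drawn * marked / total          # overlap expected by chance
    fold = overlap / expected
    lowest = max(0, drawn - (total - marked))  # smallest possible overlap
    highest = min(drawn, marked)               # largest possible overlap
    p_observed = hypergeom_pmf(overlap, total, marked, drawn)
    # Two-sided P: sum over all tables that are no more likely than the observed one.
    p_two_sided = sum(
        p
        for p in (hypergeom_pmf(k, total, marked, drawn) for k in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )
    return fold, min(1.0, p_two_sided)


# Counts quoted in the text (assumed here to form one contingency table).
fold, p_value = fisher_enrichment(total=1305, marked=110, drawn=66, overlap=15)
print(f"fold enrichment = {fold:.2f}, two-sided P = {p_value:.1e}")
```

   The fold enrichment here is the observed overlap divided by the
   overlap expected by chance; an odds ratio would be a different, and
   larger, effect measure for the same table.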

Fig. 2. Urinary proteome associated with prospective kidney outcome is
enriched in the Complement system. Analysis is performed in the advanced DKD
two-cohort study.


   a Pathway enrichment analysis performed using the DAVID Gene Functional
   Classification Tool shows the statistically significant pathways
   (P < 0.05). b Over-representation analysis using Fisher’s exact test
   detected enrichment in Complement proteins. The cross within each plot
   represents the median, 75^th, and 25^th percentile values. All P-values
   are two-sided. Source data are provided as a Source Data file.

Complement proteome and development of kidney failure within 10 years in
advanced DKD

   Associations of the urinary Complement proteome with progression to
   kidney failure in advanced DKD, adjusted for diabetes type, were highly
   robust, and all displayed a risk pattern (Fig. [87]3a). From among 110
   Complement proteins, C2, C5 anaphylatoxin (C5a), collectin kidney 1
   (CL-K1), Complement factor H (CFH), C6 and C7 represented the top 5% of
   most significant proteins, with association strengths of
   P < 2.4 x 10^−23. The hazard ratio (HR) was high for each of these
   proteins. Kidney failure risk was over 4 times higher per C2 tertile
   change (HR, 4.27; 95% confidence interval (CI), 3.24, 5.64;
   P = 1.4 x 10^−24; Fig. [88]4a, base model; see Supplementary
   Data [89]3 for the formal protein nomenclature).

Fig. 3. Urinary Complement proteome and kidney failure risk by pathway in the
advanced DKD two-cohort study.


   a A comprehensive view of all proteins of the most enriched pathway
   (Complement system, 110 proteins) from among 1305 proteins measured by
   high-throughput proteomics. The needle plot depicts the strengths of
   associations representing P-values from the Cox proportional hazards
   models for developing kidney failure within 10 years in the two-cohort
   advanced DKD. One vertical needle represents P-value transformed to its
   base 10 logarithm from the diabetes type-adjusted model (base model),
   evaluating one creatinine-adjusted Complement protein at a time.
   Proteins are ordered on the x-axis by the pathway, the UniProt
   identifier (ID), and the protein name (to account for the fact that
   some proteins share the same UniProt ID; e.g., C5 and C5a). The top 5%
   Complement proteins are labeled. The gray line marks the threshold of
   significance at Bonferroni corrected α = 4.5 × 10^−4. b A map of the
   Complement system pathways and vertical bar graphs of association
   strengths (as described in a) for pathways that include the top 5%
   Complement proteins. The top Complement proteins are indicated with red
   font on the bar graphs and on the pathway scheme. For a full list of
   the Complement system proteins and data-driven associations, please
   refer to the Supplementary Data [92]5. All P-values are two-sided.
   Figure 3b is created in BioRender, Md Dom, ZI. (2024)
   [93]https://BioRender.com/l33k779. DKD, diabetic kidney disease. Source
   data are provided as a Source Data file.

Fig. 4. Complement proteome associations with kidney failure development,
their prognostic measures and biological insights into diabetic kidney
disease progression in the advanced DKD two-cohort study.


   a Forest plots and tabular form data showing the top 5% Complement
   proteins associated with 10-year kidney failure development in the two
   cohorts (n = 304) for base and clinically adjusted Cox proportional
   hazards models. The Base Model is controlled for diabetes type. Effect
   sizes (closed diamond symbols) with corresponding 95% confidence
   intervals (horizontal bars) are shown per one tertile change in the
   urinary creatinine-adjusted Complement protein distribution. b
   Correlations of urinary Complement proteome with clinical legacy
   measures (needle plot). The order and colors of the needles follow the
   pathway annotation of Fig. [96]3a. The height of one needle
   represents the correlation coefficient between one Complement protein
   and the clinical legacy measure named in the plot title. P-values are
   two-sided.
   c Spaghetti plot displaying P-values for all 110 Complement protein
   associations with 10-year kidney failure risk in 5 different partially
   adjusted models. The black and gray dashed lines denote the
   Bonferroni-corrected and nominal significance thresholds, respectively.
   P-values are two-sided. d Correlations of urinary Complement proteome
   with molecular kidney injury indices (needle plot). The order and
   colors of the needles follow the pathway annotation of Fig. [97]3a.
   e A chord diagram of relationships between urinary Complement proteome
   and circulating KRIS. Upper sectors represent TNFRSF members of KRIS
   (green) and non-TNFRSF members of KRIS (purple) and are arranged
   clockwise in order of decreasing strength of associations with the
   kidney outcome.
   Lower sectors are arranged counterclockwise by the strengths of the
   Complement associations with the kidney outcome. The top 5% Complement
   proteins are marked with asterisks. The length of the circular sectors
   indicates the cumulative strengths of the associations for a given
   protein. Links corresponding to the 75^th percentile of the
   distribution of correlation coefficients are shown. HR, hazard ratio;
   CI, confidence intervals. Source data are provided as a Source Data
   file.

   These strong relationships are illustrated by large differences in the
   10-year cumulative incidence of kidney failure according to tertiles of
   baseline urinary concentration of each of the top Complement proteins.
   For the top tertile of C2, the incidence reached 82% at the end of 10
   years, whereas it was 23% for the lowest tertile (Fig. [98]5a–f). Only
   14 all-cause and cardiovascular mortality events (4.5%) occurred in the
   advanced DKD cohorts; thus, competing event analyses were deemed
   unnecessary.
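
   The tertile-based cumulative incidence described above can be sketched
   as follows. This is only an illustration under stated assumptions:
   tertiles are assigned by rank, cumulative incidence is taken as one
   minus the Kaplan-Meier survival estimate, and the nine subjects below
   are synthetic values invented for the example, not study data. All
   function and variable names are hypothetical.

```python
def tertile_labels(values):
    """Assign each value to tertile 1, 2 or 3 by rank (ties broken by input order)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    for rank, i in enumerate(order):
        labels[i] = 1 + (3 * rank) // len(values)
    return labels


def km_cumulative_incidence(times, events, horizon):
    """1 - Kaplan-Meier survival at `horizon`; events: 1 = kidney failure, 0 = censored."""
    survival = 1.0
    at_risk = len(times)
    for t in sorted(set(times)):
        if t > horizon:
            break
        failures = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        leaving = sum(1 for ti in times if ti == t)
        if failures:
            survival *= 1 - failures / at_risk
        at_risk -= leaving  # failures and censored subjects both leave the risk set
    return 1 - survival


# Synthetic example (NOT study data): creatinine-adjusted protein level,
# follow-up in years, and kidney failure indicator for nine subjects.
protein = [0.2, 0.5, 0.9, 1.4, 2.0, 2.6, 3.1, 3.8, 4.5]
years = [10, 10, 8, 10, 6, 9, 3, 5, 2]
failed = [0, 0, 1, 0, 1, 1, 1, 1, 1]

labels = tertile_labels(protein)
incidence = {}
for tertile in (1, 2, 3):
    idx = [i for i, lab in enumerate(labels) if lab == tertile]
    incidence[tertile] = km_cumulative_incidence(
        [years[i] for i in idx], [failed[i] for i in idx], horizon=10
    )
print(incidence)
```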

Fig. 5. Urinary top 5% Complement proteins and cumulative incidence of kidney
failure in the advanced DKD two-cohort study.


   The proportions of subjects with type 1 (n = 189) or type 2 (n = 115)
   diabetes and advanced DKD (two-cohort study), who developed kidney
   failure within 10 years of follow-up, as per tertiles of distribution
   of baseline values of urinary creatinine-adjusted Complement proteins
   measured with high-throughput aptamer proteomics. a Complement C2. b
   Complement C5a. c Collectin kidney 1 (CL-K1). d Complement C6. e
   Complement factor H (CFH). f Complement C7. Solid lines represent
   Kaplan-Meier curves, whereas the surrounding shaded areas represent the
   corresponding 95% confidence intervals. Log-rank test reflects the
   comparison among tertiles treated as a three-level categorical
   variable. All P-values are two-sided. T1 (bottom), tertile 1; T2,
   tertile 2; T3 (top), tertile 3. Source data are provided as a Source
   Data file.

   To determine whether the Complement proteins were associated with the
   kidney outcome independently of clinical covariates, we returned to
   the Cox model. After adjustment for clinical covariates (age, sex,
   race, diabetes type and duration, body-mass index, systolic and
   diastolic blood pressure, hemoglobin A1c (HbA1c), GFR, albuminuria,
   cholesterol, smoking status, insulin use, renoprotective/other
   antihypertensive, and lipid-lowering treatments), each top protein
   remained significantly associated with kidney failure, with risks from
   2.0 to over 3.7-fold. In the adjusted model, kidney failure risk
   remained over 3 times higher per C2 tertile change (HR, 3.30; 95% CI,
   2.10, 5.20; P = 2.4 x 10^−7; Fig. [101]4a).
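
   As a rough consistency check on the C2 estimates reported for the base
   and adjusted models, the hazard ratio, its confidence interval and the
   P-value can be related through a Wald approximation. The sketch below
   assumes a symmetric 95% CI on the log scale and a normal test
   statistic. The paper reports P-values directly from the Cox models, so
   small discrepancies due to rounding of the published HR and CI are
   expected. The function name is hypothetical.

```python
from math import erfc, log, sqrt


def wald_from_hr(hr, ci_low, ci_high):
    """Recover beta, SE, z and two-sided P from an HR and its 95% CI (Wald approximation)."""
    beta = log(hr)                                   # log-hazard ratio per tertile change
    se = (log(ci_high) - log(ci_low)) / (2 * 1.959964)
    z = beta / se
    p = erfc(abs(z) / sqrt(2))                       # two-sided normal tail probability
    return beta, se, z, p


# C2 per-tertile estimates as reported in the text.
base = wald_from_hr(4.27, 3.24, 5.64)       # base model (adjusted for diabetes type)
adjusted = wald_from_hr(3.30, 2.10, 5.20)   # clinically adjusted model

for name, (beta, se, z, p) in (("base", base), ("adjusted", adjusted)):
    print(f"{name}: beta = {beta:.3f}, SE = {se:.3f}, z = {z:.2f}, P = {p:.1e}")
```

   Under these assumptions both back-calculated P-values fall in the same
   order of magnitude as the published ones, which suggests that the
   reported HRs, CIs and P-values are mutually consistent.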

   When the confounding effects of key covariates were evaluated in
   detail, age, sex, and HbA1c did not confound the associations for top
   proteins (changes in β effect estimates less than 6%). Confounding by
   GFR was moderate (changes in β from 10% to 15%). The analysis using the
   new CKD-EPI 2021 GFR equation^[102]13 (GFR[new]) yielded similar
   results. Confounding by albuminuria was substantial, and
   protein-specific (β changes from 4% to 26%); Supplementary Fig. [103]2a
   and Supplementary Data [104]4. For the top 5% Complement proteins, we
   found no interactions with sex (P[interaction] > 0.28 for each,
   Supplementary Fig. [105]2b) and no interactions with diabetes type.
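
   The change-in-estimate measure of confounding used above can be
   sketched as follows, assuming it is defined as the relative change in
   the Cox β coefficient (the log-hazard ratio) after a covariate is
   added. The per-covariate values quoted in the text come from the
   Supplementary Data and are not reproduced here. For illustration only,
   the example compares the C2 base-model HR with the fully adjusted HR,
   which is a different and broader comparison than any single-covariate
   adjustment. The function name is hypothetical.

```python
from math import log


def percent_change_in_beta(hr_reference, hr_adjusted):
    """Relative change (%) in the log-hazard ratio after adding covariates."""
    beta_reference = log(hr_reference)
    beta_adjusted = log(hr_adjusted)
    return 100 * (beta_reference - beta_adjusted) / beta_reference


# Illustration with the reported C2 hazard ratios: base model vs fully adjusted model.
c2_change = percent_change_in_beta(4.27, 3.30)
print(f"change in beta for C2 after full adjustment: {c2_change:.1f}%")
```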

   In the analysis of the entire Complement proteome, we first examined
   the orthogonal relationships between the urinary Complement proteome
   and clinical covariates (Fig. [106]4b). Overall, correlations with
   albuminuria were stronger than those with kidney
   function or glycemic control. Fifty-nine out of 110 (54%) proteins were
   associated with kidney failure independently of key clinical
   covariates (Fig. [107]4c).

Urinary Complement proteome in advanced DKD cohorts by pathway

   The data-driven associations between the top Complement proteins and
   kidney failure risk were integrated into current biological knowledge
   about the Complement system^[108]14–[109]16. The top 5% proteins
   represented its multiple components (Fig. [110]3b). The top protein,
   Complement C2, is a downstream effector resulting from activation of
   either the lectin or classical pathways. The lectin pathway itself is
   activated upstream by proteins recognizing specific carbohydrate
   groups, including CL-K1 (the third-ranked protein in our study),
   mannose-binding lectin (MBL), and ficolins 1, 2, and 3 (FCN1-3). The
   alternative pathway is regulated by a number of Complement factors,
   including CFH (the fifth-ranked protein). All three upstream pathways
   converge upon the formation of C3 and C5 convertases. The latter, C5
   convertase, cleaves an intact Complement C5 protein into C5
   anaphylatoxin (C5a, the second-ranked protein) and C5b. The terminal
   Complement pathway starts with C5b binding to Complement C6 (the
   fourth-ranked protein) and then to Complement C7 (the sixth-ranked
   protein) to form C5b-7, which anchors into the cell membrane, whereas
   binding of C8 and C9 creates the terminal form of the membrane attack
   complex, MAC (C5b-9), responsible for cell lysis. Proteins of the
   classical pathway, opsonins, or regulatory proteins had weaker
   associations.

Urinary Complement proteome and other molecular indices in advanced DKD

   From among urinary biomarkers of DKD (Fig. [111]4d), Complement
   proteins correlated markedly with monocyte chemoattractant protein 1
   (MCP1) and noticeably with liver-type fatty acid binding protein
   (LFABP). Among urinary immunoglobulins, correlations with
   immunoglobulin M (IgM) and IgG were substantially stronger than those
   with IgA.

   Among circulating biomarkers, the Kidney Risk Inflammatory Signature
   (KRIS) proteins, previously reported by us, were examined for
   association with urinary Complement proteins. Seven out of 17
   circulating KRIS proteins (41%) showed strong connections with the
   urinary Complement proteome: TNFRSF1A (also known as TNFR1), TNFRSF1B
   (also known as TNFR2), TNFRSF19, IL15RA, IL17F, CD55, and CD300C were
   each strongly connected with over 40 Complement proteins. Globally,
   however, most of the Complement proteome (77%) did not have strong
   connections (7 links or fewer) with KRIS. All top Complement proteins
   remained significant after adjustment for TNFR1 (Fig. [112]4c, [113]e).

Urinary Complement proteins and kidney outcomes by diabetes type and kidney
disease stage

   In order to appreciate the similarity/dissimilarity of disease
   progression according to diabetes type, we compared high-throughput
   Complement proteomes and kidney failure risks between the two advanced
   DKD cohorts. The association strengths (P-values) and effect sizes
   (hazard ratios) both showed a strong and uniform agreement between type
   1 and type 2 diabetes (concordance: 87%; k[w] = 0.67; Fig. [114]6a and
   Supplementary Data [115]5 for Complement proteome associations with
   kidney failure by diabetes type).
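
   The agreement statistics quoted above (percent concordance and
   weighted Cohen's kappa) can be sketched as follows. The text does not
   state how the associations were categorized or which weighting scheme
   was used, so this example assumes three ordinal significance
   categories (0 = not significant, 1 = nominal, 2 = Bonferroni) and
   linear weights. The ten protein ratings are synthetic and purely
   illustrative, and all names are hypothetical.

```python
def weighted_kappa(ratings_a, ratings_b, n_categories):
    """Linear-weighted Cohen's kappa for two ordinal ratings coded 0..n_categories-1."""
    n = len(ratings_a)
    observed = [[0.0] * n_categories for _ in range(n_categories)]
    for a, b in zip(ratings_a, ratings_b):
        observed[a][b] += 1 / n
    row = [sum(observed[i]) for i in range(n_categories)]
    col = [sum(observed[i][j] for i in range(n_categories)) for j in range(n_categories)]
    disagree_observed = disagree_expected = 0.0
    for i in range(n_categories):
        for j in range(n_categories):
            weight = abs(i - j) / (n_categories - 1)  # linear disagreement weight
            disagree_observed += weight * observed[i][j]
            disagree_expected += weight * row[i] * col[j]
    return 1 - disagree_observed / disagree_expected


# Synthetic significance categories for ten proteins (NOT study data).
type1 = [2, 2, 1, 1, 0, 0, 2, 1, 0, 0]
type2 = [2, 1, 1, 1, 0, 0, 2, 0, 0, 1]

concordance = sum(a == b for a, b in zip(type1, type2)) / len(type1)
kappa = weighted_kappa(type1, type2, n_categories=3)
print(f"concordance = {concordance:.0%}, weighted kappa = {kappa:.2f}")
```

   Unlike raw concordance, the weighted kappa corrects for agreement
   expected by chance and penalizes a two-category disagreement more than
   a one-category disagreement.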

Fig. 6. Urinary Complement proteins and kidney disease progression across
five cohorts - by kidney disease stage and by diabetes type.


   a The plot of strengths of associations (P-values transformed to base
   10 logarithms) and effect sizes per one tertile change of each of the
   110 Complement proteins on 10-year risk of developing kidney failure in
   the two advanced DKD cohorts, with type 1 (n = 189) or type 2 diabetes
   (n = 115). The top Complement proteins are marked with red dots. A
   weighted Cohen’s kappa coefficient (k[w]) and corresponding P-values
   reflect the test of agreement for the strength of Complement
   associations between type 1 and type 2 diabetes. Measurements in
   advanced DKD were performed with aptamer proteomics and in
   early-to-moderate DKD with targeted assays. All P-values are two-sided.
   b Associations of the top 5% Complement proteins with the prospective
   continuous kidney outcome in all 956 subjects and 4629 person-years by
   cohort. Color-annotated and cohort-specific effect estimates (diamond
   symbols) and 95% confidence intervals (horizontal bars) represent
   changes in kidney function over time per one tertile increase in the
   distribution of a urinary creatinine-adjusted Complement protein.
   Please see Supplementary Data [118]6 for all results on GFR-based
   outcomes. c Associations of the top 5% Complement proteins with the
   prospective, continuous kidney outcome and (d), binary composite kidney
   outcome in study cohorts combined by DKD stages. Color-annotated and
   model-specific effect estimates (crude model: closed circle symbols;
   adjusted model: open circle) and 95% confidence intervals (horizontal
   bars) represent changes in kidney function over time per one tertile
   increase in the distribution of a urinary creatinine-adjusted
   Complement protein. Please see Supplementary Data [119]6 for all
   results on GFR-based outcomes. DKD, diabetic kidney disease; CI,
   confidence intervals; GFR, estimated creatinine-based glomerular
   filtration rate; OR, odds ratio; AA, African American. Source data are
   provided as a Source Data file.

   Three early-to-moderate DKD cohorts were examined together with
   advanced DKD cohorts in subsequent analyses of the secondary kidney
   outcomes. Analyses of the 5 cohorts (Fig. [120]6b) revealed that all
   top 5% proteins in the advanced DKD and three to five of the six
   proteins in the early-to-moderate DKD were associated with an
   acceleration of the GFR slope. The confidence intervals for C5a and CFH
   included 0 in the type 2 diabetes African American cohort.
   The effect estimates were largest in the advanced DKD with type 1
   diabetes, substantial in the type 2, and moderate to weak, but quite
   comparable across the three cohorts with earlier disease. In the crude
   analyses of intermediate kidney outcomes (GFR slope and the composite
   binary outcome) across combined cohorts (the two advanced DKD cohorts
   combined and the three early-to-moderate DKD cohorts combined), the
   results were consistent with those observed within individual
   cohorts. The findings in the combined advanced DKD cohorts
   were similar and remained significant for all top proteins (C2, C5a,
   CL-K1, C6, CFH, C7) after adjustment for clinical variables, including
   albuminuria. The findings in the combined early/moderate DKD cohorts
   remained significant for the top five proteins (all but CL-K1)
   measured, after adjustment for clinical variables such as age, sex,
   HbA1c, GFR, and diabetes type. Three out of five proteins (C2, C5a, C7)
   remained significant in the three early/moderate cohorts, after the
   further adjustment by albuminuria (Fig. [121]6c, d and Supplementary
   Data [122]6).

Complement proteins in urine, circulation, and kidney tissue

   To gain insights into the source of these proteins, we evaluated
   Complement in paired plasma and urine specimens from the cohort subset
   of 97 subjects with advanced DKD with type 1 diabetes (for clinical
   characteristics, refer to Supplementary Data [123]7). Association of
   the GFR slope with proteins in the urine was much stronger than with
   the corresponding proteins in circulation (Fig. [124]7a, [125]c and
   Supplementary Data [126]5). Interestingly, complement
   decay-accelerating factor (DAF), a previously reported KRIS
   member^[127]17, was also the top plasma protein correlated with the GFR
   slope in our study (“proof-of-concept”; Fig. [128]7a, b). Notably,
   even the strongest associations of the circulating proteins were much
   weaker than the urinary associations. Further support for the
   hypothesis that kidney tissue is a source of increased concentration of
   urinary Complement proteins, rather than protein leakage through
   glomerular basement membrane, is the absence of correlations between
   urinary Complement proteins and their molecular weight (P = 0.54,
   Supplementary Fig. [129]3).

Fig. 7. Complement proteomes and diabetic kidney disease progression across
three matrices.


   a The two-needle plot of the correlation strengths for subject- and
   timepoint-level pairs of urinary and circulating Complement proteins
   with a prospective GFR slope in the advanced DKD cohort with Type 1
   diabetes subset (n = 97). Proteins are ordered on the x-axis by the
   matrix, the UniProt identifier (ID), and the protein name. b The
   volcano plot, where ratios of mean values of Complement proteins in the
   kidney tissue in DKD cases to controls (n = 31, independent
   cross-sectional study group, see Supplementary Data [132]8 for clinical
   characteristics) are plotted against the strengths of the associations
   (P-values transformed to their base 10 logarithm). The gray dashed
   lines indicate the false discovery rate (top) and the nominal (bottom)
   significance thresholds. c The box plots show in parallel the relative
   concentrations of top 5% Complement proteins in urine and plasma at
   baseline between subjects who developed kidney failure compared to
   those who did not within 10 years of follow-up in the advanced DKD
   subset (n = 97). d The box plots show kidney tissue protein expressions
   of the top 5% Complement proteins in controls (n = 9) and in subjects
   with advanced diabetic kidney disease (n = 22). For all boxplots, the
   horizontal center line within each box represents the median, the top
   and bottom of each box limit indicate the 75^th and 25^th percentile,
   and the whisker bars indicate the range. Data presented as dots beyond
   whiskers are outliers. Association strengths that did not reach
   significance (α = 0.05) are not shown. All P-values are two-sided. GFR,
   estimated creatinine-based glomerular filtration rate; DKD, diabetic
   kidney disease; RFU, relative fluorescence units. Source data are
   provided as a Source Data file.

   To obtain more direct evidence supporting the above hypothesis, we
   compared Complement proteomes in the kidney tissue in an independent
   case-control study group (clinical characteristics of the 31 subjects
   in this study are in Supplementary Data [133]8). Sixteen out of 110
   proteins were differentially expressed between DKD cases and controls.
   Those included C2, alternative pathway (CFH, CFD, CFP), and terminal
   complex (C7 and C9) proteins. C5a was not different between these
   groups, but intact C5 was. Another anaphylatoxin, C3a, was increased in
   diabetic kidneys with the highest subject-level heterogeneity
   (dispersion value: 140%). Interestingly, expression of the lectin
   pathway members, including CL-K1 and others, did not differ
   (Fig. [134]7b, [135]d and Supplementary Data [136]5). Analysis of the
   associations between expression of the top Complement proteins and
   histological indices of kidney injury revealed that C7 expression in
   kidney tissue correlated strongly with interstitial fibrosis and
   immune cell infiltrates. In contrast, C5, C3, or derived anaphylatoxin
   correlated markedly with intimal fibrosis (Supplementary Data [137]9).

   In the kidney tissue analyses by single-cell/single-nucleus RNA
   sequencing (sc/snRNAseq), the CFH and C7 genes were highly expressed
   in the kidney, in glomerular parietal epithelial cells and
   collagen-expressing interstitial fibroblasts, respectively
   (Fig. [138]8a). The other four
   genes had low overall expression. Nevertheless, expression in diabetic
   kidney was increased in resident cells for C5, C6 and C7 and in
   infiltrating cells for COLEC11 (encoding CL-K1 protein) (Fig. [139]8b).
   In addition, spatial transcriptomics analysis revealed distinct spatial
   expression patterns for these genes in control and DKD samples, further
   highlighting the differential expression across tissue regions
   (Fig. [140]8c and Supplementary Fig. [141]4a, b). Integrated evidence
   from the current study of protein profiles in urine, plasma, and tissue
   reinforces the notion that the kidney is the source of increased
   urinary concentration of Complement proteins associated with the risk
   of DKD.

Fig. 8. Sc/snRNA-seq expression of genes corresponding to the top 5%
Complement proteins in the kidney tissue.


   a Overall expression and proportions on the uniform scale across the
   genes. b Gene-specific expression for controls (n = 17) and DKD cases
   (n = 8). Gene expressions corresponding to the top 5% Complement
   proteins from our study are shown as bubble plots in all kidney tissue
   examined overall (a) and by caseness, comparing diabetic kidney tissue
   (red) vs control tissue (purple) (b). Of note, CL-K1 protein is a
   product of the COLEC11 gene. Dot size in both panels represents the
   percent of cells expressing the gene of interest, whereas the color
   intensity corresponds to the expression. c Spatial transcriptomics of
   genes corresponding to Complement C2 and C7 proteins. This figure
   illustrates the spatial distribution of 2 genes (C2 and C7) in human
   kidney tissues from a control sample and a DKD sample. The data were
   obtained using the Visium spatial transcriptomics platform. Each panel
   represents the log-transformed expression counts for one gene,
   visualized across the tissue sample. Color gradients indicate varying
   levels of gene expression, with blue representing lower expression and
   red indicating higher expression. Scale bar = 250 μm. Source data are
   provided as a Source Data file. DKD, diabetic kidney disease; RBC, red
   blood cells; Baso/Mast, basophils or mast cells; Mac, macrophages;
   CD16_Mono, CD16 + monocytes; CD14_Mono, CD14 + monocytes; pDC,
   plasmacytoid dendritic cells; cDC, classical dendritic cells; NK,
   natural killer cells; CD8T, CD8 + T lymphocytes; CD4T, CD4 + T
   lymphocytes; B_memory, memory B lymphocytes; B_Naiive, naïve B
   lymphocytes; IC_B, type beta intercalated cells; IC_A, type alpha
   intercalated cells; PC, principal cells of collecting duct; CNT,
   connecting tubule cells; DCT, distal convoluted tubule cells; M_TAL,
   medullary thick ascending loop of Henle; C_TAL, cortical thick
   ascending loop of Henle; DLOH, thin descending loop of Henle;
   Injured_PT, injured proximal tubule cells; PT_S3, proximal tubule
   segment 3; PT_S2, proximal tubule segment 2; PT_S1, proximal tubule
   segment 1; Podo, podocytes; PEC, parietal epithelial cells; Mes,
   mesangial cells; GS_Stromal, glomerulosclerosis-specific stromal cells;
   Myofib, myofibroblasts; Fibroblast_2, fibroblasts expressing
   insulin-like growth factor-binding protein 7, vimentin, and
   beta-2-microglobulin; Fibroblast_1, fibroblasts expressing collagen
   type I alpha 1 and 2 chain; Endo_lymphatic, endothelial cells of
   lymphatic vessels; Endo_peritubular, endothelial cells of peritubular
   vessels; Endo_GC, endothelial cells of glomerular capillary tuft.

Urinary Complement proteins as potential biomarkers of kidney failure risk

   Top Complement proteins added on top of the key clinical covariates
   (age, sex, diabetes type, hemoglobin A1c, GFR, and albuminuria)
   significantly increased the model discrimination (from the concordance
   index, c = 0.810 to c = 0.836; P = 0.021).
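
   The concordance index used above as the measure of model
   discrimination can be sketched as follows. This is a minimal
   implementation of Harrell's C for right-censored data, assuming that a
   higher risk score should correspond to earlier kidney failure. The six
   subjects and both sets of risk scores are synthetic and purely
   illustrative, and the statistical test the authors used to compare the
   two c values is not reproduced. All names are hypothetical.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C: share of usable pairs in which the earlier failure has the higher risk score."""
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is usable when subject i fails before subject j's last follow-up time.
            if events[i] == 1 and times[i] < times[j]:
                usable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied scores count as half-concordant
    return concordant / usable


# Synthetic example (NOT study data): follow-up in years, kidney failure
# indicator, and risk scores from two hypothetical models.
years = [2, 4, 5, 7, 9, 10]
failed = [1, 1, 0, 1, 0, 0]
clinical_only = [0.9, 0.5, 0.6, 0.4, 0.5, 0.1]
with_complement = [0.9, 0.7, 0.5, 0.6, 0.3, 0.1]

c_clinical = concordance_index(years, failed, clinical_only)
c_combined = concordance_index(years, failed, with_complement)
print(f"c (clinical only) = {c_clinical:.3f}, c (with Complement) = {c_combined:.3f}")
```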

   In addition to the 10-year kidney failure risk, we examined these
   associations over shorter follow-up periods. Fifty-three subjects
   (17%) from the advanced DKD cohorts developed kidney failure within 3
   years and 79 subjects (25%) within 5 years of follow-up. The strengths
   of associations were weaker, and the confidence intervals surrounding
   the effect estimates were wider for shorter follow-up. Nevertheless,
   they remained significant for each comparison (Supplementary
   Data [144]10). This evidence supports Complement proteins as potential
   biomarkers of kidney failure risk or prognostication of DKD
   progression (as per FDA-NIH nomenclature^[145]18). The proportionality
   test in the Cox model revealed that the effects of the variables on
   the hazard rate were proportional and constant over time (P > 0.05).
   We also explored relationships among Complement proteins. All top
   proteins except C7 correlated with one another. In the global
   evaluation, Complement factors (alternative pathway members) and
   coagulation proteins correlated markedly with one another
   (Supplementary Data [146]11).

   Next, we evaluated whether clustering informed by Complement would
   differentiate subjects by future kidney risk. The cumulative incidence
   of kidney failure relying on clinical legacy measures (albuminuria and
   GFR) was 37% overall (Fig. [147]9a). In striking contrast,
   unsupervised clustering built upon the top 5% Complement proteins
   resulted in 3 clusters of subjects with markedly differing proportions
   of the kidney outcomes (Fig. [148]9b). Seventy-eight percent of the
   advanced DKD subjects with type 1 diabetes in Cluster 3 progressed to
   kidney failure within 5 years of follow-up, in comparison to only 46%
   and 13% in Clusters 1 and 2, respectively (Fig. [149]9c). Overall risks
   in type 2 diabetes were lower, but the cluster-based differences were
   similar (Fig. [150]9d–f). Moreover, in early-to-moderate DKD subjects,
   clusters informed by the top proteins also markedly discriminated the
   odds of prospective kidney outcomes (Supplementary Fig. [151]5a–d).
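
   To make the cluster-wise comparison concrete, the sketch below computes
   a Kaplan-Meier cumulative incidence (1 minus the survival estimate) at
   a fixed horizon, separately for each cluster label. The function names,
   the 5-year default horizon, and the toy inputs are hypothetical, and
   death is treated as simple censoring, which is an assumption rather
   than a description of the study's analysis.

```python
def kaplan_meier_incidence(times, events, horizon):
    """Return 1 - S(horizon) from the Kaplan-Meier product-limit estimate.

    times  -- follow-up time per subject (years)
    events -- 1 if kidney failure occurred at that time, 0 if censored
    """
    survival = 1.0
    at_risk = len(times)
    for t in sorted(set(times)):
        if t > horizon:
            break
        n_events = sum(1 for ti, ei in zip(times, events) if ti == t and ei)
        n_leaving = sum(1 for ti in times if ti == t)
        if n_events:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1.0 - n_events / at_risk
        # Subjects with an event or censoring at t leave the risk set.
        at_risk -= n_leaving
    return 1.0 - survival


def incidence_by_cluster(times, events, clusters, horizon=5.0):
    """Group subjects by cluster label and estimate incidence per group."""
    groups = {}
    for t, e, c in zip(times, events, clusters):
        ts, es = groups.setdefault(c, ([], []))
        ts.append(t)
        es.append(e)
    return {c: kaplan_meier_incidence(ts, es, horizon)
            for c, (ts, es) in groups.items()}
```

   The cluster labels themselves would come from a separate, outcome-blind
   clustering step, which this sketch does not implement.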

Fig. 9. Complement proteins-informed clustering and prospective kidney
failure in the advanced DKD cohort with type 1 and type 2 diabetes.

   a Kaplan-Meier curve of 189 subjects with type 1 diabetes and advanced
   diabetic kidney disease who developed kidney failure within 5 years of
   follow-up. b An unsupervised approach performed in the form of
   hierarchical clustering built upon the top urinary Complement proteins.
   Subjects with type 1 diabetes were clustered based on Complement
   protein levels, disregarding the outcome. Vertical colored bars show
   cluster groups and prospective subject caseness. c Proportions of
   subjects with type 1 diabetes who developed kidney failure within 5
   years by cluster. Solid lines represent Kaplan-Meier curves. Log-rank
   test reflects the comparison over time of the cumulative incidence of
   kidney failure by cluster. d Kaplan-Meier curve of 115 subjects with
   type 2 diabetes and advanced diabetic kidney disease who developed
   kidney failure within 5 years of follow-up. e An unsupervised approach
   performed in the form of hierarchical clustering built upon the top
   urinary Complement proteins. Subjects with type 2 diabetes were
   clustered based on Complement protein levels, disregarding the outcome.
   Vertical colored bars show cluster groups and prospective subject
   caseness. f Proportions of subjects with type 2 diabetes who developed
   kidney failure within 5 years by cluster. Solid lines represent
   Kaplan-Meier curves. Log-rank test reflects the comparison over time of
   the cumulative incidence of kidney failure by cluster. P-values are
   two-sided. Source data are provided as a Source Data file.

   In addition to biostatistical models, we tested the prognostic accuracy
   of six machine learning approaches (Fig. [154]10). The logistic model
   with the top protein, C2, had a robust prognostic accuracy (c = 0.838),
   which improved further when the top 5% proteins were examined together
   (c[5%] = 0.854) or when combined into a principal component
   (c[5%] = 0.853). Within the decision tree family, the gradient
   boosting method performed best (c[5%] = 0.809; c[100%] = 0.823).
   Overall, classical biostatistical methods showed superior
   performance.

Fig. 10. Biostatistical and machine learning models to evaluate the
prognostic role of the Complement proteome for kidney failure in the advanced
DKD cohorts (n = 304).

   Penalized, penalized regression; Dec tree, decision tree; Deep, deep
   learning; LR, logistic regression; PCA, principal component; EN,
   elastic net; RD, ridge; LS, lasso; RF, random forest; GBM, generalized
   boosting method; NNET, neural network. Source data are provided as a
   Source Data file.

   Since the Complement measurements in the advanced DKD cohorts were
   performed with high-throughput proteomics and those in
   early-to-moderate DKD with targeted immunoassays, we cross-validated
   the top proteins between the two platforms. Correlations between
   aptamer- and antibody-based measurements were excellent for C2, C5a,
   C6, CFH, and C7 (Pearson correlation coefficients ranged from r = 0.80
   to 0.98, P-values < 10^−8 to 10^−27). The correlation for CL-K1
   between the two methods was only modest (r = 0.51). See Supplementary
   Fig. [156]6a, b and Methods for more details. We expanded targeted
   measurements in advanced DKD cohorts for two selected proteins, C5a
   and CFH. The associations with kidney outcomes were again highly
   similar between the two methods (Supplementary Fig. [157]7).
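
   As a brief illustration of the cross-platform comparison, the sketch
   below computes a Pearson correlation coefficient between paired
   measurements of one protein on two platforms. The function name and
   inputs are hypothetical; whether values were transformed (for example,
   log-transformed) before correlation is not specified here, so no
   transformation is applied.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between paired measurements x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long series with >= 2 points")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and squared deviations around the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)
```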

Discussion

   To address the need to understand the mechanisms that underlie DKD
   development, this study comprehensively examined the association
   between urinary Complement and DKD progression.

   The top urinary Complement proteins with the strongest association with
   the development of kidney failure within 10 years in a two-cohort study
   of subjects with advanced DKD at baseline were: C2, C5a, CL-K1, C6,
   CFH, and C7. These proteins flagged engagement of multiple pathways of
   the Complement system. Among subjects with high levels of C2, our top
   protein, 4 out of 5 developed kidney failure, compared with only 1 out
   of 5 among those with low levels. In the crude analyses, kidney
   failure risks were increased three-and-a-half-fold or more for the top
   proteins. The top proteins remained independently associated with
   kidney failure following a comprehensive adjustment for clinical
   covariates. The associations of
   the increased urinary concentrations of the Complement proteins were
   similar in type 1 and type 2 diabetes, but were much more substantial
   in advanced than in early-to-moderate DKD. The latter was examined for
   the top proteins in a three-cohort study of subjects with both diabetes
   types, followed for up to 5 years. This study also provided evidence
   that the kidney tissue most likely is the source of the excess of
   urinary levels of the top Complement proteins. The following is a
   discussion of the study findings regarding existing literature,
   implications for mechanisms of DKD development, possible therapeutic
   targets, and the potential development of prognostic tests to identify
   subjects with diabetes at risk of DKD progression.

   The top proteins associated with prospective kidney failure point to
   multiple specific components of the Complement system. In DKD, the
   roles of the lectin pathway and anaphylatoxins have been well
   established^[158]14,[159]19–[160]21. Our study confirms these findings
   and expands them further by implicating previously unreported roles of
   C2, the alternative pathway, and components of MAC.

   Urinary and kidney tissue data in our study concordantly indicate a
   role of C2 in DKD tissue. Most studies in DKD have not measured C2,
   and thus little is known about it. One mass spectrometry study
   reported that C2 was associated with kidney failure^[161]20.

   Anaphylatoxins C5a (our second top protein) and C3a have potent
   inflammatory properties. Previous small-scale studies reported elevated
   C5a and C3a levels in the urine or kidney tissue immunostaining in
   DKD^[162]21–[163]24, while our single-cell analyses similarly showed
   differential C5 increases in proximal tubules. Subject-level C3a
   heterogeneity in our tissue data aligns with an earlier C3
   immunostaining study^[164]25; whereas animal studies have demonstrated
   metabolic and renal benefits of C5a and C3a
   inhibition^[165]15,[166]19,[167]23.

   The third top protein, CL-K1, a newly described lectin pathway member,
   is strongly implicated in tubulointerstitial injury^[168]26. However,
   unlike other Complement components, lectin members (CL-K1 and
   mannose-binding lectin (MBL)) may not originate from the kidney.
   Concordantly, CL-K1 transcripts were present only in infiltrating
   cells. These observations align with weak MBL immunostaining in the
   kidney^[169]27 and evidence for circulating MBL^[170]28.

   Alternative pathway CFH ranked as the fifth protein, with other
   members, CFB and CFP, nearly as strong. The alternative pathway has
   also not received much attention so far. Recent studies reported
   associations of select proteins (CFH, CFB) with kidney failure in type
   2 diabetes^[171]20,[172]29,[173]30. CFH protein had the most increased
   expression in diabetic kidney among our top proteins. CFH RNA
   expression was highly abundant in our and other
   studies^[174]25,[175]31. In contrast, genetic or acquired CFH
   deficiencies underlie the biology of other immune-mediated
   glomerulonephritides^[176]32,[177]33. A better understanding of these
   seemingly divergent relationships is needed.

   All three upstream pathways ultimately lead to the formation of MAC.
   Urinary C6 and C7 (the fourth and sixth proteins, respectively) were
   more strongly associated with kidney outcomes than other MAC components.
   Levels of C6, C7, or MAC were shown to associate with DKD progression
   in focused biomarker studies^[178]20,[179]27,[180]29. Our data
   demonstrate especially abundant C7 in kidney fibroblasts, concordantly
   with other transcriptomics studies^[181]25,[182]34. C6 contributes to
   tubulointerstitial injury in animal models^[183]35,[184]36, also shown
   in the proximal tubular region in our study. Among other Complement
   components, a considerable number of closely related coagulation and
   kallikrein-kinin proteins were increased. Cross-talk between
   Complement and coagulation was previously implicated^[185]15,[186]37.

   Our study offers exploratory insights into the Complement relationships
   with other molecular indices. Urinary Complement correlated
   substantially with an inflammatory biomarker, MCP1. Indeed,
   anaphylatoxins, collectins, or MAC are recognized for their
   pro-inflammatory properties^[187]19,[188]38. Correlations with the
   tubular biomarker LFABP were marked and aligned with our expression
   data and with the literature pointing to tubulointerstitial
   injury^[189]19,[190]35,[191]36. Complement links to IgA and IgM,
   biomarkers of kidney filtration barrier damage^[192]39,[193]40 that are
   also implicated in other Complement diseases^[194]33,[195]38,[196]41,
   were strong. TNFR1 is an established systemic biomarker of DKD
   progression^[197]5,[198]22,[199]23, and a circulating TNFR-enriched
   signature, KRIS, was recently reported^[200]17. The chord diagram shows
   that only select circulating KRIS proteins, including DAF (which is a
   Complement regulatory protein), correlated with the urinary Complement.
   Only limited evidence in the literature exists linking the
   two^[201]42–[202]44.

   Due to the paucity of kidney failure in earlier DKD^[203]45,[204]46, we
   used clinically recognized surrogate endpoints: continuous GFR slope
   and binary 30% or more decline in GFR. GFR slope is an attractive
   outcome because it universally translates across disease stages,
   accommodates varying lengths of observation, and maximizes the study
   power. Other than CL-K1, the top proteins were associated with
   accelerated early kidney function loss. Associations remained
   significant after adjustment for a
   limited number of clinical covariates. Effect sizes were smaller than
   in advanced DKD but still remarkably concordant between these three
   independent cohorts differing by study design, recruitment sites, and
   racial/ethnic background. This observation partially aligns with a
   previous concept that Complement is more relevant in advanced
   disease^[205]40.

   Complement associations in our study showed a remarkable concordance
   between type 1 and type 2 diabetes. It was visible in comparisons of
   Complement proteomes in the advanced DKD as well as for comparisons of
   top proteins in early-to-moderate DKD cohorts. Most existing Complement
   research was conducted in type 2 diabetes, except for circulating
   lectin pathway biomarkers, which were largely investigated in type 1
   diabetes^[206]14,[207]15,[208]47. It is becoming evident, from this
   study and from reports by others^[209]17,[210]48,[211]49, that the
   proteomics phenotypes of DKD often overlap between type 1 and type 2
   diabetes. Because our study design prioritized DKD stage over diabetes
   type, Complement proteins highly specific to only type 1 or type 2
   diabetes may not have been
   identified. Further evaluation of Complement in both diabetes types is
   recommended. Of note, most interventional studies in DKD have typically
   focused on just one diabetes type. Moreover, although there are sex
   differences in prospective kidney risks in DKD^[212]50, and also in
   immune and Complement responses^[213]51,[214]52, we did not observe
   interactions between sex and Complement proteins.

   Although Complement presence in the diabetic kidney was previously
   attributed to non-specific circulating protein
   deposition^[215]14,[216]41, our integrated evidence across three
   matrices strengthens the notion that Complement is likely produced in
   the kidney. Urinary proteins were far more strongly associated with the
   kidney outcomes than circulating ones. Although urinary Complement
   proteins did correlate with albuminuria, the patterns of associations
   did not correlate with the molecular weight of the proteins, speaking
   against the Complement increase being a result of simple leakage.
   Moreover, our tissue proteomics and single-cell and spatial
   transcriptomics studies indicated increased kidney capability of
   production of select molecules, as also selectively implied
   before^[217]25,[218]31,[219]34.

   Our study design utilized a discovery approach to identify a pathway
   enrichment in two advanced DKD cohorts, with a subsequent focus on the
   top pathway. The associations were remarkably strong in the analyses
   adjusted for multiple testing and clinical covariates. Internal
   validation includes an examination of the concordant signals between
   type 1 and type 2 diabetes, of more than one kidney outcome, of more
   than one computational approach, and supported further by analytical
   method validation. Kidney tissue data comprising the Complement
   proteome and single-cell and spatial transcriptomics for the top
   signals, together with other molecular indices of the disease, offer
   biological insights and a molecular validation of our findings.
   Subsequently, we evaluated whether Complement associations
   extend^[220]53 to earlier disease stages in a three-cohort study.
   Employment of well-characterized cohorts in this translational project
   is a considerable study strength.

   Such a comprehensive evaluation in biofluids like serum or urine has
   only recently been enabled by innovative proteomics. Our study is the
   first attempt to evaluate the urinary proteome with high-throughput
   aptamer proteomics in any prospective study of a chronic disease.
   Furthermore, it offers the highest resolution of the Complement
   proteome reported so far. Proteins are close to the disease phenotype
   and, as such, they are often biomarkers and drug targets. Urinary
   albumin is currently the
   only protein used in clinical care. The SomaScan platform takes
   advantage of aptamers, allowing for high-throughput, great sensitivity,
   broad dynamic range, and capabilities of teasing apart intact and split
   products (intact C5 from C5a, for example); altogether, as a result,
   outperforming other protein techniques^[221]54,[222]55. Our study
   expands on earlier targeted biomarker or mass spectrometry approaches
   for Complement^[223]20–[224]22,[225]29,[226]30,[227]47. Our study did
   not confirm select proteins like urinary DAF or CD59 reported
   elsewhere, likely owing to different technologies or an inability
   to measure a glycated form^[228]20,[229]30,[230]47. Measurements in
   advanced DKD were performed with aptamer proteomics, whereas those in
   early-to-moderate DKD were performed with targeted assays. Our
   substantial orthogonal assay validation showed excellent
   reproducibility between the two for all proteins but CL-K1.

   We supported these high-throughput data with advanced computational
   analyses. Although machine learning and artificial intelligence-based
   approaches are becoming important partners in big data
   analyses^[231]56,[232]57, our biostatistical models offered the best
   performance.

   Notably, Complement therapies that target top proteins reported in
   our study are available in other kidney diseases. Those include
   prominent targets for C5, C5a or C5aR1 (from preclinical development to
   FDA-approved therapies)^[233]15,[234]32, alternative factors (up to
   phase III), or C6 (preclinical). Our study may spark interest in
   developing DKD therapies targeting Complement.

   The therapeutic landscape of DKD differs substantially from the
   landscape of diseases targeted with Complement
   inhibitors^[235]3,[236]4,[237]14,[238]45. Therapies in DKD are applied
   to large sectors of the population, whereas complement-mediated
   glomerulonephritides are classified as rare diseases and have a more
   acute disease course^[239]33,[240]58. Complement inhibitors often
   feature narrow safety profiles and substantial costs^[241]15. Thus, a
   potential therapy in DKD will likely focus on specific subpopulations,
   underscoring the need for biomarker guidance. The knowledge gained in
   our study is arguably substantial enough to inform such precision
   medicine
   approaches. Our study offers an evaluation across kidney disease stages
   and diabetes types, and in the context of well-established clinical
   outcomes^[242]3,[243]59. It points to urinary Complement proteins as
   biomarkers of overall kidney risks, particularly in advanced DKD, where
   the associations were independent from clinical covariates and
   increased prognostication ability. Our cluster analyses across advanced
   and early stages, employed similarly in another Complement
   disease^[244]59, show differential risks on top of the classical
   enrollment based on DKD stages.

   The following are limitations of our work. Our findings may not be
   transferable to DKD without albuminuria, non-diabetic kidney disease,
   or other diabetic complications like cardiovascular mortality (for
   details on the study generalizability, see Supplementary Data [245]12).
   We did not perform repeated Complement measurements over time; however,
   we do offer insights across kidney disease stages. We did evaluate
   Complement associations with the ultimate outcome (prospective kidney
   failure) in advanced DKD cohorts, whereas we were not able to do so in
   early/moderate DKD cohorts, because kidney failure was rare in those
   subjects. We identified Complement proteins as risk factors. Other
   studies may be better powered to identify protective patterns
   (downregulated proteins)^[246]49,[247]60. Our study included cohorts of
   Caucasian subjects, and it also included one cohort of African American
   subjects^[248]61,[249]62. Future studies across populations at
   increased kidney risk, like Pima Indians^[250]63 or others, will
   complete the picture. Our proteomics technology did not allow us to
   separate glomerular and tubular compartments, and we also did not
   perform tissue immunostaining. To gain insights into the cellular
   origin, we used high-resolution single-cell and spatial transcriptomics
   instead. Lastly, although we cannot formally distinguish whether our
   results reflect the Complement activation or an accelerated turnover,
   our data provide almost overwhelming evidence of Complement engagement
   in the disease progression.

   In summary, our findings provide robust evidence of the role of the
   Complement proteome in progressive diabetic kidney disease in type 1
   and type 2 diabetes. It is particularly pronounced in advanced disease
   stages and likely attributed to local kidney involvement. Our study
   provides important biological insights and solid biomarker guidance to
   inform drug development strategies targeting Complement in diabetic
   kidney disease.

Methods

Study oversight

   The current study adhered to all relevant ethical regulations, and all
   the protocols were approved by the Committee on Human Studies at Joslin
   Diabetes Center. The institutional review board at each site, including
   the Joslin Diabetes Center Committee on Human Studies for the Joslin
   Kidney Studies, the National Institute of Diabetes and Digestive and
   Kidney Diseases (NIDDK)-appointed data and safety monitoring board for
   the Preventing Early Renal Loss in Diabetes (PERL) study, the Wake
   Forest University School of Medicine Institutional Review Board for the
   Diabetes Heart Studies and the University of Pennsylvania Institutional
   Review Board for the kidney tissue study, approved the protocols for
   the respective parent studies. All patients provided pre-enrollment
   written informed consent.

Advanced DKD – two-cohort study

Study population

   The advanced DKD study included one type 1 diabetes cohort comprising 189
   subjects (37% were female) and one type 2 diabetes cohort of 115
   subjects (36% were female) from the Joslin Kidney Study, which is a
   prospective, observational investigation of the natural history and
   molecular determinants of DKD
   progression^[251]2,[252]17,[253]49,[254]64,[255]65. Advanced DKD was
   defined as impaired kidney function (glomerular filtration rate (GFR)
   categories; G3: 30–59 or G4: 15–29 ml/min/1.73 m^2) and moderately or
   severely increased albuminuria (albuminuria categories; A2: 30–299 or
   A3: ≥ 300 mg/g creatinine) at baseline. Subjects were followed for 7–15
   years.

Early/Moderate DKD – three-cohort study

Study population

   The type 1 diabetes cohort comprised 207 subjects (26% were female)
   with A2 or A3 albuminuria and GFR ranging from 40 to
   99.9 ml/min/1.73 m^2 from the PERL trial. Participants were followed
   for 3 years and 2 months^[256]66.

   The type 2 diabetes cohort comprised 322 subjects (33% were female)
   from the Joslin Kidney Study with normal kidney function (GFR
   categories; G1: ≥ 90 or G2: 60–89 ml/min/1.73 m^2) and A2 or A3
   albuminuria. Participants were followed for 5 years.

   The type 2 diabetes African American cohort comprised 123 subjects (52%
   were female) from the Diabetes Heart Studies^[257]61,[258]62 with
   normal kidney function (93% subjects with G1 or G2), and A2 or A3
   albuminuria. Participants were followed for 5 years.

Study outcomes

Incident kidney failure – primary outcome

   The primary outcome in the two advanced DKD cohorts, namely kidney
   failure, was ascertained based on the national registries. The United
   States Renal Data System (USRDS) governs a roster of patients receiving
   kidney replacement therapy, which includes dates of therapy initiation.
   The National Death Index (NDI) is a database comprising dates and
   causes of death. Incident kidney failure was counted against USRDS for
   subjects who remained alive, or counted against NDI if kidney failure
   was listed as a cause of death.

   Subjects were censored either at the time of death (unrelated to kidney
   failure), date of last GFR measurement, or at 10 years.

   Since the primary outcome was rare in the early-to-moderate DKD
   cohorts, in order to allow for comparisons across four cohorts, we
   utilized the following secondary outcomes: continuous GFR slope, and
   binary outcomes of ≥ 30% GFR decline or incident kidney failure.

   The kidney endpoint definitions and evidence strength used above are
   based on the scientific workshop co-sponsored by the National Kidney
   Foundation and the US Food and Drug Administration held in
   2020^[259]45.

GFR slope – continuous outcome

   GFR slope is recognized as a clinically valid surrogate outcome of DKD
   progression^[260]45,[261]67. A number of studies based on the Joslin
   Kidney Study participants showed that the vast majority of subjects
   have linear or almost linear GFR slopes within the course of
   DKD^[262]2.

   In this study, the annual rate of kidney function decline (GFR slope),
   expressed in ml/min/1.73 m^2 per year, was estimated with
   subject-specific trajectories from time series of GFR calculated from
   serum creatinine using the CKD-EPI formula, including the baseline GFR
   value.
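
   A subject-specific slope of this kind can be sketched with an ordinary
   least-squares line through one subject's GFR time series. This is a
   simplified illustration: the function name and inputs are hypothetical,
   and the use of per-subject least squares (rather than, for example, a
   mixed-effects model) is an assumption, since the estimator is not
   detailed here.

```python
def gfr_slope(years, gfr_values):
    """Least-squares slope of GFR over time for one subject.

    years      -- time since baseline for each measurement (baseline = 0)
    gfr_values -- GFR in ml/min/1.73 m^2 at those times
    Returns the slope in ml/min/1.73 m^2 per year (negative = decline).
    """
    n = len(years)
    if n != len(gfr_values) or n < 2:
        raise ValueError("need at least two paired measurements")
    mean_t = sum(years) / n
    mean_g = sum(gfr_values) / n
    # Slope = covariance(time, GFR) / variance(time).
    stt = sum((t - mean_t) ** 2 for t in years)
    if stt == 0:
        raise ValueError("all measurements share the same time point")
    stg = sum((t - mean_t) * (g - mean_g)
              for t, g in zip(years, gfr_values))
    return stg / stt
```

   For example, a subject measured at 60, 55, 50, and 45 ml/min/1.73 m^2
   at years 0 to 3 has a slope of −5 ml/min/1.73 m^2 per year.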

GFR-based binary outcomes

   Kidney failure and a 40% or more decline in GFR were scarce in the
   early-to-moderate DKD cohorts. Thus, we utilized the composite outcome
   of a 30% or more decline in GFR or kidney failure. To harmonize the
   length of observation, the advanced DKD cohorts were censored at 5
   years, so that the length of observation aligned with the
   early-to-moderate DKD cohorts with type 2 diabetes. The maximum length
   of observation in the early-to-moderate DKD in type 1 diabetes was 3
   years and 2 months.
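
   The composite binary outcome can be illustrated as follows. The
   function name, argument layout, and the rule that a single follow-up
   value at or below the threshold counts as a decline are assumptions
   made for this sketch; it is not stated here whether a decline needed
   to be confirmed by a subsequent measurement.

```python
def composite_outcome(baseline_gfr, followup, kidney_failure_year=None,
                      horizon_years=5.0, decline_threshold=0.30):
    """True if kidney failure or a >= 30% GFR decline occurs in the window.

    baseline_gfr        -- GFR at baseline (ml/min/1.73 m^2)
    followup            -- list of (years_since_baseline, gfr) tuples
    kidney_failure_year -- time of kidney failure, or None if it did not occur
    """
    # Kidney failure within the observation window satisfies the outcome.
    if kidney_failure_year is not None and kidney_failure_year <= horizon_years:
        return True
    for year, gfr in followup:
        if year > horizon_years:
            continue  # measurements after the window are ignored
        relative_decline = (baseline_gfr - gfr) / baseline_gfr
        if relative_decline >= decline_threshold:
            return True
    return False
```

   Setting horizon_years to the shorter observation window of a given
   cohort mirrors the harmonization of follow-up length described above.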

Complement proteome determinations

High-throughput proteomics

   All specimens were stored at − 80 °C until subjected to proteomics
   analysis. High-throughput proteomics profiling was performed at the
   Proteomics Core, Beth Israel Deaconess Medical Center in Boston, MA,
   using the SomaScan platform^[263]17,[264]54,[265]55,[266]68 (See
   Supplementary Data [267]13 for the list of proteins measured).

   We used an aptamer platform for our proteomics determinations. Aptamers
   are unique, single-stranded sequences of nucleic acids that recognize
   folded, 3-dimensional structures of protein epitopes with high affinity
   and specificity. This property is further enhanced with the Slow
   Off-rate Modified Aptamers (SOMAmers). This platform transforms each
   individual protein concentration into a specific corresponding bound
   SOMAmer reagent, such that the end result is directly proportional to
   the target amount of protein in the original sample. The samples are
   incubated with aptamers to form aptamer-protein complexes. Subsequent
   washing steps eliminate non-specifically bound or non-bound aptamers
   and proteins. Next, aptamers are quantified by hybridization with
   probes complementary to aptamer sequences (Agilent Technologies, Santa
   Clara, CA). The assay readout is reported in relative fluorescence
   units^[268]12,[269]54.

   Proteomics profiling in urine was performed using the Cells and Tissue
   Lysate 1.3 k kit (SomaLogic, Boulder, CO) according to the
   manufacturer’s recommended protocol. Urine samples from Joslin Kidney
   Study subjects with advanced DKD were assayed in batches of 26 samples
   each. Samples were balanced on the plates by prospective case status,
   which was blinded to the operating laboratory personnel. Instead of
   using the manufacturer’s calibration controls, we created a custom
   in-house pooled urine generated based on a large roster of 121 subjects
   that reflected the baseline phenotype and composition of the advanced
   DKD cohorts used for proteomics. Five pooled replicate samples were run
   on each batch and were used for inter-run calibration. First, the data
   were normalized to remove hybridization variation within a run.
   Subsequently, scaling was performed on a per-batch basis to remove
   intensity differences between runs. Plate scale factor, which is
   derived from calibrator sample values, ranged from 0.9 to 1.32
   (accepted range: 0.4–2.5). All subject-level and protein-level data
   for urine determinations passed the SOMAscan assay quality-control
   criteria and were fit for analysis. In addition, we incorporated an
   internal, urine control from four subjects with a comparable DKD
   phenotype run on every other batch that we used in determinations of
   the coefficients of variation (CV). The distribution of inter-assay
   coefficients of variation of all Complement proteins measured in urine
   on the aptamer platform is shown in Supplementary Fig. [270]6a.

   To evaluate detectability, we determined background noise based on the
   21 buffer replicates measured on the array. The limit of detection was
   defined as the mean buffer value plus 2 standard deviations. A protein
   was defined as having very good detectability if it was detected in
   ≥ 70% of our samples from the two-cohort advanced DKD study. In other
   words, it was detected in more than two tertiles of the protein
   distribution in our study population, allowing for analyses per
   tertile change. A protein was defined as having good detectability if
   it was detected in ≥ 50% of our samples. In other words, it was
   detected in at least half of our study population, allowing for
   analyses above/below the median. Non-well-detectable proteins were
   analyzed and
   categorized by the detection threshold (Supplementary Data [271]5).
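
   The detectability rules above can be expressed as a short sketch. The
   function names and label strings are hypothetical, and two details are
   assumptions: the sample (n − 1) standard deviation is used for the
   buffer replicates, and a sample counts as detected when its signal is
   strictly above the limit of detection.

```python
from statistics import mean, stdev


def limit_of_detection(buffer_signals, n_sd=2.0):
    """Mean of the buffer replicates plus n_sd standard deviations."""
    return mean(buffer_signals) + n_sd * stdev(buffer_signals)


def detectability(sample_signals, lod):
    """Return (fraction of samples above the LOD, category label)."""
    detected = sum(1 for s in sample_signals if s > lod)
    fraction = detected / len(sample_signals)
    if fraction >= 0.70:
        label = "very good"            # supports per-tertile analyses
    elif fraction >= 0.50:
        label = "good"                 # supports above/below-median analyses
    else:
        label = "not well detectable"  # analyzed by detection threshold
    return fraction, label
```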

   Proteomics profiling of the Complement proteins in plasma was performed
   using the Human Plasma SOMAscan 1.3 k kit (SomaLogic, Boulder, CO)
   according to the manufacturer’s standardized protocol as described
   elsewhere^[272]17. Data normalization was done according to SOMAscan
   assay data quality-control procedures as described above for urine. In
   addition, median signal normalization was applied, which accounts for
   sample-to-sample differences in total protein concentration and other
   systematic variations within a plate run.

   Proteomics profiling in kidney tissue was performed using the Cells and
   Tissue Lysate 1.3 k kit (SomaLogic, Boulder, CO) according to the
   manufacturer’s protocol. Kidney tissue acquisition and specimen
   processing for proteomics were previously described^[273]68 (see also
   Supplementary Methods). Hybridization, normalization, and plate scaling
   were applied. Principal component-based score plot revealed one
   subject-level outlier, subsequently removed from the study. All other
   data were fit for the analyses.

Complement proteome annotations

   The portfolio of 110 Complement proteins was assembled based on Kyoto
   Encyclopedia of Genes and Genomes^[274]69 (KEGG pathway – hsa04610:
   Complement and coagulation cascades; hsa01002: peptidases and
   inhibitors – peptidases inhibitors – Family I4: serpin family),
   Reactome^[275]70 (R-HSA-166658: Complement cascade; R-HSA-140877:
   Formation of Fibrin Clot; R-HSA-75205: Dissolution of Fibrin Clot) and
   UniProt^[276]71 databases (Family: peptidase S1 family Kallikrein
   subfamily). We also added Complement component C1q receptor (C1QR1)
   and Complement component 1q subcomponent binding protein (C1QBP) to
   our final Complement roster. The molecular weight of
   each Complement protein was curated using the UniProt database.

Targeted measurements of Complement proteins

   High-throughput aptamer proteomics as performed in advanced DKD cohorts
   is an excellent tool in discovery; however, it is an unlikely strategy
   for focused biomarker studies. Thus, we sought to orthogonally validate
   our proteomics measurements with targeted, antibody-based, single or
   low-multiplex solutions^[277]29,[278]72,[279]73.

   The MicroVue Complement Panel 2 was used for measurements of C2,
   whereas Panel 1 was used to quantify C5a desArg protein levels (Cat.
   No. A916 and A900, respectively, Quidel, San Diego, CA). C5a desArg
   (C5a without arginine), a stable product of C5a, was measured as an
   alternative to C5a. The Quansys platform used for the readout is a
   chemiluminescent imager that supports quantitative analysis of 96-well
   plate-based immunoassays. Each well has nano spots coated with
   protein-specific capture antibodies. The Q-View Imager LS features
   18-megapixel resolution and a rapid read time of 270 s (Quansys
   Biosciences, Logan, UT). The remaining four Complement proteins were
   measured using enzyme-linked immunosorbent assays (ELISAs). These
   included CL-K1 (Cat. No. LS-F35879, LifeSpan BioSciences, Seattle,
   WA), CFH (Cat. No. A039, Quidel, San Diego, CA), C6 and C7 (Cat. No.
   ab125965 and ab125964, respectively, Abcam, Cambridge, UK).

   All assays were performed according to manufacturer instructions. Urine
   specimens were diluted 1:2, except for CL-K1 and C7, which were diluted
   1:5. All ELISA measurements were analyzed using 5-parameter logistic
   (5PL) curve fitting. On the low-multiplex
   platform, the auto-fit function of the Q-View Software v3.13 was used
   to choose between 4-parameter and 5-parameter logistic curve fitting.
   Our initial analyses demonstrated a superior performance of the
   vendor-provided algorithm built into the auto-function over the fixed
   curve fitting with the 5PL method.
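
   As an illustration of the curve model named above, the following
   Python sketch evaluates a 5PL standard curve and inverts it to
   back-calculate a concentration from a signal. The parameterization,
   function names, and parameter values are assumptions chosen for
   illustration; the vendor software may use a different but equivalent
   form. Setting the asymmetry parameter g to 1 gives the 4-parameter
   curve.

```python
def five_pl(x, a, b, c, d, g):
    """5PL response at concentration x (x > 0).

    a: response at zero concentration, d: response at infinite concentration,
    c: concentration scale, b: slope, g: asymmetry (g = 1 gives the 4PL curve).
    """
    return d + (a - d) / (1.0 + (x / c) ** b) ** g


def inverse_five_pl(y, a, b, c, d, g):
    """Back-calculate the concentration that produces response y."""
    ratio = (a - d) / (y - d)
    return c * (ratio ** (1.0 / g) - 1.0) ** (1.0 / b)


# Hypothetical curve parameters, chosen only for illustration
params = dict(a=0.05, b=1.2, c=50.0, d=3.0, g=0.8)
signal = five_pl(10.0, **params)
concentration = inverse_five_pl(signal, **params)  # round trip back to ~10.0
```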

   Orthogonal method validation was performed in baseline urine from 37
   subjects with type 1 diabetes from the advanced DKD cohort. Because of
   limited sample volume, we also performed targeted measurements for only
   two of the six proteins (C5a and CFH) in 165 subjects with type 1 and
   107 subjects with type 2 diabetes from the advanced DKD cohorts.

   Subsequently, we performed targeted biomarker measurements in urine
   specimens of the three early-to-moderate DKD cohorts. The protein
   detectability was excellent across the cohorts, ranging from 98% to 100%,
   except for CFH, in which the detectability was 88%. A cohort-specific
   half-minimum value was assigned to samples that fell below the limit of
   detection. Our in-house controls, which were a pool of cohort-specific
   samples, were used to evaluate inter-assay CVs. Inter-assay CVs were
   less than 14%, except for C5a, which had a CV of 32%.
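
   The two quality-control steps above can be sketched in Python as
   follows. This is a minimal illustration, assuming that the
   half-minimum value means half of the lowest detected value within a
   cohort and that below-detection readings are coded as None; the
   function names and example readings are hypothetical.

```python
from statistics import mean, stdev


def impute_half_minimum(values):
    """Replace below-detection readings (None) with half the cohort minimum."""
    detected = [v for v in values if v is not None]
    half_min = min(detected) / 2.0
    return [half_min if v is None else v for v in values]


def inter_assay_cv(control_readings):
    """Coefficient of variation (%) of a pooled control measured across plates."""
    return 100.0 * stdev(control_readings) / mean(control_readings)


# Hypothetical readings for one cohort; None marks a value below the detection limit
cohort = impute_half_minimum([4.0, None, 2.0, 8.0])
cv_percent = inter_assay_cv([10.0, 12.0, 14.0])
```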

Statistical analysis

   Continuous variables were presented as means and standard deviations or
   medians (25^th and 75^th percentiles) as applicable. Categorical
   variables were provided as counts and percentages.

   The pathway enrichment analysis was done with the Database for
   Annotation, Visualization and Integrated Discovery (DAVID) using the
   full set of proteins (n = 1,305) measured on the SOMAscan platform as the
   background. Class enrichment by the over-representation method was
   tested using two-sided Fisher’s exact tests, taking into account the
   significant proteins and the number of proteins that were present
   within each respective class, and compared with the rest of the
   proteins (including significant and non-significant) measured on the
   high-throughput platform.
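
   The over-representation test described above reduces to a 2 × 2 table:
   significant versus non-significant proteins, inside versus outside a
   class. The Python sketch below computes a two-sided Fisher's exact
   P-value from the hypergeometric distribution. It is an illustration
   only, not the DAVID implementation, and the function name and example
   counts are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    a: significant proteins in the class, b: significant proteins outside it,
    c: non-significant proteins in the class, d: non-significant outside it.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest, highest = max(0, col1 - row2), min(row1, col1)
    p_observed = prob(a)
    # Sum all tables that are no more probable than the observed one
    total = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1.0 + 1e-7)
    )
    return min(1.0, total)


# Hypothetical counts against a background of 1,305 measured proteins
p_value = fisher_exact_two_sided(20, 80, 90, 1115)
```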

   In the advanced DKD two-cohort study, we used a Cox proportional
   hazards model to test associations of urinary proteome with the primary
   outcome. Tied failure times were handled using the exact method in the
   Cox proportional hazards model. The effect sizes were expressed as
   hazard ratios per one tertile change in the urinary creatinine-adjusted
   protein distribution with corresponding 95% confidence intervals. The
   plots of the Martingale residuals against the covariates were within
   the distribution of the observed curves, indicating an acceptable model
   fit. Proportionality tests using time-dependent covariates
   (interactions between covariates and a function of survival time) were
   not significant, indicating that the proportional hazards assumption
   was not violated.

   Measured confounding was evaluated with changes in β effect estimates
   (difference between the partially adjusted and base model β effect
   estimates divided by the base model estimates, where β is the natural
   logarithm of the hazard ratio). A change in the β effect estimate of
   20% or higher was deemed non-negligible.
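
   The change-in-estimate criterion above can be written out as a short
   Python sketch. The function names are hypothetical, and treating a
   shift of 20% in either direction as non-negligible is an assumption of
   this example.

```python
from math import log


def beta_change(hr_base, hr_adjusted):
    """Relative change in beta = ln(HR) between the base and adjusted models."""
    beta_base = log(hr_base)
    beta_adjusted = log(hr_adjusted)
    return (beta_adjusted - beta_base) / beta_base


def is_non_negligible(hr_base, hr_adjusted, threshold=0.20):
    """Flag confounding when beta shifts by at least the threshold (either sign)."""
    return abs(beta_change(hr_base, hr_adjusted)) >= threshold
```

   For example, a hazard ratio that falls from 2.0 in the base model to
   about 1.41 after adjustment halves β, which exceeds the 20% criterion.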

   Unmeasured confounding was estimated with an E-value by VanderWeele and
   Ding^[281]74, a sensitivity analysis tool intended for observational
   studies. The E-value quantifies the minimum strength of the association
   of a hypothetical, unmeasured confounder that would explain away the
   association between the exposure and the outcome. A large E-value
   suggests that it is unlikely that the exposure-outcome relationship
   could be explained purely by unmeasured confounding.
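
   For a risk ratio RR above 1, the E-value has the closed form
   RR + sqrt(RR × (RR − 1)). The Python sketch below applies this formula
   and inverts ratios below 1 first. Feeding a hazard ratio directly into
   it is an assumption of this example (a rare-outcome approximation);
   the text does not state which conversion, if any, was applied.

```python
from math import sqrt


def e_value(ratio):
    """E-value for a risk ratio; protective ratios (< 1) are inverted first."""
    rr = ratio if ratio >= 1.0 else 1.0 / ratio
    return rr + sqrt(rr * (rr - 1.0))
```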

   There were no missing data in the key clinical covariates or in the
   urinary proteome determinations. Our univariable and partially adjusted
   Cox models had all the data available. Cox models were adjusted for
   age, sex, race, diabetes type, diabetes duration, body mass index,
   systolic and diastolic blood pressures, serum cholesterol, hemoglobin
   A1c, GFR, albuminuria, smoking status, insulin use,
   renoprotective/other antihypertensives, and lipid-lowering treatments.
   There were small numbers of missing data in select clinical variables
   used in the fully adjusted Cox models: systolic (n = 1, 0.3%) and
   diastolic (n = 1, 0.3%) blood pressures, total cholesterol (n = 14, 4%)
   and high-density lipoprotein (n = 15, 5%), smoking (n = 7, 2%) and
   lipid lowering treatment (n = 2, 0.6%). Data missingness was handled
   with a multiple imputation approach under an assumption of missing at
   random^[282]75 and comprised three phases: imputation, analysis, and
   pooling phase. We employed the fully conditional specification^[283]76
   method, which uses a discriminant function method for
   binary/categorical variables (smoking and lipid-lowering treatment) and
   a linear regression method for continuous variables (systolic and
   diastolic blood pressure, total cholesterol, and high-density
   lipoprotein). We created 10 imputed datasets for missing variables and
   then analyzed each of the 10 complete datasets using the Cox
   proportional hazards regression model. The parameter estimates (e.g.,
   imputation-specific coefficients and standard errors) from each of the
   10 imputations were pooled into a single set of results for inference
   using Rubin’s rules^[284]77. In addition, complete-case analyses in the
   advanced DKD cohorts (93% with no missing data) yielded highly
   comparable results. There were 27 out of 110 proteins
   that were not very well detectable. We analyzed those using two-sided
   Fisher’s exact tests. Of these, 13 well-detectable proteins were
   categorized as above or below the median, while the remaining 14
   non-well-detectable proteins were categorized above or below their
   detection threshold.
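
   The pooling phase of the multiple imputation procedure described above
   can be sketched in Python. This is a minimal illustration of Rubin's
   rules for one coefficient, not the SAS procedure used in the study;
   the function name and the example inputs are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def pool_rubin(estimates, standard_errors):
    """Pool per-imputation coefficients and standard errors with Rubin's rules."""
    m = len(estimates)
    pooled = mean(estimates)                           # pooled coefficient
    within = mean([se ** 2 for se in standard_errors])  # within-imputation variance
    between = variance(estimates)                      # between-imputation variance
    total = within + (1.0 + 1.0 / m) * between         # total variance
    return pooled, sqrt(total)


# Hypothetical log hazard ratios and standard errors from three imputed datasets
pooled_beta, pooled_se = pool_rubin([0.5, 0.6, 0.7], [0.1, 0.1, 0.1])
```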

   Life tables of the 10-year cumulative incidence of kidney failure were
   generated using the Kaplan-Meier method. Homogeneity across the
   tertiles treated as a three-level categorical variable was evaluated
   with a log-rank test.
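
   A minimal Python sketch of the Kaplan-Meier product-limit estimator
   follows, with cumulative incidence taken as one minus survival. The
   function names and the coding of events (1 = kidney failure,
   0 = censored) are assumptions of this example, and competing events
   are not modeled here.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct event time.

    times: follow-up in years; events: 1 = kidney failure, 0 = censored.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        failures = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        leaving = sum(1 for ti in times if ti == t)
        if failures:
            survival *= 1.0 - failures / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # failures and censored subjects leave the risk set
    return curve


def cumulative_incidence(curve):
    """Cumulative incidence as the complement of the survival estimate."""
    return [(t, 1.0 - s) for t, s in curve]
```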

   We used a weighted Cohen’s kappa test^[285]78 to evaluate the degree of
   agreement of Complement associations with kidney outcomes between type
   1 and type 2 diabetes advanced DKD cohorts. Association strengths
   (P-values) from the cohort-specific crude Cox proportional hazards
   model were categorized into quintile ranks. Next, weights were computed
   using the equal-spacing method. Weighted kappa coefficients range from
   0 to 1, where a coefficient closer to 1 indicates greater
   agreement^[286]79.
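
   The agreement measure above can be sketched in Python. The example
   assumes that both cohorts' proteins have already been assigned
   quintile ranks 1 to 5 and uses equal-spacing (linear) weights,
   w = 1 − |i − j| / (k − 1). The function name is hypothetical and this
   is not the vcd implementation.

```python
def weighted_kappa_linear(ranks_a, ranks_b, k=5):
    """Weighted Cohen's kappa with equal-spacing weights for ranks 1..k."""
    n = len(ranks_a)
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(ranks_a, ranks_b):
        observed[a - 1][b - 1] += 1.0 / n
    row_totals = [sum(observed[i]) for i in range(k)]
    col_totals = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    agree_observed = 0.0
    agree_expected = 0.0
    for i in range(k):
        for j in range(k):
            weight = 1.0 - abs(i - j) / (k - 1)
            agree_observed += weight * observed[i][j]
            agree_expected += weight * row_totals[i] * col_totals[j]
    return (agree_observed - agree_expected) / (1.0 - agree_expected)
```

   Identical rank vectors give a coefficient of 1, and ranks that agree
   no more often than chance give a coefficient of 0.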

   Spearman rank-order correlations were used as a non-parametric measure
   of associations to evaluate urinary/circulating Complement pairs with
   the prospective kidney outcomes, urinary Complement with clinical
   covariates, and molecular indices of diabetic kidney injury. The
   continuous variables in two-group comparisons were compared using
   analysis of variance. Dispersion of each protein in the kidney tissue
   was determined as a ratio of the standard deviation over the mean
   value, expressed as a percentage. We used Pearson parametric
   correlations for orthogonal validation between high-throughput and
   targeted protein measurements.

   The prognostic model performance in the two-cohort advanced DKD study
   was evaluated using a Cox proportional hazards regression model. The
   clinical model (Model 1) contained age, sex, diabetes type, hemoglobin
   A1c, GFR, and albuminuria. Model 2 consisted of the top 5% Complement
   proteins added to the clinical covariates. We evaluated the risk
   discrimination with Uno’s concordance statistic (C-statistic). Uno’s
   method calculates the concordance probability by modeling the censoring
   distribution and using it to weight the uncensored observations in the
   estimation, resulting in censoring-independent estimates. Unsupervised
   hierarchical clustering was performed using Ward’s method with
   Euclidean distances.

   We also assessed the prognostic accuracy of the Complement proteome for
   a long-term kidney outcome using machine learning algorithms in the
   advanced DKD cohorts. The outcome of interest was the development of
   kidney failure in 10 years. We tested a varying number of proteins: top
   1, top 5% (n = 6), and the entire roster of the Complement proteome
   (n = 110). The model performances were compared using the concordance
   statistic. We evaluated biostatistical logistic regression (LR) and six
   machine learning models: principal component (PCA) using LR; penalized
   regression: elastic net (EN), lasso (LS), and ridge (RD); decision
   tree: random forest (RF) and generalized boosting method (GBM); and
   neural network (NNET). Our advanced DKD cohorts were split by random
   sampling into training and testing datasets. The training dataset was
   60% of the full cohort, and the testing dataset was the remaining 40%
   to allow for a sufficient sample size. The model accuracy was
   calculated using repeated 10-fold cross-validation. We confirmed that
   our data did not contain any covariates that had near zero variation,
   were highly correlated, or were linear combinations of each other.
   Complement protein levels were normalized to urinary creatinine and
   transformed to their base 10 logarithms or cohort-specific
   percentile-ranked values. The final algorithm for each model was
   determined using the training datasets, and subsequently, each
   algorithm was used to calculate the Complement proteome’s prognostic
   accuracy for kidney failure within the testing datasets, resulting in
   the final performance accuracy of each computational model.
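
   Two pieces of the workflow above, the random 60/40 split and the
   concordance statistic for a binary 10-year outcome, can be sketched
   in plain Python. This is an illustration only and does not reproduce
   the caret pipeline; the function names, the seed, and the example data
   are hypothetical.

```python
import random


def split_train_test(ids, train_fraction=0.6, seed=0):
    """Randomly assign subject identifiers to training and testing sets."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def c_statistic(scores, outcomes):
    """Share of case/non-case pairs in which the case has the higher score."""
    cases = [s for s, y in zip(scores, outcomes) if y == 1]
    controls = [s for s, y in zip(scores, outcomes) if y == 0]
    concordant = 0.0
    for case in cases:
        for control in controls:
            if case > control:
                concordant += 1.0
            elif case == control:
                concordant += 0.5  # ties count as half-concordant
    return concordant / (len(cases) * len(controls))


# Hypothetical subject identifiers, predicted risks, and outcomes
train_ids, test_ids = split_train_test(range(10))
auc = c_statistic([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
```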

   All graphical displays were generated using GraphPad Prism v8.3.1
   (GraphPad Software, San Diego, CA) except for the following. The needle
   plot was generated with the R package ggplot2, in R version 3.5.0 (R
   Core Team, 2018)^[287]80. The weighted Cohen’s kappa test was performed
   in R version 4.2.2, using the vcd package^[288]81. The schematic
   representation of the study workflow and major pathways of the
   Complement system was created with BioRender.com. Hierarchical
   clustering was performed with JMP 16.0.0 software (SAS, Cary, NC). The
   chord diagram was generated with the R package circlize, version 0.4.8
   (R Core Team, 2018)^[289]82. Machine learning models were built with
   the R package caret (short for Classification And REgression Training),
   version 6.0-86 (R Core Team, 2018)^[290]83.

   All statistical tests were two-sided. Evaluations of the Complement
   proteome in the advanced DKD focused on the top 5% of the distribution
   of P-values obtained in the study, which translated the association
   strength to an α = 2.4 × 10^−23 significance threshold. For comparison,
   Bonferroni correction had a less stringent α = 4.5 × 10^−4 (0.05/110).
   The kidney tissue proteomics data were adjusted with the
   Benjamini-Hochberg procedure at a false discovery rate < 0.05. Other
   significance tests used α = 0.05.
   Analyses were performed in SAS v9.4 (SAS, Cary, NC) unless otherwise
   indicated.
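
   The two multiplicity adjustments named above can be sketched in
   Python: the Bonferroni threshold for 110 tests and the
   Benjamini-Hochberg step-up procedure. The function names and the
   example P-values are hypothetical, and the sketch is not the SAS code
   used in the study.

```python
def bonferroni_alpha(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


def benjamini_hochberg(p_values, fdr=0.05):
    """Return booleans (in input order) marking hypotheses rejected at the FDR."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Keep the largest rank whose P-value is below its step-up threshold
        if p_values[idx] <= rank / m * fdr:
            cutoff_rank = rank
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


threshold = bonferroni_alpha(0.05, 110)  # 0.05 / 110
# Hypothetical P-values for illustration
flags = benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205])
```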

   The Supplementary Information provides detailed descriptions of the
   five cohorts, additional study groups, and single-cell/single-nucleus
   RNA sequencing (sc/snRNA-seq) and spatial transcriptomics.

Reporting summary

   Further information on research design is available in the [291]Nature
   Portfolio Reporting Summary linked to this article.

Acknowledgements