Abstract Current genome-wide association studies (GWAS) for kidney function lack ancestral diversity, limiting the applicability to broader populations. The East-Asian population is especially under-represented, despite having the highest global burden of end-stage kidney disease. We conducted a meta-analysis of multiple GWASs (n = 244,952) on estimated glomerular filtration rate and a replication dataset (n = 27,058) from Taiwan and Japan. This study identified 111 lead SNPs in 97 genomic risk loci. Functional enrichment analyses revealed that variants associated with F12 gene and a missense mutation in ABCG2 may contribute to chronic kidney disease (CKD) through influencing inflammation, coagulation, and urate metabolism pathways. In independent cohorts from Taiwan (n = 25,345) and the United Kingdom (n = 260,245), polygenic risk scores (PRSs) for CKD significantly stratified the risk of CKD (p < 0.0001). Further research is required to evaluate the clinical effectiveness of PRS[CKD] in the early prevention of kidney disease. Subject terms: End-stage renal disease, Genome-wide association studies __________________________________________________________________ Here the authors present a large genetic study in East Asians that identifies 97 genetic regions linked to kidney function. These findings aim at better understanding chronic kidney disease in diverse populations. Introduction No cure has yet been developed for chronic kidney disease (CKD), which affects more than 700 million people worldwide and thus places a substantial socioeconomic burden on the global economy and public health systems^[62]1. Sodium–glucose cotransporter-2 (SGLT2) inhibitors are the only oral drugs approved by the U.S. Food and Drug Administration to slow the progression of CKD. This therapeutic effect was an unexpected discovery^[63]2–[64]4. Accordingly, an effective drug discovery platform for CKD should be established^[65]5. Regardless of which type of therapy is examined, such as cell therapy or antisense oligonucleotides^[66]6, identifying the genetic variants associated with CKD is essential for accelerating drug development efforts^[67]7. More than 250 genetic loci associated with kidney functional markers, including the estimated glomerular filtration rate (eGFR) and the urine albumin-to-creatinine ratio, are highly replicated in large-scale population-based cohorts, such as those from the Million Veteran Program, Biobank Japan (BBJ) Project, and UK Biobank (UKB)^[68]8–[69]10. However, undetermined pathogenic pathways, a dominant distribution in noncoding regions, and extensive linkage disequilibrium (LD) across common variants in these kidney function-related loci have prevented researchers from elucidating the functionality and pinpointing the casual variants^[70]8,[71]11. In addition, the relatively limited ancestral and ethnic diversity of the individuals analyzed in existing large-scale genome-wide association studies (GWASs), which predominantly focus on Caucasian populations, may lead to significant limitations in regional practice and healthcare policy development for the Asian population^[72]12. Therefore, GWASs involving diverse populations with a range of kidney functional markers are urgently required for clinical and therapeutic translation. Taiwan has the highest prevalence (3679 per million population) and incidence (823 per million population) of end-stage kidney disease (ESKD) worldwide, making it particularly suitable for CKD-related GWAS^[73]13,[74]14. In a population-based family study that involved data obtained from Taiwan’s National Health Insurance Research Database, a high kidney disease heritability of 31.1% was observed, indicating that further examination of genetic inheritance is warranted^[75]15. In this study, to determine the effect of genetic heritability on common kidney functional markers in regions with a high prevalence of ESKD, we conducted a systematic analysis of GWAS findings in populations from the Taiwan Biobank (TWB), the BBJ, and a large hospital-based cohort from China Medical University Hospital (CMUH) in Taiwan. Results Discovery of genetic associations with eGFR through a meta-analysis of GWASs involving the BBJ and TWB The present study design is depicted in Fig. [76]1. We conducted a fixed-effects inverse variance-weighted meta-analysis of GWASs involving two large biobanks containing samples from individuals of East Asian ancestry, namely the BBJ (n = 143,658) and TWB (n = 101,294). The mean eGFRs (standard deviations [SDs]) of the BBJ, TWB, and pooled populations were 79.93 ( ± 15.42), 101.95 ( ± 14.75), and 89.04 ( ± 15.14) mL/min/1.73 m^2, respectively, and their corresponding mean ages (SDs) were 62.9 ( ± 13.2), 50.4 ( ± 10.9), and 57.7 ( ± 12.2) years, respectively (Supplementary Data [77]1). The genotypes used in the two GWASs were imputed from the East Asian reference panels of the 1000 Genomes Project Phase 3^[78]16. As shown in Fig. [79]2 and Supplementary Data [80]2, a meta-analysis of these GWASs revealed 5790 genome-wide significant (P < 5 × 10^−8) single-nucleotide polymorphisms (SNPs). In these genome-wide significant SNPs, 2140 were unreported in the previous eGFR GWASs^[81]10,[82]17. A total of 238 independently significant SNPs (LD r^2 < 0.6) were represented by 111 lead SNPs (LD r^2 < 0.1) located in 97 genomic risk loci, of which 26 loci were unreported in other GWASs (Supplementary Data [83]3). Genomic risk loci were defined as nonoverlapping on the basis of a window size of ±250 kb around LD block of lead SNPs, and these loci were merged into a single locus if the distance between them was shorter than 250 kb^[84]10,[85]17. The previously unreported genome-wide significant SNP with the lowest P value (P = 5.1 × 10^−45) was rs754331108, which was located in the intronic region of CASP9. The minor alleles across genome-wide significant SNPs both increased and decreased the eGFR, with lower-frequency alleles exhibiting stronger effects (Supplementary Fig. [86]1a). The effects of genome-wide significant SNPs on the eGFR were largely homogeneous (median I^2 = 0) across the two biobank populations (Supplementary Fig. [87]1b). According to meta-GWAS summary statistics, the estimated genetic heritability (h^2) of the eGFR was 10.9%, and the LD score regression intercept was 1.05, indicating that the effects of bias due to population stratification and cryptic relatedness were negligible. Fig. 1. Flowchart of the study design. [88]Fig. 1 [89]Open in a new tab A meta-analysis of an eGFR GWAS was conducted using the BBJ and TWB. Replication was performed using an independent TWB-derived replication dataset. The relevance of the eGFR to kidney function was validated through the associations of the eGFR with BUN, CKD, and ESKD in the TWB and BBJ. Pathways and tissue types were enriched through the FUMA platform. Genetic correlation analysis of 119 traits was conducted using LD score regression. Fine mapping of causal variants was performed using GCTA-COJO, and gene prioritization with tissue-specific cis-eQTLs was conducted using the R package “coloc.” The PRS for CKD was derived using PRSice-2 and tested using patient data obtained from a Taiwanese hospital cohort (CMUH-CRDR CKD) and community-based data obtained from a UK cohort (UKB CKD). BBJ Biobank Japan, BUN blood urea nitrogen, cis-eQTL cis-expression quantitative trait locus, CKD chronic kidney disease, CMUH-CRDR Clinical Research Data Repository of China Medical University Hospital, eGFR estimated glomerular filtration rate, ESKD end-stage kidney disease, FUMA Functional Mapping and Annotation of Genome-Wide Association Studies, GCTA-COJO genome-wide complex trait analysis–conditional and joint analysis, GWAS genome-wide association study, LDSC linkage disequilibrium score regression, PRS polygenic risk score, SNP single-nucleotide polymorphism, TWB Taiwan Biobank, UKB UK Biobank. Fig. 2. A circular Manhattan plot from a meta-analysis of eGFR-derived GWASs (TWB, n = 101,294; BBJ, n = 143,658; total n = 244,952). [90]Fig. 2 [91]Open in a new tab The green band corresponds to –log[10](P) for association with eGFR in the meta-analysis by chromosomal position. The blue band corresponds to –log[10](P) for association with eGFR in the TWB-derived discovery dataset by chromosomal position. The orange band corresponds to –log[10](P) for association with eGFR in the BBJ dataset by chromosomal position. The solid red line indicates genome-wide significance (P = 5 × 10^–8). Genes labeled in black indicate SNPs exclusively identified in the meta-analysis, whereas genes labeled in blue indicate SNPs identified in the meta-analysis and additionally detected in the TWB or BBJ or both. A total of 5790 SNPs had P values of <5 × 10^−8, of which 4732 had a consistent effect direction. The lowest P value was observed for rs62435145 near UNCX on chromosome 7 (P = 5.23 × 10^−67 Supplementary Data [92]2). All statistical tests employed two-sided P values. BBJ Biobank Japan, eGFR estimated glomerular filtration rate, GWAS genome-wide association study, SNP single-nucleotide polymorphism, TWB Taiwan Biobank. Replication of eGFR-associated SNPs in an independent TWB dataset We evaluated the replication of eGFR-associated SNPs, defined as genome-wide significant SNPs, in an independent replication dataset that comprised data from 27,058 individuals in the TWB. Of the 5790 eGFR-associated SNPs identified during the discovery stage, 5342 were available in the replication dataset (Supplementary Data [93]4). These SNPs had a consistent effect direction in the TWB-based replication dataset, with 3899 of them having a P value of <0.05. The effect estimates strongly correlated with those from the discovery stage, exhibiting almost complete consistency in directionality (Pearson’s r = 0.97; P < 0.0001; Fig. [94]3). Of the 3899 replicated eGFR-associated SNPs, 387 from 10 independent genomic risk loci were genome-wide significant in the TWB-based replication dataset. These SNPs were located close to genes including ABCG2, BCAS3, CSP9, CCDC158, DCDC1, DNAJC16, LRP2, MRTFA, MUC1, NRG4, SCARNA21B, SHROOM3, SLC34A1, STBD1, THBS3, and TRIM46 (Supplementary Data [95]4). Fig. 3. Replication of eGFR-associated SNPs in an independent replication dataset derived from the TWB (n = 27,058). Fig. 3 [96]Open in a new tab Data regarding 5342 out of 5790 eGFR-associated SNPs were available in the TWB-derived replication dataset. Of these eGFR-associated SNPs, 3899 were replicated in the TWB-derived replication dataset (two-sided P < 0.05, consistent effect direction), plotted as blue dots (“Yes” indicated by blue dots, with a consistent effect, two-sided P < 0.05; “Inconclusive” indicated by gray dots, two-sided P ≥ 0.05). The blue line represents the best fit of the blue dots. Pearson’s r is 0.97 (two-sided P < 0.0001). The data present the effect estimates, and error bars correspond to 95% CIs. Further details are provided in Supplementary Data [97]4. BBJ Biobank Japan, eGFR estimated glomerular filtration rate, SNP single-nucleotide polymorphism, TWB Taiwan Biobank. Correlations of eGFR loci with blood urea nitrogen and CKD We examined the correlation between eGFR-associated SNPs and blood urea nitrogen (BUN) to determine whether the identified SNPs were directly related to kidney function rather than creatinine metabolism^[98]10. Urea nitrogen is a harmful waste product that may accumulate in the blood if kidney function is impaired^[99]18. Our BUN-related meta-analysis of GWASs involved BBJ and TWB datasets (n = 241,112). A total of 2365 replicated eGFR-associated SNPs were associated with BUN (P < 0.05; Supplementary Fig. [100]2a and Supplementary Data [101]5). The effect sizes of 2064 replicated eGFR-associated SNPs from 31 independent genomic risk loci associated with eGFR and BUN were strongly and inversely correlated (defined as kidney-relevant); this finding was consistent with the established understanding of kidney pathophysiology (Pearson’s r = −0.86; P < 0.0001). To determine whether the replicated eGFR-associated SNPs altered the risk of kidney diseases (e.g., CKD or ESKD), we conducted a logistic regression of a validation longitudinal cohort from the Clinical Research Data Repository of CMUH (CMUH-CRDR), which contains data regarding CKD follow-up status (CKD, n = 4509; control, n = 20,836). In addition, we examined the effect of replicated eGFR-associated SNPs on CKD; the results indicated that the effect direction of 397 replicated eGFR-associated SNPs on the eGFR was negatively correlated with the relevant effect direction of CKD (P < 0.05; maximum, median, and minimum odds ratios [ORs] = 1.02, 1.01, and 0.98, corresponding 95% confidence intervals [CIs] = 1.01–1.03, 1.00–1.01, 0.97–1.00, respectively; Pearson’s r = −0.94, P < 0.0001; Supplementary Fig. [102]2b and Supplementary Data [103]6). These SNPs included 160 from 14 independent genomic risk loci likely related to kidney function, which were located close to the following genes: ASCC3, DCDC1, F12, FAM47E, FBXO22, HCRTR2, KNG1, LRP2, RAI14, NRG4, PAX8, PDILT, SIM1, STC1, TINAG, UBE2Q2, and WDR72. Data from the CMUH-CRDR (ESKD, n = 706; control, n = 20,836) were employed to clinically validate the effect of replicated eGFR-associated SNPs on the risk of ESKD. A total of 162 replicated eGFR-associated SNPs exhibited negative correlations with the effect directions of the eGFR and ESKD (P < 0.05; maximum, median, and minimum ORs= 1.01, 1.00, and 0.99, corresponding 95% CIs = 1.01–1.02, 0.99–1.00, 0.98–1.00, respectively; Pearson’s r = −0.96, P < 0.0001; Supplementary Fig. [104]2c and Supplementary Data [105]7), including 69 likely kidney-related SNPs from 7 genomic risk loci close to the following genes: KNG1, F12, HLA-DQA1, TINAG, RSBN1L, and TMEM60. Genetic correlations of the eGFR and BUN with other phenotypes We explored the genome-wide genetic correlations (r[g]) of eGFR and BUN with 71 quantitative and 48 binary phenotypes from a TWB discovery dataset to understand their shared genetic basis (Supplementary Data [106]8). Totals of 13 and 2 statistically significant genetic correlations were identified for eGFR and BUN, respectively (P < 4.2 × 10^−4 = 0.05/119; Supplementary Data [107]8). With the exception of serum creatinine (S-Cre), the strongest genetic correlations observed between the eGFR and phenotypes were those with BUN (r[g] = −0.30, P = 1.13 × 10^−8), urinary albumin (r[g] = 0.25, P = 5.92 × 10^−7), uric acid (r[g] = −0.24, P = 7.15 × 10^−10), and muscle mass (r[g] = −0.14, P = 1.67 × 10^−7; Supplementary Fig. [108]3a). For BUN, the strongest genetic correlation observed was that with S-Cre (r[g] = 0.28, P = 4.06 × 10^−8; Supplementary Fig. [109]3b). Genetic correlation analysis revealed that muscle mass was correlated with eGFR but not with BUN. These results indicated that the eGFR-associated SNPs reflected the regulatory roles of renal excretion and muscle generation in S-Cre levels. Functional enrichment and pathway enrichment analyses To determine whether the eGFR-associated SNPs were mechanistically linked to kidney function, we conducted serial enrichment analyses to characterize tissue-specific gene expression, regulatory annotations, and pathway dynamics. Multimarker Analysis of GenoMic Annotation (MAGMA) software was used to prioritize genes for gene sets, pathways, and cell types based on the results of a meta-analysis of eGFR GWASs. We identified significantly enriched tissues and cell types with the strongest enrichment of eGFR-associated SNPs observed in the kidney medulla (P = 1.01 × 10^−6) and kidney cortex (P = 1.26 × 10^−6) tissues in Genotype-Tissue Expression (GTEx) version 8 (54 tissue types; Fig. [110]4 and Supplementary Data [111]9). Pathway enrichment analysis revealed nine significant canonical pathways, including several pathways relevant to kidney function, such as urate metabolism (Bonferroni-corrected P = 2.0 × 10^−4) and abacavir transmembrane transport (Bonferroni-corrected P = 5.0 × 10^−4; Table [112]1 and Supplementary Data [113]10). Enrichment analysis of BUN-associated SNPs in specific tissues and cell types revealed a similar expression pattern to that of eGFR-associated SNPs, including in kidney medulla and kidney cortex tissues (Supplementary Data [114]9); this finding supports the use of BUN for prioritizing loci that are highly likely to be associated with kidney function. Fig. 4. Tissue-specific analysis of eGFR GWASs. [115]Fig. 4 [116]Open in a new tab Functional analysis of an eGFR-derived GWAS was conducted using GTEx version 8 (54 tissue types) in MAGMA. Kidney medulla and kidney cortex tissues had P values of <0.05 (above the dashed line). All statistical tests employed two-sided P values. Further details are provided in Supplementary Data [117]9. eGFR estimated glomerular filtration rate, GTEx Genotype-Tissue Expression, GWAS genome-wide association study, MAGMA Multimarker Analysis of GenoMic Annotation. Table 1. Gene set enrichment analysis of GWASs for the eGFR in MAGMA Database Category Gene set No. of genes Beta Standard deviation of beta Standard error of beta P value Bonferroni-corrected P value Gene Ontology Biological process Urate metabolic process 10 1.92 0.04 0.34 1.29 × 10^−8 0.0002 Reactome Curated gene set Abacavir transmembrane transport 5 2.68 0.04 0.49 3.03 × 10^−8 0.0005 Gene Ontology Biological process Cellular response to endogenous stimulus 1263 0.16 0.04 0.03 1.51 × 10^−7 0.0023 Gene Ontology Biological process Enzyme-linked receptor protein signaling pathway 947 0.18 0.04 0.04 2.90 × 10^−7 0.0045 Gene Ontology Biological process Negative regulation of RNA biosynthetic process 1104 0.15 0.04 0.03 6.11 × 10^−7 0.0095 Gene Ontology Biological process Response to endogenous stimulus 1501 0.14 0.04 0.03 6.25 × 10^−7 0.0097 Gene Ontology Biological process Contact inhibition 8 1.50 0.03 0.32 1.05 × 10^−6 0.0163 Gene Ontology Molecular function Regulatory region nucleic acid binding 849 0.17 0.04 0.04 1.12 × 10^−6 0.0174 Gene Ontology Cellular compartment Activin receptor complex 7 2.44 0.05 0.52 1.40 × 10^−6 0.0217 [118]Open in a new tab eGFR estimated glomerular filtration rate, GWAS genome-wide association study, MAGMA Multimarker Analysis of GenoMic Annotation. All statistical tests employed two-sided P values. Bonferroni correction was applied for multiple testing adjustment of P values. Stratified LD score regression was used to estimate the contributions of cell-type-specific functional genomic elements and tissue-specific gene expression to heritability through GWAS summary statistics pertaining to the eGFR and BUN. Cell-type-specific functional genomic elements were sourced from the Roadmap Epigenomics database^[119]19, and tissue-specific gene expression data were derived from the GTEx^[120]20 and Franke Lab databases^[121]21. In the eGFR GWAS, fetal kidney tissues were regarded as the enriched tissues for functional genomic elements, and kidney (A05.810.453.kidney), pancreas, and kidney cortex tissues were regarded as the most significant tissues for tissue-specific gene expression, followed by fetal kidney tissues (Supplementary Data [122]11). In the BUN GWAS, fetal kidney tissues were the enriched tissues for functional genomic elements, and kidney cortex tissues were the enriched tissues for tissue-specific gene expression (Supplementary Data [123]11). These findings indicated that kidney-specific epigenomic elements and gene expression contributed to the heritability of the eGFR and BUN. Statistical fine mapping of causal variants from eGFR GWAS To identify the causal variants among the eGFR-associated SNPs, a stepwise conditional analysis was conducted using genome-wide complex trait analysis–conditional and joint analysis (GCTA-COJO) with an in-sample LD reference. Subsequently, statistical fine mapping of eGFR loci was conducted through summary statistics-based conditional analyses for 238 independent significant SNPs mapped to 97 genomic risk loci. We found a credible set with a cumulative posterior probability (PP) of more than 99% and a median set size of 52 SNPs for independent significant SNPs (Supplementary Data [124]12). The potential causal variants within the small credible set were evaluated to determine their functional impact and regulatory potential. Missense SNPs with a cumulative PP of over 99% or mapping to a small credible set (n < 5) are particularly important, as they suggest a direct involvement of the affected gene. As shown in Fig. [125]5a and Table [126]2, five missense SNPs were identified. Among these missense SNPs, the rs17730281 missense SNP in WDR72 had a combined annotation-dependent depletion (CADD) score of >15, which supported its potential deleterious effect (Table [127]2). To determine the regulatory potential of SNPs from small credible sets within the kidney, we associated these SNPs with DNase I hypersensitivity sites identified from the Roadmap Epigenomics database, which includes multiple kidney cell types. We subsequently prioritized 23 eGFR-associated SNPs that were mapped to one of these epigenomic annotations with a credible set size of <5 and a PP of >95%, indicating their potential as causal regulatory variants (Supplementary Data [128]13). The rs9895661 SNP close to the BCAS3 locus had a PP of >95% and a CADD score of 17.47, which suggested its regulatory potential for gene expression in kidney tissues (Fig. [129]5b). Fig. 5. Fine mapping of credible sets of exonic and regulatory SNPs. [130]Fig. 5 [131]Open in a new tab a Fine mapping of exonic SNPs. The triangles represent exonic SNPs, and their sizes correspond to their CADD scores. The red triangles indicate exonic SNPs with a credible set size of <5 or a PP of >99%. b Fine mapping of regulatory SNPs. Each color corresponds to a unique tissue type, as indicated by Roadmap Epigenomics data. The labels indicate credible set sizes of ≤10 and PPs of >95%. All statistical tests employed two-sided P values. Further details are provided in Supplementary Data [132]13. CADD combined annotation-dependent depletion, PP posterior probability, SNP single-nucleotide polymorphism. Table 2. Missense SNPs with a small credible set size (n < 5) or a PP of 50% rsID Credible set size PP Gene Exonic function CADD Regulatory function Summary rs76872124 5 0.63 TRIM46 p.Thr474Ile 18 Roadmap Others TRIM46 is a gene that encodes a protein belonging to the tripartite motif family. This gene is associated with interferon-gamma signaling and cytokine signaling in the immune system^[133]96. It is also associated with inflammatory bowel disease^[134]97. The SNPs in TRIM46 are associated with measurements of uric acid^[135]98 and BUN^[136]97. rs140449886 3 0.25 GON4L p.Asp1522Gly 0.777 None GON4L is a gene that encodes a protein with transcriptional repressor activity. This activity is presumably mediated by the formation of a complex with YY1, SIN3A, and HDAC1^[137]99. GON4L is required for B-cell lymphopoiesis^[138]100. It is also associated with chronic obstructive pulmonary disease. The SNPs in GON4L are associated with measurements of BUN^[139]10. rs2304456 1 1 KNG1 p.Ile197Met 3.084 None KNG1 is a gene that encodes kininogen 1—a protein that serves as an inhibitor of thiol proteases. KNG1 is associated with several diseases, including high-molecular-weight kininogen deficiency^[140]101 and type 6 hereditary angioedema^[141]102. The SNPs in KNG1 are associated with measurements of creatinine^[142]103 and BUN^[143]17. rs2231137 1 1 ABCG2 p.Val12Met 4.287 None ABCG2 is a gene that encodes ATP-binding cassette subfamily G member 2—a protein that belongs to the ATP-binding cassette transporter family. ABCG2 is associated with several diseases, including hyperuricemia^[144]104 and gout^[145]105. The SNPs in ABCG2 are associated with measurements of uric acid^[146]9, hyperuricemia^[147]106, gout^[148]107, and creatinine^[149]97. rs17730281 1 0.99 WDR72 p.Leu819Pne 25.7 None WDR72 is a gene that encodes WD repeat domain 72—a protein that is presumably involved in the localization of the calcium transporter SLC24A4 to the ameloblast cell membrane^[150]108. WDR72 is associated with several diseases, including CKD^[151]10 and hypomaturation amelogenesis imperfecta^[152]109,[153]110. The SNPs in WDR72 are associated with measurements of the estimated glomerular filtration rate^[154]111, creatinine^[155]103, and urinary pH^[156]112. [157]Open in a new tab CADD combined annotation-dependent depletion, BUN blood urea nitrogen, SNP single-nucleotide polymorphism, PP posterior probability. Statistical colocalization for causal gene prioritization Colocalization analyses of 111 eGFR-associated lead SNPs were conducted using cis-expression quantitative trait locus (cis-eQTL) data and eGFR GWASs within ±100 kb regions of the lead SNPs. Specifically, cis-eQTLs across 51 tissues were sourced from GTEx version 8 and the Human Kidney eQTL Atlas, which includes data regarding tissues from renal glomerular and tubulointerstitial compartments^[158]11. A PP of >80% for colocalization was observed in 287 genes encompassing multiple tissue types, including 43 genes in at least one kidney tissue type (Supplementary Data [159]14). Changes in the expression of these 43 genes in the kidney may be linked to the eGFR, as several genes have been reported^[160]10. For instance, UMOD was identified as a casual gene for CKD^[161]22, with higher UMOD gene expression associated with a lower eGFR (Supplementary Fig. [162]4). We observed that the rs77924615 SNP was associated with high UMOD gene expression and a low eGFR in both glomerular and tubulointerstitial compartments (Supplementary Fig. [163]4), while Wuttke et al. observed the rs77924615 SNP only in tubulointerstitial compartments^[164]10. Overall, our findings underscore the key roles of several kidney function-related genes—including UNCX, TBX2, and SHROOM3—and are consistent with those of related studies^[165]10,[166]23,[167]24. According to our colocalization findings, FGF5 and F12 are potential effector genes that influence an individual’s eGFR. Notably, the rs16998073 SNP associated with FGF5 gene expression was detected in both renal glomerular and tubulointerstitial compartments (Supplementary Fig. [168]4). FGF5, which encodes fibroblast growth factor 5, has been associated with blood pressure, coronary artery disease, and kidney function^[169]10. In the present study, the rs16998073 SNP associated with F12 gene expression was exclusively identified in renal tubulointerstitial compartments (Supplementary Fig. [170]4). F12, which encodes coagulation factor XII protein, plays a role in the renin–angiotensin–aldosterone system, which may be linked to kidney function^[171]25. Our analyses revealed that eGFR-associated SNPs demonstrate colocalization with eQTL in kidney tissues. Among these SNPs, some also colocalized with gene expressions in other tissue types and showed consistent effect directions, similar to those observed as in kidney tissue, with the exceptions of CEP89 and PAX8 (Supplementary Fig. [172]4). Cumulative incidence of CKD throughout an individual’s life, stratified by polygenic risk scores To examine the relevance of our findings regarding genetic susceptibility to CKD, we constructed a polygenic risk score (PRS) for CKD (PRS[CKD]) by using GWAS-derived summary statistics for the eGFR. An optimal PRS model was obtained through a meta-analysis of eGFR-related GWAS summary statistics, with BBJ- and TWB-derived discovery data used as the base datasets and with TWB-derived replication data regarding CKD used as the target dataset (with a cut-off P value of 0.048). Information regarding the onset of CKD is particularly useful for validating the longitudinal predictive performance of PRSs derived from eGFR-associated SNPs. We observed a significantly higher cumulative incidence of CKD in patients with a PRS[CKD] two SDs above the mean compared with those with a PRS[CKD] two SDs below the mean in an external Taiwanese dataset. This difference in cumulative incidence remained constant throughout the lives of the participants, particularly from age 50 to age 80 (Fig. [173]6a). An analysis of the adjusted hazard ratio (HR) of CKD revealed a dose–response pattern between patients with a PRS[CKD] two SDs above the mean and those with a PRS[CKD] within two SDs of the mean, with the corresponding adjusted HRs for patients with a PRS[CKD] two SDs below the mean being 2.3 (95% CI = 1.7–3.0; P < 0.0001) and 1.6 (95% CI = 1.3–2.1; P < 0.0001) (Supplementary Fig. [174]5), respectively. We also examined the time to reach a 10% cumulative incidence of CKD in the three PRS[CKD] groups. This threshold is close to the global CKD prevalence of 9.1%^[175]1. The time to reach a 10th percentile cumulative incidence of CKD in the three PRS[CKD] groups (above, within, and below two SDs) were 61.8, 63.8, and 69.8 years after birth, respectively. Notably, the PRS[CKD] effectively differentiated between high- and low-risk groups in both East Asians and White British populations, as demonstrated in the UKB (Fig. [176]6b). The area under the receiver-operating characteristic curve (AUROC) for PRS[CKD] was consistent across the Taiwanese and White British populations, namely 0.788 for both (95% CIs = 0.781–0.794 and 0.783–0.793, respectively; Fig. [177]6c). In addition, the calibration curve for our PRS[CKD] model indicated that the predicted probability was highly consistent with the observed probability, particularly between 0.4 and 0.5, indicating the model’s accuracy (Fig. [178]6d). Fig. 6. Cumulative incidence of CKD based on PRS stratification with a Taiwanese dataset obtained from the CMUH-CRDR and a White British dataset obtained from the UKB. [179]Fig. 6 [180]Open in a new tab The high-PRS group exhibited a higher cumulative incidence of CKD than did the low-PRS group across age in the a external Taiwanese dataset (CMUH-CRDR, n = 25,345, P = 2.06 × 10^−7) and b White British dataset (UKB, n = 260,245, P = 2.60 × 10^−29). The data present the PRS and error bars correspond to 95% CIs. All statistical tests employed two-sided P values. The dashed line represents a CKD cumulative incidence of 10%, which is an estimate of the global prevalence of CKD. c The AUROC of the CKD PRS model is 0.788 in both the CMUH-CRDR and UKB datasets. d The calibration curve of our PRS[CKD] model indicates that the predicted probability was closely aligned with the observed probability when the predicted probability ranged between 0.4 and 0.5 in the CMUH-CRDR dataset and between 0.0 and 0.2 in the UKB dataset. AUROC area under the receiver-operating characteristic curve, CI confidence interval, CKD chronic kidney disease, CMUH-CRDR Clinical Research Data Repository of China Medical University Hospital, PRS polygenic risk score, SD standard deviation, UKB UK Biobank. Discussion This study represents a large GWAS examining the eGFR of East Asian populations from Taiwan and Japan, which are among the regions with the highest prevalence of ESKD worldwide. A total of 26 unreported genomic risk loci closely linked to kidney function were identified. We discovered that individuals whose PRS[CKD] was within the highest two SDs of the PRS[CKD] distribution were more likely than others to develop CKD and also developed CKD approximately 8 years earlier than did those whose PRS[CKD] was within the lowest two SDs. When PRS[CKD] was combined with age and sex, it demonstrated good discriminative and calibrating performance in forecasting CKD development, achieving an AUROC of 0.788. Functional enrichment analysis revealed that the identified biological pathways were primarily associated with urate metabolic process. This finding aligns with the fine-mapping result, which identified a missense mutation in ABCG2 as a potential causal variant. A major gap between research regarding translation of genetic nephrology and real-world practice manifests as the underrepresentation of certain ancestry groups, particularly Asian populations^[181]26,[182]27. The present study sought to address this gap and expand the current level of understanding regarding the long-term epidemiology and susceptibility patterns of CKD in Taiwan^[183]28. The escalating prevalence of ESKD in Taiwan imposes a substantial socioeconomic burden, yet its primary causes remain unidentified^[184]29. Over the preceding three decades, extensive efforts have been made to determine the factors underlying Taiwan’s high incidence of CKD. Some researchers have suggested the involvement of Chinese herbal medicine^[185]30 or environmental exposures, such as arsenic^[186]31. Others have speculated that Taiwan’s universal healthcare system may have resulted in an increase in CKD diagnoses^[187]32. However, these theories lack causal evidence. Conversely, the present findings uniquely demonstrate a genetic predisposition to early CKD onset in the Taiwanese population, thereby corroborating the results of a population-based familial aggregation study that revealed a moderate heritability rate of 31.1% for the phenotypic variance of ESKD^[188]15. After applying the highest PRS[CKD] threshold of two SDs, we observed an earlier onset of CKD, by an average of 8 years, among individuals in their mid-50s who were at the highest genetic risk of CKD. Although the predictive performance of PRS[CKD] was modest in the White British dataset, it effectively differentiated between groups with high and low risk of developing CKD as defined by PRS[CKD]. These findings underscore the importance of shifting toward early prevention of CKD through genetic counseling. Our proposed, PRS-based risk stratification has the potential to provide an opportunity for the early implementation of CKD preventive care strategies, such as intensive prediabetes management and rigorous blood pressure monitoring and optimization. Nevertheless, a more comprehensive pragmatic trial is required to assess the real-world effectiveness of PRS[CKD] in preventing CKD or slowing its progression. From the enrichment analysis, we identified three eGFR-associated genes that were physiologically relevant, which may serve as potential targets for CKD drug development^[189]33. Compared to prior findings mainly derived from Caucasian and African American populations, this study focused on East Asian populations and revealed that UMOD but not APOL1 was significantly associated with the eGFR^[190]10,[191]34. The UMOD variant was discovered in a 2009 GWAS that employed the S-Cre-based eGFR as a phenotype^[192]35. Extensive research has underscored the promising prognostic value of UMOD for CKD in general populations of various ethnicities^[193]22. A recent discovered variant of UMOD, namely p.Thr62Pro, has been found to have a moderate effect on the risk of CKD in a large proportion of individuals with European ancestry, with endoplasmic reticulum homeostasis and maturation being potential pathogenic pathways^[194]36. A large Mendelian Randomization (MR) study with a sample size of 567,460 provided evidence further supporting a causal relationship between UMOD and the risk of CKD^[195]37. These findings highlight the potential for drug discovery by exploring whether modulating UMOD expression could be a viable strategy for CKD prevention. The F12 gene was consistently identified in all our replication and validation analyses (Supplementary Data [196]15). F12 gene encodes the plasma coagulation factor XIIa, which can subsequently cleave plasma pre-kallikrein to kallikrein, thereby initiating the kallikrein-kinin system (KKS)^[197]38. In a subset of twin and sibling participants recruited from southern California, F12 was found to involve the renin–angiotensin system (RAS) pathway and potentially regulate blood pressure^[198]25. Association of the F12 polymorphism and serum osteopontin (OPN) levels was identified in the German Chronic Kidney Disease (GCKD) study^[199]39. OPN is a phosphorylated glycoprotein encoded by SPP1; it is predominantly synthesized in kidney tissue and has been associated with kidney fibrosis in animal experiments and in the German Chronic Kidney Disease (GCKD) study^[200]39,[201]40. In addition, the association of the F12 gene with CKD may be attributable to dysregulated blood pressure and chronic inflammation resulting from the activation of the RAS and KKS, respectively^[202]25,[203]39. A recent study from China has identified F12 as a druggable target through extensive MR and colocalization analysis, using data from the CKDGen Consortium and cell-type-dependent eQTL data from kidney tubular and glomerular samples^[204]41. Continued research efforts are required to elucidate the potential beneficial role of targeting F12 in patients across a range of kidney disease spectrums, with a focus on replication studies and large clinical trials. Among the three index genome-wide-significantly SNP associated with OPN, rs10011284 was mapped to an intergenic region between SPP1 and MEPE, and MEPE was found to be involved in bone mineralization, phosphate homeostasis, and bone turnover^[205]39,[206]42,[207]43. The variant rs10011284, also linked to gout, may be influenced by ABCG2, which is located near SPP1 and MEPE. This proximity suggests a potential pathogenic role for ABCG2 in the development of CKD^[208]39. Although the ABCG2 gene is well known for its connection with serum uric acid levels and gout, as demonstrated in previous GWASs^[209]44–[210]46, mechanistic studies^[211]47,[212]48, and human research^[213]49–[214]51, its independent role in CKD development remains to be clarified. The ABCG2 protein, also known as the breast cancer resistance protein, is a multidrug transporter and a high-capacity urate exporter^[215]47,[216]52,[217]53. In a human study, reduced ABCG2 function was associated with a rapid decline in the age-dependent eGFR among individuals with serum uric acid levels exceeding 6 mg/dL. This finding underlines the potential role of ABCG2 in eGFR decline, especially given the high prevalence of hyperuricemia in East Asia^[218]54. These insights underscore the importance of further exploring the feasibility of integrating ABCG2 function quantification into routine practice to enhance CKD care and disease prevention. We identified three of five causal missense SNPs in the present study differ from those reported in previous CKD genetic studies^[219]8,[220]10,[221]54. The genes associated with these missense SNPs, such as TRIM46 and GON4L, have been linked to tubular fibrosis and BUN levels, respectively^[222]10,[223]55. Our study highlights the genetic diversity in CKD risk between populations of European ancestry and East Asian ancestry. For instance, rs4715491 in the FAM83B gene and rs4148155 in the ABCG2 gene were found to be associated with eGFR exclusively in Asian populations. The genetic variants contributing to CKD risk across different ancestries warrant cautious interpretation and limited generalizability from regional evidence, potentially arising from the following reasons: population-specific variations such as founder effects, changes in allele frequency due to genetic drift and local selection^[224]12, and interactions between genetic backgrounds and environmental exposure^[225]56. Therefore, the null association between rs4148155 and eGFR decline in European populations may be underestimated, as the allele frequency of rs4148155 in the ABCG2 gene is only 0.104 in European populations, compared to 0.319 in East Asian populations^[226]57. The variance in rs4148155 allele frequency indicates the necessity of larger European sample sizes for achieving statistical significance. In addition, the effect of rs4148155 in ABCG2 may be amplified by the purine-rich diet prevalent in Taiwan, which is known to exacerbate gout and hyperuricemia—both of which are risk factors for CKD. This concern underscores the importance of utilizing regional genetic data to develop feasible PRSs for disease prevention and early CKD screening in specific populations. Future studies should investigate how the distinct genetic risk patterns defined by PRS[CKD] interact with environmental exposures and contribute to geographical disparities in CKD epidemiology worldwide. This study had several methodological strengths, including a large sample size comprising relatively homogenous East Asian populations; the use of standardized, high-quality genotype imputation and phenotype identification; and a comprehensive bioinformatic analysis of eGFR-associated genes and pathways. In addition, the potential causal association in this study was suggested by the validation of PRS[CKD] through longitudinal prediction with the phenotype of CKD onset in an independent dataset. This study also had some limitations. First, our findings may not be generalizable to people with non-Asian ancestry. In addition, the assumption of fixed-allele effects across multiple geographical areas in Asia may not hold given the potential for geographical differences in environment–gene interaction^[227]58. Despite this inherent limitation, we observed similar PRS[CKD] model performance in both the independent Taiwanese and White British populations. Second, the proposed PRS for risk of CKD was derived from the GWAS for kidney function estimated by eGFR based on S-Cre rather than CKD, the responsible phenotype. While this approach has been debated in a previous study^[228]59, it may not be feasible to perform GWAS directly for CKD as CKD itself stands for a highly heterogenous disease spectrum from arbitrary definitions to etiologies of primary or secondary kidney degeneration. Third, the possibility of an underestimation of potential eGFR-associated SNPs and their genetic impact on CKD development cannot be excluded, despite our study having a large sample size among those conducted on East Asian populations to date^[229]9,[230]60,[231]61. In this GWAS of eGFR using large data exclusively from an East Asian population, we identified 2140 previously unreported genome-wide significant SNPs. One of the prioritized genes was, F12, has been potentially linked to CKD through RAS and KKS dysregulation and supported by a recent GWAS-based drug repurposing study^[232]41. The robust association between PRS[CKD] and the cumulative incidence of CKD across different ancestries sets the stage for further research to verify its clinical effectiveness in early CKD prevention. Methods Biobank data source BBJ The BBJ is a hospital-based registry of DNA samples, serum samples, and clinical information collected from approximately 200,000 patients with one or more of a collection of 47 common diseases (e.g., cancers, neurological diseases, cardiovascular diseases, and infectious diseases) identified by physicians at 66 hospitals affiliated with 12 medical institutions between 2003 and 2007^[233]62. Informed consent was obtained from all participants in writing, and the ethics committees of the RIKEN Center for Integrative Medical Sciences and the Institute of Medical Sciences, at the University of Tokyo approved the study. The RIKEN Center made available the GWAS summary statistic for public download at the Japanese Encyclopedia of Genetic Associations by Riken [[234]http://jenger.riken.jp/en/result] without requiring data application. The BBJ comprised 143,658 available eGFR and 139,817 available BUN from patient population with a mean age (SD) of 62.9 ( ± 13.2) years, a mean eGFR (SD) of 79.93 ( ± 15.42) mL/min/1.73 m^2, and a mean (SD) BUN level of 15.44 ( ± 4.77) mg/dL^[235]9; 54.9% of this group were male. TWB The TWB is an ongoing project that contains data and samples from a national prospective cohort of the Taiwanese population; its goal is to longitudinally collect a wide range of phenotypic measurements and genomic data from Han Chinese individuals aged 20–70 years without cancer history^[236]63. The TWB includes two customized arrays, namely TWBv1 and TWBv2, which are specifically designed for the Taiwanese population (see “Genotyping and imputation”). In the present study, in addition to demographic and phenotypic data, we obtained TWBv2 genotyping data for 101,294 individuals for our discovery dataset. The TWB-based discovery dataset included individuals with a mean age (SD) of 50.4 ( ± 10.9) years, a mean eGFR (SD) of 101.95 ( ± 14.75) mL/min/1.73 m^2, and a mean BUN level (SD) of 13.07 ( ± 3.90) mg/dL.; 45.6% of the individuals were male. We also obtained TWBv1 genotyping data for 27,058 individuals for our replication dataset. The TWB-based replication dataset included individuals with a mean age (SD) of 49.26 ( ± 11.10) years, a mean eGFR (SD) of 101.83 ( ± 15.22) mL/min/1.73 m^2, and a mean BUN level (SD) of 13.26 (4.00) mg/dL; 49.9% of the individuals were male. The data utilization and research conduct were approved by the Ethics Committees of Academia Sinica, the TWB, and CMUH^[237]25. UKB The UKB is a large prospective database of biological samples obtained from ~500,000 individuals aged 40–69 years with extensive phenotypic data ([238]http://www.ukbiobank.ac.uk)^[239]64. Genotyping was performed using the Affymetrix UK BiLEVE Axiom array on an initial set of 50,000 participants; the Affymetrix UKB Axiom array was then used on the remaining set of participants. In the UKB, 91.7%, 1.9%, 0.8%, and 5.6% of the included data are from European (White) individuals, Asian individuals, Black individuals, and individuals of other ethnicities, respectively. In this study, we used data for 260,245 White British participants as validation data to confirm the predictive performance of PRS[CKD]. The validation cohort had a mean age (SD) of 54.98 ( ± 18.20) years and 8821 cases of CKD, and 45.7% of them were male. CMUH-CRDR We used independent data from the CMUH-CRDR as clinical validation data. Between 2003 and 2020, the CMUH-CRDR documented the electronic medical records data of 3,077,895 patients, including demographic (e.g., age, self-report sex) and administrative data, diagnostic data, surgical data, laboratory measurements, and mortality data, from the Health and Welfare Data Science Center of the Ministry of Health and Welfare^[240]65. CMUH also conducted its Precision Medicine Project to gather genetic data using the TWBv2 genotyping array^[241]66. The CMUH’s Precision Medicine Project consecutively enrolled outpatient participants since 2018^[242]67. A total of 25,345 patients with eGFR measurements were identified from the combined CMUH-CRDR (clinical data) and CMUH’s Precision Medicine Project (genotyping data) cohort. This cohort had a mean age (SD) of 56.72 ( ± 15.73) years, 4509 cases of CKD, and 706 cases of ESKD; 44.3% of the cohort was male. All the participants provided written informed consent. The study protocol was approved by the Big Data Center and the Research Ethics Committee and Institutional Review Board (REC/IRB) of CMUH (IRB No. CMUH105-REC3-068, CMUH110-REC2-145, CMUH111-REC3-138, and CMUH112-REC2-036). Phenotype definition The primary phenotype of concern was the eGFR, which is based on S-Cre measurements. S-Cre and BUN levels were measured using the Jaffe rate method and enzymatic conductivity rate method, respectively, on a Beckman UniCel DxC 800 immunoassay system (Beckman Coulter, Brea, CA, USA). S-Cre values were calibrated using an isotope dilution mass spectrometry reference method^[243]68. The Chronic Kidney Disease Epidemiology Collaboration equation was employed to determine eGFR values^[244]69. Before GWAS analysis, BUN and eGFR values were normalized using rank-based inverse normal transformation (RINT)^[245]70. In addition, the predictive performance of the proposed PRS was validated using the CKD phenotype, which was differentially defined in the three study populations. In the TWB validation cohort, CKD was identified by a baseline eGFR of <60 mL/min/1.73 m^2, whereas in the CMUH-CRDR validation cohort, it was identified by the presence of at least two outpatient eGFR measurements of <60 mL/min/1.73 m^2, with a minimum interval of 90 days between measurements. In the sensitivity analysis of the UKB cohort, CKD was defined in accordance with the International Classification of Diseases (ICD) codes for CKD (ICD-10 codes N18.3, N18.4, N18.5, and N18.6). CKD controls were defined as individuals with no CKD and a baseline eGFR of >90 mL/min/1.73 m^2. In the CMUH-CRDR cohort, progression to ESKD was defined as the initiation of long-term renal replacement therapy (peritoneal dialysis, hemodialysis, or kidney transplantation) identified from certificates of catastrophic illness issued by the National Health Insurance Administration, Ministry of Health and Welfare, Taiwan. ESKD controls were individuals with no ESKD and a baseline eGFR of >90 mL/min/1.73 m^2. Genotyping and imputation Genotyping was conducted using two custom SNP arrays based on the Taiwanese population, namely TWBv1 and TWBv2^[246]63,[247]71,[248]72. The TWBv1 array was based on the Thermo Fisher Axiom Genome-Wide CHB Array, with customized content containing ~650,000 markers on the GRCh37 coordinates, aimed at capturing functional variants. The TWBv2 array covers rare coding risk alleles based on whole genomic sequence data obtained from 946 TWB samples. It also contains data on ~690,000 markers aligned to the GRCh38 reference build. For imputation, low-quality samples and variants were filtered out, including those with an SNP genotype call rate of <95%, an individual call rate of <95%, a minor allele frequency (MAF) of <1%, and a Hardy–Weinberg equilibrium (HWE) P value of <1 × 10^−4. The 1000 Genome Project Phase 3 was used as the reference to exclude samples obtained from populations of non-East Asian ethnicity, as determined through principal component analysis (PCA). The genotyping data were converted from GRCh38 to GRCh37 using LiftOver^[249]73. Then, we pre-phased genotyping data using SHAPEIT2^[250]74 and imputed it using IMPUTE2^[251]75, which generated 81,698,455 imputed SNPs. We further filtered out imputed SNPs with an INFO score of lower than 0.7, resulting in the retention of 18,609,316 imputed SNPs. All quality control procedures were conducted using PLINK v2.0^[252]76. GWAS We conducted a GWAS by using a machine-learning method, called REGENIE^[253]77. Before REGENIE, both raw and imputed genotyping data were subjected to quality control checks at an SNP genotype call rate of <99%, a MAF of <1%, an HWE P value of <1 × 10^−15, an individual call rate of <98%, and a heterozygosity rate greater than ±3 SDs. In addition, PCA was conducted using the reference population of the 1000 Genomes Project^[254]16. In the first step of REGENIE, we calculated the genetic correlation of the samples and then used this information as covariates in the second step of REGENIE. In the second step, we applied a linear regression test for quantitative traits, assuming an additive genetic model, and conducted a Firth logistic regression test for binary traits. To adjust for potential confounding effects, we employed age, sex, and the first six principal components (PCs) as covariates in the GWAS of the eGFR. The circular Manhattan plot was generated using the R package “circlize” in R software v.4.3.1 (R Foundation for Statistical Computing, Vienna, Austria)^[255]78. Meta-analysis of GWASs A fixed-effects inverse variance-weighted meta-analysis was conducted using METAL software to increase the accuracy of effect estimates and their standard errors^[256]79. Genomic control (GC) correction was applied when the GC factor λ[GC] exceeded 1. This meta-analysis yielded 5790 SNPs for the eGFR and 3600 SNPs for BUN, with a genome-wide significance level of 5 × 10^−8. Heterogeneity among studies was evaluated using the I^2 statistic. We used LD with an r² threshold of 0.6 to identify independently significant SNPs, and an LD r² threshold of 0.1 to identify lead SNPs from among these independently significant SNPs. Genomic risk loci were defined as regions with a window size of ±250 kb centered on independently significant SNPs. In addition, the genomic risk loci were merged into a single locus if the distance between them was shorter than 250 kb^[257]80. Genetic heritability We examined the genetic heritability of the eGFR by using GWAS summary statistics obtained from our meta-analysis, LD score regression was used to estimate genetic heritability^[258]81. LD scores were estimated from the East Asians panel of the 1000 Genomes Project Phase 3 by using an r^2 estimator with 1 cM windows. SNPs with a MAF of less than 1% and loci with significantly large effect sizes or long-range LD were excluded from all regressions^[259]81. This approach enabled us to avoid potential bias from confounding factors, such as cryptic relatedness or population stratification. Each SNP’s contribution was evaluated by examining the correlation between test statistics and LD^[260]16. Replication in an independent TWB dataset A replication procedure was conducted to validate the findings of the original GWAS meta-analysis through an independent cohort. The TWB dataset, which comprises data regarding 27,705 individuals with available eGFR and TWBv1 genotyping data, was used as a replication dataset. The quality control steps applied for the replication dataset were the same as those described in the subsection of this paper titled “GWAS.” To assess the association between SNPs and RINT (eGFR) in the replication cohort, a linear regression model was used in REGENIE. A SNP was considered replicated if it had the same effect direction as the corresponding SNP in the original GWAS and a P value of <0.05 in the replication dataset. Associations of eGFR-associated SNPs to BUN level, CKD, and ESKD status To investigate the eGFR-associated SNPs with other markers of kidney function, we conducted a meta-analysis of BUN levels by using data from the BBJ- and TWB-based discovery datasets. We then examined the associations between eGFR-associated SNPs and BUN levels. SNPs were considered relevant to kidney function if they had an opposite effect direction on BUN levels relative to the eGFR and a P value of <0.05. Logistic regression modeling was used to evaluate the association between these eGFR-associated SNPs and the risk of developing CKD or ESKD in the CMUH-CRDR cohort. We searched for an opposite effect direction between eGFR-associated SNPs and CKD or ESKD as well as a P value of <0.05. Genetic correlations of eGFR and BUN with other complex traits and diseases To examine the genetic correlations between complex quantitative traits and diseases, a GWAS-based analysis of 119 phenotypes was conducted using the TWB-based discovery dataset. Genetic correlations were estimated using LD score regression^[261]81, which involved two steps. First, the LD scores of each SNP in two GWAS summary statistics were calculated using predetermined East Asian population-based LD scores from the 1000 Genomes Project. Second, cross-trait LD score regression was conducted, which involved regressing the product of the LD scores of SNPs in the two GWAS summary statistics against the observed correlation in effect sizes between traits. Gene prioritization, gene set enrichment, and tissue enrichment analysis Functional analysis was conducted for genes, gene sets, and tissue enrichment by using MAGMA software version 1.08^[262]82, which was integrated into the Functional Mapping and Annotation of Genome-Wide Association Studies (FUMA) platform version 1.5.2^[263]80. Data regarding East Asian populations from the 1000 Genomes Project were used for LD calculation. Gene prioritization was performed using the mean P value of SNPs located within 1 kb upstream or downstream of genes. Gene set enrichment analysis was conducted on curated gene sets and Gene Ontology (GO) terms obtained from MSigDB version 7.0^[264]83. For all tested gene sets, the P values utilized in the MAGMA-based gene set analysis were adjusted using Bonferroni correction. A total of 10,678 gene sets were used; these sets were divided into 4761 curated gene sets and 5917 GO terms. Tissue enrichment analysis was conducted using the GTXv8 database^[265]20. Cell-type-specific enrichment through stratified LD score regression Stratified LD score regression was used to identify tissues and cell types relevant to eGFR and BUN GWAS meta-analysis results. Heritability was partitioned from GWAS summary statistics to sets of cell-type-specific regulatory elements and gene expression^[266]84,[267]85. The precalculated LD score for East Asians was obtained from the LD score regression resource website ([268]https://alkesgroup.broadinstitute.org/LDSCORE/). The regulatory annotation datasets for cell-type-specific analysis contained 220 cell types featuring four histone markers, namely H3K4me1, H3K4me3, H3K9ac, and H3K27ac^[269]86–[270]88. The gene expression datasets for cell-type-specific analysis were from GTEx and Franke Lab. GTEx contains RNA sequencing data of 53 human cell types, while Franke Lab includes array data of 152 human and mouse cell types^[271]21. The significance threshold was set at a false discovery rate of less than 5%. Fine mapping and credible set identification in a meta-analysis Genomic risk loci containing functional SNPs were subjected to statistical fine mapping using GCTA-COJO-Slct, a stepwise model selection procedure for identifying independent SNPs^[272]89. Approximate conditional analyses were then performed to determine the conditional effect sizes of all remaining independent SNPs in a locus using the GCTA-COJO-Cond algorithm^[273]89. For each SNP within a locus, the approximate Bayes factors (ABFs) derived from the effect estimate on the eGFR and its standard error of conditional estimates were used to compute the posterior probability (PP) of the SNP being responsible for the association signal (potential causal variant). To calculate the ABF for each SNP, the R package “gtx” version 2.1.6 ([274]https://github.com/tobyjohnson/gtx) was employed, applying Wakefield’s formula^[275]90. The 99% credible sets were calculated by summing the PP-ranked SNPs until the cumulative PP exceeded 99%, thereby representing the credible set of SNPs that included the SNP responsible for the association with eGFR. For the deleterious scoring of functional SNPs in genomic risk loci, we utilized the FUMA platform to integrate annotations from CADD (version 1.4)^[276]80,[277]91. Colocalization analysis of eGFR-associated SNPs with cis-eQTLs Colocalization analysis was conducted through a Bayesian test to examine the question of whether two traits can share a causal variant^[278]92. We examined the correlation between the eGFR and gene expression by evaluating the colocalization between eGFR-associated SNPs and cis-eQTLs from the Human Kidney eQTL Atlas for tubules and glomeruli^[279]11 and by using GTEx version 8 for 49 tissues^[280]20. Both cis-eQTLs and GWAS-derived effect alleles were harmonized and identified within ±100 kb of each GWAS-derived lead SNP for colocalization analysis. We used the R package “coloc” (version 5) with default settings to identify loci with a PP of >80%. In cases where the harmonized effect alleles had the same effect directions in both the eQTL and GWAS data, the aligned allelic effect direction was defined as positive. PRS for CKD To determine whether our findings can support the genetic susceptibility of CKD, we established a PRS for CKD on the basis of GWAS summary statistics for the eGFR. Clumping and thresholding were used to calculate the PRS^[281]93,[282]94. Briefly, PRSice-2 software was used to calculate the PRS and to evaluate the most appropriate PRS model with the highest R^2 value^[283]95. The base dataset used in PRSice-2 was derived from the meta-analysis summary statistics of eGFR GWASs involving the BBJ- and TWB-based discovery datasets. The target dataset in PRSice-2 consisted of independent samples from the TWB replication dataset with CKD status available, defined as a baseline eGFR of <60 mL/min/1.73 m^2. These samples underwent the same quality control procedure as that described in the GWAS section. The weight of the proposed PRS model was determined from the beta coefficient of the eGFR GWAS summary statistic of the base dataset. After SNPs were clumped with an LD r^2 of 0.1 and a window size of ±250 kb, a P value threshold was established to select independent significant SNPs for inclusion in the PRS. The PRS model was established through adjustment for age, sex, and the first six PCs. Since the PRS model adopts beta coefficients from eGFR GWAS, a higher PRS indicates higher eGFR and consequently low CKD risk. To better interpret the role of the PRS in CKD risk predictions, we multiplied the PRS by −1 as the operative value of PRS[CKD]. Predictive performance of the PRS for CKD The predictive performance of the PRS for CKD development was evaluated in two independent datasets by using a cumulative incidence curve, which was used to represent the probability of a CKD event over time. After we examined the predictive performance of the PRS by using the CMUH-CRDR dataset (n = 25,345), we externally verified this performance by using the UKB dataset (n = 260,245). The Kaplan–Meier method was used to generate cumulative incidence curves, and a log-rank test was conducted to determine the differences between the curves of different PRS[CKD] subgroups, categorized as <−2 SDs, −2 to 2 SDs, and >2 SDs of the mean PRS[CKD]. The index date was set as each patient’s date of birth, and follow-up was conducted until the first CKD diagnosis was established, until the patient died, until the patient was lost to follow-up, or until the administrative censor date. In the CMUH-CRDR validation cohort, each patient’s date of death was verified by the National Death Registry of the Ministry of Health and Welfare of Taiwan. In the UKB cohort, the date of death was verified by NHS Digital for patients in England and Wales and by the NHS Central Register (part of the National Records of Scotland) for patients in Scotland. The administrative censor dates were December 31, 2021, for the CMUH-CRDR validation cohort and December 31, 2016, for the UKB cohort. To examine the level of CKD risk associated with PRS[CKD], a competing risk analysis with deaths considered as censoring events was conducted using cause-specific Cox proportional hazards modeling, with age used as the time scale. The discriminative performance of the Cox proportional hazards model that incorporated PRS[CKD] and sex data was evaluated using the AUROC. In addition, we plotted the observed versus predicted risk probability to evaluate the calibration of the Cox proportional hazards model. Reporting summary Further information on research design is available in the [284]Nature Portfolio Reporting Summary linked to this article. Supplementary information [285]Supplementary Information^ (1.7MB, pdf) [286]41467_2024_53516_MOESM2_ESM.pdf^ (100.3KB, pdf) Description of Additional Supplementary Files [287]Supplementary Data 1-15^ (3.6MB, xlsx) [288]Reporting Summary^ (71.6KB, pdf) [289]Transparent Peer Review file^ (2.1MB, pdf) Acknowledgements