Abstract

   Genetic mechanisms of blood pressure (BP) regulation remain poorly
   defined. Using kidney-specific epigenomic annotations and 3D genome
   information we generated and validated gene expression prediction
   models for the purpose of transcriptome-wide association studies in 700
   human kidneys. We identified 889 kidney genes associated with BP of
   which 399 were prioritised as contributors to BP regulation. Imputation
   of kidney proteome and microRNAome uncovered 97 renal proteins and 11
   miRNAs associated with BP. Integration with plasma proteomics and
   metabolomics illuminated circulating levels of myo-inositol,
   4-guanidinobutanoate and angiotensinogen as downstream effectors of
   several kidney BP genes (SLC5A11, AGMAT, AGT, respectively). We showed
   that genetically determined reduction in renal expression may mimic the
   effects of rare loss-of-function variants on kidney mRNA/protein and
   lead to an increase in BP (e.g., ENPEP). We demonstrated a strong
   correlation (r = 0.81) in expression of protein-coding genes between
   cells harvested from urine and the kidney highlighting a diagnostic
   potential of urinary cell transcriptomics. We uncovered adenylyl
   cyclase activators as a repurposing opportunity for hypertension and
   illustrated examples of BP-elevating effects of anticancer drugs (e.g.
   tubulin polymerisation inhibitors). Collectively, our studies provide
   new biological insights into genetic regulation of BP with potential to
   drive clinical translation in hypertension.

   Subject terms: Gene regulation, Gene expression, Genome-wide
   association studies
     __________________________________________________________________

   Genetic mechanisms of blood pressure regulation remain poorly defined.
   Here, the authors perform a kidney multi-omics study to identify 399
   kidney effector genes, 11 miRNAs and 97 proteins for blood pressure and
   hypertension.

Introduction

   Persistently raised blood pressure (BP) (hypertension) is single most
   important attributable risk factor for death globally^[98]1–[99]3. BP
   is a heritable trait^[100]4–[101]6; previous studies uncovered
   ultra-rare^[102]7, low-frequency^[103]8–[104]11 and
   common^[105]12–[106]23 genetic variants associated with BP and/or
   hypertension. The mechanisms underpinning the role of ultra-rare
   variants in the development of low/high BP is well-defined – almost all
   of them result in alterations of sodium/water reabsorption in the
   kidney through effect on target genes in the distal nephron and
   collecting duct^[107]7,[108]24. Several common genetic variants
   associated with BP in genome-wide association studies (GWAS) also act
   through alterations of epigenetic and transcriptional programmes
   operating in the kidney^[109]25,[110]26 – an organ of “over-riding
   dominance” in the pathogenesis of hypertension^[111]27–[112]30. Indeed,
   our recent studies uncovered the identity of some of their target genes
   and established causal connections between DNA-methylation, expression
   and splicing of these genes and BP through causal inference
   analyses^[113]25. However, for a majority of GWAS loci associated with
   BP the effector genes have not been yet identified. Thus, new
   strategies inclusive of previously unexplored “omics” layers are needed
   to accelerate the BP gene discovery and fully characterise the
   downstream molecular mechanisms and clinically measurable consequences
   of these signals.

   Here, using a collection of up to 700 human
   kidneys^[114]25,[115]31–[116]34 and new computational algorithms
   embedded in three-dimensional (3D) configuration of the genome and
   kidney epigenome, we uncover 6490 kidney genes with genetically
   imputable expression (~30.3% of kidney transcriptome). We perform BP
   transcriptome-wide association studies (TWAS), Mendelian randomisation
   and fine-mapping of causal gene sets (FOCUS) to prioritise kidney
   effector genes for BP regulation. Through genetic imputation of the
   kidney microRNAome and proteome we uncover the identity of renal miRNAs
   and proteins associated with BP. Our computational drug repositioning
   analysis demonstrates BP effects of the existing non-cardiovascular
   medications and highlight new drug repositioning opportunities for
   hypertension. We also analyse urinary cell transcriptome to provide
   insights into diagnostic tractability of BP kidney genes. Finally, we
   triangulate outputs from plasma proteomics and metabolomics with kidney
   TWAS to yield new insights into pathways of blood pressure regulation.

Results

Prioritisation of human cell-types/tissues for relevance to blood pressure
through transcriptome-wide association studies

   TWAS can not only shed light on the biological importance of
   cell-types/tissues to a trait/disease but also uncover new and
   unexpected tissue-disease associations, e.g. that of intestines to
   psychiatric disorders^[117]35 or monocytes to Parkinson’s
   disease^[118]36.

   We applied the elastic net method^[119]37 to predict the genetically
   regulated component of gene expression across 49 human cell types and
   tissues from numerically identical sets of individuals in Genotype
   Tissue Expression (GTEx). We used an equal number of samples (n = 65)
   across the panel of 49 tissues to maximise the comparability of BP TWAS
   discovery rates across these tissues accepting that this would be at
   the expense of the individual discovery rates in tissues that have been
   down-sampled. After quality control and a nested cross-validation we
   used S-PrediXcan to estimate the mediating effects of gene expression
   levels in various GTEx tissues on systolic BP (SBP) and diastolic BP
   (DBP). In brief, we used the eQTL data for 25,332 human genes across 49
   GTEx tissues and summary statistics for 7,088,121 and 7,160,657 SNPs
   from the GWAS meta-analysis of SBP and DBP conducted in UK Biobank and
   the International Consortium for Blood Pressure (ICBP)^[120]22
   (~750,000 individuals). We then calculated the overall SBP and DBP TWAS
   scores for each of the human cell-types/tissues ranking them for their
   relevance to genetic regulation of SBP and DBP, (Figs. [121]1A,
   [122]2A, S[123]1–[124]2 and Supplementary Data [125]1–[126]2). Cultured
   fibroblasts and kidney cortex showed the strongest overall association
   with SBP and DBP, respectively (Figs. [127]2A, S[128]1–[129]2 and
   Supplementary Data [130]1–[131]2).

Fig. 1. Transcriptome-wide association studies, kidney and blood pressure –
schematic representation of input data sources (with sample size), analytical
processes and output data.

   [132]Fig. 1
   [133]Open in a new tab

   A Blood pressure tissue prioritisation. B Kidney GReX derived by
   Prediction Using Models Informed by Chromatin conformations and
   Epigenomics (PUMICE) algorithm – discovery analysis. C Kidney GReX –
   validation analysis. D BP kidney TWAS analysis. E Causality and drug
   repositioning analyses. The input data sources are coloured in blue,
   schematic intermediate results are coloured in grey, the primary
   outputs are coloured in yellow, downstream single-gene analyses are
   marked in green. GReX – genetically regulated expression, GWAS –
   genome-wide association study, TWAS – transcriptome-wide association
   studies, Beta – effect size estimate, SE – standard error of beta,
   ChIP-seq – chromatin immunoprecipitation sequencing,
   HiChIP – chromosome conformation capture by sequencing and
   immunoprecipitation, ENCODE – encyclopaedia of DNA elements consortium,
   H3K27me3 – histone 3, lysine residue 27, tri-methylation,
   H3K4me3 – histone 3, lysine residue 4, tri-methylation, DHS – DNase I
   hypersensitive sites, H3K27ac – histone 3, lysine residue 27,
   acetylation, HK2 cell line – human kidney 2 cell line, BP – blood
   pressure, ICBP – International Consortium for Blood Pressure,
   FDR – false discovery rate, PMR – probabilistic Mendelian
   randomisation, FOCUS – fine-mapping of causal gene sets. Parts of the
   figure were drawn by using pictures from Servier Medical Art and some
   of these pictures were modified. Servier Medical Art by Servier is
   licensed under a Creative Commons Attribution 3.0 Unported License
   ([134]https://creativecommons.org/licenses/by/3.0/. Further parts of
   the figure were drawn by using pictures from Marcel Tisch
   ([135]https://twitter.com/MarcelTisch) and are licensed under a
   Creative Commons CC0 License
   ([136]https://creativecommons.org/publicdomain/zero/1.0/).

Fig. 2. Blood pressure TWAS – from prioritisation of human tissues to
development of enhanced gene expression prediction kidney model.

   [137]Fig. 2
   [138]Open in a new tab

   A Representation of 49 human tissues and cell-types from the GTEx
   project ranked by magnitude of association with systolic (SBP) and
   diastolic blood pressure (DBP). Higher score represents stronger
   association with blood pressure. The ranking of each tissue is labelled
   from 1 to 49 derived by the sum of SBP and DBP scores. The colour scale
   is based on the ranking (from highest to lowest) from orange, green,
   blue, purple to pink. Each tissue or cell-type highlighted in black.
   Further information is shown in Figs. [139]S1–[140]2. Created by Idoya
   Lahortiga. B Kidney-informed enhanced prediction of gene expression.
   Numbers of features (peaks, 3D structures) are shown for each kidney
   data layer. TAD – The variant inclusion window [delimited by a
   topologically associating domain (TAD), chromatin loop domain (Loop),
   1 Mb fixed window or 250 kb fixed window] and variant specific
   weighting strategy are optimised to maximise gene discovery in kidney
   transciptome-wide association study (TWAS) using 3D genome and
   epigenomic data directly measured in the kidney. CTCF – CCCTC-binding
   factor. HiChlP – chromosome conformation capture with chromatin
   immunoprecipitation, genomic contact – two regions of chromatin in
   close physical proximity in the HK-2 cell line. C TWAS model prediction
   workflow. Data are coloured by type: genotype – grey, gene expression –
   blue, predictive model – orange. D, E Correlation between the predicted
   genetically regulated expression (GReX) and observed expression of
   ERAP2 in HKTR (blue) and in NIH resources (red). The best-fitting line
   with 95% confidence interval (highlighted in grey) is represented, P –
   nominal P-value is calculated from two-sided Pearson correlation. F
   Predictive performance (r^2) of gene expression models from discovery
   resource (HKTR, n = 478) vs validation resource (NIH, n = 222). r –
   Pearson correlation coefficient. G Percentage of imputable genes (top)
   and all expressed genes (bottom) in biotypes. Data are coloured by
   biotype: protein coding gene – pink, long non-coding RNA – red, others
   – orange. H. Examples of imputable kidney genes of relevance to
   cellular transport of solutes. Associations between GReX of each gene
   and significantly associated quantitative and disease traits are
   represented by large arrows showing the directionality of change in the
   trait (upwards – higher risk or increased blood levels, downwards –
   lower risk or lower blood levels). Example genes are coloured according
   to their associated traits and are placed in the region of the nephron
   with highest expression. Partially created with BioRender.com.

   Only five cell types/tissues – kidney, cultured fibroblasts, adrenal
   gland, cells EBV-transformed lymphocytes and thyroid ranked within top
   ten tissues for both SBP and DBP (Figs. [141]2A, [142]S1 and
   Supplementary Data [143]1–[144]2). Both vasculature (aorta,
   tibial artery) and central nervous system (several areas of brain
   including hippocampus, cortex, cerebellar hemisphere, frontal cortex,
   cerebellum, and caudate basal ganglia) showed a strong representation
   within the tissues of the highest relevance to BP (Figs. [145]2A,
   [146]S1 and Supplementary Data [147]1–[148]2). The uncovered
   association between lymphocytes, spleen and minor salivary gland
   supports the emerging role of immune system in BP regulation and
   hypertension^[149]38,[150]39 while the connection between the brain
   regions and BP is increasingly recognised not only in the pathogenesis
   of hypertension but also hypertension-mediated cognitive
   decline^[151]40.

   Collectively, our data uncovered new and prioritised existing tissue
   contributors to BP regulation. We have also provided an additional line
   of evidence for the role of the kidney as a key mediator of genetic
   effects on BP regulation and the development of hypertension.

Development and validation of the prediction models for a genetic component
of kidney gene expression – enhanced performance through integration of
information from kidney epigenomics and 3D configuration of the genome

   We applied Prediction Using Models Informed by Chromatin conformations
   and Epigenomics (PUMICE)^[152]41, an algorithm which integrates 3D
   chromatin organisation data and epigenetic annotations, to generate
   prediction models for genetically regulated gene expression (GReX) in
   the human kidney (Fig. [153]1B). To obtain 3D kidney genome data, we
   first generated a high-resolution transcription-centred chromatin map
   using established pipelines from H3K27ac Hi-ChIP in HK2 cell
   line^[154]42 (Fig. [155]2B). In total, we obtained 69,036,737 genomic
   contacts and identified 11,569 Topologically Associating Domains (TADs)
   and 8934 chromatin loops (Fig. [156]2B). For epigenomics data, we
   retrieved information for human adult kidney from ENCODE using 4
   different tracks, including H3K27ac (127,379 peaks), H3K4me3 (70,569
   peaks), DNase hypersensitivity sites (275,004 peaks) and CTCF (45,832
   peaks; Fig. [157]2B).

   By integrating these data with RNA-sequencing-derived transcriptome of
   478 kidneys available in the discovery resource using PUMICE
   (Fig. [158]1B), we generated 8682 significant gene expression
   prediction models (Fig. [159]2C). This is ~24% increase in comparison
   to 7011 models generated by fitting “traditional” elastic-net linear
   models on the same number of kidneys by PrediXcan (Supplementary
   Data [160]3). We, then, validated these models in an independent
   dataset of 222 kidneys collected as our validation resource
   (Fig. [161]1C). We found that 6490 kidney PUMICE-derived models
   remained significant (in comparison to 5647 PrediXcan models;
   Fig. [162]2C and Supplementary Data [163]3–[164]4). ERAP2 (endoplasmic
   reticulum aminopeptidase 2) gene is an example of kidney gene with an
   excellent correlation between the predicted GReX and its observed
   expression in both our discovery and validation resource (Fig. [165]2D,
   E). Overall, there was a very strong correlation (r = 0.83,
   P-value = 1.1 × 10^−1647) in the predictive performance of
   PUMICE-derived models between our discovery and validation kidney
   resources (Fig. [166]2F). We saw no significant differences in
   proportions of imputable kidney genes across three main gene biotypes
   (Fig. [167]2G).

   Collectively, we enhanced the discovery of kidney genes with
   genetically imputable expression through integration of kidney
   epigenomic and 3D genomic annotations and validated approximately 75%
   of the discovered models in an independent collection of the same
   tissue type.

The predicted kidney gene expression of metabolic transporters/receptors
recapitulates their biological functions and expected contributions to human
disease

   In addition to validation of the computational robustness for the
   predictive models, we sought to examine if their imputed gene
   expression is associated with biochemical readouts of their molecular
   activity in the kidney. We reasoned that the generated GReX models for
   the key receptors/transporters involved in the renal handling of urate,
   glucose, phosphate and calcium should capture the expected
   contributions of these genes to the circulating levels of these
   solutes. Amongst kidney genes with robust, independently validated GReX
   we selected: solute carrier family 2 member 9 gene (SLC2A9), solute
   carrier family 22 member 12 gene (SLC22A12), solute carrier family 2
   member 2 gene (SLC2A2), inositol hexakisphosphate kinase 3 gene
   (IP6K3), calcium sensing receptor gene (CASR) as the examples for
   regulators of urate, glucose, phosphate and calcium transport in
   tubular epithelium (respectively) (Fig. [168]2H). We demonstrated that
   the genetically imputed kidney expression for these genes showed the
   expected associations with the relevant blood biochemistry phenotypes
   in UK Biobank; e.g. increased renal expression of SLC22A12 and SLC2A9
   ([encoding apical URAT1 and basolateral GLUT9] (respectively) – the
   main transporters responsible for reabsorption of urate in proximal
   tubule)^[169]43 was associated with increased serum levels of urate in
   321,210 individuals from UK Biobank (Fig. [170]2H and Supplementary
   Data [171]5). Moreover, predicted renal expressions of these genes show
   directionally consistent associations with diseases arising from
   increased/decreased changes of these phenotypes in blood, e.g.,
   increased predicted kidney expression of SLC22A12 and SLC2A9 was
   associated with increased odds of gout in UK Biobank (Fig. [172]2H and
   Supplementary Data [173]5).

   In the absence of information on serum sodium and potassium levels in
   UK Biobank we could not test their associations with SCNN1B (encoding
   the imputable beta subunit of epithelial sodium channel – ENaC). The
   latter operates as a key regulator of sodium/potassium handling in the
   aldosterone-sensitive portion of the distal nephron^[174]44, is
   targeted by amiloride^[175]45 and represents one of the most consistent
   gene expression signatures of hypertension-mediated effect on kidney
   disease^[176]46. However, we noted that increased level of SCNN1B
   expression was associated with numerically lower and nominally
   significant risk for hyperkalaemia in UK Biobank consistent with the
   lowering effect of upregulated ENaCs on circulating levels of potassium
   (Fig. [177]2H and Supplementary Data [178]5).

   Taken together, these examples illustrate the biological robustness of
   the models for genetically predicted expression of kidney genes
   developed for the purpose of TWAS.

Kidney transcriptome-wide association studies and computational drug
repositioning analysis uncover new genes associated with blood pressure and
provide new therapeutic insights

   Using 6490 independently validated GReX models we conducted a two-stage
   reciprocal replication BP kidney TWAS with SBP, DBP and pulse pressure
   (PP) summary statistics from 337,422 UK Biobank individuals and 299,024
   individuals from ICBP (Figs. [179]1D and [180]S3). In brief, we used
   each BP summary statistic resource as a discovery population and
   followed up the significant signals (i.e., kidney genes associated with
   at least one BP trait) in the other cohort (for the same BP trait)
   (Fig. [181]S3). Through this reciprocal replication we uncovered a
   total of 889 unique kidney genes showing statistically significant
   associations (FDR < 0.05) with at least one BP trait in both datasets
   (Supplementary Data [182]6 and Fig. [183]S3). We then mapped these
   genes onto the existing 429 independent BP GWAS loci (Supplementary
   Data [184]6) and found at least one kidney gene within 258 of these
   loci (60.1%; Fig. [185]S3). Altogether, 772 BP TWAS genes were mappable
   to the existing BP GWAS loci (Fig. [186]S4). One-hundred and seventeen
   of these genes mapped outside the known BP GWAS loci (Supplementary
   Data [187]6 and Fig. [188]S4) illustrating the potential of TWAS to
   uncover new associations in the chromosomal regions “missed by
   GWAS”^[189]47. 78.7% and 57.9% of kidney genes uncovered by our TWAS
   were not amongst BP genes from any human tissues examined in TWAS by
   Giri et al^[190]23. or multi-tissue panel TWAS examined more recently
   by Wu et al^[191]48. (Supplementary Data [192]7). There was a modest
   (18%) degree of overlap between kidney genes associated with BP in our
   TWAS and the genes associated with CKD-defining traits in TWAS
   conducted by Schlosser et al^[193]49. (Supplementary Data [194]7).
   Amongst those that overlap were several notable genes linked already to
   both BP and kidney health/disease including interferon regulatory
   factor 5 gene (IRF5)^[195]24,[196]25, N-Acetyltransferase 8B gene
   (NAT8B)^[197]32,[198]50 and Dipeptidase 1 gene (DPEP1)^[199]32,[200]51.

   We then examined the output from our BP kidney TWAS via Connectivity
   Map (CMap)^[201]52,[202]53 – a library of gene expression changes
   induced by a panel of 1309 different FDA-approved drugs and small
   chemical compounds. We sought to identify drugs/compounds both inducing
   and reversing changes in gene expression associated with BP –
   pharmaceuticals with reversed direction of the effects on gene
   expression (to that of BP) can be interpreted as potential
   repurposing/repositioning options for hypertension (Fig. [203]1E).

   Based on the reversal of the BP-related changes in the transcriptome,
   adenylyl cyclase activators were identified as a group of medications
   with a potential to lower BP (Supplementary Data [204]8). This is in
   line with their effect on adenylate cyclase and an elevation of
   intracellular cyclic AMP (cAMP)^[205]54 leading to vascular smooth
   muscle relaxation and subsequent vasodilation^[206]55. cAMP-dependent
   effect of forskolin [natural root extract from Coleus barbatus (Blue
   Spur Flower)] showed a potential to protect podocytes from
   injury^[207]56 and already emerged as a new drug repurposing
   opportunity for kidney diseases^[208]57. Forskolin is available as a
   diet supplement^[209]58 and was associated with several health benefits
   before^[210]59.

   Our data support previously reported side effect of topoisomerase
   inhibitors (commonly used to treat acute myeloid leukaemia) on BP
   (hypotension)^[211]60. We also demonstrated both known (e.g.,
   glucocorticoids) and less recognised (e.g., tubulin polymerisation
   inhibitors) potential of several groups of therapeutics to increase BP
   based on the direction of their effects on gene expression (i.e.,
   synchronous with that of BP) (Supplementary Data [212]8). The data on
   tubulin polymerisation inhibitors are in line with one of our most
   recent studies showing how docetaxel induced endothelial dysfunction
   and hypertension^[213]61.

   Collectively, our studies identified almost 900 genes whose predicted
   kidney expression show directionally consistent associations with SBP,
   DBP and/or PP across two independent cohorts providing a robust input
   for further downstream analyses. We also identified a new potential
   pharmaceutical repositioning opportunity for hypertension and
   highlighted examples of BP elevating effects of the existing
   medications with indications for conditions other than hypertension.

Mendelian randomisation and fine-mapping of causal gene sets uncovers
putatively causal independent associations between kidney genes and blood
pressure

   We then sought to substantiate the evidence of causal effects of 889 BP
   genes identified in our BP kidney TWAS on BP traits (Figs. [214]1E and
   S[215]3–[216]4). We integrated the summary statistics information from
   cis-eQTL analysis [generated using 478 kidneys from the Human Kidney
   Tissue Resource (HKTR)] and GWAS summary statistics for SBP, DBP and PP
   (generated using ~750,000 individuals from both UK Biobank and ICBP) in
   two-sample Mendelian randomisation (MR) designed for TWAS
   applications^[217]62. We determined that 663 kidney genes from our TWAS
   analysis showed robust evidence of potentially causal effect on at
   least one BP trait (FDR < 0.05; Fig. [218]S4).

   We noticed that 510 of these genes cluster with at least one other gene
   within the same locus (Supplementary Data [219]9) – we identified a
   total of 133 loci with two or more BP kidney genes showing potentially
   causal associations with BP (Supplementary Data [220]9). To account for
   a potential effect of linkage disequilibrium-driven correlations
   between genes in these regions on our findings we applied FOCUS-based
   analysis^[221]63. Within 35 of these regions, FOCUS pointed to a single
   BP kidney gene. For example, within a locus on chromosome 17, the
   initial set of ten genes showing causal effects on DBP, FOCUS
   prioritised N-Myristoyltransferase 1 gene (NMT1) whose primary function
   is a co-/post-translational modification of proteins through the
   addition of a fatty acid (myristate) to the N-terminal glycine
   residue^[222]64,[223]65 (Supplementary Data [224]10).

   Altogether, our MR studies followed by fine-mapping of causal gene sets
   prioritised 399 kidney genes showing a potentially causal effect on BP
   (Figs. [225]3 and [226]S4). Nearly half (182, 45.6%) of them mapped as
   a single gene onto specific BP GWAS/TWAS locus (Supplementary
   Data [227]9 and Fig. [228]S4) and 29 of them had causal effects on all
   three BP traits (Fig. [229]3).

Fig. 3. Circular representation of information on 399 putatively causal genes
for blood pressure and the degree of their shared association with SBP
(blue), DBP (red) and PP (green).

   [230]Fig. 3
   [231]Open in a new tab

   Genes are grouped by their biological theme (shown as coloured
   regions). From outermost to inner most data circle: associations with
   systolic blood pressure (SBP) coloured in blue, associations with
   diastolic blood pressure (DBP) coloured in red and associations with
   pulse pressure (PP) coloured in green.

   We assigned each of these 399 genes into one of 16 biological master
   themes based on their known function and relevance to human health and
   disease (Supplementary Data [232]11). We observed a very strong
   footprint of human metabolism (Fig. [233]3 and Supplementary
   Data [234]11) among BP kidney TWAS genes. This included biochemical
   pathways responsible for metabolic processing of amino acids [e.g.
   agmatinase gene (AGMAT) and serine racemase gene (SRR)], carbohydrates
   [e.g. solute carrier family 5 member 11 gene (SLC5A11), solute carrier
   family 2 (facilitated glucose transporter), member 4 gene (SLC2A4) and
   starch binding domain 1 gene (STBD1)] lipids [(e.g. Acyl-CoA
   Thioesterase 8 gene (ACOT8), ELOVL Fatty Acid Elongase 7 gene (ELOVL7)
   and fatty acid desaturase 1 (FADS1)], vitamins [e.g. Folate receptor
   alpha gene (FOLR1)] and oxidative phosphorylation [(e.g. Inner Membrane
   Mitochondrial Protein gene (IMMT) and Nicotinamide Nucleotide
   Transhydrogenase gene (NNT)] in line with a theory of metabolic roots
   of hypertension^[235]66 and increasingly appreciated role of the kidney
   as a key regulatory organ of human metabolism beyond its contributions
   to fluid-ion homeostasis^[236]67.

   A few of these genes have an established role in the physiological
   maintenance of sodium-water homeostasis and our data demonstrate their
   contributions to BP elevation. For example, increased renal expression
   of aquaporin 1 gene (AQP1) and aquaporin 4 gene (AQP4) –
   water-selective channels operating in the proximal tubule/thin
   descending limb of Henle/descending vasa recta and principal cells of
   the collecting duct (respectively)^[237]68 showed causal effects on
   increased SBP and DBP (Fig. [238]3). This is most consistent with the
   increase in both constitutive (e.g. via AQP1)^[239]68 and
   arginine-vasopressin (AVP)-regulated^[240]69 (e.g. via AQP4)
   reabsorption of water and their role in hypertension proposed in
   experimental models^[241]70,[242]71. Together with the previous
   findings on aquaporin 11 gene (AQP11)^[243]25, this also means that one
   third (3/9) of renal aquaporins^[244]68 are mediators of the
   genetically determined predisposition to elevated BP.

   Overall, our TWAS-driven studies identified 7.5-fold greater number of
   kidney genes whose expression shows a potentially causal effect on BP
   than our earlier MR analyses (n = 53)^[245]25. This is also over 2-fold
   increase in the discovery over the number of genes identified by
   integration of BP GWAS and three different types of kidney
   omics^[246]25.

Triangulation of outputs from plasma proteomics and metabolomics with kidney
transcriptome-wide association studies yields new insights into pathways of
blood pressure regulation

   Genetic analyses of plasma proteomics and metabolomics are increasingly
   used to gain insights into the pathogenesis of complex traits including
   chronic kidney disease (CKD) and hypertension^[247]51,[248]72. We
   sought to explore how such layers of omics may help in functional
   interpretation of findings from kidney TWAS using SLC5A11, agmatinase
   gene (AGMAT) and angiotensinogen gene (AGT) as examples (Fig. [249]4).

Fig. 4. Genetically predicted expression of kidney genes and their
biochemical readouts – integrative multi-omics analysis.

   [250]Fig. 4
   [251]Open in a new tab

   A Localisation of the sodium/myo-inositol cotransporter 2 (SMIT2),
   encoded by the SLC5A11, in the apical pole of renal proximal tubular
   cells. SMIT2 facilitates the co-transport of Na+ and myo-inositol
   across the cell membrane. Created with BioRender.com. B Effects of
   genetically regulated expression (GReX) of kidney SLC5A11 on pulse
   pressure (PP) from transcriptome-wide association study (TWAS). Nominal
   P-value is calculated from two-sided Z-score test. UKB – UK Biobank,
   ICBP – International Consortium for Blood Pressure, Estimate – change
   of PP in units per one-unit higher GReX of SLC5A11, CI – confidence
   interval. C Representation of 58.4% effect of SLC5A11 mRNA expression
   on PP mediated by plasma levels myo-inositol. Nominal P-value is
   calculated from two-sided Z-score test. MP – mediation proportion, B
   – estimated effect, SE – standard error of the estimated effect, EM
   – from exposure to mediator, EO – from exposure to outcome, MO – from
   mediator to outcome, indirect – indirect effect from exposure to
   outcome, total – total effect from exposure to outcome. D The polyamine
   pathway of arginine catabolism. Agmatinase, encoded by the AGMAT,
   catalyzes the reaction between agmatine and putrescine, resulting in
   the production of urea. Created with BioRender.com. E Effects of kidney
   AGMAT (GReX) on systolic blood pressure (SBP) from TWAS. Nominal
   P-value is calculated from two-sided Z-score test. Estimate – change of
   SBP in units per one-unit higher GReX of AGMAT. F Representation of
   11.7% effect of AGMAT mRNA expression on SBP mediated by plasma levels
   of 4-guanidinobutanoate. Nominal P-value is calculated from two-sided
   Z-score test. G Renin catalyzes the reaction from ANG (angiotensinogen)
   to Angiotensin I (AngI), which is subsequently converted to AngII by
   Angiotensin-Converting Enzyme (ACE). Created with BioRender.com. H
   Effects of kidney AGT mRNA (GReX) on SBP from TWAS. Nominal P-value is
   calculated from two-sided Z-score test. Estimate – change of SBP in
   units per one-unit higher GReX of AGT. I Representation of 52.7% effect
   from AGT mRNA expression on SBP mediated by circulating plasma protein
   levels of angiotensinogen. Nominal P-value is calculated from two-sided
   Z-score test. In 4B, 4E and 4H, squares are positioned by the estimated
   effects with horizontal error bars illustrating 95% confidence
   intervals of the estimated effects.

   We found that reduced renal expression of SLC5A11 is causally
   associated with increase in PP (Fig. [252]4A, B). This gene encodes
   sodium/glucose cotransporter 6 (SGLT6, SMIT2). Given a well-established
   role of this family of transporters [e.g. solute carrier family 5
   member 1 gene (SLC5A1, SGLT1) and solute carrier family 5 member 2 gene
   (SLC5A2, SGLT2)] in glucose homeostasis^[253]73, we firstly examined
   the association between renal expression of SLC5A11 and levels of
   glucose and HbA1C using data from 337,350 unrelated European
   individuals from UK Biobank. Having found no association with either
   (Supplementary Data [254]12), we then sought to determine whether the
   key target substrate of SLC5A11 [(i.e. myo-inositol – a cyclic
   carbohydrate of importance to signal transduction and
   osmoregulation)^[255]74] may act as a mediator of its association with
   BP. Using data from 14,296 individuals from two cohorts (INTERVAL and
   EPIC-Norfolk)^[256]75 we confirmed that renal SLC5A11 was indeed
   associated with circulating levels of myo-inositol (Fig. [257]4C). This
   is in line with the key role of the kidney in myo-inositol
   metabolism^[258]74. We then uncovered an inverse association between
   myo-inositol and PP (Fig. [259]4C). Through further mediation analysis,
   we determined that ~48.4% of effect of SLC5A11 renal expression on PP
   is mediated by serum levels of myo-inositol. Indeed, myo-inositol
   depletion was linked before to several metabolic disorders such as
   insulin resistance and polycystic ovary syndrome^[260]74; its increased
   urinary excretion was highlighted as a marker of CKD
   progression^[261]76. These findings show that a reduction in
   circulating concentrations of myo-inositol may contribute to increased
   BP at least in part because of genetically determined drop in
   expression of its key renal cotransporter. Our data also suggest that
   increasing levels of myo-inositol (e.g. through enhancing
   SLC5A11-dependent reabsorption in the proximal tubule^[262]77) may be
   beneficial to BP control.

   Our BP kidney TWAS also demonstrated that genetically determined
   increase in renal expression of AGMAT was causally associated with
   increased SBP (Fig. [263]4D, E). AGMAT is responsible for enzymatic
   conversion of agmatine to putrescine downstream from arginine on the
   alternative pathway for polyamine biosynthesis^[264]78 (Fig. [265]4D).
   It shows a strong enrichment in the kidney (Fig. [266]S5) and a
   cell-type specific enrichment in proximal tubule
   ([267]https://www.proteinatlas.org/ENSG00000116771-AGMAT/single+cell+ty
   pe). While we could not examine how kidney AGMAT expression correlates
   with putrescine levels, we showed that genetically determined kidney
   expression of AGMAT mRNA is associated with circulating blood urea
   nitrogen (BUN) levels – a by-product of its enzymatic activity (i.e.
   agmatine + H2O = putrescine + urea) in two independent populations
   (Fig. [268]S6). Using data of 14,296 individuals from two cohorts
   (INTERVAL and EPIC-Norfolk)^[269]75 we also confirmed the association
   between renal AGMAT and plasma levels of 4-guanidinobutanoate (known
   also as gamma-guanidinobutyric acid or gamma-guanidinobutanoate) – a
   metabolite whose salivary levels were linked to AGMAT before^[270]79
   (Fig. [271]4F). We also found an association between BP and circulating
   levels of 4-guanidinobutanoate (Fig. [272]4F). The latter was
   previously proposed to act in a pro-inflammatory
   manner^[273]80,[274]81. Finally, through further mediation analysis we
   determined that approximately 12.3% of effect of renal AGMAT expression
   on SBP is mediated by increased levels of 4-guanidinobutanoate in
   plasma (Fig. [275]4F). These data demonstrate the importance of renal
   catabolism of arginine in BP regulation and uncover an arginine-derived
   compound as a new metabolic readout of high BP.

   Our TWAS also uncovered a consistently strong causal association
   between increased renal expression of angiotensinogen gene (AGT) and BP
   (Fig. [276]4G, H). Expression of angiotensinogen mRNA has been thought
   to influence BP through intra-renal RAS activity (Fig. [277]4G) and we
   sought to quantify the extent to which the effect we detected is indeed
   independent of systemic (circulating) angiotensinogen. Using data from
   10,708 individuals from the Fenland study^[278]82 with available plasma
   levels of angiotensinogen protein we first confirmed the causal
   association between the plasma angiotensinogen and BP in the expected
   direction (Fig. [279]4I). Further mediation analysis demonstrated that
   approximately 44.7% of the effect of kidney AGT mRNA on BP was
   independent of circulating plasma angiotensinogen (Fig. [280]4I) and
   possibly reflective of the local activity of angiotensinogen in the
   kidney^[281]83. Interestingly, a significant proportion of renal
   angiotensinogen mRNA effect on BP appeared to be mediated by plasma
   levels of angiotensinogen (Fig. [282]4I). This may suggest either a
   shared genetic regulation of angiotensinogen mRNA expression between
   the kidney and the key tissue(s) from where it is released into
   circulation (i.e. liver)^[283]83 or/and a largely unrecognised systemic
   effect of renal angiotensinogen on BP.

   Collectively, these studies exemplify how integration of genomics and
   kidney transcriptomics with data from other “omics” (e.g., proteomics
   and metabolomics) can provide insights into molecular mechanisms
   underpinning the findings from TWAS, identify the downstream effectors
   of kidney TWAS genes and highlight new biochemical readouts of elevated
   BP.

Genetically regulated expression of miRNAs in the kidney is associated with
blood pressure

   Several miRNAs have been proposed to contribute to BP regulation and
   the development of hypertension mostly through gene
   expression-phenotype correlation studies^[284]84,[285]85. To
   systematically examine whether kidney miRNAs are associated with BP we
   first created a repository of 339 miRNA expression profiles with
   matching genotype information in our discovery resource (HKTR). We
   uncovered 1459 kidney miRNAs, a majority of which are encoded by
   intronic sequences (Fig. [286]5A). As expected, miRNAs accounted for a
   much smaller proportion of kidney genes when compared to protein-coding
   genes and long non-coding RNAs (Fig. [287]5B). Fewer kidney miRNAs than
   other biotypes accounted for the same proportion of cumulative gene
   expression in the kidney (Fig. [288]5C).

Fig. 5. Kidney miRNAs and blood pressure.

   [289]Fig. 5
   [290]Open in a new tab

   A The number of miRNAs expressed in the human kidney and the
   distribution of their genetic origins. Red – exonic, green – intronic,
   yellow – intergenic. Partially created with BioRender.com. B Cumulative
   proportion of gene biotypes expressed in the kidney, stratified by
   chromosome. Red – protein-coding genes, yellow – long non-coding RNAs,
   purple – pseudogenes, orange – miRNAs. C Cumulative proportion of the
   sum of miRNA expression (log[2](TPM + 1)) and gene expression
   (log[2](TPM + 1)) against the total number of miRNAs and genes ranked
   by their expression in descending order. Red – protein-coding genes,
   yellow – long non-coding RNAs, purple – pseudogenes, orange – miRNAs. D
   Tissue enrichment profiles of 52 miRNAs with highest level of
   expression enrichment/enhancement in the kidney – data across 35
   tissues. Blue – tissue enhanced (expression 4x higher than the
   cross-tissue mean), purple – group enriched (group of 2–5 tissues with
   expression 4x higher than any other tissue), grey – no enrichment. E
   Predictive miRNA expression model workflow used in TWAS. Grey –
   genotype, blue – gene expression, yellow – predictive models. F
   Correlation between the predicted GReX and observed expression of
   miR-196-3p. The best-fitting line with 95% confidence interval
   (highlighted in grey) is represented. Blue – HKTR (n = 339), red – NIH
   kidney validation resource (n = 150). P – P-value is calculated from
   two-sided Pearson correlation. G Overview of association between 80
   imputable, validated kidney miRNAs and blood pressure traits, ordered
   by chromosome. Red – significantly associated with SBP, blue –
   significantly associated with DBP, purple – significantly associated
   with PP, grey – non-significant association. H Selected characteristics
   of miRNAs significantly associated with at least one blood pressure
   trait. ‘Genetic origin’ represents the location of each miRNA relative
   to other genes. Red – exonic, green – intronic, yellow – intergenic.
   ‘Imputability’ represents the degree of imputability of each miRNA as
   determined by the predictive model r^2. Blue – higher imputability,
   white – weaker imputability. ‘Expression’ represents the expression
   fold-change of each miRNA in comparison to the most highly expressed
   kidney miRNA. Purple – strongly expressed, white – weakly expressed.
   ‘Specificity’ represents the expression fold-change of each miRNA in
   comparison to the cross-tissue mean. Green – most highly expressed in
   the kidney, white – similar expression in the kidney to other tissues,
   red – most highly expressed in tissues other than the kidney.

   We then examined the degree of miRNA specificity to kidney tissue by
   integrating our catalogue of renal miRNAs with expression profiles of
   35 human tissues curated by miRNATissueAtlas2^[291]86. Of 1431 miRNAs
   overlapping between our kidney catalogue and miRNATissueAtlas2, 49 and
   3 fulfilled the criteria of kidney enhanced and kidney group
   enriched^[292]87, respectively (Fig. [293]5D). MiR-30a-3p, miR-30a-5p
   and miR-188-5p showed strong enrichment in the kidney and some of them
   have prior evidence of contributions to kidney disease
   (Fig. [294]5D)^[295]88,[296]89.

   Using 339 kidneys from our discovery resource we then generated and
   cross-validated GReX prediction models for 201 kidney miRNAs
   (Fig. [297]5E). Of these, 143 were available for validation and 80 of
   these were validated in an independent resource of 150 National
   Institutes of Health (NIH) kidneys (TCGA and CPTAC) with matching
   genotype and small RNAseq-derived expression profiles (Fig. [298]5E, F
   and Supplementary Data [299]13). miR-196a-3p is an example of a miRNA
   whose strong genetic regulatory component renders it an excellent
   target for TWAS (Fig. [300]5F).

   We then used a computational pipeline with reciprocal replications in
   two cohorts established at earlier stages to conduct BP kidney
   microRNA-TWAS (Fig. [301]S7). We identified 11 kidney miRNAs whose
   genetically imputed expression was associated with BP across both
   cohorts (Fig. [302]5G and Supplementary Data [303]14). They mapped to
   nine independent BP GWAS loci and represented different spectra of
   kidney abundance, tissue specificity and genetic origin (Fig. [304]5H
   and Supplementary Data [305]15).

   Given the intragenic (exonic/intronic) DNA origin for a majority of BP
   kidney miRNAs, we then examined whether their host genes could act as
   BP TWAS genes and whether they may account for the detected
   associations with BP. Out of ten intragenic miRNAs, only one – hsa-miR
   1908-5p – had a kidney BP TWAS gene as the host gene (FADS1)
   (Supplementary Data [306]16). However, further analyses showed that the
   association between the hsa-miR 1908-5p and BP was not mediated by
   FADS1 – the only BP TWAS gene in the locus (Supplementary
   Data [307]16). We then extended the mediation analysis to all BP kidney
   TWAS genes mapping onto the proximity of nine remaining intragenic
   kidney miRNAs. For eight of the kidney miRNAs with at least one BP TWAS
   gene in the locus, we found only one gene (Homeobox C6, HOXC6)
   accounting for the mediation signals between kidney miRNAs
   (hsa-miR-196a-3p) and BP (Supplementary Data [308]16).

   Taken together, our data provided insights into the landscape of miRNAs
   expressed in the kidney. We uncovered new associations between BP and
   kidney miRNAs and showed that the associations between BP and kidney
   miRNAs are not necessarily mediated via their host genes. Finally, we
   demonstrated that in some cases other BP-associated genes in proximity
   to the miRNA may act as the mediators of their associations with BP.

Kidney proteome-wide association studies uncover new proteins associated with
blood pressure and provide an orthogonal validation for blood pressure
associations uncovered at the transcriptome level

   Proteins are the key effector molecules translating biological signals
   inherited in DNA and/or transcribed in mRNA into phenotypic differences
   between individuals including their health and susceptibility to
   disease. It is not clear whether genetic determinants of BP are
   associated with changes of protein expression in the human kidney and
   whether the associations between BP genetic variants and the renal mRNA
   expression have a functional impact at the protein level. Using 72
   NIH-CPTAC human kidneys with tissue proteome characterised by liquid
   chromatography mass spectrometry, we identified 7,291 proteins; with
   the majority classified as predicted intracellular proteins (89%)
   (Supplementary Data [309]17). The proportions of soluble,
   membrane-bound and secreted proteins (Fig. [310]6A) were consistent
   with the data from tissue-based map of the global human
   proteome^[311]90. Of 452 HPA genes with elevated expression in the
   kidney (when compared to other tissues), 223 were identified in the
   NIH-CPTAC dataset (Supplementary Data [312]18). As expected, we found
   the strongest enrichment for proteins encoded by genes with highest
   level of specificity to the kidney (Fig. [313]6B).

Fig. 6. Kidney tissue proteomics and blood pressure PWAS.

   [314]Fig. 6
   [315]Open in a new tab

   A Classification of 20,082 proteins (from the Human Protein Atlas
   (HPA)) and 7,291 measurable proteins in human kidney (from CPTAC) by
   predicted localisation. Percentage of all measurable proteins in each
   class is shown, a single protein may be predicted to belong to more
   than one category. B Enrichment for tissue specificity (kidney
   enriched, kidney enhanced, and group enriched genes) in measurable
   kidney proteins (6608 proteins) compared to all the HPA proteins
   (20,082 proteins). Tissue enrichment and enhancement status is taken
   from the HPA. Enrichment is calculated by two-sided Fisher’s exact
   test, coloured circles represent statistically significant enrichment
   results, increasing size and red colour intensity denotes a larger
   negative log[10] P-value. Circles are positioned by enrichment odds
   ratio with horizontal error bars showing 95% confidence intervals on
   the odds ratio. C Density of Spearman’s correlation coefficients
   between protein abundance and gene expression for 6712 protein/gene
   pairs, with associated enrichment estimates for HPA kidney enriched
   genes, monogenic hypertension/hypotension genes, antihypertensive drug
   targets and kidney TWAS genes showing causal association with BP
   (derived from this study). Enrichment P-value was calculated by a
   one-sided two-sample Kolmogorov Smirnov test. D Correlation between
   kidney tissue gene expression and tissue protein abundance for two
   blood pressure lowering therapeutic targets (SLC12A3 and SLC12A1).
   Error bands represent 95% confidence intervals. E Overview of human
   kidney tissue PWAS workflow. Objects are coloured by their data type:
   genotype – grey, protein abundance – purple, predictive model – orange,
   phenotype – red, BP PWAS proteins – pink. F Of 97 BP PWAS proteins, 46
   have at least one additional level of evidence for association with BP.
   Number of genes is shown in each area of the Venn diagram. TWAS (blue
   circle) – PWAS proteins with evidence from BP kidney TWAS analysis,
   Kidney colocalisation (green circle) – PWAS proteins with evidence from
   previous colocalisation analyses with BP (Eales et al.^[316]25). G
   Overrepresented KEGG pathways and Human Phenotype Ontology diseases in
   97 BP PWAS proteins (one-sided hypergeometric test). All pathways
   significant at 5% FDR are shown as coloured circles, circles are sized
   by nominal P-value (large – most significant, small least significant)
   and coloured by magnitude of overrepresentation (light pink – least
   positive overrepresentation, dark red – most positive
   overrepresentation). Pathways are grouped by shared overlapping genes
   (>65%) and have been manually classified into themes based on the known
   function of overlapping genes. H Enrichment for drug tractability in 97
   BP PWAS proteins (permutation test) across three therapeutic
   modalities. Distributions shown are smoothed density estimates of
   category counts across 100,000 randomly permuted gene sets of equal
   size to BP PWAS proteins (n = 97). Red point denotes observed count for
   each tractability category in the 97 BP PWAS proteins. P-values are
   calculated by one-sided permutation test. PROTAC proteolysis-targeting
   chimera.

   We then examined how the abundance of proteins known for their
   relevance to BP regulation and hypertension correlates with the
   expression of their respective genes. A total of 7036 genes quantified
   by RNA-sequencing had their abundance measured at the protein level in
   the NIH-CPTAC dataset. Genes known for Mendelian
   hypertension/hypotension syndromes^[317]91 and kidney targets for
   BP-lowering medications were significantly enriched amongst genes with
   the highest positive correlation with their respective proteins
   (P-value = 2.5 × 10^−2 and P-value = 1.5 × 10^−3, respectively) and the
   magnitude of enrichment was comparable to HPA kidney-enriched/enhanced
   genes (Fig. [318]6C). For example, SLC12A1 (encoding
   bumetanide-sensitive sodium-(potassium)-chloride cotransporter 2 – a
   target for loop diuretics) and SLC12A3 (encoding a thiazide-sensitive
   sodium-chloride cotransporter – a target for thiazides) were within the
   top 25 kidney genes showing very strong mRNA-protein correlations
   (Supplementary Data [319]19 and Fig. [320]6D). Kidney genes with
   evidence of causal association with BP (Supplementary Data [321]20)
   were also enriched (although to a lesser degree) for significant
   positive correlations with the respective proteins (Fig. [322]6C).

   Of 815 proteins, whose kidney expression was genetically imputable
   (Supplementary Data [323]21), 97 showed reciprocal association with at
   least one BP trait in two independent cohorts (UK Biobank and ICBP)
   (Supplementary Data [324]22 and Figs. [325]6E and [326]S8). Of these,
   57 had no evidence of association with BP at mRNA level because either
   their GReX prediction model did not converge (n = 34) or there was no
   or only weak BP signal in our kidney TWAS analysis (Supplementary
   Data [327]23). This may indicate either the existence of differences in
   the genetic regulation of their mRNA and protein expressions
   contributing to BP/hypertension or/and reflect a false-positive finding
   in proteome-wide association studies (PWAS).

   For 46 BP-associated kidney proteins we identified additional evidence
   for relevance to BP/hypertension at other molecular levels
   (Supplementary Data [328]23 and Fig. [329]6F). Indeed, parent genes of
   40 proteins showed association with BP in our kidney BP TWAS
   (Supplementary Data [330]23 and Fig. [331]6F) and six others had an
   additional layer of prior evidence for association with BP in our
   previous kidney QTL studies^[332]25 (Supplementary Data [333]23 and
   Fig. [334]6F).

   Overall, BP PWAS proteins were enriched for being therapeutically
   tractable (Fig. [335]6G). Indeed, when compared to a random set of
   kidney proteins, our set of 97 BP PWAS proteins showed a strong
   enrichment for proteins with discovery potential (i.e., not currently
   targeted) by small molecule and proteolysis targeting chimera (PROTAC)
   modalities (Fig. [336]6G).

   Our pathway enrichment analysis further revealed enrichment of
   BP-associated proteins for hypertensive crisis (Fig. [337]6H and
   Supplementary Data [338]24) as well as enrichment for RAS and proteins
   involved in innate and adaptive immunity (Fig. [339]6H and
   Supplementary Data [340]24). This is consistent with a rarely
   appreciated role of the kidney as a tissue contributor to immune
   activation^[341]38 and antimicrobial defence^[342]92. We also observed
   an enrichment of BP-associated proteins within metabolic pathways
   including those involved in detoxification of reactive oxygen species,
   arachidonic acid metabolism, tyrosine metabolism and fatty acid
   metabolism; in particular mitochondrial fatty acid beta-oxidation
   (Fig. [343]6H, Supplementary Data [344]24). Indeed, genetically
   determined reduction in three proteins of key importance to
   mitochondrial beta oxidation of fatty acids including long-chain
   specific acyl-CoA dehydrogenase, mitochondrial (ACADL), medium-chain
   specific acyl-CoA dehydrogenase, mitochondrial (ACADM) and Acyl-CoA
   Dehydrogenase Family Member 10 (ACAD10) were associated with increased
   BP (Supplementary Data [345]22).

   Collectively, we show that kidney proteins showing positive
   correlations with their parent mRNAs are enriched for genes of
   relevance to BP, genetically mediated hypertension/hypotension and
   antihypertensive treatment. This substantiates the evidence for
   informativeness of kidney transcriptome as a source of information on
   biological underpinnings of hypertension in the absence of larger
   datasets on kidney proteome. Through triangulation with other omics, we
   further corroborate the evidence behind relevance of kidney proteins to
   hypertension. Such proteins (i.e., with evidence of genetic
   contribution to their abundance at multiple molecular levels) are of
   utmost clinical interest, e.g. for druggability studies.

Transcriptome profiling of cells harvested from urine yields non-invasive
insights into expression of kidney genes including those of relevance to
blood pressure

   Human kidneys shed their epithelial cells (both of glomerular and
   tubular origin) into urine^[346]93–[347]96 and RNA-based analysis of
   urinary cells has been proposed as a new strategy with potential to
   inform kidney diagnoses^[348]97–[349]99.

   We have generated 33 RNA-sequencing derived transcriptomic profiles of
   urinary cell pellets from individuals recruited into our HKTR
   (Supplementary Data [350]25). We confirmed that a set of 12
   RNA-sequencing quality control metrics (generated by RNASeQC^[351]100
   produce either similar (e.g. “Expression profiling efficiency”) or
   superior (e.g. “Mapping rate”) values in urine compared to saliva
   (Fig. [352]S9). We detected expression of 21,981 genes in urinary cells
   and similar numbers of the major gene biotypes expressed in urine
   compared to human kidney tissue (Fig. [353]7A). Transcriptome
   complexity was broadly similar between urinary cells and kidney tissue
   (Fig. [354]7B). We confirmed there was no obvious transcriptomic
   differences due to different donor sources of urine in our study (i.e.,
   nephrectomy and kidney biopsy, Fig. [355]7C). This is in line with our
   previous observations on transcriptomic profiles of kidney tissue
   specimens^[356]84.

Fig. 7. Transcriptomic profiling of urinary cells by RNA-sequencing.

   [357]Fig. 7
   [358]Open in a new tab

   A Number of expressed genes from urinary cell pellets (n = 33) and
   kidneys (n = 430) by major gene biotypes. Purple – protein coding, blue
   – long non-coding, green – pseudogenes, yellow – short non-coding
   genes. B Transcriptome complexity for urinary cells and kidneys.
   Expressed genes are ordered from most to least highly expressed on the
   x-axis and the cumulative proportion of the overall transcriptome
   attributable to those genes is shown on the y-axis. Horizontal lines
   identify the proportion of overall transcriptome expression
   attributable to the top 25, 50 and 75% of genes in kidney (orange) and
   urine (blue) RNA-sequencing samples. C Hierarchical clustering of 33
   urinary cell transcriptomes from nephrectomy and biopsy samples.
   Samples are hierarchically clustered using Pearson’s correlation
   distances calculated from log[2](TPM + 1) gene expression values of all
   expressed genes in urine. Nephrectomy samples are coloured blue and
   biopsy yellow. D Overrepresented KEGG pathways and GO biological
   processes in the 100 most highly expressed genes in urinary cells and
   kidneys characterised by RNA-sequencing. Overrepresentation analysis
   (one-sided Fisher’s exact test) results significant at 5% false
   discovery rate. Results are grouped by biological themes determined by
   manual grouping of genes present in each pathway. Fold enrichment is
   represented by colour from pale to dark red (least to greatest
   enrichment). Statistical significance (negative log[10] P-value) is
   represented as the size of each circle. Non-significant results are
   represented by small grey circles. Some pathway names have been
   abbreviated or simplified for brevity. ROS – reactive oxygen species.
   SRP – signal recognition particle. E Transcriptome similarity between
   the urinary transcriptome and 54 GTEx tissues. Similarity is calculated
   as the Spearman’s correlation coefficient between median
   log[2](TPM + 1) expression of 19,273 expressed protein-coding genes.
   Tissue names have been abbreviated using alphabetic ordering. Colour
   denotes the magnitude of similarity from weak (grey) to strong (red).
   Tissues are ordered by the magnitude of similarity from strong (top) to
   weak (bottom). F Correlation in expression of 19,273 protein-coding
   genes between urinary cells and the kidney. Points represent median
   gene expression (log[2](TPM + 1)) in urine (x-axis) and a comparative
   tissue (y-axis). r - Spearman’s correlation coefficient. The linear
   trend line is derived from linear regression between urinary cell
   expression and the comparative tissue. Trend lines are coloured by data
   source – HKTR is blue and Genotype-Tissue Expression project is red. G
   Expression profile for genes specific to kidney cortex and medulla.
   Standardised median log[2](TPM + 1) median expression values from
   urinary cells, cortical and medullary renal tissue datasets for curated
   renal single-cell marker genes from HPA. Genes are ordered by
   hierarchical clustering of the expression values. Expression is
   represented by colour from three standard deviations below mean
   expression (blue) through grey (mean expression) to three standard
   deviations above the mean expression (red). H Correlation between
   urinary cell and kidney median expression (log[2](TPM + 1)) for 339
   causal BP TWAS genes. The ten genes with smallest deviation from the
   linear regression line are labelled. Error bands represent 95%
   confidence intervals. r – Pearson correlation coefficient.

   We hypothesised that functional annotations of genes with the strongest
   expression in cells harvested from urine and the kidney will show a
   strong overlap. Our analysis of the 100 mostly highly expressed genes
   in urine and kidney tissue showed an overrepresentation for 33 and 35
   pathways, respectively (KEGG pathways and Gene Ontology Biological
   Process terms, Supplementary Data [359]26), 24 (73%) of which were
   common to both urinary cells and the kidney (Fig. [360]7D) and there
   were 44 distinct overrepresented pathways (Fig. [361]7D). These
   pathways were then manually grouped into 6 biological themes by overlap
   of shared genes. Most of the common pathways mapped onto immunity theme
   are consistent with the key role of the urinary tract in immune
   activation and defence against infections^[362]38,[363]92. The top 100
   most highly expressed genes in urinary cells also showed enrichment for
   pathways reflective of the renal contributions to ion exchange
   (Fig. [364]7D). We also noted that the genes in urinary cells showed
   enrichment for glucose metabolism, glycolysis and gluconeogenesis
   (Fig. [365]7D). The latter is consistent with the renal origin of these
   cells given that apart from the liver, kidney (mainly proximal tubule)
   is the only other organ capable of de novo glucose
   production^[366]101,[367]102.

   We then examined the extent of transcriptomic similarity between
   urinary cells from our dataset and 54 human tissue types from GTEx.
   Kidney cortex and kidney medulla showed the highest degree of
   correlation in expression of 19,273 protein-coding genes with urinary
   cells (Fig. [368]7E and Supplementary Data [369]27) at r = 0.81 and
   r = 0.80, respectively (Fig. [370]7F). We confirmed the magnitude of
   the correlation between urinary cells and the kidney using an
   independent (to GTEx) set of 430 kidney tissue samples from HKTR
   (Fig. [371]7F).

   We also found that the magnitude of expression of genes known as
   markers of renal cortex (20 genes) and those from medulla (11 genes)
   was correlated with that in urinary cells (Fig. [372]7G). There was a
   particularly strong correlation between urinary cell expression of
   medullary markers (Fig. [373]7G). UMOD in urinary cells was comparable
   to that in the medulla and higher than in cortex, consistent with the
   production of UMOD within the ascending loop of Henle (Fig. [374]7G).
   For cortical markers, NAT8, MIOX, RIDA, BHMT all had relative urinary
   expression levels comparable with that of cortex samples.

   Finally, we examined the urinary cell-kidney correlation in expression
   of 399 kidney genes showing a causal association with BP in our
   analyses. We noted that 339 of these genes had detectable expression in
   urinary cells. There was a strong positive correlation in expression of
   these genes between urinary cells and the kidney (r = 0.73,
   P-value = 2.85 × 10^−57, Fig. [375]7H).

   Collectively, we generated robust profiles of urinary cell
   transcriptome and showed its excellent correlation with the kidney
   transcriptome. We further demonstrate that several histological and
   functional annotations of gene expression profiles harvested from urine
   highlight specific kidney regions as the likely origin of these cells.
   Finally, we determine that profiling of urinary cells transcriptome
   offers a non-invasive insight into expression of genes of relevance to
   BP in the kidney. This highlights a potential diagnostic applicability
   of urinary cell transcriptomics, e.g., in development of non-invasive
   tests to determine kidney health and predict renal damage through
   analysis of spot urine samples.

Genetically programmed reduction in kidney abundance of glutamyl
aminopeptidase gene (ENPEP) and protein is associated with increased blood
pressure and the risk of hypertension – integration of evidence from
genome-level, kidney transcriptome and proteome

   ENPEP encodes glutamyl aminopeptidase – an enzyme responsible for
   cleaving an aspartate from N-terminal of angiotensin II in the process
   of its conversion to angiotensin III (Fig. [376]8A). It is an
   attractive druggable target for development of new antihypertensive
   medications. However, the recent clinical trial (Firibastat in
   Treatment-resistant Hypertension - FRESH) has failed to demonstrate a
   BP lowering effect through pharmacological inhibition of this enzyme -
   an oral inhibitor of brain glutamyl aminopeptidase (Firibastat) did not
   lead to a significant reduction in BP in patients with
   difficult-to-treat and resistant hypertension when compared to the
   placebo.

Fig. 8. Kidney glutamyl aminopeptidase gene (ENPEP) and blood pressure.

   [377]Fig. 8
   [378]Open in a new tab

   A Simplified renin-angiotensin system. AGT – angiotensinogen gene. B
   Normalised ENPEP expression in GTEx. Red – Tissues enhanced, nTPM –
   consensus normalised expression. C Effects of genetically regulated
   expression (GReX) of ENPEP on systolic blood pressure (SBP), diastolic
   blood pressure (DBP) and pulse pressure (PP) from kidney TWAS. Nominal
   P-value is calculated from two-sided Z-score test. UKB – UK Biobank,
   ICBP – International Consortium for Blood Pressure, Estimate – change
   of SBP/DBP/PP in units per one-unit higher GReX of ENPEP, CI –
   confidence interval. D. Causal effects of GReX of kidney ENPEP on
   SBP/DBP/PP in UKB and ICBP. Estimate – change of outcome in units per
   one-unit higher GReX of ENPEP. Nominal P-value is calculated from
   one-sided chi-squared test. E Effect of GReX of ENPEP (per one standard
   deviation higher) on urinary sodium in UKB. Nominal P-value is
   calculated from linear regression (two-sided test). Red arrow –
   positive association. SE – standard error. Partially created with
   BioRender.com. F Effects of genetically regulated protein (GReP) of
   glutamyl aminopeptidase on SBP/DBP/PP from kidney PWAS. Nominal P-value
   is calculated from two-sided Z-score test. Estimate – change of outcome
   in units per one-unit higher GReP of glutamyl aminopeptidase. G
   rs33966350 of ENPEP [A – minor allele, G – major allele] leading to a
   premature stop codon at position 413 (out of 957) in exon 6. A
   shortened glutamyl aminopeptidase protein (red) compared to the normal
   variant (green). White – active site (catalysis residues), blue –
   binding site (protein-chemical interactions), Trp – Tryptophan, NMD –
   nonsense-mediated mRNA decay. Structural domains are coloured within
   each protein structure. Created with BioRender.com. H Effects of
   rs33966350-A on human diseases from FinnGen. Nominal P-value is
   calculated from Firth regression (two-sided test). Dots are coloured by
   categories of diseases. Beta – log of Odds ratio. I Effect of
   rs33966350-A on urinary sodium in UKB. Blue arrow – negative
   association. Partially created with BioRender.com. J Effect of
   rs33966350 on ENPEP expression in HKTR. Nominal P-value is calculated
   from linear regression (two-sided test). K, L Effects of rs33966350 on
   ENPEP expression and protein abundance in CPTAC. Nominal P-value is
   calculated from linear regression (two-sided test). In 8J-8L, whiskers
   denote extent of 1.5x interquartile range. Upper, middle and lower box
   lines denote 75th, 50th and 25th percentiles, respectively. In C, D and
   F, squares are positioned by the estimated effects with horizontal
   error bars illustrating 95% confidence intervals.

   ENPEP is one of kidney-enriched genes – it shows approximately 36-fold
   higher expression in the kidney than cerebral cortex (Supplementary
   Data [379]28 and Fig. [380]8B).

   Our kidney BP TWAS study showed a consistently strong association
   between genetically determined increase in renal expression of ENPEP
   and a drop in all BP traits across two independent datasets as well as
   in their joint analysis bringing together approximately 750,000
   individuals (Fig. [381]8C) and further MR demonstrated a causal effect
   (Fig. [382]8D). Genetically determined renal expression of ENPEP showed
   an association with increased urinary sodium in 337,350 individuals
   from UK Biobank (Fig. [383]8E); this may suggest renal Ang III-mediated
   effect on natriuresis as the relevant mechanism^[384]103,[385]104.
   Through our BP kidney PWAS we also demonstrated an association between
   glutamyl aminopeptidase protein and BP in the consistent direction –
   reduced abundance of the protein was associated with higher values of
   DBP (Fig. [386]8F). Finally, we determined a signal of multi-trait
   colocalisation between DBP, ENPEP mRNA and protein expression in the
   kidney on chromosome 4 (Supplementary Data [387]29).

   We then sought to provide an orthogonal replication of these findings
   using a rare ENPEP variant (rs33966350) associated with hypertension in
   previous GWAS^[388]21. We confirmed that rs33966350 has not been
   included in the models used to generate GReX for ENPEP in TWAS and PWAS
   and is not in strong LD with the genetic variants used in these models
   (Supplementary Data [389]30). The rare allelic variant (A) of
   rs33966350 leads to a premature stop codon, has a scaled Combined
   Annotation Dependent Depletion (CADD) score of 43 (consistent with top
   ~0.01 % of 8.6 billion variants) and is one of the high-confidence loss
   of function variants (Fig. [390]8G).

   Our in silico analysis mapped this nonsense variant to exon 6 of ENPEP
   and confirmed that the truncated mRNA is a strong candidate for
   nonsense-mediated decay (NMD) and that the truncated protein (lacking
   two binding sites and a residue responsible for transition-state
   stability) is unlikely to be functional (Fig. [391]8G).

   In the phenome-wide association of 2269 binary traits in 377,277
   individuals from the FinnGen consortium (r9.finngen.fi) (Fig. [392]8H)
   we identified hypertension as the top association signal for
   rs33966350. We further uncovered that the carriers of AA genotype of
   this variant had approximately 33% increase in odds of hypertension
   compared to those with a wild-type genotype (P-value = 1.6 × 10^−6).
   Our studies in UK Biobank confirmed that rs33966350 was also associated
   with urinary sodium (P-value = 4.7 × 10^−2, Fig. [393]8I). We then
   showed that when compared to those with a wild-type genotype (GG),
   carriers of one copy of A-allele of rs33966350 have approximately
   4.9-fold lower expression of ENPEP mRNA in the analysis of 478 kidneys
   (P-value = 8×10^−14, Fig. [394]8J). We replicated this observation in
   the analysis of 65 kidneys from CPTAC – carriers of the rare homozygous
   genotype had significantly lower levels of ENPEP than those with GG
   genotype (P-value = 4.4 × 10^−7, Fig. [395]8K). We further validated
   these observations at the protein level (Fig. [396]8L).

   Collectively, our data show that genetically determined reduction of
   kidney abundance of ENPEP mimicking the effects of its rare
   loss-of-function genetic variant on gene and protein expression leads
   to increase in BP and the risk of hypertension possibly via effects on
   renal excretion of sodium. These results provide a persuasive case for
   the development of pharmaceuticals that increase levels of ENPEP in the
   kidney (rather than inhibit its activity in the brain, e.g. as in FRESH
   trial).

Discussion

   Through analysis of effects of subtle changes in genetically determined
   portion of gene expression on phenotypes at a population scale, TWAS
   has emerged as a powerful approach to gene
   discovery^[397]36,[398]47,[399]105–[400]108. The attractiveness of TWAS
   lies in its scaled-up detection power^[401]109,[402]110, immunity to
   confounding from pharmacological treatment, lifestyle,
   environment^[403]111, and reverse causality^[404]105 as well as a
   capacity to uncover disease/trait-associated genes within chromosomal
   regions “missed by GWAS”^[405]47. Indeed, application of TWAS-based
   approach in our project has increased the discovery of kidney genes
   showing robust associations with BP by 2.2-fold when compared to our
   previous cis-eQTL-driven analysis of BP GWAS^[406]25 and mapped many of
   them outside the BP GWAS loci. This enhanced discovery power stems
   partly from enhanced genetic input into the TWAS prediction models
   (aggregation of numerous SNPs for each gene at the same time rather
   than single “top” e-variant) and a reduced burden for multiple testing
   (fewer genes in TWAS than SNPs in GWAS are tested^[407]109,[408]110).
   We further augmented the rate of gene discovery through integrating
   input from kidney 3D genomic and epigenomic data in our GReX
   models^[409]41 developed for the purpose of TWAS. With larger tissue
   reference panels, cross-ethnic analyses^[410]112,[411]113, utilisation
   of input from rare and in-trans variants^[412]114, future BP TWAS
   studies should be able to generate even more precise estimates for
   expression of more human genes enhancing the discovery of new and fine
   mapping of the existing BP GWAS and TWAS loci.

   The genotype-based imputations of molecular layers other than
   transcriptome are also gaining traction in post-GWAS searches of the
   effector genes for complex disorders^[413]115–[414]117. Our study is
   the first analysis applying the genotype-based prediction models to
   impute the kidney microRNAome and proteome. This has helped us not only
   to uncover new BP-associated molecules amongst kidney miRNAs and
   proteins but also to substantiate the evidence for the robustness of
   our key targets emerging from BP kidney TWAS (e.g., through multi-omic
   overlaps or/and mediation analyses).

   We also illustrate the benefit of triangulation of outputs from kidney
   omics with data from other molecular layers including plasma proteomics
   and metabolomics to uncover the consequences of BP-related changes in
   gene/protein expression in clinically accessible materials. Such
   analyses are a critical step in translating the findings from post-GWAS
   omics into diagnostically actionable targets with a potential to be
   tested e.g., as bio-markers of hypertension-mediated organ damage. We
   further provide a proof-of-principle for the informativeness of urinary
   cells to track the expression of the kidney transcriptome and the
   expression of genes associated with BP in GWAS and TWAS. Urinary mRNA
   signatures have emerged as non-invasive predictors of kidney allograft
   status^[415]97,[416]99; our data corroborate the evidence behind
   potential diagnostic applicability of cell harvested from human urine
   even in the absence of a powerful clinical driver stimulating the
   urinary excretion of leucocytes^[417]97,[418]99.

   Apart from uncovering new potential repositioning opportunities for
   hypertension, and illuminating BP-related effects of drugs used in
   other conditions than hypertension, our analyses provide specific cues
   into targets already tested in clinical trials (i.e. glutamyl
   aminopeptidase encoded by ENPEP) or those emerging as novel therapeutic
   strategies for hypertension [(e.g. suppression of AGT mRNA expression
   using antisense oligonucleotides^[419]118]. Numerous kidney genes
   causally associated with BP in our studies show a high level of
   therapeutic tractability. Indeed, SLC5A11 and AQP4 belong to druggable
   families whose other members (e.g. SLC5A2 and AQP2) are already
   targeted (either directly or indirectly) by existing
   cardiovascular/nephroprotective medications (i.e. SGLT2 inhibitors and
   vasopressin receptor 2 antagonists)^[420]119–[421]121.

   Our study has several limitations. First, it should be noted that TWAS,
   MR and FOCUS are not fully orthogonal approaches to discovery of genes
   associated with a phenotype of interest. However, integration of these
   three approaches in one analytical pipeline increases the level of
   robustness and confidence in the detected signals (genes) that received
   support at each stage of this computational strategy. Second, at the
   TWAS stage we chose an FDR-based correction for multiple testing rather
   than a more conservative Bonferroni correction. This means we could
   have discovered more genes with perhaps less stringent statistical
   evidence. However, we have carefully reduced the likelihood of false
   positive signals through a layered system of gene replications and
   refinements created by applying additional downstream filters on genes
   delivered by TWAS, i.e., independent replication, MR and FOCUS. Third,
   we accept that our “recapitulation analyses” tested a limited number of
   carefully selected biochemically active transporters and receptors in
   the kidney. Further studies inclusive of other genes, molecules,
   pathways, and phenotypes of relevance to kidney physiology will help to
   strengthen the evidence for biological robustness of the predicted
   kidney gene expression models. Finally, due to a very limited
   availability of kidney tissues samples from individuals of non-white
   European origin^[422]24 we were restricted in our studies to the
   European ancestry group. More international efforts are required to
   overcome this barrier to cross-ancestry analyses of kidney-relevant
   diseases in the post-GWAS era.

   Taken together, these studies demonstrate the value of kidney omics to
   provide new biological understanding of the genetic regulation of BP
   and to generate insights of therapeutic relevance to hypertension.

Methods

Prioritisation of human tissues of relevance to blood pressure

Transcriptome-wide association studies across 49 human cell-types and tissues

   We used genotype information with matching transcriptome of 49 human
   cell-types and tissues included in the Genotype-Tissue Expression
   (GTEx, v8) for the purpose of this analysis. Given that the currently
   employed metrics cannot fully account for the dependence of TWAS
   discovery on the sample size, we opted for a numerically equal
   representation of samples from each selected tissue – 65 samples of
   European ancestry was the most optimal cost-performance trade-off
   between the power of discovery and the number of tissues included in
   the analysis^[423]122. Using an equally weighted sampling
   algorithm^[424]123, we selected a random 65 samples from each of 49
   tissues using publicly available GTEx v8. Genotype information derived
   from whole genome DNA sequencing was downloaded from dbGaP. We
   conducted the quality control on the genotype data using updated
   PredictDB Pipeline ([425]http://predictdb.org/). In total, 6,821,602
   single nucleotide polymorphisms (SNPs) were available for further
   analyses. Gene expression information was derived from
   quality-controlled and normalised RNA-sequenced tissue datasets from
   the GTEx ([426]http://www.gtexportal.org/). In line with the
   recommendations of GTEx Consortium^[427]122, 15 probabilistic
   estimation of expression residuals (PEER) were generated using the PEER
   framework^[428]124 for each tissue. Gene expression residuals for the
   purpose of transcriptomics model prediction were generated by adjusting
   the gene expression for sex, RNA-sequencing protocol (PCR-based or
   PCR-free), sequencing platform (Illumina HiSeq 2000 or HiSeq X), top
   five genotyping principal components and 15 PEER factors. As an input
   into TWAS analysis, we also used meta-analysis summary of the
   associations between 7,088,121/7,160,657 SNPs and SBP/DBP from 750,000
   individuals^[429]22 (UK Biobank and ICBP data). The position of SNPs
   under GRCh37 in the data was converted to GRCh38 by LiftOver^[430]125.
   SNPs not present in the 1000 genome reference panel under GRCh38 were
   excluded from the analysis.

   Following the updated PredictDB Pipeline, we trained predictors of gene
   expression by applying PrediXcan^[431]126 to genotype and
   RNA-sequencing datasets across 49 tissues in GTEx v8. For each tissue,
   we kept genes with nested cross validated correlation between predicted
   and actual levels > 0.10 (or equivalently R^2 > 1%) and P-value of the
   correlation test <0.05. We identified gene-SBP/DBP associations by
   analysing summary statistics from SBP/DBP and gene expression
   predictors using S-PrediXcan^[432]127. SNPs in the broad major
   histocompatibility complex (MHC) region (chromosome 6: 28–34 Mb) were
   excluded. The correction for multiple testing was calculated using
   Bonferroni-adjusted P-value < 0.05 after adjustment for the total
   number of genes in each tissue tested. The gene-SBP/DBP associations
   that remained after this filtering were considered significant.

   To quantify the overall relevance of a tissue to BP, we adopted three
   independent metrics of the tissue-disease association. Firstly, a
   proportion of independent genes identified from TWAS was quantified in
   each tissue. Gene-gene independence was defined as a correlation
   between genes R^2 < 0.05^[433]128 within a 2 Mb region or genes located
   larger than 2 Mb from each other. Second, the mean TWAS association
   statistics (mean of squared Z-score) using all independent genes
   identified from TWAS^[434]110 were used to quantify average strength of
   their association with SBP and DBP – the tissues were ranked based on
   the average effect of the magnitude of significant genes’ association
   with BP. Third, we counted the number of significant genes associated
   with SBP/DBP outside the previously identified BP GWAS loci, i.e. 49
   tissues were ranked according to the number of TWAS genes showing no
   overlap with any BP GWAS loci (GWAS locus was defined as ±1 Mb from the
   sentinel BP GWAS SNP). The independence of three metrics was tested by
   Pearson’s product-moment correlation test. The overall rank was derived
   by taking the sum of three ranks for each tissue. The relevance of each
   tissue to BP were then visualised by integrating DBP overall ranking
   score against SBP overall ranking score for each tissue (Fig. [435]2A).

Populations, DNA and RNA processing

Human Kidney Tissue Resource – populations and samples

   The Human Kidney Tissue Resource (HKTR) is the collection of human
   kidney tissue samples secured for the purpose of multi-omics analyses.
   The following studies have contributed to the resource: the
   TRANScriptome of renaL humAn TissuE study
   (TRANSLATE)^[436]25,[437]31–[438]34,[439]84, TRANScriptome of renaL
   humAn TissuE - Transplant study (TRANSLATE-T)^[440]25,[441]32,[442]34,
   moleculAr analysis of human kiDney-Manchester renal tIssue pRojEct
   (ADMIRE)^[443]25,[444]34, Renal gEne expreSsion and PredispOsition to
   cardiovascular and kidNey Disease (RESPOND)^[445]25,[446]34 and
   moleculaR analysis of mEchanisms regulating gene exPression in
   post-ischAemic Injury to Renal allograft (REPAIR)^[447]25,[448]34.

   As reported before^[449]25, the specimens were taken directly from the
   healthy (unaffected by cancer) pole of the kidney immediately after
   elective nephrectomy or by needle biopsy of donor kidneys before the
   transplantation. Of the 478 specimens used in this study 296 were from
   nephrectomies and 182 were from kidney biopsies. All individuals were
   of white-European ancestry. Further information on the individuals
   recruited into each study are given in Supplementary Data [450]31.

Human Kidney Tissue Resource – DNA genotyping data generation, quality
control and analysis, genetic principal components

   As reported previously^[451]25,[452]31–[453]33,[454]84, tissue samples
   were first homogenised and DNA extracted using the Qiagen DNeasy Blood
   and Tissue kit. DNA was then hybridised to the Illumina
   HumanCoreExome-24 beadchip array. Genotype calls for each sample were
   made using Illumina GenomeStudio. DNA quality control consisted of
   excluding any sample displaying cryptic relatedness, low genotyping
   rate (<95%), heterozygosity outside ±3 standard deviations from the
   mean, genetic/phenotypic sex mismatch. Variant-level quality control
   excluded all variants with genotyping rate (<95%), genomic location on
   a sex-chromosome or mitochondrial genome, ambiguous genomic location,
   Hardy-Weinberg equilibrium (HWE) P-value <1 × 10^−3 and minor allele
   frequency <5%. Genotypes were imputed by the Michigan Imputation Server
   (MIS)^[455]129 using the 1000 Genomes phase 3 reference panel.
   Post-imputation quality control excluded all variants duplicated
   genomic location, imputation score <0.4, minor allele frequency <1% or
   HWE P-value < 1 × 10^−6. 8,735,852 variants remained after all
   genotyping quality control steps. Genotype principal components were
   derived from genotyped autosomal variants that passed all genotyping
   quality control filters using EIGENSTRAT^[456]130 and
   SNPWeights^[457]131.

Human Kidney Tissue Resource – RNA-sequencing data generation, quality
control and analysis

   As reported previously^[458]25,[459]31–[460]33,[461]84, kidney tissue
   was first homogenised and then either the Qiagen RNeasy kit or the
   Qiagen miRNeasy kit was used to complete the extraction of RNA. RNA
   integrity and purity was checked and 1 μg of normalised RNA was used in
   either a New England Biosciences or Illumina TruSeq poly-A selection
   sequencing library preparation protocol. Libraries were then sequenced
   paired-end with either a 75 bp, 100 bp or 150 bp read length. An
   average of 35 million paired reads and 6 Gb of sequencing data were
   generated per sample. RNA-sequencing quality control metrics were
   generated by FastQC and RNASeQC. Reads were pseudoaligned to the
   Ensembl v83 GRCh38 human transcriptome reference using Kallisto
   v0.44.0. Gene expression was quantified at the transcript-level in
   units of Transcripts Per Million (TPM) and was summarised to the
   gene-level by summing expression of all transcripts produced by each
   gene in the Ensembl transcriptome reference. The criteria for a gene to
   be expressed were if at least 20% of kidney samples in each population
   had TPM > 0.1 and read count > 5. After application of these criteria
   21,414 kidney genes remained available for analysis.

Inference of cell-type proportions from RNA-sequencing data

   To adjust for between-sample cell-type heterogeneity in gene expression
   data from HKTR we used a computational deconvolution approach combined
   with a single-cell renal gene expression dataset. Firstly, we extracted
   expression profiles for the 3448 normal kidney cells present in the
   data generated by Young et al^[462]132. These data were generated from
   FACS-sorted cellular suspensions which were derived from ≈30mm^3 kidney
   tissue samples. Sequencing cDNA libraries were created by the 10X
   Genomics Chromium platform and sequenced on an Illumina HiSeq 4000.
   Gene expression, at the single cell level, was then normalised and
   kidney cell-types were identified using the Seurat R package^[463]133.
   We identified seven key distinct cell-type clusters amongst these cells
   and then used non-negative least squares multivariate regression on the
   single-cell expression data to determine cell-type specific gene
   weightings using the “MuSiC” R package^[464]134. We applied these gene
   weightings to all bulk tissue sample profiles and deconvolved estimated
   cell-type proportions for each sample.

Human Kidney Tissue Resource ethical compliance

   The studies adhered to the Declaration of Helsinki and were
   approved/ratified by the Bioethics Committee of the Medical University
   of Silesia (Katowice, Poland), Bioethics Committee of Karol
   Marcinkowski Medical University (Poznan, Poland), Ethics Committee of
   University of Leicester (Leicester, UK), University of Manchester
   Research Ethics Committee (Manchester, UK) and National Research Ethics
   Service Committee Northwest (Manchester, UK). Informed written consents
   were obtained from all individuals recruited (for the deceased donors,
   the consent was obtained in line with the local governance; e.g. from
   the family members).

National Institutes of Health kidney collections – populations and samples

   We used kidney samples from The Cancer Genome Atlas (TCGA)^[465]135,
   GTEx^[466]122 and Clinical Proteomic Tumor Analysis Consortium
   (CPTAC)^[467]136.

   TCGA contains human tissue samples collected after elective surgical
   procedures for a variety of cancers. Normal adjacent tissue samples
   (NATs) were taken from companion normal tissue adjacent to the
   tumour^[468]135. We identified 91 kidney NATs with matching genotype
   and RNA-sequencing data in this resource.

   Tissue samples in the GTEx project were collected post-mortem and
   immediately stored for DNA/RNA extraction and processing^[469]122. We
   identified and used 65 kidney cortex samples with matching whole genome
   sequencing and RNA-sequencing data.

   CPTAC collects and generates proteomic, transcriptomic and genomic data
   from a variety of solid tissue tumours, along with same data from
   NATs^[470]136. We used 66 NAT kidney tissue samples with whole genome
   sequencing and RNA-sequencing profiles for the purpose of TWAS.

   Collectively, we used 222 kidney samples from NIH cohorts for the
   purpose of prediction performance validation of the GReX prediction
   models. The characteristics of individuals from these cohorts are given
   in Supplementary Data [471]32.

National Institutes of Health kidney collections – DNA extraction, genotyping
data generation, quality control and analysis

   In TCGA, DNA was extracted from blood samples using QiAamp Blood Midi
   Kit (CGARN, 2016) and hybridised with probes on the Affymetrix SNP 6.0
   array (composed of 906,600 probes); genotype calls were made using the
   Birdseed algorithm
   ([472]https://www.broadinstitute.org/birdsuite/birdsuite-analysis). The
   TCGA genotype data were downloaded from the Genomic Data Commons (GDC)
   Portal’s legacy archive. A total of 525 cases/files were initially
   identified using the following query criteria: “project name”—“TCGA”,
   “primary site”—“kidney”, “sample type”—“solid tissue normal”,
   “race”—“white”, “data category”—“simple nucleotide variation”, “data
   type”—“genotypes”, “experimental strategy”—“genotyping array” and
   “access”—“controlled”. We downloaded the data for 110 individuals who
   had matching RNA-sequencing-derived information on the transcriptome of
   normal kidney tissue. Genotype quality control, imputation,
   post-imputation quality control and genotype principal component
   analyses were performed identically to the HKTR genotyping data set.
   After all steps were complete, 8,541,201 variants remained in the TCGA
   genotyping data set.

   Genotyping in the GTEx project was performed by whole genome sequencing
   on an Illumina HiSeqX (to 15x coverage) on DNA extracted from blood
   samples. The full experimental protocol is reported in the following
   NCI SOPs: BBRB-PR-0004, BBRB-PR-0004-W1, BBRB-PR-0004-W1-G3 and in the
   original publication^[473]122. Complete methodological detail is
   provided elsewhere^[474]122, but in brief, raw reads were mapped to the
   human GRCh38 reference sequence using BWA-MEM
   ([475]http://bio-bwa.sourceforge.net). Autosomal variant calling was
   performed by SHAPEIT. Genotype data was downloaded in indexed VCF
   format from dbGAP. The final number of variants available for
   combination with other genotype datasets was 45,138,608.

   DNA extraction in CPTAC was performed on blood samples for each sample
   according to the following SOPs:
   [476]https://brd.nci.nih.gov/brd/sop-compendium/show/41. Briefly, DNA
   was extracted using the QIAsymphony DNA Mini Kit (Qiagen), acoustically
   sheared, indexed, multiplexed and then sequenced to 15x coverage on an
   Illumina HiSeqX. FASTQ reads were mapped to the National Cancer
   Institute (NCI) Genomic Data Commons (GDC) GRCh38.d1.vd1 reference
   sequence
   ([477]https://api.gdc.cancer.gov/data/254f697d-310d-4d7d-a27b-27fbf767a
   834) following the standard GDC protocol
   ([478]https://docs.gdc.cancer.gov/Data/Bioinformatics_Pipelines/DNA_Seq
   _Variant_Calling_Pipeline/#alignment-workflow) which involves read
   mapping with BWA-MEM^[479]137 alignment sorting and merging using
   Picard ([480]https://broadinstitute.github.io/picard), duplicate
   marking (also using Picard) and finally base quality score
   recalibration using the Genome Analysis ToolKit (GATK). BAM files
   containing reads aligned using this workflow were then downloaded from
   the GDC portal using the following query parameters:
   cases.samples.sample_type = ‘solid tissue normal’, cases.primary_site =
   “kidney”, cases.project.program.name = “CPTAC”, files.data_format =
   “bam” and files.experimental_strategy = “WGS”. Variant calling was
   performed by the GATK version 3.8 “HaplotypeCaller” variant caller in
   single sample GVCF mode with the following command line arguments
   “-dontUseSoftClippedBases -ERC GVCF” using 8 CPU cores with 32GB of
   RAM. All individual GVCF files were then genotyped jointly using the
   GATK tool “GenotypeGVCFs” using default arguments using 32 CPU cores
   with 128GB or RAM. BCFTools was then used to split any multiallelic
   variant calls into biallelic calls to produce a final variant call set
   which could be integrated with that from all other cohorts. The final
   number of variants available for combination with other genotype
   datasets was 19,847,739.

National Institutes of Health kidney collections – RNA processing,
RNA-sequencing data generation, quality control and analysis

   In TCGA kidney RNA was extracted using the Qiagen ALLPrep kit.
   Sequencing libraries were generated from poly-A selected mRNA and
   sequenced on an Illumina HiSeq2000 producing an average of 80.6 million
   paired reads per sample. All TCGA sequencing reads for kidney NAT
   samples were downloaded from the genomic data commons portal
   ([481]https://portal.gdc.cancer.gov).

   Tissue RNA extraction in GTEx was performed as reported
   before^[482]138; sequencing libraries were generated using the Illumina
   TruSeq poly-A selection protocol. The libraries were sequenced on
   either an Illumina HiSeq2000 or HiSeq2500 producing an average of 82
   million paired 76 bp reads per sample. GTEx sequencing reads for all
   kidney cortex samples were downloaded from the GTEx v8 google cloud
   computing bucket (data accessed May 2021).

   Extraction of RNA from CPTAC samples was performed using the study’s
   SOPs: [483]https://brd.nci.nih.gov/brd/sop-compendium/show/41. The
   libraries were constructed using the Illumina TruSeq total RNA protocol
   with rRNA depletion (RiboZero gold) and sequenced on an Illumina
   HiSeq4000 generating a minimum of 120 million paired 75 bp reads per
   sample. CPTAC aligned reads were downloaded from the genomic data
   commons data portal ([484]https://portal.gdc.cancer.gov/).

   All validation panel samples were downloaded, raw FASTQ reads extracted
   and pre-processed using the protocol identical to the discovery
   resource. Raw FASTQ reads were extracted from aligned BAM files using
   bamtofastq from biobambam ([485]https://github.com/gt1/biobambam).

Regulatory compliance

   NIH has granted us access to TCGA, GTEx and CPTAC data under the
   approved dbGAP project 13040.

Generation of genetically regulated expression models for the purpose of
kidney blood pressure transcriptome-wide association studies

Input into gene expression prediction models – Human Kidney Tissue Resource
(discovery resource)

   We conducted the quality control on the genotype data using updated
   PredictDB Pipeline ([486]http://predictdb.org/). SNPs not present in
   the 1000 genome reference panel under GRCh37 and not available from the
   meta-analysis summary of the SNP-BP associations^[487]22 were excluded
   from the analysis. In total, 6,571,172 SNPs remained for the analysis.

   Gene expression, in TPM units, was normalised by logarithmic
   transformation, quantile normalisation (using the R package
   aroma.light) and rank-based inverse normal transformation, as described
   before^[488]25. We then used PEER^[489]124 to infer 100 hidden factors
   that describe global sources of variation in the normalised data.
   Residuals of gene expression were then calculated from the normalised
   data by adjusting for age, sex, tissue source (nephrectomy/biopsy), the
   first three genetic principal components, the 100 PEER hidden factors
   and seven cell-type proportions using linear regression.

Gene annotation data

   Gene biotypes, start and end coordinates, strand information and
   chromosomal localisations were collected using the “biomaRt” R package
   using the Ensembl gene ID for all expressed genes in our data set. We
   queried v83 of the “hsapiens_gene_ensembl” dataset using the GRCh38
   human genome build, data accessed February 2021. The positions of genes
   were also converted to GRCh37 by LiftOver^[490]125 for downstream
   analyses.

Prediction Using Models Informed by Chromatin conformations and Epigenomics
model for imputing gene expression in the kidney

   PUMICE^[491]41 generates GReX models utilising epigenetic information
   (i.e. epigenetic annotation tracks) to prioritise essential genetic
   variants that carry important functional roles and 3D genomic
   information to define windows that harbour cis-regulatory variants.

   Variants that overlap these annotation tracks are deemed “essential”
   genetic variants, and variants that do not overlap the annotation
   tracks are deemed “non-essential variants”. We use X[1] to represent
   the set of essential genetic variants, X[2] to denote “non-essential”
   genetic variants, and E to represent the vector of gene expression
   levels across multiple individuals. ϕ is a tuning parameter that
   controls the penalty on the “essential” predictors relative to the
   non-essential predictors^[492]139. We assume ϕ≤1 so that “essential”
   predictors are penalised no more than the non-essential predictors.

   The GReX model seeks to estimate the weights β[1] and β[2] by
   minimising the following objective function:
   [MATH: <mi>L</mi><mrow><mo>(</mo><mrow><msub><mrow><mi
   mathvariant="bold-italic">β</mi></mrow><mrow><mn>1</mn></mrow></msub><m
   o>,</mo><msub><mrow><mi
   mathvariant="bold-italic">β</mi></mrow><mrow><mn>2</mn></mrow></msub><m
   o>;</mo><mi>λ</mi><mo>,</mo><mi>ϕ</mi></mrow><mo>)</mo></mrow><mo>=</mo
   ><mo>∣</mo><mo>∣</mo><mi
   mathvariant="bold-italic">E</mi><mo>−</mo><msub><mrow><mi
   mathvariant="bold-italic">X</mi></mrow><mrow><mn>1</mn></mrow></msub><m
   sub><mrow><mi
   mathvariant="bold-italic">β</mi></mrow><mrow><mn>1</mn></mrow></msub><m
   o>−</mo><msub><mrow><mi
   mathvariant="bold-italic">X</mi></mrow><mrow><mn>2</mn></mrow></msub><m
   sub><mrow><mi
   mathvariant="bold-italic">β</mi></mrow><mrow><mn>2</mn></mrow></msub><m
   o>∣</mo><msubsup><mrow><mo>∣</mo></mrow><mrow><mn>2</mn></mrow><mrow><m
   n>2</mn></mrow></msubsup><mo>+</mo><mfrac><mrow><mn>1</mn></mrow><mrow>
   <mn>2</mn></mrow></mfrac><mo>×</mo><mfrac><mrow><mi>λ</mi></mrow><mrow>
   <mn>2</mn></mrow></mfrac><mrow><mo>(</mo><mrow><mi>ϕ</mi><mo>∣</mo><mo>
   ∣</mo><msub><mrow><mi
   mathvariant="bold-italic">β</mi></mrow><mrow><mn>1</mn></mrow></msub><m
   o>∣</mo><msubsup><mrow><mo>∣</mo></mrow><mrow><mn>2</mn></mrow><mrow><m
   n>2</mn></mrow></msubsup><mo>+</mo><mo>∣</mo><mo>∣</mo><mspace
   width="0.25em"></mspace><msub><mrow><mi
   mathvariant="bold-italic">β</mi></mrow><mrow><mn>2</mn></mrow></msub><m
   o>∣</mo><msubsup><mrow><mo>∣</mo></mrow><mrow><mn>2</mn></mrow><mrow><m
   n>2</mn></mrow></msubsup></mrow><mo>)</mo></mrow><mo>+</mo><mfrac><mrow
   ><mn>1</mn></mrow><mrow><mn>2</mn></mrow></mfrac><mi>λ</mi><mrow><mo>(<
   /mo><mrow><mi>ϕ</mi><mo>∣</mo><mo>∣</mo><msub><mrow><mi
   mathvariant="bold-italic">β</mi></mrow><mrow><mn>1</mn></mrow></msub><m
   o>∣</mo><msubsup><mrow><mo>∣</mo></mrow><mrow><mn>1</mn></mrow><mrow><m
   n>1</mn></mrow></msubsup><mo>+</mo><mspace
   width="0.25em"></mspace><mo>∣</mo><mo>∣</mo><msub><mrow><mi
   mathvariant="bold-italic">β</mi></mrow><mrow><mn>2</mn></mrow></msub><m
   o>∣</mo><msubsup><mrow><mo>∣</mo></mrow><mrow><mn>1</mn></mrow><mrow><m
   n>1</mn></mrow></msubsup></mrow><mo>)</mo></mrow> :MATH]
   1

   PUMICE also utilises different choices for windows that harbour
   cis-regulatory variants as another tuning parameter (denoted as w),
   which includes the ones defined by conventional linear windows
   surrounding gene start and end sites (i.e., ±250 kb and ±1 Mb) as well
   as by 3D genomics informed regions (i.e., loop and TAD). In total,
   nested cross-validation is utilised to select the optimal combinations
   of tuning parameters λ, ϕ, and w.

   For generating PUMICE-kidney-specific GReX models, we retrieved
   epigenomic annotation data for adult human kidney tissue from
   ENCODE^[493]140,[494]141 and kidney-specific chromatin
   conformations^[495]42. Specifically, we downloaded epigenomic
   annotations (in BED format) from ChIP-seq experiments for the following
   epigenetic marks: H3K27ac (ENCFF077LXK), H3K4me3 (ENCFF423PKK), DNase
   hypersensitivity sites (ENCFF416ORJ), and CTCF (ENCSR000DMC). For BED
   files based on GRCh38 human genome build, we used LiftOver implemented
   in rtracklayer^[496]142 to convert genomic positions to GRCh37 version.
   Fastq files of two technical replicates for H3K27ac HiChIP from the
   proximal tubule-derived HK2 cell line were obtained from European
   Nucleotide Archive (Run Accession SRR11434878 and SRR11434879)^[497]42.
   HiChIP paired-end reads for each replicate were combined and aligned to
   hg19 reference genomes using the Juicer pipeline^[498]143 with default
   parameters to assign reads to MboI restriction fragments and generate
   binned interaction matrices with aligned reads MAPQ ≥ 30. TADs were
   called using HiCtool^[499]144 at the required 40 kb resolution. A 40 kb
   resolution interaction matrix was obtained by converting 10 kb matrix
   generated by Juicer to 40 kb resolution using HiCExplorer^[500]145.
   Loops were called using Peakachu^[501]146 at 10 kb resolution using
   pre-trained CTCF ChIA-PET and H3K27ac HiChIP model for 30 million
   intra-reads. The final set of loops were combined from both models with
   predicted probability > 0.59. The probability threshold was chosen to
   obtain approximately 10,000 loops combining results from both models.

   We then used 478 samples from HKTR to generate GReX models for kidney
   genes. We trained predictors of gene expression by applying the above
   PUMICE-based algorithm to genotype and RNA-sequencing-derived
   transcriptome from HKTR. We kept genes with nested cross-validated
   Pearson correlation between predicted and actual levels > 0.10 (or
   equivalently r^2 > 0.01) and P-value of the correlation test
   <0.05^[502]126.

Prediction performance validation of PUMICE-derived genetically regulated
expression models for kidney genes using the validation panel of The
Genotype-Tissue Expression, The Cancer Genome Atlas and Clinical Proteomic
Tumor Analysis Consortium data

   For the purpose of the prediction performance validation, genotype data
   from GTEx, TCGA and CPTAC (n = 222) were combined. TCGA was integrated
   with GTEx and CPTAC by only retaining variants common to all data sets.
   SNPs with missing rate > 5% were excluded. After further overlapping
   with the genotype data from HKTR, the final genotype dataset contained
   6,305,298 autosomal variants that were used for prediction performance
   validation of the GReX models.

   Gene expression from all NIH datasets (GTEx, TCGA, CPTAC, totalling 222
   samples), in units of TPM, was normalised by logarithmic
   transformation, quantile normalisation (using “aroma.light”), and
   rank-based inverse normal transformation, as described before^[503]25.
   We then used PEER^[504]124 to infer 30 hidden factors that describe
   global sources of variation in the normalised data. Residuals of gene
   expression were then calculated from the normalised data by adjusting
   for age, sex, study (GTEx, TCGA, CPTAC), top three genetic principal
   components and the 30 hidden factors from PEER using linear regression.

   We obtained predicted expression for all imputable genes by applying
   the PUMICE-derived GReX models to our fully independent validation
   dataset of 222 human kidney tissue samples (from GTEx, TCGA and CPTAC).
   Predicted expression for each imputable gene was compared to residuals
   of gene expression across all samples using Pearson’s correlation
   coefficient. A correlation coefficient of > 0.1 was used as the
   criterion of a validated model. Predictive performance of
   PUMICE-derived validated models between the discovery and validation
   kidney resources were tested using Pearson’s correlation coefficient.

Analysis of recapitulation of biological function for generated models of
predicted gene expression

   We examined the association between predicted kidney expression of
   genes of key relevance to four biochemical blood phenotypes (i)
   reflective of the kidney’s involvement in excretion and/or reabsorption
   of solutes and (ii) measured in UK Biobank^[505]147 [serum glucose
   (mmol/L), serum urate (µmol/L), serum phosphate (mmol/L) and serum
   calcium (mmol/L)].

   For each phenotype, we looked up the GWAS SNPs (ordered by the
   magnitude of statistical significance of association with the
   respective trait) from the list of GWAS Catalog^[506]148 associations
   and selected the top genes with validated GReX models to which the GWAS
   SNPs were mapped (SLC2A2 for glucose, SLC2A9 and SLC2A12 for urate,
   IP6K3 for phosphate, CASR for calcium).

   We also manually curated the diseases arising from/related to
   abnormalities in each of the selected blood biochemistry traits
   (Supplementary Data [507]5) from those that (i) had at least 100 cases
   present in UK Biobank and (ii) were assigned full ICD10 code
   (Supplementary Data [508]5). In brief, type 2 diabetes and gout were
   selected as the diseases related to glucose and urate, respectively.
   For phosphate we examined “polyarthrosis” and “other arthrosis” traits
   (7 diseases in total) (Supplementary Data [509]5). A total of 17
   diseases were examined as related to renal metabolism of calcium under
   two umbrella terms (“urolithiasis” and “obstructive and reflux
   uropathy”) (Supplementary Data [510]5). The following four disorders
   were selected as relevant to renal handling of sodium/potassium
   “hyperosmolality and hypernatraemia”, “hypo-osmolality and
   hyponatraemia”, “hyperkalaemia” and “hypokalaemia” (Supplementary
   Data [511]5). Further details of UK Biobank fields, field codes and ICD
   codes are provided in Supplementary Data [512]5.

   In the absence of serum sodium/potassium levels in UK Biobank we
   selected SCNN1B as the well-established contributor to renal
   homeostasis of sodium/potassium^[513]24,[514]149.

   Associations between a genetically predicted expression of the relevant
   kidney gene (generated using a PUMICE algorithm in the discovery
   resource and validated in the validation resource) and a continuous
   blood biochemistry phenotype were conducted using the relevant genotype
   data from 337,350 unrelated individuals of white-European ancestry in
   UK Biobank^[515]147 – we used linear regression with an adjustment for
   age, sex, genotyping array and the top ten genetic principal
   components.

   Associations between a genetically predicted expression of the relevant
   kidney gene and a binary disease/disorder status were determined by
   Firth logistic regression^[516]150 (to minimise a potential bias
   arising from an imbalanced case/control ratio). The logistic regression
   models were adjusted for the same set of covariates as those included
   in the analysis of continuous traits. For the scenarios with more than
   one trait examined we applied a correction for multiple testing
   (calculated using the Benjamini–Hochberg FDR); the threshold of
   corrected statistical significance was established at FDR  <  0.05.

Kidney transcriptome-wide association studies of blood pressure

A catalogue of variants associated with blood pressure in genome-wide
association studies

   As an input into TWAS analysis, we used summary statistics of the
   associations between (i) 19,267,390 SNPs and SBP/DBP/PP from 337,422 UK
   Biobank individuals, and (ii) 7,371,711/7,476,460/7,359,508 SNPs and
   SBP/DBP/PP from 299,024 ICBP individuals^[517]22 (Supplementary
   Data [518]33). SNPs (i) not present in the 1000 genome reference panel
   under GRCh37 and (ii) not available in the BP GWAS analyses (UK Biobank
   and ICBP) were excluded from the analysis. A total of 6,305,298
   variants common for HKTR, TCGA, GTEx and CPTAC were retained for the
   TWAS analysis.

Existing blood pressure genome-wide association study loci and genomic loci
outside blood pressure genome-wide association study loci

   A total of 885 GWAS variants were associated with at least one of the
   BP traits (SBP, DBP, PP) in previous BP GWASs^[519]22. A GWAS locus was
   defined by a GWAS variant and the 1 Mb region adjacent to the variant
   (on both sides). For the purpose of this study, overlapping loci were
   merged into single locus. In such cases, the most significant GWAS
   variant in each locus was defined as the sentinel GWAS variant for this
   locus. As a result, 429 independent BP GWAS loci were identified
   (Supplementary Data [520]34). Genomic regions not overlapping with
   these BP GWAS loci were mapped to pre-defined disjoint LD
   blocks^[521]151.

Blood pressure kidney transcriptome-wide association studies – reciprocal
replication in UK Biobank and International Consortium for Blood Pressure

   We examined associations between kidney PUMICE-derived predicted gene
   expression and SBP, DBP and PP using S-PrediXcan^[522]127. We first
   examined associations between all 6490 kidney genes with validated
   expression models developed using kidney PUMICE with each of three
   BP-defining traits in each of the two resources, separately. The
   correction for multiple testing was based on the Benjamini–Hochberg
   FDR. Kidney genes whose predicted expression retained its significant
   association with BP in one of the resources were then taken for
   reciprocal replication (for analysis with the same BP trait) in the
   other resource. The correction for multiple testing was calculated
   using the Benjamini–Hochberg FDR (at the individual resource level)
   separately for each BP-defining trait. The threshold of corrected
   statistical significance in both stages above was established at
   FDR < 0.05. Only genes showing directionally consistent associations
   with the same BP-defining trait in both cohorts were considered
   statistically significant (Fig. [523]S3).

Computational drug repurposing with Connectivity Map

   Using S-PrediXcan, we also generated summary statistics for
   gene-SBP/DBP/PP associations using meta-analysis GWAS summary
   statistics on SBP/DBP/PP from UK Biobank and ICBP^[524]22. For each
   BP-defining trait, we extracted its significant TWAS associations (FDR
   q-value < 0.05) from the pool of 6490 kidney genes and used this as a
   proxy for a trait signature. We then applied CMap algorithm^[525]52 to
   identify drugs capable of inducing and reversing the disease signature.
   Specifically, we queried BP TWAS association signals against the
   reference profiles in the CMap database (from L1000 assay)^[526]53,
   which recorded gene expression changes caused by perturbagens as the
   signature of the drug x gene pair. Only reference data from touchstone
   dataset were used, which comprised of reference signatures across nine
   cell lines treated with ~3000 small molecule drugs. To quantify the
   correlation between a query signature and reference profile, CMap
   calculated a τ-score. A negative τ-score indicates that the identified
   molecule can normalise trait signature and are capable of drug
   reposition, and vice versa. Only drug classes that survived a
   correction for multiple testing (calculated by the Benjamini–Hochberg
   FDR < 0.05) were considered as statistically significant.

Causal inference and fine mapping analyses

Mendelian randomisation with PMR-Egger

   We adopted a recent two-sample Mendelian randomisation method,
   PMR-Egger^[527]62, developed specifically to control for horizontal
   pleiotropy and the presence of multiple correlated instruments through
   a maximum likelihood inference framework^[528]62. As an input to each
   MR analysis, we used: (i) a summary of the associations between the
   instrumental SNPs and SBP/DBP/PP, calculated from a meta-analysis of
   ~750,000 individuals^[529]22 from UK Biobank and ICBP and (ii) a
   summary of the associations between the same SNPs and the selected BP
   TWAS genes generated in a kidney cis-eQTL analysis conducted using 478
   samples with informative genotype and transcriptome information from
   HKTR. In brief, kidney cis-eQTL analysis brought together 8,735,852
   genetic variants and 21,414 kidney genes. The normalised expression of
   each kidney gene was regressed against alternative allele dosage, age,
   sex, source of tissue indicator (nephrectomy/kidney biopsy), the top
   three principal components derived from genotyped autosomal variants,
   100 hidden factors estimated using PEER and seven kidney cell-types.
   Only variants within 1Mb-regions from the transcription start site of a
   gene were considered and the analysis was carried out using
   FastQTL^[530]152. For each MR analysis, all cis-SNPs were LD-clumped
   with r^2 < 0.5 and selected as instruments. The correction for multiple
   testing was calculated using the Benjamini–Hochberg FDR, and the
   threshold of corrected statistical significance was established at
   FDR  <  0.05. Genes with statistically significant causal effects and
   no evidence of pleiotropic effects estimated through PMR-Egger (i.e.,
   FDR > 0.05) were selected for further analyses.

Fine-mapping of transcriptome-wide association study associations

   We first mapped kidney genes prioritised through BP TWAS and PMR-Egger
   to pre-defined disjoint LD blocks^[531]151. Genes located within the
   MHC region were excluded. We then applied probabilistic fine-mapping to
   each of the independent genomic regions with more than one TWAS signal.
   Using Fine-mapping Of CaUsal gene Sets^[532]63 (FOCUS) we prioritised
   kidney genes within each locus while controlling for pleiotropic
   effects. Within each locus we computed a 95% credible set of causal
   genes and the posterior inclusion probability (PIP) for each
   gene^[533]63 (Fig. [534]S4). We retained all genes present in the
   credible set and with a PIP > 0.5 for further analysis. As an input
   into FOCUS analysis, we used (i) a weight database containing 6490
   validated kidney gene expression prediction models generated by kidney
   PUMICE from HKTR, (ii) a summary of the associations between SNPs and
   SBP/DBP/PP from the meta-analysis GWAS^[535]22 and (iii) 1000 genome
   reference panel.

Mapping TWAS signals to independent genomic regions

   We mapped TWAS signals to existing BP GWAS loci and genomic loci
   outside BP GWAS loci (BP TWAS loci) and examined whether a genomic
   region contained a single or multiple TWAS signals.

Integration of outputs from blood pressure kidney transcritome-wide
association studies with plasma metabolomics and proteomics

Association between kidney expression of solute carrier family 5
(sodium/inositol cotransporter), member 11 (SLC5A11) and serum glucose and
HbA1c

   Using the validated kidney PUMICE-derived prediction model for SLC5A11,
   we generated its GReX in UK Biobank. We then examined its association
   with serum glucose (Data-Field 30740) and HbA1c (Data-Field 30750)
   values from 294,159 and 321,435 unrelated UK Biobank individuals of
   white-European ethnicity (respectively). In brief, included in these
   analyses were individuals who passed the sample level quality control
   filters^[536]147. Prior to the analysis, circulating glucose levels and
   HbA1c were normalised using rank-based inverse normal transformation.
   We then examined the associations between predicted kidney expression
   of SLC5A11 and both glucose and HbA1c by regressing their normalised
   values on the SLC5A11 GReX adjusting for age, sex, genotyping array and
   the top ten genetic principal components (PCs) in respective linear
   regression models.

SLC5A11 kidney expression, plasma myo-inositol and pulse pressure – mediation
analysis using two-step Mendelian randomisation

   We conducted mediation analysis using two-step MR^[537]153,[538]154 to
   investigate the extent to which the effect of predicted kidney
   expression of SLC5A11 on PP may be mediated by plasma levels of
   myo-inositol. In brief, each step of the two-step MR is an independent
   univariable two-sample MR analysis. Penalised inverse-variance weighted
   (IVW) method^[539]155 was used to estimate the potentially causal
   effect of the exposure (e.g. kidney mRNA expression of SLC5A11) on the
   mediator (e.g. plasma myo-inositol) and the causal effect of the
   mediator on the outcome (e.g. PP) in the first and second step
   (respectively), separately. MR-Egger regression was used to detect
   horizontal pleiotropy in each MR analysis^[540]155. A modified version
   of the IVW and MR-Egger estimators^[541]156 that account for genetic
   correlations was used for any MR analysis with selected correlated
   instruments. The total effect of the exposure on the outcome was
   estimated using penalised IVW. The indirect effect (i.e. mediation
   effect) was calculated by multiplying the estimated effects from both
   steps. Mediation proportion (MP) was then calculated by dividing the
   indirect effect by the total effect. Standard errors of the estimated
   indirect effect and MP were derived using the delta
   method^[542]157,[543]158. In each MR analysis, significant causal
   effect was identified if (i) the effect estimate was statistically
   significant (P-value < 0.05) and (ii) there was no horizontal
   pleiotropy (P-value > 0.05).

   As an input, we used (i) summary statistics for SNPs associated with
   kidney expression of SLC5A11 in cis-eQTL analysis conducted in 478
   individuals from HKTR, (ii) GWAS summary statistics for plasma
   myo-inositol levels from INTERVAL and EPIC-Norfolk (n = 14,296)^[544]75
   and (iii) GWAS summary statistics for PP from UK Biobank and
   ICBP^[545]22 (n = 750,000). Moderately correlated SNPs associated with
   SLC5A11 expression (P-value < 1 × 10^−3 and LD r^2 < 0.5) were selected
   as instruments for estimating effects from SLC5A11 expression to
   myo-inositol and PP, respectively. Independent SNPs associated with
   myo-inositol (P-value < 5 × 10^−8 and LD r^2 < 0.1) were selected as
   instruments to estimate an effect from myo-inositol to PP.

Expression of AGMAT – tissue and cell-type enrichment in human tissues

   Tissue enrichment was examined in a panel of human tissues from the
   Human Protein Atlas (HPA) based on RNA tissue specificity
   classification^[546]159. The kidney cell-type enrichment of AGMAT was
   based on the HPA cell-type specificity definition
   ([547]https://www.proteinatlas.org/ENSG00000116771-AGMAT/single+cell+ty
   pe).

Analysis of association between kidney expression of AGMAT and blood urea
nitrogen

   We first generated GReX based on the validated kidney PUMICE-derived
   prediction model for AGMAT using UK Biobank imputed genotype
   data^[548]147. BUN values (mmol/L) were derived from blood urea values
   (mmol/L) (Data-Field 30530) measured in 321,409 unrelated UK Biobank
   individuals of white-European ethnicity surviving sample level quality
   control^[549]147 by multiplying the blood urea values by
   0.1554^[550]160. The BUN values were further normalised using
   rank-based inverse normal transformation. We then estimated the effect
   of AGMAT GReX on BUN by regressing normalised values of the latter on
   AGMAT GReX adjusting for age, sex, genotyping array and the top ten
   genetic principal components (PCs) in linear regression.

   We next estimated the effect of AGMAT expression on BUN in an
   independent cohort from CKDGen Cosnortium using two-sample MR. IVW was
   used to estimate the effect of AGMAT expression on BUN and MR-Egger
   regression was used to examine the presence of horizontal pleiotropy.
   For the purpose of the analysis, we used: (i) a summary of the
   associations between the instrumental SNPs and BUN, calculated from a
   meta-analysis of ~243,000 individuals^[551]160 from CKDgen (ii) summary
   statistics of the associations between the same SNPs and kidney
   expression of AGMAT generated in the kidney cis-eQTL analysis in 478
   samples from HKTR. The correction for multiple testing was conducted by
   Benjamini–Hochberg FDR on cis-eQTL associations on AGMAT expression.
   Independent SNPs associated with kidney expression of AGMAT (FDR < 0.05
   and LD r^2 < 0.1) were selected as instruments for the MR analysis.
   Significant causal effect was identified if the effect estimate is
   significant (P-value < 0.05) with no horizontal pleiotropy
   (P-value > 0.05).

AGMAT kidney mRNA expression, plasma concentrations of 4-guanidinobutanoate
and systolic blood pressure – mediation analysis using two-step Mendelian
randomisation

   We conducted a mediation analysis under the above MR framework to
   estimate whether the effect of kidney expression of AGMAT on PP is
   mediated by plasma 4-guanidinobutanoate. For the purpose of this
   analysis, we used (i) summary statistics for AGMAT expression in the
   kidney (from cis-eQTL analysis carried out in 478 samples from HKTR),
   (ii) summary statistics for plasma 4-guanidinobutanoate from GWAS in
   INTERVAL and EPIC-Norfolk (n = 14,296)^[552]75 and (iii) GWAS summary
   statistics for SBP from UK Biobank and ICBP^[553]22 (n = 750,000). We
   selected independent SNPs associated with kidney expression of AGMAT
   (P-value < 1 × 10^−2 and LD r^2 < 0.1) as instruments to estimate
   effects of AGMAT expression on SBP. Independent SNPs associated with
   kidney expression of AGMAT (P-value < 1×10^−3 and LD r^2 < 0.1) were
   selected as instruments for the purpose of estimating effects from
   AGMAT expression to 4-guanidinobutanoate. Independent SNPs associated
   with myo-inositol (P-value < 5×10^−8 and LD r^2 < 0.1) were selected as
   instruments in estimation of the effect of 4-guanidinobutanoate on SBP.

AGT kidney mRNA expression, angiotensinogen plasma protein levels and
systolic blood pressure – mediation analysis using two-step Mendelian
randomisation

   To determine whether plasma angiotensinogen protein levels mediate the
   effect of AGT kidney mRNA expression on SBP, we performed a mediation
   analysis under the MR framework described above. As an input, we used
   (i) summary statistics from cis-eQTL analysis of kidney AGT mRNA
   expression conducted in 478 samples from HKTR, (ii) GWAS summary
   statistics for angiotensinogen plasma protein levels from the Fenland
   study (n = 10,708)^[554]82 and (iii) GWAS summary statistics for PP
   from UK Biobank and ICBP^[555]22 (n = 750,000). MR analyses of the
   effects of kidney AGT expression on circulating angiotensinogen protein
   levels and SBP (respectively) were based on a selected set of
   independent (r^2  <  0.1) instrumental SNPs showing a significant
   (P-value  <  1 × 10^−3) association with AGT expression in the cis-eQTL
   analysis. In estimating effect of circulating concentrations of
   angiotensinogen protein on SBP we used independent SNPs associated with
   plasma angiotensinogen (P-value < 5 × 10^−8 and LD r^2 < 0.1) as
   instruments.

Characterisation of kidney microRNAome and kidney blood pressure
microRNAome-wide association studies (microRNA-TWAS)

Processing of miRNA-sequencing data and quality control

   A total of 379 samples from HKTR studies underwent small RNA-sequencing
   analysis to identify and quantify kidney miRNAs; all of these samples
   had matching genotype information. The libraries were generated using
   Illumina TruSeq and were sequenced using either 75 bp single-end reads
   from an Illumina NextSeq or 50 bp single-end reads from an Illumina
   HiSeq2500. In addition, 114 TCGA samples had kidney miRNA profiles
   available; all were sequenced using 30 bp single-end reads on an
   Illumina HiSeq2500^[556]161. Finally, 75 kidney miRNA profiles were
   secured from CPTAC resource^[557]162. These samples were processed
   using an Illumina HiSeq4000, as reported before^[558]162. The TCGA and
   CPTAC miRNA-sequencing reads were obtained from the GDC data portal and
   filtered using the following filters: (i) Program: TCGA or CPTAC,
   respectively, (ii) Primary site: kidney, (iii) Race: white, (iv) Sample
   type: solid tissue normal, (v) Experimental strategy: miRNA-sequencing
   and (vi) Data format: bam. In total, 114 TCGA and 75 CPTAC
   miRNA-sequencing datasets with corresponding genotyping data were
   downloaded.

   In all studies, adaptors were trimmed with Trimmomatic using standard
   Illumina adapter sequences (sliding window size: 4 bases; minimum
   window length: 15 bases)^[559]163. Next, Spliced Transcripts Alignment
   to a Reference (STAR) software was used to align reads and to quantify
   miRNA mature products into reads per million (RPM)^[560]164. miRNA
   mature product annotations were taken from miRbase (release 22) using
   the GRCh38 reference genome^[561]165. ComBat-seq was later used for
   batch effect correction^[562]166.

   Samples with total read number <2 million and a D-statistic >
   5^[563]167 were excluded from further analysis. A total of 489 samples
   with matching genotype data survived quality control (339 – HKTR, 85 –
   TCGA and 65 – CPTAC, Supplementary Data [564]35) and were included into
   the downstream analyses.

   A miRNA was retained for downstream analysis if its expression in RPM
   was >0.1 and read count ≥5 in at least 10% of the kidney samples in
   each population. miRNAs that did not fulfil these criteria, had an
   interquartile range of 0 or were located on sex chromosomes were
   excluded from further analysis. There were 1459 miRNAs that passed the
   quality control filters in HKTR, 967 in TCGA and 1459 in CPTAC. A total
   of 967 miRNAs common to all datasets were retained for further
   analyses.

   Expression data were normalised by logarithmic transformation, quantile
   normalisation (using the R package aroma.light^[565]168), and
   rank-based inverse normal transformation, as described before^[566]25.
   miRNA expression residuals from linear regression models were then
   further adjusted for (i) age, (ii) sex, (iii) source of tissue sample
   indicator (nephrectomy/pre-transplantation biopsy), (iv) the first
   three autosomal genetic principal components (calculated using the
   EIGENSTRAT^[567]130 and SNPWeights^[568]131 packages), and (v) 30 PEER
   factors^[569]124.

Mapping miRNA onto host genes and their classification

   miRNA annotations for the human (Homo sapiens) genome were obtained
   from miRbase (GRCh38; release 22)^[570]165 and genomic coordinates were
   used to map each miRNA onto gene positions curated from the Ensembl
   genome database (GRCh38; release 108)^[571]169. miRNAs located within
   the start and end coordinates of genes were defined as ‘intragenic’,
   and further divided into ‘intronic’ and ‘exonic’ based on their
   position within the host gene. Other miRNAs (not present within the
   body of any gene) were defined as ‘intergenic’.

Complexity of kidney microRNAome

   To determine relative transcriptomic complexity of miRNAs, we first
   divided all 21,414 kidney-expressed genes into simplified categories
   through manual mapping of Ensembl v83 “gene_biotype” values
   (Supplementary Data [572]36) and excluded 72 miRNAs and 23 other small
   non-coding RNAs identified using polyA RNA sequencing. In total, there
   were 15,205 protein-coding genes, 5063 long non-coding RNAs and 1051
   pseudogenes. For each biotype, we calculated the proportion of the
   transcriptome/microRNAome attributable to each kidney-expressed gene
   and ranked them in descending order. We quantified the cumulative
   proportion of transcriptomes for each biotype referable to each
   quartile (25, 50 and 75%) of expression.

miRNA tissue enrichment

   Human (Homo sapiens) small non-coding RNA expression information was
   curated from miRNATissueAtlas2^[573]86 – we identified a total of 2656
   mature miRNAs across 35 organs. Each miRNA was then assessed for
   enrichment using the HPA definitions^[574]159: miRNAs were categorised
   as ‘enriched’ if the expression in an organ was at least four times
   that of any other organ, ‘group enriched’ if the expression in a group
   of 2-5 organs was at least four times that of any other organ,
   ‘enhanced’ if the expression in an organ was at least four times the
   cross-tissue mean, ‘low specificity’ if the expression in at least one
   organ had log[2] RPM > 1 but did not belong to any previous category,
   and ‘not detected’ if the expression in all organs had log[2] RPM < 1.

Kidney blood pressure microRNAome-wide association studies

   We used elastic net regression under Predixcan framework^[575]126 to
   develop prediction models for kidney miRNAs expression using (i)
   genotype data and (ii) the adjusted residuals of renal miRNA expression
   data derived from normalised small RNA-sequencing profiles, as reported
   above. Consistent with the computational pipeline developed for the
   purpose of BP kidney TWAS, all HKTR samples with informative kidney
   miRNA profiles and matching genotype information were used as a
   discovery resource (n = 339) while samples from NIH resources (TCGA and
   CPTAC, n = 150) were used for the purpose of validation. The prediction
   models of expression for 1459 kidney miRNAs in the discovery resource
   were produced using nested cross validated elastic net regression.

   The prediction models with nested cross validated Pearson’s correlation
   coefficient > 0.1 and P-value < 0.05 in the discovery resource were
   retained for further analysis. They were further examined in the
   validation resource; the residuals of miRNA expression in the discovery
   cohort were correlated with their respective GReX in the validation
   resource. The prediction models with Pearson’s correlation coefficient
   >0.1 in the validation resource were retained for further downstream
   analysis.

   Overall, there were 201 prediction models for kidney miRNA expression
   in the discovery resource. Of these, 80 were validated and retained for
   kidney microRNA-TWAS of BP. As an input into this analysis, we used the
   GWAS summary statistics for SBP, DBP and PP from UK Biobank
   (n = 337,422) and ICBP (n = 299,024), as reported above. We first
   examined associations between all 80 kidney miRNAs with fully validated
   GReX models and each of three BP-defining traits in each of the two
   resources, separately. The correction for multiple testing was based on
   the Benjamini–Hochberg FDR. Kidney miRNAs whose predicted expression
   retained its significant association with the same BP trait in one of
   the resources were then taken for a reciprocal replication in the other
   resource. The correction for multiple testing was conducted using
   Benjamini–Hochberg FDR separately for each BP-defining trait. Only
   miRNAs showing directionally consistent associations with the same
   BP-defining trait in both resources were considered statistically
   significant (Fig. [576]S7).

Kidney miRNAs associated with blood pressure – biological annotations

   To provide functional context to the detected associations between
   kidney miRNAs and BP we have examined their (i) genetic origin, (ii)
   magnitude of their kidney expression, (iii) tissue specificity and (iv)
   genetic imputability. To determine the relative expression of
   BP-associated miRNAs in the kidney, we quantified mean expression
   values for 1459 renal miRNAs using 339 samples in the discovery
   resource and scaled the values between 0 and 1. To characterise kidney
   specificity, we retrieved miRNA expression data for multiple tissues
   from miRNATissueAtlas2^[577]52. We derived a mean value of expression
   for each miRNA in each organ and inferred specificity by calculating
   the fold-change of mean expression in the kidney against the
   cross-tissue mean. To quantify the imputability of miRNAs, we assessed
   the performance of their prediction models using nested cross-validated
   correlation between the predicted and measured levels of expression in
   the kidney.

Blood pressure-associated kidney miRNAs and the neighbouring genes –
conditional analyses

   We sought to examine the extent to which the identified signal of
   associations between BP and 11 kidney miRNAs may be mediated by (i) the
   host genes (i.e. genes whose sequences overlap with miRNAs), (ii) all
   BP kidney TWAS genes sharing the same chromosomal region with the
   miRNA. For each BP TWAS miRNA, we first identified all validated
   imputable protein-coding genes whose TSS map within 500 kb of the TSS
   of the BP TWAS miRNA. For further analyses we selected either the host
   gene (if acting as BP kidney TWAS gene) or each of the BP kidney TWAS
   genes within the 500 kb boundaries.

   As an input into the conditional analysis, we used (i) a Z-score for
   the BP TWAS miRNA (obtained from the BP microRNA-TWAS showing the most
   significant association), denoted Z[miRNA], (ii) a Z-score for the
   corresponding selected gene (obtained from the BP TWAS), denoted
   Z[gene], and (iii) cis-regulated genetic correlation (r) of the BP TWAS
   miRNA and the corresponding selected gene.

   We conditioned BP TWAS gene associations on the corresponding miRNA to
   obtain conditional Z-scores, i.e. Z[gene|miRNA] using the following
   formula^[578]170:
   [MATH: <mspace
   width="0.25em"></mspace><msub><mrow><mi>Z</mi></mrow><mrow><mi
   mathvariant="normal">gene</mi><mo>∣</mo><mi
   mathvariant="normal">miRNA</mi></mrow></msub><mo>=</mo><mfrac><mrow><ms
   ub><mrow><mi>Z</mi></mrow><mrow><mi
   mathvariant="normal">gene</mi></mrow></msub><mo>−</mo><mi>r</mi><mo>×</
   mo><msub><mrow><mi>Z</mi></mrow><mrow><mi
   mathvariant="normal">miRNA</mi></mrow></msub></mrow><mrow><msqrt><mrow>
   <mn>1</mn><mo>−</mo><msup><mrow><mi>r</mi></mrow><mrow><mn>2</mn></mrow
   ></msup></mrow></msqrt></mrow></mfrac> :MATH]
   2

   The cis-regulated genetic correlation was calculated from the Pearson’s
   coefficients of correlation between imputed miRNA expression and
   imputed gene expressions for individuals from 1000 Genome by applying
   weights of their imputation models to the genotype data. A significant
   mediation effect was identified if (i), |r| > 0.1, (ii) |Z[gene]| >
   |Z[gene|miRNA]| > 2, and (iii), the direction of Z[gene|miRNA] was
   consistent with Z[gene]^[579]171.

Characterisation of kidney proteome and blood pressure kidney proteome-wide
association studies

CPTAC human kidney tissue proteomics – data generation, quality control and
quantification

   The abundance of human kidney tissue proteins were obtained from the
   CPTAC pre-processed protein-level assembly^[580]136 and downloaded from
   the proteomic data commons (data accessed May 2022,
   [581]https://pdc.cancer.gov/pdc/study/PDC000127). The details of the
   biochemical analyses are reported in full in the original
   publication^[582]136. In brief, we made use of the NAT samples (taken
   from regions of the kidney adjacent to renal clear cell tumours) and
   reviewed by a board-certified pathologist (to confirm the histological
   status)^[583]136. Each kidney tissue sample was homogenised, lysed,
   digested and trypsinised. Samples were then multiplexed using tandem
   mass tag (TMT) and fractionated by basic reversed-phase liquid
   chromatography (bRPLC)^[584]136. Peptides were then separated by
   ultra-high-performance liquid chromatography (UHPLC) and analysed using
   the Thermo Fusion Lumos mass spectrometer^[585]136. The protein-level
   assembly of spectra and peptides into estimates of protein abundance
   was performed by a software workflow using the Philosopher
   pipeline^[586]172 including spectral search by MSFragger^[587]173 and
   refinement by PeptideProphet^[588]174. Peptide spectral match data for
   each sample were then normalised by log[2]-transformed reference
   intensities determined for each TMT channel. A total of 83 NAT samples
   had raw protein-level abundance data; 72 of these had genotype
   information from whole genome sequencing available and 65 of these also
   had poly-A RNA-sequencing data. All samples used in this analysis were
   collected from patients of European ancestry. A total of 7291 proteins
   were quantifiable in all 72 samples and 7036 of these from 65 samples
   were available with measurable expression of their source gene in the
   RNA-sequencing data.

General characteristics of kidney proteins

   We have first assigned each of the quantified kidney proteins into
   three predicted localisation categories (intracellular, membrane and
   secreted) provided by the HPA
   [589]https://www.proteinatlas.org/about/download (data accessed May
   2022). We then examined the extent to which proteins from the CPTAC
   dataset are enriched for varying degrees of kidney specific expression.
   Kidney enriched, group enriched and kidney enhanced genes were
   downloaded from the HPA (data accessed May 2022). We then used Fisher’s
   exact test to calculate the enrichment odds ratio (with corresponding
   P-values) illustrative of the extent to which each of the
   kidney-specificity categories are over-represented amongst the proteins
   quantified in the CPTAC dataset compared to the whole HPA.

Correlation between kidney gene expression and protein abundance

   Using 65 CPTAC samples with overlapping transcriptome/proteome
   information we examined Pearson’s correlation between abundances of
   each of 7036 proteins (normalised intensity) and log[2]-transformed
   estimated read counts of their parent genes (from the matching kidney
   RNA-sequencing data). These were then catalogued across the entire
   distribution of the observed correlations – from the most negative
   (r = -0.54, P-value = 9.9 × 10^−7) to the most positive (r = 0.92,
   P-value = 1.6 × 10^−29) and further examined using a density plot. An
   FDR threshold of 5% was used to adjust for multiple testing and
   determine the statistical significance of the identified correlations.

   We examined whether genes of relevance to BP/hypertension are enriched
   for positive kidney gene-protein correlations within the CPTAC
   catalogue of 7036 gene-protein pairs. We selected four categories of
   relevance to BP/hypertension: (i) 216 genes with enriched or enhanced
   expression in kidney from HPA (ii) eight monogenic
   hypertension/hypotension genes collected from the manual review of
   Samani and Tomaszewski et al^[590]91, (iii) seven antihypertensive drug
   targets manually collected from drugbank
   ([591]https://go.drugbank.com/) and the OpenTargets platform
   ([592]https://platform.opentargets.org/), (iv) 152 protein-coding BP
   kidney TWAS genes selected from the list of 399 BP TWAS genes showing
   causal association to BP in MR analyses. The enrichment P-value was
   calculated by a two-sample Kolmogorov–Smirnov test.

Kidney proteome-wide association studies of blood pressure in Clinical
Proteomic Tumor Analysis Consortium

   Using 72 kidney proteome profiles with matching genotype information
   from CPTAC as a reference we developed prediction models for protein
   abundance in the kidney tissue (Fig. [593]S8). All individuals included
   in this analysis were of white-European ancestry.

   Of 7291 proteins whose abundance in the kidney tissue were quantified
   by mass spectrometer based tissue proteomics, included were those with
   matching expression data for their parent gene in the RNA-sequencing
   dataset from HKTR. A total of 6712 such kidney proteins were identified
   for the downstream analyses (Fig. [594]6E).

   Prior to the analysis, we used linear regression to adjust kidney
   abundance of each included protein for age, sex, the top three
   principal components derived from genotyped autosomal variants and the
   top ten principal components derived from normalised kidney abundance.
   The generated residuals of 6712 proteins were then included as an input
   in the protein abundance model prediction together with genotype
   information on 5,785,933 variants derived from the whole genome
   sequencing data (Fig. [595]6E). Variant positions were lifted to
   GRCh37. We retained all variants present in the 1000 Genome reference
   panel (GRCh37) and that were also available in the BP GWAS analyses (UK
   Biobank and ICBP).

   We trained prediction models of kidney protein abundance using the
   genotype and protein abundance data from CPTAC using the PrediXcan
   framework^[596]126 (Fig. [597]S8). We kept predictive models with
   nested cross validated correlation between predicted and actual levels
   > 0.1 and P-value of the Pearson’s correlation test <0.05
   (Fig. [598]S8). Those that satisfied these criteria were included in
   kidney PWAS of BP. For the purpose of this analysis, we used the GWAS
   summary statistics on SBP, DBP and PP from UK Biobank and ICBP reported
   above. In brief, we included 337,422 individuals from UK Biobank and
   299,024 from ICBP in this analysis (Figs. [599]6E and [600]S8). We
   first examined association between all 815 cross-validated kidney
   protein abundance models with each of three BP defining traits in each
   of the two resources, separately. The correction for multiple testing
   was applied based on the Benjamini–Hochberg FDR. Kidney proteins whose
   predicted abundance retained its significant association with BP in one
   of the resources were then taken for a reciprocal replication in the
   other resource (Fig. [601]S8). The correction for multiple testing was
   applied again at this stage and was based on the Benjamini–Hochberg
   FDR. The threshold of corrected statistical significance in both stages
   was established at FDR < 0.05.

Annotating blood pressure-associated kidney proteins with additional levels
of evidence for relevance to blood pressure

   We sought to identify additional evidence for each BP-associated kidney
   protein at other molecular levels, i.e.: (i) association with BP in BP
   kidney TWAS and (ii) prior evidence of colocalisation between BP and
   any other molecular phenotypes (i.e. mRNA expression, alternative
   splicing or DNA methylation) from previous kidney QTL studies^[602]25.
   We further partitioned BP-associated kidney proteins with no
   association with BP in kidney TWAS into two categories (i) those that
   yielded non-significant associations with BP in TWAS and (ii) those
   whose GReX model did not fully converge prior to TWAS.

Kidney proteins associated with blood pressure in proteome-wide association
studies –overrepresentation pathway analysis

   Overrepresentation analysis was performed by the “fora” function in the
   “fgsea” R package. All KEGG pathways from MSigDB subcategory “C2:KEGG”
   and all Human Phenotype Ontology (HPO) terms from MSigDB subcategory
   “HPO” with more than ten genes and fewer than 1000 genes were tested.
   The enrichment odds ratio and the respective P-value were calculated by
   a hypergeometric test. Nominal P-values were adjusted by the FDR method
   and the corrected significance threshold was set at 5% FDR. KEGG
   pathways were manually grouped and labelled into classes based on the
   known functions of genes overlapping with each pathway. HPO terms were
   grouped into a category distinct from KEGG pathways.

Kidney proteins associated with blood pressure in proteome-wide association
studies – drug tractability enrichment

   Drug tractability data was collected from the OpenTargets ftp server
   ([603]http://ftp.ebi.ac.uk/pub/databases/opentargets/platform/latest/in
   put/target-inputs/tractability/tractability.tsv, data accessed 05/2023)
   and was a product of the OpenTargets tractability pipeline
   ([604]https://github.com/chembl/tractability_pipeline_v2). Tractability
   categories for each modality were extracted from the columns
   “Category_sm”, “Category_ab”, “Category_PROTAC” and manually
   relabelled. The enrichment analysis was performed using 100,000
   permutations for each of the eight categories. For each permutation a
   random sample of genes was selected (without replacement) of size equal
   to the number of genes present in the category being tested and then
   the number of genes assigned to that category was recorded. P-values
   were calculated by one-sided permutation test for positive enrichment
   which is calculated as the proportion of permuted values which exceed
   the test value (the number of BP PWAS protein belonging to the category
   in question) divided by the number of permutations performed. For
   visualisation, the distribution of all permuted values (per category)
   was smoothed using the function “stat_density_ridges” from the
   “ggridges” R package with a bandwidth of 0.4.

Characerisation of urinary cell transcriptome

Urinary cell collection, purification and RNA extraction

   Demographic information for all individuals included in this analysis
   is presented in Supplementary Data [605]25. In brief, 100 ml of fresh
   urine was collected and refrigerated (up to a maximum of 2 hours) until
   centrifugation at 2000 g for 30 minutes at 4°C. The supernatant was
   then discarded, and the pelletised material resuspended in 1 ml of PBS
   minus (Sigma-Aldrich). The sample was then spun a second time at
   10,000 g for 4 minutes at room temperature. The supernatant was again
   removed and 1 ml of a mixture of 70% Hank’s balanced salt solution
   (HBSS), 20% foetal calf serum (FCS) and 10% dimethyl sulfoxide (DMSO)
   before storage at -80°C. RNA was extracted using the Qiagen RNeasy kit,
   following the standard protocol. Urinary cell RNA concentration and
   purity was quantified by NanoDrop and RIN scores estimated by
   TapeStation (Agilent) with a high-sensitivity assay.

Urinary cell RNA-sequencing and data processing

   Sequencing libraries were generated using the New England Biolabs (NEB)
   poly-A selection kit for Illumina sequencers. The gene expression
   quantification protocols was identical to that used by the GTEx for
   version 8 of their data release^[606]122. This protocol was selected so
   the urinary cell expression data could be compared with the GTEx gene
   expression profiles without introducing a potential bias introduced by
   the differences in the quantification protocol. Briefly, the read
   alignment to the genome was conducted using STAR^[607]164, read
   deduplication and quantification – by RNASeQC (against the GTEx v8
   transcriptome reference in units of TPM at the gene level). Our quality
   control process excluded samples with (i) less than ten million paired
   reads, (ii) less than 2 million mapped paired reads and (iii) those
   with a D-statistic greater than 5. A total of 33 urinary cell pellet
   samples passed all quality control criteria. Expressed genes were
   defined as those with greater than 6 mapped reads and a TPM > 0.1 in
   more than 20% of samples; 21,981 genes passed all quality control
   filters and were used for the downstream analyses.

Salivary cell RNA-sequencing and RNA-sequencing metric calculation

   Salivary samples were obtained from the publicly available GEO series
   ([608]GSE108664) of longitudinally collected saliva specimens sampled
   before administration of a pneumococcal vaccination^[609]175. We used
   all available 40 samples collected before administration of the
   vaccination (“pre-vaccination”). The NCBI SRA-toolkit was used to
   download and extract raw fastq data (“fasterq-dump”) for all 40
   samples. The raw fastq was then processed using the same pipeline
   applied to the urinary cell and GTEx tissue samples. Gene expression
   was also quantified in an identical manner. RNA-sequencing metrics were
   calculated by RNASeQC. RNASeQC metrics were manually grouped by the
   range or units of reported values into 3 groups: (1) Read and fragment
   count metrics [0-100 million reads/fragments], (2) rates [0.0–1.0
   frequency] and (3) small value metrics [0-2.0 units].

Urinary cell transcriptome – overall characteristics

   We assigned one of four biotype categories to each expressed gene by
   grouping all Ensembl v83 “gene_biotype” values using a manual mapping
   (Supplementary Data [610]36). We then examined the complexity of the
   urinary cell and kidney tissue transcriptome by calculating the
   cumulative proportion of each transcriptome attributable to each
   expressed gene and ranking them from highest to lowest expression. We
   then quantified the cumulative proportion of each transcriptome which
   is attributable to the top 25, 50 and 75% of expressed genes.

   To examine whether transcriptomic profiles of urinary cell pellets
   collected from patients undergoing cancer nephrectomies differ from
   those who did not have cancer we used hierarchical clustering of sample
   similarity values. We used Pearson distances to determine an inverse
   measure of urinary cell sample similarity. Pearson distances were
   calculated as (1 – Pearson correlation coefficient) between all sample
   pairs, using log[2] TPM data from all genes expressed in urine. These
   distances were then clustered hierarchically using the complete linkage
   method for cluster agglomeration as implemented in the R function
   “hclust”; samples were annotated as “nephrectomy” (to indicate that the
   urinary cell sediment was secured from a patient undergoing surgical
   removal of the kidney due to cancer) and “biopsy” (to indicate that the
   urinary cell pellet was secured from a donor prior to kidney
   transplantation).

Urinary cell overrepresented pathways

   The top 100 expressed genes, by median TPM, were identified in both (i)
   33 urinary cell samples and (ii) 430 kidneys from HKTR^[611]25. The top
   100 HGNC gene symbols from each were then examined by an
   overrepresentation analysis using the Database for Annotation,
   Visualization and Integrated Discovery (DAVID)^[612]176 functional
   annotation tool with the KEGG pathways and Gene Ontology biological
   process term annotations. The results significant at 5% FDR were
   grouped by shared genes and were manually assigned to specific
   biological themes using known functionality of overlapping genes.

Analysis of correlation between the transcriptomic footprint of urinary cells
and different human tissues/cell-types

   We used the publicly available RNA-sequencing-derived median gene
   expression profiles for each of the 54 different GTEx tissues from the
   GTEx portal (accessed 09/2021). The gene expression data were processed
   using the standard GTEx computational pipeline^[613]122. In brief, gene
   expression was quantified by RNASeQC in TPM units at the sample level
   and then a median value was calculated across all samples from each
   tissue, for all genes. The urinary cell pellet expression profile from
   33 individuals was summarised in an identical way to calculate a median
   gene expression profile. All 19,273 protein-coding genes from the GTEx
   v8 transcriptome reference were used in the correlation analysis.
   Overall transcriptomic correlation was calculated as the squared
   Spearman’s correlation coefficient between the urinary cell pellet
   profile and the profile for each GTEx tissue. Tissues were then ordered
   according to the strength of their transcriptomic correlation with
   urinary cell pellets. To validate the findings on correlation between
   transcriptome of urinary cells and the kidney tissues from GTEx we then
   used RNA-sequencing data from our in-house collection of 430 kidney
   samples (HKTR)^[614]25 quantified and pre-processed according to the
   same pipeline used for GTEx samples and urinary cell pellets. Overall
   transcriptome correlation was calculated between urinary cell pellets
   and the kidney profile from the HKTR resource using all 19,273
   protein-coding genes from the GTEx v8 transcriptome reference.

Expression of markers for specific segments of the nephron in urinary cells

   To investigate whether cellular components of different kidney regions
   are present within the cells harvested from urine, we first selected
   marker genes by renal cell-type (either enriched or enhanced for any
   specific kidney cell-type), from the HPA^[615]177. Marker genes for
   cortex were selected from the following cell-types - “Proximal tubule”,
   “Glomerulus”, “Distal tubule”. Medullary marker genes were selected
   from the “Loop of Henle”, “Collecting duct” cell-types. Marker genes
   for cortex and medulla were then pruned by the following criteria: (i)
   median expression <than 1 log[2](TPM + 1) and (ii) any significant
   expression ( > 10 log[2](TPM + 1)) in a renal cell-type from the
   opposite region. Finally, marker genes were trimmed to the top 20 most
   highly expressed, per region. Gene expression for each tissue (in
   log[2](TPM + 1) units) was standardised before visualisation. Genes
   were ordered (within each kidney region) by hierarchical clustering of
   a Euclidean distance matrix calculated from the marker gene expression
   profile of all 3 tissues by the R function “hclust” using the complete
   linkage method.

Analysis of correlation in expression of genes causally associated with blood
pressure between the kidney and urinary cells

   We first identified that out of 399 BP TWAS genes, 339 had detectable
   expression in urinary cells. We used calculated median expression in
   log[2](TPM + 1) values for these genes from our urinary cell samples
   dataset (n = 33) and kidney tissue samples (n = 430) and then fitted a
   linear regression line (with kidney expression as the dependent
   variable and urinary cell expression as the independent variable). The
   ten genes with the smallest deviation (defined as smallest absolute
   residual) from the regression line were then identified and labelled.

Glutamyl aminopeptidase gene (ENPEP) and blood pressure – blood pressure
kidney transcriptome-wide association studies, blood pressure kidney
proteome-wide association studies and rare variant analyses

ENPEP expression across human tissues

   Expression of ENPEP across 54 human tissues from the HPA was based on
   the reclassified HPA gene expression enrichment classification.

Association between kidney expression of ENPEP and urinary sodium excretion

   Using UK Biobank imputed genotype data^[616]147, we generated GReX
   based on the validated kidney PUMICE-derived prediction model for
   ENPEP. Urinary sodium excretion values (Data-Field 30530) from 326,986
   unrelated UK Biobank individuals of white-European ethnicity with
   surviving sample level quality control^[617]147 were normalised using
   rank-based inverse normal transformation. We then estimated the effect
   of ENPEP GReX on urinary sodium excretion by regressing normalised
   values of the latter on ENPEP GReX adjusting for age, sex, genotyping
   array and the top ten genetic principal components in linear
   regression.

ENPEP gene expression, protein abundance and blood pressure – multi-trait
colocalisation

   To assess shared genetic effects across ENPEP expression and protein
   abundance and DBP, we conducted multi-trait colocalisation using
   HyPrColoc^[618]178 (v.1.0.0) with default settings. In brief, HyPrColoc
   implements an efficient deterministic Bayesian algorithm to detect
   colocalisation across multiple traits. The colocalisation analyses were
   performed using summary statistics from cis-eQTL (from HKTR, n = 478)
   and cis-protein quantitative trait loci (pQTL) (from CPTAC, n = 72) of
   glutamyl aminopeptidase and meta-analysis BP GWAS from UK Biobank and
   ICBP^[619]22 (n = ~750,000) in pre-defined disjoint LD blocks^[620]151.
   Only cases with > 50 overlapping variants were considered. Posterior
   probability > 0.8 was considered as a signal of colocalisation across
   traits consistent with all the traits sharing a causal variant within
   the tested region.

   The summary statistics for ENPEP mRNA expression were collected from
   cis-eQTL analysis conducted using 478 HKTR kidney samples, as reported
   above. The summary statistics for protein abundance of glutamyl
   aminopeptidase were obtained from cis-pQTL analysis of 72 CPTAC kidney
   samples with informative genotype and protein abundance. The
   log-transformed protein abundance was regressed against effect of
   allele dosage, age, sex, the top three genetic principal components
   derived from genotyped autosomal variants and top ten principal
   components derived from normalised protein abundance. Only genetic
   variants with minor allele count > 5, missingness rate <0.05 and within
   500 kb from the transcription start site of ENPEP were considered. A
   total of 2478 genetic variants satisfied these criteria and were
   included in the cis-pQTL of glutamyl aminopeptidase.

Homology modelling of truncated protein

   All PDB structures linked to the human ENPEP gene^[621]179 in the
   UniProtKB database (release 2023_01)^[622]180 (4kx7, 4kx8, 4kx9, 4kxa,
   4kxb, 4kxc, 4kxd) were downloaded. All of them were resolved using
   X-ray crystallography, and have a resolution between 2.15 Å and 2.40 Å.
   The crystal structures were assessed using
   MolProbity^[623]181–[624]183: (1) hydrogens were added, (2) Asn, Gln
   and His side-chain flips were allowed, and 3) the all-atom contacts and
   the geometry of the optimised structures were analysed. The structure
   with the fewer warnings in the different validation categories, and the
   best MolProbity score was selected to be the template for homology
   modelling: the optimised 4kx7 structure.

   The protein sequences of the protein structure and the truncated
   protein were aligned using the Smith-Waterman algorithm^[625]184 as
   implemented in EMBOSS^[626]185. The default parameters (BLOSUM62
   substitution matrix, gap opening penalty = 10, and gap extension
   penalty = 0.5) were used.

   Ten structural models were built using MODELLER (version
   10.4)^[627]186. We used the option of slow (thorough) Variable Target
   Function Method and Molecular Dynamics optimisations. We repeated the
   whole optimisation cycle twice.

   All ten structural models were assessed using MolProbity as previously
   mentioned; briefly, 1) hydrogens were added, 2) flips were allowed, and
   3) the contacts and geometry were assessed. Again, the model with the
   fewer warnings, and the best MolProbity score was selected.

Analysis and visualisation of the model of the truncated protein

   Canonical and truncated protein structures were visualised with
   ChimeraX (version 1.6)^[628]187. CATH (version 4.3)^[629]188
   annotations were used to identify the different structural domains
   within the canonical and truncated proteins. An assessment of the
   hydropathy of the surface of the protein structures was performed by
   scoring each residue with the Kyte and Doolittle scale^[630]189. The
   annotation on functional sites was collated from the UniProtKB
   database^[631]180.

Effect of rs33966350 on ENPEP mRNA expression, protein abundance, urinary
sodium excretion and disease endpoints from FinnGen

   We first collected phenome-wide association results for variant
   rs33066350 from the FinnGen release R9 (r9.finngen.fi), which includes
   data from 377,277 individuals and 2269 disease endpoints. Effect of
   rs33066350 on each of the disease endpoint was estimated under the
   additive GWAS model containing age, sex, ten genetic PCs and genotyping
   batch as covariates. The correction for multiple testing was calculated
   using Bonferroni-adjusted P-value  <  0.05 after adjustment for the
   total number of disease endpoints.

   We used linear regression to assess the effect of rs33966350 on urinary
   excretion of sodium in 326,986 biologically unrelated white-European
   individuals who survived sample level quality control^[632]147 in UK
   Biobank. Urinary sodium excretion (Data-Field 30530) underwent
   rank-based inverse normal transformation. We then regressed normalised
   urinary sodium excretion on genotype of rs33966350 (under additive
   model of inheritance) adjusting for age, sex, genotyping array and the
   top ten genetic PCs generated from autosomal genotype data.

   We next estimated the effect of rare loss-of-function variant
   (rs33966350) on ENPEP mRNA expression in 478 individuals from HKTR. In
   the absence of carriers of AA genotype in this dataset, we regressed
   the RNA-sequencing-derived normalised renal ENPEP expression on rare
   allele dosage adjusting for age, sex, tissue source
   (nephrectomy/biopsy), the first three genetic principal components, the
   100 PEER hidden factors and seven cell-type proportions. The normalised
   ENPEP expression values were derived by logarithmic transformation,
   quantile normalisation and rank-based inverse normal transformation. We
   also evaluated the effect of rs33966350 on ENPEP mRNA expression in an
   independent cohort (CPTAC, n = 60) by regressing the normalised renal
   ENPEP expression on genotype of rs33966350 (under additive model of
   inheritance) adjusting for age, sex, the top three genetic principal
   components derived from genotyped autosomal variants and top ten
   principal components derived from normalised gene expression. The
   normalised ENPEP expression values in CPTAC were obtained by
   logarithmic transformation.

   We assessed the effect of rs33966350 on glutamyl aminopeptidase protein
   in the same cohort (CPTAC, n = 67) by regressing the normalised renal
   glutamyl aminopeptidase protein abundance on genotype of rs33966350
   (under additive model of inheritance) adjusting for the same covariates
   as above (i.e. those used in the analysis of rs33966350 effect on ENPEP
   mRNA expression in CPTAC). The normalised glutamyl aminopeptidase
   protein abundance in CPTAC were derived by logarithmic transformation.

UK Biobank ethical compliance

   UK Biobank has approval from the North West Multi-centre Research
   Ethics Committee (MREC) to obtain and disseminate data and samples from
   the participants, and these ethical regulations cover the work in this
   study. Written informed consent was obtained from all participants.

Reporting summary

   Further information on research design is available in the [633]Nature
   Portfolio Reporting Summary linked to this article.

Supplementary information

   [634]Supplementary information^ (1.7MB, pdf)
   [635]Peer Review File^ (825.6KB, pdf)
   [636]41467_2024_46132_MOESM3_ESM.pdf^ (129.8KB, pdf)

   Description of Additional Supplementary Files
   [637]Supplementary Data 1-36^ (2.1MB, xlsx)
   [638]Reporting Summary^ (2.6MB, pdf)

Source data

   [639]Source Data^ (10.4KB, xlsx)

Acknowledgements