Abstract Varicose veins affect one-third of Western society, with a significant subset of patients developing venous ulceration, costing $14.9 billion annually in the USA. Current management consists of either compression stockings, or surgical ablation for more advanced disease. Most varicose veins patients report a positive family history, and heritability is ~17%. We describe the largest two-stage genome-wide association study of varicose veins in 401,656 individuals from UK Biobank, and replication in 408,969 individuals from 23andMe (total 135,514 cases and 675,111 controls). Forty-nine signals at 46 susceptibility loci were discovered. We map 237 genes to these loci, several of which are biologically plausible and tractable to therapeutic targeting. Pathway analysis identified enrichment in extracellular matrix biology, inflammation, (lymph)angiogenesis, vascular smooth muscle cell migration, and apoptosis. Using a polygenic risk score (PRS) derived in an independent cohort, we demonstrate its predictive utility and correlation with varicose veins surgery. Subject terms: Peripheral vascular disease, Valvular disease, Genome-wide association studies __________________________________________________________________ Although varicose veins are a common condition, the genetic basis is not well understood. Here, the authors find genetic variants associated with varicose veins and show that a higher polygenic risk score for varicose veins correlates with a greater likelihood of patients undergoing surgical treatment. Introduction Varicose veins (VVs) are a very common manifestation of chronic venous disease, affecting over 30% of the population in Western countries^[42]1. In the USA, chronic venous disease affects over 11 million males and 22 million females aged 40–80 years^[43]2, meaning it is twice as prevalent as coronary heart disease^[44]3. Chronic venous insufficiency leads to serious complications in 10% of cases, including lipodermatosclerosis, venous ulceration and rarely amputation. Despite best care, 25–50% of venous leg ulcers remain unhealed after six months of treatment^[45]4. Ongoing management of venous leg ulcers costs around $14.9 billion annually and 4.5 million work days per year are lost to venous-related illness in the USA^[46]5,[47]6. Despite this, at present no medical treatments exist for VVs. For symptomatic patients, endovenous ablation is the first-line treatment approach^[48]7. However, recurrence following surgery is 20%, with no difference in recurrence compared to conventional open surgery^[49]8. VVs are thought to develop from a combination of valvular insufficiency, venous wall alterations, and haemodynamic changes that precipitate venous reflux, stasis and hypertension of the venous network, causing varicosities^[50]9,[51]10. Risk factors for VVs include older age, female sex, pregnancy, a positive family history, obesity, tall height and previous deep-vein thrombosis (DVT)^[52]3,[53]11,[54]12. Many patients with VVs report a positive family history^[55]13, and among offspring with one affected parent the familial standardised incidence ratio is 2.39^[56]14, with a heritability of 17%^[57]15, suggesting a genetic component to aetiology. Two recent genome-wide association studies (GWAS) of VVs have been described. Ellinghaus et al.^[58]16 tested for associations in 323 cases and 4619 controls, with suggestive associations examined in an independent cohort totalling 1946 cases and 3146 controls. They reported two associations, mapped to EFEMP1 and KCNH8. Fukaya et al.^[59]17 used UK Biobank to identify a further 30 putative associations associated with VVs. However, their cases were defined only by the International Classification of Diseases (ICD) diagnostic codes, meaning that thousands of cases defined by operation codes were misclassified as controls^[60]18. Moreover, the genetic associations discovered were not replicated in an independent cohort. To advance substantially our understanding of the aetiology and genetic architecture of VVs, we performed the largest genome-wide association study (GWAS) of surgically confirmed VVs. We use data from participants from the UK Biobank (n = 401,656) and confirm top associations in a large independent cohort of research participants with self-reported VVs from 23andMe (n = 408,969). We examine patterns of expression of genes in clinically relevant biologic pathways and prioritise targets for therapeutic development. Furthermore, we derive a polygenic risk score (PRS) for VVs in an independent cohort (FinnGen), and demonstrate its predictive utility in UK Biobank. Results The overall analytic workflow is shown in Fig. [61]1. Fig. 1. Study design and analysis workflow. [62]Fig. 1 [63]Open in a new tab A discovery GWAS was performed in the UK Biobank cohort, with the top independent lead variants tested within the 23andMe replication cohort. Forty-nine independent variants at 46 loci met the Bonferroni-corrected threshold in the replication cohort, and were interrogated further in subsequent analyses. Association analysis Genome-wide testing of the UK Biobank discovery cohort (22,473 cases and 379,183 controls) yielded genome-wide significant associations (P < 5 × 10^−8) at 108 risk loci (Supplementary Data [64]1). Conditional regression yielded a further seven independent signals at six of the 108 loci associated significantly with VVs. As expected, the large discovery sample led to the λ[GC] showing inflation (1.25) (Supplementary Fig. [65]1). The linkage equilibrium score regression (LDSC) intercept (1.06), and attenuation ratio of 0.13 is consistent with polygenicity^[66]19. We estimated the total SNP heritability (h^2[g]) for VVs in UK Biobank to be 8.04% (S.E. = 0.17%) using a variance components method, and 5.03% (S.E. = 0.30%) using LDSC^[67]19. The estimated heritability in the 23andMe dataset (113,041 cases and 295,928 controls) was 5.40% (S.E. = 0.30%). We tested the top 116 signals at the 108 risk loci in the independent 23andMe dataset. Of the 116 associated variants, 106 passed QC and were available in the 23andMe summary statistics. Forty-nine of 116 variants demonstrated significant association at a conservative Bonferroni-corrected threshold of P < 4.72 × 10^−4. Thus, we identified 49 significant associations at 46 independent risk loci (Fig. [68]2; regional association plots for all 49 signals can be found in Supplementary Fig. [69]2). Allelic effects were concordant across both cohorts at all 49 replicated variants, with minimal evidence of heterogeneity between the two GWAS at all loci (Q-statistic > 0.05). Eighteen loci were previously reported (Supplementary Data [70]2). Sixty-seven variants at 67 loci did not replicate (Supplementary Data [71]3); however, there was high overall concordance in the effect sizes of the 106 SNPs between the two datasets (Pearson’s r^2 = 0.84, P < 1 × 10^−16; Supplementary Fig. [72]3A). Fig. 2. Results of genome-wide association study in varicose veins. [73]Fig. 2 [74]Open in a new tab Manhattan plot showing genome-wide P-values plotted against position on each of the autosomes. The dark blue, light blue and green dots refer to the discovery UK Biobank Cohort, with the red dots corresponding to the 49 variants from the 23andMe cohort at each replicated locus (shown in Supplementary Data [75]2). The dark blue peaks correspond to the 46 loci that replicated in the 23andMe cohort at a Bonferroni-corrected threshold of P < 4.72 × 10^−4. Candidate genes at each locus are named above each signal, with previously unreported genetic loci in blue, and previously described loci in black. In silico annotation To interrogate and annotate SNPs at our susceptibility loci, we used FUMA (Functional Mapping and Annotation of GWAS)^[76]20. FUMA identified 5315 genome-wide significant candidate SNPs from our discovery cohort associated with VVs at 45 of the 46 replicated loci (Supplementary Fig. [77]4; Supplementary Data [78]4). Around 2% of candidate SNPs (n = 103) were exonic, of which 56 were non-synonymous (52 missense, two stop-gain, one splice site variant, and one frameshift variant). Of the non-synonymous variants, four missense variants were predicted to affect protein structure or function, and were in moderate linkage disequilibrium (LD) (r^2 ≥ 0.22 and Dʹ ≥ 0.75) with the index SNP at three loci—12q13.12 (index SNP: rs7308356), 16q24.3 (index SNP: rs2002833) and 17q24.3 (index SNP: rs9895127) (Supplementary Data [79]5). This includes rs7184427 (A/G) (P = 9.10 × 10^−40, OR = 1.19, r^2[index] = 0.22, Dʹ[index] = 0.75), which causes a predicted deleterious p.Val250Ala substitution within PIEZO1 (SIFT^[80]21 score: 0). PIEZO1 encodes a mechanically active ion channel involved in the detection of vascular shear stress, and was previously associated with VVs and lymphoedema^[81]17,[82]22. Of the 4294 intronic and intergenic variants identified by FUMA (Supplementary Fig. [83]4), 3735 (87.0%) resided in open chromatin regions (Supplementary Data [84]4), and 163 demonstrated evidence of functionality with CADD (Combined Annotation-Dependent Depletion^[85]23) score ≥12.37, the threshold suggested for deleterious variants (Supplementary Data [86]6). Using RegulomeDB (RDB^[87]24) to investigate their regulatory functions, 17 had a RDB score of at least 2b (likely to affect binding) and eight had an RDB score of at least 1f (likely to affect binding and linked to expression of a gene target). To narrow down the list of probable functional variants, we performed functionally informed fine-mapping^[88]25 for each replicated locus. Eighteen SNPs were identified with posterior probability >95% (Supplementary Table [89]1), including two likely functional variants in PIEZO1 (rs112070238 and rs8053350), affecting the 5ʹUTR and promoter regions, respectively. Gene mapping Positional mapping in FUMA SNP2GENE highlighted 204 genes based on genomic position at 38 loci (Supplementary Data [90]7)^[91]20. eQTL mapping, based on GTEx v8 tibial artery tissue^[92]26, mapped a total of 80 genes. Genome-wide, gene-based association analysis implemented in MAGMA v1.07^[93]27 identified 248 protein-coding genes significantly associated with VVs (P < 2.67 × 10^−6); 117 were within our replicated loci (Supplementary Data [94]8; Supplementary Fig. [95]5). In the summary-based Mendelian randomisation (SMR) analysis^[96]28, testing was performed for 4946 probes with a cis-eQTL at P < 5 × 10^−8; a threshold of significance was set at P[SMR] < 1.01 × 10^−5 (0.05/4,946). Forty-four genes passed the set significance threshold (P[SMR] < 1.01 × 10^−5). To exclude SMR associations due to linkage, we performed HEIDI analysis across 44 significant genes −25 passed the HEIDI test (P[HEIDI] ≥ 1.12 × 10^−3; Supplementary Data [97]9), 12 of which were within our VVs susceptibility loci, highlighting an association with VVs through pleiotropy rather than LD and co-localisation. In summary, 237 unique genes were mapped by at least one mapping approach to 39 of the 46 susceptibility loci. Significant overlap between the mapping strategies was seen, with the majority of genes (54.9%, n = 130) being prioritised by two or more mapping approaches. Thirty-six genes were prioritised by three mapping approaches, and six genes (ATF1, AP1M1, DNAH10OS, FBLN7, LBH, WDR92) were prioritised by all four approaches. Gene set, tissue-specific and pathway enrichment To delineate gene sets and enriched pathways where the 237 prioritised genes converged, gene-set analysis was conducted in MAGMA v1.07^[98]27. Following MAGMA gene-set enrichment analysis, four Gene Ontology (GO) gene sets were significantly over-represented in our data: cardiovascular development (P = 1.56 × 10^−8, n = 666); tube morphogenesis (P = 9.35 × 10^−8, n = 778); blood vessel morphogenesis (P = 9.39 × 10^−7, n = 555); and tube development (P = 1.68 × 10^−6, n = 956) (Supplementary Table [99]2). Further, tissue-specific gene property analysis demonstrated significant gene expression in all three vascular tissue types present in GTEx 54 tissue types: coronary artery (P = 6.23 × 10^−7, 2nd most enriched), tibial artery (P = 1.05 × 10^−6, 3rd most enriched) and aorta (P = 3.92 × 10^−5, 8th most enriched; Supplementary Fig. [100]6). MAGMA analysis of GTEx 30 general tissue types also demonstrated blood vessels to be a highly enriched tissue (P = 3.8 × 10^−4, 3rd most enriched). Using eXploring Genomic Relations (XGR) analysis^[101]29, six canonical pathways were significantly enriched. This included enrichment for genes in pathways related to extracellular matrix biology, the VEGF and VEGFR signalling network, and intracellular Ca^2+ signalling in the T-cell receptor (TCR) Pathway (Table [102]1). Table 1. Gene-based enrichment analysis. Biological process Z-score P-value FDR Number of overlapped genes Genes Alpha9 beta1 integrin signalling events 3.85 0.0012 0.032 2 TNC, VEGFA Genes encoding structural ECM glycoproteins 3.39 0.0012 0.032 6 EFEMP1, FBLN7, FBN2, IGFBP7, RSPO3, TNC Calcium signalling in the CD4 + TCR pathway 3.5 0.0019 0.032 2 NFATC2, PPP3R1 Ensemble of genes encoding core extracellular matrix, including ECM glycoproteins, collagens and proteoglycans 3.11 0.0019 0.032 7 COL27A1, EFEMP1, FBLN7, FBN2, IGFBP7, RSPO3, TNC Non-canonical WNT signalling pathway 3.29 0.0025 0.035 2 MAPK10, NFATC2 VEGF and VEGFR signalling network 3.11 0.0032 0.036 1 VEGFA [103]Open in a new tab Ontology enrichment analysis was performed across all mapped genes using XGR, in “canonical pathways”,with the following settings: hypergeometric distribution testing, any number of genes annotated, any overlap with input genes and an adjusted false-discovery rate (FDR) < 0.05. The Z-scores, P-values, false-discovery rate and the overlapped genes for each of the ontologies are shown. Polygenic risk score analyses In order to derive a polygenic risk score (PRS) without overfitting to the UKB cohort, we utilised VV summary statistics from the FinnGen cohort (17,027 cases, 190,028 controls), which were highly concordant with the UK Biobank GWAS (Pearson’s r^2 = 0.95, P < 1 × 10^−16; Supplementary Fig. [104]3B). Using multivariable logistic regression, we found that the FinnGen-derived PRS was an independent predictor for varicose vein case/control status (Fig. [105]3 and Supplementary Table [106]3). Individuals in the top PRS decile had markedly increased odds of VV diagnosis (OR 4.57, 95% CI 4.27–4.92, P < 1 × 10^−300) compared to the bottom PRS decile. We hypothesised that VV cases who underwent surgery are phenotypically more severe and so would have a greater genetic liability than non-surgical VV cases. In a subgroup analysis of UK Biobank VV cases defined by self-reported diagnosis, ICD-10 or OPCS-4 code, history of surgery was associated with significantly increased PRS (0.407 vs. 0.330; P = 5.5 × 10^−4), suggesting that the VV-PRS captures disease severity. Fig. 3. Logistic regression for varicose vein case/control status in UK Biobank against VV-polygenic risk score (VV-PRS) decile. Fig. 3 [107]Open in a new tab The VV-PRS was derived in an independent study sample, FinnGen (n = 17,027 VV cases and 190,028 controls). Odds ratio refers to odds ratio of VV in UK Biobank versus decile 1 of VV-PRS (lowest). Error bars signify 95% confidence intervals of the odds ratios. Phenome-wide VV associations To uncover associations between VV and other phenotypes, we first performed genetic correlation analysis using LD score regression^[108]19. Of the 176 traits from nine categories tested for genetic correlation, twelve traits from two categories (anthropometric and autoimmune) met our Bonferroni-corrected significance threshold (P < 5.56 × 10^−3; Supplementary Table [109]4). All twelve significant traits were positively correlated with VVs (r[g] range: 9–21%). Eleven traits belonged to the anthropometric category and pertained to height and weight phenotypes, which are established risk factors for VVs^[110]3,[111]11,[112]12. In the autoimmune trait category, we discovered systemic lupus erythematosus (SLE) to be correlated with VVs, sharing ~19% genetic overlap (P = 4.2 × 10^−3, r[g] = 0.195). Of note, the C allele of variant rs17321999 at 2p23.1 (LBH), which is associated with an increased risk of SLE (P = 2.22 × 10^−16, OR = 1.20)^[113]30, was also significantly associated with VVs in our discovery cohort (P[disc] = 3.20 × 10^−14, OR = 1.09), and in high LD with lead SNP at 2p23.1 in the meta-analysis (rs9967884; P[meta] = 1.40 × 10^−14, OR = 1.11; r^2 = 0.87). We used the Open Targets Genetics platform^[114]31 to confirm the associations between VVs and SLE at the variant level, using a phenome-wide association study (PheWAS) approach (Supplementary Data [115]10); three of our 49 replicated lead variants (rs4849044, rs7773004, rs61863928) demonstrated an association with SLE. As a complementary approach, we leveraged the FinnGen-derived VV-PRS to perform a phenome-wide polygenic risk score analysis using a set of curated time-to-event phenotypes (n = 694) in UK Biobank. Eighteen phenotypes met the phenome-wide significance threshold (P = 1 × 10^−5) in the Cox regression (Supplementary Table [116]5). The most significantly associated phenotypes were “Phlebitis and thrombophlebitis”, “Unspecified monoarthritis”, “Obesity”, “Postphlebitic syndrome”, “Pulmonary embolism and infarction” and “Umbilical hernia”. To investigate whether these phenotypes are causally associated with VV, we performed bidirectional Mendelian randomisation using genetic instruments derived from our UK Biobank GWAS. Using the inverse-variance weighted (IVW) method as the primary analysis, we found evidence for raised body mass index being causal in VV, with no evidence for significant directional pleiotropy on MR-Egger sensitivity analysis (Supplementary Table [117]6). Two further suggestive causal associations were found: VVs (exposure phenotype) on thrombophlebitis risk and hernia (exposure phenotype) on VVs risk. Drug-target enrichment analysis Finally, we tested the amenability of our associated genes to drug targeting using the Open Targets Platform^[118]32. Of the 237 mapped genes, 200 genetic targets were identified^[119]32. Forty-two drug pathways reached nominal significance (P < 0.05), with the Butyrophilin family interactions pathway being most enriched (P = 2.6 × 10^−7, six targets), followed by the Calcineurin-NFAT pathway (P = 9.4 × 10^−4, two targets), and the Transcriptional regulation by RUNX1 pathway, which possessed the highest number of targets (P = 1.6 × 10^−3, 14 targets; Supplementary Data [120]11). Tractability information for 105 gene targets was available, with 65 of 237 genes predicted to be tractable to antibody targeting to a high confidence, and 26 genes predicted to be tractable to small molecule targeting (Supplementary Data [121]12). Eight of the 237 genes overlapped with pharmacologically active targets with known pharmaceutical interactions (CDK10, COL27A1, GABBR1, KCNJ2, MAPK10, OPRL1, TNC and VEGFA) (Supplementary Data [122]13). Of note, VEGFA is a target for several antibody, protein and oligosaccharide agents, which are currently in phase 2, 3 and 4 clinical trials, including for several ocular vascular disorders. Discussion VVs cause significant morbidity and large healthcare costs. There is a compelling need for a deeper understanding of the biology of VVs in order to develop new therapeutic strategies. Our analysis represents the largest and most comprehensive GWAS to date of VVs, including 135,514 patients and 675,111 controls. We discovered 28 previously unreported risk loci (29 signals), and independently replicated 18 of 29 previously reported but non-replicated loci (20 of 32 known signals). Our in silico analyses demonstrated strong evidence of functional variants in VV-associated genomic regions. Furthermore, our pathway analyses establish a strong enrichment for genes expressed in the extracellular matrix, immune cell signalling and circulatory system development. We have also identified a previously unreported genetic correlation between VVs and systemic lupus erythematosus (SLE). Several prioritised genes demonstrate the potential for pharmacological targeting, and are currently under active investigation in other diseases. Lastly, we demonstrate the potential utility of a polygenic risk score (PRS) in VVs, demonstrating that genetic risk correlates with a more severe phenotype—a first step in facilitating personalised medicine approaches to management. Biological drug targets This study has identified several biological pathways as mediators of VV pathophysiology that may be of relevance to pharmacological targeting. These cluster broadly into five functional categories: extracellular matrix regulation, the immune response, angiogenesis and lymphangiogenesis, smooth muscle cell biology and apoptosis (Supplementary Data [123]14), and we will briefly examine the three functional categories that produced the most compelling evidence in this study. Extracellular matrix regulation VVs demonstrate an increased luminal diameter and intimal hypertrophy, features closely related to disruption of the extracellular matrix (ECM)^[124]33. Deposition of ECM in the perivascular space in VVs is also recognised—possibly a compensatory mechanism to reinforce an already weakened wall^[125]34. Venous dilatation and valve ring enlargement seen in VVs is thought to affect the ability of venous valves to co-apt, which in turn contributes to venous reflux and hypertension^[126]10. Indeed, there is a noted imbalance of ECM proteins in VVs, specifically collagen and elastin—with a preponderance of collagen compared to normal vein^[127]35. It is therefore possible that disruption in intrinsic connective tissue components of the vein wall or valves may contribute to VVs. Our prioritised genes significantly overlapped with canonical pathways relating to ECM components, including Collagen Type XXVII Alpha 1 Chain (COL27A1) and EGF-containing Fibulin-like Extracellular Matrix Protein 1 (EFEMP1). rs753085 (P = 2.17 × 10^−11, OR = 1.07) is in an intron of COL27A1. COL27A1 is a fibrillar collagen in the extracellular matrices of several tissues, including blood vessels^[128]36. COL27A1 expression has been demonstrated to be reduced in VV samples^[129]37. Moreover, our drug-enrichment analysis demonstrated COL27A1 to be a pharmacologically active target, with pharmaceutical agents currently being investigated in several clinical trials (Supplementary Data [130]13), demonstrating its potential candidacy as a therapeutic target for VV prevention or treatment. In our GWAS, we replicated the previously described association between rs3791679 and VVs (P = 1.59 × 10^−13, OR = 1.08)^[131]16. rs3791679 resides in an enhancer region of EFEMP1, and is believed to be the causal variant at this locus^[132]38. EFEMP1 encodes the ECM glycoprotein fibulin-3^[133]39, which plays a key role in maintaining the integrity of elastic tissues^[134]16, and is highly expressed in vein endothelial cells. Interestingly, the same A allele of rs3791679 is also associated with abdominal hernia susceptibility^[135]40; this pleiotropy may in part explain our Mendelian randomisation finding that placed hernias on the causal pathway for VVs. Fibulin-3 has been found to antagonise vascular development by reducing the expression of the matrix metalloproteases, MMP2 and MMP3, and increasing expression of tissue inhibitors of metalloproteases (TIMPs) in endothelial cells^[136]41. The saphenofemoral junction in VVs demonstrates reduced expression of MMP2, and heightened expression of TIMP1 and MMP1 protein levels^[137]42. Alterations in expression of these enzymes may therefore result in weakness in the venous wall and predispose patients to VVs. Our drug-enrichment analysis found fibulin-3 to be tractable to antibody targeting to a high confidence; moreover, metformin has been demonstrated to downregulate fibulin-3 through downregulation of MMP2^[138]43. Fibulin-3 therefore represents another potential pharmacological target for VVs. The immune response Chronic Inflammation in the venous wall has been proposed to play a role in VV aetiopathology^[139]33. When compared to normal veins, enhanced expression of inflammatory mediators has been observed^[140]10. VVs contain increased mast cells, monocytes and macrophages compared to normal veins^[141]10. We defined five inflammation-associated risk loci in our GWAS. Of particular note is rs78216177 (P[meta] = 5.80 × 10^−14, OR = 1.10), in an intron of DOCK8 that has roles in both innate and adaptive immune systems. Deletion of DOCK8 is strongly associated with Hyper-IgE syndrome, a type of primary immunodeficiency that affects multiple systems, including the vasculature^[142]44. Vascular abnormalities in hyper-IgE syndrome include aneurysmal changes and abnormalities in great vessels. Notably, we also discovered a significant genetic overlap between VVs and SLE, an autoimmune disease, in the genetic correlation analyses, albeit not in the phenome-wide analysis of VV-polygenic risk score. The associations between VVs and SLE were confirmed at the variant level using a PheWAS approach, with three of 49 replicated lead variants (rs4849044, rs7773004, rs61863928) demonstrating an association with SLE. Furthermore, lead variant rs1471251 (P = 8.33 × 10^−11, OR = 1.06) is a known eQTL of AFF1, which is associated with SLE^[143]45. Reinforcing this shared genetic basis, a VVs-associated variant at LBH, rs17321999 (P[disc] = 3.20 × 10^−14, OR = 1.09), was also previously associated with SLE (P = 2.22 × 10^−16, OR = 1.20)^[144]30. XGR analysis demonstrated enrichment of “intracellular calcium signalling in the CD4 + T-cell receptor (TCR) pathway” (P = 1.9 × 10^−3, Z = 3.5), specifically highlighting genes NFATC2 and PPP3R1, which are intimately involved in this pathway. rs3787184 (P[meta] = 2.51 × 10^−36, OR = 1.16) was one of 18 SNPs identified as likely causal variants in our functionally informed fine-mapping (Supplementary Table [145]1), and resides in a promoter region of NFATC2. rs2861819 (P[meta] = 2.65 × 10^−77, OR = 1.20) is in an intergenic region ~19 kb upstream of PPP3R1. PPP3R1 encodes calcium binding B (CnB), a subunit of calcineurin, a Ca^2+influx-activated serine/threonine-specific phosphatase that interacts with NFAT transcription factors in the regulation of naive T-cell activation^[146]46. VVs are defined by clustering and infiltration of T lymphocytes^[147]34, which are predominantly distributed close to the venous valve agger^[148]47, a fibroelastic structure located at the base of venous valves where tunica media meets adventitia. Therefore, we hypothesise that aberrant PPP3R1 and NFATC2 expression could alter calcium signalling in T-cells, which may contribute to the valvular pathology seen in VVs. The Calcineurin-NFAT pathway was also the second-most enriched in our drug-target enrichment analysis, and may represent a novel therapeutic approach to VVs. Angiogenesis and lymphangiogenesis Disruption in normal angiogenic processes can lead to VVs^[149]48, potentially because of a failure to develop properly formed venous walls and valves, or to repair defects following vascular stress injury. MAGMA gene-set analysis revealed enrichment of several gene sets relating to tube formation and morphology, with the VEGF/VEGFR signalling network also being enriched in the XGR analysis. rs11967262 at 6p21.1 (P[meta] = 1.45 × 10^−19, OR = 1.09) lies in an intergenic region ~7 kb upstream of vascular endothelial growth factor A (VEGFA). VEGFA is a critical regulator of angiogenesis, and is fundamental to maintaining the integrity and functionality of the vessel wall^[150]49. VEGFA is a selective endothelial mitogen, binding to its receptor VEGFR2 to induce endothelial cell proliferation, migration and differentiation. VEGFA and VEGFR2 expression are significantly enhanced in the wall of VVs compared to normal veins, particularly in VVs complicated by thrombophlebitis^[151]50; interestingly, our Mendelian randomisation analysis found evidence for a causal role played by VVs in thrombophlebitis risk. Plasma levels of VEGFA have also been demonstrated to be significantly increased in patients with VVs^[152]51. VEGFA also causes vasodilatation, which reduces vessel tone, leading to stasis and the release of oxygen free radicals, which contributes to vein wall weakness^[153]52. Intriguingly, VEGFA also promotes inflammation via expression of intercellular and vascular cell adhesion molecules, linking angiogenesis to immune dysregulation^[154]53. Of note, anti-VEGFA agents are currently being investigated in several clinical trials for the treatment of retinal vein occlusion (Supplementary Data [155]13). The VEGF axis therefore represents a promising candidate for therapeutic targeting in the treatment of VVs. Positional and MAGMA mapping highlighted genes at four loci relating to lymphangiogenesis. Indeed, the lymphatic system develops from veins, and its function is intimately related with the venous circulation, draining extracellular fluid back into circulation. It is therefore feasible that similar genetic defects may result in either lymphoedema, varicose veins or a combination of the two conditions. PROX1 is a master inducer gene necessary for the development of lymphatic vasculature^[156]54, and PROX1knock-out mice are deficient of lymphatic vasculature^[157]55. During developmental lymphangiogenesis, PROX1 has been shown to be necessary for the formation of lymphovenous valves^[158]56,[159]57, suggesting it may predispose to VV development by causing defects in venous valves. rs340875, (P[meta] = 4.22 × 10^−20, OR = 1.09) is ~2 kb upstream of PROX1, and we also fine-mapped two probable functional variants at this locus (Supplementary Table [160]1). Furthermore, PROX1 is co-expressed and functions alongside the transcription factor FOXC2 in lymphatic valve-forming cells at the earliest stage of lymphatic development^[161]58. Mutations in FOXC2 cause hereditary lymphoedema-distichiasis—a disease characterised by VVs and peripheral lymphoedema—highlighting genetic overlap between the two disorders^[162]59. A previous analysis also suggested FOXC2 to be implicated in the development of varicose veins in the general population^[163]60, but we did not find evidence of association between varicose veins and FOXC2 in our study. In fact, the only gene identified in this study known to be associated with a Mendelian disease that results in VVs is PIEZO1, which also predisposes to both VVs and lymphoedema (Supplementary Table [164]7). Polygenic risk score Over two million people in the USA have advanced chronic venous disease^[165]2, and around 500,000 per year undergo invasive surgical procedures. Our polygenic risk score (PRS) analysis of operated vs unoperated VVs cases represents an important proof-of-principle, demonstrating the feasibility of enhanced prognostication in enabling the subset of VVs patients that will require surgical intervention to be identified. Preventive strategies in high-risk individuals could include prophylactic compression stocking use, or early ablation procedures to mitigate the risk of venous ulceration. The benefits of early endovenous ablation in improving healing of venous leg ulcers has been demonstrated^[166]61. Future research may also enable the identification of those at high risk of recurrence following surgery, a significant problem in current management of VVs patients. Strengths and limitations Certain limitations of this study must be acknowledged. While the discovery GWAS in UK Biobank used a combination of hospital diagnostic codes, operation codes and self-report codes, VV cases in 23andMe were identified based on self-report alone, meaning that the phenotyping for the replication GWAS was necessarily less stringent. Moreover, rather than undertake a formal meta-analysis between the discovery and replication GWAS, we independently tested the association for the 116 independent lead SNPs that were genome-wide significant in the UK Biobank discovery GWAS. Thus, sub-threshold signals in the discovery GWAS that may have reached genome-wide significance in the replication GWAS, or under meta-analysis, were not identified. Furthermore, not having access to the full summary statistics for the replication GWAS also meant that our in silico analyses were performed on the summary statistics from the discovery GWAS alone. Finally, we acknowledge that out of the many VVs-associated genes identified by a multitude of gene-prioritisation and mapping methods, only a small minority have robust evidence linking them to VVs through a truly causal variant. Moreover, our eQTL-based gene mapping and summary-based Mendelian randomisation analyses were limited by our lack of access to eQTL data for venous tissue. We therefore had to employ GTEx tibial artery tissue data as an imperfect surrogate, reasoning that it is the anatomically, histologically, and embryologically closest tissue to lower limb veins within the GTEx dataset. Functional validation will be required in diseased venous tissue to confidently associate these candidate susceptibility variants and genes to VVs. Several strengths of the paper mitigate these limitations. We have performed the largest GWAS of VVs to date by a considerable margin, with 135,514 cases and 675,111 controls. We have stringently controlled our false-positive rate by reporting only the loci that were genome-wide significant in the discovery GWAS and that subsequently replicated, so we can be confident in the veracity of these 49 signals. This is reflected in the plethora of biologically plausible genes, gene clusters and biological pathways that were associated with these loci. Furthermore, by including surgical codes for phenotyping in the UK Biobank discovery GWAS, we were able to identify a considerably greater number of cases than a previous GWAS that also used the UK Biobank resource but relied on ICD diagnostic codes alone (22,473 vs. 9577 cases)^[167]17,[168]18. As a general principle of case ascertainment, we believe there is much to be gained by seeking out individuals who have undergone surgery for a disease: given the inevitable risk of complications, surgery is generally reserved for those at the more phenotypically severe end of the spectrum who have failed non-surgical treatment. Finally, the predictive capabilities of a VVs polygenic risk score derived in an independent cohort not only underscores the validity of the VVs-associated polygenic signals uncovered in our discovery GWAS, but also opens the door to the future use of genetic risk-stratification in improving prognostication and guiding decision-making in the management of VVs patients. In summary, we have described the largest GWAS to date of VVs, a highly prevalent disease with a substantial health and socioeconomic cost. We discovered 49 variants at 46 loci that predispose to VVs. We have identified pathways and genes involved in extracellular matrix regulation, inflammation, vascular and lymphatic development, smooth muscle cell activity and apoptosis, all of which are biologically plausible contributors to the pathobiology of VVs, and provide excellent candidates for further investigation of venous biology. Several genes appear tractable to pharmacological targeting (notably VEGFA, COL27A1, EFEMP1, PPP3R1 and NFATC2), and may represent viable therapeutic targets in the future management of VV patients. Importantly, our polygenic risk score represents a first step towards better prognostication in patients with VVs. Methods Ethical approval and consent UK Biobank obtained ethical approval from the North West Multi-Centre Research Ethics Committee (MREC) (11/NW/0382) to collect and disseminate data and samples from participants (for more details: [169]ukbiobank.ac.uk/ethics). This study was conducted under UK Biobank study ID 22572. All participants provided informed consent for their genotype data to be used for this research. The consent procedures for UK Biobank are provided elsewhere (for more details: [170]www.ukbiobank.ac.uk/). All 23andMe participants were obtained from the customer base of 23andMe, Inc. Genotyping of participants was performed by the 23andMe Personal Genome Service. All 23andMe research participants provided informed consent for their genotype data to be utilised for research purposes under a protocol approved by the external AAHRPP-accredited IRB, Ethical & Independent Review Services. Further details regarding the consent processes of 23andMe can be found elsewhere ([171]23andme.com/en-gb/about/consent). Population and phenotype definition The UK Biobank is a population-level resource comprising a prospective cohort of approximately 500,000 participants, recruited at age 40–69 years, at 22 centres across England, Scotland and Wales, between 2006 and 2010^[172]62. Participants underwent whole-genome genotyping and data linkage with their medical records was performed to permit deep phenotyping^[173]63. The characteristics of the full UK Biobank cohort are described in detail elsewhere^[174]64. In the discovery analysis, VV cases were identified from the UK Biobank data showcase ([175]ukbiobank.ac.uk) using the following diagnostic, operative and self-report codes (Supplementary Table [176]8): 1. Primary and/or secondary ICD-10 codes for varicose veins (I83) 2. Primary and/or secondary OPCS code for varicose vein surgery: ((L84-L88) 3. Self-reported operation code for varicose vein surgery (1479) 4. Self-reported non-cancer illness code for varicose veins (1494) In summary, 27,165 individuals from the UK Biobank cohort possessed at least one of the above codes and were classified as VVs cases. Following quality control (QC), we identified 22,473 VV cases and the remaining 379,183 individuals were defined as controls. In the 23andMe replication cohort, participants answered the question “Do you have varicose veins on your legs? (Yes/Not Sure/No)”. VV cases were identified if they answered “Yes”, while controls were those who answered “No”. In all, 113,041 VV cases and 295,928 controls were included in the final replication analysis. Genotyping UK Biobank participants were genotyped on UK BiLEVE (49,950 participants) and UK Biobank Axiom arrays (438,427 participants), with ~95% shared content. In all, 805,426 directly genotyped variants from 488,377 participants were available prior to QC. The 23andMe replication cohort was genotyped on one of four custom arrays (v1/v2, v3, v4, v5). Illumina HumanHap550+ BeadChip was used for v1/v2 (1680 cases, 4882 controls) and the Illumina OmniExpress+ BeadChip was used for v3 (21,342 cases, 56,448 controls). For v4 a fully customised array (58,883 cases, 148,637 controls) was used, and for v5, the Illumina Infinium Global Screening Array was used (31,136 cases, 85,961 controls). Successive arrays contained significant overlap with all previous arrays. Quality control Quality control (QC) for the UK Biobank discovery cohort was conducted using PLINK v1.9^[177]65 ([178]https://zzz.bwh.harvard.edu/plink/) and R v3.3.1, as previously described (Supplementary Fig. [179]7)^[180]66. Briefly, all SNPs with a call rate <90% were removed. Following which, sample-level QC was conducted—individuals were excluded if: (i) they demonstrated heterozygosity >3 S.D. from the mean (calculated using UK Biobank’s PCA-adjusted heterozygosity values, Data Field 20004); (ii) there was disparity between genetically inferred sex (Data Field 22001) and self-reported sex (Data Field 31) or individuals with aneuploidy of sex chromosomes (Data Field 22019); and (iii) had a call rate <98%. Further, we removed all individuals who were not of white British ancestry (based on PCA and self-reported ethnicity (Data Field 22006). To confirm individuals were of white British-ancestry, we combined our UK Biobank data with individuals identified as GBR from the 1000 Genomes Project ([181]www.1000genomes.org), and using FlashPCA v2.0 ([182]github.com/gabraham/flashpca/) we identified agreement between the two datasets. Following this analysis, 86,693 individuals were excluded from our discovery GWAS. Using a linear mixed model implemented in BOLT-LMM^[183]67 allowed us to include and account for related individuals. Next, conducting SNP-level QC, we excluded 230,562 SNPs based on: (i) a call rate <98%, (ii) Hardy–Weinberg Equilibrium (HWE) P < 1 × 10^−4, (iii) MAF < 1%. We further excluded six individuals who were visual outliers when autosomal heterozygosity was plotted against call rate. Our post-QC discovery GWAS, therefore, consisted of 401,667 individuals and 547,011 genotyped SNPs (Supplementary Fig. [184]7). Following QC, our final discovery GWAS consisted of 401,667 individuals of white British ancestry and 547,011 genotyped single-nucleotide polymorphisms (SNPs). For the 23andMe replication analysis, samples were restricted to individuals from European ancestry determined through an analysis of local ancestry^[185]68. A maximal set of unrelated individuals was chosen for each analysis using a segmental identity-by-descent (IBD) estimation algorithm, defining related individuals if they shared more than 700 cM IBD, including regions where the two individuals share either one or both genomic segments IBD. Cases were preferentially retained in the analysis. Variant QC was applied independently to genotyped and imputed GWAS results. The SNPs failing QC were flagged based on multiple criteria, such as HWE P-value, call rate, imputation R-square and test statistics of batch effects. Imputation The phasing and imputation of UK Biobank has been previously described^[186]63. Briefly, using SHAPEIT3^[187]69 ([188]jmarchini.org/shapeit3/), autosome phasing was conducted, with a reference panel based on Phase 3 1000 Genomes Project data^[189]70. Imputation was performed using a combined HRC reference panel ([190]www.haplotype-reference-consortium.org/) and a merged UK10000/1000 Genomes Phase 3 Panel^[191]71. This resulted in an imputation file consisting of 92,693,895 autosomal SNPs, short indels and large structural variants^[192]63. In 23andMe, out-of-sample modified versions of the Beagle graph-based haplotype phasing algorithm^[193]72 and Eagle2 v2.3^[194]73 algorithm were used to phase samples. We used a 23andMe internally developed tool, which implements the Beagle graph-based haplotype phasing algorithm, modified to separate the haplotype graph construction and phasing steps. We imputed samples against a single unified imputation reference panel combining the 1000 Genomes Phase 3 haplotypes^[195]70 with the UK10K imputation reference panel^[196]71 using Minimac3^[197]74. Association analysis In the UK Biobank, we performed GWAS using a linear mixed non-infinitesimal model implemented in BOLT-LMM v2.3^[198]67 across 547,011 genotyped SNPs (minor allele frequency (MAF) ≥ 0.01) and 8,397,536 imputed SNPs (MAF ≥ 0.01 and INFO score ≥0.90), adjusting for genotyping platform and genetic sex. The reference genetic map used was hg19 and linkage disequilibrium scores were generated from European-ancestry individuals taken from the BOLT-LMM package. The genome-wide significance threshold in this analysis was set at P < 5 × 10^−8. Conditional regression analysis was performed at the top signal at each of 109 associated loci in BOLT-LMM^[199]67, excluding the MHC region due to the high density of genes and high linkage of variants. In 23andMe, summary statistics were generated via logistic regression assuming an additive model for allelic effects. Association analysis was performed adjusting for age, sex, the first five principal components, and genotyping platform. The top independent variants from the discovery GWAS were tested for their association with VVs in the 23andMe cohort. Of the 116 genome-wide significant variants in the discovery GWAS, 106 were present in, and passed QC within the replication cohort. The Bonferroni-corrected significance threshold for replication was set at P < 4.72 × 10^−4 (0.05/106). Data for all top independent variants that were available in both UK Biobank and 23andMe and that met the SNP QC within 23andMe were meta-analysed using a fixed-effects meta-analysis performed in GWAMA v2.2.2^[200]75. Functional annotation of SNPs To annotate SNPs at our susceptibility loci, SNP2GENE was performed in Functional Mapping and Annotation of GWAS (FUMA) v1.3.3^[201]20 ([202]https://fuma.ctglab.nl/) using summary statistics from the UK Biobank cohort and default settings. FUMA defined risk loci borders by using all independent genome-wide significant SNPs (r^2 < 0.6), and identified all SNPs that were in linkage disequilibrium (LD) with one of these candidate SNPs. Using ANNOVAR ([203]https://annovar.openbioinformatics.org/en/latest/), candidate SNPs within the replicated loci were annotated on the basis of genomic location. Exonic SNPs were investigated further using gnomAD and Ensembl genome browsers ([204]www.ensembl.org/index.html) to identify non-synonymous missense variants (see URLs). All candidate SNPs were annotated with Combined Annotation-Dependent Depletion (CADD)^[205]23 ([206]cadd.gs.washington.edu/), RegulomeDB^[207]24 ([208]www.regulomedb.org/), and 15-core chromatin states (Roadmap ChromHMM model^[209]76) to predict any regulatory or transcription effects from chromatin states at each SNP. Fine-mapping For each locus that was replicated in the 23andMe summary statistics (n = 45), functionally informed fine-mapping was performed using Polyfun-SuSiE v0.11.92^[210]25 using a 2MB window around each window SNP with the maximum number of causal SNPs set as 5 and a posterior probability threshold of 95%. The LD reference was derived from 1000 Genomes project European population. Fine-mapping was implemented using the echolocatoR package v0.2.2 for R^[211]77. Candidate gene mapping Four gene mapping approaches—positional mapping, eQTL mapping, MAGMA gene mapping ([212]www.ctg.cncr.nl/software/magma), and summary-based Mendelian randomisation (SMR)—were used to map putative genes at the replicated loci. For FUMA positional mapping, all genome-wide significant SNPs at each locus were mapped to genes within a positional window of 10 Kb^[213]20. eQTL gene mapping was used to map genes based on having at least one genome-wide significant cis-eQTL in GTEx v8 tibial artery tissue^[214]20. Using MAGMA v.1.07^[215]27, we conducted a genome-wide, gene-based association study, testing 17,966 protein-coding genes. The significance threshold was set at P < 2.78 × 10^−6 (0.05/17966). To identify gene expression levels associated with VVs due to pleiotropy, SMR and HEIDI analyses were performed using SMR v0.710^[216]28. Summary statistics from the UK Biobank discovery GWAS were used, alongside eQTL data for GTEx V7 tibial artery ([217]www.gtexportal.org/home/). The top-associated eQTL for each gene was used as an instrumental variable to examine association with VVs. A Bonferroni-corrected significance for all SMR probes was set at P[SMR] < 1.01 × 10^−5 (0.05/4946 probes). Subsequently, a HEIDI test was conducted across all SMR-significant probes to examine for heterogeneity in SMR estimates (significance set at P < 0.05/ number of P[SMR]-significant probes). Gene set, tissue-specific and pathway enrichment analysis Gene-set analysis was implemented in MAGMA v1.07^[218]27 across 15,496 gene sets derived from MSigDB v8.0^[219]78 with a significance threshold of P < 3.23 × 10^−6 (0.05/15496) (Supplementary Data [220]15). Tissue-specific gene property analysis was also performed in MAGMA v1.07 to determine the expression of the protein-coding genes in VV-related tissue types in GTEx v8.0^[221]26. Pathway analysis was performed in eXploring Genomic Relations (XGR v1.1.3)^[222]29 ([223]https://cran.r-project.org/web/packages/XGR/index.html). XGR ontology enrichment analysis was performed across all mapped genes, in “canonical pathways” with the following settings: hypergeometric distribution testing, any number of genes annotated, any overlap with input genes, and an adjusted FDR < 0.05. SNP-based heritability, genetic correlation and PheWAS The SNP-based heritability for the UK Biobank cohort was computed using BOLT-REML v2.3, a variance components method^[224]79, and LD score (LDSC) regression using LDSC v1.0.1^[225]19. LDSC was also used to calculate the LDSC intercept, mean chi-squared test and attenuation score, and SNP-based heritability (h^2[g]) in the 23andMe cohort^[226]80. This approach calculates heritability for a specific phenotype by regression of a SNP’s association statistic onto its LDSC^[227]80. LDSC regression was also used to calculate the genetic correlation between VVs and 176 preselected traits, across nine trait categories: metabolites, glycaemic traits, autoimmune diseases, anthropometric traits, smoking behaviour, lipids, cardiometabolic traits, reproductive traits and haematological traits, from publicly available GWAS data within LD Hub ([228]https://ldsc.broadinstitute.org/)^[229]80. Traits were selected based on putative epidemiological associations with VVs from the literature. We confirmed associations between VVs and the correlated traits at the variant level by employing a PheWAS approach. Each of 49 replicated variants was searched in Open Targets Genetics^[230]31 ([231]https://genetics.opentargets.org/, accessed 25/02/2022), and we collated all phenotypes with P-value < 0.005 in the databases available on the platform (UK Biobank, FinnGen, and GWAS Catalog). All phenotypes associated with the 49 variants are shown in Supplementary Data [232]10. Polygenic risk score The varicose veins polygenic risk score (VV-PRS) was derived in FinnGen ([233]https://www.finngen.fi/en), a public-private partnership project combining genotype data from Finnish biobanks and digital health record data from Finnish health registries. Genotyping, phenotyping, and quality control of the FinnGen cohort has been reported previously^[234]81, and varicose veins phenotype definition is available at [235]https://risteys.finngen.fi/phenocode/I9_VARICVE. Summary statistics for varicose vein ICD-10 diagnosis (I9_VARICVE) in FinnGen were downloaded from [236]https://console.cloud.google.com/storage/browser/finngen-public-da ta-r5/summary_stats/. There were 17,027 VVs cases and 190,028 controls. Summary statistics were lifted over from GRCh38 to GRCh37 using the “default_lift_data” function in the gnomAD package v0.4.0 for Hail v0.2.77 ([237]https://hail.is/). Non-autosomal SNPs were removed and SNP-allele pairs were matched with the provided UK Biobank-derived LD reference comprising high-quality HapMap3 variants using the “snp_match” function in the bigsnpr v1.9.11 package for R. SNPs with an absolute MAF difference >0.1 between FinnGen and UKB were excluded (n = 36,656), leaving a total of 1,004,944 SNPs. VVs polygenic risk score was derived using LDpred2^[238]82 (automatic model), following instructions in the package vignette. Model convergence was visually confirmed by inspection of the chain plot. The PLINK2 linear scoring function (–score) was used to calculate the per-sample score for unrelated European participants in UK Biobank (n = 381,544). To validate the predictive utility of this score, we performed logistic regression of varicose vein case/control status (defined as above) against PRS decile, adjusted for year of birth, sex, genotyping array, recruitment centre, body mass index (at recruitment) and PC1-10. For phenome-wide VV-PRS analysis, Phecode^[239]83-level time-to-event data was extracted from UKB phenotype data as described previously^[240]84, and varicose vein-specific phenotypes were removed. For each phenotype, Cox regression was performed for phenotype against Z-scored PRS adjusted for year of birth, sex, genotyping array, recruitment centre and PC1-10. Bidirectional two-sample Mendelian randomisation analyses Genetic instruments were extracted from summary statistics by thresholding at a significance level of 5 × 10^−8 and clumping according to r^2 < 0.01 using a 1000 Genomes European population reference. Varicose vein instruments were extracted from the summary statistics generated above, while other instruments were extracted from the IEU OpenGWAS project^[241]85 ([242]https://gwas.mrcieu.ac.uk/). Exposure and outcome data were harmonised using the “harmonise_data” function in the TwoSampleMR v0.5.6 for R ([243]https://mrcieu.github.io/TwoSampleMR/). The primary analysis performed was the inverse-variance weighted (IVW) method, assuming that all instrumental variables are valid. As sensitivity analyses, we additionally performed MR-Egger regression to identify available of significant directional pleiotropy. All analyses were implemented in the TwoSampleMR package for R. Drug-target enrichment analysis The prioritised genes at our replicated loci were queried on the Open Targets Platform^[244]32, assessing whether encoded proteins were tractable to small molecule or antibody targeting, or drug targets in any phase of clinical trial. Genes were also analysed for enriched drug pathways, with a nominal P < 0.05 threshold of significance. Reporting summary Further information on research design is available in the [245]Nature Research Reporting Summary linked to this article. Supplementary information [246]Supplementary Information^ (42.3MB, pdf) [247]Peer Review File^ (421KB, pdf) [248]41467_2022_30765_MOESM3_ESM.pdf^ (138.1KB, pdf) Description of Additional Supplementary Files [249]Supplementary Data 1^ (1MB, xlsx) [250]Supplementary Data 2^ (20KB, xlsx) [251]Supplementary Data 3^ (21.3KB, xlsx) [252]Supplementary Data 4^ (604.9KB, xlsx) [253]Supplementary Data 5^ (24.1KB, xlsx) [254]Supplementary Data 6^ (22.7KB, xlsx) [255]Supplementary Data 7^ (21KB, xlsx) [256]Supplementary Data 8^ (20KB, xlsx) [257]Supplementary Data 9^ (12.5KB, xlsx) [258]Supplementary Data 10^ (516.5KB, xlsx) [259]Supplementary Data 11^ (11.8KB, xlsx) [260]Supplementary Data 12^ (12.7KB, xlsx) [261]Supplementary Data 13^ (11.4KB, xlsx) [262]Supplementary Data 14^ (17.3KB, xlsx) [263]Supplementary Data 15^ (1.1MB, xlsx) [264]Reporting Summary^ (2.3MB, pdf) Acknowledgements