Abstract

   Alcohol consumption is a heritable behavior seriously endangers human
   health. However, genetic studies on alcohol consumption primarily
   focuses on common variants, while insights from rare coding variants
   are lacking. Here we leverage whole exome sequencing data across
   304,119 white British individuals from UK Biobank to identify
   protein-coding variants associated with alcohol consumption.
   Twenty-five variants are associated with alcohol consumption through
   single variant analysis and thirteen genes through gene-based analysis,
   ten of which have not been reported previously. Notably, the two
   unreported alcohol consumption-related genes GIGYF1 and ANKRD12 show
   enrichment in brain function-related pathways including glial cell
   differentiation and are strongly expressed in the cerebellum.
   Phenome-wide association analyses reveal that alcohol
   consumption-related genes are associated with brain white matter
   integrity and risk of digestive and neuropsychiatric diseases. In
   summary, this study enhances the comprehension of the genetic
   architecture of alcohol consumption and implies biological mechanisms
   underlying alcohol-related adverse outcomes.

   Subject terms: Behavioural genetics, Genetic association study, Health
   care
     __________________________________________________________________

   The authors leverage whole exome sequencing data to investigate
   protein-coding variants associated with alcohol consumption.
   Phenome-wide association analyses reveal that alcohol
   consumption-related genes are associated with brain white matter
   integrity and risk of digestive and neuropsychiatric diseases.

Introduction

   Alcohol consumption is a prominent risk factors for death and
   disability worldwide, accounting for over two million deaths each
   year^[48]1. It poses a tremendous threat to human health through
   multiple mechanisms, including cumulative damage to organs and leading
   to self-harm or violence^[49]2,[50]3. Notably, these adverse effects
   are largely dependent on the average volume of alcohol
   consumption^[51]4. Identifying the risk factors that influence one’s
   level of alcohol consumption can contribute to the prevention,
   identification, and treatment of adverse outcomes from alcohol
   consumption^[52]5.

   Over the recent decades, comprehensive genome-wide association studies
   (GWAS) have indicated the potential influence of genetic factors on
   one’s alcohol consumption volume and identified over 100 related
   variants^[53]6,[54]7. However, a predominant proportion of the
   identified variants are localized within noncoding regions, and their
   effect sizes tend to be small, making interpretation and identification
   of the causal gene challenging^[55]8. In addition, previous GWAS mainly
   utilized imputed genotype data, which only cover limited regions of the
   genome, and thus may have missed many potential genes. Furthermore,
   GWAS studies focused mainly on common variants, and few studies have
   investigated rare variants associated with alcohol consumption, which
   yield greater potential to interpret biological function and elucidate
   mechanisms^[56]9. Although there are studies that have attempted to
   leverage exome chip data to identify rare variants contributing to
   alcohol consumption, the sample size was small and limited regions of
   the whole exome were examined^[57]10.

   The introduction of whole exome sequencing (WES) provides a great
   chance to overcome the limitations of previous genetic studies on
   alcohol consumption with a substantially larger amount of rare and
   ultra-rare protein-coding variants^[58]11–[59]13. Collapsing of
   loss-of-function (LOF) variants helps estimate the effect direction of
   associated genes^[60]13,[61]14. When combined with large-scale
   population cohorts with multi-modal phenotypic data, WES would greatly
   facilitate our understanding of the genetic underpinnings of alcohol
   consumption as well as its implication on physical and mental
   health^[62]6. However, to our knowledge, there have been few
   large-scale WES studies on alcohol consumption, let alone elucidating
   the potential implications of the identified genes^[63]10,[64]15.
   Meanwhile, as indicated by a previous genome-wide association study,
   significant genetic associations existed between alcohol consumption
   and several body health phenotypes^[65]7. The application of
   phenome-wide analysis for alcohol-related genes can help extend and
   deepen our current comprehension of the association between alcohol
   consumption and human health.

   Hence, aiming to refine the genetic architecture of alcohol
   consumption, we conduct an exome-wide association study (ExWAS) for
   alcohol consumption among 304,119 individuals from the UK Biobank
   (UKB). We also examine the rare-variant associations with genes
   reported by previous GWAS^[66]6,[67]7,[68]16,[69]17. Finally, we
   provide biological insights into the identified genes via
   bioinformatics analyses and phenome-wide association analysis (PheWAS).

Results

Study population and data description

   We leveraged exome sequencing data and phenotypic data from UKB and
   excluded low-quality variants and samples (Methods)^[70]13,[71]18. For
   the main analysis, we included 304,119 unrelated white British
   participants. The average age was 56.87 years at enrollment and 54.09%
   participants were female. Information about alcohol drinking per week
   were obtained from self-completed touchscreen interviews at baseline
   (Methods and Supplementary Data [72]1). The average alcohol consumption
   (alcohol amounts after natural logarithm) of the whole sample was 2.06
   (Standard Deviation (SD) = 1.44), with a mean of 2.47 (SD = 1.41) and
   1.72 (SD = 1.38) for males and females respectively (Supplementary
   Data [73]2). Finally, the exome-wide association analysis included
   100,101 common variants (with a MAF of ≥1%) and 13,018,630 rare
   variants (with a MAF of < 1%). Figure [74]1 provided the general schema
   of our study.

Fig. 1. Study overview.

   [75]Fig. 1
   [76]Open in a new tab

   The top section outlines the data utilized in the study, including
   alcohol use, exome sequence data, and health-related phenotypes. The
   middle section outlines the identification of exome-wide significant
   genes, involving exome-wide association analysis, replication in the
   FinnGen cohort, and sensitivity analysis. The bottom-left section
   outlines biological functions analysis of the identified genes,
   including GO analysis, tissue expression enrichment analysis, cell-type
   expression, and lifespan spatio-temporal brain expression trajectory
   analysis. The bottom-right part focuses on exploring phenome-wide
   associations of the identified genes.GO Gene Ontology, PCW
   Post-conception weeks, RPKM Reads per kilobase million; tSNE
   t–Stochastic Neighbourhood Embedding. ‘image: Flaticon.com’. This cover
   has been designed using images from Flaticon.com.

ExWAS for alcohol consumption

   To test whether alcohol consumption was associated with damaging coding
   variants, we conducted ExWAS using a linear mixed model with
   adjustments for ten principal components, age, and sex (Methods). The
   analysis discovered two rare variants and 23 independent common
   variants linked to alcohol consumption (P < 5 × 10^−8) (Table [77]1,
   Fig. [78]2a, [79]b). The genomic control lambda is 1.04, indicating
   that the association statistics are not systematically inflated (see
   Supplementary Fig. [80]1 for the corresponding quantile-quantile plot).
   The top rare variant, rs283413 (MAF = 0.8%; β[A] = −0.15,
   P = 2.73 × 10^−31) is a stop-gain variant in ADH1C, the well-known gene
   related to alcohol metabolism. Among the 23 common variants, three were
   not reported previously (rs41288799, rs4975020 and rs77623289). Most of
   the identified variants are intron (46%) or missense (19%)
   (Fig. [81]2c, Supplementary Data [82]3, Methods). Additionally, 15 of
   the 22 identified variants, which were examined in an independent
   alcohol consumption GWAS^[83]19, showed nominal significance (P < 0.05)
   (Table [84]1, Supplementary Data [85]4). Further, 17 of the 24
   identified variants available in the FinnGen study^[86]20 exhibited
   nominal associations with alcohol use disorder (AUD) (P < 0.05)
   (Supplementary Data [87]5). To assess the robustness of the main
   analysis, we adjusted for rs1229984, a well-established marker strongly
   linked to alcohol consumption^[88]6,[89]21. Notably, 23 of 25 variants
   (92%) retained the same association directions, with 22 variants (88%)
   maintaining their significance (P < 5 × 10^−8) (Supplementary
   Data [90]6). Additionally, the main analysis maintained its robustness
   after excluding former drinkers and non-drinkers. Further, all effect
   directions remained the same, and 20 of the initially identified 25
   variants (80%) retained their significance (P < 5 × 10^−8)
   (Supplementary Data [91]7). Finally, the ExWAS for scores of alcohol
   use disorders identification test (AUDIT) identified a rare variant
   (rs283413) and two independent common variants (rs13107325 and
   rs201168482) associated with alcohol use problems (Supplementary
   Figs. [92]2–[93]4, Supplementary Data [94]8).

Table 1.

   Exome-wide significant variants for alcohol consumption
   CHR     SNP      A1  A2   SYMBOL     AF     β    SE      P
   2   rs41288799  G    C  PREB, ABHD1 0.037 −0.03 0.006 3.23E-08
   2   rs3214499   G    GA NRBP1       0.447 0.02  0.002 1.28E-22
   2   rs3811644   A    G  C2orf16     0.210 0.02  0.003 3.94E-09
   4   rs149109767 AGAG A  HTT         0.074 −0.02 0.004 4.07E-08
   4   rs11096989  A    AG WDR19       0.437 0.01  0.002 3.78E-09
   4   rs4975015   T    C  KLB         0.186 0.02  0.003 8.61E−10
   4   rs4975017   C    A  KLB         0.328 −0.02 0.002 7.20E-12
   4   rs4975020   A    G  UGDH        0.324 0.01  0.002 4.08E-09
   4   rs190428650 C    G  ADH1A       0.000 −0.29 0.052 2.53E-08
   4   rs283413    C    A  ADH1C       0.008 −0.15 0.013 2.73E-31
   4   rs17526590  G    A  ADH1C       0.096 0.03  0.004 3.94E-15
   4   rs113337987 G    A  MTTP        0.020 −0.06 0.008 4.70E−13
   4   rs13107325  C    T  SLC39A8     0.074 −0.04 0.004 1.30E-20
   7   rs13235543  C    T  MLXIPL      0.129 0.02  0.003 5.26E-09
   11  rs755555    C    T  SLC39A13    0.320 −0.02 0.002 7.94E-12
   11  rs10891540  A    G  TTC12       0.463 0.01  0.002 3.31E-08
   11  rs1800497   G    A  ANKK1       0.201 0.02  0.003 4.92E-08
   12  rs2400895   A    T  ACSS3       0.450 0.01  0.002 4.70E-08
   14  rs28929474  C    T  SERPINA1    0.021 −0.05 0.008 3.07E-09
   15  rs77623289  G    T  ISL2        0.053 0.03  0.005 1.81E-08
   16  rs62036622  T    G  ATXN2L      0.420 −0.01 0.002 7.26E−10
   16  rs2278557   C    G  PPP4C       0.404 −0.01 0.002 2.72E-08
   17  rs8073146   A    G  CRHR1       0.226 −0.02 0.003 3.83E-15
   18  rs1788825   C    T  RMC1        0.346 −0.01 0.002 3.80E-10
   19  rs516246    T    C  FUT2        0.492 −0.02 0.002 3.77E-12
   [95]Open in a new tab

   SAIGE GENE+ was utilized to conduct single-variant association tests
   (two-sided). Independent significant (P < 5 × 10^−8) variants for
   alcohol consumption are presented. No corrections were applied for
   multiple comparisons. A variant is reported if the mapped genes have
   been previously reported for alcohol use according to the GWAS catalog,
   with boldface indicating associations not previously identified. CHR
   Chromosome, A1 Allele 1; A2 allele 2, AF Allele frequency of allele 2,
   β β value for allele 2.

Fig. 2. Single-variant ExWAS of alcohol consumption.

   [96]Fig. 2
   [97]Open in a new tab

   a Manhattan plot showing the results of the common variants from ExWAS
   of alcohol consumption. SAIGE GENE+ was used to perform single-variant
   association tests (N = 304,119 biologically independent samples). The
   chromosomal position of the variant across the 22 chromosomes is
   represented on the x-axis, while the y-axis displays the
   -log[10]-transformed p-value. The significance threshold
   (P < 5 × 10^−8) is denoted by the horizontal black line. Models are
   corrected for the top ten ancestral principal components, age, and sex.
   The presented p-values are two-sided and have not been adjusted for
   multiple testing. Significant variants were marked with red.
   Independent significant variants were marked with the nearby genes.
   Genes identified in this study were marked in bold. b Plot of effect
   size (absolute value) versus allele frequency of 22 previously reported
   alcohol consumption variants (blue) and 3 previously not reported
   alcohol consumption variants (red). c Distribution of the functional
   consequences of independent significant variants.

   Since a single rare variant tends to be of insufficient power to
   identify significant signals, we further performed gene-based
   collapsing analysis to detect genes related to alcohol consumption. LOF
   and missense rare variants of each gene and three MAF thresholds (< 1%,
   < 0.1% and < 0.01%) were utilized. In total, we identified 19
   associations (covering seven genes) after Bonferroni correction
   (Table [98]2 and Fig. [99]3a; Supplementary Data [100]9;
   P < 0.05/19852 = 2.5 × 10^−6). Rare variants in the known alcohol
   consumption-related gene, ADH1C showed the most significant gene-based
   association at P = 1.91 × 10^−30. The maximum genomic control lambda
   was 1.076 (see Supplementary Fig. [101]5 for the corresponding
   quantile-quantile plots). The total rare burden heritability of alcohol
   consumption was 0.88% (Fig. [102]3b and Supplementary Data [103]10). We
   additionally identified six putative alcohol consumption-related genes
   under the threshold of overall false discovery rate (FDR) < 0.05 (P 
   <1.69 × 10^−5). Among these rare-variant genes, seven (GIGYF1, ANKRD12,
   KDM5B, APC2, LGI2, ATP1A2, and ENSG00000224076 (not officially
   designated and excluded from further analysis)) were not previously
   reported in GWAS studies for alcohol consumption. The LOF and missense
   burden in eleven of the rare-variant genes reduced alcohol consumption
   (β = −0.003 to −0.023; Fig. [104]3c, Table [105]2). In addition, 2.03%
   (n = 8825) of the participants carried a LOF variant located in ADH1C
   exons and GIGYF1 variants were carried by 1.72% (n = 7449) participants
   (Fig. [106]3d). After excluding the former drinkers and non-drinkers,
   19 out of the initially identified 39 associations retained the
   significance (P < 1.69 × 10^−5, Supplementary Data [107]11). Following
   adjustment for rs1229984, the identified associations were robust
   except for ADH1C, ADH1A, SNX17 and ADH5 (Supplementary Data [108]12).
   Additionally, we performed ExWAS for AUDIT and identified two genes
   (ADH1C and CA1) associated with alcohol use problems (Supplementary
   Figs. [109]6-[110]8, Supplementary Data [111]13).

Table 2.

   Gene associated with alcohol consumption at FDR < 0.05
   Region Group Max MAF MAC N rare N ultra-rare β[Burden] P
   ADH1C lof 0.01 4986 2 19 −0.007 1.91E-30
   GIGYF1 lof 1.00E-04 103 1 58 −0.023 6.92E-11
   ANKRD12 lof 1.00E-04 214 4 81 −0.015 3.91E-10
   ADH1A lof 0.001 424 5 19 −0.008 1.39E-08
   ADH1A missense; lof 0.001 935 19 68 −0.003 4.05E-08
   KDM5B missense; lof 0.001 1386 19 427 −0.004 6.07E-07
   CTNNA2 missense; lof 1.00E-04 530 10 175 −0.007 1.16E-06
   APC2 missense; lof 0.001 2057 36 331 −0.004 1.58E-06
   KDM5B missense; lof 1.00E-04 1124 16 427 −0.004 2.26E-06
   LGI2 missense; lof 0.001 621 11 117 −0.007 3.48E-06
   ANKRD12 missense; lof 1.00E-04 1191 27 337 −0.005 3.52E-06
   APC2 missense; lof 1.00E-04 1462 30 331 −0.004 3.61E-06
   GIGYF1 missense; lof 0.001 1310 29 198 −0.004 8.72E-06
   ANKRD12 missense; lof 0.001 2121 33 337 −0.003 1.02E-05
   ENSG00000224076 lof 1.00E-04 5 0 1 0.069 1.14E-05
   ATP1A2 missense; lof 1.00E-04 512 9 166 −0.006 1.29E-05
   SNX17 missense; lof 0.01 5405 8 86 0.002 1.54E-05
   HECTD4 lof 1.00E-04 117 2 66 −0.012 1.56E-05
   ATP1A2 missense; lof 0.001 574 10 166 −0.005 1.63E-05
   ADH5 missense; lof 0.001 1215 17 88 −0.004 1.64E-05
   [112]Open in a new tab

   SAIGE GENE+ was utilized to conduct gene-based collapsing tests. Genes
   for alcohol consumption with FDR < 0.05, which is equivalent to
   P < 1.69 × 10^−5 are presented. No corrections were applied for
   multiple comparisons. Genes not previously reported to be associated
   with alcohol use according to the GWAS catalog were highlighted in
   boldface. Region gene name, Group Annotation mask, lof loss of
   function, Max MAF Maximum MAF cutoff, P p-value for SKAT-O test,
   BETA_Burden effect size of burden test, SE_Burden standard error of
   BETA_Burden.

Fig. 3. Gene-based ExWAS of alcohol consumption.

   [113]Fig. 3
   [114]Open in a new tab

   a Manhattan plot showing the results of the rare variants (LOF and
   Missense) from ExWAS of alcohol consumption with three different MAF
   thresholds in gene-based analysis. The x-axis represents the gene
   position on 22 chromosomes. The y-axis indicates the
   -log[10]-transformed p-value. Shape indicates combinations of different
   MAF groups and consequence groups, including LOF and Missense. The
   black horizontal line denotes the exome-wide significance level using
   Bonferroni correction (P < 2.5× 10^−6). The black dashed horizontal
   line indicates significant associations at an overall
   FDR < 0.05(P < 1.69 × 10^−5). Models are corrected for the top ten
   ancestral principal components, age, and sex. P-values are two-sided
   and unadjusted for multiple testing. Significant genes were marked with
   red. b Burden heritability of alcohol consumption in different groups.
   The x-axis denotes the different MAF groups (ultra-rare
   (MAF < 1 × 10^−5) and rare (1 × 10^−5 ≤ MAF < 1 × 10^−2)). The color
   shows the LOF, Missense, and aggregation of the two groups. The y-axis
   indicates the burden heritability (h^[115]2) in percent. Error bars
   indicate the standard error. c Plot of the effect sizes of burden test
   of the significant associations. N = 304,119 biologically independent
   samples were utilized in the analysis. For each gene, the most
   significant associations were plotted. Data presentation is in the form
   of β ± s.e. × 1.96. d The carrier percentage for rare LOF and missense
   variants in the alcohol consumption-related genes. The color of the bar
   indicates LOF (green) or missense (orange) groups of the variants.

Leave-one-variant-out (LOVO) and conditional analysis

   To investigate whether a single variant dominated the gene-based
   associations, we firstly conducted LOVO analysis. While the maximum
   P-value for ADH1C was P = 0.802 after the removal of rs283413,
   P = 0.873 for ADH1A after the removal of rs190428650, P = 0.446 for
   SNX17 after the removal of rs147740391, and P = 0.016 for ADH5 after
   the removal of rs62325244, the other associations did not exhibit
   substantial attenuation (Supplementary Figs. [116]9–[117]20,
   Supplementary Data [118]14). Hence, even a single variant, i.e. of
   ADH1C, ADH1A, and ADH5, may critically influence alcohol consumption,
   whereas the other significant associations were based on a burden of
   multiple rare variants. Subsequently, conditional analysis was
   performed to assess whether the significant associations with rare
   variants were influenced by adjacent common variants (Methods). Seven
   genes were found to have nearby common variants exhibiting significant
   associations with alcohol consumption. The associations of GIGYF1,
   ANKRD12 and APC2 did not exhibit substantial attenuation, whereas the
   associations of ADH1C, ADH5 and SNX17 exhibited attenuation, though
   still nominally significant, and the association of ADH1A lost its
   significance after adjustment for the nearby common variants
   (Supplementary Data [119]15).

Sex-specific analysis of the associations

   As the average alcohol consumption showed a significant difference
   between males and females, we conducted gene-based collapsing analyses
   on participants separated by sex to explore whether the genetic
   contributions to alcohol consumption also differed by sex. While the
   KDM5B gene’s association with alcohol consumption was only observed in
   males (P = 3.04 × 10^−7 for males and P = 0.170 for females), the other
   genes were significantly associated with alcohol consumption in both
   males and females (P < 0.05, Supplementary Data [120]16).

Associations of rare variants in alcohol-related genes

   We then examined the impact of rare variants based on previous GWAS
   findings on alcohol consumption. We assessed a total of 174 alcohol
   consumption-related genes identified by the most recent GWAS
   studies^[121]6,[122]7,[123]16,[124]17. Although 25 genes showed nominal
   significance, only the ADH1C gene was significant after Bonferroni
   correction (Supplementary Data [125]17). The influence of coding
   variants within the GWAS regions did not exhibit substantial effects,
   potentially due to the limited statistical power of ExWAS.

Biological function and tissue expression of the alcohol consumption-related
genes

   We further conducted a series of bioinformatics analyses to investigate
   the biological functions of the alcohol consumption-related genes. We
   first performed pathway enrichment analyses. We found the enrichment of
   gene ontology (GO) pathways relevant to alcohol dehydrogenase activity,
   oxidoreductase activity, ethanol oxidation and ethanol metabolism
   (Fig. [126]4a, Supplementary Data [127]18). Also, the analysis of Kyoto
   Encyclopedia of Genes and Genomes (KEGG) pathways identified the
   enrichment of these genes in tyrosine metabolism, fatty acid
   degradation, and pyruvate metabolism. These results hence supported the
   biological validity of our genetic findings.

Fig. 4. Biological function of the alcohol consumption-related genes.

   [128]Fig. 4
   [129]Open in a new tab

   a Results of the functional enrichment analysis. The assessment of
   functional enrichment for the genes associated with alcohol consumption
   was assessed using the hypergeometric test. The g:SCS method was used
   for multiple testing correction. The x-axis indicates -log[10] of the
   adjusted p-value for each term. The y-axis indicates different terms,
   and each source is marked with different colors. BP Biological process,
   GO Gene Ontology, MF Molecular function; KEGG, Kyoto Encyclopedia of
   Genes and Genomes. oxidoreductase activity^[130]1, oxidoreductase
   activity, acting on the CH-OH group of donors, NAD or NADP as acceptor;
   oxidoreductase activity^[131]2, oxidoreductase activity, acting on
   CH-OH group of donors. b The bar plot shows the tissue-specific gene
   enrichment. The x-axis represents various tissue types. The y-axis
   indicates the fold-change values of the tissue-specific gene
   enrichment. Tissues with nominal enrichment were marked with *. c The
   t–Stochastic Neighbourhood Embedding (tSNE) plot shows different cell
   types within the liver. The cell type is represented by the color of
   the dots. d The feature plot displaying the expression level of
   tissue-specific genes in different cell types within the liver. TPM
   transcripts per million.

   We further analyzed tissue-specific expression enrichment of the
   identified genes based on the Human Protein Atlas project using the
   TissueEnrich R package^[132]22. We observed six, four, and two genes
   enriched in the liver, duodenum, and adipose tissue, respectively
   (Fig. [133]4b, Supplementary Fig. [134]21, and Supplementary
   Data [135]19). Genes, including SERPINA1, ADH1C, ADH1A, MLXIPL, MTTP,
   and KLB were specifically enriched in the liver (Supplementary
   Fig. [136]22). Subsequently, we evaluated the expression levels of
   these six genes across various cell types in the liver with single-cell
   RNA sequencing (scRNA-seq) data. While SERPINA1 was widely expressed in
   all cell types, ADH1A, ADH1C, MTTP, and MLXIPL were all predominantly
   expressed in the hepatocytes (Fig. [137]4c, [138]d).

   We further estimated the similarities between genes based on the
   association results of collapsing analyses across 1419 quantitative
   traits in UKB using Gene-SCOUT^[139]23. Notably, the GIGYF1 gene
   exhibited the highest similarity to ANKRD12 (Fig. [140]5a and
   Supplementary Data [141]20). Interestingly, the top 10 similar genes of
   ANKRD12 are enriched in brain function-related pathways containing
   glial cell differentiation, cognitive function, and glutamate secretion
   (Fig. [142]5b and Supplementary Data [143]21). Thus, to gain more
   insights into how these rare-variant genes may be related to alcohol
   use, we further examined the expression of ANKRD12 and GIGYF1 across
   tissues within the Human Protein Atlas^[144]24. Notably, both ANKRD12
   and GIGYF1 exhibited strong expression in the brain, particularly in
   the cerebellum (Fig. [145]5c, Supplementary Fig. [146]23). In addition,
   both ANKRD12 and GIGYF1 showed broad expression in all cell types in
   brain (Supplementary Fig. [147]24). We subsequently characterized the
   spatiotemporal expression trajectories of ANKRD12 and GIGYF1 in the
   human brain, using mRNA sequencing (mRNA-seq) data from the PsychEncode
   study^[148]25. Our findings revealed unique temporal expression
   patterns of these genes in the cerebellum compared to other regions of
   the brain (Fig. [149]5d, e). These results imply that these two genes
   associated with alcohol consumption may alter the function of brain,
   which are important targeted organ of alcohol intake, providing clues
   for future research on the alcohol-related brain injury.

Fig. 5. Functional analysis of the rare-variant genes identified in our
study.

   [150]Fig. 5
   [151]Open in a new tab

   a Top 10 genes with the most similar quantitative trait associations to
   ANKRD12 using exome-sequencing data in the UKB. b Enriched gene sets of
   ANKRD12 plus the top 10 similar genes of ANKRD12. Fisher’s exact test
   was performed by Gene-SCOUT. The x-axis indicates the PHRED score
   (−10×log[10](P)), derived from Gene-SCOUT. c Expression of ANKRD12 and
   GIGYF1 in human tissues from the Human Protein Atlas. Abbreviations:
   nTPM, normalised transcripts per million. The top 20 tissues are
   included here; the complete plot is available in Supplementary
   Fig. [152]23. d Lifespan spatiotemporal expression trajectory of
   ANKRD12 in the human brain. Expression is shown in both prenatal and
   postnatal periods derived from mRNA-seq data of the PsychENCODE study
   ^[153]25. The x-axis denotes the age, represented in both
   post-conception weeks (prenatal) and years (postnatal), categorized
   into eight distinct periods: < 13 post-conception weeks (PCW)), 13-18
   PCW, 19-23 PCW, 24-37 PCW, 0-2 years, 3−12 years, 13–19 years, and > 19
   years. The y-axis depicts the log[2]-transformed expression value,
   given in reads per kilobase million (RPKM). Each brain region’s
   expression trajectory was visualized through a fitted non-linear LOESS
   regression line, accompanied by error bands (shaded areas) indicating
   the 95% confidence interval. e. Lifespan spatiotemporal expression
   trajectory of GIGYF1 in the human brain. The x-axis denotes the age,
   represented in both post-conception weeks (prenatal) and years
   (postnatal). The log[2] transformed expression values were represented
   on the y-axis. Each brain region’s expression trajectory was
   illustrated through a fitted non-linear LOESS regression line,
   accompanied by error bands (shaded areas) denoting the 95% confidence
   interval.

Phenotypic associations with alcohol consumption-related genes

   Alcohol consumption has been documented to correlate with various
   biological markers, including metabolites, and health
   outcomes^[154]7,[155]26–[156]28. To systematically assess the
   relationship between genetic variation in alcohol consumption and a
   broad spectrum of health phenotypes, we performed PheWAS for the
   identified alcohol consumption-related genes across blood indices,
   major diseases, body function, and brain structures from the UKB
   (Methods and Supplementary Data [157]22).

   Among the 82 significant gene-phenotype and 380 variant-phenotype
   associations (P < 0.05/316/12 = 1.32 × 10^−5,
   P < 0.05/316/25 = 6.33 × 10^−6, respectively), 81.7% and 47.4% were
   related to inflammatory and blood biochemistry indices (Fig. [158]6 and
   Supplementary Data [159]23, [160]24, Supplementary
   Figs. [161]25–[162]61). Indicators of inflammation and disturbance of
   lipid metabolism showed significant associations with alcohol
   consumption-related genes. GIGYF1 and ANKRD12 showed the most
   phenotypic associations. GIGYF1 showed strong positive associations
   with HbA1c (β[burden] = 0.029, P = 2.51 × 10^−13) and glucose
   (β[burden] = 0.027, P = 1.92 × 10^−10), and negative associations with
   total cholesterol level (β[burden] = −0.029, P = 4.52 × 10^−13),
   low-density lipoprotein cholesterol level (LDLC) (β[burden] = −0.026,
   P = 1.33 × 10^−10) and Apolipoprotein B (β[burden] = −0.024,
   P = 6.15 × 10^−10). ANKRD12 showed strong positive associations with
   neutrophil percentage (β[burden] = 0.021, P = 1.19 × 10^−13) and
   neutrophil-lymphocyte ratio (β[burden] = 0.020, P = 7.34 × 10^−13), and
   negative associations with lymphocyte percentage (β[burden] = −0.021,
   P = 2.75 × 10^−13), total protein level (β[burden] = −0.019,
   P = 1.36 × 10^−10), and monocyte percentage (β[burden] = −0.015,
   P = 3.04 × 10^−8).

Fig. 6. Phenotypic associations of the rare-variant genes linked to alcohol
consumption.

   [163]Fig. 6
   [164]Open in a new tab

   The x-axis represented various categories encompassing various
   phenotypes, including 10 neuropsychiatric diseases, 7 cardiovascular
   diseases, 19 digestive diseases, 10 cognition scores, 9 inflammatory
   traits, 30 blood biochemistry traits, 214 nervous traits (166 grey
   matter measures and 48 white matter measures), 8 heart structure
   measures, and 9 spirometry measures. For each phenotype-gene
   association, we applied three different maximum MAF cutoffs (0.01%,
   0.1% and 1%) and two variant annotations (LOF and LOF+missense). The
   y-axis indicates the -log[10] of the p-value for each association, with
   p-values adjusted for age, sex, and ten ancestral principal components.
   The red horizontal line denotes the threshold for significant
   association (P < 0.05/316/12 = 1.32 × 10^−5), and the grey line
   signifies the threshold for a significant association to a lesser
   extent (P < 0.001). Presented p-values are two-sided and unadjusted for
   multiple testing. For each phenotype-gene association, the minimum
   p-value was plotted. Results for all genes at different variant
   frequencies and groups are presented in Supplementary Data [165]23.
   NEU% Neutrophill Percentage, HbA1c Glycated Haemoglobin (Hba1C), LYM%
   Lymphocyte Percentage, TC Cholesterol, NLR Neutrophill Lymphocyte
   Ratio, Reaction Time Mean Time To Correctly Identify Matches, LDLC Ldl
   Cholesterol, TP Total Protein, Glu Glucose; IGF-1, IGF1; Fluid
   Intelligence, Fluid Intelligence Score, ApoB Apolipoprotein B, LYM
   Lymphocyte Count, ALP Alkaline Phosphatase, MON%, Monocyte Percentage;
   TG, Triglycerides; Pairs Matching, Number Of Incorrect Matches In
   Round; Urea, Urea; MON, Monocyte Count; Cr, Creatinine, HDLC Hdl
   Cholesterol, Ca Calcium, ILD Inflammatory Liver Disease, MIG Migraine,
   FEV1 Forced Expiratory Volume In 1-Second (Fev1), FEV1_Best, Forced
   Expiratory Volume In 1-Second (Fev1), Best Measure; FVC_Z, Forced Vital
   Capacity (Fvc) Z-Score, Vermis10 Vermis_10, FEV1_predperc Forced
   Expiratory Volume In 1-Second (Fev1), Predicted Percentage; UA, Urate;
   CRP, C-Reactive Protein; Fornix, Fornix; CysC, Cystatin C; Posterior
   Thalamic Radiation (R), Posterior Thalamic Radiation (R), FVC, Forced
   Vital Capacity (Fvc); FVC_Best, Forced Vital Capacity (Fvc), Best
   Measure; CO, Cardiac Output, Sagittal Stratum (R), Sagittal Stratum
   (R); CI, Cardiac Index; LD, Liver Disease; ApoA, Apolipoprotein A,
   FEV1_Z Forced Expiratory Volume In 1-Second (Fev1) Z-Score, Posterior
   Corona Radiata (L), Posterior Corona Radiata (L), IBS Inflammatory
   Bowel Disease; PD, Parkinson’s Disease; GBD, Gallbladder Disease; Alb,
   Albumin, VD Vitamin D, NEU Neutrophill Count, AMYGD (R), Volume of
   right amygdala.

   Interestingly, the gene-phenotype associations also extended to
   cognitive function and white matter. ANKRD12 showed significant
   associations with lower fluid intelligence scores (β[burden] = −0.028,
   P = 6.03 × 10^−10) and worse performance in the pairs matching task
   (β[burden] = 0.010, P = 2.93×10^−7). GIGYF1 showed nominal associations
   with lower fractional anisotropy (FA) in the fornix tract
   (β[burden] = −0.059, P = 1.06 × 10^−4), and longer reaction time
   (β[burden] = 0.014, P = 1.40×10^−4). The Mendelian randomization
   analyses failed to uncover any causal relationship between cognition
   and alcohol consumption (Supplementary Data [166]25), in line with
   results from previous studies^[167]29. Given the limited evidence
   supporting causal links between cognition and alcohol consumption, it
   is plausible that the observed associations may stem from the
   pleiotropic effects of ANKRD12 and GIGYF1.

   The variant-phenotype association analyses revealed significant
   correlations with various white matter tracts. Notably, significant
   correlations were observed for FA in specific regions, including left
   anterior limb of the internal capsule (β = −0.072, P = 9.52 × 10^−13),
   genu of corpus callosum (β = −0.069, P = 9.68 × 10^−13), and left
   superior frontal-occipital fasciculus (β = −0.067, P = 6.13 × 10^−12).

ExWAS in all white British participants and unrelated non-white British
participants

   Since SAIGE can handle sample relatedness in the regression model, we
   included all 373,152 white British participants (including both
   unrelated and related participants) in the analyses to increase
   statistical power. In the ExWAS for single variants, we identified 26
   independent significant variants associated with alcohol consumption,
   including four variants not detected in the unrelated white British
   sample, of which two were not previously linked to alcohol consumption
   (Supplementary Data [168]26). The gene-based collapsing analysis
   identified 23 potential alcohol consumption-related genes with an
   overall FDR < 0.05. Of the 23 genes, 13 were not found in the unrelated
   white British participants, and among these, eight were not previously
   associated with alcohol consumption (Supplementary Data [169]27).

   Moreover, ExWAS was conducted in 61,076 unrelated non-white British
   participants. While the ExWAS for single variants identified one locus
   significantly linked to alcohol consumption (Supplementary
   Data [170]28), the gene-based collapsing analysis did not uncover any
   significant associations after FDR correction, potentially attributed
   to the constrained sample size among non-white British participants.

Discussion

   Herein we describe the largest comprehensive ExWAS of alcohol
   consumption to date and provide deep biological insights into the
   identified genes via functional analysis and phenome-wide association
   analysis with health-related data from the UKB. We identified ten
   previously unreported genes associated with alcohol consumption as well
   as replicated several known genes, which may shed light on
   pathophysiological processes in alcohol use. Furthermore,
   bioinformatics analyses supported the biological validity of the
   genetic associations and gene expression analysis highlighted the role
   of the cerebellum in alcohol consumption. PheWAS analyses provide
   strong support for the pleiotropic and consequent effects of alcohol
   consumption-related genes on human health, especially on inflammation,
   lipid metabolism, and white matter integrity.

   Previous GWAS studies have enabled the identification of alcohol
   consumption-related genes, but our study extended previous findings via
   the discovery of more genes as well as the identification of more
   common and rare variants to the reported genes of alcohol consumption.
   We have identified thirteen genes at exome-wide significance based from
   rare variants using gene-based collapsing analysis, seven of which
   (GIGYF1, ANKRD12, KDM5B, APC2, LGI2, ENSG00000224076 and ATP1A2) were
   not reported by previous GWAS studies. Moreover, among the 174 reported
   genes from the most recent GWAS studies^[171]6,[172]7,[173]16,[174]17,
   twenty-five showed nominal significance and ADH1C passed Bonferroni
   correction. Notably, utilizing the LOVO analysis, we found that for
   those reported GWAS genes including ADH1C, ADH1A, SNX17, and ADH5,
   removal of a single SNP leads to loss of significance in gene-based
   collapsing analysis, while for the genes not previously reported in the
   GWAS studies, removal of any single SNP does not influence the
   significance. The results indicated that the significance of these
   genes is the cumulative effect from a group of rare SNPs, which may
   explain why they were not detected by previous GWAS studies. This is
   further validated by the single variant analysis, where a significant
   signal was detected in ADH1C while not in those genes. Our results
   emphasized the value of rare variants as well as the necessity of
   gene-based collapsing analysis in WES studies on alcohol consumption.

   For the two rare-variant genes (GIGYF1 and ANKRD12) associated with
   alcohol consumption, GIGYF1, identified as a risk gene for diabetes in
   earlier research^[175]13,[176]30, is a protein-coding gene intricately
   involved in the regulation of cell growth and division. One
   meta-analysis of 38 studies demonstrated that a moderate level of
   alcohol intake was linked to a lower risk of type 2 diabetes compared
   to abstainers^[177]31. The association might be mediated by the
   beneficial metabolic effect of alcohol consumption such as altered HDL
   cholesterol and inflammation levels^[178]26. Meanwhile, results of the
   PheWAS showed that the alcohol consumption-related gene, GIGYF1, was
   significantly associated with blood levels of HDL cholesterol and
   several inflammatory biomarkers. Therefore, GIGYF1 may participate in
   the metabolic disturbance caused by alcohol consumption. As for another
   gene ANKRD12, less evidence was found on its possible role in alcohol
   consumption. While the Gene-SCOUT analysis provided interesting
   findings that GIGYF1 and ANKRD12 showed high similarity in biomarker
   profiles, which suggested that they might execute similar biological
   functions. Interestingly, ANKRD12 and GIGYF1 are associated with a
   higher Townsend deprivation index, which could possibly lead to a less
   access to alcohol^[179]32. Given that those genes were associated with
   cognitive function in our PheWAS results and in previous
   studies^[180]33,[181]34, it is possible that reduced cognitive function
   in the gene carriers results in increased material deprivation and in
   turn reduced alcohol consumption. Nevertheless, the findings may be
   confounded by many factors and the causality is not validated by
   mechanism study, so further research is needed to clarify the potential
   associations of GIGYF1 and ANKRD12 with alcohol consumption.

   In addition to the discovery of genetic associations, we also provide
   insights into alcohol metabolism-related brain alterations based on the
   two rare-variant genes identified in this study. The Gene-SCOUT
   analysis identified a series of genes that were highly similar to these
   two genes. These genes displayed significant enrichment in the
   regulation of glial cell differentiation and observational learning. As
   evidenced by previous human and animal studies, disrupted
   differentiation of glial cell (astrocytes and oligodendrocytes) is one
   of the human alcohol-related neuropathology^[182]35 and heavy alcohol
   exposures could result in cognitive impairment^[183]36. Since alcohol
   consumption influences intracellular signaling mechanisms, causing
   alterations in gene expression that gradually produce long-lasting
   damage in the brain^[184]37, these identified genes might be involved
   in the pathological process. What’s more, glia dysfunction is known to
   cause white matter atrophy, and these two genes are significantly
   expressed in white matter, further hinting that they might mediate
   alcohol-related brain damage. Another finding lies in their dominant
   expression in the cerebellum, one of the major target organs of alcohol
   abuse. Moreover, ANKRD12 and GIGYF1 are well-known genes for reduced
   cognitive function and intellectual disability as evidenced by previous
   studies^[185]33,[186]34. Consistently with previous findings, our
   PheWAS analyses indicated strong correlations between these genes and
   cognitive decline as well as altered white matter integrity, which
   suggests that these genes play a significant role in brain function and
   structure. The findings are plausible as prior studies have observed
   the associations between heavy alcohol consumption and changes in brain
   structure^[187]38,[188]39. More interestingly, alcohol
   consumption-related white matter microstructure changes have been
   considered a hallmark of AUD^[189]40,[190]41. Therefore, ANKRD12,
   significantly associated with alcohol consumption, AUDIT, and white
   matter integrity alterations, might serve as therapeutic targets for
   the prevention of AUD.

   We observed sex heterogeneity for KDM5B, the association between KDM5B
   and alcohol consumption was only observed in the male group. As we only
   observed heterogeneity in one gene, it is possible due to the
   sex-specific biological function of KDM5B. KDM5B encodes a
   lysine-specific histone demethylase, which is an important regulator of
   liver molecular pathways after alcohol consumption^[191]42. Previous
   studies found sex-specific roles of KDM5B in the alcohol-induced
   hepatic response, which regulates a fibrogenic program in females while
   contributes to hepatocyte dedifferentiation and fatty acid synthesis in
   males^[192]43,[193]44. However, the sex-specific mechanisms underlying
   the influence of KDM5B on alcohol consumption is still unclear. Future
   studies to identify the mechanisms will be necessary.

   Despite these significant findings, our study has some limitations.
   First, as WES could only detect variants in the protein-coding regions,
   the possible genetic associations in non-protein-coding region were
   less investigated in this work. Second, because of the scarcity of a
   comparable population cohort with genetic sequencing and phenotype data
   for replication, we relied on existing GWAS data for alcohol
   consumption and AUD to support our findings. Further whole-exome
   studies are needed to replicate the identified genes. Third, the
   causality between the reported genes and alcohol use was largely
   unknown. Further research are needed to replicate and verify the
   identified genes and the potential relationship with alcohol
   consumption. Lastly, participants who drinking up to 3 times monthly
   and less were assigned a weekly drinking level of zero following a
   previous study^[194]45. While this simplified approach may introduce
   some error into their drinking levels, it is expected to be relatively
   small given the infrequency of their alcohol consumption.

   In conclusion, by sequencing the protein-coding regions, we were able
   to replicate the genes previously reported and identify common and rare
   coding variants that have a strong effect on alcohol consumption.
   Additionally, functional analysis of the identified genes not only
   recapitulated known biological processes in alcohol consumption but
   also provided insights into the brain’s role in alcohol consumption. We
   anticipate that our findings of the alcohol consumption-related genes
   will facilitate the identification of individuals that are vulnerable
   or intolerant to alcohol consumption, contributing eventually to the
   prevention as well as treatment of alcohol-related adverse outcomes.

Methods

UK Biobank

   The UKB included phenotypic and genetic information for approximately
   500,000 participants of ages between 40 and 69^[195]46,[196]47.
   Informed consent has been signed by all participants. The UKB cohort
   was approved by the NHS National Research Ethics Service North West
   (reference number: 16/NW/0274). The data utilized in the study included
   demographic data, alcohol-related phenotypes, neuropsychiatric
   diseases, cardiovascular diseases, cognition, brain grey matter and
   white matter phenotypes, heart function, lung function, biochemistry,
   and inflammation phenotypes. The research was performed under
   application number 19542.

Study phenotypes

   The alcohol consumption score was determined through a
   self-administered touchscreen interview conducted during the baseline
   appointment. Initial data acquisition involved obtaining mean weekly
   alcohol consumption data, taking into account various beverage types,
   from participants reporting alcohol consumption more than once or twice
   weekly. Each alcoholic drink type was measured in specific units:
   spirits in measures, wines in glasses, and beer/cider in pints,
   approximately equating to one, two, and two point five units,
   respectively. For respondents indicating intake frequencies of “one to
   three times a month,” “special occasions only,” or “never” (for whom
   weekly alcohol consumption data were unavailable), a weekly volume of 0
   units was assigned. The determination of alcoholic units per week
   involved aggregating the intakes for these five drink types, consistent
   with a previous study^[197]45. The median alcoholic units per week of
   the whole sample was 10 (Supplementary Data [198]2). The alcohol
   consumption score was the log (units+1) transformed alcoholic units per
   week. Detailed information was available in Supplementary Data [199]1.

Whole exome sequencing data

   WES was performed for approximately 454,756 individuals from the UKB
   with IDT xGen Exome Research Panel v1.0^[200]11,[201]18. We implemented
   centralized quality control following extensive quality control
   procedures following previous research^[202]13. Concisely,
   multi-allelic sites were segregated into bi-allelic sites and calls
   with poor genotype quality or excessively low/high genotype depth were
   marked as no-call. Next, we excluded variants located in Ensembl
   low-complexity regions, along with variants possessing call rate ≤ 90%,
   and Hardy-Weinberg Equilibrium (HWE) P-value ≤ 10^−15. Finally, we
   removed participants who withdrew from the UKB, duplicates,
   participants exhibiting discrepancies between self-reported and
   genetically indicated sex, and participants with Ti/Tv, Het/Hom,
   SNV/indel, and the amount of singletons exceeding 8 standard deviations
   from the mean. Additionally, we excluded individuals who were
   genetically related at the 3rd degree or closer in the main analysis.
   Overall, a total of 304,119 individuals with available alcohol
   consumption data and genetic data passed the initial quality check and
   were used in the main analysis. We additionally conducted ExWAS in all
   (both genetically related and unrelated) white British participants and
   unrelated non-white British individuals. White British individuals were
   identified as the intersection of participants who self-reported as
   ‘White British’ and those who exhibited very similar genetic ancestry
   based on genetic components. To control population stratification, we
   generated the top 10 ancestral principal components (PCs) using a
   high-quality independent autosomal variants subset, as outlined in a
   prior study^[203]13. Specifically, this subset of variants comprised
   variants with MAF > 0.1%, HWE P > 10^−6, missingness < 1%, and
   underwent two rounds of pruning (--indep-pairwise 200 100 0.1 and 200
   100 0.05 in PLINK).

Variant annotation

   First, rare variants were defined as MAF less than 1%. SnpEff was
   utilized to annotate the variants^[204]48, during which the most
   detrimental consequence of the gene transcript was retained.
   Subsequently, variants annotated as frameshift, splicing donor, stop
   gain, splicing acceptor, stop loss, and start loss were categorized as
   loss of function (LOF). Variants that were consistently predicted as
   deleteriousness in SIFT^[205]49, PolyPhen2 HDIV, and PolyPhen2
   HVAR^[206]50, LRT^[207]51, and MutationTaster^[208]52 were defined as
   likely deleterious missense.

ExWAS

   ExWAS analysis was conducted using the SKAT-O test through
   SAIGE-GENE + ^[209]53. In SAIGE-GENE + , ultra-rare variants (minor
   allele carrier (MAC) ≤ 10) were collapsed into a pseudo marker,
   effectively addressing data sparsity caused by the presence of
   ultra-rare variants^[210]53. Therefore, both rare and ultra-rare
   variants could be investigated. First, single-variant association
   analyses were performed for all variants with MAC ≥ 20, as suggested by
   SAIGE-GENE + ^[211]53. Independent significant variants were identified
   using linkage disequilibrium (LD)-clumping (r^2 < 0.1), with the UKB
   WES data utilized as the reference panel, and subsequently mapped to
   genes using VEP^[212]54. Then, in the gene-based collapsing analyses,
   SKAT-O tests were conducted utilizing the minimum p-value
   method^[213]53,[214]55. We used three distinct maximum MAF cutoffs
   (0.01%, 0.1%, and 1%) and two annotations masks (LOF and LOF plus
   missense). We adjusted age, sex, and the top ten ancestral PCs (which
   were calculated with WES data). All quantitative phenotypes underwent
   inverse normalization in SAIGE-GENE + . A relative coefficient cutoff
   of 0.05 was applied to the sparse genetic relationship matrix for the
   estimation of variance ratios.

Genotype and imputation

   Genotype data (version 3) were from the UKB cohort. The UKB conducted
   array design, genotyping, quality control, and imputation
   procedures^[215]46. We performed quality control (excluding variants
   with MAF < 0.005, INFO < 0.3, call rate < 90% or HWE P < 10^−50) with
   PLINK v2^[216]56 software. Additionally, participants with missingness
   less than 0.05, no sex mismatch, no abnormal sex chromosome aneuploidy,
   no outliers in heterozygosity rate, and estimated white British
   ancestry, with a maximum of ten putative third-generation relatives,
   were incorporated into the analysis.

ExWAS for AUDIT

   To extend the implications of alcohol consumption findings to alcohol
   use disorder, we conducted an ExWAS utilizing measures from the Alcohol
   Use Disorders Identification Test (AUDIT)^[217]57, obtained through an
   online mental health questionnaire and processed following the
   methodology detailed in the previous study^[218]58. Specifically, the
   scores for the AUDIT subdomains, representing alcohol consumption
   (AUDIT-C) and indicating alcohol dependence and problematic alcohol use
   (AUDIT-P), were calculated by consolidating scores from items 1–3 and
   items 4–10, respectively. The total score (AUDIT-T) was the sum of
   items 1–10. Detailed information was available in Supplementary
   Data [219]1. A total of 101,240 participants with available AUDIT
   measurements, WES data and covariate information were used for the
   analyses. We conducted ExWAS for both the total score and the
   subscores.

LOVO analysis

   The LOVO analysis was performed for associations identified in the
   gene-based analysis. For each gene-phenotype association, the
   collapsing test was iterated upon excluding each variant initially
   included, where each variant would have a P-value. This was undertaken
   to address specific aspects: firstly, to examine the stability and
   consistency of the results across variant exclusions; secondly, to
   discern whether the gene-based collapsing association results were
   predominantly driven by specific variants; and finally, to investigate
   whether the observed gene-based collapsing associations were influenced
   by numerous rare variants characterized by relatively small effect
   sizes. If the collapsing analysis after removing a single variant
   yields an attenuated significance (P > 0.01), that single variant was
   considered to predominantly drive the gene-phenotype
   association^[220]13. This analytical approach allows for a
   comprehensive evaluation of the role of individual variants within the
   broader gene-based context.

Conditional analysis

   To test for independence between the significant rare variant
   associations and nearby common variation, we re-conducted the
   gene-based collapsing analyses additionally correcting the nearby
   common variants associated with alcohol consumption^[221]13. First, we
   conducted association analyses for common variants (MAF > 0.5%) within
   the 500 kb genomic region of the identified genes, utilizing the UKB
   imputed genotype data. Then, LD-clumping was performed to identify
   independent significant loci (P < 1 × 10^−5 and r^2 < 0.01). At last,
   we performed the collapsing analyses additionally adjusting for the
   independent significant loci.

Burden heritability estimation

   We estimated the burden heritability based on rare coding variants (LOF
   and missense) using the burden heritability regression (BHR)
   method^[222]59. The BHR performed regression of the burden test
   statistic on the burden score using summary statistics of the
   association analysis and allele frequencies at the variant level, and
   derived the burden heritability through estimation of the regression
   slope^[223]59.

Pathway enrichment analysis

   We used the g:Profiler^[224]60 software to conduct the enrichment
   analysis, selecting Gene Ontology and KEGG database as the gene set
   databases. The g:SCS (Set Counts and Sizes) correction method was
   employed for multiple testing correction.

Tissue enrichment and expression analysis

   To determine whether the identified genes were enriched in multiple
   tissues, we conducted tissue enrichment analysis using the R package
   TissueEnrich^[225]22. The source data were from the Human Protein
   Atlas, and the hypergeometric test was used^[226]22.

   Transcript expression levels of the two genes (GIGYF1 and ANKRD12) in
   256 tissues were determined utilizing RNA sequencing data from the
   Human Protein Atlas^[227]24. The dataset corresponds to Human Protein
   Atlas version 22.0 and Ensembl version 103.38. Additional details
   regarding the data are available elsewhere at
   ([228]https://www.ebi.ac.uk/biostudies/arrayexpress/studies/E-MTAB-2836
   ).

Lifespan spatio-temporal gene expression trajectory

   The lifespan spatio-temporal brain expression trajectories of the
   alcohol consumption-related genes were characterized using the mRNA-seq
   data of human brain from the PsychENCODE study^[229]25. The expression
   of each gene in each anatomical tissue was estimated. Gene expression
   levels was quantified utilizing the reads per kilobase per million
   mapped reads (RPKM) metric.

Single-cell expression

   We used liver scRNA-seq data from Gene Expression Omnibus (GEO)
   database (accession ID: [230]GSE115469)^[231]61 and processed it with
   the R package Seurat^[232]62. Individual cells with low quality,
   defined as the cells with less than 200 expressed genes or larger than
   75% mitochondrial counts, were excluded. Then the gene expression
   matrix underwent normalization using the NormalizeData function in
   Seurat^[233]62. The top 25 PCs and a resolution of 0.4 were used to
   conduct clustering, and then the clusters were annotated according to
   the previous publication^[234]61.

   Additionally, the brain scRNA-seq data sourced from temporal cortex
   tissues was obtained from the GEO database under accession ID
   [235]GSE173731^[236]63. In the dataset, all cell types in the brain
   were isolated and sequenced^[237]63. Analysis and visualization were
   performed using the metadata files with the R package Seurat^[238]62.

Gene similarity

   We utilized Gene-SCOUT^[239]23 to estimate the similarities between
   genes using association results of collapsing analyses across various
   quantitative traits in the UKB. In this tool, we searched the “seed
   gene” ANKRD12 to identify the similar genes. The top 10 similar genes
   and the “seed gene” were then employed in the enrichment analysis with
   Gene Ontology terms^[240]23.

MRI data and preprocessing

   Structural MRI data were obtained from three dedicated and identical
   imaging centers^[241]64,[242]65. Preprocessing of this data followed a
   pipeline established in previous studies^[243]66,[244]67 with SPM12
   software and the CAT12 toolbox^[245]68 with default settings. This
   included high-dimensional spatial normalization, nonlinear modulations,
   and smoothing (with an 8 mm half-maximum full-width Gaussian kernel).
   For regional grey matter volume, we employed the Automated Anatomical
   Labeling 3 (AAL3) atlas^[246]69, a brain parcellation system that
   subdivides the brain into 166 distinct regions. We utilized the AAL3
   atlas due to its finer parcellation, especially in the subcortical
   regions, which are closely linked to alcohol use and addiction.

   We utilized fractional anisotropy (FA) of white matter tracts provided
   by UKB. Detailed data processing and quality control procedures have
   been comprehensively outlined in prior study^[247]60. Specifically,
   dual diffusion-weighted shells were employed to acquire
   diffusion-weighted images, incorporating 50 distinct diffusion-encoding
   directions for each shell, and with a resolution of 2 × 2 × 2 mm.
   TBSS^[248]70 was used to conduct the alignment of FA images to a
   standard-space white matter skeleton. FA images was further improved
   with high-dimensional FNIRT-based warping for enhanced
   alignment^[249]71. Our analyses encompassed 48 distinct white matter
   tracts extracted based on the JHU ICBM-DTI-81 atlas^[250]72.

Phenome-wide association analysis

   The phenotypes in PheWAS were centered around traits that are
   associated with alcohol consumption, including behavioral aspects and
   health outcomes. The disease-related analysis covered neuropsychiatric
   diseases, cardiovascular diseases, and digestive diseases, which can be
   impacted by alcohol consumption patterns. Additionally, the analysis
   incorporated cognitive tasks, inflammatory traits, blood biochemistry
   traits, neuroimaging traits (including grey and white matter measures),
   and cardiac and lung function measures, all of which are pertinent to
   understanding the impacts of genes related to alcohol consumption on
   human health and functioning. This comprehensive selection of
   phenotypes aligns with the aim of investigating the potential genetic
   influences on alcohol consumption and its related health implications.
   In the analysis of diseases, we investigated 10 neuropsychiatric
   diseases, 7 cardiovascular diseases, and 19 digestive diseases. For the
   analysis of continuous phenotypes, we examined 10 cognition tasks, 9
   inflammatory traits, 30 blood biochemistry traits, 214 neuroimaging
   traits (including 166 grey matter measures and 48 white matter
   measures), 8 heart structure measures, and 9 spirometry measures.
   Comprehensive details regarding the phenotypes can be found in
   Supplementary Data [251]22. We used single-variant association tests
   for identified variants and SKAT-O tests for identified genes^[252]53,
   adjusting for the top ten ancestral PCs, age, and sex.

   For the cognitive function tasks, data were preprocessed similar to the
   previous study^[253]73. We incorporated cognitive tests from both
   baseline and imaging follow-up. Specifically, we selected the
   timepoints that corresponded to the maximum sample size for each
   cognitive test.

Mendelian randomization analysis

   To explore the mediating relationships between ANKRD12, GIGYF1,
   cognition, and alcohol consumption, we first conducted a bidirectional
   Mendelian randomization (MR) between cognition and alcohol consumption
   using TwoSampleMR R package. We employed GWAS summary data for the
   general factor of intelligence, derived from a compilation of seven
   distinct cognitive tests^[254]74, all sourced from the UK Biobank.
   Ensuring the avoidance of sample overlap, we utilized separate GWAS
   summary data for alcohol consumption, excluding participants from the
   UK Biobank^[255]19.

Sensitivity analysis

   To evaluate the stability of the main results, we conducted multiple
   sensitivity analyses. Initially, we excluded participants who were
   former drinkers and non-drinkers (Field 20117) and performed
   association analysis for the identified genes. Additionally, we
   adjusted for rs1229984, a well-known alcohol consumption-related
   locus^[256]6,[257]21, to identify independent associations.

Reporting summary

   Further information on research design is available in the [258]Nature
   Portfolio Reporting Summary linked to this article.

Supplementary information

   [259]Supplementary Information^ (25.6MB, pdf)
   [260]Peer Review File^ (831.2KB, pdf)
   [261]41467_2024_50132_MOESM3_ESM.pdf^ (228.2KB, pdf)

   Description of Additional Supplementary Files
   [262]Supplementary Data 1-28^ (4.2MB, xlsx)
   [263]Reporting Summary^ (118KB, pdf)

Source data

   [264]Source Data^ (3.8MB, zip)

Acknowledgements