Graphical abstract


Highlights

     * This is a comprehensive study on shared genetic backgrounds of 21
       digestive diseases
     * Genetic correlations and causal relationships among these diseases
       are revealed
     * Shared genetic variants and genes inform potential pathogenesis of
       these diseases
     __________________________________________________________________

   Biological sciences; Health sciences; Human genetics

Introduction

   Digestive disorders have substantially increased the years lived with
   disability.^1 Three digestive malignancies, colorectal cancer (CRC),
   gastric cancer (GC), and liver cancer (LC), rank among the 10 most
   common cancers worldwide by incidence,^2 and half of the 10 cancers
   with the worst prognosis are digestive malignancies: colon, gastric,
   liver, esophageal, and pancreatic cancers.^3 Identifying causal factors
   in the development of digestive disorders is therefore crucial for
   disease prevention. Large-scale genomic studies have examined the
   influence of genetic variation on the risk of a broad range of
   digestive disorders and have uncovered genetic loci for Barrett's
   esophagus (BE) and other digestive disorders.^4 Notably, a substantial
   proportion of heritability is contributed by common variants that
   confer susceptibility to multiple digestive disorders, underscoring the
   complexity and highly polygenic nature of these conditions.^5,6

   Elucidating the causal relationships among disorders and the genes they
   share has considerable implications for disease prevention and
   mechanistic understanding.^7 Likewise, deciphering the functional
   genomics of genetic factors shared across traits helps uncover the
   biological mechanisms of pleiotropic loci and facilitates the
   identification of targets for clinical diagnosis, treatment, and drug
   intervention. Such analyses have been successfully applied to
   psychiatric disorders^8 and across cancer types.^9 For digestive
   disorders, previous genome-wide association studies have identified
   several pleiotropic loci shared among gastroesophageal reflux disease
   (GERD) and severe esophageal and colorectal diseases.^10,11 However,
   these studies did not delve deeply into the underlying mechanisms and
   were limited to a few traits; comprehensive research that
   systematically covers common benign and malignant digestive disorders
   is still lacking.

   Here, we present a cross-trait analysis of a broad range of digestive
   disorders in the UK Biobank (UKB). We address three major questions
   regarding the shared genetic basis of these disorders: 1) the causal
   relationships among these digestive disorders; 2) novel susceptibility
   loci and annotated genes that contribute to the risk of digestive
   disorders through multiple pathways; and 3) the functions of the shared
   genes.

Results

   The flowchart of our study is shown in Figure 1. Based on the
   definitions of disorders in the UKB (Table S1), 21 digestive disorders
   were included. Among noncancer disorders, the number of cases ranged
   from 1,115 for cholangitis (CHATIS) to 43,831 for GERD; among cancers,
   it ranged from 93 for gallbladder cancer (GBC) to 6,015 for CRC
   (Figure 2A; Table S2).

Figure 1.


   Flowchart of the study

   In brief, after delineating the causal relationships among 21 digestive
   disorders in UKB, we used a cross-trait approach to identify variants
   and genes shared across these disorders at different levels in
   individuals of European ancestry.

Figure 2.


   The 21 digestive disorders, heritability, and GWAS findings

   (A) Digestive disorders by anatomical location, with their sample
   sizes.

   (B) Dot plot of heritability estimates and 95% confidence intervals for
   the digestive disorders with more than 1,000 cases.

   (C) Bar plot of the number of genome-wide significant index SNPs
   (p < 5 × 10^−8) from the single-disorder GWAS analyses.

Genome-wide association studies of 21 digestive disorders

   We conducted genome-wide association studies (GWASs) for the 21
   digestive disorders. A total of 204 independent variants reached
   genome-wide significance (p ≤ 5 × 10^−8) for individual disorders
   (Figure 2C), 13 of which were associated with two disorders. Of these
   variants, 113 overlapped with, or were in LD (r^2 ≥ 0.1) with,
   previously identified SNPs, while the remaining 91 were novel
   (r^2 < 0.1) (Table S3). Sixty-nine novel variants were independent of
   previously reported variants but lay in regions already associated
   with digestive disorders; the five disorders with the most such
   variants were cholelithiasis (CHSIS, 44 novel variants), cholecystitis
   (CHETIS, 10), gastric and duodenal polyp (GDP, 3), colorectal polyp
   (CRP, 3), and GBC (3). The remaining 22 novel variants were in regions
   not previously reported for any digestive disorder.

   We estimated SNP-based heritability (h^2_SNP) using linkage
   disequilibrium (LD) score regression on both the observed and liability
   scales, using the proportion of cases in the sample as the estimate of
   lifetime disease risk (Table S4). Of the 15 digestive disorders with
   more than 1,000 cases, 14 had significant SNP-based heritability. Apart
   from CHATIS, the h^2_SNP estimates ranged from 5.83% for CRP to 15.75%
   for esophageal cancer (EC) (Figure 2B).
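   The liability-scale conversion mentioned above can be illustrated with
   a short sketch. It assumes the standard observed-to-liability
   transformation of Lee et al. (2011), which LDSC applies when given
   sample and population prevalences (its --samp-prev and --pop-prev
   options); as in the text, the sample case proportion P also serves as
   the lifetime risk K. The function name and the example inputs are
   hypothetical, not values from this study.

```python
from statistics import NormalDist


def observed_to_liability(h2_obs: float, pop_prev: float, samp_prev: float) -> float:
    """Convert an observed-scale heritability to the liability scale.

    Standard transformation (Lee et al., 2011):
        h2_liab = h2_obs * K^2 * (1 - K)^2 / (P * (1 - P) * z^2)
    K: population (lifetime) prevalence; P: proportion of cases in the sample;
    z: standard normal density at the liability threshold t = Phi^-1(1 - K).
    """
    if not (0 < pop_prev < 1 and 0 < samp_prev < 1):
        raise ValueError("prevalences must lie strictly between 0 and 1")
    std_normal = NormalDist()  # N(0, 1)
    threshold = std_normal.inv_cdf(1 - pop_prev)  # liability threshold t
    z = std_normal.pdf(threshold)  # density at the threshold
    k, p = pop_prev, samp_prev
    return h2_obs * k**2 * (1 - k) ** 2 / (p * (1 - p) * z**2)


# Hypothetical example: an observed-scale estimate of 0.05 for a disorder with
# 43,831 cases among 329,707 participants, taking K = P as in the text.
case_fraction = 43_831 / 329_707
print(round(observed_to_liability(0.05, case_fraction, case_fraction), 4))
```

   With K = P the expression simplifies to h2_obs × K(1 − K) / z^2, so the
   liability-scale value grows quickly as the disorder becomes rarer.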

   Using an LD clumping procedure on the 204 SNPs identified in the GWAS
   analyses, we defined 36 pleiotropic LD blocks (Figure S1; Table S8).
   For 20 of these blocks, there was direct evidence (the index SNP itself
   had previously been reported to be associated with risk of digestive
   disorders) or indirect evidence (the index SNP was in LD (r^2 > 0.1)
   with SNPs previously reported to be associated with risk of digestive
   disorders) of association with the corresponding disorders.
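   The grouping of GWAS hits into LD blocks can be sketched as greedy
   clumping. This is a minimal illustration, not the exact procedure used
   in the study (whose parameters are not given in this section): the
   250 kb window (PLINK's default --clump-kb), the r^2 ≥ 0.1 cutoff, the
   rule that a block is "pleiotropic" when it contains hits from two or
   more disorders, and all SNP IDs and positions are assumptions. Pairwise
   r^2 values are supplied as a lookup table, for example obtained from
   LDlink.

```python
from typing import Dict, FrozenSet, List, NamedTuple, Set


class Hit(NamedTuple):
    snp: str       # variant ID
    pos: int       # base-pair position (all hits assumed to be on one chromosome)
    p: float       # single-disorder GWAS p value
    disorder: str  # disorder in which the hit was found


def ld_r2(a: str, b: str, r2: Dict[FrozenSet[str], float]) -> float:
    """Look up pairwise r^2; a variant is in perfect LD with itself, unknown pairs get 0."""
    return 1.0 if a == b else r2.get(frozenset((a, b)), 0.0)


def clump_hits(hits: List[Hit], r2: Dict[FrozenSet[str], float],
               r2_min: float = 0.1, window_bp: int = 250_000) -> List[List[Hit]]:
    """Greedy clumping: the most significant remaining hit becomes an index
    variant and absorbs every hit within the window with r^2 >= r2_min."""
    remaining = sorted(hits, key=lambda h: h.p)
    blocks: List[List[Hit]] = []
    while remaining:
        index, rest = remaining[0], remaining[1:]
        block, remaining = [index], []
        for h in rest:
            near = abs(h.pos - index.pos) <= window_bp
            if near and ld_r2(index.snp, h.snp, r2) >= r2_min:
                block.append(h)
            else:
                remaining.append(h)
        blocks.append(block)
    return blocks


def pleiotropic_blocks(blocks: List[List[Hit]]) -> List[Set[str]]:
    """Assumed rule: a block is pleiotropic if its hits span two or more disorders."""
    disorder_sets = [{h.disorder for h in block} for block in blocks]
    return [ds for ds in disorder_sets if len(ds) >= 2]


# Hypothetical hits (IDs and positions are made up for illustration).
hits = [
    Hit("rsA", 1_000, 1e-12, "CHSIS"),
    Hit("rsB", 5_000, 1e-9, "GERD"),
    Hit("rsC", 900_000, 1e-10, "CRC"),
]
r2_table = {frozenset(("rsA", "rsB")): 0.4}
blocks = clump_hits(hits, r2_table)
print([[h.snp for h in b] for b in blocks])  # [['rsA', 'rsB'], ['rsC']]
print(pleiotropic_blocks(blocks))
```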

Genetic causal relationships across 21 digestive disorders

   Of the 91 pairs formed by the 14 disorders with significant
   heritability, 64 showed significant positive genetic correlations after
   Bonferroni correction (p ≤ 0.05/91), and another 18 had nominally
   significant genetic correlations (p ≤ 0.05), indicating a considerable
   shared genetic basis for the complex relationships among these
   disorders (Figure 3A; Table S5). Moreover, Bayesian network analysis
   identified 53 high-confidence causal relationships among the digestive
   disorders (Table S6), 32 of which were between pairs with positive
   genetic correlations. Noncancerous digestive disorders showed complex
   pathogenic interactions with one another and, in turn, with multiple
   types of digestive cancers (Figure 3B).
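   As a worked check of the Bonferroni threshold above: 14 disorders give
   14 × 13 / 2 = 91 unordered pairs, so the per-test cutoff is
   0.05/91 ≈ 5.49 × 10^−4. The short sketch below applies the two
   thresholds used in the text; the function name is hypothetical.

```python
from math import comb

ALPHA = 0.05
N_DISORDERS = 14  # disorders with significant SNP-based heritability

n_pairs = comb(N_DISORDERS, 2)  # unordered disorder pairs: 14 * 13 / 2 = 91
bonferroni_threshold = ALPHA / n_pairs


def classify_rg(p: float) -> str:
    """Label a pairwise genetic-correlation p value using the two thresholds in the text."""
    if p <= bonferroni_threshold:
        return "significant after Bonferroni correction"
    if p <= ALPHA:
        return "nominally significant"
    return "not significant"


print(n_pairs, f"{bonferroni_threshold:.2e}")  # prints: 91 5.49e-04
```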

Figure 3.


   Genetic correlation and causal inference by Bayesian network

   (A) Genetic correlations among the 14 digestive disorders with more
   than 1,000 cases and significant heritability. “∗” denotes genetic
   correlations that are significant after Bonferroni correction
   (p < 0.05/91). The color and size of each square scale with the
   genetic correlation between the pair of disorders.

   (B) Causal network of the 21 disorders, constructed from the
   intersection of the Bayesian network and Mendelian randomization
   results, revealing complex genetic relationships. Orange nodes indicate
   cancers and dark blue nodes indicate noncancerous digestive disorders;
   edges indicate the inverse-variance weighted (IVW) estimates.

   Further, to validate the causal relationships inferred by the Bayesian
   network, we performed Mendelian randomization (MR) for all
   relationships on the network (Table S7). The 49 relationships (arcs)
   that were replicated with statistical significance in MR analysis
   (p ≤ 0.05) were retained on the network (Figures 3B and 4; Table S6).
   Moreover, 48 of the retained arcs were additionally confirmed using the
   generalized summary-data-based MR (GSMR) method, and 42 of these were
   further confirmed using the median-based MR and maximum-likelihood
   methods^12 (Table S6).
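   For readers unfamiliar with the IVW estimator referenced in the
   Figure 3 legend, the sketch below computes a fixed-effect IVW estimate
   from per-SNP summary statistics. It is a simplified illustration with
   hypothetical inputs; the MendelianRandomization R package listed in the
   key resources table offers both fixed- and random-effects versions, and
   the exact model options used in this study are not stated in this
   section.

```python
from math import sqrt
from typing import Sequence, Tuple


def ivw_estimate(
    beta_exposure: Sequence[float],
    beta_outcome: Sequence[float],
    se_outcome: Sequence[float],
) -> Tuple[float, float]:
    """Fixed-effect inverse-variance weighted (IVW) MR estimate and its SE.

    Each instrument i contributes the Wald ratio beta_outcome[i] / beta_exposure[i],
    weighted by beta_exposure[i]^2 / se_outcome[i]^2 (first-order weights).
    """
    if not (len(beta_exposure) == len(beta_outcome) == len(se_outcome)) or not beta_exposure:
        raise ValueError("need equal-length, non-empty summary statistics")
    numerator = sum(bx * by / se**2 for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    denominator = sum(bx**2 / se**2 for bx, se in zip(beta_exposure, se_outcome))
    return numerator / denominator, sqrt(1.0 / denominator)


# Hypothetical instruments: log-odds effects of three independent SNPs on the
# exposure disorder and on the outcome disorder.
estimate, se = ivw_estimate([0.10, 0.20, 0.30], [0.05, 0.10, 0.15], [0.01, 0.02, 0.01])
print(round(estimate, 3), round(se, 4))  # prints: 0.5 0.0302
```

   Because every Wald ratio in this toy example equals 0.5, the pooled
   estimate is 0.5; real instruments disagree, which is why median-based
   and likelihood-based estimators are useful sensitivity checks.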

Figure 4.


   Causal relationships inferred by Mendelian randomization

   Causal effects of each exposure disorder (left y axis) on each outcome
   disorder (right y axis) are shown as point estimates (dots) with 95%
   confidence intervals (horizontal lines). Panels A, B, C, and D show the
   results for exposure disorders of the esophagus, the stomach and
   duodenum, the liver, bile ducts, and pancreas, and the intestines,
   respectively.

Pleiotropic genetic variants of 21 digestive disorders

   To obtain further evidence for shared genetic factors, we performed a
   cross-disorder meta-analysis of the GWAS results for the 21 digestive
   disorders using ASSET. Of the 7,337 variants with p ≤ 1 × 10^−4 in the
   single-disorder GWASs, 539 were pleiotropic variants that passed the
   genome-wide significance threshold (P_meta ≤ 5 × 10^−8) (Figure 5;
   Tables S9 and S10), 176 of which had been reported previously
   (Table S12). Of the pleiotropic variants, 75% (404/539) and 74%
   (398/539) were related to the gallbladder disorders CHSIS and CHETIS,
   respectively. GERD, which had the largest number of cases, was
   associated with 62% (332/539) of the pleiotropic variants.

Figure 5.


   Manhattan plot of cross-disorder meta-analysis

   The x axis represents genomic position (chromosomes 1–22), and the y
   axis represents statistical significance as -log10(P) for the overall
   test. SNPs with genome-wide significance lie above the horizontal red
   line, which corresponds to the significance threshold of 5 × 10^−8.
   SNPs highlighted in orange are associated with two subsets of disorders
   with opposite directions of association; those in yellow are novel
   SNPs. Red circles represent pleiotropic SNPs associated with multiple
   disorders, plotted at the minimum of their GWAS p values.

   Among the 539 pleiotropic SNPs, 498 had effects in a consistent
   direction across the associated disorders, whereas 41 had effects in
   opposite directions. Further, using the same criteria as in the GWAS
   analyses, we defined 114 LD blocks shared between disorders
   (Table S11).

Functional characterization of pleiotropic variants

   We first annotated the SNPs by their physical location in the genome.
   Of the 539 SNPs, 28 (5%) were in exonic regions (Figure S2A), 22 of
   which were nonsynonymous. This is consistent with previous findings
   that most (∼93%) disease-associated SNPs in the GWAS Catalog lie in
   noncoding regions.^13

   To comprehensively investigate the regulatory roles of the variants, we
   systematically annotated all SNPs for functional evidence (Figure S2B).
   Specifically, 128 SNPs interacted with target genes through 3D
   chromatin loops, 209 were located in or near super-enhancers or
   promoters, and 335 acted as eQTLs for target genes. Overall, 394 SNPs
   (73%) had at least one type of functional annotation, and 71 SNPs (13%)
   had support from all three types. Detailed annotations are provided in
   Tables S9 and S10. The abundance of functional annotations for these
   SNPs suggests that the pleiotropic SNPs may play essential roles in
   digestive disorders through functional regulation. Genes were annotated
   from these different sources (Figure S2C), yielding 1,381 unique
   candidate genes for subsequent analyses.

Pleiotropic genes shared among digestive disorders

   We tested the 1,381 candidate genes implicated by the SNP annotation of
   the cross-trait meta-analysis results (Table S13). After false
   discovery rate (FDR) correction for multiple testing (FDR-q ≤ 0.05),
   736 genes located in 146 independent genomic regions showed pleiotropic
   effects across 14 disorders (Figure S3A; Table S13). About half of
   these pleiotropic genes (369/736) were associated with at least three
   digestive disorders in the aggregated Cauchy association test (ACAT)
   analysis.
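   The gene-level workflow described above (combine SNP-level evidence per
   gene with ACAT, then control the FDR across genes) can be sketched as
   follows. The acat function implements the published Cauchy combination
   formula, T = Σ w_i tan((0.5 − p_i)π) with weights normalized to sum to
   one and p = 0.5 − arctan(T)/π; it is a Python re-expression, not the
   ACAT R package itself. The equal weights, the numerical cut-offs for
   extreme values, and the gene names and p values are assumptions. The q
   values follow the Benjamini-Hochberg procedure, which is assumed here
   to be the FDR method used.

```python
import math
from typing import List, Optional, Sequence


def acat(pvalues: Sequence[float], weights: Optional[Sequence[float]] = None) -> float:
    """Aggregated Cauchy association test: combine p values into one p value."""
    if not pvalues:
        raise ValueError("need at least one p value")
    w = list(weights) if weights is not None else [1.0] * len(pvalues)
    total = sum(w)
    stat = 0.0
    for p, wi in zip(pvalues, w):
        if not 0 < p <= 1:
            raise ValueError("p values must lie in (0, 1]")
        # tan((0.5 - p) * pi) ~ 1 / (p * pi) for tiny p; use the limit to avoid precision loss
        term = 1 / (p * math.pi) if p < 1e-16 else math.tan((0.5 - p) * math.pi)
        stat += wi / total * term
    if stat > 1e15:
        return 1 / (stat * math.pi)  # Cauchy upper-tail approximation for huge statistics
    return 0.5 - math.atan(stat) / math.pi


def bh_qvalues(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg FDR-adjusted q values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # from the largest p value down to the smallest
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q


# Hypothetical per-gene inputs: p values of the SNPs annotated to each gene.
gene_snp_pvalues = {
    "GENE_A": [1e-9, 0.02, 0.4],
    "GENE_B": [0.03, 0.5],
    "GENE_C": [0.6, 0.8, 0.9],
}
genes = list(gene_snp_pvalues)
gene_p = [acat(gene_snp_pvalues[g]) for g in genes]
gene_q = bh_qvalues(gene_p)
significant = [g for g, q in zip(genes, gene_q) if q <= 0.05]
print(significant)  # ['GENE_A']
```

   The Cauchy combination is attractive here because its null
   distribution stays approximately valid when the combined SNP p values
   are correlated, as SNPs annotated to the same gene often are.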

   To better understand the pathogenic mechanisms of these genes, we
   divided the 736 pleiotropic genes into two categories according to the
   ACAT results: 690 noncancer-related and 46 cancer-related pleiotropic
   genes. The top noncancer-related pleiotropic genes were MCCD1,
   ATP6V1G2, and LTA, each shared among seven digestive disorders in the
   ACAT analysis, followed by 17 genes detected in six digestive disorders
   (Figure 6A). Notably, most of the top genes (PSORS1C2, TCF19,
   XXbac-BPG299F13.17, HLA-C, HLA-B, MCCD1, ATP6V1G2-DDX39B, DDX39B,
   SNORD117, SNORD84, DDX39B-AS1, ATP6V1G2, NFKBIL1, LTA, TNF, and
   HLA-DQA2) were located at 6p21.33 in the major histocompatibility
   complex (MHC) region.

Figure 6.


   Pleiotropic genes annotation

   (A) Noncancer-related genes shared among digestive disorders in the
   ACAT analysis.

   (B) Cancer-related genes shared among digestive disorders in the ACAT
   analysis. Label colors in A and B indicate the biological type and
   tissue of each gene.

   (C) Upset plot showing the overlap of pleiotropic cancer-related genes
   identified in the gene-based analysis for different digestive
   disorders.

   Approach by which each gene was annotated:

   a: annotated as a novel gene.
   b: positionally mapped from SNPs.
   c: mapped through three-dimensional chromatin looping.
   d: mapped to a super-enhancer/promoter.
   e: mapped by an eQTL relationship.
   f: identified by organ-specific SMR analysis.
   g: identified by whole-blood SMR analysis.
   h: identified by cross-organ SMR analysis.

   We highlighted 46 cancer-related pleiotropic genes associated with
   three digestive cancers: EC (1 gene), small intestinal cancer (SIC,
   4 genes), and CRC (41 genes) (Figures 6C and 6D). Of these genes, 14
   were novel, 39 were annotated by position, 3 interacted with SNPs
   through 3D chromatin loops, 16 were target genes of
   super-enhancers/promoters, and 27 were annotated through eQTLs.

   Further, we explored whether the effects of genetic variants on
   disorder risk were mediated by altered expression of the corresponding
   genes. The summary-data-based MR (SMR) test was performed for the 14
   disorders with shared genes identified in the ACAT analysis, using
   1,023 probes that had at least one cis-eQTL at P_eQTL ≤ 5 × 10^−8.
   After the heterogeneity in dependent instruments (HEIDI) test, we
   retained the results for gene-trait pairs that had been identified in
   the ACAT analysis (Table S14). We identified 65 associations between
   gene expression and disorder risk with FDR-q ≤ 0.05 in the
   organ-specific tissue, 299 in whole blood, and 184 in the cross-tissue
   analysis (Figures S3B–S3D). Of these, the expression of 130 genes was
   associated with two or more disorders. These analyses support the
   notion that pleiotropic genetic variants and the transcription of the
   corresponding genes contribute to the risk of multiple digestive
   disorders.
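   The SMR test referred to above uses the top cis-eQTL of a gene as an
   instrument for its expression. A minimal sketch of the test statistic,
   following the approximation in the original SMR method (Zhu et al.,
   2016): b_xy = b_zy / b_zx and
   T_SMR = z_zx^2 z_zy^2 / (z_zx^2 + z_zy^2), compared with a χ^2
   distribution with 1 degree of freedom. The HEIDI test is not shown, the
   function and inputs are hypothetical, and the study itself used the SMR
   software listed in the key resources table.

```python
import math
from typing import NamedTuple


class SMRResult(NamedTuple):
    b_xy: float   # estimated effect of expression (x) on disorder risk (y)
    t_smr: float  # approximate chi-square statistic with 1 df
    p: float


def smr_test(b_zx: float, se_zx: float, b_zy: float, se_zy: float) -> SMRResult:
    """SMR test for one gene, using its top cis-eQTL z as the instrument.

    b_zx/se_zx: eQTL effect of SNP z on expression; b_zy/se_zy: GWAS effect of z on the disorder.
    """
    z_zx = b_zx / se_zx
    z_zy = b_zy / se_zy
    t_smr = (z_zx**2 * z_zy**2) / (z_zx**2 + z_zy**2)
    p = math.erfc(math.sqrt(t_smr / 2))  # P(chi2_1 > t) = erfc(sqrt(t / 2))
    return SMRResult(b_zy / b_zx, t_smr, p)


# Hypothetical summary statistics for one gene-disorder pair.
result = smr_test(b_zx=0.50, se_zx=0.05, b_zy=0.04, se_zy=0.01)
print(result)
```

   The statistic is dominated by the weaker of the two z scores, so a
   strong eQTL cannot compensate for a weak GWAS signal (or vice versa).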

Functional enrichment analysis of pleiotropic genes

   We performed Gene Ontology (GO) enrichment and Kyoto Encyclopedia of
   Genes and Genomes (KEGG) pathway analyses separately for the
   noncancer-related and cancer-related genes to explore shared biological
   functions and pathways related to digestive disorders (Tables S15, S16,
   S17, and S18). The top 10 significant GO results and all significant
   KEGG results are shown in Figure S4.

   For noncancer-related genes, GO enrichment analysis showed enrichment in
   biological processes (BPs) related to chronic inflammation and immune
   responses, such as cellular response to interferon-gamma. These genes
   were also significantly enriched in cellular components (CCs) related
   to intestinal inflammation,^14 such as integral component of
   endoplasmic reticulum membrane. For molecular function, they were
   enriched in MHC class II receptor activity, peptide antigen binding,
   and glucuronosyltransferase activity. In the KEGG analysis, they were
   enriched in pathways with potential roles in the digestive system, such
   as antigen processing and presentation, bile secretion, and the
   intestinal immune network for IgA production.

   For cancer-related genes, the top significant BPs were related to
   epithelial-mesenchymal transition (EMT), such as regulation of
   epithelial to mesenchymal transition; EMT is known to be crucial for
   malignant progression.^15 Interestingly, the top CC was the laminin
   complex; laminins may act as regulators of cancer stem cells and play
   an instrumental role in long-term cancer maintenance, metastasis
   development, and therapeutic resistance.^16 In the KEGG analysis, the
   significant results were related to developmental pathways (TGF-β and
   Hippo) and signaling pathways regulating pluripotency of stem cells.
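   GO and KEGG over-representation results such as these are typically
   based on a one-sided hypergeometric test; to our understanding, this is
   what clusterProfiler's enrichGO and enrichKEGG functions compute before
   multiple-testing adjustment. The sketch below computes that p value for
   a single term; the function name and all counts are hypothetical.

```python
from math import comb


def overrepresentation_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided hypergeometric p value P(X >= k).

    N: genes in the background universe; K: background genes annotated to the term;
    n: genes in the input list; k: input genes annotated to the term.
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("inconsistent counts")
    # Sum the probabilities of observing k or more term genes in a random draw of n genes.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / comb(N, n)


# Hypothetical term: 12 of 690 input genes fall in a 150-gene term, against a
# background of 18,000 annotated genes.
p = overrepresentation_p(k=12, n=690, K=150, N=18_000)
fold = (12 / 690) / (150 / 18_000)
print(f"p = {p:.3g}, fold enrichment = {fold:.2f}")
```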

   Taken together, these results suggest that the pleiotropic genes are
   closely related to the digestive system and to cancer.

Drug-gene interactions related to digestive disorders

   As described previously, we detected 1,812 unique drugs with drug-gene
   interactions involving the target pleiotropic genes. The top ten
   drug-related genes were EHMT2, KCNH2, SMAD3, FEN1, ABCB1, MPHOSPH8,
   TNF, CNR1, UGT1A1, and CYP19A1. Of these drugs, 66 are indicated for
   digestive disorders (Table S19).

Discussion

   To our knowledge, this is the first study to comprehensively
   investigate the causal relationships and shared genetic factors across
   21 digestive disorders, using data from 329,707 individuals of European
   ancestry in the UKB. Specifically, we identified 49 causal
   relationships among the digestive disorders and detected 539
   pleiotropic SNPs enriched for regulatory functions, which mapped to
   pleiotropic genes including 46 target genes shared across digestive
   cancers and noncancerous digestive disorders. Our findings provide new
   insights into the etiology and causality of digestive disorders.

   The broad pairwise genetic correlations reflect a shared genetic basis
   across these digestive disorders and prompted us to explore a
   phenotypic causal network, which was further supported by MR, an
   established causal inference method. Twenty of the 21 disorders were
   included in the causal network; SIC was not, partly because of its
   insufficient number of cases. Using genetic methods, we replicated
   several relationships previously recognized in clinical and
   experimental studies, such as those between CHSIS and cancer^17 and
   between BE and EC.^18 We also found a significant genetic correlation
   between irritable bowel syndrome (IBS) and inflammatory bowel disease
   (IBD), unlike a previous study that removed IBD cases from IBS cases
   and may therefore have missed their potential overlap.^10 Moreover,
   some relationships, such as that between IBS and CRC, were evaluated
   here for the first time, with both MR and the Bayesian causal network
   providing new clues to their genetic basis. Most importantly, this
   study indicates potential causal relationships from noncancerous
   disorders to cancers, such as from BE to EC and from IBS and CRP to
   CRC, which may inform cancer prevention and warrant further
   investigation.

   Our study identified pleiotropic LD blocks for digestive disorders, most
   of which had been reported in previous GWASs of digestive disorders,
   supporting the reliability of our results. Interestingly, a
   considerable number of blocks were located at 2p21. The leading
   independent variant in this region, rs56266464, which was shared among
   GERD, gastritis and duodenitis (GDS), CHETIS, and CHSIS, lies in a
   super-enhancer of ABCG5 and ABCG8; these genes function in cholesterol
   secretion, and their mutation may contribute to sterol
   accumulation.^19 Insights into these complex relationships may inform
   personalized treatment strategies, guide drug development, and
   facilitate early diagnosis and risk assessment, ultimately supporting
   more accurate and individualized clinical decision-making.

   The cross-trait meta-analysis detected more than twice as many signals
   as the single-disorder GWASs, substantially increasing statistical
   power, especially for disorders with small sample sizes. The top 12
   variants were all at 2p21; their associations were more tissue
   specific and concentrated in hepatobiliary and pancreatic diseases
   (liver and intrahepatic bile duct cancer, CHATIS, bile duct cancer,
   CHETIS, CHSIS, GBC, and pancreatic cancer). The top novel variant,
   13:29549405:AT:A, was associated with 20 digestive disorders; it is an
   intronic variant located at 13q12.3, ∼50 kb upstream of the gene
   encoding microtubule-associated scaffold protein 2. This region has
   previously been associated with non-alcoholic fatty liver disease
   (NAFLD)^20 and peptic ulcer.^10 Moreover, this gene has been reported
   to regulate entotic cell-in-cell formation, a nonapoptotic cell death
   process that occurs in human tumors.^21 Considered together with the
   causal network, the pleiotropic variants may account for many of the
   causal pathways. Similarly, rs760077, a missense variant of MTX1 at
   1q22, is in high LD (r^2 = 0.72) with rs2075570, which has been
   reported to be associated with susceptibility to CRC^22 and GC^23 in
   European populations; rs760077 was significantly associated with the
   risk of nine digestive disorders (esophageal ulcer, BE, gastric and
   duodenal ulcer, GC, IBD, CRP, liver fibrosis and cirrhosis, bile duct
   cancer, and GBC), which are also connected in the causal network.

   Notably, we identified 41 variants with effects in opposite directions
   across digestive disorders, even though some of these disorders showed
   positive genetic correlations and causal relationships. Nine of these
   variants are located in the human leukocyte antigen (HLA) region
   (6p21.3), which is immune-related, highly polymorphic, and has complex
   associations with digestive disorders of differing pathology.^24 Other
   regions also harbored several bidirectional variants. Consistent with
   our results, rs1260326 in an exon of GCKR at 2p23.3 has been reported
   to have an effect on gallstone disease (effect allele: T, OR = 0.89)^25
   opposite to its effects on other disorders, including NAFLD (effect
   allele: T, OR = 1.28),^26 IBD (effect allele: T, OR = 1.38),^27,28
   Crohn's disease, and ulcerative colitis (effect allele: T,
   OR = 1.046).^29 Such heterogeneity of effects among digestive disorders
   may help clarify cross-trait genetic relationships.

   Notably, the pleiotropic variants were annotated both positionally and
   functionally to maximize the list of potential genes involved in the
   risk of digestive disorders, especially for noncoding variants.^30 In
   the gene-level analyses, many significant genes have previously been
   shown to play important roles in the pathogenesis of digestive
   disorders. ATP6V1G2, which is linked to seven digestive disorders
   (GERD, GDS, GDP, IBD, IBS, CHETIS, and CHSIS), plays a significant role
   in human energy metabolism, induces oxidative stress, and has been
   considered a risk gene for CRC.^31 LTA (lymphotoxin alpha), which is
   associated with the same seven disorders, is a member of the tumor
   necrosis factor family, is among the master regulators of intestinal
   lymphoid development,^32 and has been suggested to play a larger role
   in esophageal metaplasia.^33 The gene encoding inter-alpha-trypsin
   inhibitor heavy chain 4, located at 3p21.1 and associated with six
   digestive disorders (GERD, GDS, GDP, CRP, CHETIS, and CHSIS), has been
   reported to be related to the growth of early colorectal adenomas.^34

   Moreover, we highlighted the 46 genes that may drive the progression
   from digestive disorders to cancers. Notably, the top three
   protein-coding genes (TMEM110-MUSTN1, TMEM110, and SFMBT1) were shared
   among six disorders: GERD, GDS, GDP, CRP, CHSIS, and CRC.
   TMEM110-MUSTN1 has not previously been linked to digestive disorders
   but has been identified as a putative marker for lung
   adenocarcinoma.^35 TMEM110, also known as STIMATE, is likewise novel
   for digestive disorders but has been found to regulate STIM1
   activation, which can promote tumor growth and metastasis in a variety
   of cancer types.^36 SFMBT1, a previously reported gene, has shown
   potential oncogenic function in in vitro functional assays in multiple
   CRC cell lines.^37 These findings provide a deeper understanding of the
   genetic mechanisms and pathogenesis underlying digestive disorders.
   Whether these genes act through similar pathways in different
   digestive disorders, and whether their roles vary across organs,
   warrants further exploration.

   Functional enrichment analysis showed that genes related to benign
   digestive tract traits were mainly enriched in pathways related to
   chronic inflammation and immune response, broadly consistent with a
   previous review.^38 In contrast, genes related to malignant digestive
   tract traits were enriched in important carcinogenesis-related
   pathways, including the TGF-β and Hippo signaling pathways and
   signaling pathways regulating pluripotency of stem cells. Several modes
   of crosstalk between TGF-β family signaling and Hippo signaling have
   been shown to regulate the proliferation, invasion, and migration of
   cancer cells.^39 In this study, cancer-related genes enriched in these
   pathways were also associated with precancerous conditions, which may
   provide clues for identifying cancer progression and metastasis. At
   the same time, genes that jointly drive benign and malignant diseases
   may provide new insights into disease prevention and clinical
   treatment.

   Specifically, these cancer-related genes were also associated with
   benign disorders, possibly contributing to a pro-oncogenic environment
   and indicating biological mechanisms shared between these diseases and
   cancer development. Dissecting the molecular mechanisms of these shared
   effects through gene- and pathway-level analyses may help identify key
   factors that influence cancer progression and metastasis. Moreover,
   driver genes that act consistently across multiple diseases, as
   observed in this study, may play pivotal roles in the pathological
   processes of various conditions, making them candidate targets for new
   treatment and preventive strategies. We also provide further evidence
   supporting existing drugs for the treatment of digestive disorders.
   For instance, several experimental studies have identified the
   importance of EHMT2 (also known as G9a) in multiple digestive
   disorders, including GC,^40 LC,^41 and CRC.^42

   Our study has several strengths. First, it comprehensively
   investigates the relationships among a broad range of digestive
   disorders, in both phenotypic and genetic aspects, from single
   disorders to multiple disorders, and from statistical association to
   causal inference. This comprehensiveness not only enriches our
   understanding of the interactions between diseases but also informs the
   development of more accurate prevention and treatment strategies.
   Second, our study uncovers several novel and crucial genetic variants
   and genes that contribute to the causal pathways of multiple digestive
   disorders, providing new clues for further mechanistic and functional
   research. Third, we examined the relationship between chronic diseases
   and gastrointestinal tumors through gene-based analysis and identified
   pleiotropic genes with notable biological functions. Because
   gastrointestinal tumors may develop from chronic diseases, focusing on
   the genes they share may help identify high-risk individuals carrying
   risk alleles that increase cancer susceptibility.

   In summary, our study substantiates the extensive genetic correlations
   and causal relationships among 21 digestive disorders, identifying
   shared genetic factors and elucidating the underlying biological
   mechanisms among these conditions. These findings provide insights into
   the etiology, causal relationships, and potential drug targets for
   clinical interventions.

Limitations of the study

   We also acknowledge the limitations of this study. First, the number of
   cases for individual disorders varied from 93 for GBC to 43,831 for
   GERD. The small sample sizes of some disorders limited the power to
   detect pleiotropic effects, and the imbalance in sample sizes may
   inflate type I error rates. Second, we included only individuals of
   European ancestry to avoid potential confounding due to ancestral
   heterogeneity across distinct disorder studies; the signals therefore
   need to be evaluated in non-European populations. Third, the functional
   clues in this study come from bioinformatic explorations of public
   databases and warrant validation in well-designed experimental studies.
   Fourth, this study did not specifically explore the role of epigenetic
   factors, which requires more in-depth analysis.

STAR★Methods

Key resources table

   REAGENT or RESOURCE | SOURCE | IDENTIFIER

   Biological samples
   UK Biobank: 57471 | UK Biobank | https://www.ukbiobank.ac.uk/

   Deposited data
   eQTL data from eQTLGen | eQTLGen | https://eqtlgen.org/
   eQTL data from GTEx | GTEx | https://www.gtexportal.org/home

   Software and algorithms
   PLINK v1.90 | – | http://pngu.mgh.harvard.edu/purcell/plink/
   LDSC | – | https://github.com/bulik/ldsc
   ANNOVAR | – | https://annovar.openbioinformatics.org/en/latest/user-guide/download/
   bedtools | – | https://code.google.com/archive/p/bedtools/
   Summary-data-based Mendelian Randomization | – | https://yanglab.westlake.edu.cn/software/smr/
   LDlink R package | – | https://github.com/CBIIT/LDlinkR
   bnlearn R package | – | https://github.com/cran/bnlearn
   MendelianRandomization R package | – | https://cran.r-project.org/web/packages/MendelianRandomization/
   GSMR R package | – | https://yanglab.westlake.edu.cn/software/gsmr/
   ASSET R package | – | https://dceg.cancer.gov/tools/analysis/asset
   ACAT R package | – | https://github.com/yaowuliu/ACAT
   clusterProfiler R package | – | https://bioconductor.org/packages/release/bioc/html/clusterProfiler.html
   Original code | This paper | Zenodo: https://doi.org/10.5281/zenodo.8405925

Resource availability

Lead contact

   Further information and requests for resources should be directed to
   and will be fulfilled by the lead contact, Yongyue Wei
   (ywei@pku.edu.cn).

Materials availability

   This study did not generate new unique reagents.

Data and code availability

   All data used in this study are from public databases. Data
   supporting the main findings of this study are accessible via the UK
   Biobank under application number 57471. Other data can be obtained
   from GTEx and eQTLGen. Download URLs are listed in the [155]key
   resources table.

   Original code has been deposited at Zenodo and is publicly available as
   of the date of publication. DOIs are listed in the [156]key resources
   table.

   Any additional information required to reanalyze the data reported in
   this paper is available from the [157]lead contact upon request.

Experimental model and subject details

Study population

   The data were obtained from the UKB cohort (Proposal ID: 57471). UKB is
   a population-based longitudinal cohort of ∼500,000 individuals
   recruited at 22 centers across the United Kingdom.[158]^43 The UKB
   phenotypes were derived from the following data field IDs: self-report
   (20001, cancer code; 20002, noncancer illness code), ICD10 (41270,
   diagnoses in ICD10; 40001, underlying (primary) cause of death in
   ICD10), ICD9 (41271, diagnoses in ICD9) ([159]Figure 2A and
   [160]Table S1). Individuals with any other disorder of the digestive
   system, identified from the above data fields, were excluded, and the
   remaining individuals were defined as controls ([161]Table S2).
   Analyses were restricted to individuals classified as ‘Caucasian’ in
   Field ID 22006 to reduce population stratification. Kinship was
   inferred with KING software using default parameters.[162]^44 After
   filtering, 329,707 European individuals, comprising 116,382 cases with
   at least one digestive disorder and 213,325 controls, were retained.
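   The case/control assignment described above can be sketched in Python
   as follows. This is an illustrative sketch only: the code prefixes in
   STUDIED and DIGESTIVE are hypothetical placeholders (the actual
   definitions are in Table S1 and Table S2), and each participant's codes
   are assumed to have already been pooled across the self-report, ICD10,
   ICD9, and cause-of-death fields and mapped to ICD-10-style strings.

```python
# Hypothetical mapping from studied disorder to ICD-10 code prefixes
# (placeholders only; the real definitions are in Table S1).
STUDIED = {
    "CRC": ("C18", "C19", "C20"),
    "GC": ("C16",),
}
# Hypothetical prefixes flagging any disorder of the digestive system.
DIGESTIVE = ("K", "C15", "C16", "C17", "C18", "C19", "C20", "C22", "C25")


def assign_status(codes):
    """Return (status, disorders) for one participant's pooled diagnosis codes."""
    disorders = sorted(
        name
        for name, prefixes in STUDIED.items()
        if any(code.startswith(prefixes) for code in codes)
    )
    if disorders:
        # Case for at least one studied digestive disorder.
        return "case", disorders
    if any(code.startswith(DIGESTIVE) for code in codes):
        # Some other digestive disorder: neither a case nor a control.
        return "excluded", []
    return "control", []
```

   Under this scheme every retained participant is either a case or a
   control, which is consistent with the reported totals
   (116,382 + 213,325 = 329,707).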

GWAS statistics

   Genotyping was conducted using either the UKB Axiom array or the UK
   BiLEVE array.[163]^45 We excluded SNPs with imputation accuracy (Info)
   score < 0.8, minor allele frequency (MAF) < 0.01, Hardy-Weinberg
   equilibrium test P value < 1.0 × 10^-6, or missing genotype rate > 0.05
   using PLINK 1.9,[164]^46 leaving 8,573,123 variants for the following
   analyses.
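   A minimal sketch of the variant filters, assuming each variant's INFO
   score, MAF, HWE P value, and missingness have already been gathered
   into one record (the Variant class is hypothetical). In PLINK 1.9 the
   last three filters correspond to the standard --maf 0.01, --hwe 1e-6,
   and --geno 0.05 options, although the exact commands are not given in
   the text.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    rsid: str
    info: float     # imputation accuracy (INFO) score
    maf: float      # minor allele frequency
    hwe_p: float    # Hardy-Weinberg equilibrium test P value
    missing: float  # genotype missingness rate


def passes_qc(v: Variant) -> bool:
    """True if the variant survives all exclusion criteria described above."""
    return (
        v.info >= 0.8
        and v.maf >= 0.01
        and v.hwe_p >= 1.0e-6
        and v.missing <= 0.05
    )


def filter_variants(variants):
    """Keep only variants that pass every QC filter."""
    return [v for v in variants if passes_qc(v)]
```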

   We performed case-control GWAS using logistic regression in PLINK
   under an additive genetic model, with genetic sex, age, and the top 10
   ancestry principal components (PCs) as covariates. We randomly
   selected 20,000 European individuals and used their genotypes as the
   linkage disequilibrium (LD) reference panel. Independent
   trait-associated SNPs were identified by LD clumping in PLINK
   (--clump-p1 5e-8 --clump-r2 0.1 --clump-kb 500).
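   The clumping step can be illustrated with a simplified greedy
   re-implementation of the idea behind PLINK's --clump. It is not
   PLINK's exact algorithm: the r2 lookup function is a hypothetical
   stand-in for the 20,000-individual reference panel, and the secondary
   threshold p2 = 0.01 is assumed from PLINK's default --clump-p2.

```python
def clump(snps, r2, p1=5e-8, p2=0.01, r2_max=0.1, kb=500):
    """Greedy LD clumping in the spirit of PLINK --clump (simplified).

    snps: list of (rsid, chrom, pos_bp, p) tuples.
    r2:   callable r2(rsid_a, rsid_b) -> LD r^2 from a reference panel (assumed).
    Returns a dict mapping each index SNP to the SNPs clumped with it.
    """
    order = sorted(snps, key=lambda s: s[3])  # most significant first
    assigned = set()
    clumps = {}
    for rsid, chrom, pos, p in order:
        if p > p1:
            break  # list is sorted, so no further index SNPs exist
        if rsid in assigned:
            continue  # already absorbed into a stronger clump
        assigned.add(rsid)
        members = []
        for other, chrom2, pos2, p_other in order:
            if other in assigned or chrom2 != chrom or p_other > p2:
                continue
            if abs(pos2 - pos) <= kb * 1000 and r2(rsid, other) > r2_max:
                members.append(other)
                assigned.add(other)
        clumps[rsid] = members
    return clumps
```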

   Significant SNPs were queried against the GWAS Catalog using the R
   package LDlink and divided into two categories: SNPs previously
   reported for digestive disorders in published GWAS and novel SNPs. A
   SNP was considered potentially novel if all GWAS Catalog SNPs had LD
   r^2 ≤ 0.1 with it.[165]^9 Based on the GWAS results, we assessed the
   overlap of LD blocks among multiple digestive disorders to identify
   shared blocks.
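   The novelty rule reduces to a simple check once LD values are
   available. In this sketch the r2 callable is a hypothetical wrapper
   around LDlink queries, and catalog_snps is assumed to hold the GWAS
   Catalog SNPs reported for digestive disorders.

```python
def is_potentially_novel(lead_snp, catalog_snps, r2, threshold=0.1):
    """Flag a lead SNP as potentially novel when every reported GWAS Catalog
    SNP has LD r^2 <= threshold with it.

    r2: callable returning LD r^2 between two rsids (assumed, e.g. from LDlink).
    """
    if lead_snp in catalog_snps:
        return False  # the SNP itself has already been reported
    return all(r2(lead_snp, c) <= threshold for c in catalog_snps)
```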

Method details

   This study did not involve experiments; the relevant statistical
   methods and analysis procedures are described in the
   "[166]Quantification and statistical analysis" section.

Quantification and statistical analysis

   All statistical analyses using R packages were performed using R 4.2.1,
   unless otherwise stated. Details of the specific statistical analyses
   are described below.

Genetic heritability and genetic correlation

   We estimated the SNP-based heritability (h^2[SNP]) using linkage
   disequilibrium score regression (LDSC).[167]^47 To convert estimates
   to the liability scale, we approximated the lifetime risk of each
   digestive disorder by the proportion of cases in the sample.
   The genomic inflation factor (λ[GC]) was also reported for each
   disorder. Genetic correlations (r[g]) for each pair of the 21 digestive
   disorders were calculated using bivariate LDSC.
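   The text does not spell out the liability-scale conversion. The sketch
   below assumes the standard observed-to-liability transformation that
   LDSC applies through its --samp-prev and --pop-prev options, with the
   population prevalence K set to the sample case proportion P as
   described above. It also shows the usual definition of λ[GC] as the
   median 1-df χ² statistic divided by its expected null median (about
   0.455).

```python
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()


def h2_liability(h2_obs, sample_prev, pop_prev):
    """Convert observed-scale SNP heritability to the liability scale."""
    K, P = pop_prev, sample_prev
    t = _STD_NORMAL.inv_cdf(1 - K)  # liability threshold
    z = _STD_NORMAL.pdf(t)          # normal density at the threshold
    return h2_obs * K**2 * (1 - K) ** 2 / (z**2 * P * (1 - P))


def lambda_gc(p_values):
    """Genomic inflation factor from two-sided GWAS P values."""
    # Convert each P value to a 1-df chi-square statistic via |Z|.
    chi2 = [_STD_NORMAL.inv_cdf(p / 2) ** 2 for p in p_values]
    return median(chi2) / 0.4549364  # null median of a 1-df chi-square
```

   With K = P the factor reduces to K(1 - K)/z², so the liability-scale
   estimate is always larger than the observed-scale one for rare
   disorders.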

Inference of Bayesian causal network

   To explore causal relationships among multiple digestive disorders, we
   inferred a Bayesian network using the score-based hill-climbing (HC)
   algorithm; the large sample size enables effective inference.[168]^48
   The network was bootstrapped 2,000 times, with the 21 disorders
   treated as discrete variables, and an arc direction was considered
   significant when its bootstrap probability exceeded 85%. The strength
   of the probabilistic relationships expressed by the arcs was measured
   by the logarithm of the Bayesian Dirichlet equivalent score
   (bde).[169]^49 For undirected arcs, whose direction probability was
   0.5, we retained the direction with the stronger strength. The
   Bayesian network was generated with the R package bnlearn.
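   The bootstrap summary can be illustrated with a small Python sketch
   that mirrors the strength and direction columns reported by bnlearn's
   boot.strength(). Each replicate is assumed to be the arc set of one
   DAG learned by hill-climbing on a resampled dataset; learning the
   networks themselves is not shown, and the function names are
   hypothetical.

```python
from collections import Counter


def arc_confidence(replicates):
    """Summarise bootstrapped DAGs.

    replicates: list of sets of directed arcs (parent, child), one per replicate.
    Returns {(a, b): (strength, direction)}: strength is the fraction of
    replicates containing the arc in either orientation, direction is the
    fraction of those in which it points from a to b.
    """
    n = len(replicates)
    directed = Counter(arc for rep in replicates for arc in rep)
    summary = {}
    for (a, b), count_ab in directed.items():
        count_ba = directed.get((b, a), 0)
        present = count_ab + count_ba
        summary[(a, b)] = (present / n, count_ab / present)
    return summary


def confident_arcs(replicates, min_direction=0.85):
    """Keep arcs whose direction probability exceeds the 85% threshold."""
    return sorted(
        arc for arc, (_, direction) in arc_confidence(replicates).items()
        if direction > min_direction
    )
```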

Mendelian randomization analysis

   To explore potential causal effects among all pairs of the 21
   digestive disorders, we used Mendelian randomization (MR) with
   exposure-associated SNPs as instrumental variables. Because some
   digestive disorders had too few cases, and therefore few genome-wide
   significant SNPs (P ≤ 5 × 10^-8), we relaxed the significance
   threshold to 5 × 10^-6 for those disorders to obtain sufficient
   instrumental variables. Owing to the complexity and strong linkage
   disequilibrium of the MHC region, only the most significant SNP within
   the MHC region (chr6: 25-34 Mb) was retained for MR analysis.[170]^50
   Using the UKB reference panel, we clumped instruments with a linkage
   disequilibrium threshold of r^2 < 0.01 and a physical distance window
   of 10 Mb to ensure that the genetic instruments were uncorrelated.
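   The instrument pre-selection rules above can be sketched as follows,
   before LD clumping (r^2 < 0.01, 10 Mb window) is applied. The tuple
   layout of snps is an assumption, and the relaxed 5 × 10^-6 threshold
   is passed in by the caller because the text does not give an explicit
   case-count cutoff for "insufficient" disorders.

```python
MHC = ("6", 25_000_000, 34_000_000)  # chr6: 25-34 Mb, as stated above


def select_instruments(snps, p_threshold=5e-8):
    """Pre-select exposure instruments before LD clumping.

    snps: list of (rsid, chrom, pos_bp, p) for the exposure GWAS.
    p_threshold: 5e-8 by default; 5e-6 for disorders with too few
    genome-wide significant SNPs.
    Only the most significant SNP inside the MHC region is kept.
    """
    mhc_chrom, mhc_start, mhc_end = MHC
    kept, best_mhc = [], None
    for snp in snps:
        rsid, chrom, pos, p = snp
        if p > p_threshold:
            continue
        if chrom == mhc_chrom and mhc_start <= pos <= mhc_end:
            if best_mhc is None or p < best_mhc[3]:
                best_mhc = snp  # track the top MHC SNP only
        else:
            kept.append(snp)
    if best_mhc is not None:
        kept.append(best_mhc)
    return sorted(kept, key=lambda s: s[3])
```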

   We applied the inverse-variance weighted (IVW) method to estimate
   causal effects.[171]^12 To ensure the robustness of the results, we
   additionally performed GSMR,[172]^51 median-based MR,[173]^52 and
   maximum-likelihood MR[174]^12 as sensitivity analyses to control for
   the influence of pleiotropic effects, instrumental outliers, and
   sample overlap. Statistical analyses were performed using the R
   packages MendelianRandomization[175]^53 and GSMR.
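   For reference, the IVW estimate and a weighted median estimate can be
   written directly from SNP-level summary statistics. This is a sketch,
   not the MendelianRandomization package's implementation: it shows the
   fixed-effect IVW standard error (software defaults may use a
   random-effects model instead), uses first-order weights for the
   median, and assumes the weighted variant of median-based MR, which the
   text does not state.

```python
import math


def ivw(bx, by, se_by):
    """Fixed-effect inverse-variance weighted MR estimate and standard error."""
    num = sum(x * y / (s * s) for x, y, s in zip(bx, by, se_by))
    den = sum(x * x / (s * s) for x, s in zip(bx, se_by))
    return num / den, 1 / math.sqrt(den)


def weighted_median(bx, by, se_by):
    """Weighted median of per-SNP ratio estimates (needs >= 2 instruments)."""
    ratios = [y / x for x, y in zip(bx, by)]
    weights = [x * x / (s * s) for x, s in zip(bx, se_by)]
    order = sorted(range(len(ratios)), key=ratios.__getitem__)
    b = [ratios[i] for i in order]
    w = [weights[i] for i in order]
    total = sum(w)
    cum, s = 0.0, []
    for wi in w:
        cum += wi
        s.append((cum - 0.5 * wi) / total)  # standardized cumulative weight
    below = max(i for i, si in enumerate(s) if si < 0.5)
    # Linear interpolation between the two ratios straddling 50%.
    return b[below] + (b[below + 1] - b[below]) * (0.5 - s[below]) / (s[below + 1] - s[below])
```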

Cross-disorder GWAS meta-analysis

   To identify variants shared across multiple digestive disorders,
   cross-disorder meta-analysis was carried out using association
   analysis based on subsets (ASSET).[176]^54 We conducted ASSET analysis
   on the independent signals (index SNPs with P[GWAS] ≤ 1 × 10^-4) for
   each digestive disorder. The bidirectional (two-sided) pleiotropy
   analysis provides a P value for each direction of effect as well as an
   overall P value for the association signal with both directions
   combined. Independent pleiotropic variants were determined via LD
   clumping with overall P ≤ 5 × 10^-8: other SNPs were clumped with the
   lead variant if they had overall P < 0.05, were within 500 kb of the
   index SNP, and had r^2 > 0.1 with the index SNP. A SNP was considered
   to have effects in a consistent direction if the overall
   P ≤ 5 × 10^-8 and the P value for only one direction was ≤ 0.05.
   Similarly, a SNP was considered to have effects in different
   directions if the overall P ≤ 5 × 10^-8 and the P values for both
   directions were < 0.05.
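   The direction rules translate into a small classification function.
   It assumes the overall and per-direction P values have already been
   extracted from the two-sided ASSET output; the function name and the
   labels are hypothetical.

```python
def classify_pleiotropy(p_overall, p_positive, p_negative, gw=5e-8, nominal=0.05):
    """Label a SNP using the two-sided ASSET P values and the rules above."""
    if p_overall > gw:
        return "not pleiotropic"
    if p_positive < nominal and p_negative < nominal:
        return "different directions"
    if p_positive <= nominal or p_negative <= nominal:
        return "consistent direction"
    return "unclassified"
```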

Functional annotation and gene-mapping of pleiotropic variants

   To assess variant function and map SNPs to genes, we first annotated
   SNPs with ANNOVAR.[177]^55 To evaluate functional genetic variation
   more comprehensively, we obtained candidate genes from additional
   resources, including VARAdb[178]^56 and 3DSNP,[179]^57 to supplement
   ANNOVAR.

   We also mapped SNPs located in or near super-enhancers and promoters
   to their corresponding genes. 3DSNP was used to map SNPs to distal
   target genes through three-dimensional (3D) chromatin looping.
   Cis-eQTL mapping identified significant genes (FDR q value < 0.05) in
   nine digestive tissues (esophagus muscularis, esophagus mucosa,
   esophagus gastroesophageal junction, stomach, small intestine terminal
   ileum, colon sigmoid, colon transverse, liver, and pancreas) from GTEx
   v8[180]^58 and in whole blood from the eQTLGen[181]^59 database.
   eQTLGen provides the largest available eQTL summary statistics,
   derived from 31,684 whole-blood samples. Thus, all tissues drawn from
   these sources were either digestive-specific or whole blood.
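   As an illustration of the FDR q value < 0.05 filter, the sketch below
   computes Benjamini-Hochberg q values. This is a stand-in: GTEx v8 and
   eQTLGen distribute their own FDR estimates, which may be computed
   differently, and the gene-to-P-value input is hypothetical.

```python
def bh_qvalues(p_values):
    """Benjamini-Hochberg adjusted P values (q values), returned in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    q = [0.0] * m
    running_min = 1.0
    for offset, i in enumerate(order):
        rank = m - offset  # 1-based rank in ascending P order
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q


def significant_egenes(gene_p, alpha=0.05):
    """gene_p: dict gene -> P value; returns genes with q < alpha."""
    genes = list(gene_p)
    q = bh_qvalues([gene_p[g] for g in genes])
    return sorted(g for g, qi in zip(genes, q) if qi < alpha)
```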

   Annotation results from these approaches were merged into a single
   list of candidate genes. The cluster function of bedtools[182]^60 was
   used to group these genes into independent 1-Mb regions.
   MalaCards,[183]^61 a database that integrates gene-disease relations
   from multiple data sources, was used to search for existing evidence
   of association between the candidate genes and the disorders.
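   The grouping into 1-Mb regions can be sketched in Python in the
   spirit of bedtools cluster with a 1,000,000 bp distance
   (-d 1000000). The distance setting and the (name, chrom, start, end)
   input layout are assumptions, since the text does not give the exact
   command.

```python
def cluster_genes(genes, max_gap=1_000_000):
    """Group genes into clusters in the spirit of `bedtools cluster -d`.

    genes: list of (name, chrom, start, end). A gene joins the current cluster
    if it is on the same chromosome and starts within max_gap bp of the
    cluster's running end; otherwise it starts a new cluster.
    Returns a list of clusters, each a list of gene names.
    """
    clusters = []
    current_chrom, current_end = None, None
    for name, chrom, start, end in sorted(genes, key=lambda g: (g[1], g[2])):
        if chrom != current_chrom or start - current_end > max_gap:
            clusters.append([])
            current_chrom, current_end = chrom, end
        clusters[-1].append(name)
        current_end = max(current_end, end)
    return clusters
```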

Gene-based association analysis

   We applied the aggregated Cauchy association test (ACAT), implemented
   in the R package ACAT, to combine GWAS summary statistics from
   multiple SNPs within each gene and thereby test the association
   between the gene and each individual digestive disorder.[184]^62 Gene
   boundaries were based on the Ensembl database (build GRCh37.3) and
   extended 35 kb upstream and 10 kb downstream to include regulatory
   regions.[185]^63 Genomic locations unavailable from Ensembl were
   manually annotated using NCBI’s Gene online resource. Pseudogenes were
   excluded because of concerns about inaccurate calling.[186]^64
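   A minimal Python re-implementation of the ACAT statistic is shown
   below for illustration, including the small-P approximations used in
   the ACAT R package. The gene_window helper is hypothetical and assumes
   that "upstream" is defined relative to the gene's strand.

```python
import math


def acat(p_values, weights=None):
    """Aggregated Cauchy association test combining SNP-level P values."""
    if weights is None:
        weights = [1.0] * len(p_values)
    total = sum(weights)
    stat = 0.0
    for p, w in zip(p_values, weights):
        if p < 1e-16:
            # tan((0.5 - p) * pi) ~ 1 / (p * pi) for very small p
            stat += w / (p * math.pi)
        else:
            stat += w * math.tan((0.5 - p) * math.pi)
    stat /= total  # normalise so the weights sum to one
    if stat > 1e15:
        # Tail approximation of the standard Cauchy survival function
        return 1 / (stat * math.pi)
    return 0.5 - math.atan(stat) / math.pi


def gene_window(start, end, strand, upstream=35_000, downstream=10_000):
    """Gene region extended 35 kb upstream and 10 kb downstream (strand-aware)."""
    if strand == "+":
        return max(0, start - upstream), end + downstream
    return max(0, start - downstream), end + upstream
```

   A useful property of the Cauchy combination is that combining
   identical P values returns that same P value, while a single strong
   signal dominates the combined statistic.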

SMR analysis for candidate genes

   Summary-data-based Mendelian randomization (SMR)[187]^65 was used to
   provide putative causal evidence linking SNPs to disorders through
   gene expression. SMR was performed using the expression quantitative
   trait loci (eQTL) summary statistics from eQTLGen and GTEx v8
   described in the supplementary methods. Only transcripts with at least
   one cis-eQTL (P ≤ 5 × 10^-8) were considered. Associations with
   P[HEIDI] ≥ 0.01 were considered to pass the heterogeneity in dependent
   instruments (HEIDI) test.
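   The SMR test statistic for the top cis-eQTL can be written in a few
   lines using the published approximation
   T_SMR = z_zy² z_zx² / (z_zy² + z_zx²), which follows a χ²
   distribution with 1 degree of freedom under the null. This sketch
   omits the HEIDI test, which additionally requires multiple SNPs and
   their LD; the function name is hypothetical.

```python
import math


def smr_test(b_zx, se_zx, b_zy, se_zy):
    """SMR estimate and test from the top cis-eQTL z.

    b_zx, se_zx: eQTL effect of SNP z on gene expression x.
    b_zy, se_zy: GWAS effect of SNP z on disorder y.
    Returns (b_xy, T_SMR, P_SMR).
    """
    z_zx = b_zx / se_zx
    z_zy = b_zy / se_zy
    b_xy = b_zy / b_zx  # effect of expression on the disorder
    t_smr = (z_zx**2 * z_zy**2) / (z_zx**2 + z_zy**2)
    # Upper tail of a 1-df chi-square: P(chi2 > T) = erfc(sqrt(T / 2))
    p_smr = math.erfc(math.sqrt(t_smr / 2))
    return b_xy, t_smr, p_smr
```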

Pathway enrichment analysis and drug target exploration

   To explore the functional characteristics of the detected pleiotropic
   genes, Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes
   (KEGG) enrichment analyses were performed with the R package
   clusterProfiler (P-value cutoff = 0.05). To investigate potential
   drugs related to digestive disorders, drug target genes and
   indications were obtained from the Drug-Gene Interaction Database
   (DGIdb)[188]^66 and DrugBank version 5.1.9.[189]^67
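   clusterProfiler tests over-representation with a hypergeometric model;
   the one-sided P value for a single GO term or KEGG pathway can be
   sketched as below. The sketch omits the multiple-testing adjustment
   that clusterProfiler applies, and the argument names are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """One-sided over-representation P value, P(X >= k).

    N: genes in the background universe; K: background genes annotated to the
    term; n: genes in the query list; k: query genes annotated to the term.
    """
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)
```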

Acknowledgments