Highlights

• This is a comprehensive study of the shared genetic background of 21 digestive diseases
• Genetic correlations and causal relationships among these diseases are revealed
• Shared genetic variants and genes inform the potential pathogenesis of these diseases

Biological sciences; Health sciences; Human genetics

Introduction

Digestive disorders have significantly increased the years lived with disability;[40]^1 three digestive malignant neoplasms rank among the top 10 cancers by incidence worldwide: colorectal cancer (CRC), gastric cancer (GC), and liver cancer (LC).[41]^2 Among the top 10 cancers with the worst prognosis, half are digestive malignant neoplasms, including colon, gastric, liver, esophageal, and pancreatic cancers.[42]^3 Identifying the causal factors behind the development of digestive disorders is crucial for disease prevention. Large-scale genomic studies have examined the profound influence of genetic variation on the risk of a broad list of digestive disorders and have unveiled genetic loci for Barrett’s esophagus (BE) and other digestive disorders.[43]^4 Notably, a substantial proportion of the heritability is contributed by common variants that confer susceptibility to multiple digestive disorders, emphasizing the complexity and highly polygenic nature of these conditions.[44]^5^,[45]^6 Illustrating the causal relationships across disorders and their shared genes has considerable implications for disease prevention and mechanistic understanding.[46]^7 Simultaneously, deciphering the functional genomics of genetic factors shared across traits aids in uncovering the biological mechanisms of pleiotropic loci, facilitating the identification of targets for clinical diagnosis, treatment, and drug intervention.
Such studies have been successfully implemented in psychiatric disorders[47]^8 and pan-cancer.[48]^9 For digestive disorders, previous genome-wide association studies have identified several pleiotropic loci that were shared among gastroesophageal reflux disease (GERD) and severe esophageal and colorectal diseases.[49]^10^,[50]^11 However, these studies did not delve deeply into the underlying mechanisms and were limited to a few traits; comprehensive research that systematically covers common benign and malignant digestive disorders is lacking. Here, we present a cross-trait analysis of a broad list of digestive disorders in UK Biobank (UKB). We address three major questions regarding the shared genetic basis of these disorders: 1) the causal relationships among these digestive disorders; 2) novel susceptibility loci and annotated genes that contribute to the risk of digestive disorders through multiple pathways; and 3) functional exploration of the shared genes.

Results

The flowchart of our study is given in [51]Figure 1. Based on the definition of disorders in the UKB ([52]Table S1), 21 types of digestive disorders were included. The number of noncancer disorder cases varied from 1,115 for cholangitis (CHATIS) to 43,831 for GERD, while the number of cancer cases varied from 93 for gallbladder cancer (GBC) to 6,015 for CRC ([53]Figure 2A; [54]Table S2).

Figure 1. Flowchart of the study
In brief, delineating the causal relationships among the 21 digestive disorders in UKB allows shared variants and genes to be identified at different levels in European populations using the cross-trait approach.

Figure 2. The 21 digestive disorders, heritability, and GWAS findings
(A) Digestive disorders presented by anatomical location and their sample size.
(B) Dot plot indicates heritability estimates and 95% confidence intervals for the digestive disorders with a sample size over 1,000.
(C) Bar plot indicates the number of significant index SNPs (p < 5 × 10^−8) from GWAS analyses.

Genome-wide association studies of 21 digestive disorders

We conducted genome-wide association studies (GWASs) for 21 digestive disorders. A total of 204 independent variants reached genome-wide significance (p ≤ 5 × 10^−8) for individual disorders ([59]Figure 2C), of which 13 were associated with two disorders. Of the 204 variants, 113 overlapped or had LD r^2 ≥ 0.1 with previously identified SNPs, while the remaining 91 variants were novel (r^2 < 0.1) ([60]Table S3). Among the novel variants, 69 were independent of previously reported variants but lay in known regions associated with digestive disorders. The top five disorders associated with them were cholelithiasis (CHSIS, 44 novel variants), cholecystitis (CHETIS, 10 novel variants), gastric and duodenal polyp (GDP, 3 novel variants), colorectal polyp (CRP, 3 novel variants), and GBC (3 novel variants). The remaining 22 novel variants were in regions not previously reported for any digestive disorder.

We estimated the SNP-based heritability (h^2[SNP]) using linkage disequilibrium (LD) score regression on both the observed scale and the liability scale, taking the proportion of cases in the sample as the estimate of lifetime disease risk ([61]Table S4). Among the 15 digestive disorders with over 1,000 cases, 14 had significant genetic heritability. The h^2[SNP] estimates ranged from 5.83% for CRP to 15.75% for esophageal cancer (EC), except for CHATIS ([62]Figure 2B).

Using the LD clumping procedure on the 204 SNPs identified in the GWAS analyses, we defined 36 pleiotropic LD blocks ([63]Figure S1; [64]Table S8).
Twenty of the pleiotropic blocks had direct evidence (index SNPs previously reported to be associated with the risk of digestive disorders) or indirect evidence (index SNPs in high LD (r^2 > 0.1) with SNPs previously reported to be associated with the risk of digestive disorders) of association with the corresponding disorders.

Genetic causal relationships across 21 digestive disorders

Among the 91 pairs of the 14 disorders with significant heritability, 64 pairs showed positive genetic correlation after Bonferroni correction (p ≤ 0.05/91), and another 18 pairs had nominally significant genetic correlations (p ≤ 0.05), indicating a considerable genetic basis for the complex relationships among these disorders ([65]Figure 3A; [66]Table S5). Moreover, Bayesian network analysis obtained 53 high-confidence causal relationships among the digestive disorders ([67]Table S6), of which 32 were positively correlated in the pairwise genetic correlation analyses. Noncancerous digestive disorders showed complex pathogenic interactions with one another and, in turn, were linked to multiple types of digestive cancers ([68]Figure 3B).

Figure 3. Genetic correlation and causal inference by Bayesian network
(A) Results for genetic correlation among the 14 digestive phenotypes with a sample size over 1,000. “∗” represents a genetic correlation that is significant after Bonferroni correction (p < 0.05/91). The color and size of each square scale with the genetic correlation of the disorder pair.
(B) Causal network comprising 21 disorders, constructed from the intersection of the Bayesian network and Mendelian randomization results to reveal complex genetic relationships. Orange nodes indicate cancers and dark blue nodes indicate noncancerous digestive disorders, with edges indicating the estimates of the IVW method.

Further, to validate the causal relationships inferred by the Bayesian network, we carried out Mendelian randomization (MR) for all the relationships on the network ([71]Table S7).
The 49 relationships (arcs on the network) that were replicated in the MR analysis with statistical significance (p ≤ 0.05) were retained on the network ([72]Figures 3B and [73]4, and [74]Table S6). Moreover, 48 of the retained relationships were additionally confirmed using the generalized summary data-based MR method, of which 42 were further confirmed using the median-MR method and the maximum-likelihood method[75]^12 ([76]Table S6).

Figure 4. Causal relationships inferred by Mendelian randomization
The causal relationships from one causal disorder (on the left y axis) to the outcome disorder (on the right y axis) are presented as estimates (dots) and 95% confidence intervals (horizontal lines). Panels A, B, C, and D show the results for causal disorders of the esophagus, stomach and duodenum, liver-bile-pancreas, and intestines, respectively.

Pleiotropic genetic variants of 21 digestive disorders

To provide further evidence for shared genetic factors, we performed a cross-disorder meta-analysis of the GWAS results for the 21 digestive disorders using ASSET. Of the 7,337 variants with p ≤ 1 × 10^−4 in the single-disorder GWASs, 539 were pleiotropic and passed the genome-wide significance threshold (P[meta] ≤ 5 × 10^−8) ([79]Figure 5; [80]Tables S9 and [81]S10); 176 of them had been previously reported ([82]Table S12). Of the identified pleiotropic variants, 75% (404/539) and 74% (398/539) were related to the gallbladder disorders CHSIS and CHETIS, respectively. GERD, which had the largest number of cases, was associated with 62% (332/539) of the pleiotropic variants.

Figure 5. Manhattan plot of cross-disorder meta-analysis
The x axis represents genomic position (chromosomes 1–22), and the y axis represents statistical significance on the -log[10] scale (P for overall test).
SNPs with genome-wide significance are shown above the horizontal red line, which corresponds to the significance threshold of 5 × 10^−8. SNPs highlighted in orange are associated with two subsets of disorders with opposite directions of association; those in yellow are novel SNPs. Red circles represent pleiotropic SNPs that are associated with multiple disorders, and the minimum of the GWAS p values was used for presentation.

Among the 539 pleiotropic SNPs, 498 exhibited effects in a consistent direction, while 41 showed effects in different directions. Further, 114 LD blocks were shared between disorders using the same criteria as in the GWAS ([85]Table S11).

Functional characterization of pleiotropic variants

We annotated SNPs by their physical location on the genomic sequence. Of the 539 SNPs, 28 (5%) were in exonic regions ([86]Figure S2A), of which 22 were nonsynonymous. This result was consistent with previous findings that most (∼93%) disease-associated SNPs in the GWAS Catalog were in non-coding regions.[87]^13 To comprehensively investigate the regulatory roles of the variants, we systematically annotated all SNPs with respect to their functional basis ([88]Figure S2B). Specifically, 128 SNPs interacted with their target genes through 3D chromatin loops, 209 SNPs were located in or near super-enhancers/promoters, and 335 SNPs acted as eQTLs for their target genes. In total, 394 SNPs (73%) were annotated with at least one functional category, and 71 SNPs (13%) had functional support from all three types of information. Detailed annotations are provided in [89]Tables S9 and [90]S10. The enrichment of functional annotations for these SNPs suggested that the pleiotropic SNPs might play essential roles in digestive disorders through functional regulation. Unique genes can be annotated from different sources ([91]Figure S2C). Finally, we obtained 1,381 candidate genes for the following analysis.
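The "at least one" and "all three" counts reported for the functional annotation correspond to a set union and a set intersection over the three annotation sources. The following is a minimal sketch of that bookkeeping, not the authors' pipeline; the function name `annotation_summary` and the toy SNP identifiers are hypothetical and chosen only for illustration.

```python
from typing import Dict, Set


def annotation_summary(
    loops: Set[str], enhancers: Set[str], eqtls: Set[str]
) -> Dict[str, int]:
    """Count SNPs with any functional support and with all three kinds."""
    at_least_one = loops | enhancers | eqtls  # union of the three sources
    all_three = loops & enhancers & eqtls     # intersection of the three sources
    return {"at_least_one": len(at_least_one), "all_three": len(all_three)}


# Toy SNP identifiers, invented for illustration only (not study data).
loops = {"snp1", "snp2"}
enhancers = {"snp2", "snp3"}
eqtls = {"snp2", "snp3", "snp4"}

summary = annotation_summary(loops, enhancers, eqtls)
print(summary)
```

Because a SNP can appear in more than one source, the per-source counts (128, 209, and 335 in the text) are not expected to sum to the "at least one" total.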
Pleiotropic genes shared among digestive disorders

We tested the 1,381 candidate genes suggested by the SNP annotation results of the cross-trait meta-analysis ([92]Table S13). A total of 736 genes located at 146 independent genetic regions showed pleiotropic effects on 14 disorders after false discovery rate (FDR) correction for multiple testing (FDR-q ≤ 0.05) ([93]Figure S3A; [94]Table S13). About half of these pleiotropic genes (369/736) were associated with at least three digestive disorders in ACAT analysis. To better understand the pathogenic mechanism of these genes, we divided the 736 pleiotropic genes into two categories according to the ACAT results: 690 noncancer-related pleiotropic genes and 46 cancer-related pleiotropic genes. The top noncancer-related pleiotropic genes were MCCD1, ATP6V1G2, and LTA, which were shared among seven digestive disorders in ACAT analysis, followed by 17 genes that were detected in six digestive disorders ([95]Figure 6A). Notably, most of the top genes (PSORS1C2, TCF19, XXbac-BPG299F13.17, HLA-C, HLA-B, MCCD1, ATP6V1G2-DDX39B, DDX39B, SNORD117, SNORD84, DDX39B-AS1, ATP6V1G2, NFKBIL1, LTA, TNF, and HLA-DQA2) were located on 6p21.33 in the major histocompatibility complex (MHC) region.

Figure 6. Pleiotropic genes annotation
(A) Noncancer-related genes shared among digestive disorders in ACAT.
(B) Cancer-related genes shared among digestive disorders in ACAT. The color of the labels in A and B indicates the biological type and tissue of the gene.
(C) Upset plot showing the overlap of pleiotropic cancer-related genes identified in the gene-based analysis for different digestive disorders.
Approach of the genes annotated.
a: annotated as novel gene. b: positionally mapped from SNPs. c: mapped through three-dimensional chromatin looping. d: mapped to super-enhancer/promoter. e: mapped by eQTL relationship. f: identified by organ-specific SMR analysis. g: identified by whole-blood SMR analysis. h: identified by cross-organ SMR analysis.

We highlighted 46 cancer-related pleiotropic genes, which were associated with three digestive cancers: EC (1 gene), small intestinal cancer (SIC, 4 genes), and CRC (41 genes) ([98]Figures 6C and 6D). Of these genes, 14 were novel, 39 were annotated by position, three interacted with SNPs through 3D chromatin loops, 16 were target genes of super-enhancers/promoters, and 27 were based on eQTLs.

Further, we explored whether the effect of genetic variants on the risk of disorders was mediated by altered expression of the corresponding genes. The summary data-based MR (SMR) test was performed on the 14 disorders that shared genes identified in ACAT, for 1,023 probes that had at least one cis-eQTL at P[eQTL] ≤ 5 × 10^−8. After the HEIDI test, we retained the results for gene-trait pairs that had been identified in the ACAT analysis ([99]Table S14). We identified 65 gene-disorder pairs in which gene expression was associated with disorder risk at FDR-q ≤ 0.05 in the specific tissue, 299 pairs in whole blood, and 184 pairs across tissues ([100]Figures S3B–S3D). Of these, the expression of 130 genes was associated with two or more disorders. These analyses indicated that the pleiotropic genetic variants and the corresponding gene transcription contributed to the risk of multiple digestive disorders.

Functional enrichment analysis of pleiotropic genes

Gene Ontology (GO) enrichment and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analyses of the noncancer-related and cancer-related genes were used to explore the shared biological functions and pathways related to digestive disorders ([101]Tables S15, [102]S16, [103]S17, and [104]S18).
The top 10 significant GO results and all significant KEGG results are shown in [105]Figure S4. For noncancer-related genes, the GO enrichment analysis showed that they were enriched in the biological process (BP) related to chronic inflammation and immune responses, such as cellular response to interferon-gamma. Meanwhile, noncancer-related genes were significantly enriched in cellular component (CC) related to intestinal inflammation,[106]^14 such as integral component of endoplasmic reticulum membrane. For the molecular function, these genes are enriched in MHC class II receptor activity, peptide antigen binding, and glucuronosyltransferase activity. For KEGG pathway, the genes were enriched in pathways which play the potential role in digestive system, such as antigen processing and presentation, bile secretion, and intestinal immune network for IgA production. For cancer-related genes, the GO enrichment analysis results showed that the top significant BPs were related to epithelial-mesenchymal transition, which is known to be crucial for malignant progression, such as regulation of epithelial to mesenchymal transition.[107]^15 Interestingly, the top CC was laminin complex, which may act as regulators of cancer stem cells, and play an instrumental role in long-term cancer maintenance, metastasis development, and therapeutic resistance.[108]^16 For the KEGG pathway enrichment, the significant results were related to developmental pathways (TGF-β and Hippo) and signaling pathways regulating pluripotency of stem cells. To summarize, these results mentioned previously suggested that the pleiotropic genes were closely related to digestive system and cancer. Drug-gene interactions related to digestive disorders As described previously, we detected 1,812 unique drugs that had drug-gene interactions with the target pleiotropic genes. The top ten genes related to drugs were EHMT2, KCNH2, SMAD3, FEN1, ABCB1, MPHOSPH8, TNF, CNR1, UGT1A1, and CYP19A1. 
Among these drugs, 66 were indicated for digestive disorders ([109]Table S19).

Discussion

This is the first study to comprehensively investigate the causal relationships and shared genetic factors across 21 digestive disorders among 329,707 European individuals in UKB. Specifically, we explored 49 causal relationships among the digestive disorders and detected 539 pleiotropic SNPs enriched for regulatory functions, which mapped to 46 target genes shared across digestive cancers and noncancerous digestive disorders. Our findings provide new insights into the etiology and causality of digestive disorders.

The broad genetic overlap between pairs of disorders reflected the genetic basis shared across these digestive disorders and prompted the exploration of a phenotypic causal network, which was further supported by MR, a sophisticated causal inference method. Twenty disorders were involved in the causal network; SIC was the exception, partly owing to an insufficient number of cases. For some digestive disorders, we used genetic methods to validate several relationships that are recognized in clinical and experimental studies, such as those between CHSIS and cancer,[110]^17 and between BE and EC.[111]^18 In comparison, we found a significant genetic correlation between irritable bowel syndrome (IBS) and inflammatory bowel disease (IBD), differing from the previous study, which removed IBD cases from IBS cases and may have lost the potential overlap.[112]^10 Moreover, some relationships were evaluated here for the first time, such as that between IBS and CRC, which provided new clues to the genetic basis from both MR and the Bayesian causal network. Most importantly, this study also indicates potential causal relationships from noncancerous disorders to cancers, such as from BE to EC and from IBS and CRP to CRC, which may provide evidence to guide cancer prevention and warrants further investigation.
Our study identified pleiotropic LD blocks for digestive disorders, most of which were previously reported in GWAS on digestive disorders, indicating the reliability of our results. Interestingly, a considerable number of blocks were located at 2p21; the leading independent variant (rs56266464) is located at super enhancer of ABCG5 and ABCG8 which have role in cholesterol secretion and may contribute to sterol accumulation by mutation,[113]^19 and shared among GERD, gastritis and duodenitis (GDS), CHETIS, and CHSIS. Insights into these complex relationships may inform personalized treatment strategies, guide drug development, and facilitate early diagnosis and risk assessment, ultimately providing more accurate and individualized guidance for clinical decision-making. Cross-trait meta-analysis detected more than double signals than that in GWAS, significantly increased statistical power, especially for the disorders with small sample size. The top 12 variants were all at 2p21, which was more tissue specific and congregated in hepatobiliary and pancreatic diseases (liver and intrahepatic bile ducts cancer, CHATIS, bile duct cancer, CHETIS, CHSIS, GBC, and pancreatic cancer). The top novel variant 13:29549405:AT:A which was associated with 20 digestive disorders, was an intronic variant located at 13q12.3 and ∼50 kb upstream of microtubule-associated scaffold protein 2. This region was previously associated with non-alcoholic fatty liver disease (NAFLD)[114]^20 and peptic ulcer.[115]^10 Moreover, this gene had been reported to be capable of regulating entotic cell-in-cell formation, which was described as a nonapoptotic cell death process that occurred in human tumors.[116]^21 Furthermore, in conjunction with the causal network, the pleiotropic variants could account for plentiful causal pathways. 
Meanwhile, rs760077, a missense variant of MTX1 at 1q22 and in high LD (r^2 = 0.72) with rs2075570, which has been reported having association with susceptibility of CRC[117]^22 and gastric cancer[118]^23 in European population, exhibits a significant association with risks of nine types of digestive disorders (esophageal ulcer, BE, gastric and duodenal ulcer, GC, IBD, CRP, liver fibrosis and cirrhosis, bile duct cancer, and GBC) which are also connected on the causal network. It is noteworthy that we identified 41 variants that had heterogeneous effects on risk of digestive disorders although some of them had positive genetic correlation and causal relationships. Nine of them are located in the immune-mediated human leukocyte antigen region (6p21.3), which were highly polymorphic and had complex associations with digestive disorders with different pathological conditions.[119]^24 Other regions also had several bidirectional variants. Notably, consistent with our results, rs1260326 in the exon of GCKR at 2p23.3 was reported to have the opposite effect in gallstone disease (risk allele: T, OR = 0.89)[120]^25 compared to that with the other disorders, including NAFLD (risk allele: T, OR = 1.28),[121]^26 IBD (risk allele: T, OR = 1.38),[122]^27^,[123]^28 Crohn’s disease, and ulcerative colitis (risk allele: T, OR = 1.046).[124]^29 The heterogeneity of effects among digestive disorders could help the better understanding of cross-trait genetic relationships. Notably, the pleiotropic variants were annotated in both positional and functional aspects to maximize the list of potential genes involved in the risk of gastric disorders, especially for the noncoding variants.[125]^30 In the gene-level analyses, abundant significant genes had been shown to play important roles in the digestive disorders’ pathogenesis. 
ATP6V1G2, which is linked to seven types of digestive disorders including GERD, GDS, GDP, IBD, IBS, CHETIS, and CHSIS, plays a significant role in human energy metabolism and induces oxidative stress, and had been considered as the risk gene for CRC.[126]^31 LTA Lymphotoxin alpha, corresponding to the same list of seven disorders described previously, a member of the tumor necrosis factor family, is among the master regulators of intestinal lymphoid development[127]^32 and was suggested to play a bigger part in esophageal metaplasia.[128]^33 Inter-alpha-trypsin inhibitor heavy chain 4, which was associated with six digestive disorders including GERD, GDS, GDP, CRP, CHETIS, and CHSIS, located on 3p21.1, has been reported related to growing early colorectal adenomas.[129]^34 Moreover, we highlighted the 46 genes that could drive digestive disorders to cancers. Notably, the top three protein-coding genes (TMEM110-MUSTN1, TMEM110, and SFMBT1) shared among six disorders, including GERD, GDS, GDP, CRP, CHSIS, and CRC. TMEM110-MUSTN1 is novel for digestive disorder, but has been identified as the putative marker for lung adenocarcinoma.[130]^35 TMEM110, also known as STIMATE, was novel but was found to be a regulator of STIM1 activation, which could promote tumor growth and metastasis in a variety of cancer types.[131]^36 Another reported gene, SFMBT1, had been verified the potential oncogenic function using in vitro functional assays in multiple CRC cells.[132]^37 These findings provide a deeper understanding of the genetic mechanisms and pathogenesis underlying digestive disorders. Worth further exploration is whether these genes share similar genetic pathways among different digestive disorders, and whether their roles vary across different organs. 
Functional enrichment analysis showed that genes related to benign digestive tract traits were mainly enriched in pathways related to chronic inflammation and immune response, which was basically consistent with the previous review.[133]^38 However, genes related to malignant digestive tract traits are enriched in important signal pathways related to carcinogenesis, including TGF-β and Hippo signal pathways regulating stem cell pluripotency. Among them, several crosstalk modes between TGF-β family signal and Hippo signal have been proved to regulate the proliferation, invasion, and migration of cancer cells.[134]^39 In this study, cancer targeting-related genes enriched in these pathways also affect other precancerous lesions, which may provide clues for the identification of cancer progression and metastasis. At the same time, the genes that jointly drive benign and malignant diseases may provide new insights into disease prevention and clinical treatment. In this study, the cancer-related genes enriched in these pathways were found to impact other benign disorders, possibly contributing to a pro-oncogenic environment. This provides insights into potential clues for identifying cancer progression and metastasis, indicating shared biological mechanisms among diseases that might influence the development of cancer. By delving into the molecular mechanisms of these shared effects through gene and pathway analysis, we can infer and identify key factors that may influence cancer progression and metastasis. Simultaneously, the remarkable consistency of driver genes across different diseases in this study may offer novel insights for clinical treatment and disease prevention. If these driver genes maintain consistency across multiple diseases, they could play pivotal roles in the pathological processes of various conditions. 
This opens doors to opportunities for developing treatment methods and preventative strategies targeting these genes, introducing new possibilities for disease management and intervention. We also provided further evidence supporting existing drugs for the treatment of digestive disorders. For instance, several experimental researches had identified the importance of EHMT2 (also known as G9a) in multiple digestive disorders including GC,[135]^40 LC,[136]^41 and CRC.[137]^42 Our study has several strengths. First, this is a comprehensive study to investigate the relationships among a broad list of digestive disorders, in both phenotypic and genetic aspects, from single disorder to multiple disorders, and from statistical association to medical causality. This comprehensiveness not only enriches our understanding of the interactions between diseases, but also provides insight into the development of more accurate prevention and treatment strategies. Second, our study uncovers several novel and crucial genetic variants and genes that contribute to the causal pathway of multiple digestive disorders, which provide new clues for further mechanistic and functional research. Third, we explain the relationship between chronic diseases and gastrointestinal tumors through gene-based analysis and identify pleiotropic genes with remarkable biological functions. As we know, gastrointestinal tumors may develop from chronic diseases. Thus, focusing on the shared genes among them may help identify the high-risk population carrying risk alleles that are more susceptible to cancers. In summary, our study substantiates the extensive genetic correlations and causal relationships among 21 digestive disorders, identifying shared genetic factors and elucidating the underlying biological mechanisms among these conditions. These findings provide insights into the etiology, causal relationships, and potential drug targets for clinical interventions. 
Limitations of the study

We also acknowledge the limitations of this study. First, the number of cases for individual disorders varied from 93 for GBC to 43,831 for GERD. The sample size for some disorders was small, which limited the power to detect pleiotropic effects. Additionally, the imbalance in sample sizes potentially inflates type I error rates. Second, we included only individuals of European ancestry to avoid potential confounding due to ancestral heterogeneity across distinct disorder studies. It is essential to evaluate the signals in non-European populations. Third, the functional clues in this study came from bioinformatic explorations of public databases, which warrant well-designed experimental studies in the future. Fourth, this study does not specifically explore the role of epigenetic factors, which will require more in-depth analysis.

STAR★Methods

Key resources table

REAGENT or RESOURCE | SOURCE | IDENTIFIER
Biological samples
UK Biobank: 57471 | UK Biobank | [138]https://www.ukbiobank.ac.uk/
Deposited data
eQTL data from eQTLGen | eQTLGen | [139]https://eqtlgen.org/
eQTL data from GTEx | GTEx | [140]https://www.gtexportal.org/home
Software and algorithms
PLINK v1.90 | | [141]http://pngu.mgh.harvard.edu/purcell/plink/
LDSC | | [142]https://github.com/bulik/ldsc
ANNOVAR | | [143]https://annovar.openbioinformatics.org/en/latest/user-guide/download/
bedtools | | [144]https://code.google.com/archive/p/bedtools/
Summary-data-based Mendelian Randomization | | [145]https://yanglab.westlake.edu.cn/software/smr/
LDlink R package | | [146]https://github.com/CBIIT/LDlinkR
bnlearn R package | | [147]https://github.com/cran/bnlearn
MendelianRandomization R package | | [148]https://cran.r-project.org/web/packages/MendelianRandomization/
GSMR R package | | [149]https://yanglab.westlake.edu.cn/software/gsmr/
ASSET R package | | [150]https://dceg.cancer.gov/tools/analysis/asset
ACAT R package | | [151]https://github.com/yaowuliu/ACAT
clusterProfiler R package | | [152]https://bioconductor.org/packages/release/bioc/html/clusterProfiler.html
Original code | This paper | Zenodo; [153]https://doi.org/10.5281/zenodo.8405925

Resource availability

Lead contact

Further information and requests for resources should be directed to and will be fulfilled by the lead contact, Yongyue Wei (ywei@pku.edu.cn).

Materials availability

This study did not generate new unique reagents.

Data and code availability

All data used in this study are from public databases. Data supporting the main findings of this study are accessible via the UK Biobank under application number 57471. Other data can be obtained from GTEx and eQTLGen. Download URLs are listed in the [155]key resources table. Original code has been deposited at Zenodo and is publicly available as of the date of publication. DOIs are listed in the [156]key resources table. Any additional information required to reanalyze the data reported in this paper is available from the [157]lead contact upon request.

Experimental model and subject details

Study population

The data were obtained from the UKB cohort (Proposal ID: 57471). UKB is a population-based longitudinal cohort of ∼500,000 individuals recruited at 22 centers across the United Kingdom.[158]^43 The UKB phenotypes were derived from the following data field IDs: self-report (20001, cancer code; 20002, noncancer illness code), ICD10 (41270, diagnoses in ICD10; 40001, underlying (primary) cause of death in ICD10), and ICD9 (41271, diagnoses in ICD9) ([159]Figure 2A and [160]Table S1).
Individuals who had any other disorders of the digestive system were excluded according to the above data fields, and the remaining individuals were defined as controls ([161]Table S2). Analyses were limited to individuals classified as ‘Caucasian’ according to Field ID 22006 to reduce population stratification. Kinship was inferred with KING software using default parameters.[162]^44 After filtering, 329,707 European individuals, including 116,382 cases with at least one digestive disorder and 213,325 controls, were retained.

GWAS statistics
Genotyping was conducted using either the UKB Axiom array or the UK BiLEVE array.[163]^45 We excluded SNPs with an imputation accuracy (Info) score < 0.8, minor allele frequency (MAF) < 0.01, Hardy-Weinberg equilibrium test P value < 1.0 × 10^-6, or missing genotype rate > 0.05 using PLINK 1.9,[164]^46 leaving 8,573,123 variants for the following analyses. We performed case-control GWAS analyses in PLINK using a logistic regression model in which SNPs were coded additively, with genetic sex, age, and the top 10 ancestry principal components (PCs) as covariates. We randomly selected 20,000 European individuals and used their genotypes as the linkage disequilibrium (LD) reference. Independent trait-associated SNPs were generated using PLINK (--clump-p1 5×10^-8 --clump-r2 0.1 --clump-kb 500). The significant SNPs were searched in the GWAS Catalog and divided into two categories, previously reported SNPs related to digestive disorders and novel SNPs, using the R package LDlinkR and the published GWAS in the GWAS Catalog. We considered a SNP potentially novel if every GWAS Catalog SNP had LD r^2 ≤ 0.1 with it.[165]^9 Based on the GWAS analysis, shared LD blocks were assessed for overlap among multiple digestive disorders.

Method details
Our study does not involve experiments; the relevant statistical methods and analysis procedures are described in the "[166]Quantification and statistical analysis" section.
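As an illustration of the variant-level quality-control thresholds listed under GWAS statistics, the following minimal Python sketch applies the four exclusion rules to per-variant summary values. The study itself performed this step in PLINK 1.9; the `Variant` record, its field names, and the function `passes_qc` are hypothetical and introduced here only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    """Hypothetical per-variant QC summary (field names are illustrative)."""
    snp_id: str
    info: float          # imputation accuracy (Info) score
    maf: float           # minor allele frequency
    hwe_p: float         # Hardy-Weinberg equilibrium test P value
    missing_rate: float  # missing genotype rate


def passes_qc(v, min_info=0.8, min_maf=0.01, min_hwe_p=1.0e-6, max_missing=0.05):
    """Return True if the variant survives all four filters.

    A variant is excluded when Info < 0.8, MAF < 0.01,
    HWE P < 1.0e-6, or missing rate > 0.05 (thresholds from the text).
    """
    excluded = (
        v.info < min_info
        or v.maf < min_maf
        or v.hwe_p < min_hwe_p
        or v.missing_rate > max_missing
    )
    return not excluded


def filter_variants(variants):
    """Keep only the variants that pass QC, preserving input order."""
    return [v for v in variants if passes_qc(v)]
```

Values lying exactly on a threshold (for example, Info = 0.8) are retained, because the exclusion rules in the text use strict inequalities.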
Quantification and statistical analysis
All statistical analyses using R packages were performed in R 4.2.1 unless otherwise stated. Specific statistical analyses are described below.

Genetic heritability and genetic correlation
We estimated the SNP-based heritability (h^2[SNP]) using linkage disequilibrium score regression (LDSC).[167]^47 To convert estimates to the liability scale, we adjusted for the lifetime risk of each digestive disorder, based on the proportion of cases in the sample. The genomic inflation factor (λ[GC]) was also reported for each disorder. Genetic correlations (r[g]) for each pair of the 21 digestive disorders were calculated using bivariate LDSC.

Inference of Bayesian causal network
To understand the causal relationships among multiple digestive disorders, we constructed a Bayesian network using the score-based hill-climbing (HC) algorithm, with a sample size large enough to enable effective inference.[168]^48 The network was bootstrapped 2000 times, using the 21 disorders as discrete variables, and an arc direction was considered significant when its probability exceeded 85%. The strength of the probabilistic relationships expressed by the arcs was measured by the logarithm of the Bayesian Dirichlet equivalent score (bde).[169]^49 For an undirected arc, for which the probability of each direction is 0.5, we retained the direction with the greater strength. The Bayesian network was generated with the R package bnlearn.

Mendelian randomization analysis
To explore potential causal effects among all pairs of the 21 digestive disorders, we used Mendelian randomization (MR) with exposure-associated SNPs as instrumental variables. Because some digestive disorders had too few cases, and therefore few genome-wide significant SNPs (P ≤ 5 × 10^-8), we relaxed the significance threshold to 5 × 10^-6 to obtain sufficient genetic instruments for those disorders.
Because of the complexity and strong linkage disequilibrium of the MHC region, only the most significant SNP within the MHC region (chr6: 25-34 Mb) was retained for MR analysis.[170]^50 Based on the UKB reference panel, we clumped SNPs at a linkage disequilibrium threshold of r^2 < 0.01 within a physical distance of 10 Mb to ensure uncorrelated genetic instruments. We applied the inverse-variance weighted (IVW) method to estimate the causal relationship.[171]^12 To ensure the robustness of the results, we additionally performed GSMR,[172]^51 median-MR,[173]^52 and maximum-likelihood MR[174]^12 as sensitivity analyses to control for the influence of pleiotropic effects, instrumental outliers, and sample overlap. Statistical analyses were performed using the packages MendelianRandomization[175]^53 and GSMR.

Cross-disorder GWAS meta-analysis
To identify variation shared among the multiple digestive disorders, cross-disorder meta-analysis was carried out via association analysis based on subsets (ASSET).[176]^54 We conducted ASSET analysis on the independent signals (index SNPs with P[GWAS] ≤ 1 × 10^-4) for each digestive disorder. In the bidirectional pleiotropy analysis, a P value is provided for each direction, together with an overall P for the total association signal of both directions combined. Independent pleiotropic variants were determined via LD clumping with overall P ≤ 5 × 10^-8; other SNPs were clumped with the lead variant if they had overall P < 0.05, were within 500 kb of the index SNP, and had r^2 > 0.1 with the index SNP. A SNP was considered to have effects in a consistent direction if the overall P was ≤ 5 × 10^-8 and the P for one direction was ≤ 0.05. Similarly, a SNP was considered to have effects in different directions if the overall P was ≤ 5 × 10^-8 and the P values for both directions were < 0.05.
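The direction rules stated above for pleiotropic SNPs can be written as a small decision function. The Python sketch below is not part of the ASSET R package; the function name `classify_pleiotropy`, its argument names, and the returned labels are hypothetical. It also assumes that "one direction" means exactly one direction, so a SNP with both directional P values below 0.05 is assigned to the different-directions category.

```python
GENOME_WIDE_P = 5e-8   # threshold on the overall (two-sided) ASSET P value
DIRECTION_P = 0.05     # threshold on each one-directional P value


def classify_pleiotropy(p_overall, p_positive, p_negative):
    """Classify a SNP from its overall and direction-specific P values.

    Returns one of:
      "not_significant"      overall P above the genome-wide threshold
      "different_directions" both directional P values < 0.05
      "consistent_direction" exactly one directional P value <= 0.05
      "unclassified"         overall P significant, neither direction is
    """
    if p_overall > GENOME_WIDE_P:
        return "not_significant"
    # Check the two-direction case first (assumption: it takes precedence).
    if p_positive < DIRECTION_P and p_negative < DIRECTION_P:
        return "different_directions"
    if p_positive <= DIRECTION_P or p_negative <= DIRECTION_P:
        return "consistent_direction"
    return "unclassified"
```

Checking the two-direction case first keeps the categories mutually exclusive, which the text implies but does not state explicitly.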
Functional annotation and gene-mapping of pleiotropic variants
To assess variant function and map SNPs to genes, we first annotated SNPs with ANNOVAR.[177]^55 For a more comprehensive evaluation of functional genetic variation, we obtained candidate genes from additional resources, including VARAdb[178]^56 and 3DSNP,[179]^57 as a supplement to ANNOVAR. We also mapped SNPs to genes when the SNPs were located in or near the genes' super-enhancers or promoters. 3DSNP was used to map SNPs to distal target genes through three-dimensional (3D) chromatin looping. Cis-eQTL mapping provided significant genes (FDR q value < 0.05) in nine digestive tissue types (esophagus muscularis, esophagus mucosa, esophagus gastroesophageal junction, stomach, small intestine terminal ileum, colon sigmoid, colon transverse, liver, and pancreas) from GTEx v8[180]^58 and in whole blood from the eQTLGen[181]^59 database. eQTLGen provides the largest existing eQTL summary statistics, from 31,684 whole blood samples. All tissue or cell types in the data sources above were digestive-specific or whole blood. Annotations from these sources were merged to form a list of candidate genes. The cluster function from bedtools[182]^60 was used to cluster these genes into independent 1-Mb regions. MalaCards,[183]^61 a database that provides gene-disease relations from multiple data sources, was used to search for existing evidence of association between candidate genes and disorders.
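To make the regional clustering of candidate genes concrete, the following Python sketch groups genes on the same chromosome whenever the gap to the running end of the current cluster is at most 1 Mb. It is written in the spirit of distance-based interval clustering and is not the bedtools implementation; the exact bedtools options used in the study are not stated, so the 1,000,000-bp gap, the function name `cluster_genes`, and the input layout are assumptions made for illustration.

```python
def cluster_genes(genes, max_gap=1_000_000):
    """Assign a cluster id to each gene.

    genes: iterable of (name, chrom, start, end) tuples (hypothetical layout).
    Genes on the same chromosome join the current cluster when their start
    lies within max_gap bp of the furthest end seen so far in that cluster.
    Returns a dict mapping gene name -> integer cluster id (starting at 1).
    """
    ordered = sorted(genes, key=lambda g: (g[1], g[2]))
    cluster_of = {}
    cluster_id = 0
    current_chrom = None
    current_end = None
    for name, chrom, start, end in ordered:
        new_cluster = chrom != current_chrom or start - current_end > max_gap
        if new_cluster:
            cluster_id += 1
            current_chrom = chrom
            current_end = end
        else:
            # Extend the cluster to the furthest end coordinate seen so far.
            current_end = max(current_end, end)
        cluster_of[name] = cluster_id
    return cluster_of


# Illustrative input with made-up gene names and coordinates.
example = [
    ("GENE_A", "chr1", 100_000, 150_000),
    ("GENE_B", "chr1", 900_000, 950_000),
    ("GENE_C", "chr1", 3_000_000, 3_050_000),
    ("GENE_D", "chr2", 120_000, 160_000),
]
clusters = cluster_genes(example)
```

In this made-up input, GENE_A and GENE_B fall in one region, while GENE_C (more than 1 Mb downstream) and GENE_D (a different chromosome) each start a new region.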
Gene-based association analysis
We applied the aggregated Cauchy association test (ACAT), implemented in the R package ACAT, to combine the statistical evidence from multiple SNPs within a gene and thereby determine the association between the target gene and individual digestive disorders, based on the GWAS summary results.[184]^62 Gene boundaries were defined from the Ensembl database (build GRCh37.3) and extended 35 kb upstream and 10 kb downstream to include regulatory regions.[185]^63 Genomic locations unavailable from Ensembl were manually annotated using NCBI's Gene online web resource. Pseudogenes were not included because of potential concerns about inaccurate calling.[186]^64

SMR analysis for candidate genes
Summary-data-based Mendelian randomization (SMR)[187]^65 was used to provide putative causal evidence linking SNPs to disorders via gene expression. SMR was performed using the expression quantitative trait loci (eQTL) summary statistics from eQTLGen and GTEx v8 described in the supplementary methods. Only transcripts with at least one cis-eQTL (P ≤ 5 × 10^-8) were considered. The threshold for passing the Heterogeneity in Dependent Instrument (HEIDI) test was P[HEIDI] ≥ 0.01.

Pathway enrichment analysis and drug target exploration
To explore functional differences among the detected pleiotropic genes, Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses were performed with the R package clusterProfiler, using a P value cutoff of 0.05. To investigate potential drugs related to digestive disorders, drug target genes and indications were obtained from the Drug-Gene Interaction Database (DGIdb)[188]^66 and DrugBank version 5.1.9.[189]^67

Acknowledgments