Abstract

Genome‐wide association studies have identified several germline variants associated with gastric cancer, and sequencing studies have characterized extensive somatic alterations that arise during gastric carcinogenesis. However, the relationship between these germline variants and somatic alterations in gastric cancer remains unclear. We compiled 11 susceptibility loci and 276 driver genes of gastric cancer from previous studies and publicly available databases. We performed an enrichment analysis to test whether driver genes were enriched in susceptibility regions, and a pathway enrichment analysis to identify pathways commonly enriched for both driver genes and susceptibility genes. Finally, using gastric cancer samples and data from the TCGA STAD project, we evaluated the associations between susceptibility loci and somatic alterations. Gene‐based enrichment analysis showed that gastric cancer susceptibility genes were more likely to be found among driver genes than among all genes (P = .05). Susceptibility genes and driver genes were commonly enriched in 8 biological pathways. The gastric cancer susceptibility locus rs2285947 was associated with truncation mutations in the Signaling by PDGF pathway (OR = 0.26, 95% CI: 0.12‐0.55, P = 3.93 × 10^−4). The SNP rs1679709 was associated with COSMIC Signature 15 (P = .026) and with copy number values of RFC4, a DNA mismatch repair gene related to Signature 15. These results provide evidence for a relationship between germline variants and somatic alterations, and may help clarify how germline variation interacts with somatic alterations during gastric cancer development.

Keywords: association studies, enrichment analysis, gastric cancer, germline variants, somatic alteration

1. INTRODUCTION

Gastric cancer is the second most common cancer and the second leading cause of cancer death in China.[1] Environmental risk factors for gastric cancer include a high‐salt diet, smoking, and infectious agents.[2] In addition, numerous genetic factors influence an individual's predisposition to gastric cancer.[3] Over the past years, genome‐wide association studies (GWAS) of gastric cancer have detected many genetic variants, most of them single nucleotide polymorphisms (SNPs).[4] These common susceptibility loci include 1q22 (MUC1), 3q13.32 (ZBTB20), 5p13.1 (PRKAA1), 5q14.3 (lnc‐POLR3G‐4), 6p21.1 (UNC5CL), 8q24 (PSCA), and 10q23 (PLCE1).[5-10]

In parallel, a growing number of whole‐exome and whole‐genome sequencing studies have defined the landscape of somatic mutations in gastric cancer. These studies have identified many driver genes, whose mutations confer a selective growth advantage on tumor cells.[11] Some of these driver genes are known cancer genes (eg, TP53, ARID1A, and CDH1), whereas others are newly identified significantly mutated genes in gastric cancer (eg, MUC6, CTNNA2, GLI3, and RNF43).[4, 12-22] Copy number changes and characteristic mutational signatures also play important roles in gastric cancer development.[4, 16-19]

Recent studies have revealed associations between germline variants and somatic alterations in tumor development.
According to these studies, a large fraction of cancer predisposition genes can also contribute to oncogenesis through somatic mutation events in tumors.[23] Germline MC1R status may influence somatic mutation burden in melanoma, and common germline risk variants are associated with total somatic mutation count in breast cancer.[24, 25] In addition, the oncoprotein EWSR1‐FLI1 has been reported to bind preferentially to the risk allele of the susceptibility SNP rs79965208 in Ewing sarcoma.[26] However, the associations between genetic susceptibility variants and somatic alterations in gastric cancer remain unknown.

In this study, we examined the association between established gastric cancer susceptibility loci and somatic alterations. First, we performed enrichment analyses to examine whether driver genes are enriched in germline susceptibility regions, and whether cancer susceptibility genes are enriched among driver genes. We then performed a pathway enrichment analysis to explore whether driver genes and susceptibility genes are enriched in the same pathways. Finally, we conducted a series of association analyses to investigate how risk SNP genotypes affect somatic alterations during gastric cancer development (Figure 1).

Figure 1. The flowchart of the study design. First, we collected gastric cancer risk SNPs and driver genes. Next, we built the susceptibility regions and identified cancer susceptibility genes (CSGs), followed by enrichment analyses to investigate whether driver genes are more likely to be located within susceptibility regions, and whether susceptibility genes and driver genes are enriched in common biological pathways. Finally, we performed association analyses between gastric cancer risk SNP genotypes and several somatic events. GC, gastric cancer; CPG, cancer predisposing gene; MMR, DNA mismatch repair pathway

2. MATERIALS AND METHODS

2.1. Risk SNPs and cancer susceptibility genes

Gastric cancer risk SNPs were extracted from the original gastric cancer GWAS studies in the GWAS Catalog.[5-10, 27] When multiple variants were in linkage disequilibrium (LD, r^2 ≥ .8), only 1 SNP was retained, and the minor allele frequency (MAF) was required to be greater than or equal to .05. In addition, we included SNP rs1679709, which was reported in a new gastric cancer GWAS study in 2017 and met the above selection criteria.[28] Gastric cancer susceptibility regions were defined as 200 kb upstream and downstream of a risk SNP, and the protein coding genes located in these regions were defined as cancer susceptibility genes (CSGs). Moreover, protein coding genes whose expression in stomach tissue was associated with risk SNPs, or with SNPs in high LD (r^2 ≥ .8) with risk SNPs, according to the Genotype‐Tissue Expression (GTEx) v6p database, were also defined as CSGs.[29] Genomic coordinates and gene symbols of the protein coding genes were obtained from GENCODE version 19.[30]

2.2. Driver genes

Gastric cancer driver genes were obtained from 3 sources: (1) 16 gastric cancer‐related driver genes from the Catalogue of Somatic Mutations in Cancer (COSMIC) Cancer Gene Census (v78)[31]; (2) 175 driver genes of gastric cancer from the Integrative OncoGenomics (IntOGen) database[32]; and (3) 108 significantly mutated genes (SMGs) and 26 somatic copy number alteration genes from previously published whole‐genome sequencing (WGS) or whole‐exome sequencing (WES) studies.[4, 12-22] Because these sources overlap, merging them yielded the 276 driver genes used in this study.

2.3. Region enrichment analysis and gene‐based enrichment analysis

First, we calculated the proportion of driver genes in susceptibility regions.
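The region‐based enrichment test in this section (the proportion of driver genes in windows of ±flank around risk SNPs, compared against matched random SNP sets) can be sketched in Python as below. This is a minimal illustration, not the authors' pipeline: the tuple layouts, the any‐overlap rule for counting a gene as inside a region, and the helper names are assumptions. We read "proportion of driver genes in susceptibility regions" as the fraction of genes inside the regions that are driver genes, which is consistent with that proportion falling as the regions widen; the empirical P value follows the definition given below (the number of random sets at least as extreme, divided by the number of sets).

```python
# Minimal sketch of the region-based enrichment test (Sections 2.1 and 2.3).
# Assumptions, not taken from the paper: SNPs are (chrom, position) tuples,
# genes are (symbol, chrom, start, end) tuples, and a gene counts as inside a
# region if its span overlaps the region at all. Names are hypothetical.

def susceptibility_regions(snps, flank=200_000):
    """Windows of +/- flank bp around each risk SNP, as (chrom, start, end)."""
    return [(chrom, max(0, pos - flank), pos + flank) for chrom, pos in snps]


def genes_in_regions(genes, regions):
    """Symbols of genes whose span overlaps at least one region."""
    hits = set()
    for symbol, chrom, g_start, g_end in genes:
        for r_chrom, r_start, r_end in regions:
            if chrom == r_chrom and g_start <= r_end and g_end >= r_start:
                hits.add(symbol)
                break
    return hits


def driver_proportion(snps, genes, drivers, flank=200_000):
    """Fraction of genes inside the susceptibility regions that are drivers."""
    in_region = genes_in_regions(genes, susceptibility_regions(snps, flank))
    if not in_region:
        return 0.0
    return len(in_region & drivers) / len(in_region)


def empirical_p(observed, null_proportions):
    """Share of matched random SNP sets whose proportion is >= the observed one."""
    n_extreme = sum(1 for p in null_proportions if p >= observed)
    return n_extreme / len(null_proportions)


# Toy example with made-up coordinates (not real loci).
snps = [("chr1", 1_000_000)]
genes = [
    ("A", "chr1", 900_000, 950_000),
    ("B", "chr1", 1_150_000, 1_300_000),
    ("C", "chr1", 2_000_000, 2_100_000),
    ("D", "chr2", 1_000_000, 1_010_000),
]
drivers = {"B", "C"}
for flank in (50_000, 200_000):
    print(flank, driver_proportion(snps, genes, drivers, flank))
```

In the study's setting, `null_proportions` would hold one value per SNPsnap‐matched set (10 000 values), each obtained by calling `driver_proportion` on that set's SNPs with the same flank.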
To estimate the background distribution of driver genes, we used the SNPsnap online server to obtain 10 000 randomly sampled SNP sets matched to the risk SNP set on several genomic features. SNPsnap provides matched SNP sets based on allele frequency, number of SNPs in LD, distance to the nearest gene, and gene density, which can be used to calibrate background expectations.[33] We counted the random SNP sets whose proportion of driver genes was equal to or larger than that of the risk SNPs, and divided this count by 10 000 to obtain the P value for enrichment of driver genes in susceptibility regions. To test whether the length of the flanking regions affected the enrichment result, we also defined susceptibility regions as 50 kb, 100 kb, 200 kb, and 500 kb upstream and downstream of each risk SNP and calculated a P value for each definition separately.

In the gene‐based enrichment analysis, we computed the proportion of CSGs among all genes (56 318 genes from GENCODE V19) and among driver genes, as well as the fold enrichment, defined as the ratio of these 2 proportions. The P value for the fold enrichment was calculated with a hypergeometric test using the “phyper” function in R 3.3.1.

2.4. Pathway enrichment analysis

Reactome pathways were downloaded from the MSigDB website (http://software.broadinstitute.org/gsea/index.jsp/). Each Reactome pathway included in our analysis was required to contain at least 1 gastric cancer driver gene or susceptibility gene. We used the “phyper” function in R to compute the P value for enrichment of CSGs or driver genes in each pathway. P values were adjusted for multiple hypothesis testing with the Benjamini‐Hochberg correction, and a pathway was considered significant if its false discovery rate (FDR) was ≤ 0.1.

2.5. Genotype

We used data from the TCGA STAD project to perform the association analyses.
Germline genotypes were generated with the Affymetrix Genome‐Wide Human SNP Array 6.0, and 442 cases remained after the standard quality control process. We then imputed genotypes using SHAPIT for pre‐phasing and IMPUTE2 for imputation, with the 1000 Genomes Project Phase III integrated variant set as the reference panel.[34, 35] Risk SNPs included in the association analysis had to meet the following criteria: imputation info ≥ 0.5, minor allele frequency (MAF) ≥ 0.01, and Hardy‐Weinberg equilibrium P value ≥ .001.

2.6. Association analysis on somatic mutations within driver genes and somatic truncation mutations within key pathways

Somatic mutation data for gastric cancer were based on the whole‐exome sequencing data supplied by the TCGA‐STAD project. We used the mutation annotation file (TCGA.STAD.mutect.a88b4065-34b4-4858-9c16-55def79c38f2.DR-6.0.public.maf) from the Genomic Data Commons (GDC) data portal (https://portal.gdc.cancer.gov/projects/TCGA-STAD). The 276 driver genes described above were included in the analysis, and for each patient, a driver gene was considered mutated if one or more DNA mutations mapped to it. Pathways in which both driver genes and CSGs were significantly enriched were defined as key pathways; in total, 8 Reactome pathways were included. For each patient, a key pathway was considered mutated if one or more truncation mutations were detected in it.

2.7. Association analysis on somatic copy number alterations

Output files of the SNP6 copy number analysis (GISTIC2) were obtained from the Broad Institute Genome Data Analysis Center (GDAC) Firehose portal (http://gdac.broadinstitute.org/runs/analyses__latest/reports/cancer/STAD-TP/CopyNumber_Gistic2/nozzle.html).
From the “all_lesions.conf_99.txt” file, we extracted the regional CNV information, which lists the significant regions of amplification and deletion and the samples that are amplified or deleted in each region. In addition, we extracted gene‐level copy number values of gastric cancer driver genes in each TCGA STAD sample from the “all_data_by_genes.txt” file. We took the absolute values of the copy number values and restricted our analysis to the 276 driver genes.

2.8. Association analysis on COSMIC signatures

We calculated the weight of each mutational signature contributing to each TCGA STAD sample using the “deconstructSigs” package in R 3.3.1 (https://github.com/raerose01/deconstructSigs) with the mutation annotation file described above (TCGA.STAD.mutect.a88b4065-34b4-4858-9c16-55def79c38f2.DR-6.0.public.maf). We included the 11 COSMIC signatures that had been reported in gastric cancer (COSMIC signatures 1, 2, 5, 13, 15, 17, 18, 20, 21, 26, and 28). For each patient, the weight of each mutational signature was used as a quantitative somatic phenotype.

2.9. Collection of DNA mismatch repair genes

DNA mismatch repair (MMR) genes were collected from 2 sources: MMR genes identified in a previously reported study[21] and the genes in the KEGG mismatch repair pathway. In total, 26 MMR genes were included in our analysis (Table S1).

2.10. Statistical analyses for association analysis

Logistic regression was used for binary phenotypes and multiple linear regression for quantitative traits. We used an additive genetic model and controlled for age, gender, clinical stage, and the first 10 principal components. Clinical information (age, gender, and clinical stage) was obtained from the GDC data portal, and missing clinical values were imputed with the corresponding median values.
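Under the additive model, each genotype is coded as the number of risk alleles it carries, so a logistic regression coefficient β is the log odds ratio per additional risk allele. The Python sketch below shows that coding, median imputation of missing clinical covariates, and the conversion from β to an odds ratio. It rests on assumptions not stated in the paper: genotypes are hard‐called two‐letter strings such as "GA" with the risk allele supplied by the caller (IMPUTE2 output could instead be used as expected dosages between 0 and 2; the paper does not say which was used), missing values are stored as None, and the function names are hypothetical. The covariate‐adjusted regression itself is not reproduced here.

```python
# Sketch of the additive genotype coding and covariate imputation (Section 2.10).
# Assumptions: hard-called genotypes as two-letter strings, risk allele given
# explicitly, missing clinical values stored as None; names are hypothetical.
from math import exp
from statistics import median


def additive_dosage(genotype, risk_allele):
    """Number of risk alleles carried (0, 1, or 2)."""
    if len(genotype) != 2:
        raise ValueError(f"expected a two-allele genotype, got {genotype!r}")
    return sum(1 for allele in genotype if allele == risk_allele)


def impute_median(values):
    """Replace missing (None) entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def per_allele_odds_ratio(beta):
    """Odds ratio per additional risk allele from a logistic regression beta."""
    return exp(beta)


genotypes = ["GG", "GA", "AA", "GA"]
print([additive_dosage(g, "A") for g in genotypes])
print(impute_median([60, None, 70, 80]))
print(round(per_allele_odds_ratio(-1.35), 2))
```

With the β = −1.35 reported for rs2285947 in the Discussion, exp(−1.35) ≈ 0.26, which matches the additive‐model OR in Table 1.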
Association P values were adjusted with the Benjamini‐Hochberg method; all tests were two‐sided, and an FDR of 0.1 was used as the significance threshold. All association analyses were conducted in R 3.3.1 (http://www.R-project.org/).

3. RESULTS

We identified 11 gastric cancer risk SNPs according to our definitions and quality control standards (Table S2). We also identified 74 cancer susceptibility genes and collected 276 driver genes from the COSMIC Cancer Gene Census, the IntOGen database, and reported WGS or WES studies (Tables S3 and S4). In the region enrichment analysis, the proportion of driver genes in susceptibility regions tended to decrease as the regions widened from 50 kb to 500 kb upstream and downstream of the risk SNPs. Correspondingly, the enrichment P values increased from 0.0956 (50 kb) to 0.1248 (100 kb), 0.175 (200 kb), and 0.3064 (500 kb) (Figures 2 and 3A, Table S5).

Figure 2. Enrichment analysis of driver genes in SNP susceptibility regions. Proportions of gastric cancer driver genes in the 4 definitions of SNP susceptibility regions: A, 50 kb upstream and downstream of risk SNPs. B, 100 kb upstream and downstream. C, 200 kb upstream and downstream. D, 500 kb upstream and downstream.

Figure 3. Trend of driver genes in SNP susceptibility regions and gene‐based enrichment analysis. A, Proportions of gastric cancer driver genes and −log10‐transformed enrichment P values of driver genes in regions 50 kb, 100 kb, 200 kb, and 500 kb upstream and downstream of risk SNPs.
B, Bar graph of the percentage of CSGs among the 276 driver genes compared with all genes (56 318) in GENCODE (V19).

Gene‐based enrichment analysis showed that 2 CSGs were among the 276 driver genes (0.72%), a 5.51‐fold enrichment compared with the 56 318 annotated genes in GENCODE (P = .05, Figure 3B). In the pathway enrichment analysis, CSGs and driver genes were commonly enriched in 8 biological pathways: insulin receptor signaling cascade, PI3K cascade, PPARA activates gene expression, semaphorin interactions, signaling by insulin receptor, signaling by PDGF, PERK regulated gene expression, and other semaphorin interactions (Figure 4, Table S6).

Figure 4. Reactome pathways with FDR < 0.1 in the hypergeometric over‐representation test for gastric cancer driver genes and gastric cancer susceptibility genes. A, Enrichment ratios of significant pathways. B, −log10‐transformed false discovery rates (FDRs) of significant pathways.

Association analysis identified 130 nominal associations (P < .05) between somatic mutations in driver genes and risk SNP genotypes; the strongest were rs9841504 at 3q13.32 with POLE (P = 7.36 × 10^−4) and rs2285947 at 7p15.3 with SOS1 (P = 2.06 × 10^−3). However, no association remained significant after FDR adjustment (Table S7). To examine whether these 130 SNP‐gene pairs were also supported at the gene expression level, we performed an expression quantitative trait loci (eQTL) analysis in stomach tissue based on GTEx. The eQTL analysis identified 11 associations between gene expression levels and genotypes (P < .05, Figure S1, Table S8), none of which remained significant after FDR adjustment. The analysis of truncation mutations in key pathways identified one significant association with FDR < 0.1 (Table S9).
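The gene‐based enrichment result and the pathway screen above rest on two standard calculations: the hypergeometric upper tail (what R's phyper returns with lower.tail = FALSE) and Benjamini‐Hochberg adjustment (as in R's p.adjust with method = "BH"). The Python sketch below is a standard‐library illustration, not the authors' R code. It assumes the reported gene‐based P value is P(X ≥ 2), ie, phyper(1, 74, 56318 − 74, 276, lower.tail = FALSE), with 74 CSGs among 56 318 GENCODE genes and 276 driver genes drawn. Under that assumption it gives a fold enrichment of about 5.51 and a P value of about 0.05, in line with the values reported above. The pathway P values in the last lines are toy numbers, not study data.

```python
# Standard-library sketch of the enrichment statistics used in the results.
from math import comb


def hypergeom_upper_tail(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws).

    Equivalent to R's phyper(k - 1, successes, population - successes,
    draws, lower.tail = FALSE). Uses exact integer arithmetic.
    """
    upper = min(successes, draws)
    hits = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return hits / comb(population, draws)


def fold_enrichment(k, population, successes, draws):
    """(share of category genes among drawn genes) / (share among all genes)."""
    return (k / draws) / (successes / population)


def benjamini_hochberg(pvalues):
    """BH-adjusted P values (FDR), returned in the original order."""
    m = len(pvalues)
    # Walk from the largest P value down, keeping a running minimum so the
    # adjusted values stay monotone in the raw P values.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, i in enumerate(order):
        rank = m - position  # 1-based rank in ascending order of P
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Gene-based enrichment: 2 CSGs among 276 drivers; 74 CSGs among 56 318 genes.
print(round(fold_enrichment(2, 56_318, 74, 276), 2))
print(round(hypergeom_upper_tail(2, 56_318, 74, 276), 3))

# Pathway screen: keep pathways with FDR <= 0.1 (toy P values).
raw = [0.001, 0.02, 0.03, 0.2, 0.6]
print([q <= 0.1 for q in benjamini_hochberg(raw)])
```

Exact integer binomial coefficients avoid underflow for a population of this size; a library routine such as a hypergeometric survival function would give the same tail.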
The risk allele (A) of rs2285947 (7p15.3) was associated with a lower likelihood of truncation mutations in the Signaling by PDGF pathway (OR = 0.26, 95% CI: 0.12‐0.55, P = 3.93 × 10^−4) (Table 1). To further explore this association, we performed stratified analyses by age, sex, ethnicity, and tumor stage. As shown in Table S10, the association remained significant in both age groups, in females, and in Caucasians.

Table 1. Associations between rs2285947 and truncation mutations within the Signaling by PDGF pathway

Genotype         With mutation (n = 28)^a    Without mutation (n = 386)^a    OR (95% CI)^b       P value^b
                 N      %                    N      %
GG               18     64.29                131    33.94                    1 (reference)
GA               10     35.71                191    49.48                    0.36 (0.15‐0.85)    1.94E−02
AA               0      0                    64     16.58                    ‐                   ‐
GA/AA            10     35.71                255    66.06                    0.24 (0.10‐0.58)    1.38E−03
Additive model                                                               0.26 (0.12‐0.55)    3.93E−04

^a Patients with or without somatic truncation mutations in the pathway.
^b Odds ratios, 95% confidence intervals, and P values were calculated using logistic regression models adjusted for age, gender, clinical stage, and the first 10 principal components.

We next investigated associations between risk SNP genotypes and somatic copy number alterations (SCNAs). Several germline‐SCNA associations were observed, such as rs2494938 (6p21.1) with ERBB3 gene copy number values (P = 2.38 × 10^−3), rs13361707 (5p13.1) with the significant focal amplification region 12p12.1 (P = 9.03 × 10^−3), and rs1679709 (6p22.1) with the significant focal deletion region 9q21.11 (P = 5.33 × 10^−3) (Tables S11‐S13). However, none of these associations reached FDR < 0.1.

Of the 11 mutational signatures reported in gastric cancer, our analysis identified 5 pairs of risk SNP genotypes and signature weights with P < .05 (Table S14).
Signature 15 was associated with rs1679709 (6p22.1) (P = .026). Because Signature 15 is a DNA mismatch repair‐related signature according to the COSMIC website, we further investigated whether rs1679709 genotypes were associated with gene‐level copy number values of MMR genes. The SNP rs1679709 was associated with the copy number values of 1 gene, RFC4 (P = 1.25 × 10^−2) (Figure 5, Table S15).

Figure 5. Association between rs1679709 genotypes and gene copy number values of RFC4. The boxes span the first to third quartiles, the band inside each box marks the median, and the whiskers extend to the lowest and highest points within 1.5 times the interquartile range of the lower and upper quartiles.

4. DISCUSSION

The region enrichment analysis showed no significant difference between the proportion of driver genes in cancer susceptibility regions and that in background regions. A recent study has reported that cancer susceptibility regions have gene mutation frequencies comparable to background mutation frequencies.[36] In the gene‐based enrichment analysis, we observed that gastric cancer CSGs were more likely to be found among driver genes, although this was only marginally significant. This is plausible, because somatic cancer driver mutations and germline cancer‐predisposing mutations overlap substantially, an overlap that has been underestimated because the two have been studied with different research approaches.[23] In addition, the pathway enrichment analysis found that CSGs and driver genes were commonly enriched in 8 Reactome pathways, implying that germline and somatic mutations may act together in particular biological pathways during gastric cancer development. Our analysis of somatic mutations in driver genes and risk SNP genotypes identified several nominally significant SNP‐gene pairs.
A previous study reported that genetic background can influence the somatic evolution of a tumor by modifying the likelihood of acquiring mutations in specific cancer genes,[37] which may explain the number of nominal SNP‐gene associations found in our study. In the subsequent pathway‐based analysis, the risk allele (A) of rs2285947 was significantly and inversely associated with the occurrence of truncation mutations in the Reactome pathway Signaling by PDGF. The PDGF pathway has long been implicated in cancer and is known to be involved in many biological processes.[38] The SNP rs2285947 is an intronic variant in DNAH11 and is associated with DNAH11 expression levels in stomach tissue in GTEx (P = 3.40 × 10^−5); DNAH11 encodes a microtubule‐dependent motor ATPase according to the GeneCards website. The risk allele (A) of rs2285947 was inversely associated with both DNAH11 expression (effect size = −0.28) and truncation mutations in the PDGF signaling pathway (β = −1.35). One possible interpretation is that rs2285947 affects ATPase activity by regulating DNAH11 expression, which might in turn influence the release of PDGF receptors (PDGF‐R), since the E5 oncoprotein can form a ternary complex with PDGF‐R and the 16K subunit of the vacuolar V‐ATPase.[39] PDGF receptors can dimerize and undergo autophosphorylation, then recruit downstream effectors to transduce the signal into the cell.[40-42] According to the “two‐hit” model, which explains the interplay of germline and somatic mutations, individuals with elevated genetic predisposition may require fewer stages to develop a tumor than those at lower genetic risk. Thus, the observed inverse association between rs2285947 and the PDGF signaling pathway may reflect the continuous process of cancer development, in which both germline variants and somatic alterations contribute to gastric cancer.
In the past few years, large‐scale analyses have identified 30 mutational signatures across the spectrum of human cancer types,[43-47] 11 of which have been found in gastric cancer. We observed a nominally significant association between rs1679709 and the weight of COSMIC Signature 15, which is linked to defective DNA mismatch repair according to the COSMIC website.[48] In the follow‐up analysis, gene copy number values of the MMR gene RFC4 were also associated with rs1679709 genotypes. Therefore, rs1679709 might influence mutational signatures associated with defective DNA mismatch repair in gastric cancer, possibly through copy number changes in relevant genes such as RFC4.

Fundamental gaps remain in our knowledge of how normal cells evolve into cancer cells and which genetic backgrounds are critical to this process. In this study, we identified several associations between germline susceptibility loci and somatic alterations in gastric cancer. However, the exact mechanisms by which germline alleles affect subsequent somatic events remain unknown because of the limited sample size and the lack of functional experiments. In addition, only 89 of the 443 cases in TCGA STAD are Asian, whereas most germline susceptibility loci were identified in Asian populations. As more susceptibility loci are discovered and larger sequencing studies are performed in the future, especially in Asian populations, we expect that the network of interactions between germline and somatic mutations in gastric cancer will eventually be revealed, which may contribute to precision medicine.

CONFLICT OF INTEREST

The authors have no conflict of interest.
Supporting information

ACKNOWLEDGMENTS