Abstract Schizophrenia is a common psychiatric disorder with high heritability and complex genetic architecture. Genome-wide association studies (GWAS) have identified several significant loci associated with schizophrenia. However, the explained heritability is still low. Growing evidence has shown schizophrenia is attributable to multiple genes with moderate effects. In-depth mining and integration of GWAS data is urgently expected to uncover disease-related gene combination patterns. Network-based analysis is a promising strategy to better interpret GWAS to identify disease-related network modules. We performed a network-based analysis on three independent schizophrenia GWASs by using a refined analysis framework, which included a more accurate gene P-value calculation, dynamic network module searching algorithm and detailed functional analysis for the obtained modules genes. The result generated 79 modules including 238 genes, which form a highly connected subnetwork with more statistical significance than expected by chance. The result validated several reported disease genes, such as MAD1L1, MCC, SDCCAG8, VAT1L, MAPK14, MYH9 and FXYD6, and also obtained several novel candidate genes and gene-gene interactions. Pathway enrichment analysis of the module genes suggested they were enriched in several neural and immune system related pathways/GO terms, such as neurotrophin signaling pathway, synaptosome, regulation of protein ubiquitination, and antigen processing and presentation. Further crosstalk analysis revealed these pathways/GO terms were cooperated with each other, and identified several important genes, which might play vital roles to connect these functions. Our network-based analysis of schizophrenia GWASs will facilitate the understanding of genetic mechanisms of schizophrenia. Introduction Schizophrenia (SZ) is a severe psychiatric disorder affecting ~5‰ of the population worldwide [[29]1, [30]2]. Family and twin studies have shown schizophrenia has high heritability ranging 73%–90% [[31]3]. Genome-wide association study (GWAS) is an effective strategy to screen disease related genetic factors at genome level [[32]4, [33]5]. Plenty of GWASs have been conducted for schizophrenia and have identified many susceptibility loci [[34]6–[35]12], but they could only explain a small proportion of the genetic component of schizophrenia. The underlying genes remain largely unknown, especially the interplays between these susceptibility genes. It has been commonly accepted that the combination effects of multiple loci with moderate statistical significance, which might be lost in the GWAS approach due to the stringent significance level after multiple comparison correction, might make a risk contribution to schizophrenia. To detect the common polygenic variations contributed to schizophrenia, analyses that focus on pathways or gene combination patterns could facilitate the reveal of disease related genes and the underlying genetic basis [[36]13]. Pathway-based analysis (PBA) is one type of method employed to identify trait-associated pathways from GWAS [[37]13] and has been used in many psychiatric diseases [[38]14–[39]16], autoimmune diseases [[40]17, [41]18] etc. However, PBA depends on prior biological pathways and treats the whole pathway as a single unit, it cannot detect a small portion of the pathway or other new combination of genes which may be associated with disease. Recent years, network-based analysis, which was originally used for gene expression data [[42]19], was proposed to analyze GWAS data [[43]20] and has been applied into several diseases with different analysis framework [[44]21–[45]24]. This method aims to identify connected network modules that show significant disease association signals by integrating GWAS data with molecular network. Compared with pathway-based analysis, the molecular network used in network-based analysis has higher coverage of human genes and the resulted modules could be of any size [[46]20]. The Psychiatric Genetic Consortium (PGC) has collected a wealth of valuable genotypic data for schizophrenia, while the traditional single marker analysis will miss a large proportion of weakly associated markers. Network-based analysis of these data would facilitate the discovery of novel susceptibility genes or modules for schizophrenia. Despite there have been three network-based analyses for schizophrenia [[47]21, [48]24, [49]25], the analysis pipeline still needs to further refine to sufficiently explore valuable information from the large number of genotyping data. Firstly, considering the collections of GWAS data from PGC were genotyped using various platforms and the sample sizes varied largely, how to effectively in-depth mine these data requires continuous exploration to obtain new knowledge. Secondly, it has been shown that gene P-value calculation is a critical step in network-based analysis [[50]25]. Several gene-based association test methods have been developed and they have better performance than minimal P-value of SNPs in one gene, such as VEGAS [[51]26] and GATES [[52]27]. Furthermore, besides detecting the combined pattern of multiple genes relevant to schizophrenia, deeply functional annotation and analysis of the identified module genes could provide new insights for the understanding of disease mechanism. In this study, we performed a network-based analysis on three independent schizophrenia GWASs from PGC by using a more accurate gene P-value calculation [[53]26], dynamic network module searching algorithm [[54]20] and detailed functional analysis for the obtained module genes. The result generated 79 modules including 238 genes, which formed a highly connected subnetwork with more statistical significance than expected by chance. The result validated several reported disease genes, such as MAD1L1, MCC, SDCCAG8, VAT1L, MAPK14, MYH9 and FXYD6, and also obtained several novel candidate genes and gene-gene interactions. Pathway enrichment analysis for the module genes showed they were significantly enriched in several neural and immune system related signaling pathways, biological processes, and protein activities. Furthermore, crosstalk analysis of these pathways assisted the understanding of the functional connections between them and revealed several important genes, such as MAPK1, PSMB8, TRAF6 and CAV1, which might play vital roles during these functions. Our network-based analysis of schizophrenia GWASs would facilitate the understanding of genetic mechanism of schizophrenia. Materials and Methods GWAS datasets and preprocessing The GWAS data used in this study was from PGC and approved by the National Institute of Mental Health (NIMH) ([55]https://www.nimhgenetics.org/available_data/data_biosamples/gwas_d ata.php). Six collections of GWAS data genotyped on Affymetrix 6.0 platform and three collections of GWAS data genotyped on Affymetrix 500K platform were selected for this study. Then, we merged these datasets into three groups and named them as MGS, Affy6, and Affy500K separately. The detailed information about the GWAS datasets was summarized in [56]Table 1. A consistent quality control process was conducted for the three GWAS datasets separately using PLINK [[57]28]. Briefly, (i) SNPs were excluded if they had a missing rate >0.05 (before sample removal below); (ii) we removed samples with missing rate >0.05 or heterozygosity rate outside 3 standard deviation from the mean; (iii) we calculated identity by descent (IBD) after pruning SNPs using r ^2 = 0.2 as threshold, and then excluded duplicated or related individuals with IBD>0.185; and (iv) we excluded SNPs with significant call rate difference in cases and controls (P<10^−5), minor allele frequency (MAF) <0.05 or Hardy-Weinberg equilibrium P<10^−6. Principal component estimation was done with EIGENSTRAT [[58]29], which used the same collection of SNPs for IBD calculation. We estimated the first 20 principle components and evaluated their impact on the genome-wide test statistics using λ as [[59]11]. Based on this, we used the top five principle components with the smallest λ together with the study indicator variables as associated covariates for logistic regression test to calculate P-values of SNPs. To check the P-value of identified module genes in a larger dataset, SCZ2 was applied [[60]30]. SCZ2 SNP P-value list was downloaded from PGC website ([61]http://www.med.unc.edu/pgc/downloads) and filtered to keep only SNPs whose imputation info score > 0.8. More than eight million SNPs were left for subsequent analysis. Table 1. Schizophrenia GWAS datasets used for this analysis. Collection Country Platform Cases[62] ^b Controls[63] ^b SNPs[64] ^b MGS United States, Australia Affymetrix 6.0 2681 (2679) 2653 (2653) 638,937 (576,742) Affy6 1805 (1800) 2351 (2327) 541,356 (490,100) ISC-Cardiff Bulgaria Affymetrix 6.0 527 609 654,278 ISC-Dublin Ireland Affymetrix 6.0 272 860 642,723 ISC-Edinburgh UK Affymetrix 6.0 368 284 646,310 ISC-SW2 Sweden Affymetrix 6.0 390 229 661,602 SGENE-TOP3 Norway Affymetrix 6.0 248 369 620,195 Affy500K 1078 (1058) 3591 (3509) 203,009 (182,499) Cardiff UK[65] ^a UK Affymetrix 500K 476 2,938 362,569 CATIE United States Affymetrix 500K; Perlegen 164K 410 391 384,528 Zucker Hillside United States Affymetrix 500K 192 190 258,470 [66]Open in a new tab ^a The controls of Cardiff UK were from WTCCC [[67]69]. ^b The number of cases, controls and SNPs after quality control were labeled in parentheses. Protein-protein interaction network dataset The protein-protein interaction (PPI) network data was collected from InWeb [[68]31], which contains high-confidence interactions as those seen in multiple independent experiments and reported more often in lower-throughput experiments. After mapping with approved gene symbol in HGNC [[69]32] and removing self-interactions, the remaining PPI network was comprised of 12,387 proteins and 156,095 interactions. Gene-based association analysis In the network analysis pipeline, the gene-based association analysis was conducted using VEGAS [[70]26], which takes into account both the effects of all SNPs mapped to gene and linkage disequilibrium (LD) between SNPs by using simulations from the multivariate normal distribution to calculate the gene P-values. VEGAS allocates SNPs to one or more autosomal genes according to gene positions, in which gene version of hg18 was used and LD patterns of SNPs were estimated using HapMap data. Herein, the offline version of VEGAS was used. All SNPs mapped to gene within 20kb upstream and downstream of gene coordinates were used to calculate the gene P-value. In addition, several improvements were made based on the downloaded version. First, the SNP positions of three GWAS datasets and gene version were updated to GRCh37.p11 and downloaded from Ensembl BioMarts [[71]33]; second, all genes with HGNC symbols were filtered to ensure at least one SNP was mapped within 20kb of gene coordinates; third, the SZ specific genotype data was used to estimate LD patterns for SNPs within each gene instead of the HapMap data. Adjust gene P-values was calculated using p.adjust function in R by inputting gene P-value list as input and setting adjust method as Benjamini & Hochberg (FDR). To calculate gene P-value from the largest scale of GWAS data SCZ2 for replication, we used an alternative gene-based association test tool, i.e. GATES [[72]27], which does not need permutation or simulation to evaluate empirical significance and is much faster. 1000 Genome phase I [[73]34] EUR genotype data (hg19) were used for LD calculation. Extended gene region length was 20kb at both 5’ and 3’. Threshold of r ^2 for SNPs in high LD was 0.8. Other parameters were default. Network module search and evaluation We used a dense module search (DMS) method, which is a R package developed by Jia et al. (called dmGWAS) [[74]20], to identify functional modules enriched for SZ association signals. The DMS algorithm dynamically searches for a dense module that holds as many genes with small P-value as possible for each node in the context of a node-weighted PPI network. The module score is defined as [MATH: Zm=zi< msqrt>k :MATH] , where k is the number of genes within a module, z [i] is transferred from gene P-value according to z [i] = ∅^−1(1 − P [i]), in which ∅^−1 denotes the inverse normal distribution function [[75]19]. In each round of module searching, the DMS algorithm starts with each seed gene as the initial module and identifies neighborhood interactors, which are defined as nodes whose shortest path to the module is within a distance d. The genes generating the maximum increment of Z [m] will be added to the module if Z [m+1] > Z [m](1 + r), where Z [m] is the original module score, Z [m+1] is the new module score, and r is a pre-defined expansion rate. Herein, d and r were set to 2 and 0.1 as [[76]20]. This process iterates until none of the nodes can satisfy Z [m+1] > Z [m](1 + r). To evaluate the significance of the identified modules, we undertook two steps of examination. First, we calculated P-values based on module score (Z [m]) for each module by empirically estimating the null distribution [[77]35]. Specifically, module scores Z [m] were first median-centered by subtracting the median value of Z [m] from each of them (Z [median]). Then, the mean δ and standard deviation σ for the empirical null distribution were estimated using locfdr in R packages. The module scores were standardized by [MATH: Zs=Zmedianδσ :MATH] and converted to P-values by P(Z [m]) = 1 − ∅(Z [S]), where ∅ is the normal cumulative density function. Second, we made a cross evaluation among three GWAS datasets using ‘dualEval’ function in the dmGWAS package to minimize the bias from different GWAS datasets [[78]20]. Details of cross evaluation are provided in dmGWAS document. Briefly, the modules from one GWAS dataset were used as discovery dataset and the modules’ significance in the other two GWAS datasets, which were as evaluation datasets, were evaluated in turn (i.e. P(Z [m(eval)]) was calculated). The criteria used to screen modules were the modules with P(Z [m])<0.05 in the discovery dataset and P(Z [m(eval)])<0.05 in any of the two evaluation datasets. Finally, the modules passed the tests from all three datasets were merged together and denoted as the resultant modules. The workflow of the network-based analysis for SZ GWAS data was shown in [79]S1 Fig. Statistical and functional analysis of module genes DAPPLE (Disease Association Protein-Protein Link Evaluator) [[80]36] was used with input of a list of genes to evaluate how closely connected between the proteins encoded by module genes. Number of permutation was set to 10,000 times and common interactor binding degree cutoff was set to 2. We used the DAVID Gene Set Enrichment Analysis tool [[81]37, [82]38] to evaluate the enriched functions of identified module genes, including GO biological process, GO cellular component, GO molecular function [[83]39] and KEGG pathways [[84]40]. Only terms with an adjusted P-value (Benjamini & Hochberg) of less than 0.05 and with gene numbers less than 350 were retained as [[85]41]. Results Gene-based tests We calculated gene P-values for the three SZ GWASs datasets using VEGAS with some refinements. The result was shown in [86]Table 2. In order to perform network-based analysis, each node in the PPI network was assigned a gene P-value as the node weight if the name was matched. Accordingly, a MGS-weighted PPI consisting of 11,456 genes, an Affy6-weighted PPI consisting of 11,362 genes, and an Affy500K-weighted PPI consisting of 10,232 genes were generated, and these three weighted PPIs were denoted as background sets for the follow-up network analysis. There were 651 (5.7%), 681 (6.0%), and 589 (5.8%) genes with P-value < 0.05 in the background sets of MGS, Affy6 and Affy500K separately. Table 2. Results for gene-based test and dynamic module search of three schizophrenia GWAS datasets. GWAS datasets MGS Affy6 Affy500K No. of genes Total genes 27,188 26,899 23,420 Genes in PPI 11,456 11,362 10,232 Genes in PPI and P-value < 0.05 651 (5.7%) 681 (6.0%) 589 (5.8%) Genes in PPI and FDR < 0.6 56 (0.49%) 6 (0.05%) 46 (0.45%) No. of modules Initial dmGWAS modules 10,910 10,786 9,481 ① evaluation, P(Z [m]) < 0.05 786 123 612 ② evaluation in Affy6, P(Z [m(eval)]) < 0.05 17 - 41 ③ evaluation in Affy500K, P(Z [m(eval)]) < 0.05 11 19 - ④ evaluation in MGS, P(Z [m(eval)]) < 0.05 - 34 5 Final modules: ①∩(②∪③∪④) 27 6 46 Involved genes (238) 68 29 146 Genes with P-value < 0.05 34 (50%) 17 (58.6%) 83 (56.8%) Genes with FDR < 0.6 9 (13.24%) 6 (20.69%) 15 (10.27%) [87]Open in a new tab Disease related network modules We next sought to identify modules enriched for SZ association signals in the background PPIs using dmGWAS. By performing dense module search with each gene in the background PPI as a seed, 10,910 modules for MGS, 10,786 modules for Affy6, and 9,481 modules for Affy500K were initially obtained ([88]Table 2). After examining the significance of module score and cross validation, 27 modules (including 68 genes) for MGS, 6 modules (including 29 genes) for Affy6 and 46 modules (including 146 genes) for Affy500K met the criteria. Furthermore, compared with the proportion of genes with P-value < 0.05 contained in the background PPI sets, the identified module genes for three datasets have a higher proportion of genes with P-value < 0.05 (as shown in [89]Table 2). After merging all modules from three datasets, a total of 79 disease related modules were obtained and they formed a resultant subnetwork ([90]Fig 1), which was composed of 238 module genes ([91]S1 Table) and 317 interactions ([92]S2 Table) between the genes in the same module. In this subnetwork, two genes were shared between MGS and Affy6 (PSMB8 and ZFYVE20), and three genes were shared between MGS and Affy500K (TRAF6, MYL12A and CAV1). In addition, some interactions were overrepresented in different modules, such as RPL35—SNORD21—PSAT1—PTPN21 from MGS, FCHSD2—WASL—GRB2—MAPK14 from Affy6, PSMA6—VHL—DGKI and TIAM1- ITIH1 from Affy500K. Fig 1. Protein-protein interaction network involving all merged module genes. [93]Fig 1 [94]Open in a new tab Square nodes denote the reported genes associated with schizophrenia or bipolar disorder. The color of the node was proportioned with the P-value of gene. The width of the edge was proportioned with the No. of repeats of the edge in the modules. The purple edges, green edges and blue edges were interactions from MGS, Affy6 and Affy500K respectively. Annotation and functional analysis of module genes To annotate the module genes which had been reported to be associated with schizophrenia, we searched two genetic databases: GWAS Catalog [[95]42] and SZGene [[96]43], and found that four genes (MAD1L1, SDCCAG8, MCC and VAT1L) had been reported their susceptibility with SZ by GWAS [[97]11, [98]44–[99]46], and three genes (MAPK14 [[100]47], MYH9 [[101]48] and FXYD6 [[102]49]) had been reported at least one positive association in SZGene. Since many evidence suggested that schizophrenia and bipolar disorder (BD) share some symptoms and genetic factors [[103]50], we also searched these genes in the bipolar disorder genetic database BDgene [[104]51]. The result showed that 6 genes (including MAD1L1 [[105]52], JAM3 [[106]53], MYL12B [[107]53], MYL12A [[108]53], ITIH1 [[109]52] and RYR2 [[110]54]) had been reported to be significant associated with bipolar disorder in at least one genetic study. To investigate the statistical significance of the interactions among proteins encoded by the module genes, we analyzed these genes by using DAPPLE. The results showed 230 out of the 238 module genes participated in the direct network ([111]S2 Fig), and the direct PPI network of module genes had significantly more edges than expected by chance (P = 9.9 × 10^−5), which means the network formed by the module genes were statistically significantly connected. To explore the biological function of the module genes, pathway enrichment analyses were conducted for 238 merged module genes, 68 MGS module genes, 29 Affy6 module genes and 146 Affy500K module genes respectively. The results were listed in [112]Table 3. The pathways enriched by 238 merged module genes involved several signaling pathways (such as neurotrophin signaling pathway, VEGF signaling pathway), biological processes related with cellular adhesion, regulation of actin cytoskeleton, leukocyte transendothelial migration and regulation of protein metabolism, modification and ubiquitination, and cellular component of synaptosome. In addition, the analyses of the module genes from three separate datasets enriched an additional signaling pathway (GnRH signaling pathway) and one biological process (antigen processing and presentation). Table 3. Enriched KEGG pathways and GO terms by module genes. Gene Source[113] ^a Category Term N (X)[114] ^b FDR[115] ^c Merged KEGG hsa04670:Leukocyte transendothelial migration 118 (15) 8.32×10^−5 Merged KEGG hsa04722:Neurotrophin signaling pathway 124 (14) 0.0011 Merged KEGG hsa04370:VEGF signaling pathway 75 (11) 0.0024 Merged KEGG hsa04810:Regulation of actin cytoskeleton 215 (17) 0.0058 Merged KEGG hsa04510:Focal adhesion 201 (16) 0.0114 Merged KEGG hsa04530:Tight junction 134 (13) 0.0154 Merged KEGG hsa04910:Insulin signaling pathway 135 (13) 0.0166 Merged KEGG hsa05200:Pathways in cancer 328 (22) 0.0019 Merged KEGG hsa05214:Glioma 63 (10) 0.0044 Merged KEGG hsa05222:Small cell lung cancer 84 (10) 0.0478 Merged GO CC GO:0019717~synaptosome 85 (10) 0.0065 Merged GO BP GO:0051247~positive regulation of protein metabolic process 243 (17) 0.0016 Merged GO BP GO:0032270~positive regulation of cellular protein metabolic process 233 (16) 0.0047 Merged GO BP GO:0031399~regulation of protein modification process 295 (17) 0.0196 Merged GO BP GO:0031400~negative regulation of protein modification process 119 (11) 0.0229 Merged GO BP GO:0031396~regulation of protein ubiquitination 100 (10) 0.0358 Merged GO BP GO:0031401~positive regulation of protein modification process 187 (13) 0.0497 Affy500K KEGG hsa04670:Leukocyte transendothelial migration 118 (11) 6.8×10^−4 Affy500K KEGG hsa04722:Neurotrophin signaling pathway 124 (10) 0.0096 Affy500K KEGG hsa05200:Pathways in cancer 328 (15) 0.0133 Affy500K KEGG hsa04530:Tight junction 134 (10) 0.0180 Affy6 KEGG hsa05214:Glioma 63 (6) 0.0026 Affy6 KEGG hsa04912:GnRH signaling pathway 98 (6) 0.0234 MGS GO BP GO:0019882~antigen processing and presentation 83 (6) 0.0422 [116]Open in a new tab ^a Merged denotes all genes of the merged modules from MGS, Affy6 and Affy500K. ^b N is total number of genes in the pathway or GO term. X is number of input genes which is mapped to the pathway. Only pathways or GO terms with N < 350 were shown. ^c FDR is the Benjamini & Hochberg-adjusted P-value. Only pathways or GO terms with FDR < 0.05 were shown. CC: cellular component; BP: biological process. To further understand the functional connections between these enriched pathways/GO terms, a crosstalk analysis was performed for them. According to their function and shared genes, the enriched pathways/GO terms (except three cancer related pathways) were classified into four groups as shown in [117]Fig 2. The first group included eight pathways from KEGG, which connect with many basic signaling pathways, such as MAPK signaling pathway, PI3K-Akt signaling pathway, and calcium signaling pathway. Furthermore, genes MAPK1, CDC42, PIK3CA, PIK3CG and PIK3R1 were commonly shared by most of the eight pathways. The second group was synaptosome, which shares genes ITGB1 and MPDZ with the first group. The third group included six GO biological processes related with regulation of protein metabolic, modification, and ubiquitination, which are functionally related with neurotrophin signaling pathway through pathway ubiquitin mediated proteolysis. All these six GO terms involved several proteasome related genes, including PSMA2, PSMB6, PSMB7, PSMB8, PSMC1 and PSMD14, and ubiqutin C (UBC). The last group was antigen processing and presentation, which involves several immune related genes, such as HLA-A, B2M, CD1D and TAP2. The shared genes between these four groups of pathways/GO terms were shown in [118]Fig 2 panel C. Fig 2. Crosstalk analysis of enriched pathways/GO terms. [119]Fig 2 [120]Open in a new tab Panel A shows the connections among the enriched pathways/GO terms. Blue squares are enriched KEGG pathways, green squares are their connected pathways from KEGG, red squares are enriched GO terms. Panel B shows the shared genes between the first group of enriched pathways. Panel C shows the shared genes between four groups of enriched pathways/GO terms. The genes in groups with more than one pathway/GO term were combined for the shared gene analysis. Discussion Genetic data of schizophrenia has been accumulated rapidly over the past several years. In this study, we conducted a network-based analysis on three independent schizophrenia GWAS datasets to explore the joint effects of multiple genetic association signals on schizophrenia. After merging the modules identified separately from the three GWASs, 79 modules enriched with robust genetic signals in GWAS datasets were screened out, and 238 genes were involved in the identified modules. These modules and module genes could provide potential candidate genes, gene-gene interactions and molecular pathways involved in schizophrenia pathogenesis. Of the 238 module genes, seven schizophrenia candidate genes had been collected in either GWAS Catalog or SZGene. Since the network-based analysis aims to search for module genes, which is a combination of several jointly GWAS signals, some genes with bigger P-value may also be selected. For example, among the seven replicated genes, although most of them had P-value < 0.05 in at least one of three SZ GWAS datasets, gene VAT1L had P-value > 0.05 in all three SZ GWAS datasets. The identification of VAT1L by our network-based analysis might be due to its connection with another two genes with P-value < 0.05 (SEPT3 and ARPC5L). One of the seven genes, MYH9, had been analyzed as a candidate gene for schizophrenia in Japanese population [[121]48], which was a three-stage case-control association study. In the first and second stages, the authors found a potential association of MYH9 with schizophrenia (allelic P-value = 0.047). In the third stage, however, they could not replicate the result in a larger sample size. In another study conducted in Taiwanese sample, MYH9 was found as a vulnerable gene for neuropsychological defined subgroups of schizophrenia patients (P-value = 0.0059 with haplotype analysis) [[122]55]. So far, it is still arguable for MYH9 about its susceptibility to schizophrenia, especially in Caucasian population. In addition, considering the possible shared genetic variants between SZ and BD, we also investigated how many identified module genes had been reported their susceptibility to BD. One of the results was MAD1L1 gene, which also has been clearly reported its association with SZ by GWAS (P-value < 5 × 10^−8) [[123]44–[124]46]. Another interesting gene is ITIH1, which had P-value = 1.03 × 10^−5 in Affy500K, and was involved in an overrepresented interaction (ITIH1—TIAM1) in Affy500K modules. Notably, ITIH1, ITIH3 and ITIH4 belong to a family of serine protease and are arranged in the order of ITIH1-ITIH3-ITIH4 on chromosome 3p21. Currently, ITIH molecules have been found to play a particularly important role in inflammation [[125]56]. In addition, the region ITIH3-ITIH4 has been identified to be susceptible to schizophrenia by GWAS (P-value = 7.8 × 10^−9) [[126]11]. In the study by He et al., they found three out of six SNPs they tested within the ITIH1-ITIH3-ITIH4 genomic region were significantly associated with SZ in the Han Chinese population (the strongest SNP was rs2710322 with allelic P-Bonferroni = 0.0018) [[127]57]. These lines of evidence implied that IT1H1 might be not only associated with BD but also SZ. Among the unreported module genes, it is noteworthy that DGKI gene had gene P-value < 0.05 in all datasets (gene P-value in MGS = 3.4 × 10^−2, in Affy6 = 2.7 × 10^−3, in Affy500K = 2.17 × 10^−4). In addition, the interactions between DGKI and VHL and PSMA6 genes (PSMA6—VHL—DGKI) were overrepresented in Affy500K modules. Actually, DGKI has been reported to be related with schizophrenia at gene level test (gene-wide P [min] = 6.7×10^−4) [[128]53]. Among the 238 module genes, there were 155 genes with P-value < 0.05 (65%) in any of MGS, Affy6 or Affy500K dataset. To check the P-values of our 238 module genes in a larger dataset, we calculated the gene P-values in SCZ2. Considering the huge number of SCZ2 SNPs, besides VEGAS, we also used GATES [[129]27] to calculate the gene P-value in parallel. The results were shown in [130]S1 Table. In summary, among all module genes, there were 88 and 61 genes with P-value < 0.05 by using VEGAS and GATES respectively, which contained totally 93 genes (39.1%), including several possible candidate genes mentioned above, such ITIH1, DGKI. As one of the first group of pathways, neurotrophin signaling pathway has been reported to be associated with schizophrenia in several analyses [[131]24, [132]25]. Neurotrophins are a family of trophic factors involved in differentiation and survival of neural cells [[133]58]. The dysfunction of neurotrophin signaling pathway can affect axonal outgrowth, axonal guidance and synapse formation (from KEGG), which is functionally related with the second group of enriched GO term synaptosome. There is considerable evidence showing that abnormalities of synapse connectivity, synaptic transmission and synapse development contribute to the pathogenesis of schizophrenia [[134]59, [135]60], so the enrichment of synaptosome is consistent with previous reports. Another interesting pathway is regulation of protein ubiquitination. The ubiquitin proteasome system has been identified as a canonical pathway associated with several neuropsychiatric disorders, including Alzheimer’s [[136]61], Parkinson’s [[137]62] and bipolar disorder [[138]63]. Recently, Maria et al. have reported the abnormalities of ubiquitination system in schizophrenia by using gene expression analysis [[139]64]. The evidence supported the association of this pathway with schizophrenia. The enrichment of antigen processing and presentation is also consistent with the finding of the association of genes in MHC region with schizophrenia in several GWASs of schizophrenia [[140]11, [141]12]. Meanwhile, the shared gene analysis between different groups of pathways could reveal some important connected genes. PSMB8 and TRAF6, both of which were shared by two groups of pathways and identified by two GWAS datasets separately, were two notable examples. PSMB8 encodes proteasome, which cleaves peptides in an ATP/ubiquitin-dependent process in a non-lysosomal pathway [[142]65]. An essential function of a modified proteasome, the immunoproteasome, is the processing of class I MHC peptides [[143]66]. TRAF6 encodes TNF receptor-associated factor 6, E3 ubiqutin protein ligase, which functions as a signal transducer in the NF-kappaB pathway that activates IkappaB kinase (IKK) in response to proinflammatory cytokines, and also interacts with ubiquitin conjugating enzymes catalyzing the formation of polyubiquitin chains [[144]67]. Thus, these two genes are related with both immune system and protein ubiquitination process and may play important functions in these pathways. In summary, the four groups of enriched pathways/GO terms were potentially associated with schizophrenia, and crosstalks among them revealed they were connected with each other through some shared genes and basic signaling pathways, which may implement the neural and immune related functions and contribute to the pathogenesis of schizophrenia. Compared with previous network-based analyses for schizophrenia [[145]21, [146]24, [147]25], our analysis included more GWAS datasets and merged them into three groups according to genotyping platforms. Cross evaluation among three groups at the module level could improve the reliability of results. In addition, our analysis pipeline used more accurate gene P-value calculation and in-depth functional analysis for the module genes. We also compared our module network with two previous related studies. At the gene level, our module genes shared 12 genes with Jia et. al [[148]21] and eight genes with Yu et. al [[149]24] (the shared genes were shown in [150]S1 Table). At the edge level (protein-protein interaction) our module network only share one edge with Jia et. al [[151]21] (VHL (pp) FN1) and one edge with Yu et. al [[152]24] (SRC (pp) GRB2). However, at the pathway level, our results validated several pathways enriched by other analyses, such as neurotrophin signaling pathway by [[153]24] and [[154]25], tight junction by [[155]21] and [[156]24], antigen processing and presentation by [[157]25], which was also validated by another PPI network analysis paper about the top genes of schizophrenia [[158]68]. Furthermore, one of the enriched GO terms of our analysis, “regulation of protein ubiquitination”, was firstly identified by network-based analysis, which was also validated by other types of studies [[159]64]. These results would provide new insights and hypothesis for further study. Additional experimental replication and verification are required in future genetic, gene expression and molecular functional studies. In conclusion, our study suggests that network-based analysis of schizophrenia GWASs is a useful method that could identify new susceptible genes, gene interactions and molecular pathways for schizophrenia. Our findings emphasize a central role for neural and immune-related pathways in the etiology of schizophrenia, and provide several candidate pathways and genes associated with schizophrenia, which would facilitate the understanding of genetic mechanism of schizophrenia. Supporting Information S1 Fig. Workflow of network-based analysis of GWAS data to identify functional modules for schizophrenia. (PDF) [160]Click here for additional data file.^ (53KB, pdf) S2 Fig. The direct network formed by the module genes from DAPPLE. (PDF) [161]Click here for additional data file.^ (280KB, pdf) S1 Table. Annotation for network module involved genes. (XLSX) [162]Click here for additional data file.^ (75.5KB, xlsx) S2 Table. PPI pairs involved in the identified modules for schizophrenia. Number of repeat denotes how many modules involved the interaction. (XLSX) [163]Click here for additional data file.^ (45.1KB, xlsx) Acknowledgments