Abstract
Schizophrenia is a common psychiatric disorder with high heritability
and complex genetic architecture. Genome-wide association studies
(GWAS) have identified several significant loci associated with
schizophrenia. However, the explained heritability is still low.
Growing evidence has shown schizophrenia is attributable to multiple
genes with moderate effects. In-depth mining and integration of GWAS
data is urgently expected to uncover disease-related gene combination
patterns. Network-based analysis is a promising strategy to better
interpret GWAS to identify disease-related network modules. We
performed a network-based analysis on three independent schizophrenia
GWASs by using a refined analysis framework, which included a more
accurate gene P-value calculation, dynamic network module searching
algorithm and detailed functional analysis for the obtained modules
genes. The result generated 79 modules including 238 genes, which form
a highly connected subnetwork with more statistical significance than
expected by chance. The result validated several reported disease
genes, such as MAD1L1, MCC, SDCCAG8, VAT1L, MAPK14, MYH9 and FXYD6, and
also obtained several novel candidate genes and gene-gene interactions.
Pathway enrichment analysis of the module genes suggested they were
enriched in several neural and immune system related pathways/GO terms,
such as neurotrophin signaling pathway, synaptosome, regulation of
protein ubiquitination, and antigen processing and presentation.
Further crosstalk analysis revealed these pathways/GO terms were
cooperated with each other, and identified several important genes,
which might play vital roles to connect these functions. Our
network-based analysis of schizophrenia GWASs will facilitate the
understanding of genetic mechanisms of schizophrenia.
Introduction
Schizophrenia (SZ) is a severe psychiatric disorder affecting ~5‰ of
the population worldwide [[29]1, [30]2]. Family and twin studies have
shown schizophrenia has high heritability ranging 73%–90% [[31]3].
Genome-wide association study (GWAS) is an effective strategy to screen
disease related genetic factors at genome level [[32]4, [33]5]. Plenty
of GWASs have been conducted for schizophrenia and have identified many
susceptibility loci [[34]6–[35]12], but they could only explain a small
proportion of the genetic component of schizophrenia. The underlying
genes remain largely unknown, especially the interplays between these
susceptibility genes. It has been commonly accepted that the
combination effects of multiple loci with moderate statistical
significance, which might be lost in the GWAS approach due to the
stringent significance level after multiple comparison correction,
might make a risk contribution to schizophrenia.
To detect the common polygenic variations contributed to schizophrenia,
analyses that focus on pathways or gene combination patterns could
facilitate the reveal of disease related genes and the underlying
genetic basis [[36]13]. Pathway-based analysis (PBA) is one type of
method employed to identify trait-associated pathways from GWAS
[[37]13] and has been used in many psychiatric diseases
[[38]14–[39]16], autoimmune diseases [[40]17, [41]18] etc. However, PBA
depends on prior biological pathways and treats the whole pathway as a
single unit, it cannot detect a small portion of the pathway or other
new combination of genes which may be associated with disease. Recent
years, network-based analysis, which was originally used for gene
expression data [[42]19], was proposed to analyze GWAS data [[43]20]
and has been applied into several diseases with different analysis
framework [[44]21–[45]24]. This method aims to identify connected
network modules that show significant disease association signals by
integrating GWAS data with molecular network. Compared with
pathway-based analysis, the molecular network used in network-based
analysis has higher coverage of human genes and the resulted modules
could be of any size [[46]20].
The Psychiatric Genetic Consortium (PGC) has collected a wealth of
valuable genotypic data for schizophrenia, while the traditional single
marker analysis will miss a large proportion of weakly associated
markers. Network-based analysis of these data would facilitate the
discovery of novel susceptibility genes or modules for schizophrenia.
Despite there have been three network-based analyses for schizophrenia
[[47]21, [48]24, [49]25], the analysis pipeline still needs to further
refine to sufficiently explore valuable information from the large
number of genotyping data. Firstly, considering the collections of GWAS
data from PGC were genotyped using various platforms and the sample
sizes varied largely, how to effectively in-depth mine these data
requires continuous exploration to obtain new knowledge. Secondly, it
has been shown that gene P-value calculation is a critical step in
network-based analysis [[50]25]. Several gene-based association test
methods have been developed and they have better performance than
minimal P-value of SNPs in one gene, such as VEGAS [[51]26] and GATES
[[52]27]. Furthermore, besides detecting the combined pattern of
multiple genes relevant to schizophrenia, deeply functional annotation
and analysis of the identified module genes could provide new insights
for the understanding of disease mechanism.
In this study, we performed a network-based analysis on three
independent schizophrenia GWASs from PGC by using a more accurate gene
P-value calculation [[53]26], dynamic network module searching
algorithm [[54]20] and detailed functional analysis for the obtained
module genes. The result generated 79 modules including 238 genes,
which formed a highly connected subnetwork with more statistical
significance than expected by chance. The result validated several
reported disease genes, such as MAD1L1, MCC, SDCCAG8, VAT1L, MAPK14,
MYH9 and FXYD6, and also obtained several novel candidate genes and
gene-gene interactions. Pathway enrichment analysis for the module
genes showed they were significantly enriched in several neural and
immune system related signaling pathways, biological processes, and
protein activities. Furthermore, crosstalk analysis of these pathways
assisted the understanding of the functional connections between them
and revealed several important genes, such as MAPK1, PSMB8, TRAF6 and
CAV1, which might play vital roles during these functions. Our
network-based analysis of schizophrenia GWASs would facilitate the
understanding of genetic mechanism of schizophrenia.
Materials and Methods
GWAS datasets and preprocessing
The GWAS data used in this study was from PGC and approved by the
National Institute of Mental Health (NIMH)
([55]https://www.nimhgenetics.org/available_data/data_biosamples/gwas_d
ata.php). Six collections of GWAS data genotyped on Affymetrix 6.0
platform and three collections of GWAS data genotyped on Affymetrix
500K platform were selected for this study. Then, we merged these
datasets into three groups and named them as MGS, Affy6, and Affy500K
separately. The detailed information about the GWAS datasets was
summarized in [56]Table 1. A consistent quality control process was
conducted for the three GWAS datasets separately using PLINK [[57]28].
Briefly, (i) SNPs were excluded if they had a missing rate >0.05
(before sample removal below); (ii) we removed samples with missing
rate >0.05 or heterozygosity rate outside 3 standard deviation from the
mean; (iii) we calculated identity by descent (IBD) after pruning SNPs
using r ^2 = 0.2 as threshold, and then excluded duplicated or related
individuals with IBD>0.185; and (iv) we excluded SNPs with significant
call rate difference in cases and controls (P<10^−5), minor allele
frequency (MAF) <0.05 or Hardy-Weinberg equilibrium P<10^−6. Principal
component estimation was done with EIGENSTRAT [[58]29], which used the
same collection of SNPs for IBD calculation. We estimated the first 20
principle components and evaluated their impact on the genome-wide test
statistics using λ as [[59]11]. Based on this, we used the top five
principle components with the smallest λ together with the study
indicator variables as associated covariates for logistic regression
test to calculate P-values of SNPs. To check the P-value of identified
module genes in a larger dataset, SCZ2 was applied [[60]30]. SCZ2 SNP
P-value list was downloaded from PGC website
([61]http://www.med.unc.edu/pgc/downloads) and filtered to keep only
SNPs whose imputation info score > 0.8. More than eight million SNPs
were left for subsequent analysis.
Table 1. Schizophrenia GWAS datasets used for this analysis.
Collection Country Platform Cases[62] ^b Controls[63] ^b SNPs[64] ^b
MGS United States, Australia Affymetrix 6.0 2681 (2679) 2653 (2653)
638,937 (576,742)
Affy6 1805 (1800) 2351 (2327) 541,356 (490,100)
ISC-Cardiff Bulgaria Affymetrix 6.0 527 609 654,278
ISC-Dublin Ireland Affymetrix 6.0 272 860 642,723
ISC-Edinburgh UK Affymetrix 6.0 368 284 646,310
ISC-SW2 Sweden Affymetrix 6.0 390 229 661,602
SGENE-TOP3 Norway Affymetrix 6.0 248 369 620,195
Affy500K 1078 (1058) 3591 (3509) 203,009 (182,499)
Cardiff UK[65] ^a UK Affymetrix 500K 476 2,938 362,569
CATIE United States Affymetrix 500K; Perlegen 164K 410 391 384,528
Zucker Hillside United States Affymetrix 500K 192 190 258,470
[66]Open in a new tab
^a The controls of Cardiff UK were from WTCCC [[67]69].
^b The number of cases, controls and SNPs after quality control were
labeled in parentheses.
Protein-protein interaction network dataset
The protein-protein interaction (PPI) network data was collected from
InWeb [[68]31], which contains high-confidence interactions as those
seen in multiple independent experiments and reported more often in
lower-throughput experiments. After mapping with approved gene symbol
in HGNC [[69]32] and removing self-interactions, the remaining PPI
network was comprised of 12,387 proteins and 156,095 interactions.
Gene-based association analysis
In the network analysis pipeline, the gene-based association analysis
was conducted using VEGAS [[70]26], which takes into account both the
effects of all SNPs mapped to gene and linkage disequilibrium (LD)
between SNPs by using simulations from the multivariate normal
distribution to calculate the gene P-values. VEGAS allocates SNPs to
one or more autosomal genes according to gene positions, in which gene
version of hg18 was used and LD patterns of SNPs were estimated using
HapMap data. Herein, the offline version of VEGAS was used. All SNPs
mapped to gene within 20kb upstream and downstream of gene coordinates
were used to calculate the gene P-value. In addition, several
improvements were made based on the downloaded version. First, the SNP
positions of three GWAS datasets and gene version were updated to
GRCh37.p11 and downloaded from Ensembl BioMarts [[71]33]; second, all
genes with HGNC symbols were filtered to ensure at least one SNP was
mapped within 20kb of gene coordinates; third, the SZ specific genotype
data was used to estimate LD patterns for SNPs within each gene instead
of the HapMap data. Adjust gene P-values was calculated using p.adjust
function in R by inputting gene P-value list as input and setting
adjust method as Benjamini & Hochberg (FDR). To calculate gene P-value
from the largest scale of GWAS data SCZ2 for replication, we used an
alternative gene-based association test tool, i.e. GATES [[72]27],
which does not need permutation or simulation to evaluate empirical
significance and is much faster. 1000 Genome phase I [[73]34] EUR
genotype data (hg19) were used for LD calculation. Extended gene region
length was 20kb at both 5’ and 3’. Threshold of r ^2 for SNPs in high
LD was 0.8. Other parameters were default.
Network module search and evaluation
We used a dense module search (DMS) method, which is a R package
developed by Jia et al. (called dmGWAS) [[74]20], to identify
functional modules enriched for SZ association signals. The DMS
algorithm dynamically searches for a dense module that holds as many
genes with small P-value as possible for each node in the context of a
node-weighted PPI network. The module score is defined as
[MATH: Zm=∑zi<
msqrt>k :MATH]
, where k is the number of genes within a module, z [i] is transferred
from gene P-value according to z [i] = ∅^−1(1 − P [i]), in which ∅^−1
denotes the inverse normal distribution function [[75]19]. In each
round of module searching, the DMS algorithm starts with each seed gene
as the initial module and identifies neighborhood interactors, which
are defined as nodes whose shortest path to the module is within a
distance d. The genes generating the maximum increment of Z [m] will be
added to the module if Z [m+1] > Z [m](1 + r), where Z [m] is the
original module score, Z [m+1] is the new module score, and r is a
pre-defined expansion rate. Herein, d and r were set to 2 and 0.1 as
[[76]20]. This process iterates until none of the nodes can satisfy Z
[m+1] > Z [m](1 + r). To evaluate the significance of the identified
modules, we undertook two steps of examination. First, we calculated
P-values based on module score (Z [m]) for each module by empirically
estimating the null distribution [[77]35]. Specifically, module scores
Z [m] were first median-centered by subtracting the median value of Z
[m] from each of them (Z [median]). Then, the mean δ and standard
deviation σ for the empirical null distribution were estimated using
locfdr in R packages. The module scores were standardized by
[MATH: Zs=Zmed
mi>ian−δσ :MATH]
and converted to P-values by P(Z [m]) = 1 − ∅(Z [S]), where ∅ is the
normal cumulative density function. Second, we made a cross evaluation
among three GWAS datasets using ‘dualEval’ function in the dmGWAS
package to minimize the bias from different GWAS datasets [[78]20].
Details of cross evaluation are provided in dmGWAS document. Briefly,
the modules from one GWAS dataset were used as discovery dataset and
the modules’ significance in the other two GWAS datasets, which were as
evaluation datasets, were evaluated in turn (i.e. P(Z [m(eval)]) was
calculated). The criteria used to screen modules were the modules with
P(Z [m])<0.05 in the discovery dataset and P(Z [m(eval)])<0.05 in any
of the two evaluation datasets. Finally, the modules passed the tests
from all three datasets were merged together and denoted as the
resultant modules. The workflow of the network-based analysis for SZ
GWAS data was shown in [79]S1 Fig.
Statistical and functional analysis of module genes
DAPPLE (Disease Association Protein-Protein Link Evaluator) [[80]36]
was used with input of a list of genes to evaluate how closely
connected between the proteins encoded by module genes. Number of
permutation was set to 10,000 times and common interactor binding
degree cutoff was set to 2.
We used the DAVID Gene Set Enrichment Analysis tool [[81]37, [82]38] to
evaluate the enriched functions of identified module genes, including
GO biological process, GO cellular component, GO molecular function
[[83]39] and KEGG pathways [[84]40]. Only terms with an adjusted
P-value (Benjamini & Hochberg) of less than 0.05 and with gene numbers
less than 350 were retained as [[85]41].
Results
Gene-based tests
We calculated gene P-values for the three SZ GWASs datasets using VEGAS
with some refinements. The result was shown in [86]Table 2. In order to
perform network-based analysis, each node in the PPI network was
assigned a gene P-value as the node weight if the name was matched.
Accordingly, a MGS-weighted PPI consisting of 11,456 genes, an
Affy6-weighted PPI consisting of 11,362 genes, and an Affy500K-weighted
PPI consisting of 10,232 genes were generated, and these three weighted
PPIs were denoted as background sets for the follow-up network
analysis. There were 651 (5.7%), 681 (6.0%), and 589 (5.8%) genes with
P-value < 0.05 in the background sets of MGS, Affy6 and Affy500K
separately.
Table 2. Results for gene-based test and dynamic module search of three
schizophrenia GWAS datasets.
GWAS datasets MGS Affy6 Affy500K
No. of genes Total genes 27,188 26,899 23,420
Genes in PPI 11,456 11,362 10,232
Genes in PPI and P-value < 0.05 651 (5.7%) 681 (6.0%) 589 (5.8%)
Genes in PPI and FDR < 0.6 56 (0.49%) 6 (0.05%) 46 (0.45%)
No. of modules Initial dmGWAS modules 10,910 10,786 9,481
① evaluation, P(Z [m]) < 0.05 786 123 612
② evaluation in Affy6, P(Z [m(eval)]) < 0.05 17 - 41
③ evaluation in Affy500K, P(Z [m(eval)]) < 0.05 11 19 -
④ evaluation in MGS, P(Z [m(eval)]) < 0.05 - 34 5
Final modules: ①∩(②∪③∪④) 27 6 46
Involved genes (238) 68 29 146
Genes with P-value < 0.05 34 (50%) 17 (58.6%) 83 (56.8%)
Genes with FDR < 0.6 9 (13.24%) 6 (20.69%) 15 (10.27%)
[87]Open in a new tab
Disease related network modules
We next sought to identify modules enriched for SZ association signals
in the background PPIs using dmGWAS. By performing dense module search
with each gene in the background PPI as a seed, 10,910 modules for MGS,
10,786 modules for Affy6, and 9,481 modules for Affy500K were initially
obtained ([88]Table 2). After examining the significance of module
score and cross validation, 27 modules (including 68 genes) for MGS, 6
modules (including 29 genes) for Affy6 and 46 modules (including 146
genes) for Affy500K met the criteria. Furthermore, compared with the
proportion of genes with P-value < 0.05 contained in the background PPI
sets, the identified module genes for three datasets have a higher
proportion of genes with P-value < 0.05 (as shown in [89]Table 2).
After merging all modules from three datasets, a total of 79 disease
related modules were obtained and they formed a resultant subnetwork
([90]Fig 1), which was composed of 238 module genes ([91]S1 Table) and
317 interactions ([92]S2 Table) between the genes in the same module.
In this subnetwork, two genes were shared between MGS and Affy6 (PSMB8
and ZFYVE20), and three genes were shared between MGS and Affy500K
(TRAF6, MYL12A and CAV1). In addition, some interactions were
overrepresented in different modules, such as
RPL35—SNORD21—PSAT1—PTPN21 from MGS, FCHSD2—WASL—GRB2—MAPK14 from
Affy6, PSMA6—VHL—DGKI and TIAM1- ITIH1 from Affy500K.
Fig 1. Protein-protein interaction network involving all merged module genes.
[93]Fig 1
[94]Open in a new tab
Square nodes denote the reported genes associated with schizophrenia or
bipolar disorder. The color of the node was proportioned with the
P-value of gene. The width of the edge was proportioned with the No. of
repeats of the edge in the modules. The purple edges, green edges and
blue edges were interactions from MGS, Affy6 and Affy500K respectively.
Annotation and functional analysis of module genes
To annotate the module genes which had been reported to be associated
with schizophrenia, we searched two genetic databases: GWAS Catalog
[[95]42] and SZGene [[96]43], and found that four genes (MAD1L1,
SDCCAG8, MCC and VAT1L) had been reported their susceptibility with SZ
by GWAS [[97]11, [98]44–[99]46], and three genes (MAPK14 [[100]47],
MYH9 [[101]48] and FXYD6 [[102]49]) had been reported at least one
positive association in SZGene. Since many evidence suggested that
schizophrenia and bipolar disorder (BD) share some symptoms and genetic
factors [[103]50], we also searched these genes in the bipolar disorder
genetic database BDgene [[104]51]. The result showed that 6 genes
(including MAD1L1 [[105]52], JAM3 [[106]53], MYL12B [[107]53], MYL12A
[[108]53], ITIH1 [[109]52] and RYR2 [[110]54]) had been reported to be
significant associated with bipolar disorder in at least one genetic
study.
To investigate the statistical significance of the interactions among
proteins encoded by the module genes, we analyzed these genes by using
DAPPLE. The results showed 230 out of the 238 module genes participated
in the direct network ([111]S2 Fig), and the direct PPI network of
module genes had significantly more edges than expected by chance (P =
9.9 × 10^−5), which means the network formed by the module genes were
statistically significantly connected.
To explore the biological function of the module genes, pathway
enrichment analyses were conducted for 238 merged module genes, 68 MGS
module genes, 29 Affy6 module genes and 146 Affy500K module genes
respectively. The results were listed in [112]Table 3. The pathways
enriched by 238 merged module genes involved several signaling pathways
(such as neurotrophin signaling pathway, VEGF signaling pathway),
biological processes related with cellular adhesion, regulation of
actin cytoskeleton, leukocyte transendothelial migration and regulation
of protein metabolism, modification and ubiquitination, and cellular
component of synaptosome. In addition, the analyses of the module genes
from three separate datasets enriched an additional signaling pathway
(GnRH signaling pathway) and one biological process (antigen processing
and presentation).
Table 3. Enriched KEGG pathways and GO terms by module genes.
Gene Source[113] ^a Category Term N (X)[114] ^b FDR[115] ^c
Merged KEGG hsa04670:Leukocyte transendothelial migration 118 (15)
8.32×10^−5
Merged KEGG hsa04722:Neurotrophin signaling pathway 124 (14) 0.0011
Merged KEGG hsa04370:VEGF signaling pathway 75 (11) 0.0024
Merged KEGG hsa04810:Regulation of actin cytoskeleton 215 (17) 0.0058
Merged KEGG hsa04510:Focal adhesion 201 (16) 0.0114
Merged KEGG hsa04530:Tight junction 134 (13) 0.0154
Merged KEGG hsa04910:Insulin signaling pathway 135 (13) 0.0166
Merged KEGG hsa05200:Pathways in cancer 328 (22) 0.0019
Merged KEGG hsa05214:Glioma 63 (10) 0.0044
Merged KEGG hsa05222:Small cell lung cancer 84 (10) 0.0478
Merged GO CC GO:0019717~synaptosome 85 (10) 0.0065
Merged GO BP GO:0051247~positive regulation of protein metabolic
process 243 (17) 0.0016
Merged GO BP GO:0032270~positive regulation of cellular protein
metabolic process 233 (16) 0.0047
Merged GO BP GO:0031399~regulation of protein modification process 295
(17) 0.0196
Merged GO BP GO:0031400~negative regulation of protein modification
process 119 (11) 0.0229
Merged GO BP GO:0031396~regulation of protein ubiquitination 100 (10)
0.0358
Merged GO BP GO:0031401~positive regulation of protein modification
process 187 (13) 0.0497
Affy500K KEGG hsa04670:Leukocyte transendothelial migration 118 (11)
6.8×10^−4
Affy500K KEGG hsa04722:Neurotrophin signaling pathway 124 (10) 0.0096
Affy500K KEGG hsa05200:Pathways in cancer 328 (15) 0.0133
Affy500K KEGG hsa04530:Tight junction 134 (10) 0.0180
Affy6 KEGG hsa05214:Glioma 63 (6) 0.0026
Affy6 KEGG hsa04912:GnRH signaling pathway 98 (6) 0.0234
MGS GO BP GO:0019882~antigen processing and presentation 83 (6) 0.0422
[116]Open in a new tab
^a Merged denotes all genes of the merged modules from MGS, Affy6 and
Affy500K.
^b N is total number of genes in the pathway or GO term. X is number of
input genes which is mapped to the pathway. Only pathways or GO terms
with N < 350 were shown.
^c FDR is the Benjamini & Hochberg-adjusted P-value. Only pathways or
GO terms with FDR < 0.05 were shown.
CC: cellular component; BP: biological process.
To further understand the functional connections between these enriched
pathways/GO terms, a crosstalk analysis was performed for them.
According to their function and shared genes, the enriched pathways/GO
terms (except three cancer related pathways) were classified into four
groups as shown in [117]Fig 2. The first group included eight pathways
from KEGG, which connect with many basic signaling pathways, such as
MAPK signaling pathway, PI3K-Akt signaling pathway, and calcium
signaling pathway. Furthermore, genes MAPK1, CDC42, PIK3CA, PIK3CG and
PIK3R1 were commonly shared by most of the eight pathways. The second
group was synaptosome, which shares genes ITGB1 and MPDZ with the first
group. The third group included six GO biological processes related
with regulation of protein metabolic, modification, and ubiquitination,
which are functionally related with neurotrophin signaling pathway
through pathway ubiquitin mediated proteolysis. All these six GO terms
involved several proteasome related genes, including PSMA2, PSMB6,
PSMB7, PSMB8, PSMC1 and PSMD14, and ubiqutin C (UBC). The last group
was antigen processing and presentation, which involves several immune
related genes, such as HLA-A, B2M, CD1D and TAP2. The shared genes
between these four groups of pathways/GO terms were shown in [118]Fig 2
panel C.
Fig 2. Crosstalk analysis of enriched pathways/GO terms.
[119]Fig 2
[120]Open in a new tab
Panel A shows the connections among the enriched pathways/GO terms.
Blue squares are enriched KEGG pathways, green squares are their
connected pathways from KEGG, red squares are enriched GO terms. Panel
B shows the shared genes between the first group of enriched pathways.
Panel C shows the shared genes between four groups of enriched
pathways/GO terms. The genes in groups with more than one pathway/GO
term were combined for the shared gene analysis.
Discussion
Genetic data of schizophrenia has been accumulated rapidly over the
past several years. In this study, we conducted a network-based
analysis on three independent schizophrenia GWAS datasets to explore
the joint effects of multiple genetic association signals on
schizophrenia. After merging the modules identified separately from the
three GWASs, 79 modules enriched with robust genetic signals in GWAS
datasets were screened out, and 238 genes were involved in the
identified modules. These modules and module genes could provide
potential candidate genes, gene-gene interactions and molecular
pathways involved in schizophrenia pathogenesis.
Of the 238 module genes, seven schizophrenia candidate genes had been
collected in either GWAS Catalog or SZGene. Since the network-based
analysis aims to search for module genes, which is a combination of
several jointly GWAS signals, some genes with bigger P-value may also
be selected. For example, among the seven replicated genes, although
most of them had P-value < 0.05 in at least one of three SZ GWAS
datasets, gene VAT1L had P-value > 0.05 in all three SZ GWAS datasets.
The identification of VAT1L by our network-based analysis might be due
to its connection with another two genes with P-value < 0.05 (SEPT3 and
ARPC5L). One of the seven genes, MYH9, had been analyzed as a candidate
gene for schizophrenia in Japanese population [[121]48], which was a
three-stage case-control association study. In the first and second
stages, the authors found a potential association of MYH9 with
schizophrenia (allelic P-value = 0.047). In the third stage, however,
they could not replicate the result in a larger sample size. In another
study conducted in Taiwanese sample, MYH9 was found as a vulnerable
gene for neuropsychological defined subgroups of schizophrenia patients
(P-value = 0.0059 with haplotype analysis) [[122]55]. So far, it is
still arguable for MYH9 about its susceptibility to schizophrenia,
especially in Caucasian population. In addition, considering the
possible shared genetic variants between SZ and BD, we also
investigated how many identified module genes had been reported their
susceptibility to BD. One of the results was MAD1L1 gene, which also
has been clearly reported its association with SZ by GWAS (P-value < 5
× 10^−8) [[123]44–[124]46]. Another interesting gene is ITIH1, which
had P-value = 1.03 × 10^−5 in Affy500K, and was involved in an
overrepresented interaction (ITIH1—TIAM1) in Affy500K modules. Notably,
ITIH1, ITIH3 and ITIH4 belong to a family of serine protease and are
arranged in the order of ITIH1-ITIH3-ITIH4 on chromosome 3p21.
Currently, ITIH molecules have been found to play a particularly
important role in inflammation [[125]56]. In addition, the region
ITIH3-ITIH4 has been identified to be susceptible to schizophrenia by
GWAS (P-value = 7.8 × 10^−9) [[126]11]. In the study by He et al., they
found three out of six SNPs they tested within the ITIH1-ITIH3-ITIH4
genomic region were significantly associated with SZ in the Han Chinese
population (the strongest SNP was rs2710322 with allelic P-Bonferroni =
0.0018) [[127]57]. These lines of evidence implied that IT1H1 might be
not only associated with BD but also SZ. Among the unreported module
genes, it is noteworthy that DGKI gene had gene P-value < 0.05 in all
datasets (gene P-value in MGS = 3.4 × 10^−2, in Affy6 = 2.7 × 10^−3, in
Affy500K = 2.17 × 10^−4). In addition, the interactions between DGKI
and VHL and PSMA6 genes (PSMA6—VHL—DGKI) were overrepresented in
Affy500K modules. Actually, DGKI has been reported to be related with
schizophrenia at gene level test (gene-wide P [min] = 6.7×10^−4)
[[128]53].
Among the 238 module genes, there were 155 genes with P-value < 0.05
(65%) in any of MGS, Affy6 or Affy500K dataset. To check the P-values
of our 238 module genes in a larger dataset, we calculated the gene
P-values in SCZ2. Considering the huge number of SCZ2 SNPs, besides
VEGAS, we also used GATES [[129]27] to calculate the gene P-value in
parallel. The results were shown in [130]S1 Table. In summary, among
all module genes, there were 88 and 61 genes with P-value < 0.05 by
using VEGAS and GATES respectively, which contained totally 93 genes
(39.1%), including several possible candidate genes mentioned above,
such ITIH1, DGKI.
As one of the first group of pathways, neurotrophin signaling pathway
has been reported to be associated with schizophrenia in several
analyses [[131]24, [132]25]. Neurotrophins are a family of trophic
factors involved in differentiation and survival of neural cells
[[133]58]. The dysfunction of neurotrophin signaling pathway can affect
axonal outgrowth, axonal guidance and synapse formation (from KEGG),
which is functionally related with the second group of enriched GO term
synaptosome. There is considerable evidence showing that abnormalities
of synapse connectivity, synaptic transmission and synapse development
contribute to the pathogenesis of schizophrenia [[134]59, [135]60], so
the enrichment of synaptosome is consistent with previous reports.
Another interesting pathway is regulation of protein ubiquitination.
The ubiquitin proteasome system has been identified as a canonical
pathway associated with several neuropsychiatric disorders, including
Alzheimer’s [[136]61], Parkinson’s [[137]62] and bipolar disorder
[[138]63]. Recently, Maria et al. have reported the abnormalities of
ubiquitination system in schizophrenia by using gene expression
analysis [[139]64]. The evidence supported the association of this
pathway with schizophrenia. The enrichment of antigen processing and
presentation is also consistent with the finding of the association of
genes in MHC region with schizophrenia in several GWASs of
schizophrenia [[140]11, [141]12]. Meanwhile, the shared gene analysis
between different groups of pathways could reveal some important
connected genes. PSMB8 and TRAF6, both of which were shared by two
groups of pathways and identified by two GWAS datasets separately, were
two notable examples. PSMB8 encodes proteasome, which cleaves peptides
in an ATP/ubiquitin-dependent process in a non-lysosomal pathway
[[142]65]. An essential function of a modified proteasome, the
immunoproteasome, is the processing of class I MHC peptides [[143]66].
TRAF6 encodes TNF receptor-associated factor 6, E3 ubiqutin protein
ligase, which functions as a signal transducer in the NF-kappaB pathway
that activates IkappaB kinase (IKK) in response to proinflammatory
cytokines, and also interacts with ubiquitin conjugating enzymes
catalyzing the formation of polyubiquitin chains [[144]67]. Thus, these
two genes are related with both immune system and protein
ubiquitination process and may play important functions in these
pathways. In summary, the four groups of enriched pathways/GO terms
were potentially associated with schizophrenia, and crosstalks among
them revealed they were connected with each other through some shared
genes and basic signaling pathways, which may implement the neural and
immune related functions and contribute to the pathogenesis of
schizophrenia.
Compared with previous network-based analyses for schizophrenia
[[145]21, [146]24, [147]25], our analysis included more GWAS datasets
and merged them into three groups according to genotyping platforms.
Cross evaluation among three groups at the module level could improve
the reliability of results. In addition, our analysis pipeline used
more accurate gene P-value calculation and in-depth functional analysis
for the module genes. We also compared our module network with two
previous related studies. At the gene level, our module genes shared 12
genes with Jia et. al [[148]21] and eight genes with Yu et. al
[[149]24] (the shared genes were shown in [150]S1 Table). At the edge
level (protein-protein interaction) our module network only share one
edge with Jia et. al [[151]21] (VHL (pp) FN1) and one edge with Yu et.
al [[152]24] (SRC (pp) GRB2). However, at the pathway level, our
results validated several pathways enriched by other analyses, such as
neurotrophin signaling pathway by [[153]24] and [[154]25], tight
junction by [[155]21] and [[156]24], antigen processing and
presentation by [[157]25], which was also validated by another PPI
network analysis paper about the top genes of schizophrenia [[158]68].
Furthermore, one of the enriched GO terms of our analysis, “regulation
of protein ubiquitination”, was firstly identified by network-based
analysis, which was also validated by other types of studies [[159]64].
These results would provide new insights and hypothesis for further
study. Additional experimental replication and verification are
required in future genetic, gene expression and molecular functional
studies.
In conclusion, our study suggests that network-based analysis of
schizophrenia GWASs is a useful method that could identify new
susceptible genes, gene interactions and molecular pathways for
schizophrenia. Our findings emphasize a central role for neural and
immune-related pathways in the etiology of schizophrenia, and provide
several candidate pathways and genes associated with schizophrenia,
which would facilitate the understanding of genetic mechanism of
schizophrenia.
Supporting Information
S1 Fig. Workflow of network-based analysis of GWAS data to identify
functional modules for schizophrenia.
(PDF)
[160]Click here for additional data file.^ (53KB, pdf)
S2 Fig. The direct network formed by the module genes from DAPPLE.
(PDF)
[161]Click here for additional data file.^ (280KB, pdf)
S1 Table. Annotation for network module involved genes.
(XLSX)
[162]Click here for additional data file.^ (75.5KB, xlsx)
S2 Table. PPI pairs involved in the identified modules for
schizophrenia.
Number of repeat denotes how many modules involved the interaction.
(XLSX)
[163]Click here for additional data file.^ (45.1KB, xlsx)
Acknowledgments