Abstract

   Schizophrenia (SCZ) is a devastating genetic mental disorder.
   Identification of the SCZ risk genes in brains is helpful to understand
   this disease. Thus, we first used the minimum Redundancy-Maximum
   Relevance (mRMR) approach to integrate the genome-wide sequence
   analysis results on SCZ and the expression quantitative trait locus
   (eQTL) data from ten brain tissues to identify the genes related to
   SCZ. Second, we adopted the variance inflation factor regression
   algorithm to identify their interacting genes in brains. Third, using
   multiple analysis methods, we explored and validated their roles. By
   means of the aforementioned procedures, we have found that (1) the
   cerebellum may play a crucial role in the pathogenesis of SCZ and (2)
   ITIH4 may be utilized as a clinical biomarker for the diagnosis of SCZ.
   These interesting findings may stimulate novel strategies for developing
   new drugs against SCZ. It has not escaped our notice that the approach
   reported here is of use for studying many other genome diseases as
   well.

   Keywords: schizophrenia, eQTL, mRMR, SNP, GTEx, brain, GO, YWHA, EIF2,
   ITIH4

Introduction

   Schizophrenia (SCZ) is a devastating chronic psychiatric disorder,
   characterized by a group of symptoms including hallucinations and
   delusions, severely inappropriate emotional and behavioral responses,
   substantial cognitive changes, the division of thought, and impaired
   coordination of social or occupational function.[37]^1 Despite its low
   prevalence (about 1% of the population), SCZ imposes a substantial
   burden on the family and society.[38]^2 It is now widely considered to
   be a complex genetic disease that is affected by environmental
   factors together with multiple micro- or intermediate-effect
   genes.[39]3, [40]4 Although genome-wide association study (GWAS)
   analyses have identified a number of variants significantly
   associated with SCZ, most of them are located in noncoding
   regions and their effects remain elusive.

   In 2001, the mRNA expression in the whole genome was proposed as a
   quantitative trait. Meanwhile, the first expression quantitative trait
   locus (eQTL) mapping analysis, which relates SNP allelic variation to
   target transcript abundance, was performed.[41]^5 Because the gene
   expression is tissue specific and influenced by environmental factors,
   integration of eQTL data and the variants associated with a specific
   disease in specific tissue may reveal some problematic genes causing
   diseases. Furthermore, many studies have indicated that significant
   changes in gene expression rather than alterations in protein structure
   and/or function play a crucial role in SCZ susceptibility.[42]6, [43]7,
   [44]8 Accordingly, SCZ-susceptible variants could be eQTLs that would
   influence the expression of some genes.

   In the present study, we used the minimum Redundancy-Maximum Relevance
   (mRMR) algorithm to identify the potential eQTL genes for SCZ by
   integrating eQTL data from 10 human brain tissues from the
   Genotype-Tissue Expression (GTEx) project with the results from a
   meta-analysis of GWASs.[45]9, [46]10 Compared with common classifiers
   such as naive Bayes, support vector machines (LIBSVM, version 3.22),
   linear discriminant analysis (LDA), and logistic regression, the mRMR
   algorithm has the advantages of reducing mutual redundancy among the
   selected genes and of selecting genes that are more representative of
   the target phenotypes.[47]11, [48]12, [49]13, [50]14 In addition to
   eQTL genes, their target genes
   may also have some effects on SCZ; we subsequently used the identified
   genes to explore their target genes in corresponding tissues, and we
   determined their putative roles in the brain.

Results

SCZ Risk Genes Based on the Integration Analysis of eQTL and GWASs

   A total of 10,301 SNPs met the GWAS significance threshold of p < 10^−8.
   From the 10 brain tissues, 492,401 eQTL SNPs, which affected the
   expression of 22,832 genes, were collected. Of these, only 134 SNPs
   were positive expression SNPs (eSNPs). Thus, each of the 10,000 eSNP
   benchmark datasets contained 134 positive eSNPs and 134 randomly
   selected negative eSNPs. Subsequently, based on the MaxRel scores of
   the eQTL gene feature in the mRMR analysis, we identified the most
   discriminative eQTL gene features from different brain tissues for the
   positive eSNPs of SCZ. Requiring an average MaxRel score greater than
   0.01 and an appearance among the top 500 gene features in more than
   70% of all tested eSNP-gene matrices, we identified 22
   eQTL gene features, which included 12 candidate genes in eight
   different brain tissues, excluding the anterior cingulate cortex BA24
   and the caudate basal ganglia (see [51]Table S1).
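
   The two selection criteria can be sketched as follows. This is a
   minimal illustration under stated assumptions, not the pipeline used
   in the study: the function name, the dictionary layout, and the toy
   feature names are hypothetical; only the thresholds (average MaxRel
   greater than 0.01, top 500, more than 70% of datasets) come from the
   text.

```python
def select_features(scores, min_avg=0.01, top_k=500, min_freq=0.70):
    """Keep features with a high average score that also rank in the
    top_k in more than min_freq of the benchmark datasets.

    scores: dict mapping feature name -> list of MaxRel scores,
            one score per benchmark dataset (hypothetical layout).
    """
    names = list(scores)
    n_datasets = len(next(iter(scores.values())))
    top_counts = {name: 0 for name in names}
    for d in range(n_datasets):
        # Rank all features by their score in dataset d, best first.
        ranked = sorted(names, key=lambda name: scores[name][d], reverse=True)
        for name in ranked[:top_k]:
            top_counts[name] += 1
    selected = []
    for name in names:
        avg = sum(scores[name]) / n_datasets
        freq = top_counts[name] / n_datasets
        if avg > min_avg and freq > min_freq:
            selected.append(name)
    return selected

# Toy scores for three hypothetical gene features over four datasets.
toy = {
    "geneA|cerebellum": [0.030, 0.028, 0.035, 0.031],
    "geneB|cortex": [0.020, 0.001, 0.002, 0.001],
    "geneC|hippocampus": [0.004, 0.005, 0.003, 0.004],
}
print(select_features(toy, top_k=1))  # ['geneA|cerebellum']
```

   In the toy example, only the first feature passes both filters; the
   second has one high score but a low average and a low top-rank
   frequency.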

   Furthermore, each of these 12 genes was supported by at least one item
   of evidence from GWASs, differential gene expression studies, and/or
   alternative eQTL data for replication. These genes may play crucial
   roles in the pathogenesis of SCZ, and they can serve as putative
   genes that increase the risk of developing SCZ. Of these, the
   gene with the highest average MaxRel score was PRSS16 in the cerebellum
   (average MaxRel = 0.0311), which exhibited the most significant
   association with SCZ and was supported as an SCZ risk gene only by the
   results of GWASs. Furthermore, this gene was also found to increase the
   risk for SCZ in the cerebellar hemisphere and hippocampus. The second
   most significant SCZ eQTL gene was complement factor 4A (C4A) in the
   cerebellum and the frontal cortex BA9 (average MaxRel = 0.0165 and
   0.0129, respectively), which was supported as an SCZ risk gene only by
   the results of the differential gene expression study GEO:
   [52]GSE53987. Interestingly, the AS3MT gene was found to be a potential
   risk gene for SCZ in the largest number of tissues, i.e., the
   cerebellum, cerebellar hemisphere, and cortex. Furthermore, the
   noncoding RNA lnc-CNNM2-1, which targets the CNNM2 gene (average
   MaxRel = 0.0127), and CYP21A1P (average MaxRel = 0.0147), both in the
   cerebellum, and ZNF192P1 in the cerebellum and cortex (both average
   MaxRel = 0.011) were identified as SCZ risk genes in the present study
   ([53]Figure 1).

Figure 1.

   Association of eQTL with Corresponding Genes Based on the BrainCloud
   eQTL Database

   (A) rs17693963 with ZNF192P1. (B) rs67682613 with CYP21A1P.

Potential Genes Interacting with the SCZ Risk Genes Identified Above

   To determine the target interacting genes of the SCZ risk genes
   identified, we first identified their coexpressed genes in each of the
   corresponding brain tissues using the variance inflation factor (VIF)
   regression algorithm, and then we used the adjusted $R^2$
   to select the potential interactors. In total, 186 genes were
   identified to interact with the nine SCZ candidate genes (i.e., ARL3,
   AS3MT, C10orf32, C4A, CYP21A1P, HLA-DMA, PRSS16, ARL6IP4, and SNX19) in
   the three brain tissues of cerebellum, frontal cortex BA9, and nucleus
   accumbens basal ganglia (see [56]Table S2). Of these, ARL6IP4 in the
   nucleus accumbens basal ganglia exhibited the largest number (174) of
   functionally relevant target genes. Moreover, the nucleus accumbens
   basal ganglia interactor gene SNX19 had 96 target genes that probably
   participate in a wide variety of physiological processes relevant for
   SNX19. Another interactor gene, C4A, was identified with nine target
   genes in the cerebellum and with four target genes in the frontal
   cortex BA9. In the present study, only nine genes of all these
   identified genes overlapped with known SCZ genes ([57]Figure 2).

Figure 2.

   Venn Diagram Comparison among Three Groups of Genes

   Known SCZ genes reported by GWASs, identified SCZ candidate genes in
   the present study, and differentially expressed genes in PBMCs. Error
   bars mean SD.

Enrichment Analysis

   Gene enrichment analysis of the genes expressed in the brain indicated
   that the candidate risk genes are significantly enriched within the
   known SCZ genes[60]^15 (p = 0.015). Furthermore, gene ontology (GO)
   enrichment analysis demonstrated that the genes were involved in a
   variety of physiological and pathophysiological processes. Within the
   molecular function GO category, all the above genes were significantly
   enriched in protein binding (false discovery rate [FDR]-adjusted p =
   5.75E−06) and poly(A) RNA binding (FDR-adjusted p = 2.02E−03). Within
   the cellular component GO category, the significantly enriched terms
   were cytosol (FDR-adjusted p = 1.06E−05), mitochondrion (FDR-adjusted
   p = 1.07E−04), extracellular exosome (FDR-adjusted p = 2.85E−04), and
   myelin sheath (FDR-adjusted p = 3.61E−03). Within the biological
   process GO category, three enriched terms, specifically SRP-dependent
   cotranslational protein targeting to the membrane (FDR-adjusted p =
   8.18E−03), viral transcription (FDR-adjusted p = 2.99E−02), and
   nuclear-transcribed mRNA catabolic process, nonsense-mediated decay
   (FDR-adjusted p = 4.64E−02), were revealed (see [61]Table S2).

   Results from pathway enrichment analysis, performed using the
   hypergeometric test, are illustrated in [62]Figure 3. Eight of these
   pathways fulfilled the criterion that −log(p) > 2. The top three pathways
   associated with SCZ were EIF2 signaling, IGF-1 signaling, and
   14-3-3-mediated signaling. Moreover, interestingly, all proteins in
   L-cysteine Degradation III pathways (i.e., MPST and GOT1) were among
   the candidate SCZ proteins ([63]Table S3).

Figure 3.

   The Top Eight Signaling Pathways in which All Identified Genes in the
   Present Study Are Enriched

Systematic Review of 14-3-3 Isoforms

   Because 14-3-3 protein includes seven isoforms (β, ε, γ, σ, η, θ, and
   ζ) and the 14-3-3-mediated pathway is involved in SCZ, we attempted to
   identify the isoforms that might play a role in SCZ. To that end, we
   performed an updated systematic review of 14-3-3 isoforms with SCZ. The
   previous results are listed in [66]Table S4. In total, 11 studies
   meeting the analysis criteria were included; they concerned six
   isoforms, namely, β, ε, γ, η, θ, and ζ. Among these studies, p values
   were calculated on the basis of either Student’s t test or a
   multivariate analysis of covariance. All studies of the θ isoform had p
   values less than 0.05. Because a multivariate analysis of covariance is
   stricter, we excluded the studies using Student’s t test; in the
   remaining studies, all six isoforms were significantly associated with
   SCZ, and the average fold changes (FCs) for the six isoforms β, ε, γ,
   η, θ, and ζ were 0.89, 1.42, 0.741, 1.135, 0.787, and 0.879,
   respectively. According to one study,[67]^16 a variation of at least
   40% is viewed as significant regulation; thus, the results suggest that
   the ε and θ isoforms tend to play important roles in SCZ.

Potential Candidate Genes for Clinical Diagnosis

   Among the genes identified in brain tissues, inter-α-trypsin inhibitor
   H4 (ITIH4), MOSPD3, SNAP25, RNPEPL1, UBE4A, SLC25A39, ZNF688, ANK2,
   BAD, and THAP7 were found to be significantly dysregulated in
   peripheral blood mononuclear cells (PBMCs) of patients with SCZ with
   the FC > 1.5 (see [68]Table S5). Furthermore, ITIH4 (p[adj.] = 0.010,
   logFC = −1.102), SNAP25 (p[adj.] = 0.026, logFC = −1.373), RNPEPL1
   (p[adj.] = 0.028, logFC = −0.725), UBE4A (p[adj.] = 0.044, logFC =
   −0.780), BAD (p[adj.] = 0.030, logFC = −0.671), and THAP7 (p[adj.] =
   0.048, logFC = −0.878) were found to be downregulated significantly in
   patients with SCZ, whereas the significantly upregulated genes were
   SLC25A39 (p[adj.] = 0.0002, logFC = 0.979), ZNF688 (p[adj.] = 0.012,
   logFC = 0.606), ANK2 (p[adj.] = 0.026, logFC = 0.637), and MOSPD3
   (p[adj.] = 0.003, logFC = 0.611). The genes shared among the known SCZ
   genes, the candidate genes in the brain, and the dysregulated genes
   in PBMCs are depicted in [69]Figure 2. Only ITIH4 was
   found in all three groups, and, therefore,
   it may serve as a putative gene for diagnosing SCZ by a blood
   test.

Discussion

   Schizophrenia is a multifactorial and polygenic psychiatric disorder.
   Due to limited sample size, several GWASs on SCZ reported various
   independent genomic loci exceeding genome-wide significance, i.e., p <
   10^−8.[70]17, [71]18, [72]19, [73]20 Furthermore, most of the
   identified risk variants are located in noncoding regions. How these
   risk variants contribute to SCZ susceptibility remains unidentified.
   Therefore, the Schizophrenia Working Group of the Psychiatric Genomics
   Consortium (PGC) was created to combine all available SCZ samples with
   published or unpublished GWAS analysis genotypes into a single,
   systematic meta-analysis.[74]^10 Since many studies indicated that
   changes in gene expression rather than alterations in protein structure
   and/or function play critical roles in SCZ susceptibility,[75]7, [76]8
   it was suggested that those risk variants in GWAS may alter the
   expression of SCZ-related genes rather than protein function.
   Furthermore, it is well known that brain tissues appear to be most
   relevant to SCZ; however, so far which part of the brain plays a
   significant role in SCZ remains elusive. In the present study, we
   integrated eQTL data from 10 brain tissues and genetic association
   findings from the largest meta-GWAS on SCZ with a total of 150,064
   subjects using the mRMR method, and we identified the potential putative
   interactors in corresponding brain tissues.

   The mRMR approach, proposed by Peng et al., is one of the most potent
   methods that use mutual information (MI) for gene feature selection
   based on microarray gene expression data.[77]11, [78]21, [79]22,
   [80]23, [81]24 In practice, mRMR is fast and robust, because it
   efficiently approximates an optimal max-dependency selection and
   produces a feature set with little pairwise redundancy, and mRMR
   usually yields better classification accuracy than other common
   approaches (e.g., LIBSVM, LDA, naive Bayes, and logistic regression).
   Using this algorithm, we found that the cerebellum is the brain tissue
   most closely linked to SCZ, since more genes were identified in it
   than in any other brain tissue. The cerebellum
   has been established to be associated with the auditory, cognitive, and
   social behavior of SCZ in addition to motor function.[82]^25
   Furthermore, our results found that C4A, known for its role in
   immunity, is an eQTL gene in cerebellum and frontal cortex, which
   supports that C4A is an authentic risk gene for SCZ. The C4A gene,
   located in major histocompatibility complex (MHC) class III region on
   chromosome 6, encodes the acidic form of complement factor 4, which is
   the primary effector of the innate and the adaptive immune system, and
   is involved in the classical pathway of complement activation
   system.[83]^26 Recently, studies reported that C4 might play essential
   roles in the pathogenesis of SCZ.[84]26, [85]27 It was also suggested
   that some C4 variants in the brain caused significant differential
   expression of C4A and C4B and the SCZ-related common C4 allele is more
   likely to cause higher expression of C4A.[86]^26

   In the current study, at least three items of evidence support that
   ITIH4 is a risk gene for SCZ. Also, interestingly, ITIH4, which was
   also identified as an eQTL gene in putamen basal ganglia, was found to
   be significantly decreased in the serum of patients suffering
   acute-phase processes.[87]^28 ITIH4 is one of the heavy chains of
   inter-α-trypsin inhibitor (ITI), which encodes the ITI family molecules
   with four other homologous heavy chains and one light chain. It has
   been demonstrated that the ITIH3-ITIH4 region is one of the regions
   most significantly associated with SCZ and bipolar disorder.[88]^20 Also, we
   have identified that the SNPs rs2239547, rs4687552, and rs2535627 in
   ITIH4 exceed the GWAS threshold and regulate expression of ITIH4. Over
   the last decade, many research groups have been interested in finding a
   reliable clinical biomarker for the early detection of SCZ.[89]2,
   [90]29, [91]30 Although significant differences between patients with
   SCZ and healthy controls have been found in brain structure, functional
   brain imaging, gene expression, and genetic polymorphisms, etc., the
   overlap of reported abnormalities between patients and healthy controls
   indicates that there is no valid diagnostic test for establishing a
   concrete early diagnosis of SCZ.[92]^29 Here, supported by the above
   multiple findings, ITIH4 is suggested to be a potential clinical
   biomarker for the diagnosis of SCZ through a blood test, which can
   provide easy operation and objective diagnostic criteria. Furthermore,
   we identified two further risk genes for SCZ, CYP21A1P and
   ZNF192P1. These are pseudogenes, whose products may function as
   regulatory elements. Although we identified three target genes for
   CYP21A1P, further work is warranted to investigate the mechanisms
   underlying these genes.

   Since SCZ is a complex disease, multiple genes/pathways are involved in
   disease progression. Thus, we further explored target genes within the
   eQTL-corresponding brain tissues using the VIF regression algorithm,
   which provided a list of prioritized genes. With all identified genes
   in the brain, we performed pathway analysis, which suggested eIF2
   signaling as the most relevant pathway. eIF2 is a multimeric protein
   consisting of α, β, and γ subunits, and it is generally considered to
   affect the maintenance of a rate-limiting step in mRNA
   translation.[93]^31 eIF2 signaling has important roles in the
   pathogenesis of SCZ as the corresponding stressors (starvation, virus,
   cytokines, and oxidative and endoplasmic reticulum stress) activate
   eIF2α kinases, which ultimately suppress protein synthesis through a
   series of reactions of phosphorylated eIF2-alpha.[94]^32 The second
   significant pathway identified was IGF1 signaling. IGF1, insulin-like
   growth factor 1, is a multifunctional protein whose amino terminus is
   highly homologous to the insulin B chain, which makes it possible to
   promote the consumption of glucose in adipose tissue via the
   insulin/IGF1 axis.[95]^33 Previous studies on IGF1 signaling in human
   neuroblastoma cells demonstrated that IGF1 signaling is involved in
   SCZ, as the pharmacological stimulation of muscarinic and insulin/IGF1
   receptors reverses the expression levels of the specific subunits of
   disordered genes in SCZ.[96]^34 Another critical pathway including
   14-3-3 proteins was also identified, which is a family of highly
   conserved, multifunctional proteins highly expressed in the brain
   during development. Moreover, many studies have examined the 14-3-3
   family gene and protein expression in the brain of patients with SCZ,
   and 14-3-3 proteins include seven isoforms, β, ε, γ, η, σ, θ, and
   ζ;[97]35, [98]36 however, conflicting results have been obtained, and
   which isoform plays a role in SCZ remains to be elucidated.[99]16,
   [100]35, [101]36, [102]37, [103]38 Our results suggested that the
   ε and θ isoforms have essential roles in SCZ, although the other
   isoforms require more data for validation.

   There are some limitations to the present analysis that need to be
   acknowledged and addressed. First, in addition to SNPs, other variants,
   such as copy number variation (CNV) and chromosomal aberration, may
   contribute to gene expression alteration and SCZ. In the present study,
   GTEx project V6p eQTL only provides the full data about the SNPs. If
   more data about other variants were available, the data could be used
   in the mRMR analysis. Second, as pointed out in Chou and Shen[104]^39
   and demonstrated in a series of recent publications[105]40, [106]41,
   [107]42, [108]43, [109]44, [110]45, [111]46, [112]47, [113]48, [114]49,
   [115]50, [116]51, [117]52, [118]53, [119]54, [120]55, [121]56, [122]57,
   [123]58, [124]59, [125]60, [126]61, [127]62, [128]63, [129]64, [130]65,
   [131]66, [132]67, [133]68, [134]69, [135]70, [136]71, [137]72,
   user-friendly and publicly accessible web servers represent the future
   direction for developing practically more useful prediction methods and
   computational tools. Actually, many practically useful web servers have
   increasing impacts on medical science,[138]^73 driving medicinal
   chemistry into an unprecedented revolution.[139]^74 We shall make
   efforts in our future work to provide a web server for the prediction
   method presented in this paper. (Once the web server has been
   established, an announcement will be made on the official website of
   Bio-X Institutes and via the MTNA journal.) Last, it is widely
   considered that environmental factors and genetic factors work together
   to induce many diseases, including SCZ. Gene expression is the direct
   result of environmental and genetic factors. Although here we focus
   on the integration of data from eQTL and GWAS to identify SCZ risk
   genes, we do not identify those genes associated with a certain
   environmental condition, and further studies are required to explore
   the specific genes for an environmental factor related to SCZ.

Conclusions

   Our analysis, which integrated data from brain eQTLs and GWASs of SCZ
   using the mRMR algorithm, has indicated that the cerebellum may play a
   crucial role in the pathogenesis of SCZ. Also, ITIH4 may be utilized as
   a clinical biomarker for the diagnosis of SCZ, since its level has
   been observed to be significantly decreased in the serum. Furthermore, three
   major pathways, i.e., EIF2 signaling, IGF-1 signaling, and
   14-3-3-mediated signaling, have been identified to confer risk of SCZ.
   Further in-depth studies, both experimental and theoretical, are needed
   to reveal the molecular mechanism of such important findings.

Materials and Methods

Benchmark Dataset

   According to the 5-step rule[140]^75 widely used in performing various
   genome or proteome analyses[141]40, [142]41, [143]42, [144]43, [145]44,
   [146]76, [147]77, [148]78, [149]79, [150]80, [151]81, [152]82, [153]83,
   [154]84, [155]85, [156]86, [157]87, [158]88, [159]89, [160]90, the
   first important thing is to construct or select an effective benchmark
   dataset.

   In this study, the raw eQTL data were taken from 10 human brain
   tissues, i.e., anterior cingulate cortex BA24, caudate basal ganglia,
   cerebellar hemisphere, cerebellum, cortex, frontal cortex BA9,
   hippocampus, hypothalamus, nucleus accumbens basal ganglia, and putamen
   basal ganglia, from the GTEx project, and association information on
   SNPs was taken from the genome-wide meta-analysis of SCZ.[161]9,
   [162]10 In this meta-analysis,[163]^10 a total of 36,989 cases with SCZ
   and 113,075 healthy controls were considered, and p values of a total of
   9,444,231 SNPs were calculated for their genetic association with SCZ.
   The GTEx project (V6p eQTL)
   ([164]https://gtexportal.org/home/datasets), which is currently the
   largest eQTL project and includes the gene expression and genotype
   data of 53 normal human tissues from 544 donors, provides the
   association p values for SNPs regulating the gene expression.[165]^9
   The p value for each SNP-gene pair in the GTEx database was transformed
   into $-\log_{10}(p\ \text{value})$. Then, when a variant had no
   significant effect on gene expression, the
   $-\log_{10}(p\ \text{value})$ was set to 0, i.e., the p value of the
   corresponding SNP = 1. When a variant had a significant effect on gene
   expression, i.e., was an eQTL, the $-\log_{10}(p\ \text{value})$ for
   this eQTL was more than 0. Moreover, those eQTLs that were
   significantly associated with SCZ were classified as positive eSNPs,
   and those that were not were referred to as negative eSNPs. Since the
   number of negative eSNPs was much higher than that of the positive eSNP
   set, we randomly selected 10,000 negative eSNP sets, each of which
   matched the number of the total positive eSNP. Then, a benchmark
   dataset was constructed by the total positive eSNPs and each randomly
   selected negative eSNP set with the same number. Thus, overall, there
   were 10,000 eSNP benchmark datasets.
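
   The construction of the balanced benchmark datasets can be sketched as
   follows. This is a minimal illustration, not the pipeline used in the
   study: the function names, the toy SNP identifiers, and the use of
   Python's random module are assumptions. Only the −log10 transform, the
   zero score for non-significant pairs, and the equal-sized random
   negative sets come from the description above.

```python
import math
import random

def neg_log10_score(p_value, is_eqtl):
    """Return -log10(p) for a significant SNP-gene pair, else 0 (i.e., p = 1)."""
    return -math.log10(p_value) if is_eqtl else 0.0

def build_benchmarks(positive, negative, n_sets, seed=0):
    """Pair all positive eSNPs with n_sets random negative sets of equal size."""
    rng = random.Random(seed)
    benchmarks = []
    for _ in range(n_sets):
        # Sample without replacement so each negative set has no duplicates.
        sampled = rng.sample(negative, len(positive))
        benchmarks.append((list(positive), sampled))
    return benchmarks

# Toy example with hypothetical identifiers (the study used 134 positive
# eSNPs and 10,000 negative sets).
positives = ["rsP1", "rsP2", "rsP3"]
negatives = [f"rsN{i}" for i in range(50)]
datasets = build_benchmarks(positives, negatives, n_sets=5)
print(len(datasets), len(datasets[0][1]))  # 5 3
```

   Fixing the seed makes the random negative sets reproducible, which is
   convenient when scores are later averaged over all benchmark datasets.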

   Based on each benchmark dataset, an eSNP-gene matrix was constructed
   for the next analysis. In this matrix, the rows were eSNPs, whereas the
   columns were class of eSNP, i.e., positive or negative ones, and genes
   regulated by the eSNPs from the 10 brain tissues. In total, 22,832 eQTL
   genes were included in this matrix for each eSNP.

The mRMR Method Integrating Brain eQTL and GWASs

   The mRMR algorithm has been widely used in computational biology for
   genome and proteome analyses.[166]41, [167]91, [168]92, [169]93,
   [170]94, [171]95 Here we also used the mRMR approach to identify the
   potential eQTL genes for SCZ by calculating the MI between two features
   and ranking these features.[172]^11 Given two variables x and y, their
   MI value can be calculated according to the following equation:
   $$I(x, y) = \iint p(x, y) \log \frac{p(x, y)}{p(x)\,p(y)} \, dx \, dy, \tag{1}$$

   where p(x) and p(y) are the marginal probabilities of x and y, and
   p(x, y) is their joint probability distribution. Using the value of
   MI, the distance between two variables can also be quantitatively
   measured. Based on the definition of MI, the MaxRel distance can be
   formulated as the distance between a given feature and the target
   classes, which reflects the relevance between the eQTL gene features
   from 10 brain tissues and positive eSNPs. A larger MaxRel score, which
   is highly interpretative and can reveal the difference between target
   classes, is indicative of a stronger relevance. Since there were 10,000
   benchmark datasets, there were 10,000 MaxRel scores for each eQTL gene.
   We ranked the eQTL genes based on both the MaxRel scores for each
   benchmark dataset and the average of the MaxRel scores for all tested
   benchmark datasets.
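
   For discretized variables, the integral in Equation 1 becomes a sum
   over observed value pairs. The sketch below illustrates this discrete
   form and the averaging of relevance scores over benchmark datasets. It
   is an assumption-laden illustration rather than the mRMR
   implementation used in the study: the function names and the toy
   vectors are hypothetical, and MI is reported in nats.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Discrete mutual information I(x, y) in nats (sum form of Equation 1)."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), count in pxy.items():
        p_joint = count / n
        mi += p_joint * math.log(p_joint / ((px[x] / n) * (py[y] / n)))
    return mi

def average_relevance(feature_by_dataset, labels_by_dataset):
    """Average the feature-versus-class MI over several benchmark datasets."""
    scores = [mutual_information(f, c)
              for f, c in zip(feature_by_dataset, labels_by_dataset)]
    return sum(scores) / len(scores)

# Toy data: label 1 = positive eSNP, 0 = negative eSNP; the feature is a
# discretized eQTL score for one gene in one tissue.
labels = [1, 1, 0, 0]
informative = [1, 1, 0, 0]    # perfectly tracks the class
uninformative = [1, 0, 1, 0]  # independent of the class
print(mutual_information(informative, labels))    # ln 2, about 0.693
print(mutual_information(uninformative, labels))  # 0.0
```

   A feature that separates positive from negative eSNPs receives a high
   score, whereas a feature that is independent of the class receives a
   score of zero.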

   Furthermore, the identified candidate genes were evaluated by searching
   more evidence for them as potential SCZ risk genes in the SZDB database
   ([173]http://www.szdb.org/). In this database,[174]^96 SCZ risk genes
   reaching the genome-wide significance level were extracted from
   multiple GWASs and 5 microarray datasets, including GEO: [175]GSE53987
   (114 samples of prefrontal cortex, striatum, and hippocampus),[176]^97
   GEO: [177]GSE12649 (69 postmortem samples of prefrontal
   cortex),[178]^98 GEO: [179]GSE21138 (59 postmortem samples of
   prefrontal cortex),[180]^99 GEO: [181]GSE35978 (195 samples of
   cerebellum and parietal cortex),[182]^100 and GEO: [183]GSE62191
   (59 samples of frontal cortex)[184]^101
   ([185]https://www.ncbi.nlm.nih.gov/geo/). Moreover, BrainCloud eQTL
   database,[186]^102 which contains the eQTL data from the human
   postmortem dorsolateral prefrontal cortex (DLPFC) of 261 normal human
   subjects in Caucasians and African Americans, was used for replication
   analysis of the eQTL association.

Identification of the Potential Interactors of SCZ Risk Genes in the Brain

   To identify the target interacting genes of each eQTL gene, the VIF
   regression algorithm, an efficient and accurate method, was
   adopted.[187]^103 The objective of the algorithm as used here was to
   select the optimal genes as interactors that can fit the expression
   pattern of the genes of interest. We sought the coefficient vector
   $\beta$ that minimizes the $l_0$-penalized sum of squared errors, as
   represented by the following equation:
   $$\arg\min_{\beta} \left\{ \lVert y - X\beta \rVert_2^2 + \lambda_0 \lVert \beta \rVert_{l_0} \right\}, \tag{2}$$

   where $y = (y_1, \ldots, y_n)'$ are n observations of the target gene,
   $X = (X_1, \ldots, X_p)$ are p interactors, and
   $\lVert \beta \rVert_{l_0} = \sum_{i=1}^{p} I_{\{\beta_i \neq 0\}}$.
   This algorithm calculates the correlations of each candidate interactor
   with the genes of interest using a small presampled dataset, and it
   searches for the optimal interactor subset by applying a t-statistic
   with a correction procedure when adding or removing one interactor at
   a time. The R package VIF
   ([188]http://cran.r-project.org/web/packages/VIF/) was used
   to implement the VIF method.
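
   The trade-off expressed by the penalized objective in Equation 2 can be
   illustrated by evaluating it for two candidate coefficient vectors.
   The sketch below only computes the objective; it is not the VIF
   regression search itself, and the function name, the toy data, and the
   penalty value are hypothetical.

```python
def l0_penalized_loss(y, X, beta, lam):
    """Squared error ||y - X beta||^2 plus lam times the number of nonzero betas."""
    residual_ss = 0.0
    for row, target in zip(X, y):
        fitted = sum(b * x for b, x in zip(beta, row))
        residual_ss += (target - fitted) ** 2
    n_nonzero = sum(1 for b in beta if b != 0)
    return residual_ss + lam * n_nonzero

# Toy data: the target follows the first candidate interactor exactly;
# the second candidate carries no signal.
X = [[1.0, 0.5], [2.0, -0.5], [3.0, 0.5], [4.0, -0.5]]
y = [2.0, 4.0, 6.0, 8.0]
sparse = l0_penalized_loss(y, X, beta=[2.0, 0.0], lam=1.0)  # one interactor
dense = l0_penalized_loss(y, X, beta=[2.0, 0.1], lam=1.0)   # two interactors
print(sparse < dense)  # True
```

   Under this objective, an additional interactor is worth including only
   if it reduces the sum of squared errors by more than the penalty.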

   Furthermore, to assess the goodness of fit for VIF regression models,
   we calculated the adjusted coefficient of determination, also known as
   adjusted $R^2$,[189]^104 which measures how well the regression model
   fits the real data points and considers the number of interactors that
   have been used. In the present study, the regression models with
   adjusted $R^2$ values greater than 0.6 were considered. The scheme for
   the exploration
   of candidate SCZ genes is shown in [190]Figure 4.
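
   The adjusted coefficient of determination can be computed with the
   standard formula 1 − (1 − R^2)(n − 1)/(n − p − 1), where n is the
   number of observations and p is the number of interactors in the
   model. The sketch below is a minimal illustration with hypothetical
   expression values; the function name is an assumption, and only the
   0.6 cutoff comes from the text.

```python
def adjusted_r2(y, fitted, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    n = len(y)
    mean_y = sum(y) / n
    ss_res = sum((a - b) ** 2 for a, b in zip(y, fitted))  # residual sum of squares
    ss_tot = sum((a - mean_y) ** 2 for a in y)             # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)

# Toy expression values for one target gene and the fitted values of a
# model with two interactors.
observed = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
fitted = [1.2, 1.9, 3.1, 3.8, 5.3, 5.7]
score = adjusted_r2(observed, fitted, n_predictors=2)
print(score > 0.6)  # True
```

   Because the correction grows with the number of interactors, a model
   that adds uninformative interactors is penalized relative to a sparser
   model with the same raw fit.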

Figure 4.

   Flow Chart Detailing the Inclusion Process to the Present Study

Enrichment Analysis

   To gain a better understanding of the biological effects of all the
   identified genes, we performed GO enrichment analysis.[193]^105 Using a
   hypergeometric test, we analyzed whether all the above genes, including
   eQTL genes and their interactors, significantly overlapped certain GO
   terms.[194]^106 For each specific GO gene set, the hypergeometric test
   p value was calculated as
   $$P = \sum_{k=m}^{n} \frac{\binom{M}{k}\binom{N-M}{n-k}}{\binom{N}{n}},$$
   (3)

   where N is the number of all human genes, M is the number of genes in
   the GO gene set, n is the number of genes of interest, and m is the
   number of genes of interest that belong to the GO gene set. To control
   the FDR, the p values of the hypergeometric test were adjusted with the
   Benjamini-Hochberg method.[195]^107
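
   The enrichment test and the FDR adjustment can be sketched as follows.
   This is an illustrative implementation using only the Python standard
   library, not the code used in the study; the function names are
   hypothetical, and the arguments follow the symbols N, M, n, and m of
   Equation 3.

```python
from math import comb


def hypergeom_pvalue(N, M, n, m):
    """Upper-tail hypergeometric p value, P(X >= m), as in Equation 3.

    N -- number of all genes in the background
    M -- number of genes in the GO gene set
    n -- number of genes of interest
    m -- number of genes of interest found in the GO gene set
    """
    # math.comb returns 0 when k exceeds the population, so impossible
    # overlaps contribute nothing to the sum.
    tail = sum(comb(M, k) * comb(N - M, n - k) for k in range(m, n + 1))
    return tail / comb(N, n)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    total = len(pvalues)
    order = sorted(range(total), key=lambda i: pvalues[i])
    adjusted = [0.0] * total
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(total, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * total / rank)
        adjusted[idx] = running_min
    return adjusted
```

   For example, with a background of 10 genes, a GO set of 4 genes, and 3
   genes of interest of which 2 fall in the GO set, the p value is
   (6 × 6 + 4 × 1) / 120 = 1/3.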

   Furthermore, not only the overlap with GO but also the overlap with the
   reported SCZ genes was evaluated. The known SCZ genes reported by GWASs
   and genes expressed in brain tissues are listed in [196]Table S1. In
   addition, we identified the canonical pathways associated with these
   SCZ candidate genes using the Ingenuity Pathway Analysis (IPA) suite
   ([197]https://www.qiagenbioinformatics.com/). In the canonical
   pathway-based analysis, the criterion for a significant pathway was set
   as $-\log p > 2$.

Systematic Review of 14-3-3 Isoforms Associated with SCZ

   Further, to determine the association of 14-3-3 isoforms with SCZ, we
   performed an updated systematic review with a literature search of
   studies published between January 1990 and December 2017 in six
   English-language databases (PubMed, Embase, Web of Science,
   ScienceDirect, SpringerLink, and EBSCO) and two Chinese databases
   (Wanfang and Chinese National Knowledge Infrastructure databases). The
   following keywords were used: 14-3-3 or YWHA and SCZ. The scheme for
   this systematic review is described in [198]Figure S1.

   Data extraction was performed independently by two investigators; any
   discrepancies between the two reviewers were resolved through
   discussion, and consensus was reached with a third party from a
   different organization. Inclusion criteria for the analysis were as
   follows: (1) a detailed diagnostic definition of SCZ; (2) reported
   sample size, FC, and p value; and (3) at least three qualifying studies
   per isoform. The strength of the associations between gene expression
   levels and SCZ was measured by the FC and p value.

Potential Candidate Genes for Diagnosis

   To identify potential candidate genes for a blood test, we used the
   gene expression profile of PBMCs examined in our previous
   study.[199]^108 Briefly, blood samples from 18 first-onset SCZ patients
   (8 males and 10 females, aged 14.78 ± 1.70 years) and 12 healthy
   controls (6 males and 6 females, aged 14.75 ± 2.14 years) were
   collected. The patients were untreated and drug naive and were
   independently diagnosed by at least two experienced psychiatrists
   according to the Diagnostic and Statistical Manual of Mental Disorders,
   Fourth Edition (DSM-IV) criteria for SCZ. The Agilent Human LncRNA
   Microarray v.2.0 with 17,200 valid probes was used to identify the
   putative clinical gene biomarkers. All participants provided informed
   consent in accordance with the approval of the Bioethics Committee of
   Bio-X Institutes of Shanghai Jiaotong University and the principles set
   forth by the Declaration of Helsinki.

Author Contributions

   L.C., L.H., and K.-C.C. conceived the study. L.C., T.H., and J.S.
   designed and performed the analyses. L.C., T.H., and X.Z. drafted the
   manuscript. W.C. and F.Z. provided the data and performed gene
   expression tests. L.C. and K.-C.C. finalized the paper.

Conflicts of Interest

   The authors have no conflict of interest.

Acknowledgments