Abstract

Schizophrenia (SCZ) is a devastating genetic mental disorder. Identifying SCZ risk genes in the brain helps to understand this disease. We first used the minimum Redundancy-Maximum Relevance (mRMR) approach to integrate genome-wide sequence analysis results on SCZ with expression quantitative trait locus (eQTL) data from ten brain tissues to identify genes related to SCZ. Second, we adopted the variance inflation factor regression algorithm to identify their interacting genes in the brain. Third, using multiple analysis methods, we explored and validated their roles. Through these procedures, we found that (1) the cerebellum may play a crucial role in the pathogenesis of SCZ and (2) ITIH4 may be utilized as a clinical biomarker for the diagnosis of SCZ. These findings may stimulate novel strategies for developing new drugs against SCZ. It has not escaped our notice that the approach reported here is of use for studying many other genomic diseases as well.

Keywords: schizophrenia, eQTL, mRMR, SNP, GTEx, brain, GO, YWHA, EIF2, ITIH4

Introduction

Schizophrenia (SCZ) is a devastating chronic psychiatric disorder, characterized by a group of symptoms including hallucinations and delusions, severely inappropriate emotional and behavioral responses, substantial cognitive changes, the division of thought, and impaired coordination of social or occupational function.[37]^1 Despite its low prevalence (about 1% of the population), SCZ imposes a substantial burden on the family and society.[38]^2 It is now widely considered to be a complex genetic disease that is affected by environmental factors together with multiple micro- or intermediate-effect genes.[39]3, [40]4 Although genome-wide association studies (GWASs) have identified a number of variants significantly associated with SCZ, most of them are located in noncoding regions and their effects remain elusive.
In 2001, mRNA expression across the whole genome was proposed as a quantitative trait. Meanwhile, the first expression quantitative trait locus (eQTL) mapping analysis, which relates SNP allelic variation to target transcript abundance, was performed.[41]^5 Because gene expression is tissue specific and influenced by environmental factors, integrating eQTL data with the variants associated with a specific disease in a specific tissue may reveal genes that contribute to the disease. Furthermore, many studies have indicated that significant changes in gene expression, rather than alterations in protein structure and/or function, play a crucial role in SCZ susceptibility.[42]6, [43]7, [44]8 Accordingly, SCZ-susceptible variants could be eQTLs that influence the expression of some genes. In the present study, we used the minimum Redundancy-Maximum Relevance (mRMR) algorithm to identify the potential eQTL genes for SCZ by integrating eQTL data from 10 human brain tissues from the Genotype-Tissue Expression (GTEx) project with the results from a meta-analysis of GWASs.[45]9, [46]10 Compared with common classifiers such as Naive Bayes, LIBSVM (a library for support vector machines, v.3.22), linear discriminant analysis (LDA), and logistic regression, the mRMR algorithm has the advantages of reducing mutual redundancy among the selected genes and of selecting genes that are more representative of the target phenotypes.[47]11, [48]12, [49]13, [50]14 Because the target genes of eQTL genes may also have some effects on SCZ, we subsequently used the identified genes to explore their target genes in the corresponding tissues, and we determined their putative roles in the brain.

Results

SCZ Risk Genes Based on the Integration Analysis of eQTL and GWASs

A total of 10,301 SNPs met the GWAS significance threshold of p < 10^−8. From the 10 brain tissues, 492,401 eQTL SNPs, which affected the expression of 22,832 genes, were collected.
Of these, only 134 SNPs were positive expression SNPs (eSNPs). Thus, each of the 10,000 eSNP benchmark datasets contained 134 positive eSNPs and 134 randomly selected negative eSNPs. Subsequently, based on the MaxRel scores of the eQTL gene features in the mRMR analysis, we identified the most discriminative eQTL gene features from different brain tissues for the positive eSNPs of SCZ. Requiring an average MaxRel score greater than 0.01 and a frequency of reappearance among the top 500 gene features of more than 70% across all tested eSNP-gene matrices, we identified 22 eQTL gene features, which included 12 candidate genes in eight different brain tissues, excluding the anterior cingulate cortex BA24 and the caudate basal ganglia (see [51]Table S1). Furthermore, each of these 12 genes was supported by at least one item of evidence from the GWASs, gene differential expression studies, and/or alternative eQTL data for replication. These genes may play crucial roles in the pathogenesis of SCZ, and they can serve as putative genes that increase the risk of developing SCZ. Of these, the gene with the highest average MaxRel score was PRSS16 from the cerebellum (average MaxRel = 0.0311), which exhibited the most significant association with SCZ; its status as an SCZ risk gene was supported only by the GWAS results. Furthermore, this gene was also found to increase the risk for SCZ in the cerebellar hemisphere and hippocampus. The second most significant SCZ eQTL gene was complement factor 4A (C4A) in the cerebellum and the frontal cortex BA9 (average MaxRel = 0.0165 and 0.0129, respectively); its status as an SCZ risk gene was supported only by the results of the gene differential expression study GEO: [52]GSE53987. Interestingly, the AS3MT gene was found to be a potential risk gene for SCZ in the largest number of tissues, i.e., the cerebellum, cerebellar hemisphere, and cortex.
Furthermore, the noncoding RNA lnc-CNNM2-1 targeting the CNNM2 gene in the cerebellum (average MaxRel = 0.0127), CYP21A1P in the cerebellum (average MaxRel = 0.0147), and ZNF192P1 in the cerebellum and cortex (both average MaxRel = 0.011) were identified as SCZ risk genes in the present study ([53]Figure 1).

Figure 1. Association of eQTL with Corresponding Genes Based on the BrainCloud eQTL Database. (A) rs17693963 with ZNF192P1. (B) rs67682613 with CYP21A1P.

Potential Genes Interacting with the SCZ Risk Genes Identified Above

To determine the target interacting genes of the SCZ risk genes identified, we first identified their coexpressed genes in each of the corresponding brain tissues using the variance inflation factor (VIF) regression algorithm, and then we used the adjusted $R^2$ to select the potential interactors. In total, 186 genes were identified to interact with the nine SCZ candidate genes (i.e., ARL3, AS3MT, C10orf32, C4A, CYP21A1P, HLA-DMA, PRSS16, ARL6IP4, and SNX19) in the three brain tissues of cerebellum, frontal cortex BA9, and nucleus accumbens basal ganglia (see [56]Table S2). Of these, ARL6IP4 in the nucleus accumbens basal ganglia exhibited the largest number (174) of functionally relevant target genes. Moreover, the nucleus accumbens basal ganglia interactor gene SNX19 had 96 target genes that probably participate in a wide variety of physiological processes relevant for SNX19. Another interactor gene, C4A, was identified with nine target genes in the cerebellum and with four target genes in the frontal cortex BA9. In the present study, only nine of all these identified genes overlapped with known SCZ genes ([57]Figure 2).

Figure 2. Venn Diagram Comparison among Three Groups of Genes. Known SCZ genes reported by GWASs, identified SCZ candidate genes in the present study, and differentially expressed genes in PBMCs. Error bars mean SD.
Enrichment Analysis

Gene enrichment analysis of the genes expressed in the brain indicated that the candidate risk genes are significantly enriched within the known SCZ genes[60]^15 (p = 0.015). Furthermore, gene ontology (GO) enrichment analysis demonstrated that the genes were involved in a variety of physiological and pathophysiological processes. Within the molecular function GO category, all the above genes were significantly enriched in protein binding (false discovery rate [FDR]-adjusted p = 5.75E−06) and poly(A) RNA binding (FDR-adjusted p = 2.02E−03). Within the cellular component GO category, the significantly enriched terms were cytosol (FDR-adjusted p = 1.06E−05), mitochondrion (FDR-adjusted p = 1.07E−04), extracellular exosome (FDR-adjusted p = 2.85E−04), and myelin sheath (FDR-adjusted p = 3.61E−03). Within the biological process GO category, three enriched terms, specifically SRP-dependent cotranslational protein targeting to the membrane (FDR-adjusted p = 8.18E−03), viral transcription (FDR-adjusted p = 2.99E−02), and nuclear-transcribed mRNA catabolic process, nonsense-mediated decay (FDR-adjusted p = 4.64E−02), were revealed (see [61]Table S2). Results from pathway enrichment analysis, performed using the hypergeometric test, are illustrated in [62]Figure 3. Eight of these pathways fulfilled the criterion that −log p > 2. The top three pathways associated with SCZ were EIF2 signaling, IGF-1 signaling, and 14-3-3-mediated signaling. Moreover, interestingly, all proteins in L-cysteine Degradation III pathways (i.e., MPST and GOT1) were among the candidate SCZ proteins ([63]Table S3).

Figure 3. The Top Eight Signaling Pathways in which All Identified Genes in the Present Study Are Enriched

Systematic Review of 14-3-3 Isoforms

Because 14-3-3 protein includes seven isoforms (β, ε, γ, σ, η, θ, and ζ) and the 14-3-3-mediated pathway is involved in SCZ, we attempted to identify the isoforms that might play a role in SCZ.
To that end, we performed an updated systematic review of 14-3-3 isoforms in SCZ. The previous results are listed in [66]Table S4. In total, 11 studies meeting the analysis criteria were included; they concerned six isoforms, namely, β, ε, γ, η, θ, and ζ. Among these studies, p values were calculated on the basis of either Student’s t test or a multivariate analysis of covariance. All studies of the θ isoform had p values less than 0.05. Since a multivariate analysis of covariance is stricter, we excluded the studies using Student’s t test; all six isoforms were then significantly associated with SCZ, and the average fold changes (FCs) for the six isoforms β, ε, γ, η, θ, and ζ were 0.89, 1.42, 0.741, 1.135, 0.787, and 0.879, respectively. According to one study,[67]^16 a variation of at least 40% is viewed as significant regulation; thus, the results suggest that the ε and θ isoforms tend to play important roles in SCZ.

Potential Candidate Genes for Clinical Diagnosis

Among the genes identified in brain tissues, inter-α-trypsin inhibitor H4 (ITIH4), MOSPD3, SNAP25, RNPEPL1, UBE4A, SLC25A39, ZNF688, ANK2, BAD, and THAP7 were found to be significantly dysregulated in peripheral blood mononuclear cells (PBMCs) of patients with SCZ with FC > 1.5 (see [68]Table S5). Furthermore, ITIH4 (p[adj.] = 0.010, logFC = −1.102), SNAP25 (p[adj.] = 0.026, logFC = −1.373), RNPEPL1 (p[adj.] = 0.028, logFC = −0.725), UBE4A (p[adj.] = 0.044, logFC = −0.780), BAD (p[adj.] = 0.030, logFC = −0.671), and THAP7 (p[adj.] = 0.048, logFC = −0.878) were found to be significantly downregulated in patients with SCZ, whereas the significantly upregulated genes were SLC25A39 (p[adj.] = 0.0002, logFC = 0.979), ZNF688 (p[adj.] = 0.012, logFC = 0.606), ANK2 (p[adj.] = 0.026, logFC = 0.637), and MOSPD3 (p[adj.] = 0.003, logFC = 0.611).
The genes shared among the known SCZ genes, the candidate genes in the brain, and the dysregulated genes in PBMCs are also depicted in [69]Figure 2. However, only ITIH4 was found in all three groups, and, therefore, it may serve as a putative gene for diagnosing SCZ by a blood test.

Discussion

Schizophrenia is a multifactorial and polygenic psychiatric disorder. Due to limited sample size, several GWASs on SCZ reported various independent genomic loci exceeding genome-wide significance, i.e., p < 10^−8.[70]17, [71]18, [72]19, [73]20 Furthermore, most of the identified risk variants are located in noncoding regions. How these risk variants contribute to SCZ susceptibility remains unidentified. Therefore, the Schizophrenia Working Group of the Psychiatric Genomics Consortium (PGC) was created to combine all available SCZ samples with published or unpublished GWAS analysis genotypes into a single, systematic meta-analysis.[74]^10 Since many studies indicated that changes in gene expression, rather than alterations in protein structure and/or function, play critical roles in SCZ susceptibility,[75]7, [76]8 it was suggested that the risk variants found by GWASs may alter the expression of SCZ-related genes rather than protein function. Furthermore, it is well known that brain tissues appear to be most relevant to SCZ; however, which part of the brain plays a significant role in SCZ has so far remained elusive.

In the present study, we integrated eQTL data from 10 brain tissues and genetic association findings from the largest meta-GWAS on SCZ, with a total of 150,064 subjects, using the mRMR method, and we identified the putative interactors in the corresponding brain tissues. The simple mRMR approach is one of the most potent methods proposed by Peng et al.
to use mutual information (MI) for gene feature selection based on microarray gene expression data.[77]11, [78]21, [79]22, [80]23, [81]24 mRMR is probably much faster and in practice more robust, since the algorithm is theoretically more efficient at performing an optimal max-dependency selection and produces a feature set with little pairwise redundancy; it also usually yields better classification accuracy than other classifiers (e.g., LIBSVM, LDA, Naive Bayes, logistic regression, etc.). Using this algorithm, we found that the cerebellum is the tissue most closely linked to SCZ, since more genes were identified in it than in any other brain tissue. The cerebellum has been established to be associated with the auditory, cognitive, and social behavior of SCZ in addition to motor function.[82]^25

Furthermore, our results showed that C4A, known for its role in immunity, is an eQTL gene in the cerebellum and frontal cortex, which supports C4A being an authentic risk gene for SCZ. The C4A gene, located in the major histocompatibility complex (MHC) class III region on chromosome 6, encodes the acidic form of complement factor 4, which is the primary effector of the innate and the adaptive immune system and is involved in the classical pathway of the complement activation system.[83]^26 Recently, studies reported that C4 might play essential roles in the pathogenesis of SCZ.[84]26, [85]27 It was also suggested that some C4 variants in the brain cause significant differential expression of C4A and C4B and that the SCZ-related common C4 allele is more likely to cause higher expression of C4A.[86]^26

In the current study, at least three items of evidence support ITIH4 being a risk gene for SCZ.
Also, interestingly, ITIH4, which was also identified as an eQTL gene in the putamen basal ganglia, was found to be significantly decreased in the serum of patients suffering acute-phase processes.[87]^28 ITIH4 is one of the heavy chains of inter-α-trypsin inhibitor (ITI), which encodes the ITI family molecules with four other homologous heavy chains and one light chain. It has been demonstrated that the ITIH3-ITIH4 region is one of the regions most significantly associated with SCZ and bipolar disorder.[88]^20 Also, we have identified that the SNPs rs2239547, rs4687552, and rs2535627 in ITIH4 exceed the GWAS threshold and regulate the expression of ITIH4. Over the last decade, many research groups have been interested in finding a reliable clinical biomarker for the early detection of SCZ.[89]2, [90]29, [91]30 Although significant differences between patients with SCZ and healthy controls have been found in brain structure, functional brain imaging, gene expression, genetic polymorphisms, etc., the overlap of reported abnormalities between patients and healthy controls indicates that there is no valid diagnostic test for establishing a concrete early diagnosis of SCZ.[92]^29 Here, supported by the above multiple findings, ITIH4 is suggested to be a potential clinical biomarker for the diagnosis of SCZ through a blood test, which can provide easy operation and objective diagnostic criteria.

Furthermore, we identified two risk genes for SCZ, CYP21A1P and ZNF192P1. These are pseudogenes, whose products function as regulatory elements. Although we identified three target genes for CYP21A1P, further work is warranted to investigate the mechanism underlying these genes.

Since SCZ is a complex disease, multiple genes/pathways are involved in disease progression. Thus, we further explored target genes within the eQTL-corresponding brain tissues using the VIF regression algorithm, which provided a list of prioritized genes.
With all identified genes in the brain, we performed pathway analysis. eIF2 signaling emerged as the most relevant pathway. eIF2 is a multimeric protein consisting of α, β, and γ subunits, and it is generally considered to affect the maintenance of a rate-limiting step in mRNA translation.[93]^31 eIF2 signaling has important roles in the pathogenesis of SCZ, as the corresponding stressors (starvation, virus, cytokines, and oxidative and endoplasmic reticulum stress) activate eIF2α kinases, which ultimately suppress protein synthesis through a series of reactions of phosphorylated eIF2α.[94]^32

The second significant pathway identified was IGF1 signaling. IGF1, insulin-like growth factor 1, is a multifunctional protein whose amino terminus is highly homologous to the insulin B chain, which makes it possible to promote the consumption of glucose in adipose tissue via the insulin/IGF1 axis.[95]^33 Previous studies on IGF1 signaling in human neuroblastoma cells demonstrated that IGF1 signaling is involved in SCZ, as the pharmacological stimulation of muscarinic and insulin/IGF1 receptors reverses the expression levels of the specific subunits of disordered genes in SCZ.[96]^34

Another critical pathway, involving 14-3-3 proteins, was also identified; these are a family of highly conserved, multifunctional proteins highly expressed in the brain during development. Many studies have examined 14-3-3 family gene and protein expression in the brain of patients with SCZ, and 14-3-3 proteins include seven isoforms, β, ε, γ, η, σ, θ, and ζ;[97]35, [98]36 however, conflicting results have been obtained, and which isoform plays a role in SCZ remains to be elucidated.[99]16, [100]35, [101]36, [102]37, [103]38 Our results suggest that the ε and θ isoforms have essential roles in SCZ, although more data are required to validate the other isoforms.

There are some limitations to the present analysis that need to be acknowledged and addressed.
First, in addition to SNPs, other variants, such as copy number variation (CNV) and chromosomal aberration, may contribute to gene expression alteration and SCZ. In the present study, GTEx project V6p eQTL only provides the full data about the SNPs. If more data about other variants were available, the data could be used in the mRMR analysis. Second, as pointed out in Chou and Shen[104]^39 and demonstrated in a series of recent publications[105]40, [106]41, [107]42, [108]43, [109]44, [110]45, [111]46, [112]47, [113]48, [114]49, [115]50, [116]51, [117]52, [118]53, [119]54, [120]55, [121]56, [122]57, [123]58, [124]59, [125]60, [126]61, [127]62, [128]63, [129]64, [130]65, [131]66, [132]67, [133]68, [134]69, [135]70, [136]71, [137]72, user-friendly and publicly accessible web servers represent the future direction for developing practically more useful prediction methods and computational tools. Actually, many practically useful web servers have increasing impacts on medical science,[138]^73 driving medicinal chemistry into an unprecedented revolution.[139]^74 We shall make efforts in our future work to provide a web server for the prediction method presented in this paper. (Once the web server has been established, an announcement will be made in the official website of Bio-X Institutes and via the MTNA journal.) Last, it is widely considered that environmental factors and genetic factors work together to induce many diseases, including SCZ. Gene expression is the direct result from environmental and genetic factors. Although here we focus on the integration of data from eQTL and GWAS to identify SCZ risk genes, we do not identify those genes associated with a certain environmental condition, and further studies are required to explore the specific genes for an environmental factor related to SCZ. 
Conclusions

Our analysis, which integrated data from brain eQTL and GWASs of SCZ using the mRMR algorithm, has indicated that the cerebellum may play a crucial role in the pathogenesis of SCZ. Also, ITIH4 may be utilized as a clinical biomarker for the diagnosis of SCZ, since its level has been observed to be significantly decreased in the serum. Furthermore, three major pathways, i.e., EIF2 signaling, IGF-1 signaling, and 14-3-3-mediated signaling, have been identified to confer risk of SCZ. Further in-depth studies, both experimental and theoretical, are needed to reveal the molecular mechanisms behind these findings.

Materials and Methods

Benchmark Dataset

According to the 5-step rule[140]^75 widely used in performing various genome or proteome analyses[141]40, [142]41, [143]42, [144]43, [145]44, [146]76, [147]77, [148]78, [149]79, [150]80, [151]81, [152]82, [153]83, [154]84, [155]85, [156]86, [157]87, [158]88, [159]89, [160]90, the first important thing is to construct or select an effective benchmark dataset. In this study, the raw eQTL data were taken from 10 human brain tissues in the GTEx project, i.e., anterior cingulate cortex BA24, caudate basal ganglia, cerebellar hemisphere, cerebellum, cortex, frontal cortex BA9, hippocampus, hypothalamus, nucleus accumbens basal ganglia, and putamen basal ganglia, and association information on SNPs was taken from the genome-wide meta-analysis of SCZ.[161]9, [162]10 In this meta-analysis,[163]^10 a total of 36,989 cases with SCZ and 113,075 healthy controls were considered, and p values of a total of 9,444,231 SNPs were calculated for their genetic association with SCZ.
The GTEx project (V6p eQTL) ([164]https://gtexportal.org/home/datasets), which is currently the most massive eQTL project, including the gene expression and genotype data of 53 normal human tissues from 544 donors, provides the association p values for SNPs regulating gene expression.[165]^9 The p value for each SNP-gene pair in the GTEx databases was transformed into $-\log_{10}(p\ \text{value})$. When a variant had no significant effect on gene expression, $-\log_{10}(p\ \text{value})$ was set to 0, i.e., the p value of the corresponding SNP was set to 1. When a variant had a significant effect on gene expression, i.e., it was an eQTL, $-\log_{10}(p\ \text{value})$ for this eQTL was greater than 0. Moreover, those eQTLs that were significantly associated with SCZ were classified as positive eSNPs, and those that were not were referred to as negative eSNPs. Since the number of negative eSNPs was much higher than that of positive eSNPs, we randomly selected 10,000 negative eSNP sets, each of which matched the total number of positive eSNPs. Each benchmark dataset was then constructed from all positive eSNPs and one randomly selected negative eSNP set of the same size. Thus, overall, there were 10,000 eSNP benchmark datasets. Based on each benchmark dataset, an eSNP-gene matrix was constructed for the next analysis. In this matrix, the rows were eSNPs, whereas the columns were the class of the eSNP, i.e., positive or negative, and the genes regulated by the eSNPs from the 10 brain tissues. In total, 22,832 eQTL genes were included in this matrix for each eSNP.
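The benchmark construction described above can be sketched as follows. This is a minimal illustration rather than the pipeline actually used: the function names `neg_log10` and `build_benchmarks`, the toy identifiers, and the fixed random seed are assumptions introduced here, and the sketch assumes the transformation is −log10(p), which is consistent with a non-significant pair (p = 1) mapping to 0.

```python
import math
import random


def neg_log10(p_value):
    """Return -log10(p); a non-significant pair encoded as p = 1 maps to 0."""
    # The explicit branch avoids returning -0.0 for p = 1.
    return -math.log10(p_value) if p_value < 1.0 else 0.0


def build_benchmarks(positive_ids, negative_ids, n_sets, seed=0):
    """Build n_sets balanced datasets of (eSNP id, class label) pairs.

    Every dataset holds all positive eSNPs (label 1) plus an equally
    sized random draw, without replacement, of negative eSNPs (label 0).
    """
    rng = random.Random(seed)  # hypothetical fixed seed, for reproducibility
    k = len(positive_ids)
    benchmarks = []
    for _ in range(n_sets):
        sampled_negatives = rng.sample(negative_ids, k)
        dataset = [(snp, 1) for snp in positive_ids]
        dataset += [(snp, 0) for snp in sampled_negatives]
        benchmarks.append(dataset)
    return benchmarks
```

With 134 positive eSNPs and `n_sets=10000`, each dataset would contain 268 rows, matching the balanced design described in the text.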
The mRMR Method Integrating Brain eQTL and GWASs

The mRMR algorithm has been widely used in computational biology for genome and proteome analyses.[166]41, [167]91, [168]92, [169]93, [170]94, [171]95 Here we also used the mRMR approach to identify the potential eQTL genes for SCZ by calculating the MI between two features and ranking these features.[172]^11 Given two variables x and y, their MI value can be calculated according to the following equation:

$$I(x,y)=\iint p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)}\,dx\,dy, \qquad (1)$$

where p(x) and p(y) are the marginal probabilities of x and y, and p(x, y) is their joint probability distribution. Using the value of MI, the distance between two variables can also be quantitatively measured. Based on the definition of MI, the MaxRel distance can be formulated as the distance between a given feature and the target classes, which reflects the relevance between the eQTL gene features from 10 brain tissues and positive eSNPs. A larger MaxRel score, which is highly interpretative and can reveal the difference between target classes, is indicative of a stronger relevance. Since there were 10,000 benchmark datasets, there were 10,000 MaxRel scores for each eQTL gene. We ranked the eQTL genes based on both the MaxRel scores for each benchmark dataset and the average of the MaxRel scores for all tested benchmark datasets. Furthermore, the identified candidate genes were evaluated by searching for more evidence for them as potential SCZ risk genes in the SZDB database ([173]http://www.szdb.org/).
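To make the relevance score concrete, the sketch below estimates the MI of Equation 1 for discrete data and ranks features by it. It is an illustration under stated assumptions, not the mRMR software used in the study: it assumes features have already been discretized (so the integral becomes a sum over observed value pairs), it covers only the relevance (MaxRel) term and not the redundancy term of mRMR, and the names `mutual_information` and `rank_by_relevance` are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(x, y), in nats, for two discrete sequences."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x,y) / (p(x) p(y)) simplifies to c * n / (count_x * count_y).
        mi += (c / n) * math.log(c * n / (count_x[x] * count_y[y]))
    return mi


def rank_by_relevance(features, labels):
    """Sort feature names by decreasing MI with the class labels."""
    scores = {
        name: mutual_information(values, labels)
        for name, values in features.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

A feature that tracks the class labels perfectly on a balanced binary dataset scores log 2 nats, whereas a feature independent of the labels scores 0.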
In the SZDB database,[174]^96 SCZ risk genes reaching the genome-wide significance level were extracted from multiple GWASs and 5 microarray datasets, including GEO: [175]GSE53987 (114 samples of prefrontal cortex, striatum, and hippocampus),[176]^97 GEO: [177]GSE12649 (69 postmortem samples of prefrontal cortex),[178]^98 GEO: [179]GSE21138 (59 postmortem samples of prefrontal cortex),[180]^99 GEO: [181]GSE35978 (195 samples of cerebellum and parietal cortex),[182]^100 and GEO: [183]GSE62191 (59 samples of frontal cortex)[184]^101 ([185]https://www.ncbi.nlm.nih.gov/geo/). Moreover, the BrainCloud eQTL database,[186]^102 which contains eQTL data from the human postmortem dorsolateral prefrontal cortex (DLPFC) of 261 normal Caucasian and African American subjects, was used for replication analysis of the eQTL association.

Identify the Potential Interactors of SCZ Risk Genes in the Brain

To identify the target interacting genes of each eQTL gene, the VIF regression algorithm, an efficient and accurate method, was adopted.[187]^103 The objective of the algorithm as used here was to select, as interactors, the optimal genes that can fit the expression pattern of the genes of interest. We sought the optimal $\beta$ that minimizes the $l_0$-penalized sum of squared errors, as represented by the following equation:

$$\underset{\beta}{\arg\min}\left\{\lVert y-X\beta\rVert_2^2+\lambda_0\lVert\beta\rVert_{l_0}\right\}, \qquad (2)$$

where $y=(y_1,\ldots,y_n)'$ are n observations of the target gene, $X=(X_1,\ldots,X_p)$ are p interactors, and $\lVert\beta\rVert_{l_0}=\sum_{i=1}^{p} I\{\beta_i\neq 0\}$. This algorithm calculates the correlations of each candidate interactor with the genes of interest using a small presampled dataset, and it searches for the optimal interactor subset by applying a t-statistic with a correction procedure when adding or removing one interactor at a time. The R package [188]http://cran.r-project.org/web/packages/VIF/ was used to implement the VIF method.
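As a small worked illustration of the objective in Equation 2, the sketch below evaluates the penalized sum of squared errors for a given coefficient vector. It does not implement the VIF search itself and is not the API of the R package; the function name `penalized_sse` and the toy numbers are assumptions made for illustration only.

```python
def penalized_sse(y, X, beta, lam0):
    """Return ||y - X beta||_2^2 + lam0 * ||beta||_l0.

    y    : list of n observations of the target gene
    X    : list of n rows, each holding p candidate-interactor values
    beta : list of p coefficients
    lam0 : penalty paid for every non-zero coefficient
    """
    residual_ss = 0.0
    for row, y_i in zip(X, y):
        prediction = sum(b * x for b, x in zip(beta, row))
        residual_ss += (y_i - prediction) ** 2
    # The l0 "norm" counts the interactors that are actually selected.
    n_selected = sum(1 for b in beta if b != 0)
    return residual_ss + lam0 * n_selected


# Toy data: the target equals the first candidate interactor exactly.
y = [1.0, 2.0, 3.0]
X = [[1.0, 0.5], [2.0, -0.5], [3.0, 0.1]]
sparse_fit = penalized_sse(y, X, beta=[1.0, 0.0], lam0=0.5)
dense_fit = penalized_sse(y, X, beta=[1.0, 0.1], lam0=0.5)
```

In this toy case the sparse coefficient vector attains the lower objective value, which is the behavior the $l_0$ penalty is meant to encourage: an interactor is kept only if it reduces the squared error by more than the penalty.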
Furthermore, to assess the goodness of fit of the VIF regression models, we calculated the adjusted coefficient of determination, also known as adjusted $R^2$,[189]^104 which measures how well the regression model fits the real data points while taking into account the number of interactors used. In the present study, the regression models with adjusted $R^2$ values greater than 0.6 were considered. The scheme for the exploration of candidate SCZ genes is shown in [190]Figure 4.

Figure 4. Flow Chart Detailing the Inclusion Process to the Present Study

Enrichment Analysis

To gain a better understanding of the biological effects of all the identified genes, we performed GO enrichment analysis.[193]^105 Using a hypergeometric test, we analyzed whether all the above genes, including eQTL genes and their interactors, significantly overlapped certain GO terms.[194]^106 For each specific GO gene set, the hypergeometric test p value was calculated as

$$P=\sum_{k=m}^{n}\frac{\binom{M}{k}\binom{N-M}{n-k}}{\binom{N}{n}}, \qquad (3)$$

where N is the number of all human genes, M is the number of GO genes, n is the number of genes of interest, and m is the number of genes of interest that are GO genes. To control the FDR, the p values of the hypergeometric test were adjusted with the Benjamini-Hochberg method.[195]^107 Furthermore, not only the overlap with GO but also the overlap with the reported SCZ genes was evaluated. The known SCZ genes reported by GWASs and genes expressed in brain tissues are listed in [196]Table S1. In addition, we identified the canonical pathways associated with these SCZ candidate genes using the Ingenuity Pathway Analysis (IPA) suite ([197]https://www.qiagenbioinformatics.com/). In canonical pathway-based analysis, the criterion for significant pathways was set as $-\log p>2$.
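The enrichment test of Equation 3 and the subsequent Benjamini-Hochberg adjustment can be sketched as below. This is a minimal standard-library illustration, not the software used in the study; the function names `hypergeom_pvalue` and `benjamini_hochberg` are hypothetical, and the argument names follow the symbols N, M, n, and m defined in the text.

```python
from math import comb


def hypergeom_pvalue(N, M, n, m):
    """Upper-tail hypergeometric p value, following Equation 3.

    N: all genes, M: genes in the gene set, n: genes of interest,
    m: genes of interest that fall in the gene set.
    """
    # math.comb returns 0 when k > M or n - k > N - M, so impossible
    # overlaps contribute nothing to the sum.
    tail = sum(comb(M, k) * comb(N - M, n - k) for k in range(m, n + 1))
    return tail / comb(N, n)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    count = len(pvalues)
    order = sorted(range(count), key=lambda i: pvalues[i])
    adjusted = [0.0] * count
    running_min = 1.0
    # Walk from the largest p value down so monotonicity can be enforced.
    for rank in range(count, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * count / rank)
        adjusted[index] = running_min
    return adjusted
```

For example, `hypergeom_pvalue(10, 4, 3, 3)` is the probability that all 3 genes of interest fall into a 4-gene set drawn from 10 genes, i.e., 4/120.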
Systematic Review of 14-3-3 Isoforms Associated with SCZ

Further, to determine the association of 14-3-3 isoforms with SCZ, we performed an updated systematic review with a literature search of studies published between January 1990 and December 2017 in six English-language databases (PubMed, Embase, Web of Science, ScienceDirect, SpringerLink, and EBSCO) and two Chinese databases (Wanfang and Chinese National Knowledge Infrastructure databases). The following keywords were used: 14-3-3 or YWHA and SCZ. The scheme for this systematic review is described in [198]Figure S1. Data extraction was independently performed by two investigators; any discrepancies between the two reviewers were resolved through discussion, and a consensus was reached by a third party who was from a different organization. Inclusion criteria for the analysis were as follows: (1) detailed diagnosis definition of SCZ; (2) sample size, FC, and p value; and (3) at least three qualifying studies per isoform. The strength of the associations between gene expression levels and SCZ was measured by calculating the FC and p value.

Potential Candidate Genes for Diagnosis

To identify the potential candidate genes for the blood test, the gene expression profile of PBMCs was examined in our previous study.[199]^108 Briefly, blood samples from 18 first-onset SCZ patients (8 males and 10 females, aged 14.78 ± 1.70 years) and 12 healthy controls (6 males and 6 females, aged 14.75 ± 2.14 years) were collected. The patients were untreated and drug naive and were independently diagnosed by at least two experienced psychiatrists according to the Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition (DSM-IV) criteria for SCZ. Agilent Human LncRNA Microarray v.2.0 and 17,200 valid probes were used to identify the putative clinical gene biomarkers.
All participants provided informed consent in accordance with the approval of the Bioethics Committee of Bio-X Institutes of Shanghai Jiaotong University and the principles set forth by the Declaration of Helsinki.

Author Contributions

L.C., L.H., and K.-C.C. conceived the study. L.C., T.H., and J.S. designed and performed the analyses. L.C., T.H., and X.Z. drafted the manuscript. W.C. and F.Z. provided the data and performed gene expression tests. L.C. and K.-C.C. finalized the paper.

Conflicts of Interest

The authors have no conflict of interest.

Acknowledgments