Abstract
Schizophrenia (SCZ) is a devastating genetic mental disorder.
Identification of the SCZ risk genes in brains is helpful to understand
this disease. Thus, we first used the minimum Redundancy-Maximum
Relevance (mRMR) approach to integrate the genome-wide sequence
analysis results on SCZ and the expression quantitative trait locus
(eQTL) data from ten brain tissues to identify the genes related to
SCZ. Second, we adopted the variance inflation factor regression
algorithm to identify their interacting genes in brains. Third, using
multiple analysis methods, we explored and validated their roles. By
means of the aforementioned procedures, we have found that (1) the
cerebellum may play a crucial role in the pathogenesis of SCZ and (2)
ITIH4 may be utilized as a clinical biomarker for the diagnosis of SCZ.
These findings may stimulate novel strategies for developing new drugs
against SCZ. The approach reported here may also be useful for studying
many other genomic diseases.
Keywords: schizophrenia, eQTL, mRMR, SNP, GTEx, brain, GO, YWHA, EIF2,
ITIH4
Introduction
Schizophrenia (SCZ) is a devastating chronic psychiatric disorder,
characterized by a group of symptoms including hallucinations and
delusions, severely inappropriate emotional and behavioral responses,
substantial cognitive changes, the division of thought, and impaired
coordination of social or occupational function.[37]^1 Despite its low
prevalence (about 1% of the population), SCZ imposes a substantial
burden on the family and society.[38]^2 It is now widely considered to
be a complex genetic disease that is affected by environmental factors
together with multiple genes of small or intermediate
effect.[39]3, [40]4 Although genome-wide association studies (GWASs)
have identified a number of variants significantly associated with SCZ,
most of them are located in noncoding regions and their effects remain
elusive.
In 2001, the mRNA expression in the whole genome was proposed as a
quantitative trait. Meanwhile, the first expression quantitative trait
locus (eQTL) mapping analysis, which relates SNP allelic variation to
target transcript abundance, was performed.[41]^5 Because gene
expression is tissue specific and influenced by environmental factors,
integrating eQTL data with the variants associated with a specific
disease in a specific tissue may reveal genes that contribute to the
disease. Furthermore, many studies have indicated that significant
changes in gene expression rather than alterations in protein structure
and/or function play a crucial role in SCZ susceptibility.[42]6, [43]7,
[44]8 Accordingly, SCZ-susceptible variants could be eQTLs that would
influence the expression of some genes.
In the present study, we used the minimum Redundancy-Maximum Relevance
(mRMR) algorithm to identify the potential eQTL genes for SCZ by
integrating eQTL data from 10 human brain tissues from the
Genotype-Tissue Expression (GTEx) project with the results from a
meta-analysis of GWASs.[45]9, [46]10 Compared with common classifiers
such as Naive Bayes, the support vector machine library LIBSVM
(v.3.22), linear discriminant analysis (LDA), and logistic regression,
the mRMR algorithm has the advantages of reducing mutual redundancy
among the selected genes and effectively selecting genes that are more
representative of the target phenotypes.[47]11, [48]12, [49]13, [50]14
Because the target genes of eQTL genes may also have some effect on
SCZ, we subsequently used the identified genes to explore their target
genes in the corresponding tissues, and we determined their putative
roles in the brain.
Results
SCZ Risk Genes Based on the Integration Analysis of eQTL and GWASs
A total of 10,301 SNPs met the GWAS significance threshold of
p < 10^−8. From the 10 brain tissues, 492,401 eQTL SNPs, which affected
the expression of 22,832 genes, were collected. Of these, only 134 were
positive expression SNPs (eSNPs). Thus, each of the 10,000 eSNP
benchmark datasets contained 134 positive eSNPs and 134 randomly
selected negative eSNPs. Subsequently, based on the MaxRel scores of
the eQTL gene features in the mRMR analysis, we identified the most
discriminative eQTL gene features from different brain tissues for the
positive eSNPs of SCZ. Requiring an average MaxRel score greater than
0.01 and reappearance of the gene feature in the top 500 in more than
70% of all tested eSNP-gene matrices, we identified 22 eQTL gene
features, which included 12 candidate genes in eight different brain
tissues, excluding the anterior cingulate cortex BA24 and the caudate
basal ganglia (see [51]Table S1).
Furthermore, each of these 12 genes was supported by at least one item
of evidence from GWASs, differential gene expression studies, and/or
alternative eQTL data used for replication. These genes may play
crucial roles in the pathogenesis of SCZ, and they can serve as
putative genes that increase the risk of developing SCZ. Of these, the
gene with the highest average MaxRel score was PRSS16 in the cerebellum
(average MaxRel = 0.0311), which exhibited the most significant
association with SCZ and was supported as a risk gene for SCZ only by
the results of GWASs. This gene was also found to increase the risk for
SCZ in the cerebellar hemisphere and hippocampus. The second most
significant SCZ eQTL gene was complement factor 4A (C4A) in the
cerebellum and the frontal cortex BA9 (average MaxRel = 0.0165 and
0.0129, respectively), which was supported as a risk gene for SCZ only
by the results of the differential gene expression study GEO:
[52]GSE53987. Interestingly, the AS3MT gene was found to be a potential
risk gene for SCZ in the largest number of tissues, i.e., the
cerebellum, cerebellar hemisphere, and cortex. In addition, the
noncoding RNA lnc-CNNM2-1 targeting the CNNM2 gene in the cerebellum
(average MaxRel = 0.0127), CYP21A1P in the cerebellum (average MaxRel =
0.0147), and ZNF192P1 in the cerebellum and cortex (both average
MaxRel = 0.011) were identified as SCZ risk genes in the present study
([53]Figure 1).
Figure 1.
Association of eQTL with Corresponding Genes Based on the BrainCloud
eQTL Database
(A) rs17693963 with ZNF192P1. (B) rs67682613 with CYP21A1P.
Potential Genes Interacting with the SCZ Risk Genes Identified Above
To determine the target interacting genes of the SCZ risk genes
identified, we first identified their coexpressed genes in each of the
corresponding brain tissues using the variance inflation factor (VIF)
regression algorithm, and then we used adjusted $R^2$ to select the
potential interactors. In total, 186 genes were
identified to interact with the nine SCZ candidate genes (i.e., ARL3,
AS3MT, C10orf32, C4A, CYP21A1P, HLA-DMA, PRSS16, ARL6IP4, and SNX19) in
the three brain tissues of cerebellum, frontal Cortex BA9, and nucleus
accumbens basal ganglia (see [56]Table S2). Of these, ARL6IP4 in the
nucleus accumbens basal ganglia exhibited the largest number (174) of
functionally relevant target genes. Moreover, the nucleus accumbens
basal ganglia interactor gene SNX19 had 96 target genes that probably
participate in a wide variety of physiological processes relevant for
SNX19. Another interactor gene, C4A, was identified with nine target
genes in the cerebellum and with four target genes in the frontal
cortex BA9. In the present study, only nine genes of all these
identified genes overlapped with known SCZ genes ([57]Figure 2).
Figure 2.
Venn Diagram Comparison among Three Groups of Genes
Known SCZ genes reported by GWASs, identified SCZ candidate genes in
the present study, and differentially expressed genes in PBMCs. Error
bars mean SD.
Enrichment Analysis
Gene enrichment analysis of the genes expressed in the brain indicated
that the candidate risk genes are significantly enriched within the
known SCZ genes[60]^15 (p = 0.015). Furthermore, gene ontology (GO)
enrichment analysis demonstrated that the genes were involved in a
variety of physiological and pathophysiological processes. Within the
molecular function GO category, all the above genes were significantly
enriched in protein binding (false discovery rate [FDR]-adjusted p =
5.75E−06) and poly(A) RNA binding (FDR-adjusted p = 2.02E−03). Within
the cellular component GO category, the significantly enriched terms
were cytosol (FDR-adjusted p = 1.06E−05), mitochondrion (FDR-adjusted
p = 1.07E−04), extracellular exosome (FDR-adjusted p = 2.85E−04), and
myelin sheath (FDR-adjusted p = 3.61E−03). Within the biological
process GO category, three enriched terms, specifically SRP-dependent
cotranslational protein targeting to the membrane (FDR-adjusted p =
8.18E−03), viral transcription (FDR-adjusted p = 2.99E−02), and
nuclear-transcribed mRNA catabolic process, nonsense-mediated decay
(FDR-adjusted p = 4.64E−02), were revealed (see [61]Table S2).
Results from pathway enrichment analysis, performed using the
hypergeometric test, are illustrated in [62]Figure 3. Eight of these
pathways fulfilled the criterion that –logp > 2. The top three pathways
associated with SCZ were EIF2 signaling, IGF-1 signaling, and
14-3-3-mediated signaling. Moreover, interestingly, all proteins in
L-cysteine Degradation III pathways (i.e., MPST and GOT1) were among
the candidate SCZ proteins ([63]Table S3).
Figure 3.
The Top Eight Signaling Pathways in which All Identified Genes in the
Present Study Are Enriched
Systematic Review of 14-3-3 Isoforms
Because 14-3-3 protein includes seven isoforms (β, ε, γ, σ, η, θ, and
ζ) and the 14-3-3-mediated pathway is involved in SCZ, we attempted to
identify the isoforms that might play a role in SCZ. To that end, we
performed an updated systematic review of 14-3-3 isoforms in SCZ. The
previous results are listed in [66]Table S4. In total, 11 studies
meeting the analysis criteria were included; they concerned six
isoforms, namely, β, ε, γ, η, θ, and ζ. Among these studies, p values
were calculated on the basis of either Student’s t test or a
multivariate analysis of covariance. All studies of the θ isoform had p
values less than 0.05. Since a multivariate analysis of covariance is
stricter, after excluding the studies using Student’s t test, all six
isoforms were significantly associated with SCZ; furthermore,
the average fold changes (FCs) for the six isoforms β, ε, γ, η, θ, and
ζ were 0.89, 1.42, 0.741, 1.135, 0.787, and 0.879, respectively.
According to one study,[67]^16 a variation of a minimum 40% is viewed
as significant regulation; thus, the results suggest that the ε and θ
isoforms tend to play important roles in SCZ.
Potential Candidate Genes for Clinical Diagnosis
Among the genes identified in brain tissues, inter-α-trypsin inhibitor
H4 (ITIH4), MOSPD3, SNAP25, RNPEPL1, UBE4A, SLC25A39, ZNF688, ANK2,
BAD, and THAP7 were found to be significantly dysregulated in
peripheral blood mononuclear cells (PBMCs) of patients with SCZ with
the FC > 1.5 (see [68]Table S5). Furthermore, ITIH4 (p[adj.] = 0.010,
logFC = −1.102), SNAP25 (p[adj.] = 0.026, logFC = −1.373), RNPEPL1
(p[adj.] = 0.028, logFC = −0.725), UBE4A (p[adj.] = 0.044, logFC =
−0.780), BAD (p[adj.] = 0.030, logFC = −0.671), and THAP7 (p[adj.] =
0.048, logFC = −0.878) were found to be downregulated significantly in
patients with SCZ, whereas the significantly upregulated genes were
SLC25A39 (p[adj.] = 0.0002, logFC = 0.979), ZNF688 (p[adj.] = 0.012,
logFC = 0.606), ANK2 (p[adj.] = 0.026, logFC = 0.637), and MOSPD3
(p[adj.] = 0.003, logFC = 0.611). The genes shared among the known SCZ
genes, the candidate genes in the brain, and the dysregulated genes in
PBMCs are also depicted in [69]Figure 2. Only ITIH4 was found in all
three groups, and, therefore, it may serve as a putative gene for
diagnosing SCZ by a blood test.
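As a small arithmetic check on the FC > 1.5 criterion and the logFC
values reported above, the sketch below converts each logFC to a
fold-change magnitude. It assumes that logFC is a base-2 logarithm,
which is a common microarray convention but is not stated in the text;
the function name is hypothetical.

```python
# Assumption: logFC values are log2 fold changes (not stated in the text).
def fold_change_magnitude(log2_fc):
    """Return the fold change as a ratio >= 1, ignoring direction."""
    return 2 ** abs(log2_fc)

# logFC values as reported in the Results for PBMCs.
reported = {
    "ITIH4": -1.102, "SNAP25": -1.373, "RNPEPL1": -0.725,
    "UBE4A": -0.780, "BAD": -0.671, "THAP7": -0.878,
    "SLC25A39": 0.979, "ZNF688": 0.606, "ANK2": 0.637, "MOSPD3": 0.611,
}

# Genes whose fold-change magnitude exceeds the 1.5 cutoff.
passing = {g for g, v in reported.items() if fold_change_magnitude(v) > 1.5}
downregulated = sorted(g for g, v in reported.items() if v < 0)
upregulated = sorted(g for g, v in reported.items() if v > 0)
```

Under this assumption, a cutoff of FC > 1.5 corresponds to
|logFC| > log2(1.5), roughly 0.585, which all ten reported genes meet.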
Discussion
Schizophrenia is a multifactorial and polygenic psychiatric disorder.
Due to limited sample size, several GWASs on SCZ reported various
independent genomic loci exceeding genome-wide significance, i.e., p <
10^−8.[70]17, [71]18, [72]19, [73]20 Furthermore, most of the
identified risk variants are located in noncoding regions. How these
risk variants contribute to SCZ susceptibility remains unidentified.
Therefore, the Schizophrenia Working Group of the Psychiatric Genomics
Consortium (PGC) was created to combine all available SCZ samples with
published or unpublished GWAS analysis genotypes into a single,
systematic meta-analysis.[74]^10 Since many studies indicated that
changes in gene expression rather than alterations in protein structure
and/or function play critical roles in SCZ susceptibility,[75]7, [76]8
it was suggested that the risk variants found by GWASs may alter the
expression of SCZ-related genes rather than protein function.
Furthermore, it is well known that brain tissues appear to be most
relevant to SCZ; however, so far which part of the brain plays a
significant role in SCZ remains elusive. In the present study, we
integrated eQTL data from 10 brain tissues with genetic association
findings from the largest meta-GWAS on SCZ, comprising a total of
150,064 subjects, using the mRMR method, and we identified the putative
interactors in the corresponding brain tissues.
The mRMR approach, proposed by Peng et al., is one of the most powerful
methods that use mutual information (MI) for gene feature selection
based on microarray gene expression data.[77]11, [78]21, [79]22,
[80]23, [81]24 mRMR is reported to be much faster and, in practice,
more robust, since the algorithm performs max-dependency selection
efficiently and produces a feature set with little pairwise redundancy;
mRMR also usually yields better classification accuracy than other
classifiers (e.g., LIBSVM, LDA, Naive Bayes, and logistic regression).
Using this algorithm, we found that the cerebellum is the tissue most
closely linked to SCZ, since more genes were identified in it than in
any other brain tissue. The cerebellum has been established to be
associated with the auditory, cognitive, and social behavior of SCZ in
addition to motor function.[82]^25
Furthermore, our results found that C4A, known for its role in
immunity, is an eQTL gene in cerebellum and frontal cortex, which
supports that C4A is an authentic risk gene for SCZ. The C4A gene,
located in major histocompatibility complex (MHC) class III region on
chromosome 6, encodes the acidic form of complement factor 4, which is
the primary effector of the innate and the adaptive immune system, and
is involved in the classical pathway of complement activation
system.[83]^26 Recently, studies reported that C4 might play essential
roles in the pathogenesis of SCZ.[84]26, [85]27 It was also suggested
that some C4 variants in the brain caused significant differential
expression of C4A and C4B and the SCZ-related common C4 allele is more
likely to cause higher expression of C4A.[86]^26
In the current study, at least three items of evidence support that
ITIH4 is a risk gene for SCZ. Also, interestingly, ITIH4, which was
also identified as an eQTL gene in putamen basal ganglia, was found to
be significantly decreased in the serum of patients suffering
acute-phase processes.[87]^28 ITIH4 is one of the heavy chains of
inter-α-trypsin inhibitor (ITI), which encodes the ITI family molecules
with four other homologous heavy chains and one light chain. It has
been demonstrated that the ITIH3-ITIH4 region is one of the most
significantly associated with SCZ and bipolar disorder.[88]^20 Also, we
have identified that the SNPs rs2239547, rs4687552, and rs2535627 in
ITIH4 exceed the GWAS threshold and regulate expression of ITIH4. Over
the last decade, many research groups have been interested in finding a
reliable clinical biomarker for the early detection of SCZ.[89]2,
[90]29, [91]30 Although significant differences between patients with
SCZ and healthy controls have been found in brain structure, functional
brain imaging, gene expression, and genetic polymorphisms, etc., the
overlap of reported abnormalities between patients and healthy controls
indicates that there is no valid diagnostic test for establishing a
concrete early diagnosis of SCZ.[92]^29 Here, supported by the above
multiple findings, ITIH4 is suggested to be a potential clinical
biomarker for the diagnosis of SCZ through a blood test, which can
provide easy operation and objective diagnosis criteria. Furthermore,
we identified two additional risk genes for SCZ, CYP21A1P and ZNF192P1.
These are pseudogenes, whose products function as regulatory elements.
Although we identified three target genes for CYP21A1P, further work is
warranted to investigate the mechanism underlying these genes.
Since SCZ is a complex disease, multiple genes/pathways are involved in
disease progression. Thus, we further explored target genes within the
eQTL-corresponding brain tissues using the VIF regression algorithm,
which provided a list of prioritized genes. With all identified genes
in the brain, we performed pathway analysis. The most relevant pathway
identified was eIF2 signaling. eIF2 is a multimeric protein
consisting of α, β, and γ subunits, and it is generally considered to
affect the maintenance of a rate-limiting step in mRNA
translation.[93]^31 eIF2 signaling has important roles in the
pathogenesis of SCZ as the corresponding stressors (starvation, virus,
cytokines, and oxidative and endoplasmic reticulum stress) activate
eIF2α kinases, which ultimately suppress protein synthesis through a
series of reactions of phosphorylated eIF2-alpha.[94]^32 The second
significant pathway identified was IGF1 signaling. IGF1, insulin-like
growth factor 1, is a multifunctional protein whose amino terminus is
highly homologous to the insulin B chain, which makes it possible to
promote the consumption of glucose in adipose tissue via the
insulin/IGF1 axis.[95]^33 Previous studies on IGF1 signaling in human
neuroblastoma cells demonstrated that IGF1 signaling is involved in
SCZ, as the pharmacological stimulation of muscarinic and insulin/IGF1
receptors reverses the expression levels of the specific subunits of
disordered genes in SCZ.[96]^34 Another critical pathway including
14-3-3 proteins was also identified, which is a family of highly
conserved, multifunctional proteins highly expressed in the brain
during development. Moreover, many studies have examined the 14-3-3
family gene and protein expression in the brain of patients with SCZ,
and 14-3-3 proteins include seven isoforms, β, ε, γ, η, σ, θ, and
ζ;[97]35, [98]36 however, conflicting results have been obtained, and
which isoform plays a role in SCZ remains to be elucidated.[99]16,
[100]35, [101]36, [102]37, [103]38 Our results suggested that the ε
and θ isoforms have essential roles in SCZ, although the other isoforms
require more data for validation.
There are some limitations to the present analysis that need to be
acknowledged and addressed. First, in addition to SNPs, other variants,
such as copy number variation (CNV) and chromosomal aberration, may
contribute to gene expression alteration and SCZ. In the present study,
the GTEx project V6p eQTL release provided full data only for SNPs. If
more data on other variants were available, they could be used
in the mRMR analysis. Second, as pointed out in Chou and Shen[104]^39
and demonstrated in a series of recent publications[105]40, [106]41,
[107]42, [108]43, [109]44, [110]45, [111]46, [112]47, [113]48, [114]49,
[115]50, [116]51, [117]52, [118]53, [119]54, [120]55, [121]56, [122]57,
[123]58, [124]59, [125]60, [126]61, [127]62, [128]63, [129]64, [130]65,
[131]66, [132]67, [133]68, [134]69, [135]70, [136]71, [137]72,
user-friendly and publicly accessible web servers represent the future
direction for developing practically more useful prediction methods and
computational tools. Actually, many practically useful web servers have
increasing impacts on medical science,[138]^73 driving medicinal
chemistry into an unprecedented revolution.[139]^74 We shall make
efforts in our future work to provide a web server for the prediction
method presented in this paper. (Once the web server has been
established, an announcement will be made in the official website of
Bio-X Institutes and via the MTNA journal.) Last, it is widely
considered that environmental factors and genetic factors work together
to induce many diseases, including SCZ. Gene expression is the direct
result of environmental and genetic factors. Although here we focus
on the integration of data from eQTL and GWAS to identify SCZ risk
genes, we do not identify those genes associated with a certain
environmental condition, and further studies are required to explore
the specific genes for an environmental factor related to SCZ.
Conclusions
Our integrated analysis of brain eQTL data and SCZ GWAS data using the
mRMR algorithm indicates that the cerebellum may play a crucial role in
the pathogenesis of SCZ. Also, ITIH4 may be utilized as a clinical
biomarker for the diagnosis of SCZ, since its level has been observed
to be significantly decreased in the serum. Furthermore, three
major pathways, i.e., EIF2 signaling, IGF-1 signaling, and
14-3-3-mediated signaling, have been identified to confer risk of SCZ.
Further in-depth studies, both experimental and theoretical, are needed
to reveal the molecular mechanism of such important findings.
Materials and Methods
Benchmark Dataset
According to the 5-step rule[140]^75 widely used in performing various
genome or proteome analyses[141]40, [142]41, [143]42, [144]43, [145]44,
[146]76, [147]77, [148]78, [149]79, [150]80, [151]81, [152]82, [153]83,
[154]84, [155]85, [156]86, [157]87, [158]88, [159]89, [160]90, the
first important thing is to construct or select an effective benchmark
dataset.
In this study, the raw eQTL data were taken from 10 human brain tissues
in GTEx, i.e., anterior cingulate cortex BA24, caudate basal ganglia,
cerebellar hemisphere, cerebellum, cortex, frontal cortex BA9,
hippocampus, hypothalamus, nucleus accumbens basal ganglia, and putamen
basal ganglia, and association information on SNPs was taken from the
genome-wide meta-analysis of SCZ.[161]9, [162]10 In this
meta-analysis,[163]^10 a total of 36,989 cases with SCZ and 113,075
healthy controls were considered, and p values of a total of 9,444,231
SNPs were calculated for their genetic association with SCZ.
The GTEx project (V6p eQTL)
([164]https://gtexportal.org/home/datasets), which is currently the
largest eQTL project and includes gene expression and genotype data for
53 normal human tissues from 544 donors, provides the association p
values for SNPs regulating gene expression.[165]^9
The p value for each SNP-gene pair in the GTEx database was transformed
into $-\log_{10}(p)$. When a variant had no significant effect on gene
expression, $-\log_{10}(p)$ was set to 0, i.e., the p value of the
corresponding SNP was taken as 1. When a variant had a significant
effect on gene expression, i.e., it was an eQTL, $-\log_{10}(p)$ for
this eQTL was greater than 0. Moreover, those eQTLs that were
significantly associated with SCZ were classified as positive eSNPs,
and those that were not were referred to as negative eSNPs. Since the
number of negative eSNPs was much higher than the number of positive
eSNPs, we randomly selected 10,000 negative eSNP sets, each of which
matched the total number of positive eSNPs. Each benchmark dataset was
then constructed from all positive eSNPs and one randomly selected
negative eSNP set of the same size. Thus, overall, there were 10,000
eSNP benchmark datasets.
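The construction described above can be sketched as follows. This is a
minimal illustration rather than the authors' code: the function names
and toy SNP identifiers are hypothetical, and sampling negatives
without replacement within each set is an assumption, since the text
only states that each negative set was randomly selected.

```python
import math
import random

def neg_log10_p(p_value, significant):
    """Return -log10(p) for a significant eQTL, else 0 (equivalent to p = 1)."""
    return -math.log10(p_value) if significant else 0.0

def build_benchmarks(positive, negative, n_sets, seed=0):
    """Pair all positive eSNPs with n_sets random negative sets of equal size."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    benchmarks = []
    for _ in range(n_sets):
        # Assumption: negatives are drawn without replacement within a set.
        sampled = rng.sample(negative, len(positive))
        benchmarks.append((list(positive), sampled))
    return benchmarks

# Toy identifiers (hypothetical, not from the study).
positive = ["rsP1", "rsP2", "rsP3"]
negative = [f"rsN{i}" for i in range(20)]
benchmarks = build_benchmarks(positive, negative, n_sets=5)
```

In the study itself, the positive set would hold the 134 positive eSNPs
and `n_sets` would be 10,000.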
Based on each benchmark dataset, an eSNP-gene matrix was constructed
for the next analysis. In this matrix, the rows were eSNPs, whereas the
columns were the class of the eSNP (positive or negative) and the genes
regulated by the eSNPs in the 10 brain tissues. In total, 22,832 eQTL
genes were included in this matrix for each eSNP.
The mRMR Method Integrating Brain eQTL and GWASs
The mRMR algorithm has been widely used in computational biology for
genome and proteome analyses.[166]41, [167]91, [168]92, [169]93,
[170]94, [171]95 Here we also used the mRMR approach to identify the
potential eQTL genes for SCZ by calculating the MI between two features
and ranking these features.[172]^11 Given two variables x and y, their
MI value can be calculated according to the following equation:
$$I(x,y)=\iint p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)}\,dx\,dy, \tag{1}$$
where p(x) and p(y) are the marginal probabilities of x and y, and
p(x, y) is their joint probability distribution. Using the value of
MI, the distance between two variables can also be quantitatively
measured. Based on the definition of MI, the MaxRel distance can be
formulated as the distance between a given feature and the target
classes, which reflects the relevance between the eQTL gene features
from 10 brain tissues and positive eSNPs. A larger MaxRel score, which
is highly interpretative and can reveal the difference between target
classes, is indicative of a stronger relevance. Since there were 10,000
benchmark datasets, there were 10,000 MaxRel scores for each eQTL gene.
We ranked the eQTL genes based on both the MaxRel scores for each
benchmark dataset and the average of the MaxRel scores for all tested
benchmark datasets.
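To make the relevance score concrete, the following sketch estimates MI
for discrete variables, replacing the integral in Equation 1 with a sum
over observed value pairs, and then averages the score of one feature
over several benchmark datasets. The discretization, the function
names, and the toy data are assumptions made for illustration; the
text does not specify how the MI was estimated in practice.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Discrete mutual information I(x, y) in nats from paired samples."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        p_xy = c / n
        # p(x,y) / (p(x) p(y)) simplifies to c * n / (count_x * count_y).
        mi += p_xy * math.log(c * n / (count_x[x] * count_y[y]))
    return mi

def average_maxrel(benchmarks):
    """Mean relevance of one feature over (feature_values, class_labels) pairs."""
    scores = [mutual_information(f, c) for f, c in benchmarks]
    return sum(scores) / len(scores)

# Toy example: class labels (1 = positive eSNP, 0 = negative eSNP) and a
# binarized eQTL gene feature (1 = the eSNP regulates the gene).
labels = [1, 1, 0, 0]
informative = [1, 1, 0, 0]    # matches the class exactly
uninformative = [1, 0, 1, 0]  # independent of the class
mi_high = mutual_information(informative, labels)
mi_zero = mutual_information(uninformative, labels)
```

A feature that perfectly separates two balanced classes reaches the
maximum of log 2 nats, whereas a feature independent of the class
scores 0.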
Furthermore, the identified candidate genes were evaluated by searching
more evidence for them as potential SCZ risk genes in the SZDB database
([173]http://www.szdb.org/). In this database,[174]^96 SCZ risk genes
reaching the genome-wide significance level were extracted from
multiple GWASs and 5 microarray datasets, including GEO: [175]GSE53987
(114 samples of prefrontal cortex, striatum, and hippocampus),[176]^97
GEO: [177]GSE12649 (69 postmortem samples of prefrontal
cortex),[178]^98 GEO: [179]GSE21138 (59 postmortem samples of
prefrontal cortex),[180]^99 GEO: [181]GSE35978 (195 samples of
cerebellum and parietal cortex brain),[182]^100 and GEO: [183]GSE62191
(59 samples of frontal cortex)[184]^101
([185]https://www.ncbi.nlm.nih.gov/geo/). Moreover, the BrainCloud eQTL
database,[186]^102 which contains eQTL data from the human postmortem
dorsolateral prefrontal cortex (DLPFC) of 261 normal Caucasian and
African American subjects, was used for replication analysis of the
eQTL association.
Identification of the Potential Interactors of SCZ Risk Genes in the Brain
To identify the target interacting genes of each eQTL gene, the VIF
regression algorithm, an efficient and accurate method, was
adopted.[187]^103 The objective of the algorithm as used here was to
select the optimal genes as interactors that can fit the expression
pattern of the genes of interest. We sought the coefficient vector β
that minimizes the l0-penalized sum of squared errors, as represented
by the following equation:
$$\underset{\beta}{\arg\min}\left\{\lVert y-X\beta\rVert_2^2+\lambda_0\lVert\beta\rVert_{l_0}\right\}, \tag{2}$$
where $y=(y_1,\ldots,y_n)'$ are n observations of the target gene,
$X=(X_1,\ldots,X_p)$ are p interactors, and
$\lVert\beta\rVert_{l_0}=\sum_{i=1}^{p} I\{\beta_i\neq 0\}$.
This algorithm calculates the correlations of each candidate interactor
with the genes of interest using a small presampled dataset, and it
searches for the optimal interactor subset by applying a t-statistic
with a correction procedure when adding or removing one interactor at a
time.
The R package [188]http://cran.r-project.org/web/packages/VIF/ was used
to implement the VIF method.
Furthermore, to assess the goodness of fit of the VIF regression
models, we calculated the adjusted coefficient of determination, also
known as adjusted $R^2$,[189]^104 which measures how well the
regression model fits the real data points while accounting for the
number of interactors used. In the present study, only regression
models with adjusted $R^2$ values greater than 0.6 were considered. The
scheme for the exploration of candidate SCZ genes is shown in
[190]Figure 4.
Figure 4.
Flow Chart Detailing the Inclusion Process to the Present Study
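For reference, the adjusted $R^2$ filter used to select interactors can
be illustrated with the standard textbook formula,
1 − (1 − R²)(n − 1)/(n − p − 1), where n is the number of observations
and p the number of interactors in the model. The sketch below is an
illustration under that assumption; the function name and toy values
are hypothetical and not taken from the study.

```python
def adjusted_r2(y, y_hat, n_predictors):
    """Adjusted R^2 for observed values y and fitted values y_hat."""
    n = len(y)
    mean_y = sum(y) / n
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))  # residual sum of squares
    ss_tot = sum((a - mean_y) ** 2 for a in y)            # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    # Penalize the fit for the number of predictors used.
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)

# Toy expression values for a target gene and a one-interactor fit.
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
fitted = [1.1, 1.9, 3.2, 3.8, 5.0]
score = adjusted_r2(observed, fitted, n_predictors=1)
keep_model = score > 0.6  # threshold stated in the text
```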
Enrichment Analysis
To gain a better understanding of the biological effects of all the
identified genes, we performed GO enrichment analysis.[193]^105 Using a
hypergeometric test, we analyzed whether all the above genes, including
eQTL genes and their interactors, significantly overlapped certain GO
terms.[194]^106 For each specific GO gene set, the hypergeometric test
p value was calculated as
$$P=\sum_{k=m}^{n}\frac{\binom{M}{k}\binom{N-M}{n-k}}{\binom{N}{n}}, \tag{3}$$
where N is the number of all human genes, M is the number of genes in
the GO gene set, n is the number of genes of interest, and m is the
number of genes of interest that belong to the GO gene set. To control
the FDR, the p values of the hypergeometric test were adjusted with the
Benjamini-Hochberg method.[195]^107
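The enrichment test and the FDR correction can be sketched in a few
lines of standard-library Python. This is an illustrative
implementation of the upper-tail hypergeometric sum in Equation 3 and
of the usual Benjamini-Hochberg step-up adjustment; the function names
and toy numbers are hypothetical, and the software actually used by
the authors is not specified here.

```python
from math import comb

def hypergeom_tail(N, M, n, m):
    """P(X >= m): n genes drawn from N, of which M belong to the gene set."""
    # comb() returns 0 when k exceeds M or n - k exceeds N - M.
    total = sum(comb(M, k) * comb(N - M, n - k) for k in range(m, n + 1))
    return total / comb(N, n)

def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the original order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    for rank in range(n, 0, -1):  # walk from the largest p value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted

# Toy example: 10 genes in total, 4 in the gene set, 3 genes of interest,
# 2 of which fall in the gene set.
p_toy = hypergeom_tail(N=10, M=4, n=3, m=2)
adj = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```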
Furthermore, not only the overlap with GO but also the overlap with the
reported SCZ genes was evaluated. The known SCZ genes reported by GWASs
and genes expressed in brain tissues are listed in [196]Table S1. In
addition, we identified the canonical pathways associated with these
SCZ candidate genes using the Ingenuity Pathway Analysis (IPA) suite
([197]https://www.qiagenbioinformatics.com/). In canonical
pathway-based analysis, the criterion for significant pathways was set
as $-\log p > 2$.
Systematic Review of 14-3-3 Isoforms Associated with SCZ
Further, to determine the association of 14-3-3 isoforms with SCZ, we
performed an updated systematic review with a literature search of
studies published between January 1990 and December 2017 in six
English-language databases (PubMed, Embase, Web of Science,
ScienceDirect, SpringerLink, and EBSCO) and two Chinese databases
(Wanfang and Chinese National Knowledge Infrastructure databases). The
following keywords were used: 14-3-3 or YWHA and SCZ. The scheme for
this systematic review is described in [198]Figure S1.
Data extraction was independently performed by two investigators; any
discrepancies between the two reviewers were resolved through
discussion, and a consensus was reached by a third party who was from a
different organization. Inclusion criteria for the analysis were as
follows: (1) detailed diagnosis definition of SCZ; (2) sample size, FC,
and p value; and (3) at least three qualifying studies per isoform. The
strength of the associations between gene expression levels and SCZ was
measured by calculating the FC and p value.
Potential Candidate Genes for Diagnosis
To identify the potential candidate genes for the blood test, the gene
expression profile of PBMCs was examined in our previous
study.[199]^108 Briefly, blood samples from 18 first-onset SCZ patients
(8 males and 10 females, aged 14.78 ± 1.70 years) and 12 healthy
controls (6 males and 6 females, aged 14.75 ± 2.14 years) were
collected. The patients were untreated and drug naive and were
independently diagnosed by at least two experienced psychiatrists
according to the Diagnostic and Statistical Manual of Mental Disorders,
Fourth Edition (DSM-IV) criteria for SCZ. Agilent Human LncRNA
Microarray v.2.0 and 17,200 valid probes were used to identify the
putative clinical gene biomarkers. All participants provided
informed consent in accordance with the approval of the Bioethics
Committee of Bio-X Institutes of Shanghai Jiaotong University and the
principles set forth by the Declaration of Helsinki.
Author Contributions
L.C., L.H., and K.-C.C. conceived the study. L.C., T.H., and J.S.
designed and performed the analyses. L.C., T.H., and X.Z. drafted the
manuscript. W.C. and F.Z. provided the data and performed gene
expression tests. L.C. and K.-C.C. finalized the paper.
Conflicts of Interest
The authors have no conflict of interest.
Acknowledgments