Graphical abstract
graphic file with name fx1.jpg
[39]Open in a new tab
Highlights
* •
A high-quality molecular network for schizophrenia (SCZ Network)
* •
A landscape of molecular pathways linking immune activation and
schizophrenia
* •
The spatiotemporal network dynamics revealing stages susceptible to
immune activation
* •
Identification of the molecular pathways and regulators in the
immune-activated brain
__________________________________________________________________
Clinical neuroscience; Immunology; Bioinformatics; Biocomputational
method; Systems biology
Introduction
Schizophrenia (SCZ) has a lifetime incidence rate around 1%, and is
associated with patient disability and severe socioeconomic burdens for
families and our entire society ([40]Knapp et al., 2004; [41]Saha
et al., 2007). It is a complex polygenic disorder with a wide range of
psychiatric symptoms that are generally categorized as positive,
negative, or cognitive ([42]International Advisory Group for the
Revision of ICD-10 Mental and Behavioural Disorders, 2011; [43]Janca
et al., 1993). Schizophrenia has a heritability of about 80%
([44]Cannon et al., 1998; [45]Farmer et al., 1987) and its genetic risk
factors show very high heterogeneity. Studies of functional genetics
have identified several schizophrenia susceptibility genes, such as
DISC1 ([46]Blackwood et al., 2001), PDE4B ([47]Millar et al., 2005),
NPAS3 ([48]Yu et al., 2014), and NRG1 ([49]Harrison and Law, 2006).
Many SCZ-associated loci have been identified through genome-wide
association studies (GWASs) ([50]O'Donovan et al., 2008; [51]Polednak,
1991; [52]Ripke et al., 2013; [53]Schizophrenia Psychiatric Genome-Wide
Association Study, 2011). Recently, two large cohort GWASs have
identified robust and replicable common alleles with significant
associations to schizophrenia ([54]Pardinas et al., 2018;
[55]Schizophrenia Working Group of the Psychiatric Genomics, 2014). The
major histocompatibility complex (MHC) has been found to be associated
with schizophrenia by several groups ([56]International Schizophrenia
Consortium et al., 2009; [57]Pardinas et al., 2018; [58]Schizophrenia
Working Group of the Psychiatric Genomics, 2014; [59]Shi et al., 2009;
[60]Stefansson et al., 2009; [61]Walters et al., 2013). Some hotspots
of genomic deletions or duplications have been identified at 1q21.1,
15q11.2 and 15q13.3, 22q11.2, and 16p11.2 ([62]Rees et al., 2014;
[63]Szatkiewicz et al., 2014). In addition, whole exome sequencing
discovered de novo mutations, which are notably enriched in
glutamatergic postsynaptic proteins comprising of activity-regulated
cytoskeleton-associated protein (ARC) and N-methyl-D-aspartate receptor
(NMDAR) complexes ([64]Fromer et al., 2014; [65]Xu et al., 2011),
suggesting that a dynamic synapse regulation may play a role in the
disease development. Recent transcriptomic studies on the postmortem
brain samples of the patients have identified SCZ-dysregulated genes in
the brain, which are enriched for synaptic genes and immune genes
([66]Bowen et al., 2019; [67]Fromer et al., 2016).
A great challenge of studying the molecular mechanisms underlying
schizophrenia and of identifying drug targets is that some loci are
common variants, and each may only play a small role in the
pathogenesis. Rare variants are found to significantly increase the
disease burden ([68]Fromer et al., 2014; [69]Singh et al., 2017; [70]Xu
et al., 2011), and they are suggested to have bigger effects and thus
higher penetrance ([71]Singh et al., 2017). Another challenge is that
“gene-environment interaction” and “environment-environment
interaction” may play a critical role in schizophrenia ([72]Khan and
Powell, 2018). The “Two-Hit Hypothesis” of schizophrenia, proposed in
1999, suggests that schizophrenia needs two events (“hits”) for its
development: the first event (i.e., basic requirement) represents
genetic factors such as germline mutations; and the second consists of
environmental factors such as viral infections and social stressors
([73]Bayer et al., 1999; [74]Feigenson et al., 2014). Later on, it has
been suggested that in-utero or early life factors, such as maternal
infection and stress, may also serve as “the first hit” ([75]Estes and
McAllister, 2016).
In any case, accumulating evidence points to the important role of
abnormal immune activation in the etiology of schizophrenia.
Proinflammatory cytokine levels have been consistently reported to be
increased in the serum ([76]Goldsmith et al., 2016; [77]Miller et al.,
2011) and also in the prefrontal cortex ([78]Fillman et al., 2013;
[79]van Kesteren et al., 2017) of schizophrenia patients. Prenatal
exposure to maternal immune activation increases the risk of
schizophrenia later in adult life ([80]Brown, 2012; [81]Brown and
Derkits, 2010; [82]Estes and McAllister, 2016); adolescent stress is
thought to be another driving factor ([83]Gomes and Grace, 2017).
Moreover, as previously mentioned, genome-wide association studies have
identified loci covering genes involved in immune function, such as the
major histocompatibility complex (MHC) ([84]International Schizophrenia
Consortium et al., 2009; [85]Pardinas et al., 2018; [86]Schizophrenia
Working Group of the Psychiatric Genomics, 2014; [87]Shi et al., 2009;
[88]Stefansson et al., 2009; [89]Walters et al., 2013). Transcriptome
sequencing of postmortem brain samples found transcriptional alteration
associated with widespread increasing inflammation in the brain of
schizophrenia patients ([90]Lanz et al., 2019). A recent study found
increased microglia-mediated synaptic pruning in SCZ patient-derived
neural culture ([91]Sellgren et al., 2019). All these lines of evidence
point to the critical role of abnormal immune activation in
schizophrenia. However, studying the molecular mechanism that links
abnormal immune activation to schizophrenia remains a major challenge.
Despite the large number of genes involved in SCZ, patients display
similar core phenotypic features, suggesting that SCZ risk genes may
share a common molecular network. Therefore, a high-quality molecular
network underlying this disease is urgently needed for illustrating the
molecular mechanisms on how genetic and environmental factors may lead
to the development of schizophrenia. Proteome-scale studies of protein
interactions have shed light on the etiology of diseases and putative
therapeutic targets ([92]Cusick et al., 2005; [93]Gonzalez and Kann,
2012). High-throughput mapping of protein interactions ([94]Havugimana
et al., 2012; [95]Hein et al., 2015; [96]Rolland et al., 2014; [97]Yang
et al., 2016) and literature curation of small-scale studies
([98]Cusick et al., 2009) have accumulated huge amounts of
protein-protein interaction data. Recently, the interactions from
different sources have been integrated into a comprehensive scored
human interactome network containing about 600,000 interactions ([99]Li
et al., 2017). This vast amount of protein interaction data provides a
potential molecular network for illustrating pathways underlying
complex disorders such as schizophrenia. “Random walk” was first
developed to study the network topology ([100]Lovász, 1993), and later
“random walk with restart” (RWR) was introduced for Internet search
engines ([101]Page, 1998). More recently, RWR has been utilized to
search for novel disease candidate genes, using known disease genes as
seeds in a molecular network ([102]Kohler et al., 2008;
[103]Valdeolivas et al., 2019).
Here, we have developed an approach to identifying disease-associated
neighborhoods across the human interactome network, which we termed as
“neighborhood walk” (NW). Utilizing the NW approach in combination with
RWR, we have constructed a high-quality protein interaction network for
schizophrenia (SCZ Network). This network is enriched in signaling
pathways regulating immune response as well as synapse formation and
modulation. By integrating the spatiotemporal gene expression data of
human brain with the SCZ Network, we found that the immune response
genes in this network are enriched at two developmental stages (mid- to
late fetal stage and puberty stage). Therefore, it is likely that at
these two developmental stages, the SCZ Network is most sensitive to
the perturbation of immune activation. To further verify that the
potential schizophrenia signaling pathways are sensitive to the
perturbation of immune activation during the two critical stages, we
induced immune activation at these two stages and carried out
transcriptome sequencing on immune-activated mouse brains. By
integrating the transcriptome data from immune-activated mouse brains
with the SCZ Network, we have identified potential molecular pathways
and key regulators, which may illustrate a detailed view of the
mechanisms on how maternal immune activation during fetal development
may increase the risk for schizophrenia with a second immune activation
at puberty.
Results
Constructing a protein-protein interaction network for schizophrenia using
the “neighborhood walk” and “random walk with restart” methods
To collect high-confidence schizophrenia candidate genes, we used a
recently updated comprehensive database for schizophrenia genetic
research (SZDB) ([104]Wu et al., 2017, [105]2020). The SZDB (version
2.0) collects a total of 7,903 schizophrenia candidate genes from
literature, which were identified by 7 different methods in 87 studies
including newly published genome scale studies ([106]Wu et al., 2017,
[107]2020). We re-evaluated each source data and selected a core set of
1,836 high-confidence SCZ candidate genes, including 223 genes from 4
linkage studies, 529 genes from 2 large-cohort genome-wide association
studies, 153 genes from 77 copy number variation studies, 543 genes
from 1 transcriptomic study on postmortem brains, 63 genes from 1
imputation study, 307 genes from 3 studies on differential methylation
and 165 genes from 14 studies on exome sequencing ([108]Figures S1A and
S1B). Of the 1,836 high-confidence SCZ candidate genes, 1,720 are
protein-coding genes, which were used as seed genes to expand SCZ
candidate genes. Of these 1,720 seed genes, 55.4% (953 of 1,720) are
reported in two or more studies ([109]Figure 1A), while 92.3% (1,588 of
1,720) are found by only one of the seven methods ([110]Figure 1B),
suggesting that each method may have its bias. To test the functional
coherence of the selected seed genes, we mapped these 1,720 seeds to
human interactome to measure their interconnectivity. The same number
of randomly chosen coding genes from 7,903 candidate genes was used as
control. Compared with the candidate genes in the source data, selected
seed genes have a significantly larger main interconnected network
(empirical p = 0.001, [111]Figure 1C), suggesting that these seed genes
are more functionally related.
Figure 1.
[112]Figure 1
[113]Open in a new tab
Construction of SCZ Network from human interactome using Neighborhood
walk (NW) and Random walk with Restart (RWR) methods (See also
[114]Figure S1)
(A) The distribution of the number of studies (PubMed IDs) reporting
the SCZ association of each of the core set of 1,720 high-confidence
SCZ candidate genes: 767 genes are supported by only one study, 708
genes by two studies, and 245 genes by at least 3 studies.
(B) Distribution of the methods for identifying the SCZ association of
each of the core set of 1,720 high-confidence SCZ candidate genes:
1,588 genes supported by one method, 119 genes supported by two methods
and 13 genes supported by at least 3 methods. The methods include
linkage analysis, genome-wide association studies (GWAS), copy number
variations (CNV), differentially expressed genes (DEG), imputation
analysis, differentially methylated genes (DMG) and whole exome
sequencing (WES).
(C) The interconnectivity of selected core set of SCZ candidate genes
(seeds). The core set of 1,720 high-confidence SCZ candidate genes were
mapped to human interactome and an interconnected network of 1,298 seed
genes was extracted. The same number of randomly chosen coding genes
from 7,903 SCZ candidate genes in the source data were used as control.
Compared with the 1,000 random samplings of SCZ candidate genes in the
source data, selected seed genes have a significantly larger
interconnected network (empirical p = 0.001).
(D)Mapping seeds onto the human interactome. The 1,720 high-confidence
SCZ candidate genes (as seeds) were mapped onto the comprehensive human
interactome network which contains 17,210 nodes (1,592 seeds, 15,618
nonseeds) and 597,859 edges.
(E) A schematic for RWR and NW. Top: the genes in the human interactome
network were ranked based on their proximity with “seed genes” using
“random walk with restart (RWR)” approach to expanding SCZ candidate
genes. Bottom: The neighborhoods in the human interactome network were
defined and ranked based on their “seed ratio” using “neighborhood walk
(NW)” approach.
(F) Extracting SCZ Network from human interactome. Comparison between
RWR-extracted network and NW-extracted network is shown as the Venn
diagram. RWR-extracted subnetwork (RWR-Subnet): the genes with RWR
score more than 1e-05 were included as expanded SCZ candidate genes and
a network containing the 1,592 seed genes and 5,920 expanded SCZ
candidate genes were extracted from the human interactome. NW-extracted
subnetwork (NW-Subnet): The neighborhoods with seed ratio ≥75% were
extracted from the human interactome and these neighborhoods were
further combined into a network containing 1,592 seed genes and 3,858
expanded candidate genes. The SCZ Network: The intersection of
RWR-extracted subnetwork and NW-extracted subnetwork was taken as the
disease network for schizophrenia, containing 4,573 nodes (1,592 seeds;
2,981 candidates) and 192,392 edges.
From a network perspective, products of the SCZ disease genes should
interconnect at the molecular level to determine the complex traits of
this disease. To identify additional SCZ candidate genes that may
function together with the known SCZ candidate genes, we mapped the
1,720 high-confidence SCZ candidate genes onto a high-quality
comprehensive human interactome network ([115]Figure 1D), which
contains 17,210 protein-coding genes (nodes) and about 600,000
interactions (edges) ([116]Table S2) ([117]Li et al., 2017). Of the
1,720 high-confidence SCZ candidate genes, 1,592 were located in the
human interactome network. These 1,592 seed genes were more connected
than randomly picked nodes with the same degrees ([118]Figures S1C and
S1D), suggesting that the seed genes were functionally related. Using
these seed genes, we applied two strategies and combined the results to
construct a protein interaction network for schizophrenia
([119]Figure 1E). Firstly, we carried out the “random walk with
restart” (RWR) approach to rank the genes according to their similarity
with these seed genes in the human interactome network. Secondly, we
developed a method, “neighborhood walk” (NW), to rank the neighborhoods
according to their seed ratios and p-value, which were considered as
the levels of association to schizophrenia.
RWR has been adapted for ranking novel candidate genes using known
disease genes as seeds in molecular networks ([120]Kohler et al., 2008;
[121]Valdeolivas et al., 2019). Here, we used RWR to expand SCZ
candidate genes in the human interactome ([122]Figure 1E). We extracted
an RWR-retrieved subnetwork containing 7,512 genes (including 1,592
seed genes and 5,920 expanded SCZ candidate genes) using RWR score =
1 × 10^−5 as the cutoff for choosing the expanded candidate genes
([123]Figure 1F and [124]Table S3).
The neighborhood walk (NW) approach was based on the idea that
functionally related genes are more likely to cluster together in the
same neighborhood in a molecular network. Therefore, the neighborhoods
with higher level of seed density are more likely to be associated with
schizophrenia. Using NW approach to searching for neighborhoods in the
human interactome network based on seed density ([125]Figure S1E, See
[126]STAR Methods for details), we were able to identify the network
neighborhoods that may be involved in schizophrenia. Briefly, we
initially mapped the SCZ candidate genes as “seeds” onto the human
interactome network and assigned a risk score (R-score) to each seed
gene according to the number of methods used in identifying the gene as
an SCZ candidate gene ([127]Table S3). For each nonseed gene, we
calculated a guilt score (G-score = the average R-score of its
interactors) ([128]Table S3). For each edge, we calculated an edge
weight between the two interacting nodes (W = the sum of the R-score
and/or G-score) and a weighted edge length (WEL = 1/W) ([129]Table S3),
and for any two nodes in the network, we calculated a weighted shortest
distance
[MATH: (WSD=∑(1/Wi
)) :MATH]
([130]Figure S1E). To search for SCZ-associated neighborhoods, we
started the “walking” from each seed and stopped at a potential
neighborhood border at a certain walking distance (WSD) from the
origin, i.e., if further “walking” to the next sphere made the seed
ratio in the neighborhood drop significantly ([131]Figure S1E). We
obtained 1,592 potential SCZ-associated neighborhoods (one for each
seed) and ranked them according to the seed ratio and p-value
([132]Table S3). Three neighborhoods were chosen as examples to be
visualized in the background of human interactome network using
Cytoscape, and functional enrichment was performed to identify the main
functional terms ([133]Figure S1F), showing topological and functional
coherence. The neighborhoods with seed ratio ≥75% were selected and
combined into a subnetwork (NW-based Subnetwork), which contains 1,592
seed genes and 3,858 expanded SCZ candidate genes.
The NW approach was designed based on the hypothesis that disease genes
of a complex disease were functionally converged and thus clustered
together in neighborhoods of the human interactome network. In order to
test this hypothesis, we randomly picked 1,592 genes with exactly the
same degrees as the 1,592 seeds and, starting with the
degree-controlled fake seeds, we performed the same procedure using the
same cutoff of 75% seed ratio for selecting “disease”-associated
neighborhoods. The procedure was repeated for 100 times, and the best
result showed only three very small retrieved neighborhoods (a total of
eight genes, including 7 fake seeds and 1 “expanded candidate gene”)
([134]Figure S1G). This result demonstrates that NW exclusively
retrieves disease-associated neighborhoods based on functional
convergence of disease genes. We also performed RWR for 100 times
starting with 1,592 fake seeds using the cutoff 10ˆ-5
([135]Figure S1H). Using fake seeds, RWR retrieved more “expanded
candidate genes” than using real seeds, suggesting that RWR has no
preference for seed genes with functional convergence.
NW-retrieved subnetwork and RWR-retrieved subnetwork have 54.5%
(4,573/8,389) nodes in common ([136]Figure 1F). We took the
intersection of these two subnetworks as the final network for
schizophrenia (SCZ Network) ([137]Figure 1F, [138]Table S4) for
downstream analysis, which contains 1,592 seed genes and 2,981 expanded
candidate genes. This network only has one major component and four
isolated nodes.
Evaluation on the SCZ Network
In order to evaluate the quality of the expanded candidate genes in the
SCZ Network, we used independent datasets to compare SCZ relevance
between the expanded candidate genes and the seed genes. We obtained
the independent SCZ-associated gene sets from three recently published
systematic studies ([139]Table S5), including: (1) genomic variants
associated with schizophrenia ([140]Howrigan et al., 2020; [141]Rees
et al., 2020); (2) dysregulated genes in the postmortem brain of
schizophrenia patients (SCZ-dysregulated genes) ([142]Bowen et al.,
2019). We calculated the recovery ratios of these two SCZ-associated
gene sets by the seed genes, the expanded candidate genes in the RWR-
or NW-extracted subnetworks, the intersection of these two subnetworks,
and the nonseed genes in the background network (the human interactome
network) respectively ([143]Figures 2A and 2B). Compared to the nonseed
genes in the background network, both the seed genes and the expanded
candidate genes in the RWR- or NW-extracted subnetworks show
significant enrichment of SCZ-associated variants (Two-tailed Fisher’s
exact tests, p = 6.17e−26 for seed genes, p = 1.37e−07 for the expanded
candidates by NW, p = 3.27e−07 for the expanded candidates by RWR, p =
1.43e-7 for the expanded candidate genes identified by both,
[144]Figure 2A). In addition, compared to the nonseed genes in the
background network, both the seed genes and the expanded candidate
genes by RWR and NW show significant enrichment of SCZ-dysregulated
genes (Two-tailed Fisher’s exact tests, p = 7.49e−3 for seed genes, p =
3.23e−10 for the expanded candidates by NW, p = 1.03e−21 for the
expanded candidates by RWR, p = 1.83e-13 for the expanded candidate
genes identified by both, [145]Figure 2B). Furthermore, the seed genes
are more enriched with SCZ-associated variants than the expanded
candidates (Two-tailed Fisher’s exact test, p = 2.07e-11 for the
expanded candidate genes by NW, p = 3.61e-15 for the expanded candidate
genes by RWR, p =1.89e-9 for the expanded candidate genes identified by
both, [146]Figure 2A), but show similar enrichment of SCZ-dysregulated
genes compared with the expanded candidate genes (Two-tailed Fisher’s
exact test, p = 0.64 for the expanded candidate genes by NW, p = 0.44
for the expanded candidate genes by RWR, p = 0.14 for the expanded
candidate genes identified by both, [147]Figure 2B). These results
suggest that previous studies might be biased towards genetic variants,
while the RWR or NW methods did not exhibit such bias. Using a core set
of high-quality SCZ candidate genes as seed genes, RWR and NW performed
equally well according to the enrichment of SCZ-variants in the
expanded candidate genes (Two-tailed Fisher’s exact test, p = 0.32 for
SCZ-variants, p = 0.72 for SCZ-DEGs). Furthermore, the expanded
candidate genes by both RWR and NW show greater enrichment of the
SCZ-associated genes than those expanded by RWR alone (Two-tailed
Fisher’s exact test, p = 8.24e-3 for SCZ-associated variants,
[148]Figure 2A) or NW alone (Two-tailed Fisher’s exact test, p =
1.84e-4 for dysregulated genes) ([149]Figure 2B), showing that the
combination of these two methods has better performance than either
one. These results support our decision on choosing the intersection of
RWR-retrieved subnetwork and NW-retrieved subnetwork as high-confidence
SCZ Network.
Figure 2.
[150]Figure 2
[151]Open in a new tab
Validation of the SCZ Network using independent data sets of variants
and dysregulated genes associated with schizophrenia (See also
[152]Figure S2)
(A) Box and whisker graph for the fractions of SCZ-associated variants
in the expanded candidate genes in SCZ Network (SCZ-Net), NW-extracted
subnetwork (NW-Subnet) and RWR-extracted subnetwork (RWR-Subnet),
compared with those in the seeds in human interactome (seeds-HI) and
the nonseeds in human interactome (nonseeds-HI).
(B) Box and whisker graph for the fractions of SCZ-dysregulated genes
in the expanded candidate genes in SCZ Network (SCZ-Net), NW-extracted
subnetwork (NW-Subnet) and RWR-extracted subnetwork (RWR-Subnet),
compared with those in the seeds in human interactome (seeds-HI) and
the nonseeds in human interactome (nonseeds-HI).
(C) Box and whisker graph for the fractions of SCZ-associated variants
in the SCZ Network (SCZ-Net), NW-extracted subnetwork (NW-Subnet) and
RWR-extracted subnetwork (RWR-Subnet), compared with those in HI (human
interactome).
(D) Box and whisker graph for the fractions of SCZ-dysregulated genes
in the SCZ Network (SCZ-Net), NW-extracted subnetwork (NW-Subnet) and
RWR-extracted subnetwork (RWR-Subnet), compared with those in HI (human
interactome). P-values were calculated using Two-sided Fisher’s exact
test. P-values on lines above or below the boxes are calculated from
comparisons between the two groups. Lines extending vertically from the
boxes indicate variability outside the upper and lower quartiles,
estimated using bootstrapping method with 1,000 resamplings.
We further compared the recovery ratios of the SCZ-associated gene sets
by the SCZ Network with those by RWR/NW-extracted subnetworks or the
background network ([153]Figures 2C and 2D). Compared to the background
network, the extracted subnetworks using RWR, NW or both (SCZ Network)
show significant enrichment with SCZ-associated variants (p = 5.23e-18
for RWR-extracted subnetwork; p = 4.52e-22 for NW-extracted subnetwork,
p = 4.81e-24 for SCZ Network, [154]Figure 2C) and significant
enrichment with SCZ-dysregulated genes (p = 4.07e-23 for RWR-extracted
subnetwork; p = 1.02e-11 for NW-extracted subnetwork, p = 1.57e-14,
[155]Figure 2D). Compared to RWR/NW-extracted subnetworks, SCZ Network
showed both increased enrichment with SCZ-associated variants
(Two-tailed Fisher’s exact tests, p = 8.78e-4 compared to NW-extracted
network, p = 1.17e-08 compared to RWR-extracted subnetwork,
[156]Figure 2C) and increased enrichment with SCZ-dysregulated genes
(Two-tailed Fisher’s exact tests, p = 5.03e-3 compared to NW-based
subnetwork, [157]Figure 2D). All these results demonstrate that both
RWR and NW approaches have identified additional SCZ candidate genes,
which show significant enrichment of SCZ-associated genes, and that the
combination of RWR and NW had significantly better performance than RWR
or NW alone, suggesting that the SCZ Network constructed using the
combination of RWR and NW methods may be a highly SCZ-relevant
molecular network.
We further performed enrichment analysis on MetaCore disease biomarkers
of mental disorders for the expanded candidate genes and seed genes in
the SCZ Network. We found that biomarkers for several mental disorders,
including schizophrenia, schizophrenia spectrum disorder, major
depressive disorder, neurodevelopmental disorders, bipolar disorders,
autism spectrum disorder, etc., are significantly enriched with both
the seed genes and the expanded candidate genes ([158]Figure S2,
[159]Table S5). These results further confirm the high quality of the
SCZ Network, and also suggest a potential common molecular mechanism
for the comorbidity of these mental disorders. The expanded candidate
genes and seed genes are enriched with biomarkers for ‘mental
disorders’ in equal magnitude, but the seed genes show greater
enrichment for ‘schizophrenia’, probably due to the bias of
schizophrenia biomarkers towards well known SCZ candidate genes.
SCZ network is enriched in synaptic and immune signaling pathways
In order to explore the molecular mechanisms underlying schizophrenia,
we performed pathway enrichment analysis using MetaCore and KEGG
databases. MetaCore pathway database is a comprehensive pathway
database, and therefore, the enriched MetaCore pathways may provide a
list of potential SCZ-associated pathways in the SCZ Network
([160]Table S6-1), which may help further illustrate the molecular
mechanisms of schizophrenia. We examined literature for the 10 most
significantly enriched pathways, including ROS-induced cellular
signaling, GPCR signaling, MET and RON signaling, cytokine/JAK/STAT
signaling, GDNF signaling, EGFR signaling, etc., ([161]Figure S3A), and
found that each of these pathways is well reported to associate with
both schizophrenia and immunity by multiple studies. Interestingly, all
these pathways are also reported to be involved in synaptogenesis or
synapse modulation (See [162]Table S6-2 for literature information).
These top enriched MetaCore pathways, each suggested by literature to
be associated with immune activation, synaptic regulation, and
schizophrenia, may provide us with a direction for further exploring
the SCZ Network. The enriched KEGG pathways would provide functional
information of the network ([163]Table S6-3). We compared the top
enriched KEGG pathways between the seed genes, the expanded candidate
genes and all genes ([164]Figure 3A). Among these pathways, some
pathways such as ‘MAPK signaling’, ‘cAMP signaling’, ‘ErbB signaling’,
‘GABAergic synapse’, ‘dopaminergic synapse’, ‘Alzheimer's disease’, and
‘circadian entrainment’ are common between the seed genes and the
expanded candidate genes ([165]Figure 3A); some pathways such as
‘calcium signaling’, ‘neuroactive ligand-receptor interaction’ and
‘nicotine addiction’ are only found for seed genes ([166]Figure 3A),
while other pathways such as ‘viral infections’, ‘bacterial
infections’, ‘viral carcinogenesis’, ‘pancreatic cancer’, ‘cellular
senescence’ and ‘ubiquitin-mediated proteolysis’ are found for the
expanded candidate genes ([167]Figure 3A). Interestingly, the pathways
that are only enriched with the expanded candidate genes are mostly
related to immune response and cancer, suggesting that the network
approach may compensate the genetic approaches in identifying some SCZ
candidate genes involved in immunity. All these top enriched KEGG
pathways are documented to associate with schizophrenia or psychiatric
disorders (See [168]Table S6-4 for literature information). The
literature information on the top enriched MetaCore and KEGG pathways
suggests a close link between immune activation, synapse function, and
schizophrenia. It is consistent with previous studies which discovered
the involvement of immune system ([169]Brown, 2012; [170]Brown and
Derkits, 2010; [171]Estes and McAllister, 2016; [172]Fillman et al.,
2013; [173]van Kesteren et al., 2017) and synaptic loss and impaired
synaptic plasticity ([174]Boksa, 2012; [175]Forsyth and Lewis, 2017;
[176]Keshavan et al., 1994; [177]Osimo et al., 2019; [178]Sellgren
et al., 2019; [179]Stephan et al., 2006) in the development of
schizophrenia.
Figure 3.
[180]Figure 3
[181]Open in a new tab
SCZ network is enriched for synaptic and immune function (See also
[182]Figure S3)
(A) Functional enrichment analyses on the SCZ Network for the seed
genes (known candidate genes) and the expanded candidate genes. The
enriched KEGG pathways are those with overrepresentation of genes in
the SCZ Network. Only top enriched KEGG pathways are shown. The sizes
of the dots represent gene ratios and the colors represent the adjusted
p-values. The pathways in immune response are in blue font, the
synaptic signaling pathways (MAPK, cAMP or ErbB signaling pathways) are
in green font; the synapse pathways are in cyan, other pathways related
to cancer, Alzheimer's disease or circadian entrainment are in black.
(B) Synaptic and immune genes in the SCZ Network. Left pie chart is
split into 4 categories: (1) immune genes, (2) synaptic genes, (3) the
genes annotated as both immune and synaptic genes, and (4) other genes.
The bars on the right show the proportions of the seeds and the
expanded candidate genes in categories (i), (ii) and (iii).
(C) Enrichment of synaptic genes in the seed genes and the expanded SCZ
candidate genes compared to the human interactome (p = 4.88e-47 for all
genes in SCZ Network, p = 2.37e-27 for seed genes, p = 2.39e-15 for the
expanded candidates). The seed genes are significantly more enriched
with synaptic genes than the expanded candidates (p = 1.98e-4).
(D) Enrichment of immune genes in the seed genes and the expanded
candidate genes compared to the human interactome (p = 1.44e-42 for all
genes in SCZ Network, p = 0.09 for seed genes, p = 4.22e-47 for the
expanded candidates). The expanded candidate genes are significantly
more enriched immune genes than seed genes (p = 1.37e-09).
P-values were calculated using Two-sided Fisher’s exact test. P-values
on lines above the top of boxes are from comparisons between two
groups. Lines extending vertically from the boxes indicate variability
outside the upper and lower quartiles, estimated using bootstrapping
method with 1,000 resamplings.
Consistent with the pathway enrichment results, the SCZ Network
contains 10.4% (474/4,573) synaptic genes and 24.3% (1,109/4,573)
immune genes ([183]Figure 3B, [184]Tables S6-5–S6-8), showing
significant enrichment (Fisher’s exact tests, p = 4.88e-47 for synaptic
genes, p =1.44e-42 for immune genes, [185]Figures 3C and 3D), compared
to the background network which contains 4.2% (532/12,637) synaptic
genes and 15.0% (1,901/12,637) immune genes. Compared to other genes in
the background network (the human interactome), both the seed genes and
the expanded candidate genes are significantly enriched with synaptic
genes (Fisher’s exact tests, p = 2.37e-27 for seed genes; p = 2.39e-15
for the expanded candidate genes, [186]Figure 3C), but the seed genes
show stronger enrichment than the expanded candidate genes (Fisher’s
exact test, p = 1.98e-4, [187]Figure 3C). However, only the expanded
candidate genes are significantly enriched with immune genes (Fisher’s
exact test, p = 0.09 for seed genes; p = 4.22e-47 for the expanded
candidate genes, [188]Figure 3D). These results demonstrate that apart
from synaptic genes our network approach has captured many additional
candidate genes involving in SCZ-related immune response that have not
been identified in previous genetic studies. The interconnection among
the immune genes, synaptic genes and other genes in the SCZ Network
([189]Figure S3B) suggests a critical role of immune genes in the
pathogenesis of schizophrenia, which provides the basis for systematic
investigation on the molecular mechanisms linking immune activation to
schizophrenia.
Spatiotemporal expression patterns of genes in SCZ network associates with
two susceptible developmental stages
We are interested in understanding whether the spatiotemporal dynamics
of the SCZ Network might reveal the stages and brain regions sensitive
to immune activation and, more importantly, the underlying molecular
pathways. We utilized the expression data from 4 regions of human
brains at different developmental periods (periods 2-13, P2-P13)
([190]Kang et al., 2011; [191]Lin et al., 2015) to calculate the
overrepresentation of the genes in the SCZ Network in different
spatiotemporal phases ([192]Figures 4A and [193]S4; see [194]STAR
Methods and [195]Table S7 for details). The p-values for the
enrichments were calculated using one-side Fisher’s exact test. Region
1 includes: posterior inferior parietal cortex, primary auditory (A1)
cortex, primary visual (V1) cortex, superior temporal cortex, and
inferior temporal cortex; region 2 includes: primary somatosensory
cortex, primary motor cortex, orbital prefrontal cortex, dorsolateral
prefrontal cortex, medial prefrontal cortex, and ventrolateral
prefrontal cortex; region 3 includes: striatum, hippocampus, and
amygdala; region 4 includes: mediodorsal nucleus of the thalamus, and
cerebella cortex ([196]Figure 4A).
Figure 4.
[197]Figure 4
[198]Open in a new tab
Spatiotemporal dynamic expression of the genes in the SCZ Network
during brain development (See also [199]Figure S4)
(A) A schematic of human spatiotemporal expression data source: 4 brain
regions and 10 developmental periods. Region 1 includes: posterior
inferior parietal cortex, primary auditory (A1) cortex, primary visual
(V1) cortex, superior temporal cortex, inferior temporal cortex; region
2 includes: primary somatosensory cortex, primary motor cortex, orbital
prefrontal cortex, dorsolateral prefrontal cortex, medial prefrontal
cortex and ventrolateral prefrontal cortex; region 3 includes:
striatum, hippocampus and amygdala; region 4 includes: mediodorsal
nucleus of the thalamus and cerebella cortex. Developmental periods
including: early fetal development, early mid-fetal development, late
mid-fetal development, late fetal development, neonatal and early
infancy, late infancy, early childhood, middle and late childhood,
adolescence and young adulthood (see [200]Figure S4).
(B–E) Enrichment of different categories of genes of the SCZ Network in
spatiotemporal phases. The heatmap shows the significance levels of
overrepresentation of the genes of the SCZ Network in the
spatiotemporal phases. One-sided Fisher’s exact test was used for the
analyses of enrichments. The numbers in the squares represent the
−log[10](p-value). (B) Enrichment of all genes of the SCZ Network in
spatiotemporal phases; (C) Enrichment of the non-immune genes of the
SCZ Network in spatiotemporal phases; (D) Enrichment of the synaptic
genes of the SCZ Network in spatiotemporal phases; (E) Enrichment of
the immune genes of the SCZ Network in spatiotemporal phases.
The genes in the SCZ Network are significantly enriched in the regions
1 and 2 (the cortex regions) during most of development periods
(log[10]-transformed p-values shown in [201]Figure 4B). Stronger
enrichment is found in region 1 during periods 3-6, 11-13 (P3-6: early
fetal to late mid-fetal development, p = 1.66e-10 ∼ 6.60e-07; P11-13:
middle childhood to young adulthood, p = 8.05e-06 ∼ 9.59e-11), in
region 2 during periods 3-4, 5-7 (P3-4: early fetal to early mid-fetal
development, p = 5.87e-06; P5-7: early mid- to late fetal development,
p = 1.20e-07 ∼ 4.45e-06) and periods 12-13 (adolescence to young
adulthood, p = 3.77e-11) and in region 3 during periods 12-13
(adolescence to young adulthood, p = 4.85e-5) ([202]Figure 4B). The
overrepresentation of the genes in the SCZ Network in the stages of
early- to late fetal development and adolescence to young adult
suggests that this network may be more susceptible to perturbation
during fetal development and adolescence. Since 76% of the genes in the
SCZ Network are nonimmune genes, the nonimmune genes show a very
similar pattern as all genes in the network ([203]Figure 4C).
Accumulating evidence suggests that synaptic plasticity impairment
during development may play a role in the development of schizophrenia
([204]Boksa, 2012; [205]Forsyth and Lewis, 2017; [206]Keshavan et al.,
1994; [207]Osimo et al., 2019; [208]Sellgren et al., 2019; [209]Stephan
et al., 2006). We further explored the spatial temporal expression
pattern of the synaptic genes of the SCZ Network ([210]Figure 4D).
Synaptic genes are significantly enriched in region 1 during periods
2-6 (P2-6: early fetal to late mid-fetal development, p = 7.6e-4 ∼
0.01) and 10-13 (P10-13: early childhood to young adulthood, p =
0.04–0.02); in region 2 during periods 5-6 (P5-6: early-to-late
midfetal development, p = 0.03) and periods 12-13 (P12-13: adolescence
to young adulthood, p = 0.03) ([211]Figure 4E). The overrepresentation
of synaptic genes during periods from early fetal to late mid-fetal
development and from early childhood to young adult suggests these
stages are critical for synaptic development. In addition, these stages
are coincident with the two critical stages for cortex development, in
which cell differentiation and synapse remodeling may be influenced by
the environment factors, respectively ([212]Selemon and Zecevic, 2015).
To further identify the brain regions and developmental stages that are
sensitive to immune activation, we further calculated the
overrepresentation of the immune genes of the SCZ Network in the
spatiotemporal phases. The immune genes are more significantly enriched
in region 1 during periods 5-7, 11-13 (P5-7: early mid- to late fetal
development p = 0.02–0.03; P11-13: middle childhood to young adulthood,
p = 0.02–0.03); in region 2 during periods 6-7 (P6-7: late mid-to late
fetal development, p = 0.007) ([213]Figure 4E). The overrepresentation
of immune genes in this network during both early- to mid-fetal
development and peripuberty suggests that it may be more easily
affected by immune activation during these stages. These results are
consistent previous findings that fetal and peripubertal developmental
stages are the two key stages for schizophrenia: (1) maternal infection
([214]Brown, 2012; [215]Brown and Derkits, 2010; [216]Estes and
McAllister, 2016), maternal stress ([217]Gomes et al., 2019) and
placental diseases ([218]Kendell et al., 1996) during gestation
increase the risk of schizophrenia in the offspring; (2) adolescent
stress is also a risk factor for schizophrenia ([219]Gomes and Grace,
2017). These results suggest that abnormal immune activation during
these stages may perturb the molecular network in the brain,
potentially leading to schizophrenia. On the other hand, many
environmental factors, including infection and stress, induce
peripheral and central immune activation ([220]Uhlhaas, 2011).
Based on network topology, the immune genes, synaptic genes and other
genes are interconnected with each other, and therefore immune
activation may affect brain development and synapse remodeling. Recent
studies show that immune genes in the brain may regulate neuronal
synapse remodeling ([221]Boulanger, 2009; [222]Morimoto and Nakajima,
2019). The complement component 4 (C4) participates in the synaptic
pruning regulation ([223]Sekar et al., 2016). The immune activation
increases synaptic pruning through microglia ([224]Sellgren et al.,
2019). Our results are consistent with previous findings of the
critical role of immune system in brain development ([225]Morimoto and
Nakajima, 2019; [226]Schepanski et al., 2018). Overall, our findings
illustrated the connection between immune activation and synapse
plasticity during peripubertal developmental stages, further confirming
that fetal and peripubertal developmental stages are critical for
schizophrenia development ([227]Mottahedin et al., 2017). Since the
typical onset time of schizophrenia is the transition period from
adolescence to adulthood ([228]Uhlhaas, 2011), it may be possible that
maternal immune activation during fetal development may prime the brain
to be susceptible to a second immune activation at adolescence.
Immune activation during fetal development and puberty dysregulates signaling
pathways in SCZ network
In order to further investigate how immune activation might be involved
in the molecular mechanism underlying schizophrenia, we induced immune
activation twice in mice: first during mid-late fetal development stage
by intraperitoneal injection (IPI) of lipopolysaccharide (LPS) into
pregnant mice at G16 (16^th day after gestation), and a second time
during puberty by IPI of LPS into the male offspring at P30 (30^th day
after birth) ([229]Figure 5A). We used saline as control, and designed
the experiments in three combinations of saline and LPS treatments: (1)
S-S (control group): saline at G16 + saline at P30; (2) L-S: LPS at
G16 + saline at P30; (3) S-L: saline at G16 + LPS at P30; (4) L-L: LPS
at G16 + LPS at P30 ([230]Figure 5A).
Figure 5.
[231]Figure 5
[232]Open in a new tab
Immune activation during fetal development and puberty dysregulates
signaling pathways in SCZ Network (See also [233]Figures S5–S7)
(A) A schematic of experimental design. The first intraperitoneal
injection (IPI) of LPS or saline to the pregnant mice was performed on
G16 (Gestation Day 16) and the second IPI of LPS or saline to the
offspring was performed on P30 (Postnatal Day 30). The combination of
LPS and saline treatments: saline at G16 + saline at P30 (S-L); LPS at
G16 + saline at P30 (L-S); saline at G16 + LPS at P30 (S-L); LPS at
G16 + LPS at P30 (L-L).
(B) The Venn diagram of the DEGs in the L-S group (yellow), S-L group
(cyan) and L-L group (blue) compared with S-S group (p < 0.05 and
|FC|≥1.3).
(C–E) The threshold-free comparison of expression alteration between
three treatment groups by rank-rank hypergeometric overlap (RRHO). The
log[10]-transformed hypergeometric p-values are plotted in a heatmap as
indicated by an accompanying color scale, mapping the degree of the
statistical significance of overlaps between two differential
transcriptomes (two ranked gene lists on the X-axis or Y-axis). Each
pixel in a RRHO map represents the log[10]-transformed hypergeometric
p-value for the overlap of subsections of two ranked lists from a point
on the map (corresponding to a rank threshold pair: Rank X, Y) to
either bottom left corner (Rank 1, 1) or top right corner (Rank N, N).
The numbers on color scales indicate −log[10](p-value). The genes along
each axis are sorted from most significantly up-regulated to most
down-regulated from the lower left corners. The RRHO maps: L-S group
versus L-L group (C); L-S group versus S-L group (D); S-L group versus
L-L group (E).
(F) Over-represented disease biomarkers of mental disorders in the DEGs
of L-S, S-L and L-L groups identified via MetaCore “Diseases (by
Biomarkers)”. The X-axis represents −log10 (FDR). Y-axis lists the
enriched diseases. Vertical red dashed line shows the cutoff of FDR =
0.05.
(G) The node degree distribution of SCZ Network with top hub DEGs
highlighted. Y-axis represents the gene count and X-axis represents the
degree. Top 10% of genes have degrees ≥212 as indicated by the red
line. The 10 DEGs among the top hub genes are marked (The degrees of
all the DGEs in the SCZ Network are listed in [234]Table S9).
(H) The MetaCore pathways enriched with DEGs. The Venn diagrams show
the overlaps between the enriched pathways with DEGs and the enriched
pathways with genes in SCZ Network: 18 of 20 (90%), 69 of 85 (81.2%)
and 95 of 114 (83.3%). A total number of 100 enriched pathways in L-S,
S-L or L-L groups are overlapped with enriched pathways of SCZ Network.
These 100 enriched pathways are clustered into 18 groups by calculating
the distance between two pathways based on shared genes within SCZ
Network. The dot plot shows the enrichment of the pathways enriched
with DEGs in L-S, S-L, L-L groups and with genes in SCZ Network. The
red, yellow, cyan or blue dots show the −log[10] transformed FDR value
of pathways in SCZ network, L-S, S-L, and L-L groups, respectively.
Vertical purple dashed line shows the cutoff of FDR = 0.05. The 18
representative pathways are listed on the bottom right corner. The
information of all pathways is listed in [235]Table S9.
We carried out poly-A selected RNA-seq sequencing on the whole brain at
P32 to evaluate transcriptional changes and then performed the
differential expression analysis using DeSeq2 ([236]Figures S5A and
S5B, [237]STAR Methods). The differentially expressed genes (DEGs)
identified were 170 from L-S group, 474 from S-L group and 927 from L-L
group using S-S group as control (p< 0.05 and |Fold Change| ≥1.3,
[238]Figure 5B, [239]Table S8), suggesting that L-L group have greater
immune response than S-L group or L-S group. The overlap of DEGs is
22.9% (39/170) between L-S and L-L group, while 52.3% (248/474) between
S-L and L-L group ([240]Figure 5B). We then looked at the genome-wide
comparison of differential expression patterns at threshold-free manner
among these three groups using rank-rank hypergeometric pair-wise
comparison ([241]Plaisier et al., 2010), which showed co-regulation of
genes among the three groups ([242]Figures 5C–5E).
We further performed MetaCore disease biomarker enrichment analysis in
the DEGs of L-S, S-L and L-L groups (FDR <0.05). The top enriched
mental diseases include ‘schizophrenia’, ‘schizophrenia spectrum and
other psychotic disorders’, ‘bipolar’, ‘bipolar related disorder’,
‘major depressive disorder’ and ‘depressive disorder’ ([243]Figure 5F,
[244]Table S9-1), suggesting that gene dysregulation by maternal and
adolescent immune activation may underly multiple psychotic disorders.
As described above ([245]Figure S2), the SCZ Network is also enriched
for biomarkers of these same mental diseases. These results not only
confirm the current perspective on the role of immune activation in the
pathogenesis of psychiatric disorders, but also suggest potential
common mechanisms for the comorbidity between these mental disorders
reported in epidemiological studies ([246]Buckley et al., 2009).
To evaluate the potential effects that the gene differential expression
induced by immune activation may have on the SCZ Network, we mapped the
DEGs on the SCZ Network and ranked the node degrees ([247]Figure 5G,
[248]Table S9-2). Of 184 DEGs that are in the SCZ Network, 28 DEGs have
a node degree above the average while 8 DEGs (ACTA2, ALDH1A2, PLK1,
MYH11, CBL, MAP1LC3A, AURKB and RPL18A) are among the top 10% hub genes
([249]Figure 5G). These hub genes may play a key role in linking immune
activation to schizophrenia. The literature mining for these top hub
DEGs identified many studies about their association with
schizophrenia, psychiatric disorders or nervous system related
functions (See [250]Table S9-3 for literature information). For
example, ACTA2 is significantly downregulated in the postmortem cortex
of schizophrenia patients ([251]Mladinov et al., 2016), and
downregulation of ACTA2 inhibits neural stem cell migration ([252]Zhang
et al., 2020). The deletion of 15q11-13, which covers gene ALDH1A2, is
suggested to be involved schizophrenia ([253]Reay and Cairns, 2020).
ALDH1A2 is found to associate with schizophrenia in a large scale
genome-wide association study ([254]Reay et al., 2020).
In order to understand the molecular mechanisms on how the immune
activation during gestation (the first hit) and during puberty (the
second hit) may increase the risk for schizophrenia, we searched for
MetaCore pathways enriched with DEGs of each group. Using a cutoff of
FDR < 0.05, we identified 20 enriched pathways for DEGs in L-S group,
85 for S-L group and 104 for L-L group. Of the enriched pathways, 90.0%
(18/20) in L-S group, 81.2% (69/85) in S-L group and 83.3% (95/114) in
L-L group are overlapping with the enriched pathways in SCZ Network
([255]Figures 5H and [256]S6, [257]Tables S9-4, S9-5, and S9-6). A
total number of 121 enriched pathways are identified in the three
groups of DEGs, and 100 are also enriched in SCZ Network ([258]Table
S9-7). We also carried out pathway enrichment analysis on the seed
genes and the expanded candidate genes of the SCZ Network separately,
and found that the seed genes and the expanded candidate genes had
similar proportion of enriched pathways overlapping with those of DEGs
([259]Figure S7). These results suggest that the majority of the
dysregulated signaling pathways in the brain induced by immune
activation during fetal development and/or during puberty may be
involved in schizophrenia, consistent with the accumulating evidence
that maternal and peripubertal immune activation increase the risk of
schizophrenia ([260]Brown, 2012; [261]Brown and Derkits, 2010;
[262]Estes and McAllister, 2016; [263]Gomes and Grace, 2017; [264]Gomes
et al., 2019; [265]Kendell et al., 1996).
The enrichment of these pathways was compared between the three groups
and the SCZ Network ([266]Figures 5H and [267]S6). L-L group shows
better enrichment in most of these pathways than L-S and S-L groups
([268]Figures 5H and [269]S6), suggesting that maternal immune
activation during fetal development may increase the risk of
schizophrenia after a second immune activation at puberty. In order to
identify the main categories of pathways linking immune activation to
schizophrenia, we clustered the enriched pathways based on shared genes
within SCZ Network ([270]Figures 5H and [271]S6). Eighteen pathway
clusters were identified, and one representative pathway was selected
for each cluster. Some of the clusters, i.e., C2, C5, C17 and C18, are
highly enriched with genes in SCZ Network ([272]Figure S6), suggesting
that the disruption of pathway in these clusters may have more impact
on the molecular network. The “first hit” and the “second hit” may have
different effects on the pathways. The “first hit” alone affects only a
small number of clusters, including the WNT signaling pathways in
clusters C14 and C15 ([273]Figure S6), which are important for fetal
brain development, while the “second hit” alone affects most of the
clusters. On the other hand, the vast majority of clusters except a
few, are more affected by ‘two hits’ (L-L group) than by the “second
hit” alone. Examples of the exception include cluster C7 and C10, which
are more affected by “second hit” (S-L group) than by ‘two hits’ (L-L
group) ([274]Figure S6), suggesting that the “first hit” may have
primed the brain to response differently to the “second hit”. All these
18 representative pathways are well supported to associate with
schizophrenia based on literature (See [275]Table S9-8 for literature
information). For example, pathways for antigen presentation by MHC
([276]International Schizophrenia Consortium et al., 2009;
[277]Pardinas et al., 2018; [278]Schizophrenia Working Group of the
Psychiatric Genomics, 2014; [279]Shi et al., 2009; [280]Stefansson
et al., 2009; [281]Walters et al., 2013), canonical (beta-catenin
dependent) and noncanonical (beta-catenin independent) WNT signaling
([282]Hoseth et al., 2018; [283]Hussaini et al., 2014) are
well-documented to be involved in schizophrenia.
Key regulators for immune activation in mouse brain
The comparison of the brain transcriptome profiles among the L-S, S-L
and L-L groups, suggest that injection of LPS to the mother during
mid-late gestation stage may induce latent sensitization of the brain
to a second injection of LPS to the offspring at adolescence
([284]Figures 5 and [285]S5). We assumed that the response patterns of
the brain to immune activation is established and maintained by the
gene expression programs through transcription factors, cofactors and
chromatin regulators. In order to identify the major transcription
factors (TFs) that may regulate the transcriptional alteration in these
three groups, we searched for enriched TF-binding motifs in the
promoter regions of the DEGs using HOMER ([286]Heinz et al., 2010). We
found a total number of 24 TF-binding motifs corresponding to 26 TFs
after removing the motifs whose corresponding TFs were not expressed in
any of the transcriptomes in L-S, S-L, L-L or control groups
([287]Figures 6A and [288]S8A). These 24 motifs are enriched in
up-regulated genes (ISRE, ZBTB12, MAFA and KLF9), down-regulated genes
(NF1-halfsite, ARNT:AHR, NR2E1, SOX7, SOX17, SMAD4, SIX1, TWIST1, HIF1A
and PAX3:FOXO1) or both (IRF2, IRF8, ZBTB1, PRDM1, SIX2, SPI1, IRF3,
IRF1 and NR2F2): 6 for L-S group, 6 for S-L group and 16 for L-L group
([289]Figures 6A, [290]S8B, [291]Tables S10-1 and S10-2). There are 3
common motifs (ISRE, ARNT:AHR and NF1-halfsite) between L-L and S-L
group, and 1 common motif (IRF2) between L-L and L-S groups, but no
common motifs between L-S and S-L groups ([292]Figures 6A and
[293]S8A).
Figure 6.
[294]Figure 6
[295]Open in a new tab
Key regulators for immune activation in mouse brain (See also
[296]Figure S8)
(A) The 24 binding motifs and corresponding transcription factors (TFs)
predicted for DEGs of L-S, S-L and L-L groups using HOMER. The motif
patterns are shown on the left. The colored bars on the right indicate
the target DEGs in L-S, S-L and L-L groups.
(B) The mRNA expression levels (detected in the bulk RNA-Seq) of the
identified TFs. Dot size shows absolute value of log[2](fold change) in
L-S, S-L and L-L groups. The colors represent the up-regulation (red)
or down-regulation (blue). The asterisks indicate the p-values <0.5.
(C) The 24 transcription factors (TFs) were clustered into 6 groups
based on the correlation between their shared targets using Pearson
correlation coefficient. Red and blue show the comparison pairs with a
higher and lower correlation coefficient respectively. The colored bars
on the top show the groups the DEGs belong to.
(D) The Venn diagrams show the overlaps between the pathways enriched
with the target DEGs of the TF groups and the pathways enriched with
the genes in SCZ Network.
(E) The comparison of the top enriched pathways among the 6 TF groups.
The enrichment of the top 10 enriched pathways in each TF group was
compared with enrichment of these pathways in other TF groups and SCZ
Network. The sizes of the dots represent gene ratios and the colors
represent FDR adjusted p-values.
(F) Integrated protein-protein interactions and regulatory cascades of
the identified TFs. Node size represents the degree. Red, yellow and
grey node colors represent seeds, expanded candidates or other genes in
SCZ Network, respectively. Cyan arrow edges and grey solid edges show
the TF-to-target relationship and protein-protein interaction
respectively.
(G) The node degree distribution of SCZ Network with the identified TFs
highlighted. The 11 of 26 identified TFs are in the SCZ Network.
We checked the expression alteration of the 26 TFs corresponding to
these 24 motifs by examining the transcriptomes of mouse brain of S-S,
L-S, S-L, and LL groups. Of the 26 TF genes, 10 (Irf8, Spi1, Arnt, Nf1,
Smad4, Six1, Six2, Twist1, Klf9 and Foxo1) in L-L group and 5 (Spi1,
Nf1, Six2, Twist1 and Klf9) in S-L group showed significant expression
alteration (Wald test, p < 0.05, [297]Figure 6B, [298]Table S10-3), but
no TFs in L-S group showed significant expression change
([299]Figure 6B, [300]Table S10-3). The genes Irf8, Arnt, Smad4, Six1,
and Foxo1 only showed significant expression change in L-L group, not
in S-L group, suggesting that both “first hit” and “second hit” may be
needed for expression alteration of these 5 TFs. We noticed that
expression levels of some TFs were altered at an extent below the
differential expression cutoff for TFs (p = 0.05) in L-S group. For
example, Six1 showed 1.51-fold upregulation (Wald test, p = 0.13) in
L-S group and a 1.14-fold change (p = 0.69, Wald test) in S-L group,
but a fold change of 2.27 (Wald test, p = 0.001) in L-L group. These
results demonstrated that, “first hit” or “second hit” alone was not
able to significantly alter the expression level of some TFs while the
combination of “two hits” was able to change their expression,
suggesting that the “first hit” may prime the brain to make it more
sensitive to a “second hit”. Such priming phenomenon has also been
observed in early life stress in generating mouse model for depressive
disorder, which makes the individuals more sensitive to social stress
in adult life ([301]Pena et al., 2017).
In order to find potential collaborative regulation among the 26 TFs
during immune activation, we clustered the TF-binding motifs by
calculating the correlation coefficient between two TF-binding motifs
based on their shared target DEGs, and identified 6 groups including 5
clusters and 1 singleton ([302]Figure 6C). We further performed
MetaCore pathway enrichment analysis (FDR <0.05) for the TF targets in
each group and compared the enriched pathways with those identified in
SCZ Network ([303]Tables S10-4–S10-9). The majority of the enriched
pathways were also enriched with SCZ Network: Group I: 88.9% (40/45);
Group II: 98% (49/50); Group III: 86.8% (59/68); Group IV: 88.5%
(46/52); Group V: 80% (20/25); Group VI: 75% (3/4). We compared the top
10 enriched pathways from each group and found that each of these TF
groups regulate both distinct and common pathways ([304]Figure 6E). For
example, ‘Antigen presentation by MHC class I: cross-presentation’ is
shared by Group 1 and Group 3, and ‘Immune response_IFN-gamma in
macrophages activation’ is unique to Group 1. Most of these top
enriched pathways (73%) are reported to be associated with
schizophrenia or psychotic disorders (See [305]Table S10-10 for
literature information). Therefore, together with the pathway clusters
identified for the 100 SCZ-associated pathways ([306]Figures 5H and
[307]S6; [308]Table S9), these 6 TF groups and their target pathways
([309]Figures 6C–6E; [310]Table S10) provide a framework for studying
the immune activation-induced transcriptional regulation of the
pathways involved in schizophrenia.
In order to understand how the TFs may regulate each other to
coordinate the immune responses after the “first hit” and/or the
“second hit”, we mapped potential transcriptional regulation among
these TFs using HOMER motif scan results for the promoter regions of
the TF genes, and also integrated with the protein-protein interactions
among these TFs using interaction data of the SCZ Network. This network
contains 14 transcriptional regulatory relationships and 8
protein-protein interactions among 15 TFs ([311]Figure 6F), where SIX1
and SIX2 are TF-TF regulatory hubs, while SMAD4 is protein interaction
hub, suggesting that SIX1, SIX2 and SMAD4 may play critical role in
immune activation (see [312]Table S10-11 for literature information).
SIX1 can be regulated by 5 TFs (SIX2, PRDM1, IRF8, IRF3 and SPI1) while
SIX2 can regulate three genes (SIX1, TWIST1 and IRF8) and can be
regulated by itself and two other TFs (ZBTB18 and NR2F2). As described
above, SIX1 is only upregulated in mice with LPS treatments during both
gestation and puberty ([313]Figure 6B), suggesting that the priming by
“first hit” during embryonic development may be necessary for the
transcriptional upregulation of this genes after the “second hit”
during puberty. It is reported in a recent study that SIX1 and SIX2
control the activity of inflammatory genes such as interferon (IFN) in
response to noncanonical NF-κB pathway activation to protect cell from
inflammatory damage ([314]Liu et al., 2019). Multiple studies also
reported the dysregulation of NF-κB signaling pathway in cortical
immune activation in schizophrenia ([315]Roussos et al., 2013;
[316]Volk et al., 2019). As an interaction hub ([317]Figures 6F and
6G), SMAD4 binds 4 TFs (IRF3, NR2F2, NF1 and FOXO1). IRF3 ([318]Xu
et al., 2014), NR2F2 ([319]Qin et al., 2013), NF1 ([320]Kretova et al.,
2014), and FOXO1 ([321]Ellis et al., 1989), each can bind to SMAD4 to
suppress SMAD transcriptional complexes, which is at the core of the
TGF-β signaling pathways ([322]Mullen et al., 2011). Genetic variants
in TGF-β are found to associate with schizophrenia ([323]Jia et al.,
2010) while TGF-β signaling promotes neuronal axon formation ([324]Yi
et al., 2010) and dendritic growth ([325]Luo et al., 2016). All these
lines of evidence support our prediction on the critical roles of SIX1,
SIX2, and SMAD4 in the immune activation associated with schizophrenia.
Systematic empirical investigation into the roles of these
transcription factors and signaling pathways in an immune response may
illustrate a more detailed molecular mechanisms on how immune
activation in the brain may lead to schizophrenia.
Discussion
The most significant advances in searching for risk factors of
schizophrenia over the last decade include: first, a large number of
schizophrenia-associated genes have been identified by genome wide
studies ([326]Bowen et al., 2019; [327]Fromer et al., 2014;
[328]Howrigan et al., 2020; [329]Pardinas et al., 2018; [330]Rees
et al., 2020; [331]Schizophrenia Working Group of the Psychiatric
Genomics, 2014; [332]Xu et al., 2011); and second, immune activation
has been repeatedly proven to be associated with this disease
([333]Brown, 2012; [334]Brown and Derkits, 2010; [335]Fillman et al.,
2013; [336]Goldsmith et al., 2016; [337]Miller et al., 2011; [338]van
Kesteren et al., 2017). Proinflammatory cytokine levels have been
consistently reported to be increased in the serum of SCZ patients
([339]Goldsmith et al., 2016; [340]Miller et al., 2011) and also in
their prefrontal cortex ([341]Fillman et al., 2013). Maternal infection
or stress during pregnancy greatly increases the risk of schizophrenia
in the offspring ([342]Brown, 2012; [343]Brown and Derkits, 2010;
[344]Estes and McAllister, 2016; [345]Gomes et al., 2019), while
adolescent stress is also reportedly a risk factor ([346]Gomes and
Grace, 2017; [347]Harrop and Trower, 2001; [348]Trotman et al., 2013).
The “environment-environment interaction” model of the “two-hit”
hypothesis is proposed explanation for the etiology ([349]Khan and
Powell, 2018). What is needed is a systematic way to illustrate the
underlying molecular mechanism, particularly how immune system
contributes to the disease onset. Here we presented a systematic search
for molecular pathways linking immune activation to schizophrenia by
extracting a high-quality protein interaction network for schizophrenia
(SCZ Network) using our in-house developed “neighborhood walk” (NW)
approach in combination with “random walk with restart” (RWR).
Starting from a core set of 1,720 high-confidence SCZ candidate genes,
we extracted the SCZ Network from a high-quality comprehensive human
interactome network ([350]Li et al., 2017) using NW and RWR
([351]Figure 1). This is the first time that NW approach is applied in
searching for disease-associate neighborhoods in a molecular network.
NW and RWR show comparable performance ([352]Figure 2). Although both
RWR and NW start walking from seeds in the network, they retrieve
disease candidate genes based on different principles. RWR walks from a
seed to all directions with a restart probability, to examine the
proximity between each individual seed and all the other nodes based on
the topology of the network. Therefore, RWR does not depend on the
functional convergence of the disease genes. Starting from randomly
picked degree-controlled fake seeds, RWR retrieved more “disease”
candidate genes than starting from real seeds ([353]Figure S1H),
probably because the fake seeds were not functionally related and thus
were topologically scattered in the network. However, NW walks from a
seed to explore the disease association of the neighborhood. It depends
on the functional convergence of the seed genes in neighborhoods of the
network. Starting with randomly picked degree-controlled fake seeds, NW
was not able to retrieve sizable “disease”-associated neighborhoods
([354]Figure S1G), because functionally unrelated genes were scattered
in the network, not clustered in neighborhoods of the network. The
common nodes of these two subnetworks show stronger enrichment with
SCZ-associated genes than either subnetwork alone ([355]Figure 2).
Therefore, we selected the common genes identified by these two
methods, which generated a more reliable molecular network for
schizophrenia than each method alone. The expanded candidate genes and
the seed genes in the SCZ Network have enrichment of equal magnitude
with the SCZ-dysregulated genes in the brain ([356]Figure 2B),
confirming the high quality of the SCZ Network. MetaCore and KEGG
pathway enrichment analyses identified underlying signaling pathways
([357]Tables S6-1 and S6-3). We examined the literature for the top ten
enriched MetaCore pathways (see [358]Table S6-2 for literature
information) and found that all these top pathways are involved in
schizophrenia and immune regulation. Most interestingly, these same
pathways are also reported to be involved in synaptogenesis and synapse
modulation. The literature search for the top enriched KEGG pathways
reached similar conclusion (see [359]Table S6-4 for literature
information). These findings together suggest that the molecular
signaling pathways in the SCZ Network that are involved in immune
regulation may contribute to the development of schizophrenia through
regulating the formation and modulation of synapses. These results not
only further confirm the high quality of the SCZ Network, but also
suggest that molecular processes of immune activation and synapse
remodeling might be a fundamental mechanism for schizophrenia.
The spatiotemporal expression pattern of the genes of the SCZ Network
in the brain during development demonstrates that this molecular
network is overrepresented in the expressed genes during both “early-
to late fetal development” and “adolescence to early adulthood”
([360]Figure 4B), suggesting that this molecular network may be more
susceptible to perturbation during these developmental stages.
Additionally, immune genes are more likely to be expressed in cortex
regions from mid- to late fetal development and from adolescence to
early adulthood ([361]Figure 4E), further suggesting that immune
activation during these two sensitive developmental stages may disturb
the molecular network. The spatiotemporal expression pattern of the SCZ
Network is coincident with the structural and functional abnormalities
found in the cortex of the schizophrenia patients ([362]Karlsgodt
et al., 2010), and with the developmental stages of cortex in animal
models ([363]Selemon and Zecevic, 2015), suggesting that the normal
function of this molecular network may be involved in the structural
and functional development of cortex regions, and that immune
activation at these critical stages may disturb the cortex development,
leading to schizophrenia. The spatiotemporal expression pattern of the
immune genes in the SCZ Network fits well with “environment-environment
interaction” model of the “two-hit” hypothesis on the etiology
([364]Khan and Powell, 2018), and it is also consistent with the
clinical finding on increased immune response in patients ([365]Fillman
et al., 2013; [366]Goldsmith et al., 2016; [367]Miller et al., 2011),
suggesting a “twice immune-activation” mechanism for this model.
To explore “twice immune-activation” mechanism, we have built a
“two-hit” mouse model using LPS to mimic infection during gestation and
during puberty. We induced immune activation twice in mice, first
during mid-late fetal development stage by intraperitoneal injection
(IPI) of lipopolysaccharide (LPS) in to the pregnant mice at G16 (16^th
day after gestation), and a second time during puberty by IPI of LPS
into the male offspring at P30 (30^th day after birth)
([368]Figure 5A), and carried out transcriptome sequencing on the brain
at P32 to explore the differentially expressed genes (DEGs)
([369]Figure 5B). The DEGs of L-S, S-L or L-L groups are enriched in
biomarkers of ‘schizophrenia’ and other mental disorders such as
‘bipolar’, ‘bipolar related disorder’, ‘major depressive disorder’ and
‘depressive disorder’ ([370]Figure 5F), suggesting that immune
activation may underlie multiple psychotic disorders. The
differentially expressed genes in three treatment groups are enriched
in a total of 121 pathways ([371]Tables S9-4–S9-6). Surprisingly, 82.6%
(100/121) of the enriched pathways of the differentially expressed
genes induced by immune activation are also identified in SCZ Network
([372]Table S9-7). These pathways are clustered into 18 groups
according to their shared genes in SCZ Network ([373]Figures 5H and
[374]S6). All the representative pathways from the 18 clusters are
documented to associate with schizophrenia or other psychiatric
disorders ([375]Table S9-8), providing a framework for investigating
the mechanisms on how immune activation in the brain may lead to
schizophrenia.
The mice with “two hits” showed greater transcription alteration than
the mice with the “first hit” or the “second hit” alone
([376]Figure 5B, [377]Table S8), suggesting that the “first hit” may
have effects on the brain to make it more sensitive to the “second
hit”. We searched for enriched binding motifs in the promoter regions
of the differentially expressed genes and found 24 TF-binding motifs
corresponding to 26 transcription factors ([378]Figure 6A). These
transcription factors can be clustered into 6 groups, each regulating a
group of pathways ([379]Figure 6C). Majority of pathways identified for
each group of TFs overlap with the signaling pathways identified in the
SCZ Network ([380]Figure 6D; [381]Tables S10-4–S10-9), further
providing a framework for studying the key regulators of these
immune-activated pathways linking to schizophrenia. The network of TFs
with regulatory edges and protein interaction edges suggest that SIX1
and SIX2 are regulatory hubs and SMAD4 is an interaction hub
([382]Figures 6F and 6G). SIX1 and SIX2 are reported to be involved in
cortical immune activation through NF-κB pathway ([383]Liu et al.,
2019; [384]Roussos et al., 2013; [385]Volk et al., 2019). SMAD4
transcription factor is at the core of the TGF-β signaling pathways
([386]Mullen et al., 2011), regulating neuronal axon formation ([387]Yi
et al., 2010) and dendritic growth ([388]Luo et al., 2016).
In summary, in order to lay the foundation for studying molecular
mechanism underlying schizophrenia, we consolidated a core set of
high-confidence SCZ candidate genes and then constructed a high-quality
SCZ Network from a comprehensive interactome dataset using
“neighborhood walk” and “random walk with restart”. The dynamic
spatiotemporal expression patterns of the genes of the SCZ Network in
human brain revealed “mid- to late fetal development” and “adolescence
to early adulthood” as critical stages when the molecular network is
susceptible to environmental insults. The top enriched pathways
suggested a tight link between immune activation, synapse remodeling
and schizophrenia. The transcriptome sequencing on the brain of mice
with immune activation at late fetal developmental stage or puberty
revealed that over 80% of the dysregulated pathways by immune
activation overlap with pathways in the SCZ Network. The dysregulated
pathways further supported the molecular link between immune signaling,
synaptic function, and schizophrenia. HOMER scan on the promoter
regions of the dysregulated genes identified a small number of
potential key regulators, such as SIX1, SIX2, and SMAD4, of the
SCZ-associated immune activation in the brain. Our work provides a
strategy for illustrating the molecular mechanism underlying complex
psychiatric disorders and a rich resource for disease candidate genes
as well as systematically evaluated signaling pathways to study the
pathogenesis and therapeutics of schizophrenia.
Limitations of the study
This study only provides a landscape of genes and pathways that may be
involved in the development of schizophrenia, especially the molecular
pathways and regulators that may link the immune activation to this
disease. A limitation of our study is the lack of deeper understanding
of the identified pathways and key regulators using cell models and
mouse models. Another limitation is that our in-house developed NW
method has not been tested in different datasets. In theory, NW
retrieves disease candidate genes based on their convergence in the
neighborhoods of the network, and therefore functionally unrelated
genes may be left out of the selected disease-associated neighborhoods.
Although we have demonstrated the validity of NW in retrieving SCZ
candidate genes, this method needs to be further tested in the study of
other complex diseases.
STAR★Methods
Key resources table
REAGENT or RESOURCE SOURCE IDENTIFIER
Chemicals, peptides, and recombinant proteins
__________________________________________________________________
Lipopolysaccharide Sigma-Aldrich L2630-10MG, lot: #099M4002V
__________________________________________________________________
Critical commercial assays
__________________________________________________________________
RNeasy Plus Universal Mini Kit Qiagen Cat.NO.73404
__________________________________________________________________
Deposited data
__________________________________________________________________
Raw and analyzed data This paper [389]GSE156568
Mouse reference genome, GRCm38 Genome Reference Consortium
[390]https://www.ncbi.nlm.nih.gov/grc/mouse
__________________________________________________________________
Experimental models: organisms/strains
__________________________________________________________________
Mouse: C57BL/6J Southern Medical University N/A
__________________________________________________________________
Software and algorithms
__________________________________________________________________
R (version 3.6.0) The R Project for Statistical Computing
[391]https://cran.r-project.org/bin/windows/base/old/
RandomWalkRestartMH (version 1.6.0) [392]Valdeolivas et al., 2019
[393]http://bioconductor.org/packages/RandomWalkRestartMH/
Neighborhood walk This paper [394]Supplemental information
STAR (version 2.5.2b). [395]Dobin et al., 2013
[396]https://github.com/alexdobin/STAR/releases
HTSeq (version 0.11.2) [397]Anders et al., 2015
[398]https://htseq.readthedocs.io/en/master/
HOMER (version v4.11.1) [399]Heinz et al., 2010
[400]http://homer.ucsd.edu/homer/
__________________________________________________________________
Other
__________________________________________________________________
Resource data for selecting core set of high-confidence SCZ candidate
genes This paper [401]STAR Methods and [402]supplemental information
Construction and validation of SCZ Network, related resources This
paper [403]STAR Methods and [404]supplemental information
SCZ Network analyses and related resources This paper [405]STAR Methods
and [406]supplemental information
RNA-seq data analyses and related resources This paper [407]STAR
Methods and [408]supplemental information
[409]Open in a new tab
Resource availability
Lead contact
Further information and requests for resources and reagents should be
directed to and will be fulfilled by the lead contact, Xinping Yang
([410]xpyang1@smu.edu.cn)
Materials availability
This study did not generate new unique reagents. All renewable reagents
and detailed protocols will be available upon request.
Experimental model and subject details
Animal
The C57BL/6 mice were housed under standard conditions (12 h/12 h
light/dark cycle, access to dry food and water ad libitum). All the
experimental procedures involving mice were approved by the Committee
on Animal Care and Use at the Southern Medical University.
Immune activation in mice during mid-fetal stage and adolescent stage
Immune activation was induced twice in mice, first during mid-late
fetal development stage by intraperitoneal injection (IPI) of
lipopolysaccharide (LPS) (150ug/kg) in to the pregnant mice at G16
(16th day after gestation) and a second time during puberty by IPI of
LPS into the male offspring at P30 (30th day after birth). We then
carried out poly-A selected RNA-seq sequencing on the brain at P32 to
evaluate transcriptional changes. We used saline as control for LPS,
and set the experiments in three combination of saline and LPS
treatments: (1) S-S (control group): saline at G16 + saline at P30; (2)
L-S: LPS at G16 + saline at P30; (3) S-L: saline at G16 + LPS at P30;
(4) L-L: LPS at G16 + LPS at P30. Lipopolysaccharide (LPS) was
purchased from Sigma, USA.
Method details
Collection of Schizophrenia candidate genes
A total number of 1,720 schizophrenia candidate genes (protein coding
genes) were collected from 7 methods in 87 studies pooled by SZDB: copy
number variations (CNVs), differential expressed genes (DEGs),
differential methylated genes (DMGs), whole exome sequencing (WES),
genome-wide association studies (GWASs), imputation and linkage
([411]Table S1). For genes from CNVs, DMGs and WES, only genes
supported by two or more publications are kept. For genes from DEGs,
only genes with FDR ≤0.05 are kept.
Human interactome data
The comprehensive human interactome data were downloaded from InBioMap
([412]https://www.intomics.com/inbio/map.html) ([413]Table S2).
Random walk with restart (RWR)
RWR was performed using R package “RandomWalkRestartMH” (version 1.6.0)
([414]Valdeolivas et al., 2019). The cutoff score ≥ 1e-05 from RWRM
results was used for identifying disease candidate genes.
Neighborhood walk (NW)
NW was performed using in-house developed algorithm.
* i.
We first mapped the 1,720 high-confidence SCZ candidate genes as
“seeds” onto the human interactome network ([415]Table S2).
* ii.
For each seed, we assigned a risk score (R-score = 1-1.4^−x, x is
the number of methods used in identifying the gene as SCZ candidate
gene in literature). We used “R-score = 1-1.4^−x” for scoring seeds
because it was shown to give seeds proper weights ([416]Calderone
et al., 2013). For each nonseed, we calculated a guilt score
(G-score = the average R-score of its interactors) ([417]Table S3).
* iii.
For each edge, we calculated an edge weight (W = the sum of the
R-score and/or G-score of the two interacting nodes) and a weighted
edge length (WEL) (WEL = 1/W) ([418]Table S3).
* iv.
For any two nodes in the network, we calculated a weighted shortest
distance (WSD):
[MATH: WSD=∑(1<
/mn>/Wi
) :MATH]
(1/Wi is the weighted edge length of an edge in a given shortest
path) ([419]Figure S1E).
* v.
For each seed as an origin, we started the “walking” from the
origin and stopped at a potential neighborhood border (a sphere
with certain walking distance) if further “walking” to next sphere
made the seed ratio in the neighborhood drop significantly,
(walking distance between two spheres is 1.0) ([420]Figure S1E);
* vi.
We obtained 1,592 ranked neighborhoods (one for each seed) based on
seed ratio ([421]Table S3). The neighborhoods with seed ratio from
75% to 100% were considered as SCZ-related network neighborhoods.
* vii.
We extracted an NW-retrieved Subnetwork by fusing the 1,043
neighborhoods with the seed ratio ≥75%, which contains 1,592 seed
genes (high-confidence candidate genes) and 3,858 additional SCZ
candidate genes (nonseed genes in the network).
Independent datasets for validation of SCZ network
In order to compare the SCZ relevance of the expanded candidate genes
and seed genes in SCZ network, we calculated the enrichment of with
dysregulated genes or variants identified in SCZ patients collected
from recent literature. These data sources have not been used for
collection of seed genes and therefore they contain SCZ-associated
genes (dysregulated genes or genes harboring variants) unbiased towards
seeds or nonseeds. A total number of 1,682 brain differentially
expressed genes (DEGs) ([422]Table S5) between schizophrenia patients
and normal subjects were collected from the study by [423]Bowen et al.
(2019), and a total number of 4,217 exome variants ([424]Table S5)
identified in the postmortem brains of SCZ patients were collected from
the studies by [425]Howrigan et al. (2020) and [426]Rees et al. (2020).
The three studies are listed below:
* 1
.Bowen, E. F. W. et al. DLPFC transcriptome defines two molecular
subtypes of schizophrenia. Transl Psychiatry 9, 147,
[427]https://doi.org/10.1038/s41398-019-0472-z (2019).
* 2
.Howrigan, D. P. et al. Exome sequencing in schizophrenia-affected
parent-offspring trios reveals risk conferred by protein-coding de
novo mutations. Nature neuroscience 23, 185-193,
[428]https://doi.org/10.1038/s41593-019-0564-3 (2020).
* 3
.Rees, E. et al. De novo mutations identified by exome sequencing
implicate rare missense variants in SLC6A1 in schizophrenia. Nature
neuroscience 23, 179-184,
[429]https://doi.org/10.1038/s41593-019-0565-2 (2020).
Tools and data sources for functional enrichment analysis
* 1
.KEGG pathway enrichment analysis were performed using R package
“clusterProfiler” (version 3.14.3) ([430]Yu et al., 2012).
* 2
.Disease Enrichment was performed using MetaCore Disease (by
biomarker) from Clarivate Analytics (version 20.2).
* 3
.Collection of immune genes: 3,904 immune genes ([431]Table S6)
were collected from:
+ i.
InnateDB (Innate Immunity Genes,
[432]https://www.innatedb.com) ([433]Breuer et al., 2013);
+ ii.
GO-immune-response
([434]http://amigo.geneontology.org/amigo/term/GO:0006955);
+ iii.
ImmPort:
([435]http://www.immport.org/immport-open/public/reference/gen
elists);
+ iv.
Nanostring's Immune Profiling Panel Probe
([436]https://www.nanostring.com/products/gene-expression-pane
ls/gene-expression-panels-overview/hallmarks-cancer-gene-expre
ssion-panel-collection/pancancer-immune-profiling-panel?jumpto
=SUPPORT);
+ v.
IMGT/GENE-DB (the international ImMunoGeneTics information
system): ([437]http://www.imgt.org/genedb/).
* 4
.Collection of synaptic transmission genes: 1,068 synaptic
transmission genes ([438]Table S6) were collected from GO
([439]http://amigo.geneontology.org/amigo/landing) using key word
“synaptic transmission”.
The spatiotemporal expression of genes in human brain
The brain transcriptome data was generated across 13 dissection stages
([440]Kang et al., 2011). Spatiotemporal co-expression was calculated
according to a previous study ([441]Lin et al., 2015), as described
below. Since well-defined anatomical brain structures are limited
during early embryonic development, the first period (4-8
postconceptional weeks, PCW) was removed from further analysis. The
dissection stages were merged into ten partially-overlapping
developmental periods ranging from 8 PCW to 40 years of age
([442]Figures 4A and [443]S4). Brain regions were grouped into four
clusters using hierarchical clustering based on brain transcriptional
similarity to reflect actual topological proximity and functional
segregation ([444]Willsey et al., 2013). As a result, 38 spatiotemporal
regions were defined after eliminating one region from P2-P4 (P2-4, R4)
due to lack of transcriptome data. Co-expression networks were used to
identify phase-specific functional relationships between genes. A
co-expression between two genes was defined as positive if the
pair-wise Pearson Correlation Coefficient (PCC) value was > 0.8. Using
this approach, 38 different spatiotemporal co-expression networks were
generated and the co-expressed gene pairs were used for the enrichment
analyses of SCZ network ([445]Table S7).
RNA preparation
For each group, six pregnant mice were treated saline or LPS (maternal
immune activation); and the male offspring were treated with saline or
LPS (immune activation at puberty). For each group, four whole-brain
RNA samples were prepared. Each RNA sample was extracted from the whole
brain of one adult mouse according to the manufacturer's protocol
(RNAeasy Mini Kit, Qiagen, Germantown, MD, USA). The quality and yield
of the isolated RNAs was assessed using a NanoDrop Spectrophotometer
(Thermo Fisher Scientific, Waltham, MA, USA) and Agilent 2100
Bioanalyzer (Agilent Technologies, Santa Clara, CA, USA). Only RNAs
with a high RNA integrity number (RIN > 9) were selected and used for
the subsequent sequencing.
RNA-Seq data analysis and differential motifs
RNA sequencing was performed at Berry Genomics (Beijing, China) using
Illumina NovaSeq. Sequenced reads were mapped to GRCm38 whole genome
using STAR_2.5.2b. Raw counts are calculated by HTSeq-count version
0.11.2. Pair-wised analyses are performed, genes with >3 raw counts in
all four samples of case or control were considered as expressed genes
and used for downstream analysis ([446]Table S8). The principal
components analysis (PCA) is performed by plotPCA function using
normalized count by variance stabilizing transformation function of
DEseq2 (v1.20.0).
Differential expression analysis on two groups was performed using the
DESeq2. Differentially expressed genes (DEGs) were determined using a
cutoff of p < 0.05 and |FC|≥1.3 for DESeq2 ([447]Table S8). Functional
annotation for pathways was performed using MetaCore from Clarivate
Analytics (version 20.2).
The threshold-free comparison of expression alteration between
different treatment groups was carried out using R package “RRHO”
(version 1.26.0). Full threshold-free gene list of differential
transcriptomes (L-S group, S-L group or L-L group compared to S-S
group) were ranked by the −log[10](p-value) multiplied by the sign of
the fold change from the DESeq2 analysis.
The log[10]-transformed hypergeometric p-value are plotted to map the
degree of the statistically significant overlap between two
differential transcriptomes (two ranked gene lists). Each pixel in a
RRHO map represents the signed log[10]-transformed hypergeometric
p-value for the overlap of subsections of two ranked lists from a point
on the map (corresponding to a rank threshold pair: Rank X, Y) to
either bottom left corner (Rank 1, 1) or top right corner (Rank N, N).
Genes along each axis are sorted from most significantly up-regulated
to most down-regulated from the lower left corners.
Pathways are clustered by calculating the distance between two pathways
based on shared genes within SCZ Network using dist() function with
method “euclidean” and hclust() function with method “ward.D” in R
package “stats” (version 3.6.3). The dendrogram tree was plot using R
packages dendextend (version 1.14.0) and ggdendro (version 0.1-20).
HOMER (version v4.11.1) motif analysis was used to predict differential
motifs (enriched transcription factor-binding sites) located within
−2,000 to +1,000 bases of the TSS of DEGs.
TF correlation was calculated by their shared target DEGs using cor()
function with method “pearson” in R package “stats” (version 3.6.3).
Then plot was performed by corrplot package (version 0.84) using
hierarchical clustering order.
Quantification and statistical analysis
The p-values in the enrichment analysis were calculated using Fisher's
Exact Test. And the p-values in the differentially expressed analysis
were calculated using Wald significance tests.
Acknowledgments