Abstract
Mental health disorders emerge from complex interactions among
neurobiological processes across multiple scales, which makes it
challenging to uncover pathological pathways from molecular
dysfunction to neuroimaging changes. Here, we propose a multiscale
fusion (mFusion) method to evaluate the relevance of each gene to the
neuroimaging traits of mental health disorders. Using protein-protein
interaction networks, we combined gene-neuroimaging associations with
gene-positron emission tomography (PET) and PET-neuroimaging
associations, where the genes traced by PET maps are involved in
neurotransmission. Compared with previous methods, the proposed
algorithm identified more disease genes on both simulated and
empirical data sets. Applying mFusion to eight mental health disorders,
we found that these disorders formed three clusters with distinct
associated genes. In summary, mFusion is a promising tool for
prioritizing genes for mental health disorders by establishing
gene-PET-neuroimaging pathways.
Subject terms: Computational models, Gene expression
__________________________________________________________________
We introduce mFusion, a method that integrates gene and neuroimaging
data to identify disease-related genes in mental disorders. By
analyzing gene interactions and PET data, mFusion successfully clusters
disorders and highlights critical gene pathways.
Introduction
Mental health disorders, which account for 16% of the global burden of
disease, rank among the leading causes of disability worldwide^1.
In severe cases, they can shorten life expectancy by 10 to 20
years^2. Despite substantial progress in understanding the molecular
mechanisms of brain function in animal models, the rate of successful
clinical translation to humans remains notably low^3. The primary
obstacle lies in the current knowledge gap between molecular
processes^4 and psychiatric symptoms: many complex interactions occur
across multiple scales, from genes, through neurotransmitters, to
neural networks. This complexity is compounded by the difficulty of
concurrently collecting multiscale data within the human brain. As
human brain data accumulate rapidly but separately at various scales,
there is an urgent need for dedicated analytic methods that integrate
these data comprehensively, enabling the discovery of insights into
mental health disorders.
At present, some public databases, such as DisGeNET^5 and the
Comparative Toxicogenomics Database (CTD)^6, can identify
disease-related genes, but they lack the capacity to establish
connections with neurotransmitter systems or pathways. Both
differential gene expression analysis and genome-wide association
study (GWAS) analysis fall short in addressing this challenge^7, with
limited coverage of disease phenotypes. Partial least squares (PLS)
regression can establish associations between genes and imaging
phenotypes based on the spatial distribution patterns of molecules in
the brain^8,9. However, it can only perform pairwise correlation
analysis, so a method is needed to establish cross-scale pathway
associations.
Neuroimaging studies have identified various alterations in
neuroimaging features of the human brain associated with mental health
disorders, i.e., spatial distributions of alterations across brain
regions in psychiatric patients compared with healthy controls^10.
Leveraging transcriptomic data from postmortem brain tissues^11,
researchers have begun to correlate neuroimaging features with gene
expression, prioritizing relevant genes and molecular pathways^12. In
this way, genes associated with neurodevelopment, neuroplasticity, and
neurotransmission have been implicated in autism spectrum disorder
(ASD)^9 and schizophrenia (SCZ)^13. Despite this progress, a
significant knowledge gap persists between gene expression and
neuroimaging traits. Recently, positron emission tomography (PET)
studies have started to reveal spatial associations between
neurotransmitter receptors/transporters and structural/functional
traits of mental health disorders in the human brain^14,15. Leveraging
the neurotransmission revealed by PET images, this study aims to bridge
the gap between gene expression and neuroimaging traits in mental
disorders. Disease-related genes are defined by the four curated
disease-gene databases listed in Table 1.
Table 1.
Four gene-disease databases
| Database | # of SCZ risk genes | # of ASD risk genes | Collection date | URL |
| --- | --- | --- | --- | --- |
| DisGeNET | 2872 (score > 0) | 1071 (score > 0) | June 2020 (v7.0) | https://www.disgenet.org/ |
| CTD | 2875 (score > 15.28) | 1071 (score > 29) | June 30, 2023 (17123) | https://ctdbase.org/ |
| DISEASES | 1548 (Z > 3) | 211 (Z > 3) | March 2015 | https://diseases.jensenlab.org/Downloads |
| PGC-GWAS | 380 (p < 5e-8) | 56 (p < 5e-4) | SCZ: 2022^57; ASD: 2019^53 | https://pgc.unc.edu/for-researchers/download-results/ |
This study proposes a multiscale fusion (mFusion) method that bridges
genes and mental disorders by linking gene expression in brain
tissues, neurotransmission, and the neuroimaging traits of these
disorders. Leveraging the knowledge in the protein-protein interaction
(PPI) network available from the STRING database^16, mFusion provides
a tool for integrating 15,408 gene expression maps from the Allen Human
Brain Atlas (AHBA)^17,18, 45 PET maps across various neurotransmitter
systems^14,19,20, and neuroimaging traits associated with mental
disorders. The performance of mFusion was first evaluated by numerical
simulations and then demonstrated by application to the neuroimaging
traits of two mental disorders (i.e., autism^9 and schizophrenia^13).
The ENIGMA (Enhancing NeuroImaging Genetics through Meta-Analysis)
consortium has reported neuroimaging traits for mental disorders by
analyzing thousands of neuroimaging scans^21. Using these neuroimaging
traits, mFusion enabled us to reveal the clustering structure of eight
major mental disorders.
Results
Overview of mFusion framework
In this study, mFusion integrated gene expression in brain tissues and
PET maps of specific proteins (related to the receptors, transporters,
or release of neurotransmitters) within a PPI network to link
neuroimaging traits to genes (Fig. 1; Additional file 1: Fig. S1)
through proteins (measured by PET maps; Table 2; Supplementary
Table S1). First, we independently computed Z-scores of genes or
proteins from three types of PLS associations: gene-trait, PET-trait,
and gene-PET associations. Second, we used the Z-transform test, also
known as Stouffer's method^22, to combine the multiscale Z-scores of
each gene. Meanwhile, neighborhood information from the STRING PPI
network was used to boost the ability to identify disease-related
genes. Finally, disease category^5 and Gene Ontology (GO)^23 term
enrichment analyses were conducted on the top-ranked genes determined
by mFusion to identify important biomolecular pathways or processes
related to the candidate genes. Further details are provided in
Methods and Supplementary Fig. S1.
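To make the combination step concrete, the following is a minimal
Python sketch of Stouffer's method as named above. It is not the
mFusion implementation: how mFusion selects and weights the
gene-trait, gene-PET, and PET-trait Z-scores, and how PPI neighbors
enter, is specified in Methods and is not reproduced here. The
function names and the example Z-scores are hypothetical.

```python
import math


def stouffer(z_scores, weights=None):
    """Combine Z-scores with Stouffer's Z-transform test.

    Unweighted: Z = sum(z_i) / sqrt(k)
    Weighted:   Z = sum(w_i * z_i) / sqrt(sum(w_i ** 2))
    Assumes the Z-scores are independent under the null hypothesis.
    """
    if not z_scores:
        raise ValueError("need at least one Z-score")
    if weights is None:
        weights = [1.0] * len(z_scores)
    if len(weights) != len(z_scores):
        raise ValueError("need exactly one weight per Z-score")
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    denominator = math.sqrt(sum(w * w for w in weights))
    return numerator / denominator


def two_sided_p(z):
    """Two-sided p-value of a standard-normal statistic: P(|Z| >= |z|)."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Made-up Z-scores for one hypothetical gene along one gene-PET-trait path:
# (gene-trait, gene-PET, PET-trait).
z_gene = stouffer([2.1, 1.8, 2.5])
p_gene = two_sided_p(z_gene)
print(f"combined Z = {z_gene:.3f}, two-sided p = {p_gene:.2e}")
```

Stouffer's method is convenient here because each scale already yields
a Z-score, and the optional weights let some associations count more
than others without changing the null distribution of the sum.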
Fig. 1. The framework and working interface of the “mFusion” method.
By using partial least squares associations to integrate spatial
correlations of gene expression in the human brain with information
about neurotransmission and neuroimaging, the mFusion method yields a
relevance score for each gene and pathway associated with a mental
disorder, facilitating the identification of top-ranked genes and
pathways. The fusion method additionally indicates how neurochemical
architectures (neurotransmission systems captured by PET images) may
influence gene scores. Subsequent enrichment analysis of top genes
identifies biological processes and pathways related to the mental
disorder.
Table 2.
Neurotransmission-related PET maps included in analyses
| Protein | Neurotransmitter | Tracer | Measure | n | Age (years) | Reference |
| --- | --- | --- | --- | --- | --- | --- |
| HTR1A | Serotonin | [11C]CUMI-101 | BP_ND | 8 (5) | 28.4 ± 8.8 | Beliveau et al.^75 |
| HTR1A | Serotonin | [11C]WAY-100635 | BP_ND | 35 (17) | 26.3 ± 5.2 | Savli et al.^76 |
| HTR1B | Serotonin | [11C]AZ10419369 | BP_ND | 36 (12) | 27.8 ± 6.9 | Beliveau et al.^75 |
| HTR1B | Serotonin | [11C]P943 | BP_ND | 23 (8) | 28.7 ± 7.0 | Savli et al.^76 |
| HTR1B | Serotonin | [11C]P943 | BP_ND | 65 (16) | 33.7 ± 9.7 | Gallezot et al.^77 |
| HTR2A | Serotonin | [18F]altanserin | BP_ND | 19 (8) | 28.2 ± 5.7 | Savli et al.^76 |
| HTR2A | Serotonin | [11C]Cimbi-36 | BP_ND | 29 (14) | 22.6 ± 2.7 | Beliveau et al.^75 |
| HTR2A | Serotonin | [11C]MDL100907 | BP_ND | 3 (1) | 35 ± 9 | Talbot et al.^78 |
| HTR4 | Serotonin | [11C]SB207145 | BP_ND | 59 (18) | 25.9 ± 5.3 | Beliveau et al.^75 |
| HTR6 | Serotonin | [11C]GSK215083 | BP_ND | 30 (0) | 36.6 ± 9.0 | Radhakrishnan et al.^79 |
| SLC6A4 | Serotonin | [11C]DASB | BP_ND | 100 (71) | 25.1 ± 5.8 | Beliveau et al.^75 |
| SLC6A4 | Serotonin | [11C]DASB | BP_ND | 18 (6) | 30.5 ± 9.5 | Savli et al.^76 |
| SLC6A4 | Serotonin | [11C]MADAM | BP_ND | 10 (2) | range: 51–67 | Fazio et al.^80 |
| SLC6A4 | Serotonin | [11C]MADAM | BP_ND | 16 (2) | range: 21–67 | Dukart et al.^20 |
| CNR1 | Cannabinoid | [18F]FMPEP-d2 | V_T | 22 (11) | male: 27 ± 6; female: 28 ± 10 | Laurikainen et al.^81 |
| CNR1 | Cannabinoid | [11C]OMAR | V_T | 77 (28) | 30.0 ± 8.9 | Normandin et al.^82 |
| DRD1 | Dopamine | [11C]SCH23390 | BP_ND | 13 (7) | 33 ± 13 | Kaller et al.^83 |
| DRD2 | Dopamine | [11C]FLB457 | BP_ND | 55 (29) | 32.5 ± 9.7 | Hansen et al.^14 |
| DRD2 | Dopamine | [11C]FLB457 | BP_ND | 6 (2) | 39.5 ± 6.8 | Sandiego et al.^84 |
| DRD2 | Dopamine | [18F]fallypride | BP_ND | 58 (22) | 18.5 ± 0.6 | Jaworska et al.^85 |
| DRD2 | Dopamine | [11C]FLB457 | BP_ND | 37 (20) | 48.4 ± 16.9 | Smith et al.^86 |
| DRD2 | Dopamine | [11C]raclopride | BP_ND | 7 (0) | 24 ± 2 | Alakurtti et al.^87 |
| SLC6A3 | Dopamine | [123I]FP-CIT | SUVR | 174 (65) | 61 ± 11 | Dukart et al.^88 |
| SLC6A3 | Dopamine | [123I]Ioflupano | SUVR | 26 (--) | range: 35–65 | García-G et al.^89 |
| SLC6A3 | Dopamine | [18F]FE-PE2I | SUVR | 10 (0) | 28.1 ± 6.9 | Sasaki et al.^90 |
| GABRA1 | GABA | -- | -- | 26 (0) | 26 ± 5 | Dukart et al.^88 |
| GABRA1 | GABA | [11C]flumazenil | B_max | 16 (9) | 26.6 ± 8 | Nørgaard et al.^91 |
| HRH3 | Histamine | [11C]GSK189254 | V_T | 8 (1) | 31.7 ± 9.0 | Gallezot et al.^92 |
| OPRM1 | Opioid | [11C]carfentanil | BP_ND | 204 (72) | 32.3 ± 10.8 | Kantonen et al.^93 |
| OPRM1 | Opioid | [11C]carfentanil | BP_ND | 39 (19) | 37.0 ± 4.9 | Turtonen et al.^94 |
| SLC6A2 | Norepinephrine | [11C]MRB | BP_ND | 77 (27) | 33.4 ± 9.2 | Ding et al.^95 |
| SLC6A2 | Norepinephrine | [11C]MRB | BP_ND | 20 (8) | 33.3 ± 10.0 | Hesse et al.^96 |
| KIF17 | Glutamate | [18F]GE-179 | V_T | 29 (8) | 40.9 ± 12.7 | Galovic et al.^97 |
| SV2A* | -- | [11C]UCB-J | BP_ND | 10 (3) | 36 ± 10 | Finnema et al.^98 |
| VAT1L | Acetylcholine | [18F]FEOBV | SUVR | 5 (4) | 68.4 ± 3.4 | Hansen et al.^14 |
| VAT1L | Acetylcholine | [18F]FEOBV | SUVR | 6 (3) | 67.0 ± 11.1 | Aghourian et al.^99 |
| VAT1L | Acetylcholine | [18F]FEOBV | SUVR | 4 (1) | 37 ± 10.2 | PI: Lauri Tuominen & Synthia Guimond |
| VAT1L | Acetylcholine | [18F]FEOBV | SUVR | 18 (13) | 66.8 ± 6.8 | Hansen et al.^14 |
| VAT1L | Acetylcholine | [18F]FEOBV | SUVR | 5 (1) | 68.3 ± 3.1 | Bedard et al.^100 |
| CHRM1 | Acetylcholine | [11C]LSN3172176 | BP_ND | 24 (11) | 40.5 ± 11.7 | Naganawa et al.^101 |
| GRM5 | Glutamate | [11C]ABP688 | BP_ND | 22 (10) | 67.9 ± 9.6 | PI: Rosa-Neto, P. & Kobayashi, E. |
| GRM5 | Glutamate | [11C]ABP688 | BP_ND | 28 (13) | 33.1 ± 11.2 | DuBois et al.^102 |
| GRM5 | Glutamate | [11C]ABP688 | BP_ND | 74 (49) | 20 ± 3.0 | Smart et al.^103 |
| GRM5 | Glutamate | [11C]ABP688 | BP_ND | 22 (10) | 67.9 ± 9.6 | Hansen et al.^14 |
| CHRNA4 | Acetylcholine | [18F]Flubatine | V_T | 30 (10) | 33.5 ± 10.7 | Hillmer et al.^104 |
The Protein column indicates the protein names in the STRING database.
Supplementary Table S1 includes more extensive methodological details,
such as excitatory/inhibitory, ionotropic/metabotropic, and source
toolkit. Values in parentheses (under n) indicate the number of
females.
BP_ND parametric and regional non-displaceable binding potential,
B_max density (pmol ml^−1) converted from binding potential (5-HT) or
distribution volume (GABA) using autoradiography-derived densities,
V_T tracer distribution volume, SUVR standardized uptake value ratio.
*The synaptic vesicle glycoprotein 2A (SV2A) is targeted by PET imaging
to quantify synaptic density in the human brain^98.
mFusion outperformed the traditional method on simulation data
We compared performance on simulated data between the traditional
partial least squares (PLS) association method and five fusion methods
proposed in this study (i.e., meanGP, meanGPT, meanPPI, maxGPT, and
maxPPI; see “Methods”). Evaluation metrics included the correlation
between estimated gene scores and true gene weights, the number (or
rate) of hits, the area under the receiver operating characteristic
curve (AUC-ROC), and the area under the precision-recall curve
(AUC-PR) (see “Methods”).
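The hit-count and AUC metrics listed above can be sketched in a few
lines of plain Python. This is an illustration under stated
assumptions, not the evaluation code used here: gene scores are
assumed to be a dict from gene name to score (higher means more
disease-relevant), the reference or "active" genes are a set, and
AUC-PR is estimated by average precision, which can differ slightly
from trapezoidal integration of the PR curve. All function names and
toy data are hypothetical.

```python
def rank_genes(scores):
    """Gene names sorted by descending score (ties broken alphabetically)."""
    return sorted(scores, key=lambda g: (-scores[g], g))


def hits_at_k(scores, positives, k):
    """Number of reference disease genes among the top-k ranked genes."""
    return sum(1 for g in rank_genes(scores)[:k] if g in positives)


def auc_roc(scores, positives):
    """AUC-ROC via the Mann-Whitney rank-sum identity (tied scores share ranks)."""
    items = sorted(scores.items(), key=lambda kv: kv[1])
    ranks, i = {}, 0
    while i < len(items):
        j = i
        while j + 1 < len(items) and items[j + 1][1] == items[i][1]:
            j += 1
        for t in range(i, j + 1):  # average 1-based rank for the tie block
            ranks[items[t][0]] = (i + j) / 2 + 1
        i = j + 1
    n_pos = sum(1 for g in scores if g in positives)
    n_neg = len(scores) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative gene")
    rank_sum = sum(ranks[g] for g in scores if g in positives)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def average_precision(scores, positives):
    """AUC-PR estimated as average precision over the ranked gene list."""
    ranked = rank_genes(scores)
    n_pos = sum(1 for g in ranked if g in positives)
    if n_pos == 0:
        raise ValueError("no reference genes among the scored genes")
    hits, total = 0, 0.0
    for i, g in enumerate(ranked, start=1):
        if g in positives:
            hits += 1
            total += hits / i  # precision at each recall step
    return total / n_pos


scores = {"A": 0.9, "B": 0.8, "C": 0.7, "D": 0.6, "E": 0.5}  # toy scores
positives = {"A", "C"}  # toy "active" genes
print(hits_at_k(scores, positives, 2), auc_roc(scores, positives))
```

The rank-sum form of AUC-ROC avoids comparing every positive with
every negative, which matters when ranking all 15,408 genes; the hit
rate in Fig. 2b corresponds to `hits_at_k(...) / k`.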
Compared with the other methods, the meanPPI and maxPPI methods
yielded gene scores with higher correlations with the true gene
weights defined in the simulation model (Fig. 2a; unpaired Wilcoxon
test, 500 simulations), higher hit rates of active genes (Fig. 2b),
and larger AUCs for both the ROC (Fig. 2c) and PR (Fig. 2d) curves;
these curves were all generated by averaging over 500 simulations.
Fig. 2. Evaluation of fusion methods from simulated datasets.
a Correlation between true gene weights and fusion weights estimated
by different fusion methods over 500 simulated experiments. The lower
whisker extends from the first quartile (Q1) to the smallest data point
within 1.5 × interquartile range (IQR) below Q1; the upper whisker
extends from the third quartile (Q3) to the largest data point within
1.5 × IQR above Q3. The number next to each box is the median
(unpaired Wilcoxon test). b Average hit rates of genes over all 500
simulations. The hit rate was measured as the proportion of truly
active genes among the top K genes ranked by a given fusion method.
c ROC (receiver operating characteristic) curves of different fusion
methods on simulated data. In the simulation experiments,
$\tilde{w}_{\boldsymbol{X}} \times \tilde{w}_{\boldsymbol{M}}^{T}$
is a completely accurate connection matrix; this noiseless PPI
information greatly improves the performance of the maxPPI and meanPPI
methods, so the AUC-ROC of maxPPI is 1. d PR (precision-recall) curves
of different fusion methods on simulated data. e AUC-ROC values of
different fusion methods as the number of active genes changes.
f AUC-ROC values of different fusion methods as the covariance between
latent variables changes.
We tested the performance of mFusion under different conditions
defined by both the sparsity of active genes and the strength of the
gene-PET covariance (Methods). The AUC-ROCs of both meanPPI and maxPPI
exceeded that of the PLS method at different sparsity levels of active
genes (Fig. 2e). In addition, Fig. 2f indicates that the two fusion
methods, meanPPI and maxPPI, were insensitive to changes in the
covariance between gene expression and neurotransmission PET maps.
Next, three kinds of perturbations were applied to the PPI network,
each repeated 500 times, to illustrate the influence of PPI information
on the mFusion method: (1) randomly shuffling 30% of the elements
within the adjacency matrix
$\tilde{w}_{\boldsymbol{X}} \times \tilde{w}_{\boldsymbol{M}}^{T}$;
(2) setting the smallest 30% of the elements in the adjacency matrix to
zero; (3) randomly shuffling 30% of the elements and then setting the
smallest 30% of the elements in the adjacency matrix to zero. We
found that the meanPPI and maxPPI methods consistently outperformed
their counterparts under all three conditions (Fig. S2).
Third, we simulated brain maps at three spatial resolutions.
Specifically, the number of brain regions (n) was set to 100, 200, or
500 (see “Methods” for further details), as shown in Fig. S3.
Performance in identifying activated genes improved as the spatial
resolution of the X, Y, and Z matrices increased. Notably, the meanPPI
and maxPPI methods consistently outperformed the other methods, and
their stability highlights their robustness in high-resolution brain
mapping analyses.
mFusion outperformed the traditional method on empirical data
We used SCZ morphological similarity differences and ASD cortical
thickness differences as the traits and obtained gene Z-scores from
the different fusion methods, as described in Methods. Compared with
the traditional PLS regression method and the other fusion methods,
the meanPPI and maxPPI methods achieved larger AUCs on the DisGeNET
database (SCZ: Fig. 3a and Table S2; ASD: Fig. 3b and Table S3),
demonstrating superior identification of disorder-related genes.
We also compared the number of hits among the top K genes given by
each method. When K was varied from 41 to 1541 (1541 being 10% of the
15,408 genes), the proposed methods consistently yielded more hits
than the other algorithms (Fig. 3c–j). Notably, with the DisGeNET
database as reference, the meanPPI method identified significantly
more SCZ-related hit genes than all other methods (Fig. 3c;
p < 0.001, paired Wilcoxon test between meanPPI and PLS; gene scores
in Supplementary Table S4). For the ASD-related genes in DisGeNET,
the number of hit genes in the top K gene sets identified by meanPPI
was also significantly greater than that of the other five methods
(Fig. 3g; p < 0.001, paired Wilcoxon test between meanPPI and PLS;
gene scores in Supplementary Table S5). Furthermore, compared with
fusion methods lacking PPI information, such as meanGPT and maxGPT,
their PPI-informed counterparts, meanPPI and maxPPI, consistently
performed better (Fig. 3c–j).
Fig. 3. Performance of fusion methods for SCZ and ASD under different
disease databases.
a ROC curves of different fusion methods on the DisGeNET database for
SCZ. b ROC curves of different fusion methods on the DisGeNET database
for ASD. c–j Number of overlapping genes for SCZ (c–f) and ASD (g–j)
in different reference databases: DisGeNET, CTD, DISEASES, and
PGC-GWAS (corresponding to Table 1). Line types indicate different
fusion methods.
Sensitivity analysis on empirical data
To identify optimal parameters for the fusion methods, we compared
their performance with different network depths (d) and edge
confidences (c) of the PPI network. The meanPPI method performed better
(i.e., a larger number of hit genes, AUC-ROC, or AUC-PR) with PPI depth
d = 1 than with d = 2 (Fig. 4 and Fig. S4). This trend was consistent
across edge confidence values ranging from 0.3 to 0.7. With d = 2,
meanPPI performed similarly to the other methods (Fig. S5). Meanwhile,
meanPPI's performance was relatively insensitive to the PPI edge
confidence between 0.3 and 0.7 (Fig. 4e, f). However, when the
confidence increased to 0.8 or 0.9, meanPPI's performance declined,
mainly because too few PPIs remained at such high confidence levels
(Fig. 4 and Fig. S4). When the physical subnetwork (i.e., edges with
evidence of binding or forming a physical complex) was used instead of
the full STRING PPI network, the meanPPI method identified fewer hits;
nevertheless, it still consistently outperformed the methods that did
not incorporate PPI information (Fig. S6). Consequently, we chose
d = 1 and c = 0.5 for subsequent analyses.
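The two tuning parameters above, edge confidence c and network depth
d, can be illustrated with a small sketch that prunes a PPI edge list
by confidence and then collects every protein within d hops of a
source protein by breadth-first search. This shows only how c and d
shape the neighborhood; how meanPPI then uses that neighborhood is
described in Methods and is not reproduced. The function names, the
protein names P1 to P5, and the confidences are hypothetical. STRING
distributes combined scores on a 0–1000 scale, so c = 0.5 corresponds
to a combined score of 500.

```python
from collections import deque


def prune_edges(edges, c):
    """Undirected adjacency sets keeping only edges with confidence >= c.

    `edges` holds (protein_a, protein_b, confidence) with confidence in [0, 1];
    STRING's combined_score (0-1000) would first be divided by 1000.
    """
    adjacency = {}
    for a, b, confidence in edges:
        if confidence >= c:
            adjacency.setdefault(a, set()).add(b)
            adjacency.setdefault(b, set()).add(a)
    return adjacency


def neighbors_within(adjacency, source, d):
    """Proteins reachable from `source` in at most `d` hops (source excluded)."""
    depth = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if depth[node] == d:
            continue  # do not expand beyond depth d
        for nxt in adjacency.get(node, ()):
            if nxt not in depth:
                depth[nxt] = depth[node] + 1
                queue.append(nxt)
    return set(depth) - {source}


# Toy network with hypothetical protein names and confidences.
edges = [("P1", "P2", 0.9), ("P2", "P3", 0.6), ("P3", "P4", 0.4), ("P1", "P5", 0.2)]
net = prune_edges(edges, c=0.5)
print(sorted(neighbors_within(net, "P1", d=1)), sorted(neighbors_within(net, "P1", d=2)))
```

Raising c shrinks the network (at c = 0.95 the toy network has no
edges left), which mirrors the loss of informative PPIs reported at
confidence 0.8 or 0.9, while raising d pulls in more distant and
typically less specific neighbors.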
Fig. 4. Performance of the meanPPI method on the DisGeNET database with
different thresholds for pruning the PPI network.
a, b Number of hit genes for SCZ with different PPI depths d and
confidence scores c; d = 1 in a and d = 2 in b. c, d Number of hit
genes for ASD with different PPI depths and confidence scores; d = 1 in
c and d = 2 in d. e ROC curves at different PPI confidences for SCZ.
f ROC curves at different PPI confidences for ASD.
To evaluate the importance of PPIs in mFusion-meanPPI, we conducted a
comparative analysis on the SCZ and ASD phenotypes separately. For
each disease, the analysis was repeated with 500 randomly generated
PPI networks (see “Methods”), and the resulting null distributions of
the number of hit genes are shown in Fig. S7A, B. Using the real PPI
data with meanPPI markedly increased the number of identified hit genes
compared with random PPIs. In addition, a similar permutation was
applied to the 45 PET maps (see “Methods”), and the SCZ and ASD
analyses were repeated. The results in Fig. S7C, D revealed a marked
reduction in meanPPI's ability to pinpoint disease-associated genes,
indicating that real PET maps are pivotal to the meanPPI method.
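How a null distribution such as the one in Fig. S7 can be turned into
a p-value is sketched below. This covers only the generic
permutation-test arithmetic; how the random PPI networks and permuted
PET maps were generated is described in Methods and is not
reproduced. The function names and all numbers are synthetic.

```python
import random
import statistics


def empirical_p(observed, null_values):
    """One-sided permutation p-value for 'observed exceeds the null'.

    Uses (1 + #{null >= observed}) / (1 + n), so p is never exactly 0;
    with n = 500 permutations the smallest attainable p is 1/501.
    """
    if not null_values:
        raise ValueError("null distribution is empty")
    exceed = sum(1 for v in null_values if v >= observed)
    return (1 + exceed) / (1 + len(null_values))


def z_vs_null(observed, null_values):
    """How many null standard deviations the observed value lies above the null mean."""
    return (observed - statistics.mean(null_values)) / statistics.stdev(null_values)


# Synthetic stand-in for 500 hit counts obtained with randomized networks.
rng = random.Random(0)
null_hits = [rng.randint(150, 250) for _ in range(500)]
observed_hits = 300  # made-up count for the real network
print(empirical_p(observed_hits, null_hits), z_vs_null(observed_hits, null_hits))
```

The "+1" in numerator and denominator counts the observed network as
one member of the permutation set, which keeps the estimate valid and
makes explicit that 500 permutations cannot support a p-value below
about 0.002.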
To assess the effect of PET map quality on the results, the 45 maps,
which include redundant maps of the same protein, were averaged into 20
unique maps, one per protein (Fig. S8). The SCZ and ASD traits were
then reanalyzed (Figs. S9, S10). The meanPPI results were highly
consistent with the primary findings on disease risk genes, with
Spearman correlations of gene scores of r = 0.97 (p < 2e-16) and
r = 0.98 (p < 2e-16), respectively (Fig. S9). Furthermore, the meanPPI
and maxPPI methods remained the most effective approaches (Fig. S10).
Top-ranked genes enriched in the relevant diseases
As an analysis module of mFusion, we performed enrichment analysis on
the top 1541 (10% of 15,408) genes with negative relevance scores for
SCZ or ASD given by the different methods (see “Methods”). After FDR
correction across 30,170 diseases, traits, and phenotypes in DisGeNET
(Fig. 5a, b), genes prioritized by the meanPPI method for SCZ/ASD were
enriched in the corresponding disease gene sets. In contrast, the top
genes identified by the PLS method showed no such enrichment
(Tables S8, S9).
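The enrichment step above can be illustrated with a one-sided
hypergeometric (over-representation) test followed by
Benjamini-Hochberg FDR correction. This is a common choice assumed
here for illustration only: the exact test, the gene universe (for
example, the 15,408 AHBA genes), and the software used for the
DisGeNET enrichment are specified in Methods, not in this sketch. The
function names and toy numbers are hypothetical.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N genes, K in the gene set, n selected)."""
    upper = min(K, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)  # exact integer arithmetic, one final division


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg FDR-adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for offset, i in enumerate(order):
        rank = m - offset  # 1-based rank in ascending p-value order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy universe: 10 genes, a disease gene set of 3, 4 top-ranked genes with 2 hits.
p = hypergeom_sf(k=2, N=10, K=3, n=4)
print(p, benjamini_hochberg([0.01, 0.04, 0.03, 0.2]))
```

For the real analysis, N would be the gene universe, K the size of one
DisGeNET disease gene set, n = 1541, and the correction would run over
all 30,170 tested gene sets. The GO analysis in Fig. 5 instead used the
Bonferroni adjustment, which multiplies each p-value by the number of
tests (capped at 1).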
Fig. 5. Enrichment analysis of top-ranked genes related to SCZ and ASD
traits.
a, b Disease enrichment results in DisGeNET diseases for the top 1541
trait-related genes for SCZ (a) and ASD (b). The y-axis lists diseases
grouped by category in alphabetical order. c–f Clusters of GO term
enrichment results for the top 1541 genes for SCZ (overlapping terms
in c; terms uniquely enriched by the meanPPI method in d) and ASD
(overlapping terms in e; terms uniquely enriched by the meanPPI method
in f). The size and color of the dots are proportional to the number of
pathway genes and the enrichment significance, respectively. P-values
were adjusted using the Bonferroni correction. Clusters were generated
from enriched GO terms using the aPEAR (Advanced Pathway Enrichment
Analysis Representation) package, which exploits similarities between
pathway gene sets and represents them as a network of interconnected
clusters. Each cluster is assigned a meaningful name that highlights
its main biological theme.
Top-ranked genes enriched in more biological pathways
For SCZ, the meanPPI and PLS methods shared enrichment in 92 GO terms,
while meanPPI yielded 837 additional enriched GO terms. The shared
terms included establishment of protein localization to membrane
(GO_BP:0090150), regulation of synapse structure or activity
(GO_BP:0050803), and channel inhibitor activity (GO_MF:0008200)
(Fig. 5c; Table S6). Terms newly enriched by meanPPI included calcium
ion transport (GO_BP:0060402), cation channel activity
(GO_MF:0022843), and GABA-A receptor complex (GO_CC:1902711)
(Fig. 5d). Importantly, these unique biological processes have been
implicated in SCZ^24,25.
For ASD, the two methods shared enrichment in 38 GO terms, including
synaptic membrane (GO_CC:0097060), neuron projection terminus
(GO_CC:0044306), and positive regulation of protein transport
(GO_BP:0051222) (Fig. 5e). Compared with the PLS results, meanPPI
yielded 795 additional enriched GO terms, including gated channel
activity (GO_MF:0022836), neurotransmitter secretion (GO_BP:0001956),
and GABA-A receptor activity (GO_MF:0004890) (Fig. 5f; Table S7).
Top-ranked genes had more hits in a disease-related gene database
To characterize differences between genes prioritized by the proposed
method (i.e., mFusion-meanPPI) and by the traditional PLS method, we
compared the top 1541 (10% of 15,408) genes identified by each ranking
method (Fig. 6). Comparing gene scores with the disease-related genes
listed in DisGeNET, we observed that higher meanPPI fusion scores were
associated with higher hit rates. Because PLS regression is a
multivariate approach that is prone to overfitting, genes with high
PLS regression weights included more false positives. In contrast, the
mFusion-meanPPI approach reduced the false positive rate by combining
information across multiple scales. Among the top 10% of genes, the
meanPPI method identified 534 SCZ-related genes listed in DisGeNET,
significantly more than the 235 genes identified by the traditional
PLS method (p < 2.2e-16, chi-squared test; Fig. 6a; Tables S4, S6).
Similarly, of the 1071 ASD risk genes listed in DisGeNET, the meanPPI
method identified 221 within its top 10% of genes, significantly more
than the 98 genes identified by the PLS method (p = 5.42e-13,
chi-squared test; Fig. 6b; Tables S5, S7). Therefore, the proposed
approach identified more genes already implicated in mental disorders
than the traditional PLS method did.
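The chi-squared comparisons above can be checked with a short
calculation. The sketch assumes a 2×2 table (method × DisGeNET
hit/non-hit among each method's 1541 top genes) and Pearson's test
with Yates' continuity correction, the default for 2×2 tables in R's
`chisq.test`; this table layout is a reconstruction, not taken from
Methods. Under these assumptions the ASD counts give p ≈ 5.42e-13, in
line with the reported value, and the SCZ p-value lies far below
2.2e-16, a common reporting floor. The function name is hypothetical.

```python
import math


def chi2_2x2_yates(a, b, c, d):
    """Pearson chi-squared test with Yates' correction for the table [[a, b], [c, d]].

    Returns (statistic, p_value). With 1 degree of freedom the upper-tail
    probability is P(chi2 > x) = erfc(sqrt(x / 2)).
    """
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            diff = max(abs(observed[i][j] - expected) - 0.5, 0.0)  # Yates correction
            stat += diff * diff / expected
    return stat, math.erfc(math.sqrt(stat / 2.0))


TOP = 1541  # top 10% of 15,408 genes for each method
# Rows: meanPPI, PLS. Columns: DisGeNET hit, not a hit.
scz_stat, scz_p = chi2_2x2_yates(534, TOP - 534, 235, TOP - 235)
asd_stat, asd_p = chi2_2x2_yates(221, TOP - 221, 98, TOP - 98)
print(f"SCZ: chi2 = {scz_stat:.1f}, p = {scz_p:.2e}")
print(f"ASD: chi2 = {asd_stat:.1f}, p = {asd_p:.2e}")
```

Because both top-10% lists are drawn from the same 15,408 genes, they
are not strictly independent samples, so this test is best read as an
approximate comparison of hit rates.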
Fig. 6. Differential plot of genes by different fusion methods and
neurotransmissions for SCZ and ASD.
a, b Gene scores from the meanPPI and PLS methods. Black dots: genes
shared among the DisGeNET reference genes, the top 10% of genes from
meanPPI, and the top 10% of genes from PLS. Blue triangles: genes
shared between the DisGeNET genes and the top 10% of genes from PLS.
Magenta triangles: genes shared between the DisGeNET genes and the top
10% of genes from meanPPI. The bar chart at the edge shows the hit
rates of these disease-related genes. c, d Associations, measured by
PLS Z-scores, between the PET maps of various neurotransmission
processes and the disease trait (c: SCZ; d: ASD). e, f Top 20
candidate genes identified by the meanPPI method, and the gene-PET
effects measured by PLS Z-scores for the SCZ (e) and ASD (f) traits.
Point shapes of genes in e, f have the same meanings as in a, b.
We examined the neurotransmission-trait and gene-neurotransmission
associations for SCZ and ASD (Fig. 6c, d). The top 20 genes
prioritized for SCZ by mFusion-meanPPI showed two patterns of
correlation with five neurotransmitter receptors (HTR1A, CNR1, DRD1,
DRD2, and OPRM1): 17 genes were positively correlated with these
receptors and 3 genes were negatively correlated (Fig. 6e). Similar
patterns were observed for ASD (Fig. 6f).
Gene-neurotransmission PLS association analysis revealed that most of
the top 20 genes were linked to these neurotransmission systems
(Fig. 6e, f). Specifically, 14 of the top 20 genes identified by
mFusion-meanPPI were listed as SCZ-related genes in DisGeNET, and five
of these 14 genes were not detected by the PLS method.
Comparison of correlations among multiple brain disorders
We applied the mFusion-meanPPI algorithm to the neuroimaging traits of
eight disorder cohorts separately (Fig. 7a; see “Methods”) and
prioritized the top 10% of genes based on their Z-scores. Spearman
correlation analysis of these genes was performed to assess the
similarity between each pair of disorders. Hierarchical clustering of
the Spearman correlation coefficients then identified three distinct
clusters, which reflect expression-level associations among the
disorders as inferred from gene Z-scores. The first cluster comprised
ASD, EPI, and PD; the second included ADHD and DEP; and the third
encompassed OCD, SCZ, and BIP (Fig. 7b). This clustering structure was
supported by both morphological (Fig. 7c) and genetic (Fig. 7d)
correlations. In particular, the OCD-SCZ-BIP cluster and the EPI-PD
grouping appeared in all three clustering structures, consistent with
previous studies of cross-disorder similarity at different
levels^10,26,27. In the other two clusters, the EPI-PD correlation
remained consistently stable. However, whereas genetically ASD was more
similar to the DEP-ADHD cluster, neuroimaging traits placed it closer
to the EPI-PD cluster. Likewise, the DEP-ADHD correlation was more
pronounced genetically but less evident in imaging trait correlations.
Overall, the clustering structure of the eight major mental disorders
showed notable concordance across multiple scales (Supplementary
Tables S8, S9, and S10).
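The clustering step above can be sketched as follows: compute Spearman
correlations between the disorders' gene Z-score vectors, convert them
to distances (1 − ρ), and merge clusters by average linkage until
three remain. The distance transform, the linkage rule, and the exact
gene subset are assumptions, since this section does not specify them;
the disorder labels D1 to D6 and their scores are synthetic, and all
function names are hypothetical.

```python
from itertools import combinations


def ranks(values):
    """Average 1-based ranks (ties share the mean rank), as used by Spearman's rho."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            r[order[t]] = (i + j) / 2 + 1
        i = j + 1
    return r


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the rank vectors."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


def average_linkage(names, dist, k):
    """Agglomerative clustering with average linkage, stopped at k clusters."""
    clusters = [[n] for n in names]
    while len(clusters) > k:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            d = sum(dist[x][y] for x in clusters[a] for y in clusters[b])
            d /= len(clusters[a]) * len(clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # safe because a < b
    return clusters


def cluster_disorders(scores, k=3):
    """scores: disorder -> gene Z-scores in the same gene order for every disorder."""
    names = sorted(scores)
    dist = {x: {y: 1.0 - spearman(scores[x], scores[y]) for y in names} for x in names}
    return average_linkage(names, dist, k)


scores = {  # synthetic gene-score vectors for six made-up disorders
    "D1": [1, 2, 3, 4, 5, 6], "D2": [1, 2, 3, 4, 6, 5],
    "D3": [6, 5, 4, 3, 2, 1], "D4": [6, 5, 4, 3, 1, 2],
    "D5": [1, 6, 2, 5, 3, 4], "D6": [1, 6, 2, 5, 4, 3],
}
print(cluster_disorders(scores, k=3))
```

Using 1 − ρ as the distance means perfectly concordant gene rankings
are at distance 0 and perfectly reversed rankings at distance 2, so
disorders whose prioritized genes agree are merged first.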
Fig. 7. Correlations among eight brain disorders at multiple
biomolecular levels.
a Cohen’s d maps of cortical thickness differences for eight disorders
on Desikan–Killiany atlas regions. b Heatmap of expression-based
correlations across eight disorders (Spearman’s r). c Heatmap of
morphological correlations across eight disorders (Pearson’s r).
d Heatmap of genetic correlations across eight disorders (LDSC $r_g$).
e Venn diagram of the overlap of top 10% genes among the three disease
clusters. f GO:MF (molecular function) term enrichment results for the
three groups of cluster-specific genes (Cluster 1: 102; Cluster 2: 410;
Cluster 3: 109). g GABRA1-related pathway scores across different
neurotransmission systems. ADHD attention-deficit/hyperactivity
disorder, ASD autism spectrum disorder, BIP bipolar disorder, DEP
depression, EPI epilepsy, OCD obsessive-compulsive disorder, PD
Parkinson’s disease, SCZ schizophrenia.
Comparing the top 10% of genes for each disorder, we identified three
cluster-specific gene sets comprising 102, 410, and 109 genes for the
three clusters, respectively (Fig. 7e; Table S11). The genes specific
to cluster 1 were enriched in a wide range of pre- and postsynaptic
functions, whereas those for cluster 2 were enriched mainly in
postsynaptic functions (Fig. 7f). Notably, GABRA1 was the only gene
associated with all eight disorders, but through distinct
gene-neurotransmission pathways (Fig. 7g, Table S12). The GABRA1-GRM5
and GABRA1-CNR1 pathways were prioritized for PD, whereas the
GABRA1-HRH3 pathway was prioritized for OCD. This is consistent with
the literature reporting that CNR1 agonists help relieve symptoms in
PD patients^28–30.
In total, 43,126 gene-neurotransmission-trait pathways among 15,408
genes, 20 neurotransmission systems, and 29 disease traits are listed
in a queryable database
(https://xomicsbio.shinyapps.io/mfusion_shiny/) and summarized in
Supplementary Fig. S12.
Discussion
To make use of human brain data, which have been rapidly accumulating
but are collected separately at different scales, this study proposed
an analytical method, mFusion, to bridge neuroimaging traits and genes
for mental disorders. Unlike previous methods, which examine pairwise
associations between two scales, mFusion establishes
gene-neurotransmission-trait pathways across three scales. The
advantage of mFusion over previous methods was demonstrated on both
simulated and empirical datasets, and the method identified both
well-known genes and new candidate genes for mental disorders. To our
knowledge, it is the first method to prioritize cross-scale pathways
for mental health disorders, providing a richer and more comprehensive
perspective on disease exploration. Although the current study
demonstrated mFusion as a tool for finding gene hits in mental
disorders using PET maps, the method could be applied to any brain map,
such as those derived from functional MRI, magnetoencephalography, or
single-photon emission computed tomography.
mFusion also suggested new disease-related genes that are not listed in
reference databases such as DisGeNET (Fig. 6E, F). For example, CNR1
was prioritized for SCZ by mFusion-meanPPI but not by the traditional
PLS method (Fig. 6E). CNR1 (cannabinoid receptor 1) encodes the
cannabinoid receptor CB1 (CB1R) and is implicated in the
pathophysiology of SCZ; decreased expression of this gene has been
reported in the DLPFC of patients with schizophrenia^31. Its
prioritization by mFusion was driven by its gene-PET association with
DRD2, which is supported by the physical interaction of the two
proteins to form CB1R–DRD2 heteromers^32.
Another example is KCNC1 (potassium voltage-gated channel subfamily C
member 1; see Supplementary Fig. S11A for its PPI network), which is
involved in monoatomic ion channel activity and delayed rectifier
potassium channel activity^33. Reduced KCNC1 channel protein levels
have been reported in the neocortex of SCZ model mice compared with
controls^34,35. A third example is GABRA3, which has been associated
in the literature with dopamine transporter transcripts and with the
disinhibition of nigrostriatal dopamine neurotransmission^36. A recent
study using peripheral blood mesenchymal stem cells reported its
transcriptomic association with ASD^37.
Furthermore, the influence of gene-PET-trait pathways mediated by
different neurotransmissions varied considerably across disorders
(Fig. S11B, Table S12). For example, GRM5-mediated neurotransmission had
a strong effect on PD (average pathway score = 4.92; Table S12) but not
on SCZ (score = 1.64) or BD (score = 1.95). Among the pathways in
Table S12, SNCA showed strong PD pathway scores mediated by
neurotransmissions including GRM5 (score = 5.76), CHRNB4
(score = 5.00), and CNR1 (score = 4.87), whereas the corresponding
scores for the other disorders were all below 3. SNCA (the
alpha-synuclein gene) has been widely reported to be involved in the
onset of Parkinson’s disease, especially in the formation of Lewy
bodies^38–40.
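The comparisons above amount to simple aggregations over a pathway table. The sketch below assumes (hypothetically) that each pathway is a `(gene, neurotransmission, disorder, score)` row and that the "average pathway score" is the arithmetic mean over genes for each disorder and neurotransmission; the text does not state how the averages in Table S12 were computed. The thresholds `high=4.0` and `low=3.0` are illustrative choices, and any example data fed to these functions should be toy values, not numbers from Table S12.

```python
from collections import defaultdict
from statistics import mean

def average_pathway_scores(pathways):
    """pathways: iterable of (gene, neurotransmission, disorder, score).

    Returns {(disorder, neurotransmission): mean score over genes}.
    ASSUMPTION: 'average' means an unweighted arithmetic mean.
    """
    grouped = defaultdict(list)
    for gene, transmission, disorder, score in pathways:
        grouped[(disorder, transmission)].append(score)
    return {key: mean(scores) for key, scores in grouped.items()}

def disorder_specific_pathways(pathways, gene, disorder, high=4.0, low=3.0):
    """Neurotransmissions through which `gene` scores >= high for `disorder`
    while scoring < low for every other disorder (hypothetical thresholds)."""
    by_transmission = defaultdict(dict)
    for g, transmission, d, score in pathways:
        if g == gene:
            by_transmission[transmission][d] = score
    hits = []
    for transmission, scores in by_transmission.items():
        target = scores.get(disorder)
        others = [s for d, s in scores.items() if d != disorder]
        if target is not None and target >= high and all(s < low for s in others):
            hits.append(transmission)
    return sorted(hits)
```

A pathway observed only for the target disorder passes the second filter trivially, because `all()` over an empty list is `True`; stricter analyses may want to require scores for every disorder.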
Nevertheless, the multiscale fusion framework has limitations. First,
the 45 currently available PET maps cover only 9 neurotransmitter
systems and synaptic density; PET maps of other neurotransmitter
systems remain unavailable owing to numerous methodological and
data-sharing challenges. Advances in biomolecular imaging should
strengthen this approach in the future. Second, the choice of
processing parameters can influence AHBA gene expression
estimates^41. To mitigate this, we normalized the expression values
and restricted our analyses to the relative ranks of genes rather than
their absolute values. Third, gene expression data from brain tissue
are restricted to a limited set of samples. As additional data covering
a broader range of genes become available, the proposed method can be
applied to these expanded datasets.
Conclusion
In this study, we proposed an analytical method to integrate
information across multiple scales, including genes, neurotransmitters,
and neuroimages. The method uses neurotransmission as a bridge linking
neuroimaging traits to genes in the human brain for mental disorders.
mFusion identified both well-known genes and new candidate genes for
SCZ and ASD, demonstrating its advantages for mental disorder
phenotypes. It also prioritizes cross-scale pathways related to mental
disorders, providing a richer and more comprehensive perspective on
disease exploration.
Methods
Data preprocessing
Gene expression in human brain tissues
Microarray expression data for brain tissues were obtained from the
Allen Human Brain Atlas (AHBA)^11,17, which comprises samples from six
neurotypical donors (five males and one female) aged 26 to 54 years.
The database contains probe-level expression for a total of 3702
samples, normalized across all brains. Because right-hemisphere samples
were available from only two donors, our analysis focused on the 2664
left-hemisphere samples from all six donors. Following the
preprocessing steps recommended by Arnatkevičiūtė et al.^18 and
consistent with the procedures detailed in our prior publication^42,
the data underwent probe re-annotation, intensity filtering, probe
selection based on mean expression, and normalization. This process
yielded a gene expression matrix of 2664 samples × 15,408 unique genes.
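Two of these steps, probe selection by mean expression and normalization, can be sketched in a few lines. The text does not name the normalization used, so the sketch assumes the scaled robust sigmoid (SRS), a normalization commonly used in AHBA pipelines, applied to each gene separately within each donor. The function names and the per-donor scheme are illustrative assumptions, not the authors' code.

```python
import numpy as np

def select_probes_by_mean(expr, probe_genes):
    """expr: samples x probes array; probe_genes: gene symbol for each probe.

    For each gene, keep the probe with the highest mean expression across samples.
    Returns (samples x genes array, sorted gene list).
    """
    means = expr.mean(axis=0)
    best = {}
    for j, gene in enumerate(probe_genes):
        if gene not in best or means[j] > means[best[gene]]:
            best[gene] = j
    genes = sorted(best)
    return expr[:, [best[g] for g in genes]], genes

def scaled_robust_sigmoid(x):
    """SRS normalization of a 1-D vector (ASSUMED method, see lead-in).

    A robust sigmoid centred on the median with scale IQR/1.35, followed by
    min-max rescaling to [0, 1]. Assumes x is not constant (IQR > 0).
    """
    x = np.asarray(x, dtype=float)
    med = np.median(x)
    q75, q25 = np.percentile(x, [75, 25])
    s = 1.0 / (1.0 + np.exp(-(x - med) / ((q75 - q25) / 1.35)))
    return (s - s.min()) / (s.max() - s.min())

def normalise_per_donor(expr, donor_ids):
    """Apply SRS to each gene (column) separately within each donor's samples."""
    donor_ids = np.asarray(donor_ids)
    out = np.empty_like(expr, dtype=float)
    for donor in np.unique(donor_ids):
        rows = donor_ids == donor
        out[rows] = np.apply_along_axis(scaled_robust_sigmoid, 0, expr[rows])
    return out
```

Because SRS is monotonic within each donor and gene, the rank order of samples is preserved, which fits the rank-based analyses mentioned in the Discussion.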
Neurotransmission images
PET imaging has proven invaluable for noninvasively mapping the in vivo
spatial distributions of neurotransmission in the human brain. In this
study, we curated a database of 45 neurotransmission-related PET maps
covering 9 neurotransmitter systems and synaptic density. Of these, 36
maps were provided in the neuromaps toolbox
(https://netneurolab.github.io/neuromaps/index.html)^19, 6 were
available through the JuSpace toolbox
(https://github.com/juryxy/JuSpace)^20, and 3 were obtained from the
PET imaging database provided by Hansen et al.^14
(https://github.com/netneurolab/hansen_receptors/tree/main/data/PET_nifti_images).
The maps cover the following targets: serotonin, cannabinoid, dopamine,
gamma-aminobutyric acid, histamine, mu-type opioid, norepinephrine,
N-methyl-D-aspartate, synaptic vesicle membrane protein, acetylcholine,
glutamate, and nicotinic acetylcholine (Table 2 and Supplementary
Table S1).
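Before PET maps can be compared with regional gene expression or cortical traits, each volume has to be summarized per atlas region. The sketch below shows the basic idea, averaging voxel values within each integer label of an atlas already in the same space, using plain NumPy arrays. It is an illustrative assumption about this step; the neuromaps toolbox provides its own parcellation utilities, and the optional z-scoring across regions (to put tracers with different units on a common scale) is likewise an assumption rather than a documented step of this study.

```python
import numpy as np

def parcellate(volume, atlas, labels=None):
    """Average the voxel values of `volume` within each atlas label.

    volume, atlas: arrays of identical shape in the same space; the atlas holds
    integer region labels with 0 as background. Returns one value per label.
    """
    volume = np.asarray(volume, dtype=float)
    atlas = np.asarray(atlas)
    if volume.shape != atlas.shape:
        raise ValueError("volume and atlas must share the same shape and space")
    if labels is None:
        labels = np.unique(atlas[atlas > 0])
    # nanmean ignores voxels outside the brain mask that were stored as NaN
    return np.array([np.nanmean(volume[atlas == lab]) for lab in labels])

def zscore(values):
    """Z-score regional values across regions (ASSUMED optional step)."""
    values = np.asarray(values, dtype=float)
    return (values - values.mean()) / values.std()
```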
Protein-protein interaction (PPI) network
Recognizing that the proteins encoded by genes act collaboratively to
perform various functions^43, we used the STRING protein-protein
interaction (PPI) network (version 11.5, August 12, 2021)^16. STRING is
one of the largest and most widely used sources of PPI data and
includes both direct (physical) and indirect (functional) interactions
derived from a range of sources, including experimental data, gene
co-expression, and text mining. In the PPI network, the strength of an
edge is quantified by its confidence score (c), and the distance
between two nodes is measured by the depth (d). A larger c and a
smaller d therefore correspond to interactions supported by stronger
evidence.
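The roles of c and d can be made concrete with a breadth-first search. The sketch below assumes the network is given as an edge list of `(protein_a, protein_b, confidence)` tuples with confidence on a 0 to 1 scale, keeps only edges with confidence at least `min_confidence`, and returns every protein reachable from a seed within `max_depth` hops together with its depth. This is a generic illustration of thresholded neighbourhood extraction, not the exact procedure used by mFusion.

```python
from collections import deque

def ppi_neighbourhood(edges, seed, min_confidence, max_depth):
    """Return {protein: depth} for proteins within `max_depth` hops of `seed`.

    Only edges whose confidence score is >= min_confidence are traversed
    (ASSUMED 0-1 scale), so a larger threshold and a smaller depth yield a
    neighbourhood supported by stronger evidence.
    """
    adjacency = {}
    for a, b, conf in edges:
        if conf >= min_confidence:
            adjacency.setdefault(a, set()).add(b)
            adjacency.setdefault(b, set()).add(a)
    depth = {seed: 0}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        if depth[node] == max_depth:
            continue  # do not expand beyond the maximum depth
        for neighbour in adjacency.get(node, ()):
            if neighbour not in depth:
                depth[neighbour] = depth[node] + 1
                queue.append(neighbour)
    return depth
```

Because the search is breadth-first, each protein is recorded at its shortest hop distance from the seed.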
Brain traits of mental disorders using the Desikan–Killiany (DK) atlas
The ENIGMA consortium and the ENIGMA toolbox
(https://enigma-toolbox.readthedocs.io/en/latest/index.html#)^21
provide structural case-control differences for eight disorders:
attention-deficit/hyperactivity disorder (ADHD)^44, ASD^45, bipolar
disorder (BD)^46, common epilepsy syndromes (EPI)^47, depression
(DEP)^48, obsessive-compulsive disorder (OCD)^49, Parkinson’s disease
(PD)^50, and SCZ^51. In this study, we used maps of case-control
differences in cortical thickness, expressed as inverted Cohen’s d
values^14 (i.e., larger values indicate greater cortical thinning in
patients), for the 68 cortical regions of the DK atlas (Table S13).
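The inversion is a sign flip of the usual effect size $d = (\bar{x}_{\text{case}} - \bar{x}_{\text{control}})/s_{\text{pooled}}$, so that thinner cortex in patients gives a positive value. The ENIGMA maps are distributed as precomputed d values, which may involve covariate adjustment and meta-analysis across sites; the sketch below only illustrates the basic unadjusted formula with a pooled standard deviation and the inversion, for a single region.

```python
import numpy as np

def cohens_d(cases, controls):
    """Unadjusted Cohen's d: (mean_cases - mean_controls) / pooled SD."""
    cases = np.asarray(cases, dtype=float)
    controls = np.asarray(controls, dtype=float)
    n1, n2 = len(cases), len(controls)
    pooled_var = ((n1 - 1) * cases.var(ddof=1) + (n2 - 1) * controls.var(ddof=1)) / (n1 + n2 - 2)
    return (cases.mean() - controls.mean()) / np.sqrt(pooled_var)

def inverted_d(cases, controls):
    """Sign-flipped d: larger values mean thinner cortex in cases."""
    return -cohens_d(cases, controls)
```

For cortical thickness values of 2.0, 2.2, 2.4 mm in cases and 2.5, 2.7, 2.9 mm in controls, the pooled SD is 0.2 mm, so d = -2.5 and the inverted value is 2.5.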
Brain traits of mental disorders in the DK308 Atlas
We also incorporated a brain map of case-control differences in
morphological similarity in schizophrenia, where morphological
similarity is the correlation of seven morphological parameters (i.e.,
gray matter volume, surface area, cortical thickness, Gaussian
curvature, mean curvature, fractional anisotropy, and mean diffusivity)
derived from MRI and diffusion-weighted imaging data. This map is
defined on the Desikan–Killiany 308 atlas (DK308)^13, a refined version
of the DK atlas with 308 regions that preserves the small-world
properties of anatomical cortical networks at higher resolution^8. We
also used a case-control map of cortical thickness differences in ASD
defined on the DK308 atlas^9.
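Morphological similarity of this kind is commonly computed per subject by z-scoring each morphological feature across regions and correlating the feature vectors of every pair of regions; a region's similarity is then the mean of its correlations with all other regions. The sketch below follows that common construction as an assumption, since the exact pipeline behind the DK308 map is not described here. The case-control step is reduced to a plain difference of group means, whereas published maps typically use regression models with covariates.

```python
import numpy as np

def morphometric_similarity(features):
    """features: regions x features array for one subject (e.g., 308 x 7).

    Z-scores each feature across regions, computes Pearson correlations
    between regions' feature vectors, and returns (regional mean similarity,
    regions x regions matrix with NaN on the diagonal).
    """
    f = np.asarray(features, dtype=float)
    z = (f - f.mean(axis=0)) / f.std(axis=0)
    msn = np.corrcoef(z)            # rows are regions, so the result is regions x regions
    np.fill_diagonal(msn, np.nan)   # exclude self-similarity
    return np.nanmean(msn, axis=1), msn

def case_control_difference(case_strengths, control_strengths):
    """Regional mean difference (cases minus controls); inputs are subjects x regions.

    ASSUMPTION: a simple difference of means, without covariate adjustment.
    """
    return np.mean(case_strengths, axis=0) - np.mean(control_strengths, axis=0)
```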
GWAS summary statistics for mental disorders
We compiled published GWAS summary statistics for eight disorders: six
from Psychiatric Genomics Consortium (PGC) datasets (ADHD^52, ASD^53,
BIP^54, DEP^55, OCD^56, and SCZ^57) and two from other relevant studies
(EPI^58 and PD^59). Table S14 provides details on the individual GWAS
samples, including references, sample sizes, and numbers of SNPs.