Abstract
Background
Multiple sclerosis (MS) has a complex pathophysiology, variable
clinical presentation, and unpredictable prognosis; understanding the
underlying mechanisms requires combinatorial approaches that warrant
the integration of diverse molecular omics data.
Methods
Here, we combined genomic and proteomic data of the same individuals
among a Turkish MS patient group to search for biologically important
networks. We previously identified differentially-expressed proteins by
cerebrospinal fluid proteome analysis of 179 MS patients and 42 non-MS
controls. Among this study group, 11 unrelated MS patients and 60
independent, healthy controls were subjected to whole-genome SNP
genotyping, and genome-wide associations were assessed. Pathway
enrichment analyses of MS-associated SNPs and differentially-expressed
proteins were conducted using the functional enrichment tool, PANOGA.
Results
Nine shared pathways were detected between the genomic and proteomic
datasets after merging and clustering the enriched pathways. Complement
and coagulation cascade was the most significantly associated pathway
(hsa04610, P = 6.96 × 10^−30). Other pathways involved in neurological
or immunological mechanisms included adherens junctions (hsa04520,
P = 6.64 × 10^−25), pathogenic Escherichia coli infection (hsa05130,
P = 9.03 × 10^−14), prion diseases (hsa05020, P = 5.13 × 10^−13).
Conclusion
We conclude that integrating multiple datasets of the same patients
helps reducing false negative and positive results of genome-wide SNP
associations and highlights the most prominent cellular players among
the complex pathophysiological mechanisms.
Keywords: Bioinformatics, Genomics, Proteomics, Multiple sclerosis
Introduction
Multiple sclerosis (MS) is known to be an immune-mediated,
neurodegenerative central nervous system (NS) disorder with complex
inheritance and pathophysiological mechanisms. Although approximately
common 250 genetic variants with low to modest risk effects have been
associated with MS mainly by genome-wide association studies (GWAS)
using rather large sample groups ([38]International Multiple Sclerosis
Genetics Consortium, 2007; [39]Sawcer et al., 2011; [40]Patsopoulos et
al., 2017; [41]International Multiple Sclerosis Genetics Consortium et
al., 2019), known variants not only fail to explain predicted MS
heritability but also cannot be efficiently translated into disease
mechanisms. To date, numerous studies have also revealed potential
diagnostic and prognostic biomarkers and disease-related cellular
pathways that emphasize the different pathological components of the
disease; however, the exact underlying mechanisms in disease
development and progression are largely unknown. In order to better
translate the growing number of findings into disease pathophysiology,
algorithms for pathway analyses of multiple high-throughput omics data
seem essential. In this context, the integration of multiple omics data
is essential to better describe the complex nature of MS.
In our previous study ([42]Avsar et al., 2015), we have conducted a
cerebrospinal fluid (CSF) proteome analysis using 2D-gel
electrophoresis and mass spectrophotometry and identified 151
differentially expressed proteins between an MS cohort of 179 patients
with different clinical MS phenotypes and 42 non-MS controls. Later,
affected Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways were
identified using the functional enrichment tool Pathway and Network-
Oriented GWAS Analysis (PANOGA) ([43]Bakir-Gungor, Egemen & Sezerman,
2014), revealing MS-related pathways including aldosterone-regulated
sodium reabsorption pathway, renin- angiotensin system, notch signaling
pathway, and vitamin digestion and absorption pathway. Here, we further
explored disease-related pathways, applying single nucleotide
polymorphism (SNP) genotyping on 11 MS patients, who were included in
the previous proteomic study, and 60 independent, healthy individuals.
Pathway enrichment analyses of the genomic and proteomic data were
conducted. The two datasets were merged, highlighting the most
prominent pathways that may be affected in the studied patient group.
Materials & Methods
Study groups
All patients were diagnosed at Istanbul University-Cerrahpaşa,
Cerrahpaşa School of Medicine, Department of Neurology, according to
the McDonald criteria ([44]Polman et al., 2011). Eleven unrelated MS
patients were randomly selected among the study group of our previous
work comprising of 179 MS patients. The patient group had a
heterogeneous disease presentation at the time of their CSF sample
collection. Sixty independent age- and gender-matched intrinsic healthy
controls were included in the analyses. All individuals in the study
group are of Turkish origin. All procedures of the study were in
accordance with the Helsinki Declaration of 1964 and its later
amendments. The Ethics Committee of Istanbul Technical University
approved the study (İTÜ-SM.İNAREK-MBG-1), and each individual of the
study group gave a written informed consent form prior to the sample
collection.
Genome-wide associations
DNA isolation from blood samples was performed (Roche DNA Isolation Kit
for Mammalian Blood), and genotyping for 300.000 SNP markers was
applied for each individual on the Illumina HumanCytoSNP-12 array.
Illumina GenomeStudio software was used for quality control. All
individuals showed sample call rates of more than 98% (98.65–99.63%)
and were therefore included in the study. Golden Helix SNP & Variation
Suite software was used for identity-by-descent detection, suggesting
no relatedness between the individuals. SNP filtering was performed:
SNPs on the Y chromosome, with call rates lower than 95%, minor allele
frequency lower than 0.01, and in strong linkage disequilibrium (r^2 >
0.5) were excluded from the study. A total of 129.547 SNPs was included
in the analysis. Frequency differences of SNPs between cases and
controls were assessed, and genotypic association P-values were
obtained.
Pathway enrichment analysis
Genotypic associations and proteomic data were subjected to pathway
enrichment analyses using the functional enrichment tool PANOGA, which
reveals functionally relevant pathways by identifying genes within the
pathways, incorporating protein-protein interaction (PPI) information,
and extracting significant pathways. SNPs with genotypic association
P-values lower than 0.05 and differentially-expressed proteins with
P-values lower than 0.05 from our previous study were used for PANOGA
procedure as follows: For SNP associations, because a SNP might affect
more than one gene, each SNP was initially assigned to the gene on
which the SNP has the most important functional effect, using the SPOT
webserver ([45]Saccone et al., 2010). Functional information of SNPs
was obtained utilizing SPOT, F-SNP ([46]Lee & Shatkay, 2007), SNPnexus
([47]Dayem Ullah, Lemoine & Chelala, 2012), and SNPinfo ([48]Xu &
Taylor, 2009). Genes and proteins were then mapped onto a PPI network,
for which Goh et al.’s human PPI network was used ([49]Goh et al.,
2007). Next, the jActive Modules algorithm ([50]Ideker et al., 2002 was
applied to identify active subnetworks containing a large number of the
disease-affected genes and proteins in the PPI network. Each genotypic
association and differential expression P-value was taken into account,
and active subnetworks that overlap at most 50% with each other were
extracted. In order to evaluate the biological importance of the
subnetworks, the number of genes and proteins found in a specific KEGG
pathway was compared to the total gene and protein number involved in
the corresponding pathway. For this functional enrichment procedure, a
two-sided hypergeometric test was used, and the Bonferroni method was
applied for multiple testing corrections of the P-values. Significantly
associated KEGG pathways to our patient group consisted of
significantly enriched (P < 0.05) pathways for at least one of the
active subnetworks. For the pathways that were enriched in multiple
subnetworks, the one with the lowest P-value was reported. PANOGA was
run 10 times for both genotypic associations and
differentially-expressed proteins, and the lowest P-values over the 10
iterations were reported. The resulting enriched pathways for both
datasets were then merged. If a given pathway was identified in both
analyses, the corresponding P-values were merged using Fisher’s
combined probability test. If the pathway was identified in only one of
the analyses, the corresponding P-value was used as the final P-value.
Clustering of enriched pathways
Using a method previously described by [51]Chen et al. (2014) we
clustered the enriched pathways to identify similar groups and
establish representative pathways. The clustering approach can be
summarized as follows: initially overlap index matrix (OI) that
consists of the overlap indices between all of the pairs of enriched
pathways was calculated. The overlap index (OI[i,j]) of a pair of
pathways P[i] and P[j] (denoting the ith and jth pathways,
respectively) was defined as:
[MATH: OIi<
/mi>,j=|
Gi∩Gj|minGi,|G<
mrow>j|
:MATH]
where G[i] is the set of all genes in the ith pathway. In the overlap
matrix, each row (o[i]) represents the gene overlap profile of a
pathway (i.e., o[i] is the gene overlap profile of the ith pathway).
To identify similarity between each pair of pathways Pearson
correlation coefficients (R) between each pair of overlap profiles
(e.g., o[i] and o[j]) were calculated. These correlation coefficients
were then converted to pairwise distances (PD) as:
[MATH: PD=1−<
mi>R. :MATH]
Using PD as the distance metric, hierarchical clustering
(average-linkage) of the enriched pathways was performed. Examining the
hierarchical clustering dendrogram, the dendrogram was partitioned into
clusters at the manually selected PD cut-off value of 0.55. The pathway
with the lowest P-value in each cluster was selected as the
representative pathway for that cluster.
Results
SNP associations
All 71 individuals and a total of 129.547 SNPs passed the quality check
and therefore were included in the final analyses. The Manhattan plot
summarizing the genome-wide associations is given in [52]Fig. 1. The
most significantly associated SNP was rs7873, located on 3′ UTR of the
IGF2 gene, whose significance level was the only value close to a
classical stringent P-value cut-off for a GWAS (P = 4.39 × 10^−07).
When considering SNPs with a significance level of lower than a looser
P-value cut off (P < 10^−05) given the small sample size, 4 more SNPs
have high associations with MS in the studied patient group, one of
which is located on a gene: [53]rs17187282 (P = 1.17 × 10^−06),
[54]rs11688088 (P = 2.8 × 10^−06), rs654188 (P = 5.24 × 10^−06), and
[55]rs7092208 (P = 9.66 × 10^−06, intronic variant on the MGMT gene).
Figure 1. Genome-wide Manhattan plot showing –log[10] of the P-values of
129.547 SNPs (y-axis) against their genomic positions (x-axis) for
associations.
[56]Figure 1
[57]Open in a new tab
The results are plotted left to right from the p-terminal ends of the
chromosomes, which are shown in different colors. The blue line
represents the P-value threshold (0.05).
For enrichment analysis of the SNP associations, the P-value threshold
was set lower than 0.05 ([58]Fig. 1, blue line) in order to prevent the
elimination of potentially MS-associated SNPs with falsely high
P-values due to the low sample size and limitations of the genome-wide
association methodology. Later, elimination of the false-positive SNP
associations resulted from preferring not to set a conventional
stringent P-value was aimed to be achieved during the identification of
MS-related pathways using SNP subnetworks, therefore ruling
functionally irrelevant SNPs out. A total of 6.594 SNPs had P-values
under the threshold of 0.05 ([59]Table S1), which were included in the
subsequent analyses.
MS-related pathways
The combination of enriched pathways for the resulting SNPs and
differentially expressed proteins resulted in 151 enriched pathways in
total ([60]Table S2). In order to identify pathways with similar
content and function, this combined list of enriched pathways was
clustered ([61]Fig. 2). The dendrogram was then manually partitioned
into biologically relevant clusters, and representative pathways were
established. In total, 33 clusters, therefore 33 representative
pathways were obtained ([62]Table S3). Nine of the representative
pathways emerged from both genomic and proteomic analyses, among which
the complement and coagulation cascade (hsa04610) was the most
significantly associated pathway in the studied group (P =
6.96 × 10^−30). [63]Figure 3 shows the complement and coagulation
cascades with genes identified through the genomic dataset and
differentially expressed proteins identified through the proteomic
dataset as a representative pathway for the relationships between the
genomic and proteomic findings.
Figure 2. Clustering dendrogram of the enriched pathways.
[64]Figure 2
[65]Open in a new tab
The dendrogram shows all 151 enriched pathways, which were grouped in
clusters based on their relevance to each other. The red line shows the
cut-off for biologically relevant clusters, which was cut at the
pairwise distance of 0.55, resulting in 33 representative clusters.
Thirty-three pathways with the lowest P-values in each cluster
constituted the representative pathways.
Figure 3. Complement and coagulation cascades.
[66]Figure 3
[67]Open in a new tab
The illustration of complement and coagulation cascades shows the
identified genes by SNP associations (blue) and differentially
expressed proteins (yellow) as a representative pathway for the merged
data analysis. The pathway was adapted from KEGG: Kyoto Encyclopedia of
Genes and Genomes ([68]Kanehisa & Goto, 2000; hsa04610, 2019) and
created with BioRender.com.
Regulation of actin cytoskeleton (hsa04810) and focal adhesions
(hsa04510) were the two systems with the second and third lowest
P-values even though there were no significant protein level changes
related to these pathways (P = 9.64 × 10^−27 and P = 3.29 × 10^−26,
respectively). Other 8 shared pathways with high significance levels
are as follows: Adherens junctions (hsa04520, P = 5.38 × 10^−17),
colorectal cancer (hsa05210, P = 1.70 × 10^−15), pathogenic Escherichia
coli infection (hsa05130, P = 7.02 × 10^−9), prion diseases (hsa05020,
P = 2.48 × 10^−5), endometrial cancer (hsa05213, P = 6.00 × 10^−9),
non-small cell lung cancer (hsa05223, P = 1.40 × 10^−8), bladder cancer
(hsa05219, P = 4.83 × 10^−6), and non-homologous end-joining (hsa03450,
P = 4.27 × 10^−5). [69]Table 1 shows the detected SNP associations and
differentially expressed proteins in the above-mentioned pathways.
Table 1. The most prominent MS-related pathways emerged from the analyses.
KEGG terms and IDs SNP associations Differentially expressed proteins
FinalP-value
P-value Genes P-value
Complement and coagulation cascades, hsa04610 1.14 × 10^−16 FGA
[[70]rs2070018 ]; CR1 [[71]rs1571344 ]; F13A1 [[72]rs3901123;
[73]rs34736558; [74]rs1267843; [75]rs1267914; [76]rs749005 ]; PLG
[[77]rs9295131 ]; F5 [[78]rs12131397 ]; C6 [[79]rs2921184;
[80]rs1801033 ]; PROC [[81]rs2069933; [82]rs1799808 ] 8.43 × 10^−16
6.96 × 10^−30
Adherens junction, hsa04520 5.38 × 10^−17 INSR [[83]rs10409516 ]; LMO7
[[84]rs9573625 ]; WASL [[85]rs10270793 ]; EGFR [[86]rs6970262 ]; IGF1R
[[87]rs2684788 ]; FER [[88]rs7715933; [89]rs7700630; [90]rs7713591 ];
CTNNA3 [[91]rs10997250; [92]rs7076094; [93]rs10997316 ]; RAC1
[[94]rs2347338 ]; PVRL2 [[95]rs519825 ] 2.03 × 10^−10 6.64 × 10^−25
Colorectal cancer, hsa05210 1.70 × 10^−15 TCF7L1 [[96]rs6547608;
[97]rs10195517 ]; TGFB2 [[98]rs1473527; [99]rs1417488 ]; AXIN1
[[100]rs3916990 ]; MAPK9 [[101]rs6867398; [102]rs17080136;
[103]rs11956696 ]; MAPK8 [[104]rs17780725 ]; RAC1 [[105]rs2347338 ]
2.88 × 10^−9 2.68 × 10^−22
Pathogenic Escherichia coli infection, hsa05130 7.02 × 10^−9 ROCK2
[[106]rs6432187 ]; WASL [[107]rs10270793 ] 3.72 × 10^−7 9.03 × 10^−14
Prion diseases, hsa05020 2.48 × 10^−5 NOTCH1 [[108]rs3812605;
[109]rs3812609; [110]rs2229974 ]; NCAM2 [rs232398] 6.31 × 10^−10
5.13 × 10^−13
Endometrial cancer, hsa05213 6.00 × 10^−9 TCF7L1 [[111]rs6547608;
[112]rs10195517 ]; AXIN1 [[113]rs3916990 ]; EGFR [[114]rs6970262 ];
CTNNA3 [[115]rs10997250; [116]rs7076094; [117]rs10997316 ]; SOS2
[[118]rs2227276 ] 5.77 × 10^−6 1.11 × 10^−12
Non-small cell lung cancer, hsa05223 1.40 × 10^−8 ALK [[119]rs7591913;
[120]rs7589120 ]; FHIT [[121]rs643629; [122]rs10510827;
[123]rs10510852; [124]rs3772479; [125]rs6778312; [126]rs12633994;
[127]rs13317933; [128]rs13078190 ]; SOS2 [[129]rs2227276 ]; EGFR [
[130]rs6970262 ] 2.82 × 10^−6 1.26 × 10^−12
Bladder cancer, hsa05219 4.83 × 10^−6 E2F3 [[131]rs12527393 ]
8.47 × 10^−7 1.11 × 10^−10
Non-homologous end-joining, hsa03450 4.27 × 10^−5 XRCC4 [[132]rs2662242
] 9.88 × 10^−6 9.53 × 10^−9
Top 2 pathways emerged only from SNP associations
Regulation of actin cytoskeleton, hsa04810 9.64 × 10^−27 CYFIP1
[[133]rs7182576; [134]rs2120968; [135]rs17137192 ]; ROCK2
[[136]rs6432187 ]; WASL [ [137]rs10270793 ]; EGFR [[138]rs6970262 ];
MYLK [[139]rs4678060; [140]rs3911406; [141]rs2124508; rs702032]; GNA13
[[142]rs12944877 ]; PAK7 [[143]rs6118687; [144]rs6118709 ]; RAC1
[[145]rs2347338 ]; PAK2 [[146]rs6583176 ]; PAK4 [[147]rs11083505;
[148]rs692364 ]; VAV3 [[149]rs11576720 ]; ITGA4 [[150]rs3770111 ];
ACTN1 [[151]rs181484 ]; ITGA2 [[152]rs27504 ]; FN1 [[153]rs1404772 ];
VAV2 [[154]rs7026263 ]; TIAM1 [[155]rs2833359; [156]rs845972;
[157]rs9984499; [158]rs11700792 ]; ARHGEF4 [[159]rs2403238 ]; MYH9
[[160]rs11703176 ]; CRK [[161]rs6502707 ]; DOCK1 [[162]rs10741150;
[163]rs6482855; [164]rs10458718 ]; ARHGEF6 [[165]rs41312580 ]; ITGA9
[rs149815; [166]rs1984311 ] – 9.64 × 10^−27
Focal adhesion, hsa04510 3.29 × 10^−26 FIGF [[167]rs7877192;
[168]rs6632528 ]; SHC3 [[169]rs1331189 ]; LAMA1 [[170]rs658121 ]; IGF1R
[[171]rs2684788 ]; MAPK9 [[172]rs6867398; [173]rs17080136;
[[174]rs11956696 ]; MAPK8 [[175]rs17780725 ]; KDR [[176]rs7692791 ];
ITGB6 [[177]rs6730023 ]; RAC1 [[178]rs2347338 ]; PAK4 [[179]rs11083505;
[180]rs692364 ]; VASP [[181]rs4803831 ]; VAV3 [[182]rs11576720 ]; CAV3
[[183]rs11713611 ]; ITGA4 [[184]rs3770111 ]; ACTN1 [[185]rs181484 ];
ITGA2 [[186]rs27504 ]; FN1 [[187]rs1404772 ]; MAPK10 [[188]rs1469869 ];
ITGA10 [[189]rs12401622 ]; ITGA8 [[190]rs2039910; [191]rs1451668 ];
RAPGEF1 [[192]rs2296950 ]; TLN1 [[193]rs2295794 ]; CRK [[194]rs6502707
]; DOCK1 [[195]rs10741150; [196]rs6482855; [197]rs10458718 ]; SOS2
[[198]rs2227276 ]; ITGA9 [[199]rs149815; [200]rs1984311 ] –
3.29 × 10^−26
[201]Open in a new tab
The most significantly MS-associated pathways obtained from the SNP
associations and differentially expressed proteins did not overlap when
the analyses were performed separately ([202]Table S2). The merged data
analysis resulted in a combined list of pathways from both datasets,
revealing a different set of pathways with the lowest P-values and
emphasizing the most relevant pathways that could not have been
prioritized by a straight-forward, non-combinatorial omics approach.
The top five pathways that emerged from the proteomic-only,
genomic-only, and merged dataset analyses are given in [203]Table 2.
Table 2. Top five pathways emerged from each dataset analysis.
KEGG Terms and IDs P-value
Top 5 pathways emerged from the proteomic-only data analysis
Complement and coagulation cascades, hsa04610 8.43 × 10^−16
Adherens junction, hsa04520 2.03 × 10^−10
Prion diseases, hsa05020 6.31 × 10^−10
Colorectal cancer, hsa05210 2.88 × 10^−09
Pathogenic Escherichia coli infection, hsa05130 3.72 × 10^−07
Top 5 pathways emerged from the genomic-only data analysis
Regulation of actin cytoskeleton, hsa04810 9.64 × 10^−27
Focal adhesion, hsa04510 3.29 × 10^−26
ErbB signaling pathway, hsa04012 1.28 × 10^−23
Rap1 signaling pathway, hsa04015 2.35 × 10^−20
Axon guidance, hsa04360 6.36 × 10^−20
Top 5 pathways emerged from the merged data analysis
Complement and coagulation cascades, hsa04610 6.96 × 10^−30
Regulation of actin cytoskeleton, hsa04810 9.64 × 10^−27
Focal adhesion, hsa04510 3.29 × 10^−26
Adherens junction, hsa04520 6.64 × 10^−25
ErbB signaling pathway, hsa04012 1.28 × 10^−23
[204]Open in a new tab
Discussion
The complement and coagulation cascade (CCC) pathway emerged as the
most significantly associated pathway to MS in our patient group, with
many SNP associations on different genes and differentially expressed
CSF proteins as shown in [205]Fig. 3. The complement and the
coagulation systems are two closely linked cascades, both having roles
in immunity. In a study by [206]Magliozzi et al. (2019), CSF proteomic
profiles of MS patients with low or high cortical lesion load were
compared, revealing that the identified differentially expressed
proteins were mainly involved in the complement and coagulation
cascade. Examination of white matter lesions of MS patients has
revealed dysregulation of coagulation-associated proteins in chronic
active plaques involving Serpin A5 and tissue factor ([207]Han et al.,
2008). Also, significantly upregulated Serpin E1, another CCC
component, was reported in the post-mortem cortex of progressive MS
patients ([208]Yates et al., 2017). In our study, even though the genes
and differentially expressed proteins involved in CCC do not overlap,
the pathway was found 10 times for each dataset in PANOGA, resulting in
a high association with MS. Our results confirm the previous findings
suggesting the involvement of CCC alterations in MS pathophysiology and
highlight other pathway components that may also be responsible for
these alterations.
Disruption of blood–brain barrier (BBB) integrity is one of the
hallmarks in MS pathophysiology, during which massive leukocyte
infiltration occurs across the damaged BBB into the CNS ([209]Alvarez,
Cayrol & Prat, 2011) Tight and adherens junctions between the
endothelial cells of BBB have significant importance for normal immune
surveillance in the CNS. Tight junctions consist of claudins, occludin,
junctional adhesion molecules, and Zonula Occludens ([210]Hawkins &
Davis, 2005), whose dysfunctions leading to BBB abnormalities in MS
have been previously reported ([211]Kirk et al., 2003; [212]Padden et
al., 2007; [213]Plumb et al., 2002). Cadherins form adherens junctions
via homophilic interactions between endothelial cells and bind
cytoskeletal components through cytoplasmic catenin proteins
([214]Dejana, Orsenigo & Lampugnani, 2008). Although adherens junctions
are required for the overall junctional organization to maintain the
BBB integrity, their roles in normal physiology and diseased states
have not been well-established. In a previous study, the level of an
adherens junction protein, β-catenin, was similar in progressive MS and
non-MS brain sections ([215]Padden et al., 2007). In our patient group,
adherens junctions, which were not detected in our previous proteomic
study, emerged as the second most significantly altered cellular
components with SNP associations on different genes and differentially
expressed proteins indirectly related to the pathway. Our findings
indicate that adherens junctions may have more influence on BBB
function than it has been thought and are needed to be explored in both
normal physiology and MS pathophysiology in more detail.
Focal adhesions and actin cytoskeleton regulation are two interrelated
pathways that emerged with low final P-values from the analyses, even
though no significant protein expression changes directly or indirectly
related to these systems were detected. Organized clusters of focal
adhesion complexes connect extracellular matrix through transmembrane
integrin proteins to the intracellular actin cytoskeleton through other
focal adhesion proteins in the complexes and mainly have roles in cell
motility ([216]Wozniak et al., 2004). During leukocyte infiltration
across the BBB, α4-integrin proteins on leukocytes form firm
connections with the endothelial surface, initiating the process
([217]Berlin et al., 1995; [218]Vajkoczy, Laschinger & Engelhardt,
2001). In this context, Natalizumab (Antegren, Elan Pharmaceuticals and
Biogen), a monoclonal antibody against α4-integrin, has been used in MS
treatment, which suppresses inflammatory activity by inhibiting
leukocyte migration to the inflammation areas ([219]Polman et al.,
2006). We detected many SNP associations on genes encoding for many
focal adhesion proteins, including the α4-integrin-coding ITGA4 gene.
Focal adhesion molecules and their combinations to create different
complexes are diverse, and regulation differences of these interactions
for specific cell behaviors are yet to be elucidated.
In the studied MS patients, increased CSF nucleolin expression and SNP
associations on WASL and ROCK2 genes were detected, all of which are
components of the Pathogenic Escherichia coli infection pathway. Neural
Wiskott-Aldrich syndrome protein encoded by WASL and Rho-associated
protein kinase 2 encoded by ROCK2 are both regulators of the actin
cytoskeleton ([220]Miki, Miura & Takenawa, 1996; [221]Gallo et al.,
2012), and the involvement of the pathway may indicate induction of
inflammatory cell mobility. Prion diseases are fatal conditions
presenting neuronal degeneration, and the emergence of the pathway from
our analyses emphasizes some shared molecular alterations with MS
pathophysiology, as both neurodegeneration and neuroinflammation also
occur in prion diseases ([222]Perry, Cunningham & Boche, 2002;
[223]Eikelenboom et al., 2002). We have also detected a number of
cancer pathways involving both SNP associations and differential CSF
expressions. Cancer risk among MS patients has been investigated in a
number of studies with diverse results, revealing unchanged
([224]Nielsen et al., 2006), reduced ([225]Bahmanyar et al., 2009), and
increased ([226]Grytten et al., 2020) cancer rates. Altered cancer
pathways in our study group may be attributed to changes in
immunological mechanisms involving the same components as in mechanisms
against anti-tumor surveillance. Other pathways that emerged only from
the SNP associations included a number of immunological and
neurological mechanisms. T cell receptor signaling pathway (hsa04660, P
= 1.86 × 10^−19), chemokine signaling pathway (hsa04062, P =
4.38 × 10^−19), and Fc-gamma R-mediated phagocytosis (hsa04666, P =
8.78 × 10^−16) were the immune system-related pathways. Nervous
system-related pathways included axon guidance (hsa04360, P =
6.36 × 10^−20), retrograde endocannabinoid signaling (hsa04723, P =
6.74 × 10^−18), neurotrophin signaling pathway (hsa04722, P = 8.03 ×
10^−18), glutamatergic synapse (hsa04724, P = 1.74 × 10^−15),
cholinergic synapse (hsa04725, P = 7.79 × 10^−15), and dopaminergic
synapse (hsa04728, P = 5.83 × 10^−12). Also, the Rap1 signaling pathway
(hsa04015, P = 2.35 × 10^−20) was detected, further supporting the
involvement of adherens junction, focal adhesion, and regulation of
actin cytoskeleton pathways.
Working on small sample sizes for genome-wide associations leads to
enormous numbers of false-positive SNP associations. Using large sample
sizes, on the other hand, may result in false-negative results due to
the low effect sizes of disease-associated SNPs. One limitation of our
study is the considerably low sample size in the assessment of SNP
associations. In this regard, we aimed to take advantage of having both
DNA and CSF samples of the same affected individuals. Therefore, we
aimed to include both false-positive SNP associations and prevent
missing out SNPs with falsely high P-values in the first place by
setting a non-stringent P-value cut-off. We then aimed to exclude
false-positive associations during the pathway enrichment analyses in
which interacting SNP subnetworks were used to identify the pathways,
eliminating functionally irrelevant SNPs. Integrative analyses of the
genomic and proteomic datasets determined the shared pathways between
the two datasets, which seem to represent the pathophysiological
mechanisms. Another limitation of the study is that we have used a
genotyping chip with a relatively low SNP number (300K) since it was
one of the most powerful chips when the genotyping had been carried
out. Lastly, we had not obtained mRNA samples of the study participants
at the time of their sample collection, which would lead to a more
comprehensive analysis of the altered pathways in the studied patient
group.
Conclusions
Here, we provide an integrated pathway analysis of whole-genome SNP and
CSF proteome datasets obtained from the same MS patients, highlighting
the most prominent molecular pathways that may be altered in the
studied patient group. We confirmed the involvement of some of the
previously known players in MS pathophysiology and suggested possible
roles of some others. We conclude that this approach provides a better
placing of separate genomic and proteomic datasets into a biological
context and is useful for elucidating disease mechanisms.
Supplemental Information
Supplemental Information 1. SNPs included in the pathway analysis.
[227]peerj-09-11922-s001.xlsx^ (258.6KB, xlsx)
DOI: 10.7717/peerj.11922/supp-1
Supplemental Information 2. All pathways found by the combined pathway
enrichment analysis of the SNPs and proteins.
[228]peerj-09-11922-s002.xlsx^ (30KB, xlsx)
DOI: 10.7717/peerj.11922/supp-2
Supplemental Information 3. Representative pathways found by clustering
of the merged enriched pathways.
[229]peerj-09-11922-s003.xlsx^ (17.7KB, xlsx)
DOI: 10.7717/peerj.11922/supp-3
Supplemental Information 4. Raw genotype data obtained using
HumanCytoSNP-12 (Illumina).
[230]peerj-09-11922-s004.pdf^ (23.4MB, pdf)
DOI: 10.7717/peerj.11922/supp-4
Acknowledgments