Abstract

Background

   Multiple sclerosis (MS) has a complex pathophysiology, variable
   clinical presentation, and unpredictable prognosis; understanding the
   underlying mechanisms requires combinatorial approaches that warrant
   the integration of diverse molecular omics data.

Methods

   Here, we combined genomic and proteomic data of the same individuals
   among a Turkish MS patient group to search for biologically important
   networks. We previously identified differentially-expressed proteins by
   cerebrospinal fluid proteome analysis of 179 MS patients and 42 non-MS
   controls. Among this study group, 11 unrelated MS patients and 60
   independent, healthy controls were subjected to whole-genome SNP
   genotyping, and genome-wide associations were assessed. Pathway
   enrichment analyses of MS-associated SNPs and differentially-expressed
   proteins were conducted using the functional enrichment tool, PANOGA.

Results

   Nine shared pathways were detected between the genomic and proteomic
   datasets after merging and clustering the enriched pathways. Complement
   and coagulation cascade was the most significantly associated pathway
   (hsa04610, P = 6.96 × 10^−30). Other pathways involved in neurological
   or immunological mechanisms included adherens junctions (hsa04520,
   P = 6.64 × 10^−25), pathogenic Escherichia coli infection (hsa05130,
   P = 9.03 × 10^−14), prion diseases (hsa05020, P = 5.13 × 10^−13).

Conclusion

   We conclude that integrating multiple datasets of the same patients
   helps reducing false negative and positive results of genome-wide SNP
   associations and highlights the most prominent cellular players among
   the complex pathophysiological mechanisms.

   Keywords: Bioinformatics, Genomics, Proteomics, Multiple sclerosis

Introduction

   Multiple sclerosis (MS) is known to be an immune-mediated,
   neurodegenerative central nervous system (NS) disorder with complex
   inheritance and pathophysiological mechanisms. Although approximately
   common 250 genetic variants with low to modest risk effects have been
   associated with MS mainly by genome-wide association studies (GWAS)
   using rather large sample groups ([38]International Multiple Sclerosis
   Genetics Consortium, 2007; [39]Sawcer et al., 2011; [40]Patsopoulos et
   al., 2017; [41]International Multiple Sclerosis Genetics Consortium et
   al., 2019), known variants not only fail to explain predicted MS
   heritability but also cannot be efficiently translated into disease
   mechanisms. To date, numerous studies have also revealed potential
   diagnostic and prognostic biomarkers and disease-related cellular
   pathways that emphasize the different pathological components of the
   disease; however, the exact underlying mechanisms in disease
   development and progression are largely unknown. In order to better
   translate the growing number of findings into disease pathophysiology,
   algorithms for pathway analyses of multiple high-throughput omics data
   seem essential. In this context, the integration of multiple omics data
   is essential to better describe the complex nature of MS.

   In our previous study ([42]Avsar et al., 2015), we have conducted a
   cerebrospinal fluid (CSF) proteome analysis using 2D-gel
   electrophoresis and mass spectrophotometry and identified 151
   differentially expressed proteins between an MS cohort of 179 patients
   with different clinical MS phenotypes and 42 non-MS controls. Later,
   affected Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways were
   identified using the functional enrichment tool Pathway and Network-
   Oriented GWAS Analysis (PANOGA) ([43]Bakir-Gungor, Egemen & Sezerman,
   2014), revealing MS-related pathways including aldosterone-regulated
   sodium reabsorption pathway, renin- angiotensin system, notch signaling
   pathway, and vitamin digestion and absorption pathway. Here, we further
   explored disease-related pathways, applying single nucleotide
   polymorphism (SNP) genotyping on 11 MS patients, who were included in
   the previous proteomic study, and 60 independent, healthy individuals.
   Pathway enrichment analyses of the genomic and proteomic data were
   conducted. The two datasets were merged, highlighting the most
   prominent pathways that may be affected in the studied patient group.

Materials & Methods

Study groups

   All patients were diagnosed at Istanbul University-Cerrahpaşa,
   Cerrahpaşa School of Medicine, Department of Neurology, according to
   the McDonald criteria ([44]Polman et al., 2011). Eleven unrelated MS
   patients were randomly selected among the study group of our previous
   work comprising of 179 MS patients. The patient group had a
   heterogeneous disease presentation at the time of their CSF sample
   collection. Sixty independent age- and gender-matched intrinsic healthy
   controls were included in the analyses. All individuals in the study
   group are of Turkish origin. All procedures of the study were in
   accordance with the Helsinki Declaration of 1964 and its later
   amendments. The Ethics Committee of Istanbul Technical University
   approved the study (İTÜ-SM.İNAREK-MBG-1), and each individual of the
   study group gave a written informed consent form prior to the sample
   collection.

Genome-wide associations

   DNA isolation from blood samples was performed (Roche DNA Isolation Kit
   for Mammalian Blood), and genotyping for 300.000 SNP markers was
   applied for each individual on the Illumina HumanCytoSNP-12 array.
   Illumina GenomeStudio software was used for quality control. All
   individuals showed sample call rates of more than 98% (98.65–99.63%)
   and were therefore included in the study. Golden Helix SNP & Variation
   Suite software was used for identity-by-descent detection, suggesting
   no relatedness between the individuals. SNP filtering was performed:
   SNPs on the Y chromosome, with call rates lower than 95%, minor allele
   frequency lower than 0.01, and in strong linkage disequilibrium (r^2 >
   0.5) were excluded from the study. A total of 129.547 SNPs was included
   in the analysis. Frequency differences of SNPs between cases and
   controls were assessed, and genotypic association P-values were
   obtained.

Pathway enrichment analysis

   Genotypic associations and proteomic data were subjected to pathway
   enrichment analyses using the functional enrichment tool PANOGA, which
   reveals functionally relevant pathways by identifying genes within the
   pathways, incorporating protein-protein interaction (PPI) information,
   and extracting significant pathways. SNPs with genotypic association
   P-values lower than 0.05 and differentially-expressed proteins with
   P-values lower than 0.05 from our previous study were used for PANOGA
   procedure as follows: For SNP associations, because a SNP might affect
   more than one gene, each SNP was initially assigned to the gene on
   which the SNP has the most important functional effect, using the SPOT
   webserver ([45]Saccone et al., 2010). Functional information of SNPs
   was obtained utilizing SPOT, F-SNP ([46]Lee & Shatkay, 2007), SNPnexus
   ([47]Dayem Ullah, Lemoine & Chelala, 2012), and SNPinfo ([48]Xu &
   Taylor, 2009). Genes and proteins were then mapped onto a PPI network,
   for which Goh et al.’s human PPI network was used ([49]Goh et al.,
   2007). Next, the jActive Modules algorithm ([50]Ideker et al., 2002 was
   applied to identify active subnetworks containing a large number of the
   disease-affected genes and proteins in the PPI network. Each genotypic
   association and differential expression P-value was taken into account,
   and active subnetworks that overlap at most 50% with each other were
   extracted. In order to evaluate the biological importance of the
   subnetworks, the number of genes and proteins found in a specific KEGG
   pathway was compared to the total gene and protein number involved in
   the corresponding pathway. For this functional enrichment procedure, a
   two-sided hypergeometric test was used, and the Bonferroni method was
   applied for multiple testing corrections of the P-values. Significantly
   associated KEGG pathways to our patient group consisted of
   significantly enriched (P < 0.05) pathways for at least one of the
   active subnetworks. For the pathways that were enriched in multiple
   subnetworks, the one with the lowest P-value was reported. PANOGA was
   run 10 times for both genotypic associations and
   differentially-expressed proteins, and the lowest P-values over the 10
   iterations were reported. The resulting enriched pathways for both
   datasets were then merged. If a given pathway was identified in both
   analyses, the corresponding P-values were merged using Fisher’s
   combined probability test. If the pathway was identified in only one of
   the analyses, the corresponding P-value was used as the final P-value.

Clustering of enriched pathways

   Using a method previously described by [51]Chen et al. (2014) we
   clustered the enriched pathways to identify similar groups and
   establish representative pathways. The clustering approach can be
   summarized as follows: initially overlap index matrix (OI) that
   consists of the overlap indices between all of the pairs of enriched
   pathways was calculated. The overlap index (OI[i,j]) of a pair of
   pathways P[i] and P[j] (denoting the ith and jth pathways,
   respectively) was defined as:
   [MATH: <mstyle
   displaystyle="true"><mi>O</mi><msub><mrow><mi>I</mi></mrow><mrow><mi>i<
   /mi><mo>,</mo><mi>j</mi></mrow></msub><mo>=</mo><mfrac><mrow><mo>|</mo>
   <msub><mrow><mi>G</mi></mrow><mrow><mi>i</mi></mrow></msub><mo>∩</mo><m
   sub><mrow><mi>G</mi></mrow><mrow><mi>j</mi></mrow></msub><mo>|</mo></mr
   ow><mrow><mo class="qopname">min</mo><mrow><mfenced separators=""
   open="(" close=")"><mfenced separators="" open="|"
   close="|"><mrow><msub><mrow><mi>G</mi></mrow><mrow><mi>i</mi></mrow></m
   sub></mrow></mfenced><mo>,</mo><mo>|</mo><msub><mrow><mi>G</mi></mrow><
   mrow><mi>j</mi></mrow></msub><mo>|</mo></mfenced></mrow></mrow></mfrac>
   </mstyle> :MATH]

   where G[i] is the set of all genes in the ith pathway. In the overlap
   matrix, each row (o[i]) represents the gene overlap profile of a
   pathway (i.e.,  o[i] is the gene overlap profile of the ith pathway).
   To identify similarity between each pair of pathways Pearson
   correlation coefficients (R) between each pair of overlap profiles
   (e.g.,  o[i] and o[j]) were calculated. These correlation coefficients
   were then converted to pairwise distances (PD) as:
   [MATH: <mstyle
   displaystyle="true"><mi>P</mi><mi>D</mi><mo>=</mo><mn>1</mn><mo>−</mo><
   mi>R</mi><mo>.</mo></mstyle> :MATH]

   Using PD as the distance metric, hierarchical clustering
   (average-linkage) of the enriched pathways was performed. Examining the
   hierarchical clustering dendrogram, the dendrogram was partitioned into
   clusters at the manually selected PD cut-off value of 0.55. The pathway
   with the lowest P-value in each cluster was selected as the
   representative pathway for that cluster.

Results

SNP associations

   All 71 individuals and a total of 129.547 SNPs passed the quality check
   and therefore were included in the final analyses. The Manhattan plot
   summarizing the genome-wide associations is given in [52]Fig. 1. The
   most significantly associated SNP was rs7873, located on 3′ UTR of the
   IGF2 gene, whose significance level was the only value close to a
   classical stringent P-value cut-off for a GWAS (P = 4.39 × 10^−07).
   When considering SNPs with a significance level of lower than a looser
   P-value cut off (P < 10^−05) given the small sample size, 4 more SNPs
   have high associations with MS in the studied patient group, one of
   which is located on a gene: [53]rs17187282 (P = 1.17 × 10^−06),
   [54]rs11688088 (P = 2.8 × 10^−06), rs654188 (P = 5.24 × 10^−06), and
   [55]rs7092208 (P = 9.66 × 10^−06, intronic variant on the MGMT gene).

Figure 1. Genome-wide Manhattan plot showing –log[10] of the P-values of
129.547 SNPs (y-axis) against their genomic positions (x-axis) for
associations.

   [56]Figure 1
   [57]Open in a new tab

   The results are plotted left to right from the p-terminal ends of the
   chromosomes, which are shown in different colors. The blue line
   represents the P-value threshold (0.05).

   For enrichment analysis of the SNP associations, the P-value threshold
   was set lower than 0.05 ([58]Fig. 1, blue line) in order to prevent the
   elimination of potentially MS-associated SNPs with falsely high
   P-values due to the low sample size and limitations of the genome-wide
   association methodology. Later, elimination of the false-positive SNP
   associations resulted from preferring not to set a conventional
   stringent P-value was aimed to be achieved during the identification of
   MS-related pathways using SNP subnetworks, therefore ruling
   functionally irrelevant SNPs out. A total of 6.594 SNPs had P-values
   under the threshold of 0.05 ([59]Table S1), which were included in the
   subsequent analyses.

MS-related pathways

   The combination of enriched pathways for the resulting SNPs and
   differentially expressed proteins resulted in 151 enriched pathways in
   total ([60]Table S2). In order to identify pathways with similar
   content and function, this combined list of enriched pathways was
   clustered ([61]Fig. 2). The dendrogram was then manually partitioned
   into biologically relevant clusters, and representative pathways were
   established. In total, 33 clusters, therefore 33 representative
   pathways were obtained ([62]Table S3). Nine of the representative
   pathways emerged from both genomic and proteomic analyses, among which
   the complement and coagulation cascade (hsa04610) was the most
   significantly associated pathway in the studied group (P =
   6.96 × 10^−30). [63]Figure 3 shows the complement and coagulation
   cascades with genes identified through the genomic dataset and
   differentially expressed proteins identified through the proteomic
   dataset as a representative pathway for the relationships between the
   genomic and proteomic findings.

Figure 2. Clustering dendrogram of the enriched pathways.

   [64]Figure 2
   [65]Open in a new tab

   The dendrogram shows all 151 enriched pathways, which were grouped in
   clusters based on their relevance to each other. The red line shows the
   cut-off for biologically relevant clusters, which was cut at the
   pairwise distance of 0.55, resulting in 33 representative clusters.
   Thirty-three pathways with the lowest P-values in each cluster
   constituted the representative pathways.

Figure 3. Complement and coagulation cascades.

   [66]Figure 3
   [67]Open in a new tab

   The illustration of complement and coagulation cascades shows the
   identified genes by SNP associations (blue) and differentially
   expressed proteins (yellow) as a representative pathway for the merged
   data analysis. The pathway was adapted from KEGG: Kyoto Encyclopedia of
   Genes and Genomes ([68]Kanehisa & Goto, 2000; hsa04610, 2019) and
   created with BioRender.com.

   Regulation of actin cytoskeleton (hsa04810) and focal adhesions
   (hsa04510) were the two systems with the second and third lowest
   P-values even though there were no significant protein level changes
   related to these pathways (P = 9.64 × 10^−27 and P = 3.29 × 10^−26,
   respectively). Other 8 shared pathways with high significance levels
   are as follows: Adherens junctions (hsa04520, P = 5.38 × 10^−17),
   colorectal cancer (hsa05210, P = 1.70 × 10^−15), pathogenic Escherichia
   coli infection (hsa05130, P = 7.02 × 10^−9), prion diseases (hsa05020,
   P = 2.48 × 10^−5), endometrial cancer (hsa05213, P = 6.00 × 10^−9),
   non-small cell lung cancer (hsa05223, P = 1.40 × 10^−8), bladder cancer
   (hsa05219, P = 4.83 × 10^−6), and non-homologous end-joining (hsa03450,
   P = 4.27 × 10^−5). [69]Table 1 shows the detected SNP associations and
   differentially expressed proteins in the above-mentioned pathways.

Table 1. The most prominent MS-related pathways emerged from the analyses.

   KEGG terms and IDs SNP associations Differentially expressed proteins
   FinalP-value
   P-value Genes P-value
   Complement and coagulation cascades, hsa04610 1.14 × 10^−16 FGA
   [[70]rs2070018 ]; CR1 [[71]rs1571344 ]; F13A1 [[72]rs3901123;
   [73]rs34736558; [74]rs1267843; [75]rs1267914; [76]rs749005 ]; PLG
   [[77]rs9295131 ]; F5 [[78]rs12131397 ]; C6 [[79]rs2921184;
   [80]rs1801033 ]; PROC [[81]rs2069933; [82]rs1799808 ] 8.43 × 10^−16
   6.96 × 10^−30
   Adherens junction, hsa04520 5.38 × 10^−17 INSR [[83]rs10409516 ]; LMO7
   [[84]rs9573625 ]; WASL [[85]rs10270793 ]; EGFR [[86]rs6970262 ]; IGF1R
   [[87]rs2684788 ]; FER [[88]rs7715933; [89]rs7700630; [90]rs7713591 ];
   CTNNA3 [[91]rs10997250; [92]rs7076094; [93]rs10997316 ]; RAC1
   [[94]rs2347338 ]; PVRL2 [[95]rs519825 ] 2.03 × 10^−10 6.64 × 10^−25
   Colorectal cancer, hsa05210 1.70 × 10^−15 TCF7L1 [[96]rs6547608;
   [97]rs10195517 ]; TGFB2 [[98]rs1473527; [99]rs1417488 ]; AXIN1
   [[100]rs3916990 ]; MAPK9 [[101]rs6867398; [102]rs17080136;
   [103]rs11956696 ]; MAPK8 [[104]rs17780725 ]; RAC1 [[105]rs2347338 ]
   2.88 × 10^−9 2.68 × 10^−22
   Pathogenic Escherichia coli infection, hsa05130 7.02 × 10^−9 ROCK2
   [[106]rs6432187 ]; WASL [[107]rs10270793 ] 3.72 × 10^−7 9.03 × 10^−14
   Prion diseases, hsa05020 2.48 × 10^−5 NOTCH1 [[108]rs3812605;
   [109]rs3812609; [110]rs2229974 ]; NCAM2 [rs232398] 6.31 × 10^−10
   5.13 × 10^−13
   Endometrial cancer, hsa05213 6.00 × 10^−9 TCF7L1 [[111]rs6547608;
   [112]rs10195517 ]; AXIN1 [[113]rs3916990 ]; EGFR [[114]rs6970262 ];
   CTNNA3 [[115]rs10997250; [116]rs7076094; [117]rs10997316 ]; SOS2
   [[118]rs2227276 ] 5.77 × 10^−6 1.11 × 10^−12
   Non-small cell lung cancer, hsa05223 1.40 × 10^−8 ALK [[119]rs7591913;
   [120]rs7589120 ]; FHIT [[121]rs643629; [122]rs10510827;
   [123]rs10510852; [124]rs3772479; [125]rs6778312; [126]rs12633994;
   [127]rs13317933; [128]rs13078190 ]; SOS2 [[129]rs2227276 ]; EGFR [
   [130]rs6970262 ] 2.82 × 10^−6 1.26 × 10^−12
   Bladder cancer, hsa05219 4.83 × 10^−6 E2F3 [[131]rs12527393 ]
   8.47 × 10^−7 1.11 × 10^−10
   Non-homologous end-joining, hsa03450 4.27 × 10^−5 XRCC4 [[132]rs2662242
   ] 9.88 × 10^−6 9.53 × 10^−9
   Top 2 pathways emerged only from SNP associations
   Regulation of actin cytoskeleton, hsa04810 9.64 × 10^−27 CYFIP1
   [[133]rs7182576; [134]rs2120968; [135]rs17137192 ]; ROCK2
   [[136]rs6432187 ]; WASL [ [137]rs10270793 ]; EGFR [[138]rs6970262 ];
   MYLK [[139]rs4678060; [140]rs3911406; [141]rs2124508; rs702032]; GNA13
   [[142]rs12944877 ]; PAK7 [[143]rs6118687; [144]rs6118709 ]; RAC1
   [[145]rs2347338 ]; PAK2 [[146]rs6583176 ]; PAK4 [[147]rs11083505;
   [148]rs692364 ]; VAV3 [[149]rs11576720 ]; ITGA4 [[150]rs3770111 ];
   ACTN1 [[151]rs181484 ]; ITGA2 [[152]rs27504 ]; FN1 [[153]rs1404772 ];
   VAV2 [[154]rs7026263 ]; TIAM1 [[155]rs2833359; [156]rs845972;
   [157]rs9984499; [158]rs11700792 ]; ARHGEF4 [[159]rs2403238 ]; MYH9
   [[160]rs11703176 ]; CRK [[161]rs6502707 ]; DOCK1 [[162]rs10741150;
   [163]rs6482855; [164]rs10458718 ]; ARHGEF6 [[165]rs41312580 ]; ITGA9
   [rs149815; [166]rs1984311 ] – 9.64 × 10^−27
   Focal adhesion, hsa04510 3.29 × 10^−26 FIGF [[167]rs7877192;
   [168]rs6632528 ]; SHC3 [[169]rs1331189 ]; LAMA1 [[170]rs658121 ]; IGF1R
   [[171]rs2684788 ]; MAPK9 [[172]rs6867398; [173]rs17080136;
   [[174]rs11956696 ]; MAPK8 [[175]rs17780725 ]; KDR [[176]rs7692791 ];
   ITGB6 [[177]rs6730023 ]; RAC1 [[178]rs2347338 ]; PAK4 [[179]rs11083505;
   [180]rs692364 ]; VASP [[181]rs4803831 ]; VAV3 [[182]rs11576720 ]; CAV3
   [[183]rs11713611 ]; ITGA4 [[184]rs3770111 ]; ACTN1 [[185]rs181484 ];
   ITGA2 [[186]rs27504 ]; FN1 [[187]rs1404772 ]; MAPK10 [[188]rs1469869 ];
   ITGA10 [[189]rs12401622 ]; ITGA8 [[190]rs2039910; [191]rs1451668 ];
   RAPGEF1 [[192]rs2296950 ]; TLN1 [[193]rs2295794 ]; CRK [[194]rs6502707
   ]; DOCK1 [[195]rs10741150; [196]rs6482855; [197]rs10458718 ]; SOS2
   [[198]rs2227276 ]; ITGA9 [[199]rs149815; [200]rs1984311 ] –
   3.29 × 10^−26
   [201]Open in a new tab

   The most significantly MS-associated pathways obtained from the SNP
   associations and differentially expressed proteins did not overlap when
   the analyses were performed separately ([202]Table S2). The merged data
   analysis resulted in a combined list of pathways from both datasets,
   revealing a different set of pathways with the lowest P-values and
   emphasizing the most relevant pathways that could not have been
   prioritized by a straight-forward, non-combinatorial omics approach.
   The top five pathways that emerged from the proteomic-only,
   genomic-only, and merged dataset analyses are given in [203]Table 2.

Table 2. Top five pathways emerged from each dataset analysis.

   KEGG Terms and IDs P-value
   Top 5 pathways emerged from the proteomic-only data analysis
   Complement and coagulation cascades, hsa04610 8.43 × 10^−16
   Adherens junction, hsa04520 2.03 × 10^−10
   Prion diseases, hsa05020 6.31 × 10^−10
   Colorectal cancer, hsa05210 2.88 × 10^−09
   Pathogenic Escherichia coli infection, hsa05130 3.72 × 10^−07
   Top 5 pathways emerged from the genomic-only data analysis
   Regulation of actin cytoskeleton, hsa04810 9.64 × 10^−27
   Focal adhesion, hsa04510 3.29 × 10^−26
   ErbB signaling pathway, hsa04012 1.28 × 10^−23
   Rap1 signaling pathway, hsa04015 2.35 × 10^−20
   Axon guidance, hsa04360 6.36 × 10^−20
   Top 5 pathways emerged from the merged data analysis
   Complement and coagulation cascades, hsa04610 6.96 × 10^−30
   Regulation of actin cytoskeleton, hsa04810 9.64 × 10^−27
   Focal adhesion, hsa04510 3.29 × 10^−26
   Adherens junction, hsa04520 6.64 × 10^−25
   ErbB signaling pathway, hsa04012 1.28 × 10^−23
   [204]Open in a new tab

Discussion

   The complement and coagulation cascade (CCC) pathway emerged as the
   most significantly associated pathway to MS in our patient group, with
   many SNP associations on different genes and differentially expressed
   CSF proteins as shown in [205]Fig. 3. The complement and the
   coagulation systems are two closely linked cascades, both having roles
   in immunity. In a study by [206]Magliozzi et al. (2019), CSF proteomic
   profiles of MS patients with low or high cortical lesion load were
   compared, revealing that the identified differentially expressed
   proteins were mainly involved in the complement and coagulation
   cascade. Examination of white matter lesions of MS patients has
   revealed dysregulation of coagulation-associated proteins in chronic
   active plaques involving Serpin A5 and tissue factor ([207]Han et al.,
   2008). Also, significantly upregulated Serpin E1, another CCC
   component, was reported in the post-mortem cortex of progressive MS
   patients ([208]Yates et al., 2017). In our study, even though the genes
   and differentially expressed proteins involved in CCC do not overlap,
   the pathway was found 10 times for each dataset in PANOGA, resulting in
   a high association with MS. Our results confirm the previous findings
   suggesting the involvement of CCC alterations in MS pathophysiology and
   highlight other pathway components that may also be responsible for
   these alterations.

   Disruption of blood–brain barrier (BBB) integrity is one of the
   hallmarks in MS pathophysiology, during which massive leukocyte
   infiltration occurs across the damaged BBB into the CNS ([209]Alvarez,
   Cayrol & Prat, 2011) Tight and adherens junctions between the
   endothelial cells of BBB have significant importance for normal immune
   surveillance in the CNS. Tight junctions consist of claudins, occludin,
   junctional adhesion molecules, and Zonula Occludens ([210]Hawkins &
   Davis, 2005), whose dysfunctions leading to BBB abnormalities in MS
   have been previously reported ([211]Kirk et al., 2003; [212]Padden et
   al., 2007; [213]Plumb et al., 2002). Cadherins form adherens junctions
   via homophilic interactions between endothelial cells and bind
   cytoskeletal components through cytoplasmic catenin proteins
   ([214]Dejana, Orsenigo & Lampugnani, 2008). Although adherens junctions
   are required for the overall junctional organization to maintain the
   BBB integrity, their roles in normal physiology and diseased states
   have not been well-established. In a previous study, the level of an
   adherens junction protein, β-catenin, was similar in progressive MS and
   non-MS brain sections ([215]Padden et al., 2007). In our patient group,
   adherens junctions, which were not detected in our previous proteomic
   study, emerged as the second most significantly altered cellular
   components with SNP associations on different genes and differentially
   expressed proteins indirectly related to the pathway. Our findings
   indicate that adherens junctions may have more influence on BBB
   function than it has been thought and are needed to be explored in both
   normal physiology and MS pathophysiology in more detail.

   Focal adhesions and actin cytoskeleton regulation are two interrelated
   pathways that emerged with low final P-values from the analyses, even
   though no significant protein expression changes directly or indirectly
   related to these systems were detected. Organized clusters of focal
   adhesion complexes connect extracellular matrix through transmembrane
   integrin proteins to the intracellular actin cytoskeleton through other
   focal adhesion proteins in the complexes and mainly have roles in cell
   motility ([216]Wozniak et al., 2004). During leukocyte infiltration
   across the BBB, α4-integrin proteins on leukocytes form firm
   connections with the endothelial surface, initiating the process
   ([217]Berlin et al., 1995; [218]Vajkoczy, Laschinger & Engelhardt,
   2001). In this context, Natalizumab (Antegren, Elan Pharmaceuticals and
   Biogen), a monoclonal antibody against α4-integrin, has been used in MS
   treatment, which suppresses inflammatory activity by inhibiting
   leukocyte migration to the inflammation areas ([219]Polman et al.,
   2006). We detected many SNP associations on genes encoding for many
   focal adhesion proteins, including the α4-integrin-coding ITGA4 gene.
   Focal adhesion molecules and their combinations to create different
   complexes are diverse, and regulation differences of these interactions
   for specific cell behaviors are yet to be elucidated.

   In the studied MS patients, increased CSF nucleolin expression and SNP
   associations on WASL and ROCK2 genes were detected, all of which are
   components of the Pathogenic Escherichia coli infection pathway. Neural
   Wiskott-Aldrich syndrome protein encoded by WASL and Rho-associated
   protein kinase 2 encoded by ROCK2 are both regulators of the actin
   cytoskeleton ([220]Miki, Miura & Takenawa, 1996; [221]Gallo et al.,
   2012), and the involvement of the pathway may indicate induction of
   inflammatory cell mobility. Prion diseases are fatal conditions
   presenting neuronal degeneration, and the emergence of the pathway from
   our analyses emphasizes some shared molecular alterations with MS
   pathophysiology, as both neurodegeneration and neuroinflammation also
   occur in prion diseases ([222]Perry, Cunningham & Boche, 2002;
   [223]Eikelenboom et al., 2002). We have also detected a number of
   cancer pathways involving both SNP associations and differential CSF
   expressions. Cancer risk among MS patients has been investigated in a
   number of studies with diverse results, revealing unchanged
   ([224]Nielsen et al., 2006), reduced ([225]Bahmanyar et al., 2009), and
   increased ([226]Grytten et al., 2020) cancer rates. Altered cancer
   pathways in our study group may be attributed to changes in
   immunological mechanisms involving the same components as in mechanisms
   against anti-tumor surveillance. Other pathways that emerged only from
   the SNP associations included a number of immunological and
   neurological mechanisms. T cell receptor signaling pathway (hsa04660, P
   = 1.86 × 10^−19), chemokine signaling pathway (hsa04062, P =
   4.38 × 10^−19), and Fc-gamma R-mediated phagocytosis (hsa04666, P =
   8.78 × 10^−16) were the immune system-related pathways. Nervous
   system-related pathways included axon guidance (hsa04360, P =
   6.36 × 10^−20), retrograde endocannabinoid signaling (hsa04723, P =
   6.74 × 10^−18), neurotrophin signaling pathway (hsa04722, P = 8.03 ×
   10^−18), glutamatergic synapse (hsa04724, P = 1.74 × 10^−15),
   cholinergic synapse (hsa04725, P = 7.79 × 10^−15), and dopaminergic
   synapse (hsa04728, P = 5.83 × 10^−12). Also, the Rap1 signaling pathway
   (hsa04015, P = 2.35 × 10^−20) was detected, further supporting the
   involvement of adherens junction, focal adhesion, and regulation of
   actin cytoskeleton pathways.

   Working on small sample sizes for genome-wide associations leads to
   enormous numbers of false-positive SNP associations. Using large sample
   sizes, on the other hand, may result in false-negative results due to
   the low effect sizes of disease-associated SNPs. One limitation of our
   study is the considerably low sample size in the assessment of SNP
   associations. In this regard, we aimed to take advantage of having both
   DNA and CSF samples of the same affected individuals. Therefore, we
   aimed to include both false-positive SNP associations and prevent
   missing out SNPs with falsely high P-values in the first place by
   setting a non-stringent P-value cut-off. We then aimed to exclude
   false-positive associations during the pathway enrichment analyses in
   which interacting SNP subnetworks were used to identify the pathways,
   eliminating functionally irrelevant SNPs. Integrative analyses of the
   genomic and proteomic datasets determined the shared pathways between
   the two datasets, which seem to represent the pathophysiological
   mechanisms. Another limitation of the study is that we have used a
   genotyping chip with a relatively low SNP number (300K) since it was
   one of the most powerful chips when the genotyping had been carried
   out. Lastly, we had not obtained mRNA samples of the study participants
   at the time of their sample collection, which would lead to a more
   comprehensive analysis of the altered pathways in the studied patient
   group.

Conclusions

   Here, we provide an integrated pathway analysis of whole-genome SNP and
   CSF proteome datasets obtained from the same MS patients, highlighting
   the most prominent molecular pathways that may be altered in the
   studied patient group. We confirmed the involvement of some of the
   previously known players in MS pathophysiology and suggested possible
   roles of some others. We conclude that this approach provides a better
   placing of separate genomic and proteomic datasets into a biological
   context and is useful for elucidating disease mechanisms.

Supplemental Information

   Supplemental Information 1. SNPs included in the pathway analysis.
   [227]peerj-09-11922-s001.xlsx^ (258.6KB, xlsx)
   DOI: 10.7717/peerj.11922/supp-1
   Supplemental Information 2. All pathways found by the combined pathway
   enrichment analysis of the SNPs and proteins.
   [228]peerj-09-11922-s002.xlsx^ (30KB, xlsx)
   DOI: 10.7717/peerj.11922/supp-2
   Supplemental Information 3. Representative pathways found by clustering
   of the merged enriched pathways.
   [229]peerj-09-11922-s003.xlsx^ (17.7KB, xlsx)
   DOI: 10.7717/peerj.11922/supp-3
   Supplemental Information 4. Raw genotype data obtained using
   HumanCytoSNP-12 (Illumina).
   [230]peerj-09-11922-s004.pdf^ (23.4MB, pdf)
   DOI: 10.7717/peerj.11922/supp-4

Acknowledgments