Abstract

   Sensitive and reliable protein biomarkers are needed to predict disease
   trajectory and personalize treatment strategies for multiple sclerosis
   (MS). Here, we use the highly sensitive proximity-extension assay
   combined with next-generation sequencing (Olink Explore) to quantify
   1463 proteins in cerebrospinal fluid (CSF) and plasma from 143 people
   with early-stage MS and 43 healthy controls. With longitudinally
   followed discovery and replication cohorts, we identify CSF proteins
   that consistently predicted both short- and long-term disease
   progression. Lower levels of neurofilament light chain (NfL) in CSF are
   superior to all other tested proteins in predicting the absence of
   disease activity two years after sampling (replication AUC = 0.77).
   Importantly, we also identify a combination of 11 CSF
   proteins (CXCL13, LTA, FCN2, ICAM3, LY9, SLAMF7, TYMP, CHI3L1, FYB1,
   TNFRSF1B and NfL) that predicts the severity of disability worsening
   according to the normalized age-related MS severity score (replication
   AUC = 0.90). The identification of these proteins may help elucidate
   pathogenetic processes and might aid decisions on treatment strategies
   for persons with MS.

   Subject terms: Multiple sclerosis, Machine learning, Prognostic
   markers, Diagnostic markers

   Precise biomarkers for multiple sclerosis prognosis are vital for
   treatment decisions. Here, the authors identify specific proteins in
   cerebrospinal fluid that can predict short-term disease activity and
   long-term disability outcomes in persons with multiple sclerosis.

Introduction

   Achieving personalized multiple sclerosis (MS) treatment strategies
   requires more refined data than evaluation of relapse rate, disease
   progression, and measurements of magnetic resonance imaging (MRI)
   activity in early disease stages^[52]1. The comprehensive investigation
   of MS biomarkers, including their validation on a completely new
   cohort, remains exceptionally rare. A recent meta-analysis showed
   that fewer than 8% of all studies adopted this stringent
   methodology to establish the robustness and generalizability
   of their MS models^[53]2. To identify new MS biomarkers, extensive
   discovery approaches are required, such as large-scale proteomics^[54]3,
   which has shown significant potential in investigating cerebrospinal
   fluid (CSF) to elucidate various aspects of the disease^[55]4. The
   proximity extension assay (PEA), recently combined with next-generation
   sequencing (PEA-NGS or Olink Explore), allows for large-scale
   investigation of almost 1500 proteins in a small volume with high
   sensitivity and accuracy^[56]5–[57]7. This technology has provided
   opportunities for identifying protein biomarkers^[58]5,[59]8–[60]10
   that are otherwise difficult to detect due to their low abundance in
   body fluids.

   MS is a chronic inflammatory and degenerative disease of the central
   nervous system (CNS), causing inflammation, demyelination and
   neuroaxonal damage^[61]11. Early initiation of treatment, particularly
   with high-efficacy therapies, has been associated with better clinical
   outcomes and can delay neurological disability
   progression^[62]12–[63]15. On the other hand, unnecessary treatment
   must be avoided^[64]16. Since early treatment affects long-term
   disability outcome, it is likely that disease-associated pathways
   leading to demyelination and neuroaxonal damage are present already at
   early stages of the disease. This, in turn, would allow for the
   discovery of early biomarkers able to predict subsequent disease
   progression and to provide optimal treatment strategies for each
   person.

   Immunological and neurological disease processes can impact the
   composition of circulating body fluids^[65]9. As a result, changes in
   protein levels in blood and CSF can be used as biomarkers for disease
   recognition and disease activity^[66]10,[67]17–[68]19. Most protein
   biomarkers of relevance in MS have been identified in CSF^[69]20, while
   only a few candidates have been identified in plasma^[70]10. Since
   blood samples are much easier to collect than CSF and can be collected
   repeatedly, plasma is a more attractive option
   for biomarker discovery. However, potential biomarker proteins are
   generally less abundant in plasma than in CSF^[71]21. Furthermore, it
   remains unclear how well protein levels in plasma reflect
   disease-relevant processes taking place in the CNS, and in general,
   plasma and CSF protein levels do not correlate^[72]22.

   In this study (see overview in Fig. [73]1), we use the highly sensitive
   and specific PEA-NGS technology to measure the expression of 1463
   proteins in paired CSF and plasma samples from two well-defined cohorts
   of persons with MS (pwMS) in the early stages and healthy controls
   (HC). We identify a set of differentially expressed MS-relevant
   proteins and test their ability to predict, either individually or in
   combination, short-term disease activity and long-term confirmed
   disability worsening.

Fig. 1. Overview of the study.

   a Prospective longitudinal study of two Swedish cohorts of persons with
   MS (pwMS) in the early stages and healthy controls (HC). b Proteomics
   profiling of cerebrospinal fluid (CSF) and plasma samples of all pwMS
   and HC at baseline. c Clinical examination of pwMS during a follow-up
   of up to 13 years. d Differential expression analysis, performed with a
   two-sided linear model t-test (Limma analysis), to find MS biomarker
   candidates. e Building machine learning models for identification of
   protein MS biomarkers for diagnosis (logistic regression model),
   prediction of short-term disease activity (logistic regression model),
   and prediction of long-term disability worsening (linear regression
   model).

Results

Proteins in CSF were differentially expressed in MS versus HC in two
independent cohorts

   We analyzed protein expression levels of 1463 proteins in both CSF and
   plasma samples from 143 pwMS in early stages of the disease and 43 HC.
   The participants were divided into a discovery cohort (92 pwMS and 23 HC from
   Linköping University Hospital) and a replication cohort (51 pwMS and 20
   HC from Karolinska University Hospital; Table [76]1). Plasma samples
   from 21 pwMS in the replication cohort had higher expression of several
   protein markers known to be affected by sampling and handling
   variability^[77]23 and were therefore excluded from further analysis
   (see Supplementary Fig. [78]1 and Supplementary Fig. [79]2). Using a
   linear model t-test (Limma analysis), we first tested whether proteins
   were differentially expressed between pwMS who were in relapse or not,
   who were on treatment or not within 3 months before baseline sampling,
   or according to disease duration at baseline sampling. No
   differentially expressed proteins
   (DEPs) between these groups were found (false discovery rate
   (FDR) < 0.05; see “Methods”). Therefore, all pwMS were included in the
   following analyses.
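
   The FDR criterion used above can be illustrated with a minimal sketch
   of a p-value adjustment. It assumes the Benjamini-Hochberg procedure
   (the default in Limma's topTable, but not stated in this section); the
   function name and the example p-values are hypothetical and are not
   data from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-protein p-values from a differential expression test.
example_p = [0.01, 0.04, 0.03, 0.005]
example_q = benjamini_hochberg(example_p)
# A protein would be called a DEP when its adjusted p-value is below 0.05.
deps = [i for i, q in enumerate(example_q) if q < 0.05]
```

   With 1463 proteins tested per fluid, such an adjustment is what makes
   "no DEPs found" a statement about the whole panel rather than about
   single nominal p-values.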

Table 1.

   Baseline characteristics of persons with MS (pwMS) and healthy controls
   (HC)
                                                Discovery cohort               Replication cohort               p-value*
   Characteristic              Statistic        MS             HC              MS               HC              (Discovery vs. Replication)
   Cohort size                 n                92             23              51               20              NA
   Sex**                       F/M              67/25          18/5            39/12            10/10           0.69
   Age** (years)               Median (range)   31 (16–64)     32 (22–64)      32 (18–54)       30 (22–47)      0.80
   CSF data**
   CSF cell count              Median (range)   4.7 (0–125)    2.1 (0.3–4.6)   6 (0–32)^a       0 (0–2.0)^b     0.38
   Albumin ratio               Median (range)   4.0 (1.5–9.1)  4.9 (2.1–7.0)   4 (2.2–10)       4 (2.7–12)      0.60
   IgG index                   Median (range)   0.8 (0.4–2.7)  0.5 (0.4–0.5)   0.8 (0.4–3.2)^c  0.4 (0.3–0.5)   0.52
   Oligoclonal CSF IgG bands   Yes/No           86/6           0/23            44/2^d           0/18^d          0.72

   CSF cerebrospinal fluid.

   ^a n = 46 due to missing data; ^b n = 19 due to missing data; ^c n = 48
   due to missing data; ^d missing data exists.

   *Two-sided Fisher’s exact test was used for contingency tables or
   two-sided Mann–Whitney U test for continuous values.

   **Sex was not significantly different between pwMS and HC in the
   discovery cohort but it was significantly different in the replication
   cohort (p = 0.04). Age was not significantly different between pwMS and
   HC in either discovery or replication cohort. CSF cell count, IgG
   index, and oligoclonal CSF IgG bands were significantly different
   between pwMS and HC in both discovery and replication cohorts
   (p < 0.01).

   Next, we compared the protein expression in CSF between all pwMS and
   the HC and found a clear separation by principal component analysis
   (Fig. [81]2a). A Limma analysis identified 52 DEPs in the discovery
   cohort whereof 40 were also nominally differentially expressed
   (p < 0.05) in the replication cohort (Fig. [82]2b, c; see Supplementary
   Data [83]1; see Supplementary Fig. [84]3). Furthermore, in the
   replication cohort, 25 proteins were independently differentially
   expressed, whereof 23 proteins overlapped with the discovery cohort
   (Fig. [85]2c). Interestingly, levels of all the 52 DEPs in the
   discovery cohort and the 23 overlapping proteins in the replication
   cohort were higher in pwMS compared with controls. To investigate the
   MS relevance of the DEPs we performed enrichment analyses using three
   different sets of MS genes and proteins. We found highly significant
   enrichment (Fig. [86]2c) for genes from the DisGeNET database^[87]24,
   GWAS genes^[88]25, and known potential MS biomarkers (see Supplementary
   Table [89]1). For example, 65% of the 52 DEPs in the discovery cohort
   (Fisher’s exact test, p = 7∗10^−8) and 60% of the 25 DEPs in the
   replication cohort (p = 0.002) were associated with MS in the DisGeNET
   database. However, some previously suggested MS markers (including
   C1QA, CCL2, CXCL1, GFAP, HGF, and OPN) had non-significant log2 fold
   change (FC; range −0.25 to 0.34) when comparing pwMS to HC (see Supplementary
   Fig. [90]4). In contrast to CSF, protein profiling of plasma did not
   reveal any significant differences in protein expression after FDR in
   pwMS compared with HC (Fig. [91]2a, b). A few of the DEPs in CSF were
   also nominally differentially expressed in plasma, but with no overlap
   between discovery and replication cohorts (see Supplementary
   Fig. [92]5; see Supplementary Data [93]1). In addition, correlations
   between CSF and plasma levels of the 52 CSF DEPs were generally low
   in the discovery cohort, with the strongest correlations
   obtained for NfL (Pearson’s correlation coefficient (PCC) = 0.46) and
   IL-18 (PCC = 0.33) (see Supplementary Fig. [94]6).
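
   The enrichment tests reported above are two-sided Fisher's exact tests
   on 2x2 tables (DEP or not, by member of an MS gene list or not). The
   following is a minimal sketch of such a test; the function name is
   hypothetical, and apart from the 34 of 52 DEPs (65%) stated in the
   text, the example counts are invented because the size of each
   background gene list is not given in this section.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns (sample odds ratio, p-value). The p-value sums the probabilities
    of all tables with the same margins that are no more likely than the
    observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest, highest = max(0, col1 - row2), min(row1, col1)
    p_observed = prob(a)
    p_value = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)  # tolerance for float ties
    )
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    return odds_ratio, min(1.0, p_value)


# 34 of 52 DEPs listed in the gene set (from the text); the second row,
# 400 listed and 1011 unlisted non-DEPs, is a hypothetical background.
odds_ratio, p_value = fisher_exact_two_sided(34, 18, 400, 1011)
```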

Fig. 2. Differential expression analysis of persons with MS (pwMS) compared
to healthy controls (HC) in cerebrospinal fluid (CSF) and plasma.

   a Principal component (PC) analysis of all proteins measured in the CSF
   samples (left) and plasma samples (right). b Volcano plots showing
   differentially expressed proteins (DEPs) in CSF (left) and plasma
   (right). The top upregulated proteins, which overlapped in discovery
   cohort and replication cohort, are marked with protein names in the
   plots. The differential expression analysis was performed using a
   two-sided linear model t-test (Limma analysis). c DEPs (false discovery
   rate < 0.05) in the CSF, in either the discovery cohort or the
   replication cohort. The first two columns show the log2 fold change
   (FC) of the DEPs in each cohort. Most proteins are upregulated (red)
   and 23 proteins overlap in discovery and replication cohorts. The
   three columns to the right mark which proteins appear in three
   different lists of known MS-associated genes and proteins (DisGeNET
   database, GWAS genes, and MS biomarkers), with the odds ratio of the
   enrichment shown at the top (two-sided Fisher’s exact test). The DEPs
   were significantly enriched for MS-associated genes from DisGeNET
   (discovery: p = 7∗10^−8, replication: p = 0.002), GWAS (discovery:
   p = 1∗10^−7, replication: p = 2∗10^−4), and known MS biomarkers
   (discovery: p = 1∗10^−12, replication: p = 2∗10^−6).

   In summary, the 52 CSF proteins identified in the larger discovery
   cohort represent a set of proteins that are dysregulated in early stages
   of MS, suggesting their importance in MS pathogenesis. The fact that
   these proteins were enriched for MS-relevant genes makes them strong
   biomarker candidates, and they were therefore used in the following
   prediction models.

B-cell activation markers can discriminate between MS and HC

   In order to test the diagnostic potential of the 52 DEPs in CSF from
   the discovery cohort, we created univariate logistic regression models
   for each of the proteins as well as a stepwise selection model (see
   “Methods”). To assess the predictive power of our models fairly,
   we allowed no refitting of any model parameters in the
   replication cohort; we therefore expect the replication area under the
   receiver operating characteristic curve (AUC) to be a good estimate
   of the model test performance. In the model selection, age and sex were
   included as possible predictors. The highest AUC was obtained with a
   model containing MZB1 and TNF, which could predict the presence of disease
   with AUC = 0.99 (p = 2∗10^−13) in the discovery cohort and AUC = 0.87
   (p = 6∗10^−7) in the replication cohort. Not surprisingly, in the
   univariate logistic regression models, AUC of the discovery cohort was
   high in all cases, but encouragingly most proteins also had high
   replication AUCs (Fig. [97]3a; see Supplementary Table [98]2). The top
   five proteins for prediction of diagnosis were MZB1, CD79B, CD27,
   TNFRSF13B, and IL-12p40 as ordered by AUC in the discovery cohort
   (Fig. [99]3a), where MZB1 had similar performance as the stepwise
   selection model containing MZB1 and TNF. These five proteins were
   reliably expressed above the limit of detection (LOD) in more than 95%
   of samples from pwMS and HC (see Supplementary Fig. [100]10). Finally,
   we investigated the discriminative power of plasma proteins by
   applying the same logistic regression formulas that were trained on the
   CSF data of the discovery cohort to the plasma data of
   both cohorts. The levels of two of these proteins, FCN2 and
   IL-1RA, could discriminate pwMS from HC (AUC = 0.71 for FCN2 and
   AUC = 0.65 for IL-1RA) in the discovery cohort but not in the plasma
   data of the replication cohort. Taken together, several CSF proteins
   (MZB1, CD79B, CD27, TNFRSF13B, and IL-12p40) showed a strong ability to
   discriminate pwMS from HC, whereof the proteins MZB1, CD79B, CD27, and
   TNFRSF13B are related to B-cell activation.
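
   The evaluation scheme described above (fit on the discovery cohort,
   freeze the parameters, score the replication cohort) can be sketched
   as follows. The AUC is computed through its equivalence with the
   Mann–Whitney U statistic. The coefficients and protein levels are
   hypothetical placeholders, not values from the study, and the function
   names are illustrative only.

```python
import math


def predict_probability(x, intercept, coefficient):
    """Univariate logistic model: probability of the positive class given x."""
    return 1.0 / (1.0 + math.exp(-(intercept + coefficient * x)))


def auc(pos_scores, neg_scores):
    """AUC as U / (n_pos * n_neg): the fraction of positive/negative pairs
    in which the positive case scores higher, with ties counted as one half."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical coefficients, fitted once on the discovery cohort and frozen.
INTERCEPT, COEFFICIENT = -1.0, 2.0

# Hypothetical replication-cohort protein levels (NPX) for pwMS and HC.
replication_ms = [1.2, 0.8, 2.1, 0.3]
replication_hc = [0.1, 0.5, -0.4]

ms_scores = [predict_probability(x, INTERCEPT, COEFFICIENT) for x in replication_ms]
hc_scores = [predict_probability(x, INTERCEPT, COEFFICIENT) for x in replication_hc]
replication_auc = auc(ms_scores, hc_scores)
```

   Because nothing is refitted on the replication data, the replication
   AUC in this setup behaves like a held-out test estimate.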

Fig. 3. Performance of the top cerebrospinal fluid (CSF) proteins for
predicting diagnosis and disease activity over 2 years.

   Predictive power, assessed by area under the curve (AUC), of the most
   significant CSF proteins in the discovery cohort in differentiating
   between a persons with MS (pwMS; n = 92 samples in the discovery and
   n = 51 samples in the replication cohort) and healthy controls (HC;
   n = 23 samples in the discovery and n = 20 samples in the replication
   cohort) and b pwMS showing evidence of disease activity after 2 years
   (n = 48 samples in discovery and n = 45 samples in replication cohort)
   and pwMS not showing evidence of disease activity after 2 years (n = 30
   samples in discovery and n = 5 samples in replication cohort). A
   logistic regression model was used to assess the predictive power of
   both individual proteins (the top 5 proteins in the discovery cohort
   are shown) and a combination of proteins, selected with a stepwise
   method, trained on the discovery cohort and independently validated on
   the replication cohort. The significance of the AUC scores was
   assessed with a two-sided Mann–Whitney U test. The p-values for the AUC
   scores of the diagnosis models in the order (stepwise model, MZB1,
   CD79B, CD27, TNFRSF13B, IL-12p40) were (2∗10^−13, 4∗10^−13, 1∗10^−12,
   3∗10^−12, 6∗10^−12, 6∗10^−11) for the discovery cohort and (6∗10^−7,
   4∗10^−7, 2∗10^−5, 10∗10^−7, 1∗10^−7, 2∗10^−8) for the replication
   cohort. The p-values for the AUC scores of the disease activity models
   in the order (stepwise model, NfL, IL-1RA, FASLG, CCL3, CD6) were
   (1∗10^−8, 9∗10^−5, 0.002, 0.003, 0.004, 0.004) for the discovery cohort
   and (0.19, 0.02, 0.02, 0.14, 0.03, 0.41) for the replication cohort.

NfL is superior in predicting disease activity over 2 years

   Next, we aimed to create a robust model for predicting the future
   short-term (2-year) disease activity using the NEDA-3 concept. NEDA-3
   is a binary variable based on no evidence or evidence of disease
   activity, as determined by reported clinical relapses, new or enlarged
   MRI brain lesions, or worsening in the Expanded Disability Status Scale
   (EDSS; see “Methods”)^[103]26. We found that 39% of pwMS in the
   discovery cohort and 10% of pwMS in the replication cohort were
   classified as having no evidence of disease activity (NEDA) during 2
   years of follow-up; the remaining pwMS were classified as having evidence
   of disease activity (EDA). We then performed a Limma analysis of NEDA
   versus EDA groups but found no DEPs in the discovery cohort. Instead,
   we based the model on the 52 proteins that were differentially
   expressed in pwMS versus HC (in the discovery cohort) since these
   proteins were considered highly relevant to MS based on the enrichment
   of MS genes (see above). We used a similar approach as for prediction
   of MS diagnosis (see above) and trained a logistic regression model for
   each of the 52 proteins (Supplementary Table [104]3) and a stepwise
   selection model including the 52 proteins, age, and sex as the input
   predictors (Fig. [105]3b). The best separating model was based on NfL
   levels in CSF and had an AUC = 0.75 (p = 9∗10^−5) in the discovery
   cohort and an AUC = 0.77 (p = 0.02) in the replication cohort. In
   addition, IL-1RA and CCL3 showed predictive power for disease
   activity, although inferior to NfL, when considering results from both
   the discovery and the replication cohort (Fig. [106]3b). A stepwise
   selection model (combination of NfL, IL-18, PDCD1, and CD6) showed good
   discrimination in the discovery cohort (AUC = 0.85) but not as good as
   NfL alone in the replication cohort (AUC = 0.63). In plasma we found no
   proteins to be of significant value to predict disease activity in
   either of our cohorts. Age and sex were not selected as significant
   predictors in any of the models. To evaluate the potential effect of
   treatment, a treatment duration index covering duration and drug
   efficacy (first-line treatment with less effective drugs versus
   second-line treatment with more effective drugs) during the total
   observation time was calculated (see “Methods”) and added to the
   models. Importantly, pwMS with EDA had in general a higher treatment
   duration index than pwMS with NEDA (p = 0.02 in the discovery cohort
   and p = 0.04 in the replication cohort, one-sided Mann–Whitney U test).
   Adding treatment duration index improved the predictive power of the
   best performing model containing only NfL (AUC = 0.77 in the discovery
   cohort and AUC = 0.82 in the replication cohort) but showed no
   significant effect on the other predictive models. The limited effect
   of the treatment duration index on model performance could partly
   be explained by its positive correlation with
   the expression of 34 of the 52 DEPs in the discovery cohort, although
   the expression of only one of these proteins (CCL3) was also
   significantly correlated with the treatment duration index in the
   replication cohort (see Supplementary Fig. [107]7). Collectively, our
   findings demonstrate NfL to be the superior protein for predicting
   disease activity over 2 years. In addition, NfL is a very reliable
   marker which is expressed above the LOD in all samples from pwMS.

   To facilitate the use of NfL on its own in future studies, we
   calculated the optimal prediction cut-off in the NfL model, and
   corresponding NPX level, which resulted in the maximum accuracy (see
   “Methods”). We found that the optimal prediction cut-off was a
   probability of 0.45 (accuracy = 0.71), which corresponded to an NPX
   level of 1.14. Using the same NPX threshold in the replication cohort
   resulted in an accuracy of 0.62. To translate NPX to pg/ml, we used a
   fraction of our data (n = 38) from which the NfL levels were known
   based on previous measurement by Simoa^[108]27–[109]29. The NPX and
   pg/ml measurements were highly correlated (Spearman’s correlation
   coefficient (SCC) = 0.97), and the NPX threshold of 1.14 corresponded
   to 737 pg/ml (see “Methods”).
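
   The two steps described above, choosing the probability cut-off that
   maximizes accuracy and mapping it back to a protein level, can be
   sketched for a univariate logistic model. The fitted NfL coefficients
   are not reported in this section, so the intercept and slope below are
   hypothetical, as are the function names and example data.

```python
import math


def npx_at_probability(p_cut, intercept, coefficient):
    """Invert p = 1 / (1 + exp(-(intercept + coefficient * x))) for x."""
    logit = math.log(p_cut / (1.0 - p_cut))
    return (logit - intercept) / coefficient


def best_accuracy_cutoff(probabilities, labels):
    """Return (cutoff, accuracy) with maximal accuracy.

    A case is predicted positive when its probability is >= cutoff. Only
    observed probabilities are tried as candidate cut-offs; on ties the
    smallest such cut-off is kept.
    """
    best = (None, -1.0)
    for cutoff in sorted(set(probabilities)):
        correct = sum((p >= cutoff) == bool(y) for p, y in zip(probabilities, labels))
        accuracy = correct / len(labels)
        if accuracy > best[1]:
            best = (cutoff, accuracy)
    return best


# Hypothetical predicted probabilities and true binary outcomes.
probs = [0.1, 0.4, 0.6, 0.9]
outcomes = [0, 0, 1, 1]
cutoff, accuracy = best_accuracy_cutoff(probs, outcomes)

# Hypothetical model coefficients; the study's fitted values would go here.
npx_threshold = npx_at_probability(cutoff, intercept=-2.0, coefficient=2.0)
```

   Applying the resulting NPX threshold unchanged to a second cohort, as
   done in the text, then gives an accuracy that is not tuned to that
   cohort.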

A combination of 11 proteins accurately predicts disability worsening

   Whereas the NEDA-3 concept reflects the short-term disease activity
   mainly by detecting relapses and MRI activity, the long-term disability
   progression is more relevant from the perspective of a person with MS
   since it directly affects the quality of life^[110]30. The EDSS is the
   most widely used measure of disability status; to adjust for age, the
   age-related MS severity score (ARMSS) was created^[111]31. To further adjust for
   length of observation time and allow for using data from different
   lengths of follow-up time, we used the recently described normalized
   ARMSS (nARMSS; see “Methods”). To obtain an nARMSS score, a person had
   to have had at least two documented EDSS scores over a period of at
   least 3 years. The resulting cohorts used for predictions consisted of
   71 pwMS in the discovery cohort and 33 pwMS in the replication cohort.
   In Fig. [112]4, each person’s EDSS scores for each follow-up year and
   the resulting nARMSS score are shown and described in further detail in
   Supplementary Fig. [113]8. The nARMSS score can take values between
   −5 and +5, where a score of 0 represents the average disability
   worsening of pwMS based on historical cohorts (n = 25,558)^[114]31.
   Both the discovery and replication cohorts showed an overrepresentation
   of pwMS with a less severe disability worsening, with 50% of the pwMS
   having a score below −3.0 in the discovery cohort and below −2.0 in the
   replication cohort (see Supplementary Fig. [115]9). The nARMSS scores
   had a significantly stronger correlation with the last ARMSS score (age
   adjusted EDSS) compared to the first ARMSS score, used for calculating
   nARMSS, for both the discovery cohort (SCC = 0.89 compared to
   SCC = 0.71, p = 0.003) and the replication cohort (SCC = 0.92 compared
   to SCC = 0.79, p = 0.03). We first tested if short-term disease
   activity (based on 2-year NEDA-3) was associated with nARMSS but found
   no significant difference in nARMSS when comparing EDA (n = 43) with
   NEDA (n = 27; medians were −2.90 and −3.36, respectively, two-sided
   Mann–Whitney U test, p = 0.15). We also found that age
   at baseline and subsequent treatment (treatment duration index)
   correlated with nARMSS (SCC = 0.38, p = 0.001 and
   SCC = 0.28, p = 0.02, respectively), which led us to include
   them as possible covariates in the downstream models.

Fig. 4. Overview of the Expanded Disability Status Scale (EDSS) scores during
yearly follow-up for persons with MS (pwMS).

   The disability worsening scores for pwMS, who had at least two EDSS
   scores over a period of more than 3 years. Each column corresponds to
   one person. The top heatmap shows the EDSS scores for each follow-up
   year (0–13 years), followed by the age of each person. White cells
   indicate that no EDSS score was available for that year. Thereafter
   follows the normalized age-related MS severity score (nARMSS), calculated from a
   person’s EDSS score and age. In the row underneath the nARMSS score it
   is marked if a person’s nARMSS score is below the thresholds
   nARMSS < −4 or nARMSS < −3, or above the threshold nARMSS > −1. White
   cells indicate that the nARMSS score is not covered by any of these
   three thresholds. The last two rows show the predicted nARMSS score
   obtained from the suggested cerebrospinal fluid (CSF) model combining
   11 proteins (first row) and if the predicted nARMSS score is covered by
   any of the three thresholds mentioned above (second row).

   To create a predictive model of nARMSS, we first performed a Limma
   analysis of the 1463 proteins based on the nARMSS score, but no DEPs
   were identified. Therefore, we again started from the 52 DEPs in CSF of
   pwMS compared to HC in the discovery cohort (see above), age, and sex.
   The predictive model of nARMSS was built by stepwise linear
   regression on the CSF protein data. This resulted in a
   significant model including eleven proteins (CXCL13, LTA, FCN2, ICAM3,
   LY9, SLAMF7, TYMP, CHI3L1, FYB1, TNFRSF1B, NfL) and age as predictors
   (see Supplementary Table [118]4). We also evaluated the effect of
   treatment, by adding treatment duration index to the model, but it did
   not improve the performance of the model. The model included
   proteins with both positive and negative coefficients, even though all
   proteins were upregulated in MS compared to HC. Next, when comparing
   the predicted nARMSS with the true nARMSS we found strong and
   significant correlations in both the discovery (SCC = 0.69,
   p = 3∗10^−11) and the replication cohort (SCC = 0.74, p = 9∗10^−7;
   Fig. [119]5a). To capture both the correlation and the accuracy of
   the prediction, we used Lin’s concordance correlation coefficient (CCC)
   as an additional performance metric, which resulted in a CCC of 0.72
   (p = 2∗10^−12) in the discovery cohort and a CCC of 0.51 (p = 0.002) in
   the replication cohort. As a comparison, we also evaluated the
   performance of models only including age and each of the 11 proteins
   and found that the combined model outperformed each of the individual
   models (see Supplementary Table [120]4).
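
   Lin's concordance correlation coefficient, used above alongside
   Spearman's coefficient, penalizes systematic offsets between predicted
   and true values as well as scatter. A minimal sketch of the standard
   definition follows; the function name and example values are
   hypothetical, and population (1/n) variances are assumed, which may
   differ from the implementation used by the authors.

```python
def lins_ccc(x, y):
    """Lin's concordance correlation coefficient between two sequences.

    CCC = 2 * cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y)) ** 2),
    using population (1/n) moments.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    return 2.0 * cov_xy / (var_x + var_y + (mean_x - mean_y) ** 2)


# Hypothetical true and predicted scores: perfectly correlated but shifted
# by one unit, so rank correlation is 1 while concordance is lower.
true_scores = [1.0, 2.0, 3.0]
predicted_scores = [2.0, 3.0, 4.0]
ccc_shifted = lins_ccc(true_scores, predicted_scores)
ccc_identical = lins_ccc(true_scores, true_scores)
```

   This is why a model can show a high SCC together with a lower CCC, as
   seen for the replication cohort: the ranking is preserved, but the
   predicted values are not on the identity line.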

Fig. 5. Performance of the top models for predicting long-term disability
worsening using cerebrospinal fluid (CSF) and plasma proteins.

   a CSF: The predicted normalized age-related MS severity scores (nARMSS)
   correlated significantly with the true nARMSS for both discovery and
   replication cohorts, assessed with Spearman’s correlation coefficient
   (SCC; discovery: p = 3∗10^−11, replication: p = 9∗10^−7) and Lin’s
   concordance correlation coefficient (CCC; discovery: p = 2∗10^−12,
   replication: p = 0.002). b CSF: Receiver operating characteristic (ROC)
   curves and area under the curve (AUC) scores for each of the three
   different nARMSS thresholds. The p-values for the AUC scores in the
   order (nARMSS > −1, nARMSS < −3, nARMSS < −4) were (2∗10^−5, 7∗10^−5,
   6∗10^−7) for the discovery cohort and (0.03, 4∗10^−4, 6∗10^−5) for the
   replication cohort. c Plasma: Reducing the CSF model to NfL and age
   resulted in a model that could predict nARMSS from plasma samples. The
   predicted nARMSS significantly correlated with the true nARMSS for both
   the discovery cohort (SCC: p = 5∗10^−4, CCC: p = 0.02) and replication
   cohort (SCC: p = 0.04, CCC: p = 0.66). d Plasma: ROC curves and AUC
   scores for each of the three different nARMSS thresholds. The p-values
   for the AUC scores in the order (nARMSS > −1, nARMSS < −3, nARMSS < −4)
   were (4∗10^−4, 0.09, 0.003) for the discovery cohort and (0.08, 0.19,
   0.07) for the replication cohort. The significance of the SCCs and CCCs
   was assessed with t-statistics (two-sided) and the significance of the
   AUC scores was assessed with a one-sided Mann–Whitney U test.

   To further evaluate the performance of the model, we assessed the
   ability to predict groups of pwMS with similar disability worsening. We
   made three different divisions using three different nARMSS thresholds,
   selected using the discovery cohort: nARMSS < −4 (corresponding to 20%
   of pwMS with the best prognosis), nARMSS < −3 (corresponding to 50% of
   the pwMS, i.e., a median split), and nARMSS > −1 (corresponding to 20%
   of the pwMS with the worst prognosis). For each of these thresholds,
   the model successfully identified the selected pwMS group both in the
   discovery and the replication cohort. For each respective threshold the
   AUC for the discovery cohort was 0.85 (p = 2∗10^−5), 0.76
   (p = 7∗10^−5), and 0.92 (p = 6∗10^−7) with an accuracy of 0.85, 0.66,
   and 0.85 and the AUC for the replication cohort was 0.90 (p = 0.03),
   0.88 (p = 4∗10^−4), and 0.90 (p = 6∗10^−5) with an accuracy of 0.88,
   0.85, and 0.82 (Fig. [123]5b). Lastly, we confirmed that the 11
   identified proteins were reliably expressed above the LOD in more than
   60% of samples from pwMS whereof eight proteins were expressed in more
   than 75% of samples from pwMS (see Supplementary Fig. [124]10). The
   performance of models excluding the three proteins that did not
   meet the more stringent 75% threshold (SLAMF7, TYMP, FYB1) is
   shown in Supplementary Table [125]5.

   We continued by investigating the potential of the model to predict
   nARMSS from plasma samples. Interestingly, the model was enriched
   (p = 0.03) for proteins whose expression in CSF correlated with the
   expression in plasma (p < 0.05 in the discovery cohort). Of the 52 DEPs
   in CSF, seven proteins had correlating expressions in CSF and plasma,
   whereof four were selected in the model: NfL (SCC = 0.45), CXCL13
   (SCC = 0.30), CHI3L1 (SCC = 0.27), and FCN2 (SCC = 0.25; see
   Supplementary Table [126]4). We hypothesized that the correlating
   proteins could be used to predict nARMSS from plasma samples by using a
   model trained on CSF samples. Again, performing a stepwise linear
   regression model, only selecting among the four correlating proteins
   and age, we reduced the model to three terms: intercept (coefficient
   (c) = −0.707), age (c = −0.068) and NfL (c = 0.369). The model could
   predict nARMSS from plasma samples with an SCC of 0.40 (p = 5∗10^−4)
   and a CCC of 0.28 (p = 0.02) in the discovery cohort (n = 71), and an
   SCC of 0.60 (p = 0.04) and a CCC of 0.14 (p = 0.66) in the replication
   cohort (n = 12, Fig. [127]5c). Evaluating the model based on the three
   nARMSS thresholds (nARMSS < −4, nARMSS < −3, nARMSS > −1) resulted in
   discovery AUC of 0.78 (p = 4∗10^−4), 0.59 (p = 0.09), and 0.74
   (p = 0.003), with an accuracy of 0.77, 0.56, and 0.82 and replication
   AUC of 1.0 (p = 0.08), 0.70 (p = 0.19), and 0.78 (p = 0.07) with an
   accuracy of 1.0, 0.58, and 0.50 (Fig. [128]5d). It should be noted that
   only 12 pwMS in the replication cohort had both usable plasma samples
   and fulfilled the requirements for obtaining an nARMSS score.
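
   The reduced model reported above has three terms, so a prediction can
   be written out directly. The sketch below uses the coefficients given
   in the text; it assumes that age is entered in years and NfL as an NPX
   value, which this section does not state explicitly, and the function
   names and the example person are hypothetical.

```python
# Coefficients of the reduced model as reported in the text.
INTERCEPT = -0.707
AGE_COEFFICIENT = -0.068
NFL_COEFFICIENT = 0.369


def predict_narmss(age_years, nfl_npx):
    """Predicted nARMSS from age (assumed in years) and NfL (assumed in NPX)."""
    return INTERCEPT + AGE_COEFFICIENT * age_years + NFL_COEFFICIENT * nfl_npx


def threshold_flags(narmss):
    """Flag a score against the three nARMSS thresholds used in the text."""
    return {
        "nARMSS < -4": narmss < -4,
        "nARMSS < -3": narmss < -3,
        "nARMSS > -1": narmss > -1,
    }


# Hypothetical person: 30 years old with a plasma NfL level of 1.0 NPX.
example_prediction = predict_narmss(30, 1.0)
example_flags = threshold_flags(example_prediction)
```

   The model was trained on CSF measurements and is only applied to
   plasma values here, so any offset between the two fluids carries
   straight into the prediction, consistent with the low CCC in plasma.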

Network analysis provides functional context for DEPs and reveals additional
biomarker candidates

   To provide a functional context of the discovered MS proteins we made
   an MS network using STRING version 11.5^[129]32. The 11 proteins in the
   nARMSS model and the 23 DEPs that overlapped in the discovery and the
   replication cohort, representing a set of core proteins in MS, were
   connected by adding at most one intermediate protein. The proteins,
   except ADA2, formed a closely connected network consisting of 40
   proteins, including 11 intermediate proteins (Fig. [130]6a,
   Supplementary Fig. [131]11a). Among the intermediate (added) proteins
   there were five proteins that were not included in the proteomics
   profiling: the chemokine receptors CCR1 and CCR5, the receptor ITGAL
   expressed on leukocytes, the adapter protein LCP2 associated with the
   T-cell receptor, and the multifunctional adapter protein SDCBP. The
   resulting MS network had 13.5 times as many interactions as
   expected (p < 1∗10^−16) using the STRING protein–protein interaction
   network, indicating shared biological functionality^[132]32. Gene
   Ontology enrichment analysis showed that the MS network was highly
   enriched for proteins involved in cytokine-mediated signaling (n = 11,
   p = 7∗10^−7), T-cell activation (n = 14, p = 3∗10^−9) and B-cell
   activation (n = 6, p = 6∗10^−4), exocytosis (n = 4, p = 0.03) and
   endocytosis, in particular phagocytosis (n = 4, p = 0.01), cell
   adhesion including regulation of cell-cell adhesion and cell-cell
   adhesion via plasma-membrane adhesion molecules (n = 11, p = 0.02),
   apoptotic processes including positive regulation of apoptotic process
   (n = 9, p = 6∗10^−5) and negative regulation of leukocyte apoptotic
   process (n = 2, p = 3∗10^−2), and myelination including regulation of
   myelination (n = 2, p = 0.02). Some proteins in the network were not
   annotated by Gene Ontology and were therefore manually categorized
   based on the literature (Fig. [133]6b; see references in Supplementary