Abstract

Sensitive and reliable protein biomarkers are needed to predict disease trajectory and personalize treatment strategies for multiple sclerosis (MS). Here, we use the highly sensitive proximity-extension assay combined with next-generation sequencing (Olink Explore) to quantify 1463 proteins in cerebrospinal fluid (CSF) and plasma from 143 people with early-stage MS and 43 healthy controls. With longitudinally followed discovery and replication cohorts, we identify CSF proteins that consistently predict both short- and long-term disease progression. Lower levels of neurofilament light chain (NfL) in CSF are superior in predicting the absence of disease activity two years after sampling (replication AUC = 0.77) compared to all other tested proteins. Importantly, we also identify a combination of 11 CSF proteins (CXCL13, LTA, FCN2, ICAM3, LY9, SLAMF7, TYMP, CHI3L1, FYB1, TNFRSF1B and NfL) that predicts the severity of disability worsening according to the normalized age-related MS severity score (replication AUC = 0.90). The identification of these proteins may help elucidate pathogenetic processes and might aid decisions on treatment strategies for persons with MS.

Subject terms: Multiple sclerosis, Machine learning, Prognostic markers, Diagnostic markers

Precise biomarkers for multiple sclerosis prognosis are vital for treatment decisions. Here, the authors identify specific proteins in cerebrospinal fluid that can predict short-term disease activity and long-term disability outcomes in persons with multiple sclerosis.

Introduction

Achieving personalized multiple sclerosis (MS) treatment strategies requires more refined data than evaluation of relapse rate, disease progression, and measurements of magnetic resonance imaging (MRI) activity in early disease stages^[52]1.
The comprehensive investigation of MS biomarkers, including their validation on a completely new cohort, remains exceptionally rare. A recent meta-analysis study has shown that less than 8% of all studies have adopted this stringent methodology in order to establish the robustness and generalizability for modeling MS^[53]2. To identify new MS biomarkers, extensive discovery approaches are required, such as large-scale proteomics^[54]3 which has shown significant potential in investigating cerebrospinal fluid (CSF) to elucidate various aspects of the disease^[55]4. The proximity extension assay (PEA), recently combined with next-generation sequencing (PEA-NGS or Olink Explore), allows for large-scale investigation of almost 1500 proteins in a small volume with high sensitivity and accuracy^[56]5–[57]7. This technology has provided opportunities for identifying protein biomarkers^[58]5,[59]8–[60]10 that are otherwise difficult to detect due to their low abundance in body fluids. MS is a chronic inflammatory and degenerative disease of the central nervous system (CNS), causing inflammation, demyelination and neuroaxonal damage^[61]11. Early initiation of treatment, particularly with high-efficacy therapies, has been associated with better clinical outcomes and can delay neurological disability progression^[62]12–[63]15. On the other hand, unnecessary treatment must be avoided^[64]16. Since early treatment affects long-term disability outcome, it is likely that disease-associated pathways leading to demyelination and neuroaxonal damage are present already at early stages of the disease. This, in turn, would allow for the discovery of early biomarkers able to predict subsequent disease progression and to provide optimal treatment strategies for each person. Immunological and neurological disease processes can impact the composition of circulating body fluids^[65]9. 
As a result, changes in protein levels in blood and CSF can be used as biomarkers for disease recognition and disease activity^[66]10,[67]17–[68]19. Most protein biomarkers of relevance in MS have been identified in CSF^[69]20, while only a few candidates have been identified in plasma^[70]10. Since blood samples are much easier to collect than CSF and can be collected repeatedly, plasma is a more attractive option for biomarker discovery. However, potential biomarker proteins are generally less abundant in plasma than in CSF^[71]21. Furthermore, it remains unclear how well protein levels in plasma reflect disease-relevant processes taking place in the CNS, and in general, plasma and CSF protein levels do not correlate^[72]22. In this study (see overview in Fig. [73]1), we use the highly sensitive and specific PEA-NGS technology to measure the expression of 1463 proteins in paired CSF and plasma samples from two well-defined cohorts of persons with MS (pwMS) in the early stages and healthy controls (HC). We identify a set of differentially expressed MS-relevant proteins and test their ability to predict, either individually or in combination, short-term disease activity and long-term confirmed disability worsening.

Fig. 1. Overview of the study.
a Prospective longitudinal study of two Swedish cohorts of persons with MS (pwMS) in the early stages and healthy controls (HC). b Proteomics profiling of cerebrospinal fluid (CSF) and plasma samples of all pwMS and HC at baseline. c Clinical examination of pwMS during a follow-up of up to 13 years. d Differential expression analysis, performed with a two-sided linear model t-test (Limma analysis), to find MS biomarker candidates.
e Building machine learning models for identification of protein MS biomarkers for diagnosis (logistic regression model), prediction of short-term disease activity (logistic regression model), and prediction of long-term disability worsening (linear regression model).

Results

Proteins in CSF were differentially expressed in MS versus HC in two independent cohorts

We analyzed protein expression levels of 1463 proteins in both CSF and plasma samples from 143 pwMS in early stages of the disease and 43 HC. The participants were divided into a discovery cohort (92 pwMS and 23 HC from Linköping University Hospital) and a replication cohort (51 pwMS and 20 HC from Karolinska University Hospital; Table [76]1). Plasma samples from 21 pwMS in the replication cohort had higher expression of several protein markers known to be affected by sampling and handling variability^[77]23 and were therefore excluded from further analysis (see Supplementary Fig. [78]1 and Supplementary Fig. [79]2). Using a linear model t-test (Limma analysis), we first tested whether proteins were differentially expressed between pwMS in a relapse or not, on treatment or not within 3 months before baseline sampling, or based on disease duration at baseline sampling. No differentially expressed proteins (DEPs) between these groups were found (false discovery rate (FDR) < 0.05; see “Methods”). Therefore, all pwMS were included in the following analyses.

Table 1. Baseline characteristics of persons with MS (pwMS) and healthy controls (HC)

| Characteristic | Measure | Discovery cohort, MS | Discovery cohort, HC | Replication cohort, MS | Replication cohort, HC | p-value* (Discovery vs. Replication) |
|---|---|---|---|---|---|---|
| Cohort size | n | 92 | 23 | 51 | 20 | NA |
| Sex** | F/M | 67/25 | 18/5 | 39/12 | 10/10 | 0.69 |
| Age** (years) | Median (range) | 31 (16–64) | 32 (22–64) | 32 (18–54) | 30 (22–47) | 0.80 |
| CSF data**: CSF cell count | Median (range) | 4.7 (0–125) | 2.1 (0.3–4.6) | 6 (0–32)^a | 0 (0–2.0)^b | 0.38 |
| CSF data**: Albumin ratio | Median (range) | 4.0 (1.5–9.1) | 4.9 (2.1–7.0) | 4 (2.2–10) | 4 (2.7–12) | 0.60 |
| CSF data**: IgG index | Median (range) | 0.8 (0.4–2.7) | 0.5 (0.4–0.5) | 0.8 (0.4–3.2)^c | 0.4 (0.3–0.5) | 0.52 |
| CSF data**: Oligoclonal CSF IgG bands | Yes/No | 86/6 | 0/23 | 44/2^d | 0/18^d | 0.72 |

CSF: cerebrospinal fluid. ^a n = 46 due to missing data; ^b n = 19 due to missing data; ^c n = 48 due to missing data; ^d missing data exists.
*Two-sided Fisher’s exact test was used for contingency tables or two-sided Mann–Whitney U test for continuous values.
**Sex was not significantly different between pwMS and HC in the discovery cohort but it was significantly different in the replication cohort (p = 0.04). Age was not significantly different between pwMS and HC in either discovery or replication cohort. CSF cell count, IgG index, and oligoclonal CSF IgG bands were significantly different between pwMS and HC in both discovery and replication cohorts (p < 0.01).

Next, we compared the protein expression in CSF between all pwMS and the HC and found a clear separation by principal component analysis (Fig. [81]2a). A Limma analysis identified 52 DEPs in the discovery cohort whereof 40 were also nominally differentially expressed (p < 0.05) in the replication cohort (Fig. [82]2b, c; see Supplementary Data [83]1; see Supplementary Fig. [84]3). Furthermore, in the replication cohort, 25 proteins were independently differentially expressed, whereof 23 proteins overlapped with the discovery cohort (Fig. [85]2c). Interestingly, levels of all the 52 DEPs in the discovery cohort and the 23 overlapping proteins in the replication cohort were higher in pwMS compared with controls.
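The FDR filtering step used for the DEP lists above can be illustrated with a short sketch. This section does not state which FDR procedure was applied (the details are deferred to “Methods”), so the Benjamini–Hochberg adjustment below is an assumption, and the p-values are invented toy numbers rather than study data.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order.

    Assumption: BH is used here only as a common FDR procedure; the
    section above does not say which adjustment the authors applied.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Toy p-values for five hypothetical proteins (not taken from the study).
toy_p = [0.001, 0.008, 0.039, 0.041, 0.20]
toy_q = benjamini_hochberg(toy_p)
# Indices of proteins that would pass an FDR < 0.05 filter.
toy_significant = [i for i, q in enumerate(toy_q) if q < 0.05]
```

In this toy case the third and fourth proteins are nominally significant (p < 0.05) but do not pass the FDR filter, which mirrors the distinction the text draws between nominal and FDR-corrected differential expression.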
To investigate the MS relevance of the DEPs we performed enrichment analyses using three different sets of MS genes and proteins. We found highly significant enrichment (Fig. [86]2c) for genes from the DisGeNET database^[87]24, GWAS genes^[88]25, and known potential MS biomarkers (see Supplementary Table [89]1). For example, 65% of the 52 DEPs in the discovery cohort (Fisher’s exact test, p = 7∗10^−8) and 60% of the 25 DEPs in the replication cohort (p = 0.002) were associated with MS in the DisGeNET database. However, some previously suggested MS markers (including C1QA, CCL2, CXCL1, GFAP, HGF, and OPN) had non-significant log2 fold change (FC; −0.25–0.34) when comparing pwMS to HC (see Supplementary Fig. [90]4). In contrast to CSF, protein profiling of plasma did not reveal any significant differences in protein expression after FDR correction in pwMS compared with HC (Fig. [91]2a, b). A few of the DEPs in CSF were also nominally differentially expressed in plasma, but with no overlap between discovery and replication cohorts (see Supplementary Fig. [92]5; see Supplementary Data [93]1). In addition, we found generally low correlation between CSF samples and plasma samples for the 52 DEPs in CSF in the discovery cohort, with the strongest correlations obtained for NfL (Pearson’s correlation coefficient (PCC) = 0.46) and IL-18 (PCC = 0.33) (see Supplementary Fig. [94]6).

Fig. 2. Differential expression analysis of persons with MS (pwMS) compared to healthy controls (HC) in cerebrospinal fluid (CSF) and plasma.
a Principal component (PC) analysis of all proteins measured in the CSF samples (left) and plasma samples (right). b Volcano plots showing differentially expressed proteins (DEPs) in CSF (left) and plasma (right). The top upregulated proteins, which overlapped in discovery cohort and replication cohort, are marked with protein names in the plots.
The differential expression analysis was performed using a two-sided linear model t-test (Limma analysis). c DEPs (false discovery rate < 0.05) in the CSF, in either the discovery cohort or the replication cohort. The first two columns show the log2 fold change (FC) of the DEPs in each cohort. Most proteins are upregulated (red) and 23 proteins overlap in discovery and replication cohorts. The three columns to the right mark which proteins appear in three different lists of known MS-associated genes and proteins (DisGeNET database, GWAS genes, and MS biomarkers), with the odds ratio of the enrichment shown on the top (two-sided Fisher’s exact test). The DEPs were significantly enriched for MS-associated genes from DisGeNET (discovery: p = 7∗10^−8, replication: p = 0.002), GWAS (discovery: p = 1∗10^−7, replication: p = 2∗10^−4), and known MS biomarkers (discovery: p = 1∗10^−12, replication: p = 2∗10^−6).

In summary, the 52 CSF proteins identified in the larger discovery cohort represent a set of proteins that are dysregulated in early stages of MS, suggesting their importance in MS pathogenesis. The fact that these proteins were enriched for MS-relevant genes makes them strong biomarker candidates, and they were therefore used in the following prediction models.

B-cell activation markers can discriminate between MS and HC

In order to test the diagnostic potential of the 52 DEPs in CSF from the discovery cohort, we created univariate logistic regression models for each of the proteins as well as a stepwise selection model (see “Methods”). To make fair assessments of the predictive power of our inferred models, we allowed no refitting of any model parameters in the replication cohort; thus, we expect the replication area under the receiver operating characteristic curve (AUC) to be a good estimate of the model test performance. In the model selection, age and sex were included as possible predictors.
The highest AUC was obtained with MZB1 and TNF in the model, which could predict the presence of disease with AUC = 0.99 (p = 2∗10^−13) in the discovery cohort and AUC = 0.87 (p = 6∗10^−7) in the replication cohort. Not surprisingly, in the univariate logistic regression models, AUC of the discovery cohort was high in all cases, but encouragingly most proteins also had high replication AUCs (Fig. [97]3a; see Supplementary Table [98]2). The top five proteins for prediction of diagnosis were MZB1, CD79B, CD27, TNFRSF13B, and IL-12p40 as ordered by AUC in the discovery cohort (Fig. [99]3a), where MZB1 had similar performance as the stepwise selection model containing MZB1 and TNF. These five proteins were reliably expressed above the limit of detection (LOD) in more than 95% of samples from pwMS and HC (see Supplementary Fig. [100]10). Finally, we investigated the discriminative power of plasma proteins. We used the same logistic regression formulas that were trained in the CSF data of the discovery cohort and applied them to the plasma data of both cohorts. The levels of two of the derived proteins, FCN2 and IL-1RA, could discriminate pwMS from HC (AUC = 0.71 for FCN2 and AUC = 0.65 for IL-1RA) in the discovery cohort but not in the plasma data of the replication cohort. Taken together, several CSF proteins (MZB1, CD79B, CD27, TNFRSF13B, and IL-12p40) showed a strong ability to discriminate pwMS from HC, whereof the proteins MZB1, CD79B, CD27, and TNFRSF13B are related to B-cell activation.

Fig. 3. Performance of the top cerebrospinal fluid (CSF) proteins for predicting diagnosis and disease activity over 2 years.
Predictive power, assessed by area under the curve (AUC), of the most significant CSF proteins in the discovery cohort in differentiating between a persons with MS (pwMS; n = 92 samples in the discovery and n = 51 samples in the replication cohort) and healthy controls (HC; n = 23 samples in the discovery and n = 20 samples in the replication cohort) and b pwMS showing evidence of disease activity after 2 years (n = 48 samples in discovery and n = 45 samples in replication cohort) and pwMS not showing evidence of disease activity after 2 years (n = 30 samples in discovery and n = 5 samples in replication cohort). A logistic regression model was used to assess the predictive power of both individual proteins (the top 5 proteins in the discovery cohort are shown) and a combination of proteins, selected with a stepwise method, trained on the discovery cohort and independently validated on the replication cohort. The significance of the AUC scores was assessed with a two-sided Mann–Whitney U test. The p-values for the AUC scores of the diagnosis models in the order (stepwise model, MZB1, CD79B, CD27, TNFRSF13B, IL-12p40) were (2∗10^−13, 4∗10^−13, 1∗10^−12, 3∗10^−12, 6∗10^−12, 6∗10^−11) for the discovery cohort and (6∗10^−7, 4∗10^−7, 2∗10^−5, 1∗10^−6, 1∗10^−7, 2∗10^−8) for the replication cohort. The p-values for the AUC scores of the disease activity models in the order (stepwise model, NfL, IL-1RA, FASLG, CCL3, CD6) were (1∗10^−8, 9∗10^−5, 0.002, 0.003, 0.004, 0.004) for the discovery cohort and (0.19, 0.02, 0.02, 0.14, 0.03, 0.41) for the replication cohort.

NfL is superior in predicting disease activity over 2 years

Next, we aimed to create a robust model for predicting the future short-term (2-year) disease activity using the NEDA-3 concept.
NEDA-3 is a binary variable based on no evidence or evidence of disease activity, as determined by reported clinical relapses, new or enlarged MRI brain lesions, or worsening in the Expanded Disability Status Scale (EDSS; see “Methods”)^[103]26. We found that 39% of pwMS in the discovery cohort and 10% of pwMS in the replication cohort were classified as having no evidence of disease activity (NEDA) during the 2-year follow-up; the remaining pwMS were classified as having evidence of disease activity (EDA). We then performed a Limma analysis of NEDA versus EDA groups but found no DEPs in the discovery cohort. Instead, we based the model on the 52 proteins that were differentially expressed in pwMS versus HC (in the discovery cohort) since these proteins were considered highly relevant to MS based on the enrichment of MS genes (see above). We used a similar approach as for prediction of MS diagnosis (see above) and trained a logistic regression model for each of the 52 proteins (Supplementary Table [104]3) and a stepwise selection model including the 52 proteins, age, and sex as the input predictors (Fig. [105]3b). The best separating model was based on NfL levels in CSF and had an AUC = 0.75 (p = 9∗10^−5) in the discovery cohort and an AUC = 0.77 (p = 0.02) in the replication cohort. In addition, IL-1RA and CCL3 showed predictive power for disease activity, although inferior to NfL, when considering results from both the discovery and the replication cohort (Fig. [106]3b). A stepwise selection model (combination of NfL, IL-18, PDCD1, and CD6) showed good discrimination in the discovery cohort (AUC = 0.85) but did not perform as well as NfL alone in the replication cohort (AUC = 0.63). In plasma, we found no proteins to be of significant value to predict disease activity in either of our cohorts. Age and sex were not selected as significant predictors in any of the models.
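The AUC values reported above have a direct interpretation that links them to the Mann–Whitney U test used for their significance: the AUC equals U divided by the number of case-control pairs, that is, the probability that a randomly chosen EDA sample scores higher than a randomly chosen NEDA sample. For a univariate logistic model with a positive coefficient, the predicted probability is monotone in the protein level, so the AUC can be computed from the raw levels. The sketch below is a minimal illustration with invented NPX-like values, not study data, and the function name is hypothetical.

```python
def auc_from_scores(pos_scores, neg_scores):
    """AUC as the Mann-Whitney U statistic divided by n_pos * n_neg.

    Counts the fraction of (positive, negative) pairs in which the
    positive sample has the higher score; ties count as half.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Invented NPX-like NfL values for illustration only (not from the study).
toy_eda_nfl = [1.9, 1.4, 1.2, 2.3, 0.9]   # evidence of disease activity
toy_neda_nfl = [0.8, 1.0, 1.3, 0.7]       # no evidence of disease activity
toy_auc = auc_from_scores(toy_eda_nfl, toy_neda_nfl)
```

Because no parameters are refitted when a frozen model is applied to a second cohort, the same function can be applied unchanged to replication-cohort scores.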
To evaluate the potential effect of treatment, a treatment duration index covering duration and drug efficacy (first-line treatment with less effective drugs versus second-line treatment with more effective drugs) during the total observation time was calculated (see “Methods”) and added to the models. Importantly, pwMS with EDA had in general a higher treatment duration index than pwMS with NEDA (p = 0.02 in the discovery cohort and p = 0.04 in the replication cohort, one-sided Mann–Whitney U test). Adding treatment duration index improved the predictive power of the best performing model containing only NfL (AUC = 0.77 in the discovery cohort and AUC = 0.82 in the replication cohort) but showed no significant effect on the other predictive models. The limited effect of the treatment duration index on the model performance could partly be caused by the treatment duration index positively correlating with the expression of 34 of the 52 DEPs in the discovery cohort, although the expression of only one of these proteins (CCL3) was also significantly correlated with treatment duration index in the replication cohort (see Supplementary Fig. [107]7). Collectively, our findings demonstrate NfL to be the superior protein for predicting disease activity over 2 years. In addition, NfL is a very reliable marker which is expressed above the LOD in all samples from pwMS. To facilitate the use of NfL on its own in future studies, we calculated the optimal prediction cut-off in the NfL model, and corresponding NPX level, which resulted in the maximum accuracy (see “Methods”). We found that the optimal prediction cut-off was a probability of 0.45 (accuracy = 0.71), which corresponded to an NPX level of 1.14. Using the same NPX threshold in the replication cohort resulted in an accuracy of 0.62. To translate NPX to pg/ml, we used a fraction of our data (n = 38) from which the NfL levels were known based on previous measurement by Simoa^[108]27–[109]29.
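The relationship between a probability cut-off and the corresponding NPX level can be sketched as follows. A univariate logistic model gives p = 1 / (1 + exp(−(b0 + b1·NPX))), so a cut-off on p maps to a single NPX threshold by inverting this expression. The fitted coefficients of the NfL model are not reported in this section, so the intercept and slope below are hypothetical placeholders and will not reproduce the reported NPX level of 1.14; the accuracy scan is likewise a generic illustration on toy data, not the authors' procedure.

```python
import math

# Hypothetical coefficients for illustration; the fitted values of the
# NfL model are not given in this section.
HYP_INTERCEPT = -1.0
HYP_SLOPE = 0.8


def npx_at_probability(p_cut, intercept, slope):
    """Invert p = 1 / (1 + exp(-(intercept + slope * npx))) for npx."""
    return (math.log(p_cut / (1.0 - p_cut)) - intercept) / slope


def best_cutoff(probs, labels):
    """Scan observed probabilities as cut-offs; return (cutoff, accuracy).

    A sample is predicted positive when its probability is >= cutoff.
    The first cut-off reaching the maximal accuracy is returned.
    """
    best = (None, -1.0)
    for c in sorted(set(probs)):
        correct = sum((p >= c) == bool(y) for p, y in zip(probs, labels))
        acc = correct / len(labels)
        if acc > best[1]:
            best = (c, acc)
    return best


# NPX level matching a probability cut-off of 0.45 under the hypothetical fit.
hyp_npx_threshold = npx_at_probability(0.45, HYP_INTERCEPT, HYP_SLOPE)

# Toy predicted probabilities and true labels (1 = EDA, 0 = NEDA).
toy_cutoff, toy_accuracy = best_cutoff([0.2, 0.4, 0.6, 0.8], [0, 0, 1, 1])
```

Applying a fixed NPX threshold to a second cohort, as done for the replication cohort above, then only requires comparing each sample's NPX value with that threshold.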
The NPX and pg/ml measurements were highly correlated (Spearman’s Correlation Coefficient (SCC) = 0.97), and the NPX threshold of 1.14 corresponded to 737 pg/ml (see “Methods”).

A combination of 11 proteins accurately predicts disability worsening

Whereas the NEDA-3 concept reflects the short-term disease activity mainly by detecting relapses and MRI activity, the long-term disability progression is more relevant from the perspective of a person with MS since it directly affects the quality of life^[110]30. The EDSS is the most used measure of disability status, but to adjust for age, the age-related MS score (ARMSS) was created^[111]31. To further adjust for length of observation time and allow for using data from different lengths of follow-up time, we used the recently described normalized ARMSS (nARMSS; see “Methods”). To obtain an nARMSS score, a person had to have had at least two documented EDSS scores over a period of at least 3 years. The resulting cohorts used for predictions consisted of 71 pwMS in the discovery cohort and 33 pwMS in the replication cohort. In Fig. [112]4, each person’s EDSS scores for each follow-up year and the resulting nARMSS score are shown and described in further detail in Supplementary Fig. [113]8. The nARMSS scores can obtain a value between −5 and +5, where a score of 0 represents the average disability worsening of pwMS based on historical cohorts (n = 25,558)^[114]31. Both the discovery and replication cohorts showed an overrepresentation of pwMS with a less severe disability worsening, with 50% of the pwMS having a score below −3.0 in the discovery cohort and below −2.0 in the replication cohort (see Supplementary Fig. [115]9).
The nARMSS scores had a significantly stronger correlation with the last ARMSS score (age adjusted EDSS) compared to the first ARMSS score, used for calculating nARMSS, for both the discovery cohort (SCC = 0.89 compared to SCC = 0.71, p = 0.003) and the replication cohort (SCC = 0.92 compared to SCC = 0.79, p = 0.03). We first tested if short-term disease activity (based on 2-year NEDA-3) was associated with nARMSS but found no significant difference in nARMSS when comparing EDA (n = 43) with NEDA (n = 27; medians were −2.90 and −3.36, respectively, two-sided Mann–Whitney U-test p = 0.15). We then found that age at baseline and subsequent treatment (treatment duration index) correlated with nARMSS, with an SCC = 0.38 (p = 0.001) and an SCC = 0.28 (p = 0.02), respectively, which led us to include them as possible covariates in our models in downstream analysis.

Fig. 4. Overview of the Expanded Disability Status Scale (EDSS) scores during yearly follow-up for persons with MS (pwMS).
The disability worsening scores for pwMS, who had at least two EDSS scores over a period of more than 3 years. Each column corresponds to one person. The top heatmap shows the EDSS scores for each follow-up year (0–13 years), followed by the age of each person. White cells indicate that no EDSS score was available for that year. Thereafter follows the normalized age-related MS score (nARMSS), calculated from a person’s EDSS score and age. In the row underneath the nARMSS score it is marked if a person’s nARMSS score is below the thresholds nARMSS < −4 or nARMSS < −3, or above the threshold nARMSS > −1. White cells indicate that the nARMSS score is not covered by any of these three thresholds.
The last two rows show the predicted nARMSS score obtained from the suggested cerebrospinal fluid (CSF) model combining 11 proteins (first row) and if the predicted nARMSS score is covered by any of the three thresholds mentioned above (second row).

To create a predictive model of nARMSS, we first performed a Limma analysis of the 1463 proteins based on the nARMSS score, but no DEPs were identified. Therefore, we again started from the 52 DEPs in CSF of pwMS compared to HC in the discovery cohort (see above), age, and sex. The predictive model of nARMSS was built with a stepwise linear regression model using the CSF protein data. This resulted in a significant model including eleven proteins (CXCL13, LTA, FCN2, ICAM3, LY9, SLAMF7, TYMP, CHI3L1, FYB1, TNFRSF1B, NfL) and age as predictors (see Supplementary Table [118]4). We also evaluated the effect of treatment, by adding treatment duration index to the model, but it did not improve the performance of the model. The model included proteins with both positive and negative coefficients, even though all proteins were upregulated in MS compared to HC. Next, when comparing the predicted nARMSS with the true nARMSS we found strong and significant correlations in both the discovery (SCC = 0.69, p = 3∗10^−11) and the replication cohort (SCC = 0.74, p = 9∗10^−7; Fig. [119]5a). To consider both the correlation and accuracy of the prediction, we used Lin’s concordance correlation coefficient (CCC) as an additional performance metric, which resulted in a CCC of 0.72 (p = 2∗10^−12) in the discovery cohort and a CCC of 0.51 (p = 0.002) in the replication cohort. As a comparison, we also evaluated the performance of models only including age and each of the 11 proteins and found that the combined model outperformed each of the individual models (see Supplementary Table [120]4).

Fig. 5. Performance of the top models for predicting long-term disability worsening using cerebrospinal fluid (CSF) and plasma proteins.
a CSF: The predicted normalized age-related MS scores (nARMSS) were significantly correlating with the true nARMSS for both discovery and replication cohorts, assessed with Spearman’s correlation coefficient (SCC; discovery: p = 3∗10^−11, replication: p = 9∗10^−7) and Lin’s concordance correlation coefficient (CCC; discovery: p = 2∗10^−12, replication: p = 0.002). b CSF: Receiver operating characteristic (ROC) curves and area under the curve (AUC) scores for each of the three different nARMSS thresholds. The p-values for the AUC scores in the order (nARMSS > −1, nARMSS < −3, nARMSS < −4) were (2∗10^−5, 7∗10^−5, 6∗10^−7) for the discovery cohort and (0.03, 4∗10^−4, 6∗10^−5) for the replication cohort. c Plasma: Reducing the CSF model to NfL and age resulted in a model that could predict nARMSS from plasma samples. The predicted nARMSS significantly correlated with the true nARMSS for both the discovery cohort (SCC: p = 5∗10^−4, CCC: p = 0.02) and replication cohort (SCC: p = 0.04, CCC: p = 0.66). d Plasma: ROC curves and AUC scores for each of the three different nARMSS thresholds. The p-values for the AUC scores in the order (nARMSS > −1, nARMSS < −3, nARMSS < −4) were (4∗10^−4, 0.09, 0.003) for the discovery cohort and (0.08, 0.19, 0.07) for the replication cohort. The significance of the SCCs and CCCs was assessed with t-statistics (two-sided) and the significance of the AUC scores was assessed with a one-sided Mann–Whitney U test.

To further evaluate the performance of the model, we assessed the ability to predict groups of pwMS with similar disability worsening. We made three different divisions using three different nARMSS thresholds, selected using the discovery cohort: nARMSS < −4 (corresponding to 20% of pwMS with the best prognosis), nARMSS < −3 (corresponding to 50% of the pwMS, i.e., a median split), and nARMSS > −1 (corresponding to 20% of the pwMS with the worst prognosis).
For each of these thresholds, the model successfully identified the selected pwMS group both in the discovery and the replication cohort. For each respective threshold the AUC for the discovery cohort was 0.85 (p = 2∗10^−5), 0.76 (p = 7∗10^−5), and 0.92 (p = 6∗10^−7) with an accuracy of 0.85, 0.66, and 0.85 and the AUC for the replication cohort was 0.90 (p = 0.03), 0.88 (p = 4∗10^−4), and 0.90 (p = 6∗10^−5) with an accuracy of 0.88, 0.85, and 0.82 (Fig. [123]5b). Lastly, we confirmed that the 11 identified proteins were reliably expressed above the LOD in more than 60% of samples from pwMS whereof eight proteins were expressed in more than 75% of samples from pwMS (see Supplementary Fig. [124]10). The performance of models without the three proteins (SLAMF7, TYMP, FYB1) that did not fulfill the more stringent threshold of 75% can be seen in Supplementary Table [125]5. We continued by investigating the potential of the model to predict nARMSS from plasma samples. Interestingly, the model was enriched (p = 0.03) for proteins whose expression in CSF correlated with the expression in plasma (p < 0.05 in the discovery cohort). Of the 52 DEPs in CSF, seven proteins had correlating expressions in CSF and plasma, whereof four were selected in the model: NfL (SCC = 0.45), CXCL13 (SCC = 0.30), CHI3L1 (SCC = 0.27), and FCN2 (SCC = 0.25; see Supplementary Table [126]4). We hypothesized that the correlating proteins could be used to predict nARMSS from plasma samples by using a model trained on CSF samples. Again performing stepwise linear regression, selecting only among the four correlating proteins and age, we reduced the model to three terms: intercept (coefficient (c) = −0.707), age (c = −0.068) and NfL (c = 0.369).
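As a worked illustration, the reduced three-term model can be written out with the coefficients reported above. This is a sketch under stated assumptions: age is taken to be in years and NfL in NPX units with no additional scaling or centering, neither of which is specified in this section, and the function names and the example input values are hypothetical.

```python
# Coefficients as reported in the text for the reduced model.
INTERCEPT = -0.707
COEF_AGE = -0.068   # assumed to apply to age in years
COEF_NFL = 0.369    # assumed to apply to NfL in NPX units


def predict_narmss(age_years, nfl_npx):
    """Linear prediction: intercept + c_age * age + c_NfL * NfL."""
    return INTERCEPT + COEF_AGE * age_years + COEF_NFL * nfl_npx


def threshold_group(score):
    """Report the most extreme of the three nARMSS bands a score falls in."""
    if score < -4:
        return "nARMSS < -4"
    if score < -3:
        return "nARMSS < -3"
    if score > -1:
        return "nARMSS > -1"
    return "none"


# Hypothetical example input: a 30-year-old with an NfL level of 1.14 NPX.
example_score = predict_narmss(30, 1.14)
example_group = threshold_group(example_score)
```

Under these assumptions the example evaluates to −0.707 − 0.068·30 + 0.369·1.14 ≈ −2.33, which falls between the −3 and −1 thresholds and is therefore not covered by any of the three bands.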
The model could predict nARMSS from plasma samples with an SCC of 0.40 (p = 5∗10^−4) and a CCC of 0.28 (p = 0.02) in the discovery cohort (n = 71), and an SCC of 0.60 (p = 0.04) and a CCC of 0.14 (p = 0.66) in the replication cohort (n = 12; Fig. [127]5c). Evaluating the model based on the three nARMSS thresholds (nARMSS < −4, nARMSS < −3, nARMSS > −1) resulted in discovery AUC of 0.78 (p = 4∗10^−4), 0.59 (p = 0.09), and 0.74 (p = 0.003), with an accuracy of 0.77, 0.56, and 0.82 and replication AUC of 1.0 (p = 0.08), 0.70 (p = 0.19), and 0.78 (p = 0.07) with an accuracy of 1.0, 0.58, and 0.50 (Fig. [128]5d). It should be noted that only 12 pwMS in the replication cohort had both usable plasma samples and fulfilled the requirements for obtaining an nARMSS score.

Network analysis provides functional context for DEPs and reveals additional biomarker candidates

To provide a functional context of the discovered MS proteins we made an MS network using STRING version 11.5^[129]32. The 11 proteins in the nARMSS model and the 23 DEPs that overlapped in the discovery and the replication cohort, representing a set of core proteins in MS, were connected by adding at most one intermediate protein. The proteins, except ADA2, formed a closely connected network consisting of 40 proteins, including 11 intermediate proteins (Fig. [130]6a, Supplementary Fig. [131]11a). Among the intermediate (added) proteins there were five proteins that were not included in the proteomics profiling: the chemokine receptors CCR1 and CCR5, the receptor ITGAL expressed on leukocytes, the adapter protein LCP2 associated with the T-cell receptor, and the multifunctional adapter protein SDCBP. The resulting MS network had 13.5 times as many interactions as expected (p < 1∗10^−16) using the STRING protein–protein interaction network, indicating shared biological functionality^[132]32.
Gene Ontology enrichment analysis showed that the MS network was highly enriched for proteins involved in cytokine-mediated signaling (n = 11, p = 7∗10^−7), T-cell activation (n = 14, p = 3∗10^−9) and B-cell activation (n = 6, p = 6∗10^−4), exocytosis (n = 4, p = 0.03) and endocytosis, in particular phagocytosis (n = 4, p = 0.01), cell adhesion including regulation of cell-cell adhesion and cell-cell adhesion via plasma-membrane adhesion molecules (n = 11, p = 0.02), apoptotic processes including positive regulation of apoptotic process (n = 9, p = 6∗10^−5) and negative regulation of leukocyte apoptotic process (n = 2, p = 3∗10^−2), myelination including regulation of myelination (n = 2, p = 0.02). Some proteins in the network were not annotated by Gene Ontology and were therefore manually categorized based on the literature (Fig. [133]6b; see references in Supplementary