Abstract Eighty-five percent of multiple sclerosis cases begin with a discrete attack termed clinically isolated syndrome, but 37% of clinically isolated syndrome patients do not experience a relapse within 20 years of onset. Thus, the identification of biomarkers able to differentiate between individuals who are most likely to have a second clinical attack from those who remain in the clinically isolated syndrome stage is essential to apply a personalized medicine approach. We sought to identify biomarkers from biochemical, metabolic and proteomic screens that predict clinically defined conversion from clinically isolated syndrome to multiple sclerosis and generate a multi-omics-based algorithm with higher prognostic accuracy than any currently available test. An integrative multi-variate approach was applied to the analysis of cerebrospinal fluid samples taken from 54 individuals at the point of clinically isolated syndrome with 2–10 years of subsequent follow-up enabling stratification into clinical converters and non-converters. Leukocyte counts were significantly elevated at onset in the clinical converters and predict the occurrence of a second attack with 70% accuracy. Myo-inositol levels were significantly increased in clinical converters while glucose levels were decreased, predicting transition to multiple sclerosis with accuracies of 72% and 63%, respectively. Proteomics analysis identified 89 novel gene products related to conversion. The identified biochemical and protein biomarkers were combined to produce an algorithm with predictive accuracy of 83% for the transition to clinically defined multiple sclerosis, outperforming any individual biomarker in isolation including oligoclonal bands. The identified protein biomarkers are consistent with an exaggerated immune response, perturbed energy metabolism and multiple sclerosis pathology in the clinical converter group. The new biomarkers presented provide novel insight into the molecular pathways promoting disease while the multi-omics algorithm provides a means to more accurately predict whether an individual is likely to convert to clinically defined multiple sclerosis. Keywords: multiple sclerosis, clinically isolated syndrome, biomarker, prediction, prognosis __________________________________________________________________ Probert et al. use an integrative analysis approach to identify novel CSF biomarkers from biochemical, proteomics, and metabolomics assays for the prediction of clinically conversion to multiple sclerosis in patients recruited at clinically isolated syndrome onset. The proposed multivariate algorithm significantly outperforms oligoclonal band status for prediction of clinical conversion. Graphical Abstract Graphical Abstract. [46]Graphical Abstract [47]Open in a new tab Introduction Clinically isolated syndrome (CIS) is the first manifestation of multiple sclerosis (MS) in 85% of patients.[48]^1 However, not all patients with a CIS attack go on to a confirmed MS diagnosis. Indeed, 37% of CIS patients do not fulfil McDonald 2010 diagnostic criteria 20 years after onset[49]^2 while only 63% transition to clinically defined MS, which is, in practice, the occurrence of a second clinical attack.[50]^3 Early treatment is essential to minimize the occurrence of further attacks and the accumulation of permanent disability.[51]^4–9 Thus, differentiation of individuals who are most likely to have a second clinical attack from those who remain in the CIS stage is essential to achieve the desired personalized medicine approach. Currently, MS diagnosis relies upon exclusion of other possible diagnoses followed the interpretation of a combination of detailed clinical evaluation, magnetic resonance imaging (MRI), and CSF analysis according to the latest McDonald Diagnostic criteria.[52]^10 The revisions introduced in the 2017 McDonald criteria [namely the inclusion of CSF oligoclonal IgG bands (OCGB) as a surrogate maker for dissemination in time] have resulted in more patients being diagnosed with MS at the point of CIS, this is at the expense of reduced specificity of only 61%–63% compared to 85%[53]^11^,[54]^12 using the previous 2010 criteria. Hence, these revisions undoubtedly benefit patients by ensuring that MS is treated in timely manner. However, the McDonald criteria are not meant to predict the course for disability worsening or time to second relapse. Recognised risk factors of clinically defined conversion from CIS to MS have been identified including younger age of disease onset, male gender,[55]^13^,[56]^14 the number of T[2] weighted MRI lesions,[57]^15 the presence of OCGB,[58]^16 intrathecal IgM synthesis[59]^17^,[60]^18 and neurofilament light chain.[61]^19 However, these measures are validated only at the group level, and there is no biofluid or imaging marker to address this unmet need for individual use. For example, 21% of CIS patients with normal MRI scans at baseline still transition to clinically defined MS and, while OCGB are extremely sensitive for the diagnosis of MS, 50% of CIS patients who test positive for OCGB at baseline do not have a second clinical attack within 50–60 months.[62]^16^,[63]^19^,[64]^20 To identify novel predictive biomarkers for the transition to MS we collected baseline CSF samples from 54 patients with CIS with 2–10 years of follow-up to determine clinical conversion. This study represents the first comprehensive multi-omics investigation of CIS to include simultaneously clinical chemistry, metabolomics and proteomics analysis of baseline CSF samples coupled with extensive clinical follow-up and to investigate the power of such biomarkers at predicting clinically defined conversion in an individualized manner. Here we report the capacity of novel biochemical and protein markers to predict the transition from CIS to clinically defined MS. The future use of simple multivariant biomarker algorithms in the clinical care pathway has the evident potential to facilitate the personalization and optimization of therapy for individuals with CIS. Materials and methods Study participants CSF samples were collected in the Department of Neurology of the University Hospital Basel, during routine diagnostic measures, as indicated by the treating physicians. Inclusion criteria were as follows: (i) the presence of a monophasic clinical episode suggestive of MS (CIS), not attributable to other diseases (for example, infectious, neoplastic, congenital, metabolic or vascular disease)[65]^21; (ii) clinical follow-up of at least two years; (iii) available basic demographic and clinical data (age, gender, dates of CIS onset, serum sampling, CSF examination, MRI, clinically defined conversion to MS (if present) and last follow-up visit); and (iv) CSF information at time of CIS. Patients with neuromyelitis optica, or a history of a progressive disease course from onset were excluded. Exclusion criteria included (i) active systemic infection and (ii) steroid treatment at the time of CSF sampling. Conversion to MS was diagnosed according to Poser criteria. This implied the exclusion of alternative diagnoses and the presence of a second clinically evident demyelinating attack which had to be separated in time and space from the first episode (i.e. occurring after an interval of at least one month and in a separate CNS location).[66]^22 In total, 85 individuals were recruited at CIS onset and examined for eligibility in the study. Of these, three were confirmed to have active infections and two were receiving steroid treatment at the point of CSF sampling while 26 had insufficient follow-up to determine clinical converter status and were excluded from the study. A flow chart of sample recruitment and exclusions can be found in [67]Supplementary Fig.[68]1. There were no missing data for any of the 54 participants included in the analysis. CIS patients were recruited based on the criteria above to avoid bias and no significant difference in age, gender or onset expanded disability scale status (EDSS) was observed between the converter and non-converter groups. Standard protocol approvals, registrations and patient consents Written informed consent was obtained from all patients according to the Declaration of Helsinki. Ethical approval was obtained by the local ethics committee. Power calculation Prior to analysis, a power calculation (PPCA model using the R package MetSizeR) was carried out. This confirmed that a sample size of 40 (20 non-converters and 20 converters) would be sufficient to achieve an FDR cut-off of 0.05 assuming the significance of 10% of variables. These assumptions are in line with our previous omics analysis of MS patient cohorts and indicated that n = 22 in the converter group would be sufficient. CSF sample collection CSF samples were centrifuged at 400 g for 10 min at room temperature, and the cell-free supernatant stored at −80°C within 2 h of collection.[69]^23 Samples were processed as per standard laboratory procedures for leukocytes (cells/mm^3), and total protein concentration (mg/dl). Serum samples were collected at the same visit to calculate the CSF/serum albumin ratio (Q[alb]). The integrity of the blood–CSF barrier was determined by calculating the CSF/serum ratio for albumin (Q[alb]).[70]^24 Intrathecal synthesis of IgG was determined by detection of oligoclonal IgG bands (OCGB) by isoelectric focussing on agarose gel and subsequent immunoblotting using IgG-specific antibody staining.[71]^25 Testing of OCGB was considered positive if pattern two or three (local synthesis of IgG within the CNS) were present.[72]^26 These parameters are henceforth referred to as clinical chemistry parameters. Nuclear magnetic resonance sample preparation for metabolomics analysis On the day of metabolomics analysis, CSF samples were thawed at room temperature and 100 µl was then diluted with 450 μl of 75 mM sodium phosphate buffer prepared in D[2]O (pH 7.4) containing 1 mM maleic acid as an internal reference standard. Samples were briefly centrifuged at 3000 × g for 5 min before transferring to a 5-mm NMR tube. Nuclear magnetic resonance spectroscopy and data processing for metabolomics analysis All nuclear magnetic resonance (NMR) spectra were acquired at 310 K using a 700-MHz Bruker AVIII spectrometer operating at 16.4 T equipped with a ^1H [^13C/^15N] TCI cryoprobe (Department of Chemistry, University of Oxford). The noesygppr1d (Bruker, Germany) pulse sequence was used to acquire ^1H NMR spectra with a 2 s presaturation, 32 data collections, a spectral width of 16 ppm, and an acquisition time of 1.46 s. All spectra were preprocessed in Topspin 2.1 (Bruker, Germany); zero filled by a factor of 2 and multiplied by a 1D exponential corresponding to a 0.3 Hz line broadening. All spectra were baseline corrected with a fifth-degree polynomial and referenced to the lactate doublet at 1.33 ppm. Following visual inspection for errors in baseline correction, referencing, spectral distortion or contamination, the processed spectra were exported to ACD/Labs Spectrus Processor Academic Edition 12.01 (Advanced Chemistry Development, Inc., Toronto, Canada), whereby regions of the spectra between 0.83 and 8.47 were split into 0.02-ppm-wide bins. The residual water resonance region (4.13–5.22 ppm) was removed from the analysis. The integral of each spectral bin was calculated and exported as a .csv file for statistical analysis. Metabolite assignment was performed by referencing to literature values,[73]^27^,[74]^28 the Human Metabolome Database[75]^29 and via 2 D correlation spectroscopy (COSY) experiments. A list of all NMR-detectable CSF metabolites has been previously reported.[76]^30 While, all metabolite resonances were included in our analysis the 30 most abundant metabolites detectable in the NMR spectra included (in alphabetical order) 3-hydroxybutyrate, acetate, acetoacetate, alanine, arginine, aspartate, citrate, creatine, creatinine, formate, glucose, glutamate, glutamine, glycerol, histidine, isoleucine, lactate, leucine, lysine, methyl isobutyrate, myo-inositol, N-acetyl-aspartate, phenylalanine, proline, scyllo-inositol, taurine, threonine, trimethyl-amine, tyrosine and valine. Determining concentration of the NMR metabolite biomarkers NMR metabolite measures, in relative units, were converted to absolute concentrations in SI units using an internal reference standard (1 mM maleic acid). In order to validate the quantification of the metabolites by NMR, the glucose and lactate levels in all CSF samples were measured using a Cobas^® 8000 modular analyzer (Roche Diagnostics, Switzerland) and the Gluc3 and LAC2 assays, respectively. There was a significant correlation between the NMR-determined concentration and the laboratory chemistry determined concentration for both glucose (Pearson’s R = 0.91, P-value < 0.001) and lactate (Pearson’s R = 0.90, P-value < 0.001) ([77]Supplementary Fig. 4A[78]and B) and Bland–Altman plots revealed excellent agreement between the two methods ([79]Supplementary Fig. 4C[80]and D). Protein profiling by SomaScan^TM Protein biomarker profiling in CSF samples was performed using the SomaScan^® platform from SomaLogic Somalogic Inc, Boulder, Co). SomaScan^® is multiplexed proteomic tool that measures more than 5000 protein analytes including 4783 SOMAmers (slow off-rate modified aptamers) that recognize 4137 distinct human gene targets. The SOMAmers are constructed with chemically modified nucleotides that expand the physicochemical diversity of the large randomized nucleic acid libraries from which the SOMAmer reagents are selected. The SomaScan^® assay measures native proteins in complex matrices by transforming each individual protein concentration into a corresponding SOMAmer reagent concentration, which is then quantified in customized DNA microarrays. The assay takes advantage of SOMAmer reagents’ dual nature as both protein affinity-binding reagents with defined three-dimensional structures, and unique nucleotide sequences recognizable by specific DNA hybridization probes.[81]^31^,[82]^32 The CSF samples were stored at −80°C and shipped to SomaLogic (Boulder, CO) on dry ice for SomaScan^® analysis. Metabolomics & proteomics statistical analysis Multivariate orthogonal partial least squares discriminant analysis (OPLS-DA) was performed in R software (R foundation for statistical computing, Vienna, Austria)[83]^33 using in-house R scripts and the ropls package.[84]^28 OPLS-DA is a dimension reduction technique that can extract correlated patterns from complex data sets. This method is ideally suited for large ‘omics datasets (as used in other similar aforementioned studies), such as those studied here, because it helps reduce the dimension of such datasets by extracting a subset of most salient biomarkers (described as a linear equation) that can subsequently be used to predict the class of interest—in this instance clinical converters and non-converters. OPLS-DA models were validated using an external 10-fold cross-validation strategy with repetition coupled with permutation testing as previously described.[85]^34 We have previously published an in-depth description of this analysis approach[86]^35 and further detail can be found in the [87]Supplementary material. In brief, the data are corrected for unequal class sizes before being randomly split into a training set (90%) and an independent test set (10%). The training set is used to build the OPLS-DA model the R^2 and Q^2 values are then used to assess the model performance (the goodness of fit and prediction, respectively) on the training data. The model generated is then applied to the test set (to which the OPLS-DA model is blinded) to determine the predictive accuracy, sensitivity, and specificity of the model (on previously, unseen data). This process of model training and testing is repeated for a total of 1000 times, thereby creating an ensemble of models. It has been shown that in cases where the sample sizes are small, one can achieve a prediction accuracy of up to 70% or higher by chance alone in differentiating two-classes.[88]^36 Thus, to further validate our prediction accuracy, we compare our ensemble of models to an ensemble of randomly permuted models (generated by randomly permuting the class identities). For a 2-class classification problem, the expected accuracy of a randomly permuted model is 50%. If the model ensemble significantly outperforms the randomly permuted models (as quantified using a two-sided Kolmogorov–Smirnov test, significant if P-value 0.05 or less), then the discriminatory variables responsible for the observed class separation are extracted by inspection of the average variable importance (VIP) scores. The VIP score of a given variable represents the mean decrease in accuracy which occurs when that variable is removed from the model. Thus, a variable which is highly significant and plays a large role in the diagnostic accuracy of the model will result in a large decrease in accuracy when removed from the model resulting a large VIP score. Conversely, variables which do not play a role in discriminating between groups have very little effect on model accuracy when removed and have a low VIP score. Elastic Net feature selection was applied to the proteomics data using the glmnet package[89]^37 prior to each iteration of the OPLS-DA method to reduce the number of predictor variables (by removing irrelevant or redundant variables), which helps improve model inference and lowers computational time.[90]^38^,[91]^39 Both the α and λ for use in each elastic net feature selection were determined using 7-fold cross validation to optimize the mean squared error on the training data alone. In order to identify the combination of clinical chemistry, metabolomics and proteomics variables with the highest predictive accuracy a combined multi-omics strategy was applied to the selected discriminatory variables (all variables from the clinical chemistry, proteomics, and metabolomics data combined) by applying the OPLS-DA cross-validation strategy (described above) to every combination of one to six variables and performing a ROC analysis on each model to assess the performance. The linear combination of metabolites which resulted in the highest performing model (determined by AUC, accuracy, sensitivity and specificity) are then reported. There was no significant increase in AUC or accuracy between five and six variables and so linear combinations of variables containing greater than 6 were not pursued. Univariate and ROC analysis All analysis was performed in R software (R foundation for statistical computing, Vienna, Austria). Two-sample t-tests were used for continuous variables while Chi-square tests were used for categorical variables as appropriate. A Bonferroni correction, to account for multiple comparisons, was applied throughout. Two-tailed P-values < 0.05 were considered statistically significant. Receiver operator curves (ROC), area under the curve (AUC), 95% confidence intervals, optimal thresholds for diagnosis and P-values (relative to a null distribution ROC curve with AUC = 0.5) were calculated for each discriminatory variable using the pROC package.[92]^40 The diagnostic odds ratio for each biomarker identified is reported. Where one group in the contingency was empty the Haldane–Anscombe correction was used.[93]^41 Protein pathway analysis Pathway enrichment analysis was performed on the discriminatory proteins identified by the multivariate analysis (described above) using Metascape [[94]http://metascape.org].[95]^42 Metascape is updated monthly and combines over 40 independent knowledgebases including GO, KEGG and MSigDB for enrichment and gene membership analysis. All genes in the genome were used as the enrichment background. Terms with a P-value < 0.01, a minimum count of 3 and an enrichment factor (the ratio of observed counts to the counts expected by chance) >1.5 were collected and grouped into clusters based on their membership similarities. Kappa scores[96]^43 were used to define the extent of ‘similarity’ when performing hierarchical clustering on the enriched terms, and sub-trees with a similarity of >0.3 are considered a cluster. The most statistically significant term within any given cluster was chosen to represent that cluster. Cytoscape was used to visualize the results of the protein enrichment network.[97]^44 The DisGeNet discovery platform[98]^45 (www.disgenet.org last accessed 27/04/2021) and Enrichr[99]^46 (https://maayanlab.cloud/Enrichr/ last accessed 27/04/21) was used to perform enrichment analysis on the identified proteins which were up/down regulated in the converter cohort relative to the non-converter cohort in an effort to identify disease specific pathways associations. Data and code availability Anonymized data and code will be shared by request from any qualified investigator. Results While useful in diagnosis, baseline OCGB positivity is unable to predict clinically defined conversion A total of 54 patients with symptoms consistent with CIS were included in this study. CSF samples were collected within one year of CIS onset [mean time to sample collection from onset 3.5 (1–11) months] and followed up for up to 10 years. Twenty-two patients converted to clinically defined MS (hence forth referred to as ‘converters’) while 32 patients had no signs of further relapses during follow-up (hence forth referred to as ‘non-converters’). The median time to sample collection in both the converter and non-converter groups was 3 weeks and there was no significant difference in the means (P-value 0.82). All converter CSF samples were collected before the second attack. Patient demographics and clinical chemistry results for both converters and non-converters are reported in [100]Table 1. Length of follow-up was longer in the non-converter group [mean, 6.5 years; median, 7.1 years; range, 2.0–9.7 years; interquartile range (IQR), 5.6–8.1 years] relative to the converter group (mean, 4.6 years; median, 4.2 years; range 1.6–8.7 years; IQR, 3.1–5.3 years) to ensure that sufficient time elapsed for relapses to occur. As the majority of CIS patients experience a second attack within two years of onset with the median and mean time to second attack ranging from 11–14 months and 8–11 months, respectively,[101]^13^,[102]^20^,[103]^47^,[104]^48 patients were only included in the non-converter group if a minimum of two years of follow-up was available. Indeed, the average time to conversion in the converter group was 1.7 years (median, 0.9 years; IQR, 0.4–2.1 years). Thus, the length of follow-up in the non-converter group (median 7.1 years) was significantly greater than the time to conversion in the converter group (median 0.9 years), suggesting the length of follow up in the non-converters is sufficient. There were no significant differences in the gender distributions, age, or grade of disability as per EDSS score at onset between the converter and non-converter groups. Table 1. Patient demographic and clinical data, at the point of CSF sampling, grouped by converter status. Converter [n = 22] Non-converter [n = 32] P-value Female, No. [%] 17 [77] 21 [66] 0.36 Age, mean [SD], years 31.3 [9.9] 36.4 [11.2] 0.08 EDSS, median [range] 2.5 [0–4] 1.5 [0–4] 0.06 Time to conversion, mean [SD], years 1.7 [2.0] NA NA Follow-up, median [IQR], years 4.2 [3.1–5.3] 7.1 [6.0–8.1] <0.001 Immune modulating therapies None None NA OCGB positive, No. [%] 22 [100] 22 [69] <0.001 Leukocytes, mean [SD], cells/mm^3 10.9 [9.1] 4.9 [4.9] <0.001 Mononuclear, mean [SD], cells/mm^3 10.7 [8.8] 4.7 [4.7] <0.001 Polynuclear, mean [SD], cells/mm^3 0.27 [0.65] 0.21 [0.64] 0.73 Total protein, mean [SD], mg/dl 3367.7 [96.0] 374.7 [96.0] 0.83 CSF/serum albumin ratio, mean [SD] 4.9 [2.0] 5.2 [1.6] 0.58 [105]Open in a new tab P-values from Student’s t-test for continuous variables and Chi-squared test for categorical variables are reported. EDSS, expanded disability status scale; IQR, interquartile range; OCGB, oligoclonal bands; SD, standard deviation. While useful in the diagnosis of MS in the context of the revised McDonald criteria, the presence/absence of OCGB was not predictive of transition to clinically defined MS. Although more of the converters tested positive for OCGB (100%) still the majority (69%) of the non-converters also tested positive at onset ([106]Fig. 1A). As a result, OCGB are extremely sensitive (100%) for the prediction of a second clinical attack, but specificity is very low (31%) resulting in an AUC and accuracy of only 0.66 and 59%, respectively. It should be noted that the average length of follow-up in the OCGB+ve non-converters (mean, 4.6 years; median, 4.2 years; range, 2–10 years; IQR, 3.1–5.3 years) was, again, significantly greater than the median time to conversion (0.9 years) in the converter group, indicating that the absence of a second attack in these OCGB+ve was not a result of follow-up duration. Figure 1. [107]Figure 1 [108]Open in a new tab Clinical chemistry results. (A) Confusion matrix illustrating the low sensitivity of OCGB status for the prediction of clinically defined conversion to MS. Box plots of CSF clinical chemistry parameters (B) mononuclear cells, (C) leukocyte cells, (D) CSF/serum albumin ratio (Q[alb]) and (E) total protein measured in clinically defined converters and non-converters. Dashed lines represent the optimal threshold to achieve the greatest accuracy as determined by ROC analysis. A significant increase in mononuclear and leukocytes cell counts was observed in the converter group. (F) Heat map illustrating all correlations (Pearson’s R) between the CSF clinical chemistry measures and OCGB. (G) Predictive performance of CSF clinical chemistry parameters compared to the performance of OCGB status. Biomarkers are listed from highest to lowest AUC. Acc, Accuracy; AUC, area under the curve; CI, confidence interval; NPV, negative predictive value; PPV, positive predictive value; ROC, receiver operator curve; Sens, sensitivity; Spec, specificity. Clinically defined converter n = 22, clinically defined non-converter n = 32. Two-sample t-tests were used for continuous variables while Chi-square tests were used for categorical variables as appropriate. A Bonferroni correction to account for multiple comparisons was applied throughout. Two-tailed P-values <0.05 were considered statistically significant. P-values <0.001 following correction for multiple comparisons are represented by ***. Baseline CSF leukocyte and mononuclear cell counts predict clinically defined conversion with greater overall accuracy than OCGB status In addition to OCGB, several clinical chemistry parameters were measured at CIS onset including leukocyte [further divided into mononuclear and polymorphonuclear (PMN) cell counts], total CSF protein levels and Q[alb]. There was no significant difference in the Q[alb] or total CSF protein concentration between converters and non-converters at onset. Twenty-nine patients (54%) had ‘normal’ (< 4 cell/mm^3) CSF leukocyte cell counts at baseline. The highest leukocyte cell count was 28.7 cell/mm^3 and 13 (24%) patients (nine converters and four non-converters) exhibiting a cell count above 10 cell/mm^3 at onset. The leukocyte cell count was increased (>4 cell/mm^3) in a larger proportion of converter patients (72%) than non-converters (41%) (Chi-Squared P-value 0.02). Interestingly, the mononuclear cell sub-population was elevated in the converter group relative to non-converters, while the PMN subset was not significantly altered ([109]Fig. 1B–E). Indeed, ROC analysis reveals that the CSF leukocyte cell count and, particularly, the monocular cell population outperform OCGB for the prediction of a second clinical attack with AUC values of 0.74 and 0.73, respectively ([110]Fig. 1G). Mononuclear cell counts were strongly correlated with leukocyte levels (Pearson’s r 0.96, P-value < 0.001 corrected for multiple comparisons) and weekly correlated with PMN although this correlation did not reach significance following correction for multiple comparisons (Pearson’s r 0.35, P-value 0.12 corrected for multiple comparisons). As expected, a strong correlation (Pearson’s r 0.94, P-value < 0.001 corrected for multiple comparisons) between total protein levels and Q[alb] was observed ([111]Fig. 1F). Myo-inositol and glucose CSF concentrations outperform OCGB status for prediction of clinical conversion to MS To uncover further predictive biomarkers of conversion, NMR metabolomics analysis was used to simultaneously measure ∼50 small molecule, soluble metabolite levels in the baseline CSF samples. OPLS-DA revealed significant differences in the CSF metabolome of converters compared to non-converters ([112]Fig. 2A), and external cross-validation confirmed that the accuracy of the multivariate metabolomics model (determined on independent test data) significantly outperformed the randomly permuted models ([113]Supplementary Fig. 2). The metabolites driving the separation observed in the multivariate model included lactate and glucose, which were elevated in the converter CSF samples, and myo-inositol and creatine, which were decreased in converter CSF samples relative to non-converters ([114]Fig. 2B–E). As the multivariate model, dominated by glucose, lactate, creatine and myo-inositol, was able to significantly predict conversion to CDMS (two-sided Kolmogorov–Smirnov test P-value < 0.001) we then investigated how each of these metabolites would perform in isolation, to determine if measuring a single biomarker could produce sufficient predictive accuracy for use in a clinical setting. Both myo-inositol and glucose CSF levels showed greater specificity for predicting the occurrence of a second clinical attack resulting in improved overall accuracy and AUC compared to OCGB status alone ([115]Fig. 2F). By contrast, lactate and creatine did not perform better than OCGB status as predictors of conversion when measured in isolation. In addition, while lactate and creatine were important for discrimination in the multivariate model each of these metabolites in isolation did not reach significance by univariate analysis following correction for multiple comparisons. Figure 2. [116]Figure 2 [117]Open in a new tab Metabolomics results. (A) Representative OPLS-DA scores plot illustrating discrimination between clinically defined converters (square) and non-converters (circle) CSF metabolite profiles. (B–E) Boxplots of the significant discriminatory metabolites identified by the OPLS-DA analysis in clinically defined converters and non-converters. Dashed lines represent the optimal threshold to achieve the greatest accuracy as determined by ROC analysis. For univariate analysis, two-sample t-tests were used for continuous variables while Chi-square tests were used for categorical variables as appropriate. While for multivariate analysis a two-sided Kolmogorov–Smirnov test was used to determine the significance of the OPLS-DA performance on independent test data relative to the null distribution. A Bonferroni correction to account for multiple comparisons was applied throughout. Univariate P-values below 0.05, 0.01 and 0.001 are represented by *, ** and ***, respectively. (F) Predictive performance of identified CSF metabolite biomarkers compared to the performance of OCGB status. Biomarkers are listed from highest to lowest AUC. Acc, Accuracy; AUC, area under the curve; CI, confidence interval; PPV, positive predictive value; ROC, receiver operator curve; NPV, negative predictive value; Sens, Sensitivity; Spec, specificity. Clinically defined converter n = 22, clinically defined non-converter n = 32. Multivariate proteomics analysis of baseline CSF samples identifies several proteins which outperform OCGB status, in terms of both sensitivity and specificity, for prediction of clinical conversion to MS The SomaScan^® platform was used to measure over 5000 CSF protein concentrations in both converters and non-converters. Multi-variate analysis confirmed significant separation between the converter and non-converter proteomes ([118]Fig. 3A), which was validated by external cross-validation and permutation testing (Kolmogorov–Smirnov P-value < 0.001, Supplementary Fig. 3). The multi-variate analysis uncovered a panel of 89 proteins driving the discrimination between converter and non-converter in CIS CSF samples, 72 of which predict occurrence of a second attack with greater AUC (>0.66) than OCGB alone ([119]Supplementary Table 1). Of the 89 protein biomarkers identified, 27 were elevated in converters at onset while the remaining 62 were elevated in the non-converter CSF samples relative to converters. Representative boxplots of the 12 most significant biomarkers are shown in [120]Fig. 3B–I, while boxplots of the remaining discriminatory proteins can be found in [121]Supplementary Fig. 5. ROC analysis was used to determine the optimum thresholds to produce the greatest predictive accuracy for each discriminatory protein identified. Figure 3. [122]Figure 3 [123]Open in a new tab Proteomics and multi-omics results. (A) Representative OPLS-DA scores plot illustrating discrimination between clinically defined converters (square) and non-converter (circle) using CSF proteomics measurements. (B–I) Representative boxplots of the highest ranked significant discriminatory proteins identified by the OPLS-DA analysis in clinically defined converters and non-converters. Dashed lines represent the optimal threshold to achieve the greatest accuracy as determined by ROC analysis. For univariate analysis, two-sample t-tests were used for continuous variables while Chi-square tests were used for categorical variables as appropriate. While for multivariate analysis a two-sided Kolmogorov–Smirnov test was used to determine the significance of the OPLS-DA performance on independent test data relative to the null distribution. A Bonferroni correction to account for multiple comparisons was applied throughout. Univariate P-values below 0.05, 0.01 and 0.001 are represented by *, ** and ***, respectively. (J) ROC curves illustrating the performance of the multivariate model (solid black) compared to each component of the model alone. Protein markers RSK like protein kinase (dashed red), MUSK (dashed light blue), DYLT1 (dashed dark blue) along with myo-inositol (dashed pink) and CSF mononuclear levels (dashed purple) combined afford greater predictive accuracy than OCGB status (solid grey) alone. The AUC and accuracy of each ROC curve is displayed in brackets. Clinically defined converter n = 22, clinically defined non-converter n = 32. DNA repair protein XRCC1 was significantly decreased in the converter CSF samples relative to non-converter ([124]Fig. 3I) and was the highest performing biomarker overall; predicting conversion with an AUC, accuracy, sensitivity and specificity of 0.84, 80%, 73% and 84%, respectively. The top 20 most sensitive proteins are listed in [125]Supplementary Table 2 while the top 20 most specific proteins are listed in [126]Table 2. Both Tropomyosin α3-chain (TPM3) and EF-hand calcium-binding domain-containing protein 14 (EFCAB14) CSF levels were decreased in the converter cohort ([127]Fig. 3H and C, respectively) and predicted conversion with 100% sensitivity (rivalling OCGB). Of note, TPM3 predicted conversion with a specificity, accuracy and AUC of 63%, 78% and 0.78, respectively, significantly outperforming OCGB. Eighty-eight of the identified protein biomarkers predict conversion with greater specificity than OCGB (top twenty illustrated in [128]Table 2 full list [129]Supplementary Table 1). In particular, muscle-skeletal receptor tyrosine-protein kinase (MUSK) levels were significantly decreased in converter samples relative to those of non-converters ([130]Fig. 3E), and predicted conversion in this cohort with 100% specificity suggesting that this protein could be used in combination with OCGB to better predict the risk of a second clinical attack in CIS patients. Table 2. Predictive performance of the top 20 identified CSF protein biomarkers with highest specificity compared to the performance of OCGB status. AUC [95% CI] Acc Sens Spec PPV NPV ROC threshold Odds ratio P-value MUSK 0.68 [0.54–0.83] 0.74 0.36 1 1 0.7 118.1 36.57 0.02 MMP13 0.71 [0.57–0.86] 0.74 0.41 0.97 0.9 0.7 93.5 21.46 0.02 CCL17 0.69 [0.54–0.84] 0.76 0.5 0.94 0.85 0.73 106 15 0.03 CXCL1 0.72 [0.57–0.86] 0.72 0.41 0.94 0.82 0.7 1289.6 10.38 0.03 RARRES2 0.72 [0.58–0.86] 0.74 0.5 0.91 0.79 0.73 4102.7 9.67 0.01 SMDT1 0.72 [0.58–0.86] 0.74 0.5 0.91 0.79 0.73 23 9.67 0.02 THBS4 0.66 [0.51–0.81] 0.7 0.41 0.91 0.75 0.69 229.3 6.69 0.03 RPS6KA5 0.79 [0.66–0.92] 0.78 0.64 0.88 0.78 0.78 24.9 12.25 0.002 MFAP4 0.78 [0.65–0.91] 0.78 0.64 0.88 0.78 0.78 8757.5 12.25 0.003 IL22RA2 0.74 [0.6–0.88] 0.76 0.59 0.88 0.76 0.76 185.1 10.11 0.01 LCN10 0.74 [0.6–0.88] 0.74 0.55 0.88 0.75 0.74 58.7 8.4 0.004 SGCB 0.72 [0.58–0.86] 0.72 0.5 0.88 0.73 0.72 110.2 7 0.01 XRCC1 0.84 [0.72–0.95] 0.8 0.73 0.84 0.76 0.82 69.6 14.4 <0.001 MZF1 0.7 [0.55–0.85] 0.74 0.59 0.84 0.72 0.75 187.3 7.8 0.03 CCDC80 0.71 [0.56–0.85] 0.72 0.55 0.84 0.71 0.73 293.1 6.48 0.03 PLEKHA1 0.77 [0.63–0.9] 0.78 0.73 0.81 0.73 0.81 39.2 11.56 0.01 MZT1 0.74 [0.6–0.88] 0.76 0.68 0.81 0.71 0.79 68.8 9.29 0.01 TSSK2 0.74 [0.6–0.88] 0.72 0.59 0.81 0.68 0.74 79.8 6.26 0.005 PPIL2 0.72 [0.57–0.86] 0.72 0.59 0.81 0.68 0.74 94.1 6.26 0.03 COL6A2 0.71 [0.57–0.86] 0.72 0.59 0.81 0.68 0.74 28.8 6.26 0.01 OCGB status 0.66 [0.5–0.81] 59% 100% 31% 50% 100% 0.5 21 0.01 [131]Open in a new tab Proteins are listed from highest to lowest specificity. Acc, Accuracy; AUC, area under the curve; CI, confidence interval; NPV, negative predictive value; PPV, positive predictive value; ROC, receiver operator curve; Sens, sensitivity; Spec, specificity. A linear combination of baseline CSF protein, metabolite and leukocyte concentrations predicts clinical conversion with an AUC of 0.94 and accuracy of 83% As clinical chemistry, metabolomics and proteomics analysis of onset CSF samples revealed several predictive biomarkers of conversion which perform well individually, we next investigated whether a multivariate combination of the identified biomarkers and/or combining with OCGB provided improved predictive accuracy. Five variable multivariate models provided the greatest predictive accuracy (83%) with an AUC of 0.94; the addition of further variables provided no significant increase in accuracy. The variables selected by the multivariate model with greatest accuracy included MUSK, Ribosomal protein S6 kinase alpha-5 (RPS6KA5), Dynein light chain Tctex-type 1 (DYNLT1), CSF myo-inositol and mononuclear cell levels. While inclusion of mononuclear cell counts gave the highest performance, replacing this measure with leukocyte cell counts resulted in a decrease in accuracy of only 2%. The predictive accuracy of the multivariate multi-omics model greatly outperforms the accuracy of each identified biomarker in isolation ([132]Fig. 3J). Interestingly, OCGB was not selected in the top performing model, likely due to the increased overall accuracy of baseline mononuclear and leukocyte cell counts in this cohort. Indeed, replacement of mononuclear cell count with OCGB in the multivariate model results in a decrease in predictive accuracy of 11%. Protein pathway enrichment analysis reveals perturbations in cytokine, TNF and leukocyte proliferation pathways in converters while proteins upregulated in non-converters are consistent with dysregulated cellular assembly and rheumatoid arthritis Protein pathway enrichment analysis revealed that the top 89 discriminatory variables are linked through several pathways and physiological functions. The discriminatory proteins upregulated in the converter group relative to the non-converter group are consistent with perturbations in cytokine, TNF, and interferon-gamma signalling pathways along with leukocyte proliferation, leukocyte mediated immune response and chemotaxis ([133]Fig. 4A). In contrast, those proteins elevated in the non-converters are consistent with cellular assembly, proliferation, and survival pathways including regulation of the MAPK cascade in addition to immune activation and chemotaxis pathways ([134]Fig. 4B). These results point to distinct protein pathway perturbations in the converter and non-converter groups at baseline, suggesting potentially discrete underlying pathology. Of note, the proteins upregulated in the converters were enriched in disease pathways consistent with viral infection, retinal detachment, MS and atherosclerosis ([135]Fig. 5A) suggesting that the proteome of the converter patients was representative of MS at onset. In contrast, the proteins upregulated in the non-converters are enriched in disease pathways consistent with rheumatoid arthritis, degenerative polyarthritis, coronary artery disease, and prostatic neoplasms ([136]Fig. 5B). Figure 4. [137]Figure 4 [138]Open in a new tab Protein pathway enrichment analysis. Top significant pathways associated with the identified proteins which are upregulated in clinically defined (A) converters and (C) non-converters. P-values <0.01 were considered significant. Visualization of the enrichment network associated with proteins upregulated in (B) converters (D) and non-converters. Nodes are coloured according to pathway while larger nodes represent smaller P-values. Similar and related pathways are grouped together. Clinically defined converter n = 22, clinically defined non-converter n = 32. Figure 5. [139]Figure 5 [140]Open in a new tab Clustergram of associations of significantly enriched disease pathways with proteins upregulated in clinically defined (A) converter CSF and (B) non-converter CSF. The proteins upregulated in converters were consistent with viral infection, retinal damage, and MS disease pathways while the proteins upregulated in non-converters suggest atypical rheumatoid arthritis presentation. Red squares represent a significant enrichment of the identified gene (rows) with a given pathological pathway (columns) while white squares represent no significant interaction. Fisher exact test, corrected P-values <0.05 were considered significant. Clinically defined converter n = 22, clinically defined non-converter n = 32. Discussion This study provides a detailed exploration of predictive biomarkers in a prospective cohort of 54 individuals with CIS. Due to differences in past McDonald diagnostic criteria and the fact that the 2017 McDonald criteria have low specificity, the occurrence of a second attack was used to define clinical conversion in this cohort. This not only ensures that the predictive biomarkers identified are applicable to the ‘gold standard’ definition of MS, but also ensures clinical utility as those individuals who have a second clinical attack are more likely to develop severe permanent disability than those with ‘silent’ demyelinating events and, thus, should be treated as early as possible. In total, only 41% of the patients recruited had a second attack and converted to clinically defined MS over the course of follow-up, which is somewhat lower than figures reported in a 20-year follow-up study where 63% of patients experienced a second clinical event.[141]^3 This is likely due to the maximum length of follow-up in this study, which was 10 years. The average time to conversion was 1.7 years and 72% of the converters had a second clinical attack within 2 years. As a result, only non-converters with a minimum of 2 years of follow-up were included in this study. No significant difference was observed in either the gender, age, or EDSS at onset between the converter and non-converter groups. Previous reports suggest that between 61 and 68.9% of patients with CIS test positive for OCGB[142]^49 and 50% of these have a second clinical attack within 4 years of onset.[143]^16 In line with this, 81% of the CIS patients in our cohort tested positive for OCGB at baseline, of which, 50% converted to clinically defined MS. In contrast, none of the patients in the converter group tested negative for OCGB at baseline. As a result, the use of OCGB in isolation predicted occurrence of a second clinical attack with 100% sensitivity but only 31% specificity resulting in an overall accuracy and AUC of 59% and 0.66, respectively. CSF leukocyte cell count was moderately elevated (>4 but < 30 cells/µl) in 54% of the cohort at baseline. A larger proportion of the patients who went on to have a second clinical attack (72%) exhibited elevated leukocyte cell counts at baseline when compared to those who did not clinically convert over the follow up period (41%). As a result, a significant elevation in leukocyte cell counts was observed in converter CSF samples relative to non-converters, which is consistent with an increased immune response at baseline in these individuals. Interestingly, when the total leukocyte population was evaluated at the cell-type level, a significant elevation in the mononuclear, but not PMN cells, was observed in the converter group suggesting that the elevated leukocyte levels are dominated by changes in mononuclear cells. Leukocytes are known to be moderately elevated (4–50 cells/mm^3) in up to 60% of MS cases[144]^50 while relapsing remitting (RR) MS patients with pleocytic CSF (>5 cells/mm^3) are known to have increased annualized relapse rates.[145]^51 Indeed, using leukocyte levels >4 cells/mm^3 predicted transition to clinically defined MS with an accuracy of 65%, although ROC analysis revealed that a threshold of 3.5 cells/mm^3 was optimal ([146]Fig. 1G). Interestingly, the mononuclear cell population had greater predictive value than leukocyte cell count. While less sensitive, both CSF mononuclear and leukocyte levels predicted the occurrence of second clinical attack with higher accuracy than OCGB (70% and 67%, respectively). As these parameters are routinely measured in order to rule out other potential diagnoses, this suggests that inclusion of mononuclear cells as an adjunct to MRI parameters and OCGB could improve identification of those at high risk of clinically converting to clinically defined MS without the requirement for an additional biochemical test. Changes in the concentration of soluble, small molecule metabolites in biofluid samples, including CSF, are known to be associated with CNS pathology, increased inflammation, and an elevated immune response. We and others have demonstrated that relatively subtle differences in the pathological mechanisms of inflammatory and demyelinating CNS disease are detectable in the peripheral metabolome.[147]^35^,[148]^52^,[149]^53 Multivariate analysis revealed that myoinositol, glucose, creatine and lactate levels discriminate between converters and non-converters. Converters had significantly higher average CSF myo-inositol and glucose levels. While lactate and creatine did not reach univariate significance when corrected for multiple comparisons the average lactate levels were higher and creatine levels lower in converters compared to non-converters at baseline. Perturbations in glucose and lactate CSF levels in the converter group are consistent with dysregulated CNS energy metabolism. Perturbed energy metabolism in MS patients has been previously reported[150]^54 and linked to oxidative damage, mitochondrial function imbalance, and neuroaxonal degeneration. Increased CSF lactate levels have been observed in RRMS patients and correlate with markers of neuroaxonal damage.[151]^55 Under healthy conditions, the brain utilizes 25% of the body’s total glucose[152]^56 which is converted to lactate by anaerobic glycolysis within glial cells and shuttled to neurons as the primary energy source. Ineffective nerve conductance as a result of demyelination coupled with CNS inflammation in MS results in an increased neuronal energy demand.[153]^57 Thus, the metabolite changes observed may represent increased neuronal energy metabolism dysfunction in those who go on to have a second clinical attack. Myo-inositol is a component of plasma membranes and myelin[154]^58 and thus the increased levels of myo-inositol observed in the converter cohort may be a direct result of myelin breakdown in the CNS. In addition, an increased level of myo-inositol is known to be a marker of gliosis[155]^59 and elevated levels of myo-inositol have been observed in MS and CIS lesions.[156]^60 This could suggest that CIS patients, who go on to have a second clinical attack, have greater demyelination and glial activation at baseline relative to non-converter which, while detectable biochemically, is not yet distinguishable radiologically or clinically. Over 5000 proteins were measured in the baseline CSF samples using the SomaScan^® platform to uncover novel predictive biomarkers of clinically defined conversion. Multivariate pattern recognition identified 89 protein biomarkers. Protein pathway enrichment analysis confirmed that the proteins upregulated in the clinically defined converter group were consistent with increased inflammation and an altered immune response. Significant enrichment in pathways regulating leukocyte proliferation and immunity is consistent with the elevated white blood cell counts and increased gliosis observed in these patients. Furthermore, the protein biomarkers identified were significantly enriched in regulation of cytokine production and receptor signalling, TNF signalling, and response to interferon gamma. Investigation of disease associated protein pathways revealed that the proteins elevated in the converter CSF were significantly enriched in MS suggesting that the proteins identified are indeed able to identify MS at the point of first attack. In addition, viral infection pathways were significantly enriched in the converter cohort which may correspond to virus-associated onset that is observed in many MS cases. In contrast, the proteins upregulated in the non-converter cohort were significantly enriched in pathways associated with cellular assembly, proliferation, and survival. Pathways associated with activation of the immune response were also significantly enriched in the non-converters at baseline. Interestingly, disease pathways associated with the proteins elevated in the non-converter group were associated with rheumatoid arthritis and degenerative arthritis as well as coronary artery disease, suggesting some individuals in this group may have atypical presentation of peripheral inflammatory, immune-mediated disease. Indeed, recent evidence suggests that chronic peripheral inflammation can lead to blood brain barrier dysfunction and CNS involvement in diseases such as rheumatoid arthritis.[157]^61 The proteins with greatest overall predictive power were DNA repair protein XRCC1 (XRCC1), dynein light chain Tctex-TYPE 1 (DYNLT1), and natural cytotoxicity triggering receptor 1 (NCR1) with AUC values of 0.84, 0.84 and 0.83, respectively. Partial loss of XRCC1 renders brain cells vulnerable to oxidative damage[158]^62 suggesting that the decreased levels of XRCC1 in the converters could reflect an increased propensity to CNS oxidative damage in this cohort. This is consistent with oxidative damage and demyelinating induced perturbations in energy metabolism observed in the metabolomics analysis. DYNLT1 regulates neuronal morphogenesis and, during cortical development, inhibits neurogenesis.[159]^63 Thus, increased levels of DYNLT1 in converter CSF may reflect increased inhibition of neurogenesis in this group or dysregulation of neuronal architecture. Increased levels of NCR1 in the converter group may represent increased activation of natural killer (NK) cells.[160]^64 The majority of NCR1 expression in the CNS is localized to astrocytes[161]^65 supporting the metabolomics results which were suggestive of increased gliosis in the converter cohort. While NK cell activation in the CNS had been implicated in several autoimmune diseases[162]^66 their role in MS remains to be elucidated. The extensive biomarker discovery employed here successfully identified clinical chemistry, metabolite, and protein biomarkers of conversion to clinically defined MS, each of which perform well in isolation and provide novel insight into the different pathological mechanisms in converters and non-converters at baseline. Future work, on larger prospective cohorts, will investigate whether the biomarkers identified here could be included in the McDonald criteria to improve specificity when identifying patients at high risk of second clinical attack. In particular, the data presented suggest that a larger follow-on study investigating the impact of leukocyte and mononuclear cell counts on prediction of clinical conversion would be particularly attractive as these measures are routinely available. Ongoing work will validate the identified biomarkers in the further independent cohorts, develop a method to measure the identified biomarkers using a single, cost-effective, assay for use in a clinical setting, and compare this test with other recently identified predictive biomarkers including baseline MRI findings, clinical variables, NfL and IgG/M levels. Supplementary material [163]Supplementary material is available at Brain Communications online. Funding F.P. received funding from the Multiple Sclerosis Society UK (grant 59) and the Medical Research Council (MC_PC_15029). T.Y. is supported by the Ministry of Health, Singapore through the National Medical Research Council Research Training Fellowship (NMRC/Fellowship/0038/2016). Competing interests Y.Z., M.S., S.A., T.D.W.C., R.H., J.O., D.L. and D.C.A. declare no competing interests. S.A. declares no competing interests. J.K. received speaker fees, research support, travel support, and/or served on advisory boards by ECTRIMS, Swiss MS Society, Swiss National Research Foundation, (320030_160221), University of Basel, Bayer, Biogen, Celgene, Merck, Novartis, Roche, Sanofi. FP has received travel awards from ECTRIMS, Merck, and the Multiple Sclerosis Society UK. T.Y. has received travel grants from UCB, Merck and PACTRIMS, and travel awards from ECTRIMS, ACTRIMS and Orebro University. J.P. is partly funded by highly specialized services to run a national congenital myasthenia service and a neuromyelitis service. She has received support for scientific meetings and honorariums for advisory work from Merck Serono, Biogen Idec, Novartis, Teva, Chugai Pharma and Bayer Schering, Alexion, Roche, Genzyme, MedImmune, EuroImmun, MedDay, Abide ARGENX, UCB and Viela Bio and grants from Merck Serono, Novartis, Biogen Idec, Teva, Abide, MedImmune, Bayer Schering, Genzyme, Chugai and Alexion. She has received grants from the MS society, Guthrie Jackson Foundation, NIHR, Oxford Health Services Research Committee, EDEN, MRC, GMSI, John Fell and Myaware for research studies. Supplementary Material fcab084_Supplementary_Data [164]Click here for additional data file.^ (1.6MB, pdf) Glossary AUC = area under the curve CIS = clinically isolated syndrome EDSS = expanded disability scale status IQR = interquartile range MS = multiple sclerosis NK = natural killer NMR = nuclear magnetic resonance OCGB = oligoclonal bands OPLS-DA = orthogonal partial least squares discriminant analysis PIRA = disability progression independent of relapse activity PMN = polymorphonuclear Q[alb] = CSF/serum albumin ratio ROC = receiver operator curves RR = relapsing remitting VIP = variable importance projection score References