Abstract Amyotrophic lateral sclerosis (ALS) is a progressive and fatal neurodegenerative disease. While genetics and other factors contribute to ALS pathogenesis, critical knowledge is still missing and validated biomarkers for monitoring the disease activity have not yet been identified. To address those aspects we carried out this study with the primary aim of identifying possible miRNAs/mRNAs dysregulation associated with the sporadic form of the disease (sALS). Additionally, we explored miRNAs as modulating factors of the observed clinical features. Study included 56 sALS and 20 healthy controls (HCs). We analyzed the peripheral blood samples of sALS patients and HCs with a high-throughput next-generation sequencing followed by an integrated bioinformatics/biostatistics analysis. Results showed that 38 miRNAs (let-7a-5p, let-7d-5p, let-7f-5p, let-7g-5p, let-7i-5p, miR-103a-3p, miR-106b-3p, miR-128-3p, miR-130a-3p, miR-130b-3p, miR-144-5p, miR-148a-3p, miR-148b-3p, miR-15a-5p, miR-15b-5p, miR-151a-5p, miR-151b, miR-16-5p, miR-182-5p, miR-183-5p, miR-186-5p, miR-22-3p, miR-221-3p, miR-223-3p, miR-23a-3p, miR-26a-5p, miR-26b-5p, miR-27b-3p, miR-28-3p, miR-30b-5p, miR-30c-5p, miR-342-3p, miR-425-5p, miR-451a, miR-532-5p, miR-550a-3p, miR-584-5p, miR-93-5p) were significantly downregulated in sALS. We also found that different miRNAs profiles characterized the bulbar/spinal onset and the progression rate. This observation supports the hypothesis that miRNAs may impact the phenotypic expression of the disease. Genes known to be associated with ALS (e.g., PARK7, C9orf72, ALS2, MATR3, SPG11, ATXN2) were confirmed to be dysregulated in our study. We also identified other potential candidate genes like LGALS3 (implicated in neuroinflammation) and PRKCD (activated in mitochondrial-induced apoptosis). Some of the downregulated genes are involved in molecular bindings to ions (i.e., metals, zinc, magnesium) and in ions-related functions. The genes that we found upregulated were involved in the immune response, oxidation–reduction, and apoptosis. These findings may have important implication for the monitoring, e.g., of sALS progression and therefore represent a significant advance in the elucidation of the disease’s underlying molecular mechanisms. The extensive multidisciplinary approach we applied in this study was critically important for its success, especially in complex disorders such as sALS, wherein access to genetic background is a major limitation. Keywords: sporadic amyotrophic lateral sclerosis, microRNA, target genes, peripheral blood markers, high throughput next-generation sequencing (HT-NGS), clinical parameters, bioinformatics, pathway analysis Introduction Amyotrophic lateral sclerosis is a progressive neurodegenerative disease in which different pathogenic mechanisms, including inflammation, oxidative stress, glutamate excitotoxicity, protein misfolding, apoptosis and dysfunction of axonal transport, have been identified ([40]Al-Chalabi et al., 2017; [41]Vejux et al., 2018). Clinically, the disease predominantly affects upper and lower motor neurons, although impairment of extramotor systems – such as temporal, behavioral and executive frontal circuits – has also been reported suggesting that ALS should be better considered a multisystem disorder ([42]Phukan et al., 2007; [43]Goldstein and Abrahams, 2013). Progressive spinal muscular atrophy and primary lateral sclerosis have been classified as restricted phenotypes of ALS (10% of cases) but a clear separation is still controversial ([44]Gordon et al., 2013). Isolated bulbar involvement (5%) and association with cognitive/behavioral signs that fulfill the diagnostic criteria of frontotemporal dementia (FTD) (5–15%) ([45]Strong et al., 2017) underline the heterogeneity of the disease that also involves the age at onset, the rate of progression and finally the overall prognosis, thus drawing a complex scenario of the ALS phenotypes ([46]Swinnen and Robberecht, 2014). Most ALS cases are sporadic (sALS), whereas a family history (fALS) is found in 10% of patients. Indeed, several factors other than genes (toxic exposures, diet, and others) seem to possibly contribute to ALS pathogenesis ([47]Paez-Colasante et al., 2015; [48]Morgan and Orrell, 2016; [49]Al-Chalabi et al., 2017) but a definitive conclusion on the effective role of the different factors is still awaited, mostly due to methodological biases (i.e., low power of the studies) ([50]van Es et al., 2017). Another unsolved issue is the lack of validated biomarkers for ALS pathogenesis and monitoring, mainly due to the phenotypic heterogeneity of the disease. Molecular markers have been first searched in the CSF, since liquor may provide more biological information on the biochemical and molecular processes underlying the disease, given its intimate connection with the central nervous system (CNS). Indeed, suggestive CSF markers have been evaluated over the years, with particular attention to those related to neuroinflammation, like metalloproteinases-2 and -9, IL-2 and -6, or neurodegeneration, e.g., Tau-protein or TDP-43 ([51]Vejux et al., 2018). However, presently only the neurofilament (NF), main product of neuroaxonal breakdown, seems to represent a sensitive biomarker of neurodegeneration ([52]Lu et al., 2012). Significant increase of NF was found in the CSF and serum of ALS patients, thus justifying its inclusion in the diagnostic protocol as well as in the evaluation of clinical course, given an increased level of NF-light chain in CSF of ALS that was found to be predictive of rapid clinical progression ([53]Tortelli et al., 2015; [54]Steinacker et al., 2016). Recently, a report showed increased levels of three macrophage-derived chitinases (CHIT1, CHI3L1, and CH13L2) in CSF of ALS patients and found a correlation with the progression rate of the disease, independently of their NF levels ([55]Thompson et al., 2018). However, lately the search for biomarkers has increasingly focused on blood samples (i.e., serum and plasma) obtained from a less invasive procedure, in an effort to identify early markers of ALS onset as well as those useful for monitoring its progression ([56]Vu and Bowser, 2017). The report on the implication of two ALS genes, TDP-43 and FUS, in the biogenesis of microRNAs (miRNAs) has sparked great interest about their potential role in the pathogenesis and progression of ALS ([57]Freischmidt et al., 2013). miRNAs are small non-coding RNA molecules that regulate at the post-transcriptional level the expression of genes involved in cellular response to stressors and other pathogenic insults ([58]Viader et al., 2011; [59]Kye and Goncalves Ido, 2014). Interestingly, despite being found rapidly degraded in post-mortem brain tissue, miRNAs are stable in serum and other body fluids, such as CSF ([60]Saraiva et al., 2017). Therefore their evaluation can be informative in healthy as in pathological conditions. In ALS, several studies have reported the occurrence of miRNAs dysregulation ([61]Waller et al., 2017b; [62]Matamala et al., 2018; [63]Vejux et al., 2018; [64]Vrabec et al., 2018). Among the others, a disease-specific two-fold upregulation of miR-338-3p, involved in apoptosis, neurodegeneration and glutamate clearance, has been reported in blood leukocytes, CSF serum, and spinal cords of patients with sALS compared to controls ([65]De Felice et al., 2014). Furthermore, 30 miRNAs significantly downregulated have been identified in the serum of fALS patients, the majority of them already dysregulated in pre-symptomatic subjects carrying some of the mutations causative of the disease ([66]Freischmidt et al., 2014). These miRNAs represent potential targets for therapeutic interventions at the very early stages of the disease. The availability of high-throughput technologies for large scale sequencing (HT-NGS) significantly improved the possibility of investigating with an unbiased approach the patterns of miRNAs associated with diseases like ALS. Importantly, miRNAs expression profiles can reflect the activation of specific pathogenic pathways in many neurodegenerative diseases, including ALS ([67]Waller et al., 2017a). That approach, in combination with the evaluation of genes expression (i.e., mRNA), could reveal novel pathogenic hypotheses worthy of testing. Based on the above developments we performed an extensive transcriptomic investigation in sALS patients with the primary aim of identifying dysregulation of miRNAs and mRNAs associated with the disease, and secondly to analyze their possible role as modulating factors of clinical features. Materials and Methods We planned a multidisciplinary strategy starting with the subjects’ selection, the molecular analyses on their peripheral blood samples, as well as the bioinformatics/statistics evaluations of the data. Subject Recruitment and Clinical Evaluation Patients with probable or definite ALS (revised El Escorial Criteria) ([68]Brooks et al., 2000) were recruited at the time of their first diagnosis (age ranging from 18 to 80 years) at the Department of Basic Sciences, Neurosciences and Sense Organs, University of Bari, Bari, Italy. Subjects with positive history of other neurological diseases, head/spinal trauma, psychiatric disorders, and alcohol/psychotropic drug use were excluded from the study. The search for mutations within those genes (and flanking intron–exon boundaries) that are commonly associated with ALS in our geographic area (Southern Italy) returned negative for all except six patients (two with SOD1 mutation, one patient with TARDBP mutation and three carrying the pathogenic C9orf72 expansions). Since no familial history of ALS was ascertained at the study entry, they were considered de novo mutations or mutations with incomplete penetrance. We divided the study in two phases. In the first one (discovery phase) a small sample of sALS patients was recruited, whereas in the second phase (validation phase) we examined a larger and distinct sample of sALS patients. We collected the following information from all patients: demographic and clinical data including gender, age at symptom onset as referred by the patient, site of onset (bulbar or spinal), ODI (onset to diagnosis interval), time to generalization (time interval between disease spreading from spinal to bulbar district or vice versa), disease duration (time interval between symptom onset and blood sampling). Clinical severity was assessed by the revised ALS Functional Rating scale (ALSFRSr) ([69]Cedarbaum et al., 1999) and the Manual Muscle Testing (MMT) (medium score). Disease progression rate was calculated as: (48-ALSFRSr score at blood sampling)/disease duration at blood sampling. Forced vital capacity was also measured. The enrollment end-date was December 31, 2017. We recruited healthy subjects with no history of neurological diseases as HCs in the same geographic area (Bari, Southern Italy). The Ethic Committee of Azienda Ospedaliera Policlinico, University of Bari, Italy, approved the study and we obtained a signed informed consent from all participants at the time of their enrollment (according to the Declaration of Helsinki^[70]1). Molecular Analysis Peripheral blood samples were taken from patients and controls and stored at -20°C in 3 ml PAXgene Blood RNA Tubes (PreAnalytiX Qiagen/BD, Hombrechtikon, Switzerland). Total RNA was isolated using the PAXgene Blood RNA Kit (PreAnalytiX Qiagen/BD, Hilden, Germany) at the Institute of Biomedical Technologies, National Research Council, Bari, Italy. RNA concentration and purity were measured by Nanodrop ND-1000 (Thermo Scientific, Wilmington, DE, United States) and RNA 6000 Pico chip on Bioanalyzer 2100 (Agilent Technologies, Santa Clara, CA, United States), respectively. Samples with RNA integrity number (RIN) scores higher than 7 and with A260/A280 values in the 1.8–2.2 range were processed for deep sequencing. HT-NGS (Discovery Phase) The RNA samples were sequenced using an Illumina HiSeq2500 platform. SmallRNA (sRNA) libraries were prepared by the TruSeq sRNA Sample Preparation kit (Illumina) and their quality was confirmed on a Bioanalyzer 2100 instrument. A multiplexed pool of equimolar amounts of individual sRNA-derived libraries was sequenced to generate 50 bp single-end reads, resulting in around 10 million reads/sample. The mRNA libraries were prepared using the TruSeq Stranded mRNA Sample Preparation kit, fluorimetrically quantified and analyzed, pooled together to obtain equimolar concentrations into a multiplex sequencing pool and sequenced to generate 2 bp × 100 bp paired-end reads (around 30 million reads/sample). RT and Microfluidic_qPCR (Validation Phase) Total RNA/sample was reverse transcribed into cDNA, amplified, diluted, and used as a template for microfluidic_qPCR analysis (TaqMan Advanced miRNA Cards, ABI). PCR amplification was performed under the manufacturer’s protocols. Raw_Ct-values were calculated using Expression Suite^TM software v1.1 (Life Technologies, Thermo Fisher Scientific). The auto-baseline algorithm in the software was used to compensate for background noise for each amplification curve, and the thresholds were automatically adjusted to the log-linear range ([71]Liguori et al., 2018). HT-NGS Data Analysis We processed sRNA/RNA-Seq data according to a bioinformatics pipeline that we developed and tested in other neurological diseases ([72]Liguori et al., 2018) and that consisted in the steps described below. Quality Check The quality control of the obtained reads was performed using the FastQC package^[73]2. We checked for low-quality reads by base sequence quality, sequence quality scores, base sequence content, base GC content, sequence GC content, base N content, sequence length distribution, sequence duplication levels, overrepresented sequences, and kmer content. If reads were of low quality by the above criteria, we removed them from the subsequent analysis. Read Identification (sRNA) The sRNA reads were mapped, using Bowtie aligner ([74]Langmead et al., 2009), an ultrafast and memory-efficient alignment of short DNA sequences to the human genome, against an in-house-developed reference database ncRNAdb, a comprehensive and non-redundant dataset of non-coding (nc-RNA) sequences and annotations extracted from public database like miRBase^[75]3, Vega^[76]4, Ensembl^[77]5, RefSeq^[78]6, piRNAbank^[79]7, GtRNAdb^[80]8, and HGNC^[81]9. The reads that were not mapped to known ncRNAs were aligned against the human genome and passed to mirDeep2 software^[82]10, which computationally identifies novel miRNA and their mature miRNA products. Read Identification (mRNA) The reads obtained from total RNA were mapped against the human genome and known human transcripts (GRCh38), using Bowtie2 which supports gapped alignment and is faster on long paired-end reads. Expression Quantification In order to obtain reliable read counts and to fix the problem of multireads (reads mapping to more than one reference location) ([83]Consiglio et al., 2016), we employed the RSEM tool for accurate expression estimations ([84]Li and Dewey, 2011). The count values produced by the Bayesian model implemented in RSEM were used as expression values in this work. When normalization of the expressions was necessary for some analysis steps, the trimmed mean of M-values (TMM) normalization method was used ([85]Robinson and Oshlack, 2010). Differential Expression (DE) Analysis Expression estimations computed for mRNAs (coding genes) and small ncRNAs were compared among the sALS and HC groups with the aim of determining statistically significant changes in the levels of expression. Since this is a very crucial step in the bioinformatics workflow and there is no general consensus regarding which method performs best in a given situation, we combined the results of three different software packages for DE analysis: edgeR^[86]11, the DESeq2^[87]12, and the limma^[88]13. The edgeR and DESeq2 were designed for NGS data and include data normalization and p-value correction for multiple testing by false discovery rate (FDR). The limma software was recently upgraded to enable measurements from read counts while taking into account the peculiarities of RNA-seq data ([89]Law et al., 2014). Specifically, genes were filtered out if they failed to achieve a count per million (cpm) value of 1 in at least 20% of samples. The expressions were simultaneously scale normalized using TMM and variance was stabilized using the voom technique. The corresponding log-cpm values and associated weights were the inputs in the limma standard linear modeling and empirical Bayes for DE analysis. The change in the expression was considered statistically significant if the adjusted p-value was < 0.05. Statistical Analysis For qRT-PCR Data Two normalization tools, NormFinder and geNorm, were used to identify the most suitable endogenous reference genes. The comparison of normalized values between the subgroups was obtained according to the 2^-ΔΔCt method (p-value < 0.05). Normality of data was assessed by the Shapiro–Wilk test. We performed the statistical analysis between miRNAs (DE) with the Expression Suite software, which consists of two-tailed Student’s t-test followed by Benjamini–Hochberg FDR, in order to adjust p-value (adjusted p-value < 0.05). We performed the comparison for every miRNA, and ratios between miRNAs and multiple patterns, representing specific transcriptome profiles. For Clinical Correlations Spearman correlation coefficient test was used to evaluate the main clinical features within sALS subjects, as well as the association between these characteristics and each miRNA (fold change). Furthermore, in order to evaluate the prediction accuracy of each miRNA with respect to the type of disease onset (spinal/bulbar) we used boxplots and ROC curves. ROC curves were plotted with the R package pROC ([90]Robin et al., 2011), while boxplots were produced with the basic R drawing tools. MicroRNA Target Analysis Starting from the results of the DE analysis performed on the two datasets (sRNAs and mRNAs), the relationships between DE miRNAs and DE target genes were investigated through a bioinformatics approach. Their interactions were selected using two databases of experimentally validated bindings (miRtarbase and DIANA-Tarbase). In order to consider the most reliable information about the interactions between the significant miRNAs and their target genes, we selected those bindings that were confirmed in tests with multiple reporters and that were positive at least by Dual Luciferase Reporter Assay, an in vitro test that explores the ability of a single miRNA to post-transcriptionally downregulate putative targets through its binding to specific sites within their 3′ UTRs ([91]Clancy et al., 2007; [92]Jin et al., 2013). Pathway Analysis Functional and pathway enrichment analysis of identified DE genes was performed using the Database for Annotation, Visualization and Integrated Discovery (DAVID v6.8^[93]14) tool. DAVID is a gene functional enrichment program that provides a large series of functional annotation tools and pathway databases (e.g., KEGG, Biocarta, Reactome databases). We determined statistical significance using the one-tailed Fisher’s exact test followed by the Benjamini correction; adjusted p-value < 0.05 was set as the threshold value. Results Study Subjects Characteristics Table [94]1 shows the demographic and clinical characteristics of the sALS and HC groups, at both the discovery and the validation phases. In the discovery phase, the age at blood sample of the HC group was on average about 20 years younger than the sALS patients (p = 0.018). In the validation phase, age, or gender did not differ between the patients and controls. Age at disease onset significantly correlated with the progression rate (r[s] = 0.36, p = 0.010) and ALSFSRr (r[s] = -045, p = 0.001). Table 1. Demographic and clinical characteristics of the study groups. ALS HC Discovery phase (n = 6) (n = 5) Gender (number) 4F, 2M 3F, 2M Age at sample (mean ± SD), years^∗ 69.7 ± 7.6 49.2 ± 14.9^∗ Age at onset (mean ± SD), years 66.3 ± 6.1 Clinical signs at onset: Spinal 5 Bulbar 1 Disease duration (median, IQR), months 28.6 (13.0) Onset-to-diagnosis interval (median, IQR), months 17.3 (6.1) ALSFRSr (mean ± SD) 24.7 ± 2.3 MMT-m (median, IQR) 6.5 (3.6) Disease progression rate (median, IQR) 0.8 (0.5) Generalization (number) 6 (100%) Time to generalization (median, IQR), months 21.8 (16.0) Validation phase (n = 50) (n = 15) Gender (number) 23F, 27M 9F, 6M Age at sample (mean ± SD), years 64.2 ± 11.0 60.9 ± 5.4 Age at onset (mean ± SD), years 62.5 ± 11.0 Clinical signs at onset: Spinal 36 Bulbar 14 Disease duration (median, IQR), months 18.5 (15.4) Onset-to-diagnosis interval (median, IQR), months 12.5 (10.4) ALSFRSr (median, IQR) 35.0 (12.0) MMT-m (median, IQR) 8.8 (2.0) Disease progression rate (median, IQR) 0.7 (0.6) Generalization (number) 45 (90%) Time to generalization (median, IQR), months 10.7 (12.2) [95]Open in a new tab Data are presented as means ± standard deviation (SD), in case of normal distribution, or as median and InterQuartile Ratio (IQR). ^∗Mann–Whitney U-test: p = 0.018. F = female; M = male: ALSFRSr, ALS Functional Rating scale; MMT-m, Manual Muscle Testing medium score. Identification of miRNAs and mRNAs Differentially Expressed (DE) in sALS Versus HC After the discovery experiment performed with HT-NGS methodology, the comparisons of miRNAs expression levels within the study groups revealed 107 mature miRNAs significantly DE between sALS and HC. According to our selection criteria ([96]Liguori et al., 2018), we decided to consider only miRNAs with mean number of reads higher than 25, a Fold Change higher than 2, and a Dispersion Index range of 0–1.6. The 42 miRNAs included in the final list (all downregulated in our sALS population, see Table [97]2) were subjected to qRT-PCR validation as described in the section below. Most of these miRNAs have been shown to be associated with ALS or other neurogenerative diseases (references of