Abstract

   Amyotrophic lateral sclerosis (ALS) is a progressive and fatal
   neurodegenerative disease. While genetics and other factors contribute
   to ALS pathogenesis, critical knowledge is still missing and validated
   biomarkers for monitoring the disease activity have not yet been
   identified. To address those aspects we carried out this study with the
   primary aim of identifying possible miRNAs/mRNAs dysregulation
   associated with the sporadic form of the disease (sALS). Additionally,
   we explored miRNAs as modulating factors of the observed clinical
   features. Study included 56 sALS and 20 healthy controls (HCs). We
   analyzed the peripheral blood samples of sALS patients and HCs with a
   high-throughput next-generation sequencing followed by an integrated
   bioinformatics/biostatistics analysis. Results showed that 38 miRNAs
   (let-7a-5p, let-7d-5p, let-7f-5p, let-7g-5p, let-7i-5p, miR-103a-3p,
   miR-106b-3p, miR-128-3p, miR-130a-3p, miR-130b-3p, miR-144-5p,
   miR-148a-3p, miR-148b-3p, miR-15a-5p, miR-15b-5p, miR-151a-5p,
   miR-151b, miR-16-5p, miR-182-5p, miR-183-5p, miR-186-5p, miR-22-3p,
   miR-221-3p, miR-223-3p, miR-23a-3p, miR-26a-5p, miR-26b-5p, miR-27b-3p,
   miR-28-3p, miR-30b-5p, miR-30c-5p, miR-342-3p, miR-425-5p, miR-451a,
   miR-532-5p, miR-550a-3p, miR-584-5p, miR-93-5p) were significantly
   downregulated in sALS. We also found that different miRNAs profiles
   characterized the bulbar/spinal onset and the progression rate. This
   observation supports the hypothesis that miRNAs may impact the
   phenotypic expression of the disease. Genes known to be associated with
   ALS (e.g., PARK7, C9orf72, ALS2, MATR3, SPG11, ATXN2) were confirmed to
   be dysregulated in our study. We also identified other potential
   candidate genes like LGALS3 (implicated in neuroinflammation) and PRKCD
   (activated in mitochondrial-induced apoptosis). Some of the
   downregulated genes are involved in molecular bindings to ions (i.e.,
   metals, zinc, magnesium) and in ions-related functions. The genes that
   we found upregulated were involved in the immune response,
   oxidation–reduction, and apoptosis. These findings may have important
   implication for the monitoring, e.g., of sALS progression and therefore
   represent a significant advance in the elucidation of the disease’s
   underlying molecular mechanisms. The extensive multidisciplinary
   approach we applied in this study was critically important for its
   success, especially in complex disorders such as sALS, wherein access
   to genetic background is a major limitation.

   Keywords: sporadic amyotrophic lateral sclerosis, microRNA, target
   genes, peripheral blood markers, high throughput next-generation
   sequencing (HT-NGS), clinical parameters, bioinformatics, pathway
   analysis

Introduction

   Amyotrophic lateral sclerosis is a progressive neurodegenerative
   disease in which different pathogenic mechanisms, including
   inflammation, oxidative stress, glutamate excitotoxicity, protein
   misfolding, apoptosis and dysfunction of axonal transport, have been
   identified ([40]Al-Chalabi et al., 2017; [41]Vejux et al., 2018).
   Clinically, the disease predominantly affects upper and lower motor
   neurons, although impairment of extramotor systems – such as temporal,
   behavioral and executive frontal circuits – has also been reported
   suggesting that ALS should be better considered a multisystem disorder
   ([42]Phukan et al., 2007; [43]Goldstein and Abrahams, 2013).
   Progressive spinal muscular atrophy and primary lateral sclerosis have
   been classified as restricted phenotypes of ALS (10% of cases) but a
   clear separation is still controversial ([44]Gordon et al., 2013).
   Isolated bulbar involvement (5%) and association with
   cognitive/behavioral signs that fulfill the diagnostic criteria of
   frontotemporal dementia (FTD) (5–15%) ([45]Strong et al., 2017)
   underline the heterogeneity of the disease that also involves the age
   at onset, the rate of progression and finally the overall prognosis,
   thus drawing a complex scenario of the ALS phenotypes ([46]Swinnen and
   Robberecht, 2014).

   Most ALS cases are sporadic (sALS), whereas a family history (fALS) is
   found in 10% of patients. Indeed, several factors other than genes
   (toxic exposures, diet, and others) seem to possibly contribute to ALS
   pathogenesis ([47]Paez-Colasante et al., 2015; [48]Morgan and Orrell,
   2016; [49]Al-Chalabi et al., 2017) but a definitive conclusion on the
   effective role of the different factors is still awaited, mostly due to
   methodological biases (i.e., low power of the studies) ([50]van Es et
   al., 2017).

   Another unsolved issue is the lack of validated biomarkers for ALS
   pathogenesis and monitoring, mainly due to the phenotypic heterogeneity
   of the disease. Molecular markers have been first searched in the CSF,
   since liquor may provide more biological information on the biochemical
   and molecular processes underlying the disease, given its intimate
   connection with the central nervous system (CNS). Indeed, suggestive
   CSF markers have been evaluated over the years, with particular
   attention to those related to neuroinflammation, like
   metalloproteinases-2 and -9, IL-2 and -6, or neurodegeneration, e.g.,
   Tau-protein or TDP-43 ([51]Vejux et al., 2018). However, presently only
   the neurofilament (NF), main product of neuroaxonal breakdown, seems to
   represent a sensitive biomarker of neurodegeneration ([52]Lu et al.,
   2012). Significant increase of NF was found in the CSF and serum of ALS
   patients, thus justifying its inclusion in the diagnostic protocol as
   well as in the evaluation of clinical course, given an increased level
   of NF-light chain in CSF of ALS that was found to be predictive of
   rapid clinical progression ([53]Tortelli et al., 2015; [54]Steinacker
   et al., 2016). Recently, a report showed increased levels of three
   macrophage-derived chitinases (CHIT1, CHI3L1, and CH13L2) in CSF of ALS
   patients and found a correlation with the progression rate of the
   disease, independently of their NF levels ([55]Thompson et al., 2018).
   However, lately the search for biomarkers has increasingly focused on
   blood samples (i.e., serum and plasma) obtained from a less invasive
   procedure, in an effort to identify early markers of ALS onset as well
   as those useful for monitoring its progression ([56]Vu and Bowser,
   2017).

   The report on the implication of two ALS genes, TDP-43 and FUS, in the
   biogenesis of microRNAs (miRNAs) has sparked great interest about their
   potential role in the pathogenesis and progression of ALS
   ([57]Freischmidt et al., 2013). miRNAs are small non-coding RNA
   molecules that regulate at the post-transcriptional level the
   expression of genes involved in cellular response to stressors and
   other pathogenic insults ([58]Viader et al., 2011; [59]Kye and
   Goncalves Ido, 2014). Interestingly, despite being found rapidly
   degraded in post-mortem brain tissue, miRNAs are stable in serum and
   other body fluids, such as CSF ([60]Saraiva et al., 2017). Therefore
   their evaluation can be informative in healthy as in pathological
   conditions. In ALS, several studies have reported the occurrence of
   miRNAs dysregulation ([61]Waller et al., 2017b; [62]Matamala et al.,
   2018; [63]Vejux et al., 2018; [64]Vrabec et al., 2018). Among the
   others, a disease-specific two-fold upregulation of miR-338-3p,
   involved in apoptosis, neurodegeneration and glutamate clearance, has
   been reported in blood leukocytes, CSF serum, and spinal cords of
   patients with sALS compared to controls ([65]De Felice et al., 2014).
   Furthermore, 30 miRNAs significantly downregulated have been identified
   in the serum of fALS patients, the majority of them already
   dysregulated in pre-symptomatic subjects carrying some of the mutations
   causative of the disease ([66]Freischmidt et al., 2014). These miRNAs
   represent potential targets for therapeutic interventions at the very
   early stages of the disease.

   The availability of high-throughput technologies for large scale
   sequencing (HT-NGS) significantly improved the possibility of
   investigating with an unbiased approach the patterns of miRNAs
   associated with diseases like ALS. Importantly, miRNAs expression
   profiles can reflect the activation of specific pathogenic pathways in
   many neurodegenerative diseases, including ALS ([67]Waller et al.,
   2017a). That approach, in combination with the evaluation of genes
   expression (i.e., mRNA), could reveal novel pathogenic hypotheses
   worthy of testing.

   Based on the above developments we performed an extensive
   transcriptomic investigation in sALS patients with the primary aim of
   identifying dysregulation of miRNAs and mRNAs associated with the
   disease, and secondly to analyze their possible role as modulating
   factors of clinical features.

Materials and Methods

   We planned a multidisciplinary strategy starting with the subjects’
   selection, the molecular analyses on their peripheral blood samples, as
   well as the bioinformatics/statistics evaluations of the data.

Subject Recruitment and Clinical Evaluation

   Patients with probable or definite ALS (revised El Escorial Criteria)
   ([68]Brooks et al., 2000) were recruited at the time of their first
   diagnosis (age ranging from 18 to 80 years) at the Department of Basic
   Sciences, Neurosciences and Sense Organs, University of Bari, Bari,
   Italy. Subjects with positive history of other neurological diseases,
   head/spinal trauma, psychiatric disorders, and alcohol/psychotropic
   drug use were excluded from the study. The search for mutations within
   those genes (and flanking intron–exon boundaries) that are commonly
   associated with ALS in our geographic area (Southern Italy) returned
   negative for all except six patients (two with SOD1 mutation, one
   patient with TARDBP mutation and three carrying the pathogenic C9orf72
   expansions). Since no familial history of ALS was ascertained at the
   study entry, they were considered de novo mutations or mutations with
   incomplete penetrance.

   We divided the study in two phases. In the first one (discovery phase)
   a small sample of sALS patients was recruited, whereas in the second
   phase (validation phase) we examined a larger and distinct sample of
   sALS patients. We collected the following information from all
   patients: demographic and clinical data including gender, age at
   symptom onset as referred by the patient, site of onset (bulbar or
   spinal), ODI (onset to diagnosis interval), time to generalization
   (time interval between disease spreading from spinal to bulbar district
   or vice versa), disease duration (time interval between symptom onset
   and blood sampling). Clinical severity was assessed by the revised ALS
   Functional Rating scale (ALSFRSr) ([69]Cedarbaum et al., 1999) and the
   Manual Muscle Testing (MMT) (medium score). Disease progression rate
   was calculated as: (48-ALSFRSr score at blood sampling)/disease
   duration at blood sampling. Forced vital capacity was also measured.
   The enrollment end-date was December 31, 2017.

   We recruited healthy subjects with no history of neurological diseases
   as HCs in the same geographic area (Bari, Southern Italy).

   The Ethic Committee of Azienda Ospedaliera Policlinico, University of
   Bari, Italy, approved the study and we obtained a signed informed
   consent from all participants at the time of their enrollment
   (according to the Declaration of Helsinki^[70]1).

Molecular Analysis

   Peripheral blood samples were taken from patients and controls and
   stored at -20°C in 3 ml PAXgene Blood RNA Tubes (PreAnalytiX Qiagen/BD,
   Hombrechtikon, Switzerland). Total RNA was isolated using the PAXgene
   Blood RNA Kit (PreAnalytiX Qiagen/BD, Hilden, Germany) at the Institute
   of Biomedical Technologies, National Research Council, Bari, Italy. RNA
   concentration and purity were measured by Nanodrop ND-1000 (Thermo
   Scientific, Wilmington, DE, United States) and RNA 6000 Pico chip on
   Bioanalyzer 2100 (Agilent Technologies, Santa Clara, CA, United
   States), respectively. Samples with RNA integrity number (RIN) scores
   higher than 7 and with A260/A280 values in the 1.8–2.2 range were
   processed for deep sequencing.

HT-NGS (Discovery Phase)

   The RNA samples were sequenced using an Illumina HiSeq2500 platform.
   SmallRNA (sRNA) libraries were prepared by the TruSeq sRNA Sample
   Preparation kit (Illumina) and their quality was confirmed on a
   Bioanalyzer 2100 instrument. A multiplexed pool of equimolar amounts of
   individual sRNA-derived libraries was sequenced to generate 50 bp
   single-end reads, resulting in around 10 million reads/sample. The mRNA
   libraries were prepared using the TruSeq Stranded mRNA Sample
   Preparation kit, fluorimetrically quantified and analyzed, pooled
   together to obtain equimolar concentrations into a multiplex sequencing
   pool and sequenced to generate 2 bp × 100 bp paired-end reads (around
   30 million reads/sample).

RT and Microfluidic_qPCR (Validation Phase)

   Total RNA/sample was reverse transcribed into cDNA, amplified, diluted,
   and used as a template for microfluidic_qPCR analysis (TaqMan Advanced
   miRNA Cards, ABI). PCR amplification was performed under the
   manufacturer’s protocols. Raw_Ct-values were calculated using
   Expression Suite^TM software v1.1 (Life Technologies, Thermo Fisher
   Scientific). The auto-baseline algorithm in the software was used to
   compensate for background noise for each amplification curve, and the
   thresholds were automatically adjusted to the log-linear range
   ([71]Liguori et al., 2018).

HT-NGS Data Analysis

   We processed sRNA/RNA-Seq data according to a bioinformatics pipeline
   that we developed and tested in other neurological diseases
   ([72]Liguori et al., 2018) and that consisted in the steps described
   below.

Quality Check

   The quality control of the obtained reads was performed using the
   FastQC package^[73]2. We checked for low-quality reads by base sequence
   quality, sequence quality scores, base sequence content, base GC
   content, sequence GC content, base N content, sequence length
   distribution, sequence duplication levels, overrepresented sequences,
   and kmer content. If reads were of low quality by the above criteria,
   we removed them from the subsequent analysis.

Read Identification (sRNA)

   The sRNA reads were mapped, using Bowtie aligner ([74]Langmead et al.,
   2009), an ultrafast and memory-efficient alignment of short DNA
   sequences to the human genome, against an in-house-developed reference
   database ncRNAdb, a comprehensive and non-redundant dataset of
   non-coding (nc-RNA) sequences and annotations extracted from public
   database like miRBase^[75]3, Vega^[76]4, Ensembl^[77]5, RefSeq^[78]6,
   piRNAbank^[79]7, GtRNAdb^[80]8, and HGNC^[81]9. The reads that were not
   mapped to known ncRNAs were aligned against the human genome and passed
   to mirDeep2 software^[82]10, which computationally identifies novel
   miRNA and their mature miRNA products.

Read Identification (mRNA)

   The reads obtained from total RNA were mapped against the human genome
   and known human transcripts (GRCh38), using Bowtie2 which supports
   gapped alignment and is faster on long paired-end reads.

Expression Quantification

   In order to obtain reliable read counts and to fix the problem of
   multireads (reads mapping to more than one reference location)
   ([83]Consiglio et al., 2016), we employed the RSEM tool for accurate
   expression estimations ([84]Li and Dewey, 2011). The count values
   produced by the Bayesian model implemented in RSEM were used as
   expression values in this work. When normalization of the expressions
   was necessary for some analysis steps, the trimmed mean of M-values
   (TMM) normalization method was used ([85]Robinson and Oshlack, 2010).

Differential Expression (DE) Analysis

   Expression estimations computed for mRNAs (coding genes) and small
   ncRNAs were compared among the sALS and HC groups with the aim of
   determining statistically significant changes in the levels of
   expression. Since this is a very crucial step in the bioinformatics
   workflow and there is no general consensus regarding which method
   performs best in a given situation, we combined the results of three
   different software packages for DE analysis: edgeR^[86]11, the
   DESeq2^[87]12, and the limma^[88]13. The edgeR and DESeq2 were designed
   for NGS data and include data normalization and p-value correction for
   multiple testing by false discovery rate (FDR). The limma software was
   recently upgraded to enable measurements from read counts while taking
   into account the peculiarities of RNA-seq data ([89]Law et al., 2014).
   Specifically, genes were filtered out if they failed to achieve a count
   per million (cpm) value of 1 in at least 20% of samples. The
   expressions were simultaneously scale normalized using TMM and variance
   was stabilized using the voom technique. The corresponding log-cpm
   values and associated weights were the inputs in the limma standard
   linear modeling and empirical Bayes for DE analysis. The change in the
   expression was considered statistically significant if the adjusted
   p-value was < 0.05.

Statistical Analysis

For qRT-PCR Data

   Two normalization tools, NormFinder and geNorm, were used to identify
   the most suitable endogenous reference genes. The comparison of
   normalized values between the subgroups was obtained according to the
   2^-ΔΔCt method (p-value < 0.05). Normality of data was assessed by the
   Shapiro–Wilk test. We performed the statistical analysis between miRNAs
   (DE) with the Expression Suite software, which consists of two-tailed
   Student’s t-test followed by Benjamini–Hochberg FDR, in order to adjust
   p-value (adjusted p-value < 0.05). We performed the comparison for
   every miRNA, and ratios between miRNAs and multiple patterns,
   representing specific transcriptome profiles.

For Clinical Correlations

   Spearman correlation coefficient test was used to evaluate the main
   clinical features within sALS subjects, as well as the association
   between these characteristics and each miRNA (fold change).
   Furthermore, in order to evaluate the prediction accuracy of each miRNA
   with respect to the type of disease onset (spinal/bulbar) we used
   boxplots and ROC curves. ROC curves were plotted with the R package
   pROC ([90]Robin et al., 2011), while boxplots were produced with the
   basic R drawing tools.

MicroRNA Target Analysis

   Starting from the results of the DE analysis performed on the two
   datasets (sRNAs and mRNAs), the relationships between DE miRNAs and DE
   target genes were investigated through a bioinformatics approach. Their
   interactions were selected using two databases of experimentally
   validated bindings (miRtarbase and DIANA-Tarbase). In order to consider
   the most reliable information about the interactions between the
   significant miRNAs and their target genes, we selected those bindings
   that were confirmed in tests with multiple reporters and that were
   positive at least by Dual Luciferase Reporter Assay, an in vitro test
   that explores the ability of a single miRNA to post-transcriptionally
   downregulate putative targets through its binding to specific sites
   within their 3′ UTRs ([91]Clancy et al., 2007; [92]Jin et al., 2013).

Pathway Analysis

   Functional and pathway enrichment analysis of identified DE genes was
   performed using the Database for Annotation, Visualization and
   Integrated Discovery (DAVID v6.8^[93]14) tool. DAVID is a gene
   functional enrichment program that provides a large series of
   functional annotation tools and pathway databases (e.g., KEGG,
   Biocarta, Reactome databases). We determined statistical significance
   using the one-tailed Fisher’s exact test followed by the Benjamini
   correction; adjusted p-value < 0.05 was set as the threshold value.

Results

Study Subjects Characteristics

   Table [94]1 shows the demographic and clinical characteristics of the
   sALS and HC groups, at both the discovery and the validation phases. In
   the discovery phase, the age at blood sample of the HC group was on
   average about 20 years younger than the sALS patients (p = 0.018). In
   the validation phase, age, or gender did not differ between the
   patients and controls. Age at disease onset significantly correlated
   with the progression rate (r[s] = 0.36, p = 0.010) and ALSFSRr (r[s] =
   -045, p = 0.001).

Table 1.

   Demographic and clinical characteristics of the study groups.
                                                         ALS           HC
   Discovery phase                                     (n = 6)      (n = 5)
   Gender (number)                                     4F, 2M        3F, 2M
   Age at sample (mean ± SD), years^∗                69.7 ± 7.6  49.2 ± 14.9^∗
   Age at onset (mean ± SD), years                   66.3 ± 6.1
   Clinical signs at onset:
   Spinal                                                 5
   Bulbar                                                 1
   Disease duration (median, IQR), months            28.6 (13.0)
   Onset-to-diagnosis interval (median, IQR), months 17.3 (6.1)
   ALSFRSr (mean ± SD)                               24.7 ± 2.3
   MMT-m (median, IQR)                                6.5 (3.6)
   Disease progression rate (median, IQR)             0.8 (0.5)
   Generalization (number)                            6 (100%)
   Time to generalization (median, IQR), months      21.8 (16.0)
   Validation phase                                   (n = 50)      (n = 15)
   Gender (number)                                    23F, 27M       9F, 6M
   Age at sample (mean ± SD), years                  64.2 ± 11.0   60.9 ± 5.4
   Age at onset (mean ± SD), years                   62.5 ± 11.0
   Clinical signs at onset:
   Spinal                                                36
   Bulbar                                                14
   Disease duration (median, IQR), months            18.5 (15.4)
   Onset-to-diagnosis interval (median, IQR), months 12.5 (10.4)
   ALSFRSr (median, IQR)                             35.0 (12.0)
   MMT-m (median, IQR)                                8.8 (2.0)
   Disease progression rate (median, IQR)             0.7 (0.6)
   Generalization (number)                            45 (90%)
   Time to generalization (median, IQR), months      10.7 (12.2)
   [95]Open in a new tab

   Data are presented as means ± standard deviation (SD), in case of
   normal distribution, or as median and InterQuartile Ratio (IQR).
   ^∗Mann–Whitney U-test: p = 0.018. F = female; M = male: ALSFRSr, ALS
   Functional Rating scale; MMT-m, Manual Muscle Testing medium score.

Identification of miRNAs and mRNAs Differentially Expressed (DE) in sALS
Versus HC

   After the discovery experiment performed with HT-NGS methodology, the
   comparisons of miRNAs expression levels within the study groups
   revealed 107 mature miRNAs significantly DE between sALS and HC.
   According to our selection criteria ([96]Liguori et al., 2018), we
   decided to consider only miRNAs with mean number of reads higher than
   25, a Fold Change higher than 2, and a Dispersion Index range of 0–1.6.
   The 42 miRNAs included in the final list (all downregulated in our sALS
   population, see Table [97]2) were subjected to qRT-PCR validation as
   described in the section below. Most of these miRNAs have been shown to
   be associated with ALS or other neurogenerative diseases (references of