Abstract Sulfur mustard is a vesicant chemical warfare agent, which has been used during Iraq-Iran-war. Many veterans and civilians still suffer from long-term complications of sulfur mustard exposure, especially in their lung. Although the lung lesions of these patients are similar to Chronic Obstructive Pulmonary Disease (COPD), there are some differences due to different etiology and clinical care. Less is known on the molecular mechanism of sulfur mustard patients and specific treatment options. microRNAs are master regulators of many biological pathways and proofed to be stable surrogate markers in body fluids. Based on that microRNA expression for serum samples of sulfur mustard patients were examined, to establish specific microRNA patterns as a basis for diagnostic use and insight into affected molecular pathways. Patients were categorized based on their long-term complications into three groups and microRNA serum levels were measured. The differentially regulated microRNAs and their corresponding gene targets were identified. Cell cycle arrest, ageing and TGF-beta signaling pathways showed up to be the most deregulated pathways. The candidate microRNA miR-143-3p could be validated on all individual patients. In a ROC analysis miR-143-3p turned out to be a suitable diagnostic biomarker in the mild and severe categories of patients. Further microRNAs which might own a link to the biology of the sulfur mustard patients are miR-365a-3p, miR-200a-3p, miR-663a. miR-148a-3p, which showed up only in a validation study, might be linked to the airway complications of the sulfur mustard patients. All the other candidate microRNAs do not directly link to COPD phenotype or lung complications. In summary the microRNA screening study characterizes several molecular differences in-between the clinical categories of the sulfur mustard exposure groups and established some useful microRNA biomarkers. qPCR raw data is available via the Gene Expression Omnibus [38]https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE110797. Introduction Sulfur mustard [bis (2-chloroethyl) sulfide] is a vesicant chemical warfare agent, which has been used during world war I (1914–1918) and in recent times during 8-year Iraq-Iran-war (1980–1988) [[39]1]. The sulfur mustard victims are severely affected throughout their whole life if they survived this injury. Especially in Iran it is a humanitarian challenge to cure and help theses many forgotten people in society. The sulfur mustard exposure of the human body causes a systemic disease. We will introduce for the whole phenotype in the following sections the acronym 'SMV'. As an alkylating agent, sulfur mustard reacts with a lot of macromolecules in the human body. Especially DNA and RNA in the cells are the obvious targets of sulfur mustard and this interaction is responsible for many clinical manifestations of sulfur mustard exposure [[40]2, [41]3]. The guanine base of nucleotides is the preferential site of sulfur mustard action which can lead to mono- and bifunctional adducts [[42]2]. These adducts are preferentially found at the N7 position of guanine but rarely at O6 position. Although very uncommon, O6-(2-ethylthioethyl guanine) is the most critical DNA lesion as human DNA repair system is unable to remove it [[43]2]. Bifuntional sulfur mustard adducts including intra- or inter strand crosslinks occurred in nearly 17% of the sulfur mustard alkylations resulting in single or double strand DNA breaks [[44]2]. In vitro and in vivo experiments showed that, beyond a certain amount of DNA alkylation, the direct consequences are apoptosis or necrosis [[45]4]. In a comprehensive study, Khateri and colleagues [[46]1] showed that of the approximately 100,000 military and civilian Iranian people, who had been exposed to sulfur mustard, a large group of 34,000 people suffer up to now from its chronic effects, mainly located in their lung, eyes and skin. The lung lesions turned up to be the most frequently diagnosed complication of SMV (42.5%) [[47]1]. Indeed, the mucosal membrane of the respiratory tract provides an enormous surface area for sulfur mustard action. Additionally, the moistness nature of the lung surface makes it an ideal site of sulfur mustard reactivity [[48]1] as body temperature and moisture enhance the destructive effects of sulfur mustard [[49]5]. Pathological studies, pulmonary function test (PFT) and other clinical examinations showed that COPD (Chronic Obstructive Pulmonary Disease) is the main long-term lung disorders among sulfur mustard patients. But the COPD phenotype observed in SMV patients is different from other conventional COPDs resulting from causes such as smoking. Serological findings regarding SMV patients showed that, among all proposed mechanisms for COPD, oxidative stress and apoptosis are more probable to be involved in COPD pathogenesis [[50]6]. Because of the complicated nature of the lung disorder in SMV, the exact biological mechanism of this disease is still an active field of researches [[51]6]. In this regard, screening studies are necessary to define the pattern of molecular drivers. qPCR arrays are a reliable and sensitive option in investigating the expression pattern of a large amount of microRNAs in the peripheral blood system as surrogate markers for systemic changes in the lung and further organs [[52]7]. The post-transcriptional regulatory role of microRNAs directly influences many important biological pathways. These non-coding RNAs are stable enough to be detected in biological samples such as serum, a prerequisite for becoming a clinical application [[53]7, [54]8]. They also show constant and invariable expression patterns under storage conditions, which makes them more appropriate over other biomarkers [[55]8]. Their stability against RNase activity, which is a big problem in transcriptional studies, is an additional advantage [[56]8]. The pathophysiological deregulation of microRNA markers has already been described in a similar disease phenotype of the pulmonary disorder COPD and emphysema [[57]9–[58]11]. Additionally, it has been described that serum has sufficiently significant microRNA signatures being useful to distinguish different disease states [[59]7, [60]12]. Because of the phenotypic similarity of lung lesions of SMV with COPD cases, the GOLD procedure (Global Initiative for Chronic Obstructive Lung Disease) is adopted to evaluate the severity of pulmonary lesions in these patients on a clinical level [[61]1] and to categorize the patient cohort. The objectives of this study were to find out if the microRNA expression is differentially expressed in serum samples of SMV compared to normal ones, and which biological pathways are linked to the differentially expressed microRNAs. This can be exploited in defining therapeutic strategies for the clinical procedures or diagnostic applications. Here, microRNAs might offer an appropriate and reliable diagnostic alternative, which can be a substitute for more inaccurate diagnostic methods of SMV like high resolution CT scan and PFT. Finally, this study should help to reconfirm some preliminary aspects of previous research efforts. Materials and methods Patients Eighty-four male participants, including SMV patients and age- and gender-matched healthy persons, were enrolled in this case and control study in 2011 / 2012. An overview of the screening procedure can be seen in [62]Fig 1. Due to the variability of the symptoms in the SMV phenotype, all the patients were chosen from a homogeneous group of veterans who had been exposed to high doses of mustard gas during a gas attack in February 1986. The participants in the control group were volunteers, who had not been exposed to any chemical substances during their lifetime either in war zone or their work place. The exclusion criteria for both groups were based on smoking, chronic disease of lung, skin and eyes and also malignancies. The average age of patients in this study was 47.5 ± 3.6 years ranging from 40 to 64 years. Fig 1. Patient cohort. [63]Fig 1 [64]Open in a new tab Participants were categorized into the four classes of normal, mild, moderate and severe based on the history of exposure to sulfur mustard, PFT and other clinical examinations. The samples of the screening study were randomly pooled into two biological replicates and measured. The grouping into normal study participants and mild, moderate and severe SMV patients was done according to the severity of their eye, lung and skin lesions. Spirometry test values of the lung (forced expiratory volume—FEV1, forced vital capacity—FVC) were used to categorize patients based on their lung function. The classification of the eye and skin functionality was done according to Khateri et al. [[65]1]. The study was approved by the research ethics committee of Janbazan Medical and Engineering Research Center (JMERC 89. E.B.101). Written informed consents were obtained from all participants. Peripheral blood sampling Eight milliliters of peripheral blood were collected in BD Vacutainer tubes with clot activator and gel (BD, Plymouth, UK). The time point of sampling was always early in the morning. The samples were placed at room temperature for 20 minutes and serum was separated by centrifugation at room temperature [[66]13]. Each serum sample was stored at -80°C until further analysis. RNA extraction and quantitative PCR The patients from the mild, moderate and severe groups were randomly pooled into two, disjunct biological replicates. Equal volumes of sera were mixed together and the pooled samples were subjected to RNA extraction. Total RNA was extracted from 250 microliters of each pooled serum samples using miRNeasy mini kit (Qiagen, Germany) based on manufacture`s protocol. To increase the yield of microRNA extraction, MS2 RNA (Roche Applied Science, Germany) was added to each sample before adding QIAzole. 1.5 microliters of each RNA samples were reverse transcribed in a volume of 10 microliters. This amount of RNA input has the least inhibitory effect based on our previous experiments on serum samples assessing the inhibitory effect of RNA input on microRNA expression [[67]14]. cDNA synthesis was performed in a two-step protocol of RNA polyadenylation followed by universal cDNA synthesis (Exiqon, Denmark). Thereafter, the quantitative PCR panel was run using SYBR green master mix (Exiqon, Denmark) by the Light Cycler 480 (Roche, Germany). The expression pattern of 752 microRNAs was assessed for each sample using the Exiqon made panels (microRNA Ready-to-Use PCR, Human panel I+II, V2.R). Data processing and analysis Totally, due to the complexity of disease and to increase the power of the analyses, three different strategies based on the R (3.2.3) Bioconductor (3.2) packages of limma (3.26.9) [[68]15], HTqPCR (1.24.0) [[69]16] and R CRAN package of MCMC (0.9–5) [[70]17] were tested to get insight into the performance of the methods. Among these, HTqPCR package was the most reliable package, and had the advantage of several test statistics and normalization procedures and was therefore finally chosen for the presented analyses ([71]Fig 2A). Fig 2. Comparative differential microRNA analysis workflow including sampling controls. [72]Fig 2 [73]Open in a new tab The panel (A) shows that two different analysis approaches were applied. On the left side the IPC and delta Ct method followed by linear model / Bayes based differential analysis was performed while on the right side a pure linear model / Bayes approach was used based on the results of the pilot analysis. The relevance of both approaches was tested by a resampling approach. The right branch was finally chosen and basis for the discussion. In panel (B) the results of all the alternative tests on either the normal-mild or the normal-severe comparisons are shown. The results on the right side indicate that in the normal-mild comparison 15 and in the normal-severe comparison 29 microRNAs are stable on a 5% significance level after applying the resampling control. The numbers on white background indicate the remaining candidates after the intersections were performed. The result denotes a good consistency between left and right procedure. The quality control and the differential analysis was done with the HTqPCR package functions. Based on the quality control (i.a. [74]S1 and [75]S2 Figs) a raw and IPC/delta Ct analysis was performed. The differential analysis was therefore also performed in two different approaches. In one branch the raw data was directly used for a linear model-based Bayes approach (HTqPCR: limmaCtData) followed by a resampling control (see section 'Sampling procedure'). In parallel, and as an outer control to the HTqPCR differential analysis workflow, a simple normalization approach, well established in the qPCR community as well as recommended by Exiqon company, was applied on the expression sets (see next paragraph). The significant differences were again evaluated on an alpha error of 5 percent applying a linear model-based Bayes approach. Like in the first procedure, a resampling (permutation) analysis was added on the selected data workflow to establish a sampling p value denoting the reliability of the results. An overview on the used HTqPCR functions and the code of the custom functions is given in [76]S1 Functions. Data normalization by delta Ct method The approach is a well-established method [[77]18–[78]20]. To keep the Exiqon specific peculiarities a proposed workflow of the company was used. Briefly, after raw data collection, inter-plate calibration and internal control normalization were performed. The used predefined inter-plate calibrators, annotated as UniSp3 IPC (inter plate calibrator), should have the same expression value across all panels. in which a standard deviation of less than 0.05 is necessary to pass this quality control step. For each panel, a calibration factor was calculated based on the Exiqon's manual. Additionally, six different small RNAs including three small nucleolar RNAs and three microRNAs exist in each panel including: SNORD38B (ENSG00000207421), SNORD49A (ENSG00000277370), U6 snRNA (ENSG00000206625), and miR-103a-3p (MIMAT0000101), miR-423-5p (MIMAT0004748) and miR-191-5p (MIMAT0000440). These factors are provided as reference genes for the delta Ct normalization method [[79]6]. In a separate study, we evaluated the stability of these six recommended internal controls and found miR-103a-3p, miR-423-5p and miR-191-5p (named r103, r191, r423 in the raw data) as the most consistent and reliable internal controls among others [[80]21]. So the mean value of these three stable microRNAs was finally used as reference gene in calculating the delta Ct value. Sampling procedure A sampling method was applied on the delta Ct respectively the raw data to show the reliability of obtained results. Specifically, a resampling procedure without replacement was applied (permutation, per group, on all microRNAs of a group). This algorithm is known to be a very reliable and test distribution conformable procedure to control the alpha error [[81]22–[82]24]. A permutation number of 10,000 times was chosen. The p value was calculated by the fraction of permutation approaches better than the original p value without resampling. Pathway enrichment analysis The differential microRNAs were mapped to the corresponding gene targets to reveal the associated pathways. miRTarbase [[83]25] is one of several tools that is using validated microRNA-target interactions instead of predicted microRNA-gene interactions resulting in a higher relevance in vivo. The target genes of the miRTarbase database which have been considered are experimentally validated by reporter assay, western blot, microarray or next-generation sequencing experiments. The resulting ENTREZ gene IDs were used by utilizing DAVID tools (The Database for Annotation, Visualization and Integrated Discovery, v6.8, Oct 2016) [[84]26] for a pathway enrichment analysis. The resulting genes were mapped by the DAVID tool onto known pathway resources, in our case BIOCARTA. The significant enriched pathways were qualified and ranked by false discovery rate corrected p values (alpha error: p<0.05). Background for the analysis was the entire Homo sapiens transcriptome. The enriched pathways were discussed in the context of established knowledge. Validation of selected microRNAs The selected microRNAs were validated in the same cohort of patients as described for the screening procedure using quantitative real time PCR. The number of selected patients was: 14 patients with mild, 16 with severe symptoms and 16 control samples for miR-143-3p. Similar numbers were chosen for miR-148a-3p (see [85]S3 Fig). The validation was not performed on a pool of samples instead on each individual patient sample. The protocol of cDNA synthesis and real time PCR was the same as previously explained. Delta-Ct value was calculated using the mean of miR-103a-3p, miR423-5p and miR-191-5p as reference genes. Student's t-test was used here to evaluate statistical significance with an alpha value of smaller than 0.05. The comparison was made between control group and mild and severe groups separately. Receiver operating characteristic (ROC) curve analysis were used to determine the specificity and sensitivity in distinguishing the whole group of SMV affected patients from the normal group. All these analyses and the graphs were created using GraphPad Prism 6. Cell cycle analysis To investigate the impact of microRNA overexpression on the cell cycle, a DNA segment containing miR-143-3p precursor was cloned in PEGFP-C1 vector and 1 microgram of the vector was transfected to HEK-293T using lipofectamin 2000 (Invitrogen, USA) in a 24-well plates (7 × 10 4 cells per well). The transfection rate (~70%) was assessed after 24 hours by measuring the GFP signal using a fluorescent microscope. Transfected wells were subjected to RNA extraction following cDNA synthesis to evaluate if the expression of miR-143-3p was increased compared to mock vector. U6 was used as internal control and each reaction was done in duplicate. In parallel, a flow cytometry analysis of the cell cycle with propidium iodide (PI) was performed on the transfected cells after 15-minutes of ethanol fixation. Following fixation, the cells were rinsed with PBS and stained with PI in a solution containing Triton X-100 and RNase A. The cell cycle analysis was performed using a cell sorter (BD FACSCanto II, Becton Dickinson). The flow cytometry data analysis software (Flowing software version 2.5.1), was used to determine and plot the cell cycle profile and percentage values. Results Data properties and experimental design The differentially expressed microRNAs in serum samples of SMVs were determined using miRNome panels (Exiqon). Following RNA extraction and cDNA synthesis, quantitative PCR array of microRNAs was performed on serum samples. Four patient`s categories including normal, mild, moderate, and severe were included in the study, while the focus for generating biological insight was mainly put on a differential analysis of normal (grey) versus mild (blue) respectively severe (red, [86]Fig 1). All groups were based on two biological replicates, each based on two independent pools of patients. The samples were analyzed on two qPCR arrays I and II with 372 and 367 microRNAs respectively. 528 microRNAs out of 739 were at least partly expressed. The majority of expressed microRNAs with a relevant Ct value were located on panel I. To protect the downstream analysis from this biased behavior, panel II was dropped. The remaining microRNAs on panel I were further filtered to have Ct values smaller than 40 in all measurements. This might also effect on/off situations by microRNA expression but the raw data inspection showed that without this correction data reproducibility drops remarkably. The resulting number of 173 microRNAs was used for further analysis. The qPCR panel additionally included controls, inter plate calibrators (IPCs) and six different reference genes where 3 were used in the differential analysis. The delta Ct calculation are normally based on references of small RNA species from non-microRNA origin (including