Graphical abstract graphic file with name fx1.jpg [73]Open in a new tab Highlights * • 2,336 known and 361 novel miRNAs identified in this study * • 85 miRNAs associated with COVID-19 * • A panel of miRNAs targeting the viral or cellular genes * • Machine learning using miRNAs for classification of COVID-19 __________________________________________________________________ Virology; Machine learning Introduction The pandemic of the coronavirus disease 2019 (COVID-19), which was caused by SARS-CoV-2 infection, has posed unprecedented challenges to the international communities. As of December 29, 2021, 281,808,270 confirmed cases of COVID-19, including 5,411,759 deaths worldwide were reported to the World Health Organization ([74]WHO Coronavirus (COVID-19) Dashboard, 2021). The emerging variants of SARS-CoV-2 have resulted in the devastating surge of COVID-19 cases in several countries and have caused serious concerns ([75]Kupferschmidt and Wadman, 2021). The clinical presentations and disease courses of SARS-CoV-2 infection are highly variable, ranging from asymptomatic infection (AS) to severe or critical conditions ([76]Rothe et al., 2020; [77]Buitrago-Garcia et al., 2020; [78]Guan et al., 2020; [79]Wu and McGoogan, 2020; [80]Price-Haywood et al., 2020). While many studies have been conducted to understand the differences between the moderate disease (MD) and the severe disease (SD), the AS remains poorly characterized. In addition, some of the patients were tested positive for the nucleic acid of SARS-CoV-2 by RT-PCR but did not become negative for a longer period of time (≥60 days, herein designated as long-term nucleic acid test positive, LTNP) whereas others turned into negative for the viral nucleic acid in a shorter period of time (≤45 days, designated as short-term nucleic acid test positive, STNP). However, the underlying mechanisms responsible for the LTNP and STNP remain unknown. MicroRNAs (miRNAs) are small single-stranded RNA molecules with a length of 18–28 nucleotides and have been suggested to play important roles in a variety of physiological and pathological processes in animals and humans, including development, immune responses, inflammation, and apoptosis ([81]Bartel, 2004). In particular, miRNAs were involved in the replication process of a number of viruses, such as hepatitis C virus, HIV-1, and influenza viruses ([82]Lee et al., 2017; [83]Narla et al., 2018; [84]Zhao et al., 2019). Recent studies have also suggested that miRNAs may be involved in the pathogenesis of COVID-19 ([85]Chauhan et al., 2021; [86]Arisan et al., 2020; [87]Hosseini Rad Sm and McLellan, 2020). However, the miRNA alternations associated with SARS-CoV-2 infection and the underlying mechanisms are largely unknown. In this study, to investigate the potential contributions of miRNAs to the pathogenesis of COVID-19, we performed high-throughput sequencing of the plasma miRNA library and identified 2,336 known and 361 novel miRNAs in 233 plasma samples from 61 healthy controls and 116 patients with COVID-19 with various clinical presentations. We revealed 85 differentially expressed miRNAs that were associated with the SARS-CoV-2 infections, disease severity, and viral persistence in the patients with COVID-19, such as hsa-miR-370-3p, hsa-miR-146a-3p, hsa-miR-885-5p, hsa-miR-214-3p, and hsa-miR-10b-5p. We also identified a panel of miRNAs that may directly target the viral genes of SARS-CoV-2. Moreover, we detected a number of miRNAs that may target the cellular genes involved in the life cycle of SARS-CoV-2, including ACE2, AXL, TMEM106B, and TOMM70. The gene ontology and KEGG pathway analyses of the significant miRNAs revealed their connections to immune responses, viral infections, and lung diseases. Finally, we established and tested a machine learning model using the differentially expressed miRNAs between various compared groups for classification of COVID-19 cases with different clinical presentations. Our findings may help understand the contribution of miRNAs to the pathogenesis of COVID-19 and identify potential biomarkers and molecular targets for the diagnosis and treatment of SARS-CoV-2 infection. Results Plasma miRNA profiles in patients with COVID-19 with various clinical presentations In this study, we first identified 2,336 known and 361 novel miRNAs in 233 plasma samples from 61 healthy controls (HC) and 116 SARS-CoV-2-infected subjects (COV) by high-throughput sequencing ([88]Figure 1A; [89]Data S2 and [90]S3). Of note, we identified 2,336 of the 2,656 known miRNAs (approximately 88%) in the current database ([91]http://www.mirbase.org/ftp.shtml), suggesting that our experimental procedures and the analytical pipelines were robust and well performed. We subsequently performed a series of computational and statistical analyses on these miRNAs to identify the target genes (both the viral and cellular genes), the differentially expressed miRNAs (hereinafter referred to as DE-miRNAs) between various groups ([92]Figure 1A), Gene ontology (GO) and KEGG pathways, and built an XGBoost machine learning model to identify potential miRNA biomarkers for clinical classification and diagnosis of patients with COVID-19. Here, we compared the miRNA expression levels between eleven groups (COV vs HC, AS vs HC, SM vs HC, SM vs AS, MD vs AS, SD vs AS, SD vs MD, SD22 vs MD12, MD12 vs MD-R, SD22 vs SD-R, and LTNP vs STNP), respectively, and discovered 85 DE-miRNAs ([93]Figure 1B; [94]Tables 1, [95]S2; [96]Data S4 and [97]S5). Further analysis revealed that 67 of the 85 DE-miRNAs were associated with immune responses, virus infections, and/or lung diseases ([98]Figure 1B) ([99]Pulati et al., 2019; [100]Chen et al., 2019), suggesting that these DE-miRNAs may play important roles in the pathogenesis of COVID-19. Figure 1. [101]Figure 1 [102]Open in a new tab Study design and the differentially expressed miRNAs (DE-miRNAs) (A), An overview of the study design. A total of 2,336 known and 361 novel miRNAs were identified in 233 plasma samples derived from 61 healthy controls (HC) and 116 SARS-CoV-2-infected patients (COV) by high throughput sequencing. (B), The heatmap of 85 DE-miRNAs from eleven compared groups: COV vs HC, AS vs HC, SM vs AS, SD vs MD, LTNP vs STNP, SM vs HC, MD vs AS, SD vs AS, SD22 vs MD12, MD12 vs MDR, and SD22 vs SDR. At the bottom of the figure, orange squares represent the DE-miRNAs between eleven compared groups. On the top, blue, orange, and purple squares represent the function of DE-miRNAs associated with immune responses, virus infections, or lung diseases, respectively. Table 1. DE-miRNAs associated with COVID-19 with various clinical presentations miRNA COV vs HC AS vs HC SM vs HC SM vs AS SD vs MD SD22 vs MD12 MD12 vs MDR SD22 vs SDR LTNP vs STNP hsa-miR-370-3p ↓[103]^a ↓ ↓ ↓ hsa-mir-12136-p3_1ss19TC ↓ ↓ ↓ hsa-miR-302b-3p_R-2 ↓ ↓ ↓ ↑ hsa-miR-302a-3p ↓ ↓ ↓ hsa-miR-939-5p_R+1 ↓ ↓ ↓ hsa-miR-2020_010-5p ↓ ↓ ↓ hsa-miR-885-5p ↑[104]^b ↑ ↑ hsa-miR-1246_R+1 ↑ ↑ ↑ hsa-miR-193a-5p_R-1 ↑ ↑ ↑ ↓ hsa-miR-122-5p_R-1 ↑ ↑ ↑ hsa-miR-483-5p ↑ ↑ ↑ ↑ hsa-miR-452-5p_R+2 [105]^c ↑ ↓ hsa-miR-183-5p ↓ ↑ hsa-miR-2020_009-5p ↓ ↑ hsa-miR-342-5p_R+2 ↓ hsa-miR-423-5p ↓ hsa-miR-2020_013-5p ↓ hsa-miR-145-5p ↑ hsa-miR-23a-5p ↓ hsa-miR-12136_R+8 ↓ ↑ hsa-miR-3960_R-2_1ss12AC ↓ hsa-miR-146a-3p_L+2R-2 ↑ hsa-miR-1-3p ↑ hsa-miR-2020_005-3p ↓ hsa-miR-2020_002-5p ↓ hsa-miR-2020_017-5p ↑ hsa-miR-133a-3p_R+1 ↑ hsa-miR-92b-5p_R+1 hsa-miR-625-5p ↑ hsa-miR-7977_1ss6AG ↑ hsa-miR-214-3p_L+1R-4 ↓ ↑ hsa-miR-452-5p_R+2 ↓ hsa-miR-23a-5p ↑ hsa-miR-190a-5p_R+2 ↓ hsa-miR-95-3p_R-1 ↑ hsa-miR-429 ↑ [106]Open in a new tab COV, Coronavirus-infected cases; HC, Healthy Controls; AS, Asymptomatic Infection; SM, Symptomatic Infection; MD, moderate disease; SD, severe disease; MDR, moderate disease repeated samples; SDR, severe disease repeated samples; MD12, 12 moderate disease (single-person first blood samples); SD22, 22 severe disease (single-person first blood samples); LTNP, long-term nucleic acid test positive; STNP, short-term nucleic acid test positive. ^a The expression of miRNA was upregulated or showed positive correlation. ^b The expression of miRNA was downregulated or showed negative correlation. ^c No significant difference. DE-miRNAs associated with SARS-CoV-2 infection We then compared the miRNA expression levels between the SARS-CoV-2-infected individuals (COV) and the healthy control (HC), and identified 112 upregulated (fold change>1.5, p ≤ 0.05) and 44 downregulated DE-miRNAs (fold change<0.67, p ≤ 0.05) ([107]Figure 2A). To improve the accuracy and reliability of the DE-miRNA candidates, we removed the DE-miRNAs with the average expression level lower than 10 copies and retained 24 DE-miRNAs ([108]Data S5). Further analysis suggested that 18, 12, and 17 of the DE-miRNAs were associated with virus infections, lung diseases, and/or immune responses, respectively ([109]Figure 2B). Our data also indicate that the age and gender of the study subjects had no significant influence on the expression level and function of the DE-miRNAs ([110]Figure 2B). A total of 3,108 target genes were mapped to these DE-miRNAs (e.g., CD40 for hsa-miR-370-3p and CXCL8 for hsa-miR-302a-3p) through the target gene prediction analysis ([111]Data S6). Moreover, 936 significant GO terms and 59 KEGG pathways were derived from these targeted genes ([112]Data S7 and [113]S8). Here, we displayed the top 20 GO terms ([114]Figure 2C) and KEGG pathways ([115]Figure 2D), which were associated with immune responses (T cell homeostasis and interleukin-2 production), virus infection (Hepatitis B), lung diseases (non-small cell lung cancer), and other processes (HIF-1 signaling pathway), respectively. Furthermore, to test the hypothesis that the DE-miRNAs could be used as biomarkers for diagnosis of SARS-CoV-2 infection, we built an XGBoost machine learning model based on miRNA expression data from the 61 HC and 116 patients with COVID-19, leading to prioritization of eleven important miRNA variables ([116]Figure 2E). This model reached an area under curve (AUC) of 0.9258 and correctly classified over 83% of COV and 87% of HC subjects in the training set ([117]Figure 2F). In the testing set, 24 of 29 COV (83%) and 12 of 15 HC (80%) were correctly classified by this model ([118]Figure 2G). Finally, we identified four DE-miRNAs (i.e., has-miR-34a-5p, has-miR-370-3p, has-miR-193a-5p_R-1, and has-miR-12136-P3_1ss19TC) by combining the differential expression data with the XGBoost machine learning, which could be used as potential biomarkers to distinguish the SARS-CoV-2-infected patients (COV) from the uninfected healthy individuals (HC) ([119]Figure 2H). Figure 2. [120]Figure 2 [121]Open in a new tab DE-miRNAs associated with SARS-COV-2 infections (A), The volcano plot of COV vs HC. Red dots represent the upregulated DE-miRNAs (fold change>1.5, p ≤ 0.05); blue dots represent the downregulated DE-miRNAs (fold change<0.67, p ≤ 0.05); gray dots represent miRNAs without significant changes (0.67 ≤ fold change≤1.5, p > 0.05). (B), The heatmap of 24 DE-miRNAs in COV vs HC. In the left panel, blue, orange, and purple squares represent the function of miRNAs associated with immune responses, virus infections, and lung diseases, respectively. (C), The GO analysis of the targeted genes in COV vs HC. Top 20 GO terms associated with immune responses, virus infections, and lung diseases were shown. (D), The KEGG analysis of the targeted genes in COV vs HC. Top 20 KEGG pathways associated with immune responses, virus infections, and lung diseases were shown. (E), The important DE-miRNAs prioritized by XGBoost analysis. (F), Receiver operating characteristic (ROC) and performance of the model in the training set. (G), ROC and performance of the model in the test set. (H), Normalized copy numbers of four representative DE-miRNAs in the COV vs HC group, the bars showing the mean value ± standard deviation (sd) (COV n = 116, HC n = 61). DE-miRNAs associated with asymptomatic infection of SARS-CoV-2 Similar to the analysis described above, we compared the miRNA expression levels between the asymptomatic-infected individuals (AS) and the healthy control (HC), and identified 26 upregulated and 93 downregulated DE-miRNAs ([122]Figure 3A). There are 24 DE-miRNAs left after removing the ones with the average expression level lower than 10 copies ([123]Data S5). Further analysis suggested that 13, 8, and 20 of the DE-miRNAs were associated with virus infections, lung diseases, and/or immune responses, respectively ([124]Figure 3B). Our data also indicate that the age and gender of the study subjects had no significant influence on the expression level and function of the DE-miRNAs ([125]Figure 3B). A total of 3,184 target genes were mapped to these DE-miRNAs through the target gene prediction analysis ([126]Data S6). Moreover, 952 significant GO terms and 71 KEGG pathways were derived from these targeted genes ([127]Data S7 and [128]S8). Here, we displayed the top 20 GO terms ([129]Figure 3C) and KEGG pathways ([130]Figure 3D), which were associated with immune responses, virus infection, lung diseases, and other processes, respectively. We also built an XGBoost machine learning model based on miRNA expression data from the 61 HC and 16 AS patients, leading to prioritization of nine important miRNA variables ([131]Figure 3E). This model reached an area under curve (AUC) of 0.999 and correctly classified 100% of the AS and 96% of the HC subjects in the training set ([132]Figure 3F). In the testing set, 3 of 4 COV (75%) and 13 of 1 H4C (93%) were correctly classified by this model ([133]Figure 3G). Of note, the lower percentage of correct classification of the AS subjects in the testing set was most likely caused by the smaller sample size. Finally, we also identified four DE-miRNAs (has-miR-423-5p, has-miR-23a-5p, has-miR-146a-3p_2R-2, and has-miR-1-3p) by combining the differential expression data with the XGBoost machine learning, which could be used as potential biomarkers to distinguish the AS from the HC ([134]Figure 3H). Figure 3. [135]Figure 3 [136]Open in a new tab DE-miRNAs associated with the asymptomatic infection of SARS-COV-2 (A), The volcano plot of AS vs HC. Red dots represent the upregulated DE-miRNAs (fold change>1.5, p ≤ 0.05); blue dots represent the downregulated DE-miRNAs (fold change<0.67, p ≤ 0.05); gray dots represent miRNAs without significant changes (0.67 ≤ fold change≤1.5, p > 0.05). (B), The heatmap of 24 DE-miRNAs in AS vs HC. In the left panel, blue, orange, and purple squares represent the function of miRNAs associated with immune responses, virus infections, and lung diseases, respectively. (C), The GO analysis of the targeted genes in AS vs HC. Top 20 GO terms associated with immune responses, virus infections, and lung diseases were shown. (D), The KEGG analysis of the targeted genes in AS vs HC. Top 20 KEGG pathways associated with immune responses, virus infections, and lung diseases were shown.(E), The important DE-miRNAs prioritized by XGBoost analysis. (F), Receiver operating characteristic (ROC) and performance of the XGBoost model in the training set. (G), ROC and performance of the XGBoost model in the test set. (H), Normalized copy numbers of four representative DE-miRNAs in the AS vs HC group, the bars showing the mean ± sd (AS n = 16, HC n = 61). DE-miRNAs associated with disease severity of COVID-19 When we compared the miRNA expression levels in the SD vs MD group using the same cutoff value described above (i.e. fold change>1.5 and p ≤ 0.05 for the upregulated; fold change<0.67, and p ≤ 0.05 for the downregulated DE-miRNAs), we only observed one significant DE-miRNA (has-miR-214-3p_L+1R_4). Therefore, we adjusted the fold change>1.2 for the upregulated whereas the fold change<0.83 for the downregulated miRNAs. As a result, we identified 47 upregulated and 37 downregulated DE-miRNAs ([137]Figure 4A). There are 17 DE-miRNAs left after removing the ones with the average expression level lower than 10 copies ([138]Data S5). Further analysis suggested that 7, 8, and 17 of the DE-miRNAs were associated with virus infections, lung diseases, and/or immune responses, respectively ([139]Figure 4B). Our data also indicate that the age and gender of the study subjects had no significant influence on the expression level and function of the DE-miRNAs ([140]Figure 4B). A total of 2,626 target genes were mapped to these DE-miRNAs through the target gene prediction analysis ([141]Data S6). Moreover, 901 significant GO terms and 39 KEGG pathways were derived from these targeted genes ([142]Datas S7 and [143]S8). Here, we displayed the top 20 GO terms ([144]Figure 4C) and KEGG pathways ([145]Figure 4D), which were associated with immune responses, virus infection, lung diseases, and other processes, respectively. We also built an XGBoost machine learning model based on miRNA expression data from the 48 SD and 52 MD patients, leading to prioritization of nine important miRNA variables ([146]Figure 4E). This model reached an area under curve (AUC) of 0.996 and correctly classified 100% of the SD and 97% of the MD patients in the training set ([147]Figure 4F). In the testing set, 10 of 12 SD (83%) and 9 of 13 MD (69%) were correctly classified by this model ([148]Figure 4G). Of note, the lower percentage of correct classification of the SD and MD patients in the testing set was most likely caused by the smaller sample size. Finally, we identified five DE-miRNAs (i.e., has-miR-214-3p_L+1R_4, has-miR-143-3p, has-miR-224-5p_L-1, has-miR-452-3p_R+2, and hsa-miR-625-5p) by combining the differential expression data with the XGBoost machine learning, which could be used as potential biomarkers to distinguish the SD from the MD patients ([149]Figure 4H). Furthermore, we examined the dynamic changes of the DE-miRNA expression levels in two of the patients with blood samples collected at multiple time points, including three samples collected from one fatal patient (COV012, COV031, COV042, and COV077) and three samples collected from a recovered patient (COV011, COV041, and COV078). The expression level of hsa-miR-122-5p_ R-1 in SD was significantly higher than that in MD, suggesting that this miRNA could be positively correlated with the disease progression. Indeed, the level of hsa-miR-122-5p_R-1 in the fatal patient continuously increased as the disease worsened, whereas its expression restored to the normal level in the recovered patient ([150]Figure 4I). On the other hand, the expression level of hsa-miR-224-5p_L-1 exhibited opposite changes in these two patients ([151]Figure 4J), suggesting that this miRNA could be negatively correlated with the disease progression. In this study, we also completed similar analysis on other compared groups ([152]Figure 1B), including SM vs HC, SM vs AS, MD vs AS, SD vs AS, SD22 vs MD12, MD12 vs MD-R, and SD22 vs SD-R ([153]Figures S1–S8 and [154]Datas S5, [155]S6, [156]S7, and [157]S8). Figure 4. [158]Figure 4 [159]Open in a new tab DE-miRNAs associated with the disease severity of COVID-19, see also [160]Figures S1–S8 (A), The volcano plot of SD vs MD. Red dots represent the upregulated DE-miRNAs (fold change>1.2, p ≤ 0.05); blue dots represent the downregulated DE-miRNAs (fold change<0.83, p ≤ 0.05); gray dots represent miRNAs without significant changes (0.83 ≤ fold change≤1.2, p > 0.05). (B), The heatmap of 17 DE-miRNAs in SD vs MD. In the left panel, blue, orange, and purple squares represent the function of miRNAs associated with immune responses, virus infections, and lung diseases, respectively. (C), The GO analysis of the targeted genes in SD vs MD. Top 20 GO terms associated with immune responses, virus infections, and lung diseases were shown. (D), The KEGG analysis results of targeted genes in SD vs MD. Top 20 KEGG pathways associated with immune responses, virus infections, and lung diseaseswere shown. (E), The important miRNAs prioritized by XGBoost analysis. (F), Receiver operating characteristic (ROC) and performance of the XGBoost model in the training set. (G), ROC and performance of the XGBoost model in the test set. (H), Normalized copy numbers of five representative DE-miRNAs in the SD vs MD group, the bars showing the mean ± SD (SD n = 48, MD n = 52). (I), The dynamic change of the copy numbers of hsa-miR-122-5p_R-1 in a fatal patient (COV012, COV031, COV042, and COV077) and a recovered patient (COV011, COV041, and COV078). (J), The dynamic change of the copy numbers of hsa-miR-224-5p_L-1 in a fatal patient (COV012, COV031, COV042, and COV077) and a recovered patient (COV011, COV041, and COV078). DE-miRNAs associated with the viral persistence of SARS-CoV-2 infection We compared the miRNA alternations between the LTNP and STNP groups and identified 78 upregulated and 37 downregulated DE-miRNAs ([161]Figure 5A). There are 20 DE-miRNAs left after removing the ones with the average expression level lower than 10 copies ([162]Data S5). Further analysis suggested that 10, 8, and 11 of the DE-miRNAs were associated with virus infections, lung diseases, and/or immune responses, respectively ([163]Figure 5B). Our data also indicate that the age and gender of the study subjects had no significant influence on the expression level and function of the DE-miRNAs ([164]Figure 5B). A total of 1,848 target genes were mapped to these DE-miRNAs through the target gene prediction analysis ([165]Data S6). Moreover, 594 significant GO terms and 44 KEGG pathways were derived from these targeted genes ([166]Datas S7 and [167]S8). Here, we displayed the top 20 GO terms ([168]Figure 5C) and KEGG pathways ([169]Figure 5D), which were associated with immune responses, virus infection, lung diseases, and other processes, respectively. We also built an XGBoost machine learning model based on miRNA expression data from the 30 LTNP and 33 STNP patients, leading to prioritization of 16 important miRNA variables ([170]Figure 5E). This model reached an area under curve (AUC) of 0.977 and correctly classified 100% of the LTNP and 92% of the STNP patients in the training set ([171]Figure 5F). In the testing set, 5 of 7 LTNP (71%) and 7 of 8 STNP (88%) were correctly classified by this model ([172]Figure 5G). Of note, the lower percentage of correct classification of the LTNP and STNP patients in the testing set was most likely caused by the smaller sample size. Finally, we identified five DE-miRNAs (has-miR-429, has-miR-574-5p, has-miR-483-5p, has-miR-95-3p_R-1, and hsa-miR-378i_R+1_1ss9AT) by combining the differential expression data with the XGBoost machine learning, which could be used as potential biomarkers to distinguish the LTNP from the STNP patients ([173]Figure 5H). Moreover, 9 of the 20 DE-miRNAs were positively or negatively correlated with the length (days) of the nucleic acid test positive for SARS-CoV-2 ([174]Figure S8). Figure 5. [175]Figure 5 [176]Open in a new tab DE-miRNAs associated with the viral persistence of SARS-CoV-2 infection, see also [177]Figure S8 (A), The volcano plot of LTNP vs STNP. Red dots represent the upregulated DE-miRNAs (fold change>1.5, p ≤ 0.05); blue dots represent the downregulated DE-miRNAs (fold change<0.67, p ≤ 0.05); gray dots represent miRNAs without significant changes (0.67 ≤ fold change≤1.5, p > 0.05). (B), The heatmap of 20 DE-miRNAs in LTNP vs STNP. In the left panel, blue, orange, and purple squares represent the function of miRNAs associated with immune responses, virus infections, and lung diseases, respectively. (C), The GO analysis of the targeted genes in LTNP vs STNP. Top 20 GO terms associated with immune responses, virus infections, and lung diseases were shown. (D), The KEGG analysis of the targeted genes in LTNP vs STNP. Top 20 KEGG pathways associated with immune responses, virus infections, and lung diseases were shown. (E), The important DE-miRNAs prioritized by XGBoost analysis. (F), Receiver operating characteristic (ROC) and performance of the XGBoost model in the training set. (G), ROC and performance of the XGBoost model in the test set. (H), Normalized copy numbers of five representative DE-miRNAs in the LTNP vs STNP group, the bars showing the mean ± SD (LTNP n = 30, STNP n = 33). DE-miRNAs that may target the viral sequences of SARS-CoV-2 or the cellular genes associated with the viral life cycle First, among the 2,336 known and 361 novel miRNAs, we detected 58 known and 7 novel miRNAs that may target at the viral sequences of SARS-CoV-2 genome ([178]Data S9), including 4, 20, 16, and 61 miRNAs that target the membrane protein (M), nucleocapsid protein (N), spike protein (S), and open reading frame (ORF1ab), respectively ([179]Figure 6A). In addition, by reviewing the published literature and the relevant databases including The Human Gene Database GeneCards ([180]https://www.genecards.org), we have identified a panel of miRNAs that may target the cellular genes involved in the life cycle of SASR-CoV-2 replication including the entry and post-entry events, such as miRNA let-7b targeting ACE2 receptor, hsa-miR-1-3p targeting AXL receptor, hsa-miR-122-5p targeting ADAM17, hsa-miR-485-5p targeting DC-SIGN, hsa-miR-424-5p targeting furin, hsa-miR-1185-1-3p targeting HAT(HAT1), hsa-miR-183-5p targeting integrin (ITGB1), hsa-miR-1-3p targeting Neuropilin-1 (NRP1), hsa-miR-485-5p targeting TMPRSS4, hsa-miR-34a-5p targeting AR, hsa-miR-423-5p targeting Cyclophilin A, hsa-miR-377-5p targeting Cathepsin B, hsa-miR-148a-5p targeting TMEM106B, hsa-miR-193a-5p_R-1 targeting TOMM70 (TOM70), and hsa-miR-1-3p and hsa-miR-7110-3p targeting VMP1([181]Figure 6B). Our findings indicate that the cellular miRNAs may directly and/or indirectly act on the life cycle of SASR-CoV-2 and involve in the pathogenesis of COVID-19. Figure 6. [182]Figure 6 [183]Open in a new tab The miRNAs targeting the SARS-CoV-2 viral genome and the cellular genes associated with the viral life cycle (A), The miRNAs targeting the SARS-CoV-2 viral genome. There are 65 miRNAs that may target the SARS-CoV-2 viral genome, including membrane protein, spike protein, nucleocapsid protein, and open reading frame. (B), The miRNAs targeting the cellular genes associated with the life cycle of SARS-CoV-2. Discussion In this study, we identified a large number of DE-miRNAs that were associated with the virus infection, clinical symptoms, disease severity, and the viral persistence of the patients with COVID-19 as well as a panel of miRNAs that may target the viral sequences of SARS-CoV-2 or the cellular genes associated with the viral life cycle ([184]Figure 1, [185]Figure 2, [186]Figure 3, [187]Figure 4, [188]Figure 5, [189]Figure 6, [190]Figure 7; [191]Table 1). To our knowledge, this is the first large-scale study of miRNA profiles by high-throughput sequencing on the plasma samples from the AS subjects in parallel with the HC individuals and the SM patients with various clinical presentations, We identified six DE-miRNAs that could be used as biomarkers to distinguish the SASR-CoV-2-infected individuals from the healthy controls, and four of the six biomarkers were suggested to be associated with virus infection, inflammation, immune responses, and lung diseases ([192]Pulati et al., 2019; [193]Ramanathan et al., 2019; [194]Peng et al., 2018). In particular, hsa-miR-302b-3p_R-2 and hsa-miR-302a-3p were differentially expressed in COV vs HC, AS vs HC, and SM vs HC and significantly downregulated in the SARS-CoV-2-infeected patients, suggesting that hsa-miR-302 may be used as a potential biomarker to distinguish patients with COVID-19 from normal controls. The hsa-miR-302 family members have been demonstrated to play vital roles in cell proliferation and the regulation of tumor growth ([195]Maadi et al., 2016; [196]Subramanyam et al., 2011; [197]Lin et al., 2011). Previous studies have shown that influenza A virus (IAV) infection may cause downregulation of miRNA-302a and miR-302c, which may suppress IAV-induced cytokine storm and prevent the translation of nuclear factor-kappa B (NF-κB) from the cytosol to the nucleus, respectively ([198]Gui et al., 2015). Both hsa-miR-302a and hsa-miR-302b may target on the C-X-C motif chemokine ligand 8 (CXCL8) gene, which was associated with the patients with acute respiratory distress syndrome (ARDS) by promoting neutrophil migration to lung interstitium and alveolar space ([199]Williams et al., 2017). CXCL8 was also suggested to play a vital role in the inflammatory diseases of the lungs, such as asthma and chronic obstructive pulmonary disease (COPD) ([200]Huang et al., 2017). These findings suggested that miR-302 may have a protective effect in SARS-CoV-2-infected patients through targeting at the CXCL8-linked pathways. Figure 7. [201]Figure 7 [202]Open in a new tab Potential contributions of the representative DE-miRNAs to the pathogenesis of COVID-19 Representative miRNAs, which were associated with immune responses, virus infections, and lung diseases, were selected from the five compared groups (COV vs HC, AS vs HC, SM vs AS, SD vs MD, and LTNP vs STNP) to help understand the pathogenesis of COVID-19. COV, SARS-CoV-2-infected individuals; HC, Healthy controls; AS, Asymptomatic infection; SM, Symptomatic infection; MD, Moderate disease; SD, Severe disease; LTNP, Long-term nucleic acid test positive; STNP, Short-term nucleic acid test positive. The hsa-miR-146a-3p was significantly upregulated in AS subjects compared with HC. Previous studies suggested that hsa-miR-146a-3p was a key factor in the regulation of innate immunity, viral infection, inflammation, and tumor development ([203]Testa et al., 2017; [204]Fei et al., 2020). Fei et al. found that miR-146a suppresses the NF-κB pathway by downregulating TLR3 and TRAF6 to alleviate the inflammatory response during coxsackievirus B infection ([205]Fei et al., 2020). The host’s innate immune system provides the first line of defense against viral invasions so that the body can effectively fight the virus infection. It is conceivable that through the regulatory loop, the inflammation can be controlled to a level adequate to eliminate the invading viruses without causing significant damage to the host. Our results indicated that hsa-miR-146a-3p may play a protective role in the patients with asymptomatic infection and serve as a biomarker for diagnosis and therapeutic approach for AS. We also identified a number of De-miRNAs that may help differentiate the SM from AS. In particular, hsa-miR-122-5p, hsa-miR-1246, and hsa-miR-885-5p were suggested to be involved in the virus infection, immune responses, and inflammation ([206]Suo et al., 2018; [207]Lu et al., 2020; [208]Janssen et al., 2013; [209]Hussein et al., 2017). For instance, the stability and propagation of hepatitis C virus (HCV) was dependent on a functional interaction between the HCV genome and liver-expressed miR-122 ([210]Hussein et al., 2017). The hsa-miR-122 could promote HCV RNA accumulation, and serum miR-122-5p was significantly higher in patients in different stages of HBV infection than in controls ([211]Lak et al., 2020). HCV infection may lead to dyslipidaemia, and miR-122 can regulate hepatic lipid metabolism and inflammation ([212]Lee et al., 2015). Targeting miR-122 may be an effective treatment strategy for HCV infection ([213]van der Ree et al., 2014). In addition, miR-122-5p may protect acute lung injury via regulation of DUSP4/ERK signaling in pulmonary microvascular endothelial cells and may help differentiate moderate from severe patients with COVID-19 ([214]Lu et al., 2020; [215]Fujita et al., 2021). Nakao et al. reported that miR-122-5p may target ADAM metallopeptidase domain 17 (ADAM17) and become a promising strategy for hepatocellular carcinoma treatment ([216]Nakao et al., 2014). GO analysis showed that ADAM17 enriched in T cell differentiation in thymus. There are robust T cell responses against SARS-CoV-2 in patients with COVID-19 ([217]Neidleman et al., 2020), suggesting that miR-122-5p may be involved in the immune responses in COVID-19. We observed that hsa-miR-214-3p_L+1R-4 was significantly downregulated in SD compared with the MD patients, and was associated with the disease severity. The hsa-miR-214-3p has been suggested to be associated with various cancers, inflammatory diseases, and the related signaling pathways ([218]Liu et al., 2019; [219]Yang et al., 2019; [220]Yan et al., 2020; [221]Li and Wang, 2017). Another important aspect of this study is the identification of 20 DE-miRNAs in the LTNP vs STNP groups, nine of which were significantly correlated with the length of the viral persistence in patients with COVID-19, including hsa-miR-483-5p and hsa-miR-429. The expression level of hsa-miR-483-5p was significantly higher in the LTNP subjects, indicating that hsa-miR-483-5p was positively correlated with the disease course ([222]Figure 4E). The hsa-miR-483-5p was suggested to be associated with influenza virus infection, acute lung injury, chronic pulmonary obstruction, and lung cancer ([223]Hassan et al., 2019; [224]Shen et al., 2017; [225]Wang et al., 2018). The expression of hsa-miR-483-5p was up-regulated in lung injury tissues and the increase of hsa-miR-483-5p may protect acute lung injury ([226]Leng et al., 2020). In addition, has-miR-429 was previously reported to be antiviral in respiratory diseases ([227]Sardar et al., 2020). Kozak et al. have demonstrated that miR-429 may directly target SESN3 (Sestrin 3) ([228]Kozak et al., 2019). Functional analysis showed that SESN3 enriched in the p53 signaling pathway and could be regulated by p53 ([229]Budanov et al., 2010). SARS-CoV-2 could induce p53 signaling pathway in lymphocytes ([230]Xiong et al., 2020). Therefore, has-miR-429 may target SESN3 through p53 signaling pathway and correlated with the viral persistence. Moreover, hsa-let-7b-3p_1ss22CT was up-regulated in LTNP compared with STNP. ACE2 is a major cellular receptor for SASR-CoV-2 and Let-7b may downregulate ACE2 expression ([231]Zhang et al., 2019, [232]2021), suggesting that the upregulation of let-7b in the LTNP patients may lead to the downregulation of ACE2 and subsequently prevent the hosts from virus infection. In this study, we have identified 63 miRNAs that may directly target the viral sequences of SARS-CoV-2 genome and a panel of miRNAs that may target the cellular genes involved in the life cycle of SASR-CoV-2 replication. The exact roles of these miRNA in viral infection, persistence, and pathogenesis of COVID-19 remain unclear at this time, and they warrant additional studies. In addition, we have successfully established and tested a machine learning model for differentiation of SARS-CoV-2-infected patients from uninfected individuals and classification of paients with COVID-19 based on the differentially expressed miRNAs. Currently, detection of viral nucleic acid by RT-PCR is the gold standard method for clinical diagnosis of SARS-CoV-2 infection. Because multiple factors may affect the sensitivity and specificity of RT-PCR assays, our machine learning model may become a novel complementary method to RT-PCR for clinical diagnosis of COVID-19. In conclusion, we identified a large number of DE-miRNAs that were associated with the virus infection, clinical symptoms, disease severity, and the viral persistence of the patients with COVID-19, as well as a panel of miRNAs that may target the host genes involved in a variety of pathways or the viral genomic regions of SARS-CoV-2. Our findings may help us understand the potential contributions of microRNAs to the pathogenesis of COVID-19 and identify novel biomarkers for the diagnosis and treatment of SARS-CoV-2 infection. STAR★Methods Key resources table REAGENT or RESOURCE SOURCE IDENTIFIER Biological samples __________________________________________________________________ Human blood samples (COVID-19) The Renmin Hospital of Wuhan University-China [233]Data S1 Human blood samples (Healthy donors) The First Affiliated Hospital of Xi’an Jiaotong University-China [234]Data S1 __________________________________________________________________ Chemicals, peptides, and recombinant proteins __________________________________________________________________ PBS ThermoFisher Cat# 20,012,068 Ethanol Sigma Cat# E7023 SUPERase•In™ RNase Inhibitor ThermoFisher Cat# AM2694 __________________________________________________________________ Critical commercial assays __________________________________________________________________ Plasma/Serum Circulating and Exosomal RNA Purification Kit Norgen Cat#51000 Next Multiplex Small RNA Library Prep Set New England Biolabs Cat#E7300 Qubit dsDNA HS Assay Kit Invitrogen Cat#[235]Q32854 __________________________________________________________________ Deposited data __________________________________________________________________ The raw sequence data [236]http://www.ncbi.nim.nih.gov/geo/query/acc.cgi?acc=GSE166160 This paper GEO: [237]GSE166160 __________________________________________________________________ Software and algorithms __________________________________________________________________ R [238]https://www.cran.r-project.org V4.0.2 Prism (software) [239]https://www.graphpad.com v8.0.2 Human Genomes [240]ftp://ftp.ensembl.org/pub/release-96/fasta/homo_sapiens/dna/ Hairpin RNA Structures Sequences Predication [241]http://rna.tbi.univie.ac.at/cgi-bin/RNAfold.cgi ComputationalTarget Prediction Algorithms TargetScan [242]http://www.targetscan.org/vert_71/docs/help.html ComputationalTarget Prediction Algorithms miRanda [243]http://cbio.mskcc.org/miRNA2003/miranda.html Bowtie [244]http://bowtie-bio.sourceforge.net/bowtie2/index.shtml V1.0.0 OmicStudio [245]http://www.omicstudio.cn/tool Xgboost [246]Chen et al., 2019[247]https://dl.acm.org/doi/10.1145/2939672.2939785#pill-authors__c ontent V1.4.1.1 [248]Open in a new tab Resource availability Lead contact Further information and requests for resources and reagents should be directed to and will be fulfilled by the lead contact, Chengsheng Zhang ([249]cszhang99@126.com). Materials availability This study did not generate new unique reagents. Experimental model and subject details Ethics statement This study was approved by the Ethics Committee of The First affiliated Hospital of Xi’an Jiaotong University (XJTU1AF2020LSK-015) and The Renmin Hospital of Wuhan University (WDRY2020-K130). All participants enrolled in this study offered the written informed consent by themselves or their surrogates. The high throughput sequencing of plasma samples was performed on existing samples collected during standard diagnostic tests, posing no extra burden to patients. Case definition and study cohort The definition and classification of all COVID-19 patients in this study follow the Guidelines of the World Health Organization and the “Guidelines on the Diagnosis and Treatment of the Novel Coronavirus Infected Pneumonia” developed by the National Health Commission of People’s Republic of China ([250]World Health Organization, 2020a, [251]2020b; [252]World Health Organization, 2020a, [253]2020b; [254]Government of China, 2021). The study cohort included 16 asymptomatic subjects (AS, n = 16) and 100 symptomatic patients (SM, n = 100) consisting of 52 moderate disease (MD, n = 52), 12 repeated samples from the MD patients (MD-R, n = 12), 48 severe disease (SD, n = 48), and 22 repeated samples from the SD patients (SD-R, n = 22). The SM group was further divided into the long-term nucleic acid test positive (LTNP, n = 30) and the short-term nucleic acid test positive (STNP, n = 33) sub-groups. In this study, based on the clinical observation that most of the COVID-19 patients hospitalized in the Renmin Hospital in Wuhan became negative for the nucleic acid test within 45 days, we therefore defined the STNP was ≤45 days whereas the LTNP was >60 days ([255]Figure 1A; [256]Table 1 and [257]S1; [258]Data S1). The demographic features, clinical laboratory testing results and other relevant information were provided in [259]Data S1 and [260]Table S1. Method details Blood sample collection and plasma preparation The peripheral blood was collected into the standard EDTA-K2 Vacuette Blood Collection Tubes (Jiangsu Yuli Medical Equipment Co., Ltd, China; Cat.Y09012282) and stored at room temperature or 4°C until processed. The plasma was prepared after centrifugation of the whole blood sample at 2500 rpm for 20 min and stored in the −80^οC freezer until used for the studies. All the experimental procedures were completed inside a biosafety level 2 (BSL-2) laboratory at the Department of Clinical Diagnostic Laboratories, Renmin Hospital of Wuhan University. A total of 233 plasma samples were collected from 61 healthy controls (HC) and 116 SARS-CoV-2 infected individuals (COV) including samples collected at different time-points from some of these individuals. RNA extraction, library construction and high throughput sequencing The Norgen Plasma/Serum Circulating and Exosomal RNA Purification Kit (Slurry Format, Cat#51000) was used for isolation of the cell-free RNA from the plasma samples inside the BSL-2 laboratory described above. Approximately 3 μg of total RNA from each sample was used for the library construction by using the NEBNext Multiplex Small RNA Library Prep Set for Illumina (New England Biolabs, Cat. E7300) following the instructions of the manufacturer with an in-house modification, by which the pre-amplified small RNA molecules were labeled with unique molecular identifier (UMI) (LC-Bio Technology Co., LTD, China) to reduce potential duplication bias during the PCR and sequencing steps. The DNA library was purified and quantified by using 6% PAGE gel and the Qubit dsDNA HS Assay Kit (Invitrogen, Cat. [261]Q32854), respectively. The DNA fragments were examined by using Qsep100TM bio-fragment analyser (Bioptic Inc., Taiwan). The high throughput sequencing was performed on the Novaseq 6000 sequencer (PE150 model, Illumina). Quantification and statistical analysis The miRNA-seq data analysis The raw sequencing data was firstly filtered by using the fastx toolkit (version: 0.0.13.2, [262]http://hannonlab.cshl.edu/fastx_toolkit/) to remove the low-quality reads, and the adaptor sequences were trimmed by using the cutadapt (version: 1.15) ([263]Martin, 2011). To eliminate duplication bias introduced in the library preparation and sequencing, the filtered reads were first clustered based on the UMI sequences and compared to each other by pairwise alignment. The reads with 100% sequence identity were then extracted to a new sub-cluster. Multiple sequence alignments were performed to obtain one consensus sequence for each sub-cluster after all the sub-clusters were generated. The reads were then subjected to an in-house program, ACGT101-miR (LC Sciences, Houston, Texas, USA) to remove common RNA families (rRNA, tRNA, snRNA, snoRNA) and repeats. Subsequently, unique sequences with length in 18–26 nucleotides were mapped to human precursors in miRBase 22.0 by BLAST search to identify known miRNAs and novel 3p- and 5p-derived miRNAs ([264]Kozomara et al., 2019). The length variations at both 3′and 5′ends and one mismatch inside the sequence were accepted in the alignment. The unique sequences mapping to human mature miRNAs in hairpin arms were identified as known miRNAs, whereas the unique sequences mapping to the other arm of known human precursor hairpin opposite to the annotated mature miRNA-containing arm were considered to be novel 5p- or 3p-derived miRNA candidates. The unmapped sequences were BLASTed against the human genomes ([265]ftp://ftp.ensembl.org/pub/release-96/fasta/homo_sapiens/dna/), and the hairpin RNA structures containing sequences were predicated from the flank 80 nt sequences using RNAfold software ([266]http://rna.tbi.univie.ac.at/cgi-bin/RNAfold.cgi). The criteria for secondary structure prediction were: (1) number of nucleotides in one bulge in stem (≤12); (2) number of base pairs in the stem region of the predicted hairpin (≥16); (3) cut-off of free energy (kCal/mol ≤ -15); (4) length of hairpin (up and down stems + terminal loop ≥50); (5) length of hairpin loop (≤20); (6) number of nucleotides in one bulge in mature region (≤8); (7) number of biased errors in one bulge in mature region(≤4); (8) number of biased bulges in mature region (≤2); (9) number of errors in mature region (≤7); (10) number of base pairs in the mature region of the predicted hairpin (≥12); (11) percent of mature in stem (≥80). Identification of differentially expressed miRNAs To identify the differentially expressed miRNAs (DE-miRNAs), miRNA differential expression based on the normalized deep-sequencing counts was analyzed among the eleven groups by using Student’t test ([267]Figure 1A). Nine of the eleven groups (COV vs HC, AS vs HC, SM vs HC, SM vs AS, MD vs AS, SD vs AS, SD vs MD, SD22 vs MD12 and LTNP vs STNP) were analyzed using the independent samples’ t-test ([268]Figure 1B), whereas two of them (MD12 vs MD-R and SD22 vs SD-R) were analyzed using the paired t-test. The threshold for significant difference was set to be p value ≤ 0.05 and the fold changes >1.50 or <0.67. The miRNA with copy number less than 10 was considered to be low expression level. The miRNAs with copy number higher than 10 in at least one sample but less than the average copy number in the dataset (the sum of copy numbers/(sample number × total number of miRNAs) was considered to be middle expression level. The miRNA with copy number higher than the average copy number in the dataset was considered to be high expression level. Target genes prediction We employed the following two computational target prediction algorithms, TargetScan ([269]http://www.targetscan.org/vert_71/docs/help.html) and miRanda ([270]http://cbio.mskcc.org/miRNA2003/miranda.html) to identify the binding sites and predict the genes targeted by the differentially expressed miRNAs ([271]Enright et al., 2003; [272]Agarwal et al., 2015). TargetScan removes target genes with context score percentile<90, whereas miRanda removes target genes with score cut-off <140 and the maximum energy > −20 kcal/mol. Finally, the data obtained by both algorithms were combined and the overlaps were calculated. The comparison of sequences between the SARS-CoV-2 genome and miRNAs To examine whether SARS-CoV-2 may generate virus-encoded miRNAs, the candidate miRNAs detected only in the SARS-CoV-2 infected subjects but not in the healthy controls were aligned to the SARS-CoV-2 genome (NC_045,512.2, a viral genome from Wuhan) with bowtie software (version:1.0.0). The allowed maximum number of mismatches was 3. On the other hand, TargetScan and miRanda were used to check whether the cellular miRNAs may target the SARS-CoV-2 genome. For all of miRNAs in our miRNA library, target gene prediction was performed to target at SARS-CoV-2 gene using miRanda and TargetScan with TargetScan score≥90, miRanda score≥140 and maximum energy < −20 kcal/mol. In addition, miRTarBase ([273]https://mirtarbase.cuhk.edu.cn/∼miRTarBase/miRTarBase_2019/php/in dex.php) was used to search the miRNAs that target the cellular receptors associated with SARS-CoV-2. Gene ontology (GO) and KEGG pathway enrichment analysis For significant target genes, functional annotation analysis was performed to identify the most relevant biological terms, including GO biological process, GO molecular function, GO cellular component and Kyoto Encyclopedia of Genes and Genomes (KEGG) Pathway. For a GO term or pathway, P was calculated as [MATH: P=1i=0S1 (Bi)(TBBTBi)(TBTS) :MATH] . S denotes the number of significant genes that were mapped to a specific GO term or pathway, B denotes the number of genes mapped to a specific GO or pathway, TB represents the total number of genes mapped to all GO or pathways, TS represents the total number of the significant differentially expressed genes. The flowchart for our analysis was showed in [274]Figure 1A. Machine learning The machine learning was performed using the expression data of the differentially expressed miRNAs for each of the compared groups. The framework of XGBoost (Extreme Gradient Boosting) was derived from the published study ([275]Chen and Guestrin, 2016). The package includes efficient linear model solver and tree learning algorithms. In brief, 75% of the modeling data was selected as the training set, whereas the remaining 25% data was used as the testing set. From the training set, we selected significant microRNAs by machine learning using XGBoost method with the R package “xgboost” (version 1.4.1.1) ([276]Chen and Guestrin, 2016). Then, the training sets sever as the input to perform 5-fold-cross validation to get best parameters. In XGBoost model, L1 Regularization were considered by adjusting “reg_alpha” parameter in XGBoost. Besides, all parameters of these models were added in [277]Data S10. Once the models were built, we applied the model to the test sets and used R package “InformationValue” (version:1.2.3) to get AUC curve. Besides, the performance of the XGBoost model in the training and test set were analyzed by “ggplot2” (version:3.3.5). Statistical analysis The two-sided Student’t test and paired t-test were used to calculate the differentially expressed miRNAs in different compared groups. The threshold for significant difference was set to be p ≤ 0.05, Fold changes >1.50 or <0.67. The miRNAs with low expression levels were excluded for further analysis. For the differentially expressed miRNAs in LTNP vs STNP, the correlation analysis was performed based on the number of days in hospital with nucleic acid test positive and the normalized copy number of the significant differentially miRNA in the symptomatic patient samples. GraphPad Prism 8.0.2 and OmicStudio tools ([278]https://www.omicstudio.cn/tool) were used to perform our analysis. Acknowledgments