Abstract

Rationale: Viral infections are complex processes based on an intricate network of molecular interactions. The infectious agent hijacks components of the cellular machinery for its own benefit, circumventing the natural defense mechanisms triggered by the infected cell. The successful completion of the viral replicative cycle within a cell depends on the balance between the function of viral components and the cellular defenses. Non-coding RNAs (ncRNAs) are important cellular modulators, either promoting or preventing the progression of viral infections. Among these ncRNAs, the long non-coding RNA (lncRNA) family is especially relevant due to its intrinsic functional properties and ubiquitous biological roles. Specific lncRNAs have recently been characterized as modulators of the cellular response during infection of human host cells by single-stranded RNA viruses. However, the role of host lncRNAs in infection by human RNA coronaviruses such as SARS-CoV-2 remains uncharacterized.

Methods: In the present work, we performed a transcriptomic study of a cohort of patients with different SARS-CoV-2 viral loads and analyzed the involvement of lncRNAs in supporting regulatory networks based on their interaction with RNA-binding proteins (RBPs).

Results: Our results revealed the existence of a SARS-CoV-2 infection-dependent pattern of transcriptional up-regulation in which specific lncRNAs are an integral component. To determine the role of these lncRNAs, we performed a functional correlation analysis complemented with the study of the validated interactions between lncRNAs and RBPs. This combination of in silico functional association studies and experimental evidence allowed us to identify a lncRNA signature composed of six elements - NRIR, BISPR, MIR155HG, FMR1-IT1, USP30-AS1, and U62317.2 - associated with the regulation of SARS-CoV-2 infection.

Conclusions: We propose a competition mechanism between the viral RNA genome and the regulatory lncRNAs in the sequestering of specific RBPs that modulates the interferon response and the regulation of RNA surveillance by nonsense-mediated decay (NMD).

Keywords: SARS-CoV-2, long non-coding RNA, RNA-binding protein, regulatory network

Introduction

Pervasive transcription of the human genome generates a wide range of regulatory RNA molecules that control the flow of genetic information originating from the cell nucleus. Among these regulatory RNAs, long non-coding RNAs (lncRNAs), defined as non-coding RNAs (ncRNAs) longer than 200 nucleotides that originate from specialized transcriptional units, are a very diverse class. These genes typically harbor their own promoters and regulatory sequences, and many undergo splicing and post-transcriptional modifications [1]. According to a recent update of the GENCODE database, the estimated number of lncRNA genes in the human genome is now over 18,000, comparable to the number of protein-coding genes (around 20,000) [2]. LncRNA transcriptional units typically generate structured RNA molecules with regulatory functions that modulate the genomic output at different levels, including acting as scaffolds of high-molecular-weight complexes and interacting with other biomolecules such as DNA, RNAs, and proteins [3-5]. LncRNAs have been found to have cell-state-specific functions and modulating effects on protein-coding gene transcription [6]. Transcriptomic analysis driven by next-generation sequencing applications has unveiled functional relationships between lncRNAs and the pathophysiology of metabolic diseases, cancer, and infections [7-9]. A pathology is often accompanied by a dysregulation of lncRNA expression that could represent a secondary event associated with the disease or act as a driving factor of the condition [10].
Viral infections are extreme cases of the interaction between two organisms in which the infectious agent strictly depends on the molecular and metabolic machinery of the infected cell to complete its replication and proliferation cycle. During the hijacking of the host cellular machinery by the virus, key molecular interactions between viral components and cellular structures are established. These interactions are responsible for the reorganization of cellular membranes to facilitate virus entry, modulation of cellular metabolism, and evasion of specific defense mechanisms [11]. Most of the knowledge about cellular and viral molecular players during infection is in the protein realm, represented by the characterization of virus-encoded polypeptides that are responsible for the progression of the infection or immune evasion, and their cognate cellular targets. However, the role of cellular and viral RNAs as relevant players within the context of an infection must also be considered [12]. The roles of lncRNAs as mediators or drivers of viral infections were first unveiled in the last decade [13]. LncRNA mediators have been shown to play key roles in the regulation of the immune and inflammatory response against viral infections [14, 15]. Several well-described examples define the regulatory role of individual lncRNAs during RNA viral infections [16-18]. For instance, human norovirus, a leading cause of viral gastroenteritis, can induce a strong lncRNA-based response in the host cells that is related to the regulation of the interferon response [19]. Strains of human hepatitis C virus (HCV) associated with long-term persistence downregulate the expression of lncPINT (p53-induced transcript long non-coding RNA) as a mechanism for circumventing the interferon defense mechanism and evading the innate immune response [20].
Following a similar strategy, the recently characterized lncRNA AP000253 provides a mechanism by which hepatitis B virus can remain occult for prolonged times within the host [21]. In many of these examples, results obtained from experimental models linked the lncRNA mediators of infection with a complex network of RNA-binding proteins (RBPs) [20, 22].

SARS-CoV-2, a respiratory RNA(+) virus with a rapid transmission pattern, was responsible for the global pandemic that started in late 2019. SARS-CoV-2 belongs to the Coronaviridae family and enters the cell through specific interactions with the host ACE2 receptor [23, 24]. After internalization, cell infection is characterized by a dysregulated gene and protein expression pattern that includes an up-regulation of genes involved in the interferon response and interleukin production [25, 26]. If the virus evades host cell defenses, the replication of the genetic material is enabled by a multimeric RNA-dependent RNA polymerase. The RNA genome is translated into a polypeptide that is matured by specific proteolytic digestion with two viral proteases, the main protease (MPro) and the papain-like protease (PLPro) [27]. Whole virions are assembled and secreted by a pathway that involves the endoplasmic reticulum and the Golgi complex [11, 12]. In severe cases, SARS-CoV-2 infected patients showed a striking pattern of acute inflammatory responses that has been related to the uncontrolled production of cytokines and designated as “cytokine storm” [28, 29]. Genomic SARS-CoV-2 RNA and its RNA transcripts interact with specific proteins modulating cellular responses to the infection, as revealed by high-throughput proteomic analysis [25, 30].
The multiple interactions between the viral genome/transcriptome and cellular proteins can either promote replication of the virus or, conversely, help the cell prevent replication [31, 32]. Small non-coding RNAs (ncRNAs) have been previously described as regulatory factors at the virus-host interface [33]. However, the functions and roles of lncRNAs in the development and progression of SARS-CoV-2 infection remain uncharacterized. In this work, we determined the lncRNA dysregulation pattern induced by SARS-CoV-2 infection and characterized the lncRNA-centered regulatory networks involving RBPs associated with RNA metabolism and interferon-mediated responses, by analysis of high-throughput transcriptomes of samples obtained from patients with and without SARS-CoV-2 infection. Detailed knowledge of the complex regulatory networks involving lncRNAs could open new perspectives for the design of targeted drugs to treat severe cases of SARS-CoV-2 infection.

Material and methods

Data source and group stratification

The source data for this study was generated within the framework of the COV-IRT consortium (www.cov-irt.org) and deposited in the Sequence Read Archive (SRA) database under the project reference PRJNA671371, corresponding to a previously published study [26]. The dataset includes shotgun metatranscriptomic (total RNA-seq) data for host and viral profiling of 735 clinical specimens obtained from patients at the Weill Medical College of Cornell University, New York, USA. Patients were stratified according to the SARS-CoV-2 levels determined by qRT-PCR experiments that simultaneously used primers to amplify the E (envelope protein) and S (spike protein) genes together with the proper internal controls, as previously described [26].
Patients with a cycle threshold value (Ct) less than or equal to 18 were assigned to the “high viral load” class, those with a Ct between 18 and 24 to the “medium viral load” class, and those with a Ct between 24 and 40 to the “low viral load” class, with anything above a Ct of 40 classified as “negative” [26]. The negative patients were further subdivided according to the presence of viral respiratory infections other than COVID-19 with compatible symptoms.

Analysis of RNAseq data

Raw Illumina sequence reads, obtained by a paired-end sequencing strategy, were filtered and trimmed with Trimmomatic software [34]. Filtered sequence reads were dual-aligned to the reference SARS-CoV-2 genome from Wuhan (strain reference MN908947.3) and the human genome (genome build GRCh38 and GENCODE v33) using the STAR aligner [35]. The gene counts were indexed to the different families of coding and non-coding gene transcripts using the BioMart data portal [36]. Data was normalized using the variance-stabilizing transform (vst) in the DESeq2 package [37]. Differential gene expression between working groups was determined by the Limma/Voom algorithm implemented in the iGEAK data processing platform for RNAseq data [38]. Significantly differentially expressed genes were selected using an adjusted p-value < 0.05 and a logFC < -1.0 or logFC > 1.0. All the gene expression data is publicly available at the Weill Cornell Medicine COVID-19 Genes Portal, an interactive repository for mining the human gene expression changes in the data from this study (covidgenes.weill.cornell.edu).

Bioinformatic analysis of lncRNA-centered regulatory networks

The functional annotation of the group of selected lncRNAs whose expression was induced by SARS-CoV-2 infection was performed with the ncFANs 2.0 platform using the ncRNA-NET module [39].
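As an illustration only, the viral-load classes and the differential-expression cut-offs stated in the methods above can be written as a short Python sketch. The function names, the treatment of a missing Ct value as negative, and the assignment of a Ct of exactly 24 to the medium class are assumptions made for this sketch; they are not details taken from the study pipeline.

```python
from typing import Optional


def viral_load_class(ct: Optional[float]) -> str:
    """Map a qRT-PCR cycle threshold (Ct) to a viral-load class.

    Thresholds follow the text: Ct <= 18 high, 18-24 medium,
    24-40 low, above 40 negative.
    Assumptions (not stated in the source): a missing Ct (no
    amplification) counts as negative, and Ct == 24 counts as medium.
    """
    if ct is None or ct > 40:
        return "negative"
    if ct <= 18:
        return "high"
    if ct <= 24:
        return "medium"
    return "low"


def is_significant(adj_p: float, log_fc: float) -> bool:
    """Selection criteria from the text: adjusted p-value < 0.05
    and logFC below -1.0 or above 1.0."""
    return adj_p < 0.05 and abs(log_fc) > 1.0


if __name__ == "__main__":
    # Hypothetical Ct values, used only to show the mapping.
    for ct in (15.2, 21.0, 33.7, None):
        print(ct, viral_load_class(ct))
```

Lower Ct values correspond to earlier amplification and therefore to higher viral loads, which is why the comparison runs from the smallest threshold upward.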
Applying this module, we determined the co-expression network involving the selected lncRNAs and protein-coding genes using data extracted from healthy tissues and compiled in the Genotype-Tissue Expression (GTEx) portal [40]. The correlated coding genes were functionally grouped by GO-term analysis, pathway enrichment, and determination of molecular signatures by the ncRNA-NET module in ncFANs. In addition to the classical GO-term enrichment analysis, redundant ontology terms were filtered with REVIGO software [41]. The lncRNA-centered regulatory networks established between lncRNAs and RNA-binding proteins were constructed by interrogating the ENCORI database for RNA interactomes [42]. Graphical analysis and representation of lncRNA-centered regulatory networks were performed with NAViGaTOR software [43]. Functional similarity of the selected overexpressed lncRNAs in SARS-CoV-2 infection was inferred by integrating heterogeneous network data with the IHNLncSim algorithm [44]. This approach integrates information from experimentally validated data at three levels of functional association: miRNA-lncRNA, disease-based correlation, and GTEx expression-based networks.

Results

Host transcriptional shift induced by SARS-CoV-2 infection

To characterize the cellular response against SARS-CoV-2 infection, we performed a transcriptomic analysis of nasopharyngeal swabs collected from patients tested for SARS-CoV-2. The patients were previously stratified according to the presence or absence of a positive qPCR test, the existence of respiratory pathogens other than SARS-CoV-2, and the viral load inferred from the amplification Ct values, as described in the Material and methods section. The results, depicted in Figure 1A, characterize the transcriptional dysregulation in the host cells associated with infections by SARS-CoV-2 and other respiratory viruses.
In SARS-CoV-2 patients, increased viral load resulted in a larger number of upregulated transcripts (Figure 1B-C). Globally, the number of transcripts with a logFC > 1 in infected patients compared to uninfected control patients rose from 52 at low viral load to 891 at high viral load. When the different families of transcripts were analyzed, patients with a high SARS-CoV-2 viral load, together with those infected with other respiratory viruses, showed a preferential upregulation pattern in which coding RNAs were more abundant. Moreover, the patients with higher SARS-CoV-2 loads also showed a greater proportion of upregulated transcripts represented by lncRNAs (Figure 1B).

Figure 1. SARS-CoV-2 infection is characterized by a gene expression pattern enriched in up-regulated mRNA and lncRNA transcripts that can be correlated with the viral load observed in patients. A, number of differentially expressed transcripts observed in patients with different SARS-CoV-2 viral loads (Low, Medium and High) and those infected with different respiratory viruses (Other) in comparison with the uninfected patients; B, number of the different families of up-regulated transcripts in SARS-CoV-2 patients and patients infected with other respiratory viruses in comparison with the control group; C, number of the different families of down-regulated transcripts in SARS-CoV-2 patients and patients infected with other respiratory viruses in comparison with the control group; D, Venn diagram representing the number of up-regulated lncRNA transcripts observed in each study group relative to the uninfected control group; E, CIRCOS plot [45] showing the genomic location and fold changes of the differentially expressed coding transcripts in the group of SARS-CoV-2 patients infected with higher viral loads in comparison with the uninfected controls (red squares, up-regulated mRNAs; green squares, down-regulated mRNAs); F, CIRCOS plot [45] depicting the genomic locations and fold changes of the differentially expressed lncRNA transcripts in the group of SARS-CoV-2 patients infected with higher viral loads in comparison with the uninfected controls (red circles, up-regulated lncRNAs; green circles, down-regulated lncRNAs).

Interestingly, of the 152 upregulated lncRNAs in high viral load samples, only 2 are common to all the analyzed infections. In SARS-CoV-2 infected patients, 105 upregulated lncRNAs were exclusive to the high viral load infections (Figure 1D). Positional gene enrichment analysis [46] of the upregulated lncRNA loci in high viral load SARS-CoV-2 infection showed two genomic regions enriched in transcriptional units overexpressed in response to the virus, comprising chr1: 148290889-155324176 and chr17: 32127595-62552121. The remaining overexpressed lncRNAs and coding mRNAs were evenly distributed across the different chromosomal loci with no evident spatial enrichment pattern (Figure 1E-F). The complete list of differentially expressed genes in all the comparisons is available as a supplementary table (Table S1).

Functional analysis of upregulated lncRNAs in SARS-CoV-2 infection

Some of the upregulated lncRNAs detected in patients with high viral load have already been characterized in different biological contexts (Table 1). However, understanding the global role of lncRNAs during SARS-CoV-2 infection requires a transcriptome-wide analysis. Prediction of lncRNA functions using the principles of systems biology is a challenging task due to the lack of supporting experimental evidence and the complexity of the interactions established among lncRNAs and other functional players. Among the computer-based strategies available, we selected ncFANs 2.0 as a functional classifier [39]. The ncFANs-NET module was used to predict the functions of the upregulated lncRNAs in high viral load infections by using the “guilt by association” approach.
A correlation network between the differentially overexpressed lncRNAs and coding genes was constructed by ncFANs using data extracted from the GTEx project database [47], and enrichment was analyzed using terms from the Gene Ontology (GO) [48, 49] and KEGG databases [50]. The results of the functional analysis of the resulting co-expression network by GO-term enrichment with redundant term filtering, pathway analysis, and molecular signature determination are depicted in Figure 2. GO-term enrichment within the category of molecular function resulted in the selection of terms related to cell-to-cell communication and the general processes of lymphocyte activation and cytokine production (Figure 2A). The KEGG-pathway enrichment analysis resulted in a list of pathways also related to cytokine response and regulation, T-cell signaling, and infections by viruses, bacteria, Trypanosoma and Apicomplexa parasites (Figure 2B), suggesting common lncRNA-related regulatory responses mounted by the host cells against different infectious agents. Interestingly, the analysis of the molecular signatures in the regulatory lncRNA network revealed genes related to the interleukin signaling pathways, the interferon gamma response, and the epithelial to mesenchymal transition as the most significant functions (Figure 2C).

Table 1. Upregulated lncRNAs detected in nasopharyngeal samples from patients with high SARS-CoV-2 viral loads that have been functionally characterized in different cellular processes or pathologies.

Symbol ENSEMBL gene Location Comments References