Abstract The novel betacoronavirus severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) caused a worldwide pandemic (COVID-19) after emerging in Wuhan, China. Here we analyzed public host and viral RNA sequencing data to better understand how SARS-CoV-2 interacts with human respiratory cells. We identified genes, isoforms and transposable element families that are specifically altered in SARS-CoV-2-infected respiratory cells. Well-known immunoregulatory genes including CSF2, IL32, IL-6 and SERPINA3 were differentially expressed, while immunoregulatory transposable element families were upregulated. We predicted conserved interactions between the SARS-CoV-2 genome and human RNA-binding proteins such as the heterogeneous nuclear ribonucleoprotein A1 (hnRNPA1) and eukaryotic initiation factor 4 (eIF4b). We also identified a viral sequence variant with a statistically significant skew associated with age of infection, that may contribute to intracellular host–pathogen interactions. These findings can help identify host mechanisms that can be targeted by prophylactics and/or therapeutics to reduce the severity of COVID-19. Subject terms: SARS-CoV-2, RNA, Computational biology and bioinformatics, Transcriptomics, Viral infection __________________________________________________________________ Ferrarini & Lal et al. developed a novel bioinformatic pipeline to explore how SARS-CoV-2 interacts with human respiratory cells using public available host gene expression and viral genome sequence data. Several human genes and proteins were predicted to play a role in the viral life cycle and the host response to SARS-CoV-2 infection. Introduction In December of 2019, a novel betacoronavirus that was named severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) emerged in Wuhan, China^[48]1,[49]2. This virus is responsible for causing the coronavirus disease of 2019 (COVID-19) and by 24 January 2021, it had already infected more than 95 million people worldwide, accounting for at least 2 million deaths^[50]3. The SARS-CoV-2 genome is phylogenetically distinct from the SARS-CoV and Middle East Respiratory Syndrome (MERS-CoV) betacoronaviruses that caused human outbreaks in 2002 and 2012, respectively^[51]4,[52]5. Based on its high sequence similarity to a coronavirus isolated from bats^[53]6, SARS-CoV-2 is hypothesized to have originated from bat coronaviruses, potentially using pangolins as an intermediate host before infecting humans^[54]7. It remains a global priority to develop effective treatments for COVID-19, including treatments that inhibit viral replication inside human cells. At the same time, it is critical to control the hyper-inflammatory state that is frequently caused by this infection^[55]8. It is therefore important to define the biological processes that occur early in infection, including the mechanisms of viral replication, transcription, and translation inside host cells, which can be targeted by therapeutics^[56]9, as well as host immune responses that can be modulated^[57]8. Although many aspects of SARS-CoV-2 infection may be shared with other respiratory viruses, it is particularly interesting to identify its specific molecular interactions with host cells, to explain the unique clinical and epidemiological features of COVID-19^[58]10,[59]11. Further, the observation of heterogeneous immune responses in COVID-19 patients^[60]12 emphasizes the importance of identifying molecular responses to SARS-CoV-2, which are consistent across patients, and can therefore be targeted to develop widely applicable treatments. SARS-CoV-2 enters human cells by binding to the angiotensin-converting enzyme 2 (ACE2) receptor^[61]13. Once inside the infected cell, components of the virus interact with host cell machinery. Coronaviruses have been shown to co-opt a diverse range of host factors for their life cycle, forming both protein–protein interactions and RNA–protein interactions with host factors^[62]14,[63]15. Furthermore, viruses generally trigger a drastic host response during infection. A subset of these specific changes in gene regulation are associated with viral replication and, therefore, can be seen as potential drug targets. In addition, transposable element (TE) overexpression has been observed upon viral infection^[64]16 and TEs have been actively implicated in gene regulatory networks related to immunity^[65]17. Recent studies have sought to understand the molecular interactions between SARS-CoV-2 and infected cells^[66]18,[67]19, and some have quantified gene expression changes in patient samples or cultured lung-derived cells infected by SARS-CoV-2^[68]20–[69]22. However, it remains important to contrast the effects of SARS-CoV-2 with those of other respiratory viruses. Furthermore, host factors such as TEs and genetic isoforms remain understudied in the context of SARS-CoV-2 infection. Here we aim to identify host factors, pathways, and processes that are altered in response to SARS-CoV-2 infection in human cells, in particular those that are unaffected by other respiratory infections. Moreover, although many previous studies have examined immune cells, we focused specifically on human airway epithelial cells, as they are the primary entry points for respiratory viruses and therefore constitute the first producers of inflammatory signals that, in addition to their antiviral activity, promote the initiation of the innate and adaptive immune responses. We identified a signature of altered gene expression that is consistent across published datasets of SARS-CoV-2-infected human lung cells. We present extensive results from functional analyses (signaling pathway enrichment, biological functions, transcript isoform usage, and TE overexpression) of the genes differentially expressed during SARS-CoV-2 infection^[70]22, highlighting a consistent isoform switch of the IL-6 gene in SARS-CoV-2-infected cell lines. We also analyzed viral genome sequences to predict specific interactions between the SARS-CoV-2 RNA genome and human proteins that may be involved in viral replication, transcription, or translation, and identified at least one viral sequence variation that is significantly associated with patient age in humans. Knowledge of these molecular and genetic mechanisms is important to understand SARS-CoV-2 pathogenesis and to improve the future development of effective prophylactic and therapeutic treatments. Results We designed a comprehensive bioinformatics workflow to identify relevant host–pathogen interactions using a complementary set of computational analyses (Fig. [71]1). First, we carried out an exhaustive analysis of differential gene expression in human lung cells infected by SARS-CoV-2 or other respiratory viruses, identifying gene-, isoform-, and pathway-level responses, which specifically characterize SARS-CoV-2 infection. Second, we predicted putative interactions between the SARS-CoV-2 RNA genome and human RNA-binding proteins (RBPs). Third, we identified a subset of these human RBPs, which are also differentially expressed in response to SARS-CoV-2. Finally, we identified a viral sequence variant that could play a role in intracellular host–pathogen interaction. Fig. 1. Overview of the bioinformatic workflow applied in this study. [72]Fig. 1 [73]Open in a new tab As indicated in orange, RNA-seq data from SARS-CoV-2-infected samples were used as the input to identify differentially expressed (DE) genes, isoforms, and transposable elements (TEs). DE genes were used to identify functional enrichment of deregulated genes and possible impacts on metabolism. Neighboring genes of differentially expressed TEs (DETEs) were analyzed to verify if TEs could serve as regulatory mechanisms of gene expression. In green, the complete genome of the SARS-CoV-2 virus was used to identify enrichment of RNA-binding protein (RBP) motifs. We also used all available sequenced genomes as of 11 November 2020, to detect conserved RBP motifs and possible links to disease severity. SARS-CoV-2 infection elicits a specific gene expression and pathway signature in human cells We wanted to identify genes that were differentially expressed across multiple SARS-CoV-2-infected samples, while not significant in samples infected with other respiratory viruses. As a primary dataset, we selected [74]GSE147507^[75]22 (Fig. [76]2a), which includes gene expression measurements from three cell lines derived from the human respiratory system (NHBE, A549, Calu-3) infected either with SARS-CoV-2, influenza A virus (IAV), respiratory syncytial virus (RSV), or human parainfluenza virus 3 (HPIV3). We also analyzed an additional dataset [77]GSE150316 (Fig. [78]2a), which includes RNA sequencing (RNA-seq) extracted from formalin-fixed, paraffin-embedded (FFPE) histological sections of lung biopsies from COVID-19 deceased patients and healthy individuals. This second dataset encompasses a variable number (1–5) of postmortem lung biopsies per subject. The results coming from FFPE sections were less consistent, presumably due to the collection of biospecimens from different sites within the lung. Supplementary Data [79]1 provides details of all the samples included in our analyses. Fig. 2. Overview of the RNA-seq-based results specific to SARS-CoV-2, which were not detected in the other viral infections (IAV, HPIV3, and RSV). [80]Fig. 2 [81]Open in a new tab a Representation of the RNA-seq studies used in our analyses. b A subset of non-redundant reduced terms consistently enriched in more than one SARS-CoV-2 cell line, which were not detected in the other viruses’ datasets. c Top 20 differentially expressed isoforms (DEIs) in SARS-CoV-2-infected samples. The y-axis denotes the differential usage of isoforms between conditions (i.e., difference in isoform fraction, dIF), whereas the x-axis represents the overall log2FC of the corresponding gene. Thus, DEIs also detected as differentially expressed genes (DEGs) by this analysis are depicted in blue. d Different manners by which transposable element (TE) family overexpression might be detected. Although TEs may be autonomously expressed, the old age of most TEs detected points toward either being part of a gene (exonization or alternative promoter) or a result of pervasive transcription. We report the functional enrichment for neighboring genes of differentially expressed TEs (DETEs) specifically upregulated in SARS-CoV-2 Calu-3 and A549 cells (MOI 2). Source data for Fig. 2 is provided in Supplementary Data [82]18. We retrieved 41 differentially expressed genes (DEGs) that showed significant and consistent expression changes in at least three datasets from cell lines infected with SARS-CoV-2 and were not significantly affected in cell lines infected with other viruses within the same dataset. To these, we added 36 genes that showed significant and consistent expression changes in 2 of 4 cell line datasets infected with SARS-CoV-2 and at least 1 lung biopsy sample from a SARS-CoV-2 patient. The rationale behind these criteria was that results from FFPE sections were less reliable and, hence, were used only as supporting evidence where a gene was altered in at least two cell line samples. We further excluded four discordant genes that were upregulated in more than one cell line sample and downregulated in the biopsy samples. Thus, the final set consisted of 73 DEGs (Supplementary Data [83]2a): 53 upregulated and 20 downregulated, of which 41 had an absolute Log2FC > 1 in at least one dataset (selected genes from this list are shown in Table [84]1). Table 1. Differentially expressed genes specific to SARS-CoV-2 infection. Gene Cell type and MOI Also detected in biopsies A549 A549 Calu-3 NHBE MOI 0.2 MOI 2 VNN2 6.18 0.42 6.13 CSF2 3.56 7.30 2.70 WNT7A 4.99 0.79 0.45 PDZK1IP1 1.72 0.70 2.28 SERPINA3 0.49 1.39 0.77 1.44 Case 9 RHCG 1.51 2.02 1.33 IL32 1.64 1.23 1.21 Case 1 PDGFB 1.91 1.75 1.00 ALDH1A3 1.09 1.32 0.39 TLR2 1.63 0.89 0.84 SERPINB1 0.61 1.17 0.72 PRDM1 0.82 3.49 0.59 MT-TN 0.55 1.70 0.33 ATF4 0.79 1.07 0.26 PTPN12 0.48 0.97 1.23 DUSP16 0.33 0.41 1.43 FKBP5 −0.39 −0.36 Cases 1, 3, 8, 11 DAP −0.18 −0.61 Case 1 FECH −0.27 −0.36 Case 1 MT-CYB −0.30 −0.26 Case 1, 8 EIF4A1 −0.33 −0.63 Case 1 POLE4 −0.23 −0.82 −1.24 DDX39A −0.23 −1.27 −0.54 CENPP −0.36 −0.40 −0.38 TMEM50B −0.48 −0.59 −0.53 HPS1 −0.28 −0.31 −0.62 SNX8 −0.30 −0.43 −0.56 [85]Open in a new tab Log2 fold change for selected genes that showed significant up- or downregulation in SARS-CoV-2-infected samples (FDR-adjusted p-value < 0.05) and not in samples infected with the other viruses tested. MOI multiplicity of infection. SERPINA3, an antichymotrypsin that was proposed as a candidate inhibitor of viral replication^[86]23, was the only gene specifically upregulated in the four cell line datasets tested (Table [87]1). Other interesting upregulated genes were the amidohydrolase VNN2, the pro-fibrotic gene PDGFB, the β-interferon (IFN) regulator PRDM1 and the proinflammatory cytokines CSF2 and IL32. FKBP5, a known regulator of nuclear factor-κB (NF-κB) activity, was among the consistently downregulated genes. This set of genes represents a signature of host response specific to SARS-CoV-2 and may help to explain the specific clinical and epidemiological features of this disease. We also generated additional lists of DEGs that met different filtering criteria (Supplementary Data [88]2b, c and [89]3 for the complete DEG results for each dataset). To better understand the underlying biological functions and molecular mechanisms associated with the observed DEGs, we performed a hypergeometric test to detect statistically significant overrepresented Gene Ontology (GO) terms^[90]24 among the DEGs having an absolute Log2FC > 1 in each dataset separately^[91]24. Consistent with the findings of Blanco-Melo et al.^[92]22, GO enrichment analysis returned terms associated with immune system processes, response to cytokine, stress and viruses, and phosphatidylinositol 3-kinase (PI3K)/AKT signaling pathway, among others (see Supplementary Data [93]4 for complete results). In addition, we report 285 GO terms common to at least two cell line datasets infected with SARS-CoV-2 and absent in the response to other viruses (Fig. [94]2b and Supplementary Data [95]5a, b), including neutrophil and granulocyte activation, interleukin-1 (IL-1)-mediated signaling pathway, proteolysis, and stress-activated signaling cascades. We also detected 397 cell line-specific GO terms (76 in NHBE, 160 in A549, and 161 in Calu-3), which were not detected in the other viral datasets (Supplementary Data [96]5c). Our results show that each cell type regulates specific responses against SARS-CoV-2: A549-specific terms included vacuolar organization, endosome membrane, and protein export, whereas Calu-3-specific terms included oxidative phosphorylation, mitochondria, and cellular response to oxidative stress; NHBE cells had the majority of significant terms involved in cell chemotaxis and leukocyte-mediated immunity. One possible reason for these cell-specific responses is that each cell type expresses different levels of the viral receptor ACE2 (Supplementary Fig. [97]1). Next, we wanted to pinpoint intracellular signaling pathways that may be modulated specifically during SARS-CoV-2 infection. A robust bootstrap-based signaling pathway impact analysis (SPIA) enabled us to identify 30 pathways, including many involved in the host immune response, which are significantly enriched among DEGs in at least one virus-infected cell line dataset (Supplementary Data [98]6). More importantly, we predicted four pathways to be specific to SARS-CoV-2 infection and observed that the significant pathways differ by cell type and multiplicity of infection (MOI). The significant results included only one term common to A549 (MOI 0.2) and Calu-3 cells (MOI 2), namely IFN-α/β signaling. In addition, we found the amoebiasis pathways (A549 cells, MOI 0.2) and the p75(NTR)-mediated and trka receptor signaling pathways (A549 cells, MOI 2) to be significantly impacted. We also used a classic hypergeometric method as a complementary approach to our SPIA pathway enrichment analysis. Although there were generally higher numbers of significant results using this method, we observed that the vast majority of enriched terms (false discovery rate (FDR) < 0.05) described infections with various pathogens, innate immunity, metabolism, and cell cycle regulation (Supplementary Data [99]6). Interestingly, we were able to detect enriched Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways common to at least two SARS-CoV-2-infected cell types and absent from the other virus-infected datasets (Fig. [100]2b). These included pathways related to infection, cell cycle, endocytosis, signaling pathways, cancer, and other diseases. Our analyses highlight biological pathways in human lung cells that are altered specifically by SARS-CoV-2 infection, either in a cell-type-specific manner or consistent across cell types. This complements studies identifying pathway-level alterations in immune cells of COVID-19 patients^[101]25–[102]27. SARS-CoV-2 infection results in altered lipid-related metabolic fluxes To better understand how gene expression changes in response to virus infection impact human metabolism, we projected the transcriptomic data onto the human metabolic network^[103]28. This is an important analysis, as it can recover pathway-level changes that are not evident from analyzing dysregulated genes separately. This is based on the fact that the regulation of the entire metabolic pathways can be achieved by targeting few key enzymes via different regulatory mechanisms^[104]29–[105]31. By integrating information of the metabolic network with differential expression, we can predict which connected pathways were most likely increased or decreased in viral infection^[106]32. This analysis detected decreased fluxes in inositol phosphate metabolism in both A549 and Calu-3 cells infected with SARS-CoV-2 with an MOI of 2 (Supplementary Data [107]7). In addition, we detected an increased flux common to A549 and Calu-3 cell lines in reactive oxygen species detoxification, in accordance with previous terms recovered from functional enrichment analyses. Our analysis in A549 cells (MOI 2) also recovered decreased fluxes in several lipid pathways: fatty acid, cholesterol, sphingolipid, and glycerophospholipid, which have been shown as essential for the infection of multiple coronaviruses^[108]33. Overall, we were able to predict pathway-level changes that were not evident based only on DEGs, given that the control of key enzymes can be enough for the regulation of entire pathways. SARS-CoV-2 infection induced an isoform switch of genes associated with immunity and mRNA processing We analyzed changes in transcript isoform expression and usage associated with SARS-CoV-2 infection, and predicted whether these changes might result in altered protein function. We calculated isoform fraction (IF) values as the percentage of an individual isoform’s expression level relative to all other isoforms present within the parent gene’s expression level as presented in Eq. ([109]1): [MATH: IFisoform1= Isoformexpression1GeneexpressionIsoformexpression1+isoform2++isoformn :MATH] 1 Differential isoform usage (difference in IF, dIF) is defined as the difference in the fraction of an isoform present between two conditions presented in Eq. ([110]2): [MATH: dIF=IFcondition2< mi mathvariant="normal">IFcondition1 :MATH] 2 We identified isoforms experiencing a switch in usage ≥30% in absolute value (dIF ≥ |0.3|) across conditions and retrieved those with an FDR-adjusted p-value (q-value) < 0.05. Based on these criteria, we detected 3569 differentially expressed isoforms across all samples (Supplementary Table [111]1 and Supplementary Data [112]8). We performed biological consequence enrichment analysis to assess whether a particular consequence occurs more frequently than its opposite between conditions (Supplementary Fig. [113]2). For example, isoforms from A549 cells infected with RSV, IAV, and HPIV3 exhibited significant increases in nonsense-mediated decay (NMD) sensitivity and intron retention gain, while simultaneously exhibiting decreases in open reading frames (ORFs) and domains present. These conditions also displayed significant changes in splicing patterns, ranging from loss of exon skipping events, changes in usage of alternative transcription start and termination sites, and decreased alternative 5′- and 3′-splice sites (Supplementary Fig. [114]3). In contrast, isoforms from SARS-CoV-2-infected samples displayed no significant global enrichment of biological consequences or alternative splicing events (Supplementary Figs. [115]2 and [116]3, respectively). However, nonsignificant trends (FDR-adjusted p-value > 0.05) indicated that certain transcripts in SARS-CoV-2 samples experienced decreases in ORF length, numbers of domains, coding capability, intron retention, and NMD (Supplementary Fig. [117]2). Although these trends were not significant on the genome-wide scale, they implicate that SARS-CoV-2 may trigger host machinery to target and aberrantly splice specific isoforms, leading to decreases in transcript length and, therefore, production of truncated proteins or alternative proteins. To identify the specific isoforms affected by SARS-CoV-2 infection, we analyzed gene expression and isoform usage of individual isoforms in SARS-CoV-2 samples vs. controls. Results showed significant changes in gene expression and isoform usage at the individual gene level, with top-expressing isoforms associated with genes encoding cellular processes such as immune response and antiviral activity (IFI44L, IL-6, MX1, TRIM5), transcription and mRNA processing (DDX10, HNRNPA3F6, JMJD7, ZNF487, ZNF599), and cell cycle and survival (BCL2L2-PABPN1, CDCA3) (Fig. [118]2c, Supplementary Fig. [119]4, and Supplementary Data [120]8). We noticed that IL-6, a gene encoding a cytokine involved in acute and chronic inflammatory responses, displayed significant changes in both gene expression and isoform usage in SARS-CoV-2 infection. IL-6 expression increased by two- to sixfold with an MOI of 2 (Fig. [121]3b). To date, the Ensembl Genome Reference Consortium has identified nine IL-6 isoforms in humans, with the traditional transcript having six exons (IL6-204), five of which contain coding elements (Fig. [122]3a). NHBE cells expressed four known IL-6 isoforms, whereas A549 cells expressed one unknown and six known isoforms (Supplementary Fig. [123]5). When evaluating the actual isoforms used across conditions, SARS-CoV-2-infected NHBE cells used three out of four isoforms observed, whereas SARS-CoV-2-infected A549 cells used all seven observed isoforms. For example, in the case of NHBE SARS-CoV-2 samples, the IF for the IL6-201 isoform = 0.75, IL6-204 = 0.05, IL6-206 = 0.09, and IL6-209 = 0.06, and the sum of these IF values = 0.95, or 95% usage of the IL-6 gene relative to mock. SARS-CoV-2 samples (A549 MOI 0.2, A549 MOI 2, and NHBE MOI 2) exhibited exclusive usage of non-canonical isoform IL6-201 (Fig. [124]3c) and, inversely, mock samples almost exclusively utilized the IL6-204 transcript. In NHBE-infected cells, isoform IL6-201 experienced a significant increase in usage (dIF = 0.75) and IL6-204 a significant decrease in usage (dIF = −0.95) when compared to mock conditions. Similarly, isoform IL6-201 in A549-infected cells experienced an increase in usage (dIF = 0.58), whereas uses of all other isoforms remained nonsignificant in comparison to mock conditions. Fig. 3. Isoform usage of IL-6 transcripts in SARS-CoV-2-infected cells. [125]Fig. 3 [126]Open in a new tab a IL6-204 is the major IL-6 transcript and is composed of 6 exons, five (E2, E3, E4, E5, E6) containing coding sequences (CDS) and one (E1) containing exclusively a 5′-untranslated region (5′-UTR). Both isoforms (IL6-204 and IL6-201) have the same protein-coding capability. The main difference between them is the absence of E1 in IL6-201, which is the major induced isoform upon SARS-CoV-2 infection. b Gene expression of IL-6 in all SARS-CoV-2 cell line samples (A549 multiplicity of infection (MOI) 0.2 and 2; Calu-3 and NHBE MOI 2). Each boxplot represents three biological replicates and statistical testing was performed with DESeq2 (detailed in “Methods” section). Exact p-values are available in Supplementary Data [127]2. c Isoform usage switch between both isoforms in SARS-CoV-2-infected cell line samples. This figure shows that IL6-204 is almost exclusively expressed in uninfected (mock) cells, whereas IL6-201 is almost exclusively expressed in SARS-CoV-2-infected cells. Each boxplot represents three biological replicates and statistical testing was performed with IsoformSwitchAnalyzeR and exact q-values are available in Supplementary Data [128]8. Source data for Fig. 3 is provided in Supplementary Data [129]18. The IL6-201 and IL6-204 isoforms both contain five coding exons, and according to Ensembl, both are predicted to produce the same 212 amino acid protein product. The main difference between both isoforms is that IL6-201 does not contain exon 1 (5′-untranslated region, 5′-UTR), which is present in IL6-204. The 5′-UTRs are traditionally involved in translational regulation, either promoting or inhibiting translation, depending upon the sequence and secondary RNA structure^[130]34,[131]35 or modulating mRNA stability^[132]36. Thus, this isoform switch may be a mechanism to regulate IL-6 protein synthesis through control of translation rate and/or mRNA stability. Overexpression of TE families close to immune-associated genes upon SARS-CoV-2 infection TEs are repeated DNA sequences that are able to spread across the genome, representing around two-thirds of the human genetic material^[133]37. TEs can be grouped into two different classes regarding their transposition mechanisms: (1) DNA elements, which are mobilized via a DNA molecule and make up around ~3% of the human genome^[134]38, and (2) retrotransposons, which have an RNA intermediate. Retrotransposons can be further divided into long-term repeat (LTR) elements, also named endogenous retroviruses, which account for ~8% of the human genome, or long and short interspersed nuclear elements (LINEs and SINEs) and SINE-VNTR-Alu elements, which lack LTRs and are the most abundant superfamilies in the human genome, accounting for around one-third of DNA sequences^[135]38. Although the human genome is bursting with TEs, most TE families are unable to transpose, either because they lost their transpositional machinery or because they have accumulated mutations that hinder their activity. There are only three TE families currently active in the human genome: LINE1, Alu (SINE) subfamily, and SVAs^[136]39. Nevertheless, the graveyard of dead TEs in the human genome has been repeatedly shown to regulate host gene expression, thus participating in key developmental and immune networks^[137]40–[138]42. Therefore, searching for TE deregulation upon viral infection might shed light into activation of young TE families, but also pinpoint changes in gene regulatory networks. To estimate the expression of TE families and their possible roles in SARS-CoV-2 infection, we mapped the RNA-seq reads against all annotated TE human families (see “Methods” section) and detected differentially expressed TE (DETE) families (Fig. [139]2d and Supplementary Data [140]9). We found 68 common TE families upregulated in SARS-CoV-2-infected A549 and Calu-3 cells (MOI 2), including 53 retrotransposons (24 LINEs, 27 LTRs, and two SINEs). It is important to note that none of the current transpositionally active human TE families were found to be upregulated in SARS-CoV-2-infected cells. From this list, we excluded all TE families detected in cells infected with the other viruses. This allowed us to identify 16 families that were specifically upregulated in Calu-3 and A549 cells infected with SARS-CoV-2, and not in the other viral infections. The 16 families identified are MER77B, MamRep4096, MLT2C2, PABL_A, Charlie9, MER34A, L1MEg1, LTR13A, L1MB5, MER11C, MER41B, LTR79, THE1D-int, MLT1I, MLT1F1, and MamRep137. Most of the TE families uncovered are ancient elements, and none are capable of transposing^[141]43–[142]45. Eleven of the 16 TE families specifically upregulated in SARS-CoV-2-infected cells are LTR elements and include well-known TE immune regulators. For instance, MER41B (primate-specific TE family) is known to contribute to IFN-γ-inducible binding sites (bound by STAT1 and/or IRF1)^[143]46,[144]47. Other LTR elements are also enriched in STAT1-binding sites (MER77B, LTR13, and MLT1l)^[145]46 or have been shown to act as cellular gene enhancers (LTR13A^[146]48,[147]49). In humans, TEs have been shown to accumulate in mammalian-specific gene regulatory sequences, such as within immunity-related gene transcripts^[148]50. Given that at least four TE families identified are well-known host–gene regulators, along with the general ability of TE families to impact nearby gene expression, we further investigated the functional enrichment of genes near these upregulated TE families (Supplementary Data [149]10). The GREAT method used for this analysis extends the regulatory domain of each annotated gene to 5 kb upstream and 1 kb downstream the transcription start site, as we still lack precise maps of gene regulatory regions^[150]51. We detected GO functional enrichment of several immunity-related terms (e.g., major histocompatibility complex (MHC) protein complex, antigen processing, regulation of dendritic cell differentiation, and T-cell tolerance induction), metabolism-related terms (such as regulation of phospholipid catabolic process), and, interestingly, a specific human phenotype term called “progressive pulmonary function impairment” (Fig. [151]2d). Even though we did not limit our search only to neighboring genes which were also DE, we found several similar (and very specific) enriched terms in both analyses, for instance, related to endosomes, endoplasmic reticulum, and vitamin (cofactor) metabolism, among others. This result supports the idea that some responses during infection could be related to TE-mediated transcriptional regulation. Finally, when we searched for enriched terms related to each one of the 16 families separately, we also detected immunity-related enriched terms such as regulation of ILs, antigen processing, TGF-β receptor binding, and temperature homeostasis. It is important to note that given the old age of some of the TEs detected, overexpression might be associated with pervasive transcription or inclusion of TE copies within unspliced introns (Fig. [152]2d). In conclusion, we were able to demonstrate that 16 TE families are upregulated specifically upon SARS-CoV-2 infection, including four TE families previously shown to harbor STAT1/IRF1-binding sites, and are enriched close to immunity-related genes. Finally, to clearly pinpoint if such TE families are responsible for nearby gene regulation, future work should focus on TE-gene chimeric transcript searches (using long read RNA-seq or paired-end reads), mapping of regulatory sequences within TE copies using chromatin-related methods such as ATAC-seq, and deletion of TE copies followed by analysis of gene expression. The SARS-CoV-2 genome is enriched in binding motifs for 40 human proteins, most of them conserved across SARS-CoV-2 isolates The SARS-CoV-2 virus possesses a positive-sense, single-stranded, monopartite RNA genome. Such viruses are well-known to co-opt host RBPs for diverse processes including viral replication, translation, viral RNA stability, assembly of viral protein complexes, and regulation of viral protein activity^[153]14,[154]15. Therefore, we sought to predict host RBPs that may form functionally significant interactions with the SARS-CoV-2 genome. To do so, we first filtered the AtTRACT database^[155]52 to obtain a list of 102 human RBPs and 205 associated position weight matrices (PWMs) describing the experimentally determined sequence-binding preferences of these proteins. We then scanned the SARS-CoV-2 reference