Abstract
The proteins and RNAs of viruses interact extensively with host proteins after infection. We collected and reanalyzed all available datasets of protein-protein and RNA-protein interactions related to SARS-CoV-2. We investigated the reproducibility of those interactions and applied strict filters to identify high-confidence interactions. We systematically analyzed the interaction network and identified the preferred subcellular localizations of viral proteins, some of which, such as ORF8 in the ER and ORF7A/B in the ER membrane, were validated using dual fluorescence imaging. Moreover, we showed that viral proteins frequently interact with host machinery related to protein processing in the ER and vesicle-associated processes. Integrating the protein- and RNA-interactomes, we found that SARS-CoV-2 RNA and its N protein closely interacted with stress granules, including 40 core factors, of which we specifically validated G3BP1, IGF2BP1, and MOV10 using RIP and Co-IP assays. Combining CRISPR screening results, we further identified 86 antiviral and 62 proviral factors and associated drugs. Using network diffusion, we found 44 additional interacting proteins, including two previously validated proviral factors. Furthermore, we showed that this atlas could be applied to identify the complications associated with COVID-19. All data are available in the AIMaP database ([39]https://mvip.whu.edu.cn/aimap/) for users to easily explore the interaction map.
Keywords: SARS-CoV-2, Protein-interactome, RNA-Interactome, Drug repurposing, Protein localization
Highlights
* An integrated atlas of interactions between SARS-CoV-2 RNA/proteins and host proteins.
* SARS-CoV-2 may disrupt protein processing in ER through protein-protein interactions.
* The core stress granule factors are the main targets of SARS-CoV-2 RNA and protein N.
* The atlas could be used to understand viral protein localizations and complications.
1. Introduction
The global COVID-19 pandemic caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is still ongoing and has led to more than 500 million infections and more than 6 million deaths since it was first reported more than two years ago ([42]Wu et al., 2020), posing a series of threats to human security. Currently, approved antiviral drugs targeting SARS-CoV-2 are still scarce ([43]Edwards et al., 2022), and vaccines remain the primary option for fighting the pandemic. However, the emergence of highly transmissible and highly pathogenic SARS-CoV-2 variants, such as the Delta and Omicron variants, continuously challenges the effectiveness of vaccines ([44]Araf et al., 2022; [45]Harvey et al., 2021; [46]Li et al., 2022; [47]Mlcochova et al., 2021). Drug repurposing is an effective strategy to quickly identify drugs against SARS-CoV-2. However, it requires knowledge of targetable host proviral proteins, especially those directly interacting with viral macromolecules (proteins and RNAs). Generally, there are three strategies to identify the host proteins involved in interactions between SARS-CoV-2 and host cells. The first strategy uses CRISPR screening to identify host proteins that are functionally essential for SARS-CoV-2 infection ([48]Baggen, Persoons, et al., 2021; [49]Biering et al., 2022; [50]Daniloski et al., 2021; [51]Flynn et al., 2021; [52]Hoffmann et al., 2021; [53]Schneider et al., 2021; [54]Wang et al., 2021; [55]Wei et al., 2021; [56]Zhu et al., 2021), based on effectively repressing or activating a specific host gene ([57]Konermann et al., 2015; [58]Sanjana et al., 2014; [59]Shalem et al., 2014). The second strategy is to detect genes differentially expressed at the RNA or protein level after SARS-CoV-2 infection using high-throughput RNA sequencing or mass spectrometry ([60]Bojkova et al., 2020; [61]Stukalov et al., 2021).
The third strategy is identifying host proteins that physically interact with viral proteins and RNAs using affinity purification (AP), proximity labeling (PL), or RNA antisense purification coupled with mass spectrometry. The proteins identified from these methods are candidates that merit further examination of antiviral effects. In detail, the affinity-purification coupled mass spectrometry (AP-MS) can identify interacting proteins (protein-interactome) of a specific protein fused to an affinity tag, and has been widely used for different SARS-CoV-2 proteins recently ([62]Chen, Wang, et al., 2021; [63]Davies et al., 2020; [64]Gordon et al., 2020a, [65]2020b; [66]Jiang et al., 2020; [67]Kruse et al., 2021; [68]Li et al., 2021; [69]Liu et al., 2021; [70]Nabeel-Shah et al., 2022; [71]Shin et al., 2020; [72]Slavin et al., 2021; [73]Stukalov et al., 2021). The proximity-labeling coupled mass spectrometry (PL-MS) has also been extensively adopted to identify viral protein-interactomes based on fusing with a biotin ligase ([74]Chen, Wang, et al., 2021; [75]Laurent et al., 2020; [76]Liu et al., 2021; [77]Meyers et al., 2021; [78]Samavarchi-Tehrani et al., 2020; [79]St-Germain et al., 2020; [80]Zhang, Shang, et al., 2022), such as BioID ([81]Roux et al., 2012) and engineered BioID including TurboID and miniTurbo ([82]Branon et al., 2018). The biotin ligases can catalyze biotinylation of nearby proteins within a radius of 10 nm ([83]Kim et al., 2014), from which the biotinylated proteins can then be captured with streptavidin beads. It has been unclear whether these methods detect similar interactions. Specifically, the RNA antisense purification coupled mass spectrometry (RAP-MS) ([84]Lee et al., 2021; [85]Schmidt et al., 2021) and comprehensive identification of RNA binding proteins by mass spectrometry (ChIRP-MS) ([86]Flynn et al., 2021; [87]Zhang, Huang, et al., 2022) can identify host and viral proteins binding to viral RNAs (RNA-interactome). 
Both methods crosslink cells to fix in vivo RNA-protein interactions (RAP-MS uses 254 nm UV light and ChIRP-MS uses 3% PFA) and then purify the proteins using antisense probes against viral RNAs. Recently, two further methods, viral cross-linking and solid-phase purification (VIR-CLASP) ([88]Kim et al., 2020) and viral RNA interactome capture (vRIC-MS) ([89]Kamel et al., 2021), were developed to specifically explore the viral RNA-interactome, using 365 nm UV light to crosslink 4-thiouridine (4sU)-labeled RNA with proteins. It is worth noting that VIR-CLASP can capture the RNA-interactome in the earliest period of infection, while the other methods can only efficiently capture the RNA-interactome after viruses are massively amplified in the cells. These methods, except VIR-CLASP, have been used to detect the SARS-CoV-2 RNA-interactome. However, an integrated map of these interactions from all previous studies is lacking. Recently, several works summarized a few studies related to the interactomes ([90]Baggen, Vanstreels, et al., 2021; [91]Haas et al., 2021; [92]Kolinski et al., 2022; [93]Terracciano et al., 2021). However, none of them analyzed these data in depth. Here, we collected and systematically investigated all available datasets on the protein-interactome and RNA-interactome of SARS-CoV-2 from published studies. We identified a high-confidence interaction network of SARS-CoV-2 macromolecules and host proteins. We further characterized the interactions and proteins by integrating CRISPR screening and protein localization data, and we experimentally validated a few predictions from our analyses using multiple assays. This atlas of interactions would be a valuable resource, providing clues to the molecular mechanisms of viral infection and supporting the development of antiviral strategies and drug repurposing.
2. Results
2.1. Strategies and data sources for the SARS-CoV-2 interactomes
We searched the PubMed and ProteomeXchange databases and collected all data published by January 2022 on the protein-interactome ([94]Chen, Wang, et al., 2021; [95]Davies et al., 2020; [96]Gordon et al., 2020a, [97]2020b; [98]Jiang et al., 2020; [99]Kruse et al., 2021; [100]Laurent et al., 2020; [101]Li et al., 2021; [102]Liu et al., 2021; [103]Meyers et al., 2021; [104]Nabeel-Shah et al., 2022; [105]Samavarchi-Tehrani et al., 2020; [106]Shin et al., 2020; [107]Slavin et al., 2021; [108]St-Germain et al., 2020; [109]Stukalov et al., 2021; [110]Zhang, Shang, et al., 2022) and RNA-interactome ([111]Flynn et al., 2021; [112]Kamel et al., 2021; [113]Lee et al., 2021; [114]Schmidt et al., 2021; [115]Zhang, Huang, et al., 2022) of SARS-CoV-2 obtained using different methods including AP-MS, PL-MS, RAP-MS, vRIC-MS, and ChIRP-MS. By integrating these data, we aimed to build an atlas of interactions between SARS-CoV-2 macromolecules and host proteins, a vital component of the knowledge map of virus-host interactions ([116]Fig. 1A). We also curated detailed meta-information about cell lines, tag types, bait proteins, and dataset identifiers for all the collected datasets ([117]Table 1 and [118]Fig. 1B).
Fig. 1. Overview of the approaches for systematically analyzing viral protein and RNA interactomes. (A) Schematic of strategies for exploring protein-protein interactions and RNA-protein interactions between SARS-CoV-2 and host. (B) Matrix-like plot showing the viral protein baits used in each AP-MS and PL-MS study. Datasets with processed results are labeled with red circles on the right.
Table 1. Description of the datasets.
| ID | Interactome | Method | Cell line | Strategy | Bait | Dataset |
|---|---|---|---|---|---|---|
| 1 | Protein-protein | AP-MS | A549 | His tag | N | PXD023989 |
| 2 | | | | GFP tag | NSP3 | PXD018983 |
| 3 | | | | HA tag | 24 proteins | PXD020222 |
| 4 | | | HEK293 | Strep tag | NSP1-2, N | PXD023487 |
| 5 | | | | Strep tag | 26 proteins | PXD018117 |
| 6 | | | | Strep tag | NSP16 | PXD021588 |
| 7 | | | | Strep tag | 29 proteins | MSV000087035 |
| 8 | | | | FLAG tag | NSP2, NSP4 | PXD022017 |
| 9 | | | | FLAG tag | 18 proteins | [121]Li et al. (2021) |
| 10 | | | | Biotin tag | ORF9B | PXD019803 |
| 11 | | | | GFP tag | 27 proteins | MSV000086704 |
| 12 | | | | SFB tag | 29 proteins | PXD023209 |
| 13 | | | HeLa | YFP tag | N | PXD025410 |
| 14 | | PL-MS | A549 | miniTurbo | 27 proteins | [122]Samavarchi-Tehrani et al. (2020) |
| 15 | | | HEK293 | BioID | 14 proteins | MSV000086006 |
| 16 | | | | BioID | 17 proteins | PXD023277 |
| 17 | | | | BioID | 28 proteins | [123]Laurent et al. (2020) |
| 18 | | | | BioID | 29 proteins | PXD023209 |
| 19 | | | | BioID | 29 proteins | MSV000087035 |
| 20 | | | | TurboID | 27 proteins | PXD022086 |
| 21 | RNA-protein | RAP-MS | VeroE6 | IVT RNA probes | +ssRNA | PXD024808 |
| 22 | | | Huh7 | 67 DNA probes | +ssRNA | MSV000085734 |
| 23 | | vRIC-MS | Calu-3 | (dT)25 probe | 4sU +ssRNA | PXD023418 |
| 24 | | ChIRP-MS | VeroE6 | 108 DNA probes | +ssRNA | [124]Flynn et al. (2021) |
| 25 | | | Huh7.5 | 108 DNA probes | +ssRNA | [125]Flynn et al. (2021) |
| 26 | | | | 160 DNA probes | +ssRNA | [126]Zhang, Huang, et al. (2022) |

Blank cells share the value of the nearest filled cell above. AP-MS: affinity-purification coupled mass spectrometry; PL-MS: proximity-labeling coupled mass spectrometry; RAP-MS: RNA antisense purification coupled mass spectrometry; vRIC-MS: viral RNA interactome capture coupled mass spectrometry; ChIRP-MS: comprehensive identification of RNA binding proteins by mass spectrometry.
As shown in [128]Table 1, the protein-interactome of SARS-CoV-2 was mainly captured by AP-MS or PL-MS. Most AP-MS assays were conducted using viral protein bait fused with an affinity tag, and seven types of affinity tags were commonly used. Besides, Jiang et al. used in vitro purified biotinylated viral proteins as baits to capture interacting proteins under in vitro incubation conditions ([129]Jiang et al., 2020). In PL-MS assays, three types of biotin ligases were mainly used: BioID, TurboID, and miniTurbo.
SARS-CoV-2 encodes 29 viral proteins, and most studies set out to explore all viral proteins simultaneously; however, not every viral protein could be successfully expressed, and thus not all proteins had data (grey squares in [130]Fig. 1B). In addition, some studies also expressed mutated viral proteins as baits ([131]Fig. 1B). Notably, two studies explored the interactions under stress conditions: IFN-α treatment ([132]Slavin et al., 2021) and poly(I:C) treatment ([133]Laurent et al., 2020). In the studies capturing the SARS-CoV-2 RNA-interactome using RAP-MS, ChIRP-MS, and vRIC-MS, only the positive-strand RNAs were targeted as baits in all datasets from three host cell types ([134]Table 1), probably because the abundance of negative-strand RNAs was much lower than that of positive-strand RNAs. In RAP-MS and ChIRP-MS, the antisense probes should cover the full viral genome to achieve the best capture efficiency; however, the cost of synthesizing a large number of biotinylated probes makes this difficult, especially for SARS-CoV-2, whose gRNA reaches 29 kb in length. To overcome this difficulty, Lee et al. used in vitro transcription to synthesize biotinylated tiling probes ([135]Lee et al., 2021).
2.2. Integrated virus-host protein-protein interaction network
We finally acquired processed protein-interactome data from 7 AP-MS assays and 7 PL-MS assays, and the numbers of detected viral protein-host protein interactions varied significantly between studies ([136]Fig. 2A). Comparing the presence of each interaction across datasets, we found that only a few interactions were captured multiple times, whereas most interactions appeared in a single dataset ([137]Fig. S1A). We counted the number of occurrences of each protein-protein interaction and found that 93.7% of the interactions in AP-MS were unique, compared to 77% in PL-MS ([138]Fig. 2B and C).
Fig. 2. Summary of SARS-CoV-2 protein-host protein interactions. (A) The number of viral protein-host protein interactions from different studies. (B–C) Pie charts showing the proportion of virus protein-host protein interactions from different studies, identified by the AP-MS (B) and PL-MS (C) experimental strategies, respectively. (D) The network of virus protein-host protein interactions identified by AP-MS (AP-interactome). Hexagon yellow nodes represent viral proteins. Circular blue nodes represent host proteins. Edges indicate virus-host protein interactions. Edge thickness represents the credibility level of interactions.
We classified the interactions by their occurrences in AP-MS and PL-MS, respectively. Of the 9 interactions with 4 or more occurrences in AP-MS ([141]Fig. 2B), 6 had evidence from the literature: ORF3A-CLCC1 ([142]Chen, Wang, et al., 2021), ORF3A-VPS39 ([143]Chen, Zheng, et al., 2021; [144]Miao et al., 2021; [145]Zhang et al., 2021), ORF3A-VPS11 ([146]Chen, Wang, et al., 2021), N-G3BP1 ([147]Chen, Wang, et al., 2021; [148]Kruse et al., 2021), N-G3BP2 ([149]Chen, Wang, et al., 2021), and ORF9B-TOMM70 ([150]Gao et al., 2021). We identified 25 interactions with 5 or more occurrences in PL-MS, three of which had literature confirmations: ORF3A-VPS39, ORF3A-CLCC1, and ORF9B-TOMM70 ([151]Fig. 2C). Unexpectedly, the ORF9B-MAVS interaction was reported to be an indirect interaction ([152]Wu et al., 2021). We counted, for each host protein, the number of its interactions with viral proteins and found that the proportion and frequency of such interactions in PL-MS were much larger than those in AP-MS ([153]Fig. S1B). These results implied that the identified interactions may originate from both direct and indirect protein-protein interactions, owing to spatial proximity to the bait protein.
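The occurrence counting described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: `count_occurrences` and `unique_fraction` are hypothetical helpers, the study names are placeholders, and the assignment of pairs to studies is invented toy data (only the pair names ORF9B-TOMM70, N-G3BP1, and ORF3A-VPS39 come from the text; `HOST_X` is made up). Each dataset contributes at most one occurrence per viral-host pair.

```python
from collections import Counter

def count_occurrences(datasets):
    """Count in how many datasets each (viral, host) pair appears."""
    counts = Counter()
    for pairs in datasets.values():
        # set() gives each dataset a single vote per pair,
        # so repeated hits inside one study are not double-counted
        counts.update(set(pairs))
    return counts

def unique_fraction(counts):
    """Fraction of interactions observed in exactly one dataset."""
    return sum(1 for n in counts.values() if n == 1) / len(counts)

# Toy input: which study reports which pair is invented for illustration.
toy_ap_ms = {
    "study_A": [("ORF9B", "TOMM70"), ("N", "G3BP1"), ("N", "G3BP1")],
    "study_B": [("ORF9B", "TOMM70"), ("ORF3A", "VPS39")],
    "study_C": [("ORF9B", "TOMM70"), ("NSP6", "HOST_X")],
}

counts = count_occurrences(toy_ap_ms)
print(counts.most_common())
print(f"unique fraction: {unique_fraction(counts):.2f}")
```

Keeping the counts per method (one `Counter` for AP-MS, one for PL-MS) makes it straightforward to apply a different occurrence cutoff to each.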
Assuming that an interaction observed more often is more likely to be authentic, we empirically set a minimum threshold of 2 occurrences for AP-MS and 3 occurrences for PL-MS (without counting potential replicates within an individual study) to identify reliable interactions. Under these cutoffs, 93.7% of AP-MS interactions (those with ≤1 occurrence) and 93.9% of PL-MS interactions (those with ≤2 occurrences) were removed. The remaining interactions were used to construct the protein-interactome of SARS-CoV-2 for further analysis. In the AP-MS supported protein-interactome (AP-interactome), there were 298 interactions involving 26 viral proteins and 234 host proteins ([154]Fig. 2D and [155]Supplementary Table 1). In the PL-MS supported protein-interactome (PL-interactome), there were 1086 interactions involving 24 viral proteins and 538 host proteins ([156]Fig. S2 and [157]Supplementary Table 1).
2.3. Comparison between AP- and PL-interactomes and viral protein localization
Since PL-MS results may reflect different characteristics from AP-MS, we first counted the interactions shared between the AP-interactome and the PL-interactome. We found only a few common interactions between the two, both overall and for individual viral proteins ([158]Fig. 3A). Even for ORF3A, for which the size of the AP-interactome was close to that of the PL-interactome, the proportion of overlapping interactions was small ([159]Fig. 3A). The results confirmed that the two methods, AP-MS and PL-MS, have different focuses, suggesting that we could not simply merge the AP-interactome and PL-interactome and needed to explore them separately.
Fig. 3. Comparison between viral protein AP-interactome and PL-interactome. (A) The number of interacting proteins of each viral protein identified in AP-interactome, PL-interactome, or both.
(B–C) Localization analysis of viral proteins based on host proteins in AP-interactome (B) and PL-interactome (C). Previously validated proteins localizing in the membrane are marked in blue dotted boxes and proteins validated in this study are labeled with red dotted boxes. (D) Western blotting of expressed mCherry-tagged viral proteins (left) and eGFP-tagged marker proteins (right). (E-H) Representative fluorescence images showing co-localization between ORF7A (E)/ORF7B (F)/ORF8 (G)/ORF10 (H) (red) and LBR/SEC61B (green). The plots on the right showed the fluorescence intensity along the region of interest (ROI). Since AP-MS tends to reveal potential direct interactions between proteins, we performed protein domain analysis on the host proteins from the AP-interactome. We found that some viral proteins frequently interacted with host proteins containing the same type of domains ([162]Fig. S3A). For example, ORF3A and NSP7 could specifically interact with the small GTP-binding protein domain, while NSP6, ORF3A, M, and ORF7B all could interact with the cation-transporting P-type ATPase domain. These domain-interacting signatures reflect the structural bases of those virus-host protein-protein interactions. Considering two interacting proteins tend to have similar subcellular localization, we next used the PL-interactome and AP-interactome to analyze the localization of viral proteins. We used the protein localization data from a BioID proximity map of HEK293 proteome, also named [163]humancellmap.org ([164]Go et al., 2021), which divided cells into 20 compartments and 4145 host proteins were mapped into individual compartments. We mapped the AP-interactome and PL-interactome of viral proteins to these 20 compartments and obtained their probable subcellular localizations ([165]Fig. 3B and C). We found that the most significantly enriched localization of PL-interactome was the nuclear outer membrane-ER membrane network ([166]Fig. 
3C), while that of the AP-interactome was the ER lumen ([167]Fig. 3B). We found that the interactomes of NSP4, NSP6, M, ORF7A, and ORF7B were significantly enriched for the nuclear outer membrane-ER membrane network ([168]Fig. 3C), consistent with M being a well-known membrane protein and with recent findings that NSP4 and NSP6 are membrane proteins of double-membrane vesicles (DMVs) ([169]Ricciardi et al., 2022). The results suggest that ORF7A and ORF7B most likely localize at the membrane, especially ORF7A, which contains a potential transmembrane domain ([170]Samavarchi-Tehrani et al., 2020). In addition, we found that the interactomes of ORF8 and (weakly) ORF10 were specifically enriched for the GO term ER lumen ([171]Fig. 3B and C). These results are consistent with ORF8 containing a potential ER signal peptide ([172]Samavarchi-Tehrani et al., 2020), and with the recent observation that ORF8 and ORF10 colocalize with the ER ([173]Liu et al., 2021). To validate the localization data, we investigated the cellular localization of ORF7A, ORF7B, ORF8, and ORF10. We constructed mCherry-tagged fusion proteins, all of which were expressed at close to their theoretical sizes, except that ORF8 showed an additional band below its full-length product ([174]Fig. 3D left and [175]Figs. S3B–C left). We used eGFP-SEC61B and eGFP-LBR to mark the endoplasmic reticulum (ER) and nuclear envelope, respectively ([176]Fig. 3D right and [177]Figs. S3B–C right). We found that ORF7A, ORF7B, and ORF8 co-localized with SEC61B and LBR, while ORF10 was more diffuse in the cytosol and partially colocalized with SEC61B ([178]Fig. 3E–H and [179]Figs. S4A–D). These results suggested that ORF7A, ORF7B, and ORF8 could localize to the ER-nuclear membrane network, confirming our localization analysis based on interactomes. However, limited by the imaging resolution, it was hard to distinguish the ER membrane from the ER lumen by confocal microscopy.
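The compartment enrichment used earlier in this subsection can be illustrated with a one-sided hypergeometric test. The text does not state which statistic was used, so the choice of test, the function name `hypergeom_sf`, and every count below except the 4145 mapped host proteins are assumptions made for illustration only.

```python
from math import comb

def hypergeom_sf(k, M, n, N):
    """P(X >= k) when N proteins are drawn from M, n of which lie in the compartment."""
    upper = min(n, N)
    favourable = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1))
    return favourable / comb(M, N)

M = 4145   # host proteins mapped to compartments (from the text)
n = 300    # proteins annotated to one compartment (hypothetical)
N = 20     # interactors of one viral protein that are in the map (hypothetical)
k = 8      # of those, interactors annotated to that compartment (hypothetical)

expected = N * n / M
p_value = hypergeom_sf(k, M, n, N)
print(f"expected {expected:.2f}, observed {k}, p = {p_value:.2e}")
```

In practice the test would be repeated for each viral protein and each of the 20 compartments, followed by a multiple-testing correction; that step is omitted here.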
Among SARS-CoV-2 proteins, ORF7B and ORF10 have not been well characterized, probably because their sizes (43 and 38 amino acids, respectively) are too small for most techniques. Understanding how ORF7B and ORF10 are localized to the corresponding subcellular compartments would help to investigate their functions in the viral lifecycle.
2.4. A high-confidence viral protein network and its functional characterization
We further built a high-confidence viral protein-host protein interaction network from the AP-interactome and PL-interactome by requiring each interaction to satisfy at least one of the following criteria: 1) interactions between viral proteins and host complexes in AP-interactome or PL-interactome; 2) interactions between viral proteins and host proteins with the same protein domain in AP-interactome ([180]Fig. S3A); 3) interactions with 3 or more occurrences in AP-interactome; 4) interactions with 3 or more occurrences in PL-interactome and also appearing in AP-interactome. There are 261 interactions in this network, involving 19 viral proteins and 152 host proteins ([181]Fig. 4).
Fig. 4. A compendium of high-confidence protein-protein interactions between SARS-CoV-2 and host. Hexagon yellow nodes represent viral proteins. Circular grey nodes represent host proteins. Edges indicate virus-host protein interactions supported by AP-MS (blue) or PL-MS (pink). Protein complexes (and cellular components) according to GO-term analysis and proteins sharing an identical domain are enclosed within rounded quadrilaterals. Edge thickness represents the strength of interactions.
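The four selection criteria for this network can be written as a single predicate. The sketch below is one reading of those criteria and not the authors' code: the `Interaction` record, its field names, and the toy records are hypothetical, and the complex and domain flags are assumed to have been computed beforehand.

```python
from dataclasses import dataclass

AP_MIN, PL_MIN = 2, 3  # occurrence thresholds defining the AP- and PL-interactomes

@dataclass(frozen=True)
class Interaction:
    viral: str
    host: str
    ap_count: int          # occurrences across AP-MS datasets
    pl_count: int          # occurrences across PL-MS datasets
    in_host_complex: bool  # host protein is part of a host complex bound by the viral protein
    shares_domain: bool    # host protein carries a domain recurrently bound by the viral protein

def in_ap(i):
    return i.ap_count >= AP_MIN

def in_pl(i):
    return i.pl_count >= PL_MIN

def is_high_confidence(i):
    """True if the interaction meets at least one of the four criteria."""
    return (
        (i.in_host_complex and (in_ap(i) or in_pl(i)))  # criterion 1
        or (i.shares_domain and in_ap(i))               # criterion 2
        or i.ap_count >= 3                              # criterion 3
        or (in_pl(i) and in_ap(i))                      # criterion 4
    )

# Toy records with placeholder names.
a = Interaction("V1", "H1", 2, 0, False, False)  # in AP-interactome only: rejected
b = Interaction("V1", "H2", 3, 0, False, False)  # criterion 3
c = Interaction("V2", "H3", 2, 3, False, False)  # criterion 4
d = Interaction("V2", "H4", 0, 5, True, False)   # criterion 1
e = Interaction("V3", "H5", 0, 5, False, True)   # domain match but not in AP: rejected

for rec in (a, b, c, d, e):
    print(rec.viral, rec.host, is_high_confidence(rec))
```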
In this high-confidence network, some interactions have evidence from crystal structures, such as NSP1-alpha DNA polymerase:primase complex ([184]Kilkenny et al., 2022) and ORF9B-TOMM70 complex ([185]Gao et al., 2021), while some interactions are supported by biochemical experiments, such as N-Stress granule ([186]Kruse et al., 2021) and ORF3A-HOPS complex ([187]Miao et al., 2021). Notably, there are many interesting interactions that have not yet been studied in depth, including the NSP6-ATP synthase complex, NSP6-EMC, ORF7B-SNARE complex, and NSP16-CCC complex. The ATP synthase complex, also named Complex V, is the fifth component of the oxidative phosphorylation chain and catalyzes the phosphorylation of ADP to generate ATP ([188]Neupane et al., 2019). The ER membrane complex (EMC) is an insertase and directly mediates the insertion of transmembrane domains into the ER membrane ([189]Pleiner et al., 2020). The soluble N-ethylmaleimide-sensitive factor attachment protein receptor (SNARE) complex mediates membrane recognition and fusion ([190]Yoon & Munson, 2018), and the COMMD/CCDC22/CCDC93 (CCC) complex mediates membrane protein recycling in cargo transport ([191]Singla et al., 2019). These host complexes interacting with viral proteins may play important roles in SARS-CoV-2 infection and replication. Moreover, we found that the viral proteins in this high-confidence network mostly interact with host proteins in two biological processes: protein processing in ER and vesicle-associated process ([192]Supplementary Table 2). Furthermore, the KEGG analysis of AP-interactome and PL-interactome also revealed that viral proteins most significantly interact with host proteins related to protein processing in ER ([193]Figs. S5A–B). Interestingly, we found that some host proteins functioning in protein processing in ER might directly interact with multiple viral proteins ([194]Fig. S5C).
These results suggest that protein processing in ER could be the major pathway targeted by viral proteins, as disruption of this pathway can induce ER stress and the unfolded protein response, which often occurs after virus infection.
2.5. An integrated viral RNA-host protein interaction network
For viral RNA-host protein interactions, we curated the processed RNA-interactome data from three ChIRP-MS assays, two RAP-MS assays, and one vRIC-MS assay ([195]Table 1). The counts of interactions identified in the assays mostly ranged from 100 to 150 ([196]Fig. 5A), and many interactions overlapped among these assays ([197]Fig. S6A). Requiring 3 or more occurrences, we identified 90 reliable RNA-protein interactions, a number close to the minimum (104) of the original datasets ([198]Fig. 5B). We performed protein domain analysis for the 90 host proteins and found that most proteins contain domains related to RNA binding, among which the RRM domain is the most enriched, being found in 41 proteins ([199]Fig. S6B). We constructed a network of the SARS-CoV-2 RNA-interactome together with known protein-protein interactions and revealed several complexes of those proteins ([200]Fig. 5C and [201]Supplementary Table 1).
Fig. 5. Overview of SARS-CoV-2 RNA-host protein interactions. (A) The number of viral RNA-host protein interactions from different studies. (B) Pie chart showing the proportion of viral RNA-host protein interactions from different studies. (C) A compendium of SARS-CoV-2 RNA-interactome. Central nodes represent SARS-CoV-2 RNAs. Ellipses represent host proteins. Edges represent viral RNA-protein interactions (pink) and protein-protein interactions (blue). Edge thickness indicates the affinity of interactions. Protein complexes (and cellular components) according to GO-term analysis are enclosed within sectors. (D) The dynamics of SARS-CoV-2 RNA-interactomes at different time points after viral infection.
(E) Venn plot showing the overlap among the viral protein N-interactome, SG&PB proteome, and SARS-CoV-2 RNA-interactome. (F) The enrichment ratio of SG proteins (G3BP1, IGF2BP1, and MOV10)-bound NSP12 and N RNAs in SARS-CoV-2 infected cells using RNA immunoprecipitation followed by qRT-PCR. The relative RNA levels were normalized with ACTB RNA and the enrichment ratios were normalized with IgG samples. (G) Co-immunoprecipitation between viral N protein and the SG marker proteins (G3BP1, IGF2BP1, and MOV10) in SARS-CoV-2 infected cells. (H) A competitive model of SARS-CoV-2 RNA and N protein for the assembly of stress granule. We observed that SARS-CoV-2 positive-strand RNA(s) interacted with proteins in the ribosomal subunits, eIF4F complex, and 48S preinitiation complex, which are involved in translation initiation or active translation. Interestingly, SARS-CoV-2 RNA(s) interact with RNA stability-related proteins, like CRD-mediated stability complex and RNA-induced silencing complex (RISC). Notably, the RNA(s) most significantly interact with stress granule proteins, suggesting that SARS-CoV-2 RNA may be localized in stress granule (SGs), which are cytosolic subcellular compartments where translation-stalled RNAs are localized ([204]Youn et al., 2019). In addition, the viral RNA-interacted proteins include PP1 phosphatase, hnRNPs, nuclear speckle proteins, and other RNA metabolism-related proteins. We further analyzed the overlapped proteins between SARS-CoV-2 RNA-interactome and protein-interactome, and found that 16 host proteins interacted with both viral RNAs and proteins ([205]Fig. S6C). We analyzed the dynamic changes of SARS-CoV-2 RNA-interactome at different time points after virus infection using the time-course data from Flynn et al. including ChIRP-MS data at 24 h p.i and 48 h p.i in Huh7 and VeroE6 cells, respectively. We found that viral RNA interacting proteins at 24 h p.i were almost captured at 48 h p.i in both Huh7 and VeroE6 cells ([206]Fig. 
5D). There were also more interacting proteins at 48 h p.i. than at 24 h p.i., especially in Huh7 cells, where over half of the interacting proteins at 48 h p.i were not observed at 24 h p.i ([207]Fig. 5D). These results indicated that the SARS-CoV-2 RNA-interactome increased along with infection time, and there were many same interactions taking place at different stages and in different cell lines, suggesting their potential functionalities in viral infection. It was reported that the N protein could inhibit the stress granule assembly by competitively binding to G3BP1 and G3BP2 ([208]Zheng et al., 2021); and intriguingly, under non-viral conditions, transfection of SARS-CoV-2 RNA alone induced stress granule assembly ([209]Zheng et al., 2021). We thus further analyzed another more complete set of SG & P-body (PB)-related proteins (SG&PB) identified recently by in vivo BioID mediated proximity-mapping ([210]Youn et al., 2018) and data curation ([211]Youn et al., 2019). We found that N protein is the most significant viral protein interacting with SG&PB ([212]Figs. S6D–E). The N protein interacts with 20 SG&PB proteins, accounting for 80% (20/25) of the N-interactome ([213]Fig. 5E bottom and [214]Supplementary Table 3). Especially, 18 of the 20 proteins are core SG&PB proteins ([215]Fig. 5E top and [216]Supplementary Table 3), suggesting that N primarily interacts with the core proteins and supporting that N could disrupt SG assembly in the recent study. The SARS-CoV-2 RNA interacted with 55 SG&PB proteins, accounting for 61% (55/90) of the RNA-interactome ([217]Fig. 5E bottom) and 28 of them were core proteins ([218]Fig. 5E top). Specifically, both N and viral RNA interacted with G3BP1 and G3BP2, markers of stress granule rather than P-body. To verify whether N and viral RNA interacted with SG under SARS-CoV-2 infection, we used G3BP1, IGF2BP1, and MOV10 as SG markers to perform RNA Immuno-precipitation (RIP) and Co-IP experiments ([219]Fig. 5E). 
We found that under SARS-CoV-2 infection, G3BP1, IGF2BP1, and MOV10 all interacted with viral RNAs including NSP12 and N RNAs ([220]Fig. 5F), and the three SG core proteins also all interacted with the viral N protein ([221]Fig. 5G and [222]Figs. S6F–G). Especially, G3BP1, an essential factor in mediating SG formation, most strongly interacts with SARS-CoV-2 RNA. These results further supported that viral RNA could localize into stress granules under SARS-CoV-2 infection, based on which we proposed a model that SARS-CoV-2 RNA induced SG assembly by co-binding with multiple SG proteins resulted in stalled viral RNA translation; however, N protein could disrupt SG assembly through competitively binding with SG core proteins to ensure viral RNA active translation ([223]Fig. 5H). 2.6. Interactome-based identification of proviral/antiviral factors and associated diseases The virus-interacting host proteins identified from AP-interactome, PL-interactome, and RNA-interactome showed ubiquitous expression and no strong tissue specificity ([224]Figs. S7A–C). Interestingly, compared to the non-interactome control proteins, the interactome proteins exhibited more residues to be phosphorylated and ubiquitinated ([225]Fig. S7D). To assess the effect of the interactome proteins on viral lifecycle, we used available genome-wide CRISPR screening data on SARS-CoV-2. Integrating the Z-scores or fold-change values in 8 datasets from 6 studies ([226]Baggen, Persoons, et al., 2021; [227]Biering et al., 2022; [228]Daniloski et al., 2021; [229]Schneider et al., 2021; [230]Wang et al., 2021; [231]Wei et al., 2021), we computed the mean enrichment scores of CRISPR screened genes ([232]Fig. 6A). Using non-interactome proteins as a background control, we identified potential proviral proteins with mean enrichment score >0.44 and antiviral proteins score < −0.54 under 0.05 false discovery rate: 22 proviral vs. 24 antiviral proteins in AP-interactome, 43 vs. 44 in PL-interactome, and 3 vs. 
24 in RNA-interactome ([233]Fig. 6A and B and [234]Supplementary Table 4).
Fig. 6. Functional characteristics of SARS-CoV-2 interactomes. (A) Density map showing the distribution of mean CRISPR screening enrichment scores of host proteins after SARS-CoV-2 infection. Background 1 represents proteins that were identified but removed as background interactions. Background 2 (non-interactome) represents all other human genes that were in neither the background nor the SARS-CoV-2 interactome. Dashed lines indicate the estimated 95% confidence interval of the mean enrichment scores. (B) Heatmap showing the CRISPR screening enrichment scores of host proteins after SARS-CoV-2 infection. The numbers of proviral and antiviral proteins are marked. (C) Network of predicted proviral protein-drug interactions. Circular nodes represent viral proteins (blue), viral RNA (yellow), host proteins (grey), and drugs in DGIdb (red). Edges indicate viral protein/RNA-host protein interactions (blue) and host protein-drug interactions (green). (D) Two subnetworks identified by the network diffusion approach. Circular nodes represent seed nodes (yellow) and candidate nodes (red). The size of nodes indicates the network diffusion score. (E) Disease enrichment analysis of the host proteins in SARS-CoV-2 protein AP-interactome (red), PL-interactome (green), and RNA-interactome (blue).
We next used drug targeting information from the DGIdb database ([237]Cotto et al., 2018) to identify potential drugs that could target the proviral proteins defined above. Interestingly, some proviral proteins do have known drugs ([238]Fig. 6C), including the glucosidase GANAB, a reported proviral factor, and the drug miglustat, which has shown a significant antiviral effect in cultured cells ([239]Casas-Sanchez et al., 2022). This positive result supports the validity of our analysis.
In addition, our results suggest that other candidate drugs targeting the predicted proviral proteins merit further study. We further expanded the interactome network by applying a network diffusion algorithm and identified 44 more host proteins associated with the viral interactome ([240]Supplementary Table 5). Notably, some proteins found by network diffusion have been reported as proviral factors, including RACK1 and EWSR1 ([241]Fig. 6D). RACK1 was found to interact with multiple flavivirus NS1 proteins and showed a proviral effect for flaviviruses and SARS-CoV-2 ([242]Shue et al., 2021). EWSR1 was found to interact with NSP13, promoting its helicase activity, and showed a proviral effect for SARS-CoV-2 ([243]Zeng et al., 2022). These results suggested that using the virus-host interactome, combined with a network diffusion algorithm and known protein-drug data, can be an effective strategy to identify and repurpose antiviral drugs. To understand the molecular features of those proviral and antiviral proteins, we performed Gene Ontology analysis separately for the viral protein-interactome (VP-interactome) and the RNA-interactome. Interestingly, in the VP-interactome, proviral proteins were preferentially associated with COPII-coated vesicle transport ([244]Fig. S8A), and antiviral proteins with mRNA transport ([245]Fig. S8B). In the RNA-interactome, proviral proteins were preferentially associated with tRNA splicing ([246]Fig. S9A), and antiviral proteins with mRNA translation and stability regulation ([247]Fig. S9B). COVID-19 has been shown to cause severe complications, and we asked whether the virus-host interactome could help to explain certain symptoms or diseases associated with COVID-19. We used the gene-disease interactions in the DisGeNET database ([248]Pinero et al., 2017) and counted the frequency of interactome proteins in different diseases.
We found that brain-related diseases were the most significantly enriched, such as spinocerebellar ataxia 17 and cerebellar atrophy ([249]Fig. 6E). According to a recent clinical report, compared with other patients, the brains of COVID-19 patients underwent abnormal structural changes, such as a reduction in grey matter thickness and global brain size ([250]Douaud et al., 2022). We further analyzed the RNA expression profiles of 59 brain-disease-related interacting proteins in multiple human tissues using the GTEx portal ([251]Consortium, 2020). These proteins did not show an obvious brain-specific expression pattern ([252]Fig. S10A); however, their expression levels in brain tissue were significantly higher than those of other interacting proteins ([253]Fig. S10B), suggesting a potential link between SARS-CoV-2 and brain disease. Moreover, a recent study detected SARS-CoV-2 in olfactory brain areas of rhesus macaques at 7 days post infection ([254]Beckman et al., 2022). Taking this information together, we speculate that SARS-CoV-2 might induce brain disease via its interactome after infecting brain tissue. These results suggested the value of using the interactomes to recognize associated diseases or to understand the molecular mechanisms underlying specific complications. Together, we showed that virus-host interactomes could be valuable resources for discovering antiviral targets and drugs, and that systematically mining the virus-host interactomes would help us better understand how SARS-CoV-2 affects host cells. We have created a web server, AIMaP ([255]https://mvip.whu.edu.cn/aimap/), for users to easily explore the atlas of interactions between SARS-CoV-2 macromolecules and host proteins with variable thresholds.
3. Discussion
Our study integrated all available datasets and constructed a comprehensive atlas of interactions between SARS-CoV-2 RNA/proteins and host proteins.
Besides curating a larger number of datasets than similar recently published works ([256]Baggen, Vanstreels, et al., 2021; [257]Haas et al., 2021; [258]Kolinski et al., 2022; [259]Terracciano et al., 2021), we conducted more in-depth and systematic analyses of the molecular interactions. We compared the results from different studies by technique and identified a high-confidence interaction network ([260]Fig. 4). Combining other data such as host protein localization and CRISPR screening, we made a series of findings by purposefully mining the interactome. However, the atlas merits further improvement. On the one hand, the confident interaction map is not yet complete for all viral proteins: our AP-interactome lacked interactions for NSP3, NSP11, and ORF9C, and the PL-interactome lacked information on NSP5, NSP7, NSP10, NSP11, and NSP13. This might be improved by re-analyzing the raw mass spectrometry data with uniform standards when the raw data are available. On the other hand, the interactions need to be validated, for example by mapping the direct interaction sites, although we identified enriched protein domains. Recently developed programs such as AlphaFold2 ([261]Bryant et al., 2022; [262]Jumper et al., 2021) and RoseTTAFold ([263]Baek et al., 2021) might be used to predict the interaction sites based on protein structural modeling. Even when an interaction between a viral protein and a host protein is established, its biological function remains difficult to determine. One straightforward pattern is that the viral protein sequesters the host protein and thereby disturbs its normal physiological function. Our KEGG analysis of the virus-interacting host proteins showed that protein processing in the ER is enriched, and disruption of this pathway would induce ER stress and the unfolded protein response. Whether SARS-CoV-2 induces such stress in COVID-19 patients merits examination.
However, another pattern, in which the interaction promotes the function of the host protein, is also possible. For example, viral proteins preferentially interact with proteins related to vesicle-associated processes; such processes are critical for SARS-CoV-2 and may be hijacked by the virus. Our analysis validated that SARS-CoV-2 RNA and the N protein interacted closely with stress granules (SGs) under SARS-CoV-2 infection, and provided candidate proteins that may play important roles in this process ([264]Supplementary Table 3). Further studies are needed to investigate how the RNA-interactome mediates the localization of SARS-CoV-2 RNA into SGs and how the N-interactome disassembles SGs. In analyzing the localization of viral proteins, the host protein localization data were generated in HEK293 cells, while some of the interactions in the interactome were identified in other cell lines. As the locations of some host proteins might vary between cell lines, and SARS-CoV-2 may hijack proviral proteins from their usual compartments, the predicted localizations of viral proteins should be treated as hypotheses and inspected carefully or confirmed experimentally. During the revision of this manuscript, we noticed a new research article on the SARS-CoV-2 protein-interactome identified by AP-MS and high-throughput yeast two-hybrid (Y2H) screening, published in Nat Biotechnol on Oct. 10, 2022 ([265]Zhou et al., 2022). We curated it and integrated the protein-protein interactions into our AIMaP database to expand the SARS-CoV-2-human interactome network. We compared this viral protein-interactome with our integrated interactome from AP-MS ([266]Fig. S10C), and presented the overlapping proteins in [267]Fig. S10D. We will continue to update the AIMaP database as new relevant data become available. We combined the atlas of virus-host molecular interactions with previous CRISPR screening results of SARS-CoV-2 and identified high-potential antiviral targets ([268]Supplementary Table 4).
However, the CRISPR screens were conducted at the cell level, which might not reflect human physiological status and may introduce biases. Nevertheless, using the interactome with network diffusion, GO-term functional analysis, and structure-based interaction prediction would help to identify and prioritize antiviral targets, and further experiments are necessary to understand their functions and mechanisms in viral infection.
4. Materials and methods
4.1. Fluorescent protein tagging and localization analysis
SEC61B (ER membrane protein) was chosen as an endoplasmic reticulum marker and LBR (nuclear membrane protein) was chosen as a nuclear envelope marker. The marker ORFs were inserted in-frame at the C-terminus of eGFP in the pEGFP-C1 plasmid. The viral ORFs (ORF7A, ORF7B, ORF8, and ORF10) were inserted in-frame at the C-terminus of mCherry in the pmCherry-C1 plasmid. Plasmids encoding the fusion proteins were transfected into HeLa or 293T cells with Lipofectamine 2000 (Invitrogen, 11668019) for 24 h. Protein expression was detected by western blotting with eGFP (Proteintech, 66002-1-Ig) and mCherry (Proteintech, 26765-1-AP) antibodies. For localization analysis, HeLa cells were cultured on coverslips, fixed with 4% PFA, and permeabilized with 0.1% Triton X-100. The cell nucleus was stained with DAPI, and protein localizations were detected by the intrinsic fluorescence of the fluorescent protein tags. Theoretical molecular weights of the fusion proteins are: mCherry-ORF7A (369 amino acids, 41.9 kDa); mCherry-ORF7B (291 amino acids, 33.4 kDa); mCherry-ORF8 (369 amino acids, 42 kDa); mCherry-ORF10 (286 amino acids, 32.6 kDa); eGFP-SEC61B (342 amino acids, 37.6 kDa); and eGFP-LBR (861 amino acids, 98.4 kDa).
4.2. Co-immunoprecipitation (Co-IP) and RNA immuno-precipitation (RIP)
A clinical strain of SARS-CoV-2 (nCoV-2019BetaCoV/Wuhan/WIV04/2019) ([269]Zhou et al., 2020) was used to infect hACE2-293T cells at a multiplicity of infection (MOI) of 0.1 TCID50 unit/cell.
At 24 h p.i., infected cells were scraped from dishes and washed with ice-cold 1× PBS. Cell pellets from one 100 mm dish were resuspended in 2 mL RIP/IP lysis buffer (20 mM Tris-HCl pH 7.5, 150 mM NaCl, 5% NP-40, and freshly added protease inhibitor) and incubated on ice for 30 min. The suspensions were then sonicated for 7 cycles (30 s/30 s) in high-power mode (Bioruptor® Plus sonication device) at 4 °C, and centrifuged at 16,000 g for 30 min. The supernatants were then harvested for Co-IP and RIP assays: 0.3 mL of cell lysate was incubated with 1.5 μg antibody for 2 h at room temperature, and 25 μL washed protein G magnetic beads (Thermo Scientific) were then added. After 1.5 h of incubation at room temperature, the beads were washed 3 times with lysis buffer for 10 min each. For the Co-IP assay, beads were boiled in protein loading buffer, followed by western blotting. For the RIP assay, beads were resuspended in proteinase K buffer (10 mM Tris-HCl pH 7.5, 10 mM EDTA, 0.5% SDS) containing proteinase K. After reaction at 37 °C for 30 min, RNA was extracted with TRIzol™ LS reagent (Invitrogen) and prepared for qRT-PCR. Primers include ACTB (F: tggagaaaatctggcaccacac, R: atcttctcgcggttggccttg); NSP12 (F: agaatagagctcgcaccgta, R: ctcctctagtggcggctatt); and N (F: caatgctgcaatcgtgctac, R: gttgcgactacgtgatgagg). Antibodies include IgG (Proteintech, 10284-1-AP); G3BP1 (Proteintech, 66486-1-Ig); IGF2BP1 (Proteintech, 22803-1-AP); MOV10 (Proteintech, 10370-1-AP); and N (made in Hu Lab). All experiments using infectious SARS-CoV-2 were performed in a biosafety level 3 (BSL-3) laboratory at the Wuhan Institute of Virology, Chinese Academy of Sciences.
4.3. Data resource and curation
We manually retrieved and curated 25 studies on SARS-CoV-2 viral protein/RNA-host protein interactions. All data resources and relevant information (including method, cell line, tags, baits, and datasets) are summarized in [270]Table 1.
As the interactome data were from various sources, we mapped the viral and host gene names using an NCBI resource ([271]ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene_info.gz, March 2022) to unify the names. In addition, we manually collected and downloaded CRISPR screening data associated with SARS-CoV-2. The interactome of SARS-CoV-2 proteins in BioGRID ([272]https://thebiogrid.org/, BIOGRID-4.4.202) was downloaded and extracted ([273]Oughtred et al., 2021). The post-translational modification (PTM) datasets, including phosphorylation, acetylation, ubiquitination, methylation, sumoylation, and O-glycosylation, were downloaded from the PhosphoSitePlus database at [274]https://www.phosphosite.org/homeAction ([275]Hornbeck et al., 2015). We extracted the localization information of host proteins from the Cell Map database at [276]https://humancellmap.org/ ([277]Go et al., 2021). Human protein sequences were downloaded from the UniProt database at [278]https://www.uniprot.org/ ([279]Uniprot, 2021). The gene expression profiles across various tissues were downloaded from the GTEx database at [280]https://www.gtexportal.org ([281]Consortium, 2020). To facilitate the subsequent integrated analysis, we likewise unified the gene names from these data sources using the above NCBI resource.
4.4. Weight of viral protein/RNA-host protein interactions
To extract credible interactions, we computed a weight for each protein/RNA-protein interaction. The weight represents the number of different study groups that identified the interaction. The viral protein-host protein interactions identified by at least two (AP-MS) and three (PL-MS) studies were defined as the AP-interactome and PL-interactome, respectively. The SARS-CoV-2 RNA-host protein interactions identified by at least three studies were retained as the RNA-interactome.
4.5.
Gene Ontology (GO) and KEGG pathway enrichment analysis
The interacting proteins of all viral proteins obtained from AP-MS and PL-MS were tested for enrichment of GO-terms (including biological process, cellular component, and molecular function) and KEGG pathways using the enricher function of gprofiler2 package v0.2.1 ([282]Kolberg et al., 2020) in R with default parameters. Significant GO-terms and KEGG pathways were identified at a p-value cutoff of 0.05. The enrichments of the high-confidence host proteins interacting with SARS-CoV-2 RNA were similarly tested for the three GO categories and the KEGG pathways.
4.6. Protein domain and localization analysis
To understand the binding domains of viral proteins, we performed domain analysis for the viral protein AP-interactome. We first extracted the host protein sequences from the UniProt database. Then, we performed domain annotation on these host proteins using InterProScan software v5.55-88.0 ([283]Jones et al., 2014). Finally, we counted the number of host proteins sharing the same domains. Only domains shared by at least two host proteins were used for further analysis. To understand the subcellular localizations of viral proteins, we counted the number of interacting host proteins that localized to the same subcellular compartments defined in the Cell Map database ([284]Go et al., 2021).
4.7. Disease ontology enrichment analysis
We performed disease enrichment of the viral protein/RNA-interactome using the enricher function of the DOSE package v3.18.3 ([285]Yu et al., 2015) in R with default parameters, based on the DisGeNET database at [286]http://www.disgenet.org ([287]Pinero et al., 2017). Only enriched disease ontology terms with p-value < 0.05 were used.
4.8. Drug-protein interactions
Host proteins in the viral protein/RNA-interactome were examined for known chemical compound interactions through the Drug-Gene Interaction database ([288]Cotto et al., 2018) (DGIdb at [289]https://dgidb.org/, downloaded in Feb 2022).
4.9.
Network diffusion analysis
To systematically identify other host proteins associated with viral RNA/proteins beyond our identified interactome, we applied the network diffusion method (based on random walk) as implemented in NetCore software ([290]Barel & Herwig, 2020). Briefly, for network diffusion with a restart procedure, we used the PPI network extracted from the human reference protein interactome mapping project (HuRI) at [291]http://www.interactome-atlas.org/ ([292]Luck et al., 2020). This network contains approximately 53,000 binary protein-protein interactions and serves as a reference map of the human protein interactome. The SARS-CoV-2 viral protein-interactome was used as the source seed nodes, for which the signal diffusion weights were set to 1 (and to 0 for all other nodes). To correct for node degree bias in the PPI network, we used the node coreness method to normalize the random walk matrix. In addition, to reduce the number of potentially false predictions, the restart probability was set to 0.8. The sub-networks containing seed nodes were identified using NetCore's semi-supervised approach.
4.10. Network generation and visualization
The PPI networks associated with viral protein/RNA were visualized in Cytoscape v3.8.2 ([293]Shannon et al., 2003). Host physical interactions were extracted from the STRING database v11.5 at [294]https://cn.string-db.org/ ([295]Szklarczyk et al., 2021). The groupings by protein domain, localization, protein complex and biological processes were derived from the analyses of protein domains, localization, and GO-term enrichment, respectively.
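For readers unfamiliar with the restart-based diffusion described in Section 4.9, the following minimal Python sketch illustrates the general idea of a random walk with restart on a small network. It is an illustration under stated assumptions, not the NetCore implementation: it does not call NetCore, it normalizes by node degree rather than by node coreness, and the toy edges and node names (SEED_A, SEED_B, X, Y, Z) are hypothetical placeholders rather than data from this study. Only the seed weights of 1 and the restart probability of 0.8 are taken from the text.

```python
from collections import defaultdict


def random_walk_with_restart(edges, seeds, restart=0.8, tol=1e-10, max_iter=1000):
    """Diffuse seed signal over an undirected graph and return node -> score.

    Assumption: degree normalization is used here for simplicity; the study
    describes coreness-based normalization as implemented in NetCore.
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    nodes = sorted(neighbors)

    # Initial signal: weight 1 on seed nodes, 0 elsewhere, scaled to sum to 1.
    p0 = {n: (1.0 if n in seeds else 0.0) for n in nodes}
    total = sum(p0.values())
    if total == 0:
        raise ValueError("no seed node is present in the network")
    p0 = {n: v / total for n, v in p0.items()}

    p = dict(p0)
    for _ in range(max_iter):
        nxt = {}
        for n in nodes:
            # Each neighbor m passes an equal share of its score to n.
            inflow = sum(p[m] / len(neighbors[m]) for m in neighbors[n])
            # With probability `restart` the walker jumps back to the seeds.
            nxt[n] = (1.0 - restart) * inflow + restart * p0[n]
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Hypothetical toy network: two seeds share neighbor X; Y and Z lie farther away.
toy_edges = [("SEED_A", "X"), ("SEED_B", "X"), ("X", "Y"), ("Y", "Z")]
toy_seeds = {"SEED_A", "SEED_B"}

scores = random_walk_with_restart(toy_edges, toy_seeds, restart=0.8)

# Rank non-seed nodes by diffusion score (highest first) as candidate nodes.
candidates = sorted(
    (n for n in scores if n not in toy_seeds),
    key=lambda n: scores[n],
    reverse=True,
)
print(candidates)
```

In this toy example, the non-seed node adjacent to both seeds receives the highest diffusion score, and scores fall off with distance from the seeds. A high restart probability such as 0.8 keeps most of the signal close to the seed nodes, which is consistent with the stated aim of limiting potentially false predictions.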
Author contributions Guangnan Li: Conceptualization, Methodology, Investigation, Writing - Original Draft, Visualization; Zhidong Tang: Conceptualization, Methodology, Software, Formal analysis, Data Curation; Weiliang Fan: Software, Resources, Visualization; Xi Wang: Investigation; Li Huang: Validation; Yu Jia: Validation; Manli Wang: Resources; Zhihong Hu: Writing - Review & Editing, Resources, Funding acquisition; Yu Zhou: Conceptualization, Writing - Review & Editing, Supervision, Project administration, Funding acquisition. Declaration of competing interest The authors declare that they have no conflict of interest. Acknowledgment