Abstract

   The proteins and RNAs of viruses extensively interact with host
   proteins after infection. We collected and reanalyzed all available
   datasets of protein-protein and RNA-protein interactions related to
   SARS-CoV-2. We investigated the reproducibility of those interactions
   and applied strict filters to identify high-confidence interactions. We
   systematically analyzed the interaction network and identified
   preferred subcellular localizations of viral proteins, some of which,
   such as ORF8 in the ER and ORF7A/B in the ER membrane, were validated
   using dual fluorescence imaging. Moreover, we showed that viral proteins
   frequently interact with host machinery related to protein processing
   in ER and vesicle-associated processes. Integrating the protein- and
   RNA-interactomes, we found that SARS-CoV-2 RNA and its N protein
   closely interacted with stress granules including 40 core factors, of
   which we specifically validated G3BP1, IGF2BP1, and MOV10 using RIP and
   Co-IP assays. By integrating CRISPR screening results, we further
   identified 86 antiviral and 62 proviral factors and associated drugs.
   Using network diffusion, we found 44 additional interacting proteins,
   including two previously validated proviral factors. Furthermore, we
   showed that this atlas could be applied to identify the complications
   associated with COVID-19. All data are available in the AIMaP database
   ([39]https://mvip.whu.edu.cn/aimap/) for users to easily explore the
   interaction map.

   Keywords: SARS-CoV-2, Protein-interactome, RNA-Interactome, Drug
   repurposing, Protein localization

Graphical abstract

   [40]Image 1

Highlights

     * An integrated atlas of interactions between SARS-CoV-2 RNA/proteins
       and host proteins.
     * SARS-CoV-2 may disrupt protein processing in ER through
       protein-protein interactions.
     * The core stress granule factors are the main targets of SARS-CoV-2
       RNA and protein N.
     * The atlas could be used to understand viral protein localizations
       and complications.

1. Introduction

   The global COVID-19 pandemic caused by the severe acute respiratory
   syndrome coronavirus 2 (SARS-CoV-2) virus is still ongoing and has led
   to more than 500 million infections and more than 6 million deaths
   since its first report more than two years ago ([42]Wu et al., 2020),
   posing a series of threats to human security. Currently, approved
   antiviral drugs targeting SARS-CoV-2 are still scarce ([43]Edwards
   et al., 2022), and vaccines remain the primary option for fighting
   against the pandemic. However, the emergence of highly transmissible
   and highly pathogenic SARS-CoV-2 variants, such as the Delta and Omicron
   variants, continuously challenges the efficacy of vaccines ([44]Araf
   et al., 2022; [45]Harvey et al., 2021; [46]Li et al., 2022;
   [47]Mlcochova et al., 2021). Drug repurposing is an effective strategy
   to quickly identify drugs to fight against SARS-CoV-2. However, it
   requires knowing targetable host proviral proteins, especially those
   directly interacting with viral macromolecules (proteins and RNAs).

   Generally, there are three strategies to identify the host proteins
   involved in interactions between SARS-CoV-2 and host cells. The
   first strategy uses CRISPR screening to identify host proteins that
   are functionally essential for SARS-CoV-2 infection ([48]Baggen,
   Persoons, et al., 2021; [49]Biering et al., 2022; [50]Daniloski et al.,
   2021; [51]Flynn et al., 2021; [52]Hoffmann et al., 2021; [53]Schneider
   et al., 2021; [54]Wang et al., 2021; [55]Wei et al., 2021; [56]Zhu
   et al., 2021), based on effectively repressing or activating a specific
   host gene ([57]Konermann et al., 2015; [58]Sanjana et al., 2014;
   [59]Shalem et al., 2014). The second strategy is to detect
   differentially expressed genes at RNA or protein level after SARS-CoV-2
   infection using high-throughput RNA sequencing or mass spectrometry
   ([60]Bojkova et al., 2020; [61]Stukalov et al., 2021). The third
   strategy is identifying host proteins that physically interact with
   viral proteins and RNAs using affinity purification (AP), proximity
   labeling (PL), or RNA antisense purification coupled with mass
   spectrometry. The proteins identified from these methods are candidates
   that merit further examination of antiviral effects.

   In detail, affinity-purification coupled mass spectrometry (AP-MS) can
   identify the interacting proteins (protein-interactome) of a specific
   protein fused to an affinity tag, and has recently been widely used for
   different SARS-CoV-2 proteins ([62]Chen, Wang, et al., 2021;
   [63]Davies et al., 2020; [64]Gordon et al., 2020a, [65]2020b; [66]Jiang
   et al., 2020; [67]Kruse et al., 2021; [68]Li et al., 2021; [69]Liu
   et al., 2021; [70]Nabeel-Shah et al., 2022; [71]Shin et al., 2020;
   [72]Slavin et al., 2021; [73]Stukalov et al., 2021).
   Proximity-labeling coupled mass spectrometry (PL-MS) has also been
   extensively adopted to identify viral protein-interactomes by fusing
   the bait to a biotin ligase ([74]Chen, Wang, et al., 2021; [75]Laurent
   et al., 2020; [76]Liu et al., 2021; [77]Meyers et al., 2021;
   [78]Samavarchi-Tehrani et al., 2020; [79]St-Germain et al., 2020;
   [80]Zhang, Shang, et al., 2022), such as BioID ([81]Roux et al., 2012)
   and engineered BioID including TurboID and miniTurbo ([82]Branon
   et al., 2018). These biotin ligases catalyze biotinylation of nearby
   proteins within a radius of about 10 nm ([83]Kim et al., 2014), and
   the biotinylated proteins can then be captured with streptavidin beads.
   It has been unclear whether these methods detect similar interactions.

   For viral RNAs, RNA antisense purification coupled mass spectrometry
   (RAP-MS) ([84]Lee et al., 2021; [85]Schmidt et al., 2021) and
   comprehensive identification of RNA binding proteins by mass
   spectrometry (ChIRP-MS) ([86]Flynn et al., 2021; [87]Zhang, Huang,
   et al., 2022) can identify host and viral proteins binding to viral
   RNAs (RNA-interactome). Both methods crosslink cells to fix in vivo
   RNA-protein interactions (RAP-MS with 254 nm UV light and ChIRP-MS with
   3% PFA), and then purify the crosslinked proteins using antisense
   probes against viral RNAs. Recently, two other methods,
   viral cross-linking and solid-phase purification (VIR-CLASP) ([88]Kim
   et al., 2020) and viral RNA interactome capture (vRIC-MS) ([89]Kamel
   et al., 2021), were developed to specifically explore viral
   RNA-interactome, using 365 nm UV light to crosslink 4-Thiouridine
   (4sU)-labeled RNA with proteins. Notably, VIR-CLASP can capture the
   RNA-interactome in the earliest period of infection, whereas the other
   methods efficiently capture the RNA-interactome only after the virus
   has been massively amplified in the cells. All of these methods
   except VIR-CLASP have been used to detect the SARS-CoV-2 RNA-interactome.
   However, an integrated map of these interactions from all previous
   studies is lacking.

   Recently, several works summarized a few studies related to the
   interactomes ([90]Baggen, Vanstreels, et al., 2021; [91]Haas et al.,
   2021; [92]Kolinski et al., 2022; [93]Terracciano et al., 2021).
   However, none of them analyzed these data in depth. Here, we
   collected and systematically investigated all available datasets on the
   protein-interactome and RNA-interactome of SARS-CoV-2 from published
   studies. We identified a high-confidence interaction network of
   SARS-CoV-2 macromolecules and host proteins. We further characterized
   the interactions and proteins by integrating CRISPR screening and
   protein localization data, and we experimentally validated a few
   predictions from our analyses using multiple assays. This atlas of
   interactions would be a valuable resource in providing clues to
   understand the molecular mechanisms of viral infection and for
   developing antiviral strategies and drug repurposing.

2. Results

2.1. Strategies and data sources for the SARS-CoV-2 interactomes

   We searched the PubMed and ProteomeXchange databases and collected all
   data published by January 2022 on protein-interactome ([94]Chen, Wang,
   et al., 2021; [95]Davies et al., 2020; [96]Gordon et al., 2020a,
   [97]2020b; [98]Jiang et al., 2020; [99]Kruse et al., 2021; [100]Laurent
   et al., 2020; [101]Li et al., 2021; [102]Liu et al., 2021; [103]Meyers
   et al., 2021; [104]Nabeel-Shah et al., 2022; [105]Samavarchi-Tehrani et
   al., 2020; [106]Shin et al., 2020; [107]Slavin et al., 2021;
   [108]St-Germain et al., 2020; [109]Stukalov et al., 2021; [110]Zhang,
   Shang, et al., 2022) and RNA-interactome ([111]Flynn et al., 2021;
   [112]Kamel et al., 2021; [113]Lee et al., 2021; [114]Schmidt et al.,
   2021; [115]Zhang, Huang, et al., 2022) for SARS-CoV-2 using different
   methods including AP-MS, PL-MS, RAP-MS, vRIC-MS, and ChIRP-MS. By
   integrating these data, we aimed to build an atlas of interactions
   between SARS-CoV-2 macromolecules and host proteins, a vital component
   of the knowledge map of virus-host interactions ([116]Fig. 1A). We also
   curated all detailed meta-information about cell lines, tag types, bait
   proteins, and dataset identifiers for all the collected datasets
   ([117]Table 1 and [118]Fig. 1B).

Fig. 1.

   Overview of the approaches for systematically analyzing viral protein
   and RNA interactomes.

   (A) Schematic of strategies for exploring protein-protein interaction
   and RNA-protein interaction between SARS-CoV-2 and host. (B)
   Matrix-like plot showing the viral protein baits information in each
   study of AP-MS and PL-MS. The datasets with processed results were
   labeled with red circles on the right.

Table 1.

   Description of the datasets.
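   ID | Interactome     | Method   | Cell line | Strategy       | Bait        | Dataset
   ---+-----------------+----------+-----------+----------------+-------------+--------------------------------------
   1  | Protein-protein | AP-MS    | A549      | His tag        | N           | PXD023989
   2  | Protein-protein | AP-MS    | A549      | GFP tag        | NSP3        | PXD018983
   3  | Protein-protein | AP-MS    | A549      | HA tag         | 24 proteins | PXD020222
   4  | Protein-protein | AP-MS    | HEK293    | Strep tag      | NSP1-2, N   | PXD023487
   5  | Protein-protein | AP-MS    | HEK293    | Strep tag      | 26 proteins | PXD018117
   6  | Protein-protein | AP-MS    | HEK293    | Strep tag      | NSP16       | PXD021588
   7  | Protein-protein | AP-MS    | HEK293    | Strep tag      | 29 proteins | MSV000087035
   8  | Protein-protein | AP-MS    | HEK293    | FLAG tag       | NSP2, NSP4  | PXD022017
   9  | Protein-protein | AP-MS    | HEK293    | FLAG tag       | 18 proteins | [121]Li et al. (2021)
   10 | Protein-protein | AP-MS    | HEK293    | Biotin tag     | ORF9B       | PXD019803
   11 | Protein-protein | AP-MS    | HEK293    | GFP tag        | 27 proteins | MSV000086704
   12 | Protein-protein | AP-MS    | HEK293    | SFB tag        | 29 proteins | PXD023209
   13 | Protein-protein | AP-MS    | HeLa      | YFP tag        | N           | PXD025410
   14 | Protein-protein | PL-MS    | A549      | miniTurbo      | 27 proteins | [122]Samavarchi-Tehrani et al. (2020)
   15 | Protein-protein | PL-MS    | HEK293    | BioID          | 14 proteins | MSV000086006
   16 | Protein-protein | PL-MS    | HEK293    | BioID          | 17 proteins | PXD023277
   17 | Protein-protein | PL-MS    | HEK293    | BioID          | 28 proteins | [123]Laurent et al. (2020)
   18 | Protein-protein | PL-MS    | HEK293    | BioID          | 29 proteins | PXD023209
   19 | Protein-protein | PL-MS    | HEK293    | BioID          | 29 proteins | MSV000087035
   20 | Protein-protein | PL-MS    | HEK293    | TurboID        | 27 proteins | PXD022086
   21 | RNA-protein     | RAP-MS   | VeroE6    | IVT RNA probes | +ssRNA      | PXD024808
   22 | RNA-protein     | RAP-MS   | Huh7      | 67 DNA probes  | +ssRNA      | MSV000085734
   23 | RNA-protein     | vRIC-MS  | Calu-3    | (dT)25 probe   | 4sU +ssRNA  | PXD023418
   24 | RNA-protein     | ChIRP-MS | VeroE6    | 108 DNA probes | +ssRNA      | [124]Flynn et al. (2021)
   25 | RNA-protein     | ChIRP-MS | Huh7.5    | 108 DNA probes | +ssRNA      | [125]Flynn et al. (2021)
   26 | RNA-protein     | ChIRP-MS | Huh7.5    | 160 DNA probes | +ssRNA      | [126]Zhang, Huang, et al. (2022)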

   AP-MS: affinity-purification coupled mass spectrometry; PL-MS:
   proximity-labeling coupled mass spectrometry; RAP-MS: RNA antisense
   purification coupled mass spectrometry; vRIC-MS: viral RNA interactome
   capture coupled mass spectrometry; ChIRP-MS: comprehensive
   identification of RNA binding proteins by mass spectrometry.

   As shown in [128]Table 1, the protein-interactome of SARS-CoV-2 was
   mainly captured by AP-MS or PL-MS. Most AP-MS assays were conducted
   using viral bait proteins fused with an affinity tag, and seven types of
   affinity tags were commonly used. In addition, Jiang et al. used
   biotinylated viral proteins purified in vitro as baits to capture interacting
   proteins under in vitro incubation conditions ([129]Jiang et al.,
   2020). In PL-MS assays, three types of biotin ligases were mainly used:
   BioID, TurboID, and miniTurbo. SARS-CoV-2 encodes 29 viral proteins and
   most studies were dedicated to exploring all viral proteins
   simultaneously; however, not every viral protein could be successfully
   expressed, and thus not all proteins had data (grey squares in
   [130]Fig. 1B). In addition, some studies also expressed mutated viral
   proteins as baits ([131]Fig. 1B). Notably, two studies explored the
   interactions under stress conditions: IFN-α treatment ([132]Slavin
   et al., 2021) and poly(I:C) treatment ([133]Laurent et al., 2020).

   In the studies to capture SARS-CoV-2 RNA-interactome using RAP-MS,
   ChIRP-MS, and vRIC-MS, only the positive-strand RNAs were targeted as
   baits in all datasets from three host cell types ([134]Table 1),
   probably because the abundance of negative-strand RNAs was much lower
   than that of positive-strand RNAs. In RAP-MS and ChIRP-MS, the
   antisense probes should cover the full viral genome to achieve the
   best capture efficiency; however, the cost of synthesizing a large
   number of biotinylated probes makes this difficult, especially for
   SARS-CoV-2, whose gRNA length reaches 29 kb. To address this
   difficulty, Lee et al. used in vitro transcription to synthesize
   biotinylated tiling probes ([135]Lee et al., 2021).

2.2. Integrated virus-host protein-protein interaction network

   We ultimately acquired processed protein-interactome data from 7 AP-MS
   assays and 7 PL-MS assays. The numbers of detected viral protein-host
   protein interactions varied considerably between studies
   ([136]Fig. 2A). Comparing the presence of each interaction across
   datasets, we found that only a few interactions were captured multiple
   times, whereas most appeared in a single dataset ([137]Fig. S1A). We
   counted the number of occurrences of each protein-protein interaction
   and found that 93.7% of the interactions in AP-MS were unique to one
   dataset, compared to 77% in PL-MS ([138]Fig. 2B and C).

Fig. 2.

   Summary of SARS-CoV-2 protein-host protein interactions.

   (A) The number of viral protein-host protein interactions from
   different studies. (B–C) Pie charts showing the proportion of virus
   protein-host protein interactions from different studies, which were
   identified by AP-MS (B) and PL-MS (C) experimental strategy,
   respectively. (D) The network of virus protein-host protein
   interactions identified by AP-MS (AP-interactome). Hexagon yellow nodes
   represent viral proteins. Circular blue nodes represent host proteins.
   Edges indicate virus-host protein interactions. Edge thickness
   represents the credibility level of interactions.

   We classified the interactions by their occurrences in AP-MS and PL-MS,
   respectively. For the 9 interactions with 4 or more occurrences in
   AP-MS ([141]Fig. 2B), 6 interactions had evidence from the literature
   (ORF3A-CLCC1 ([142]Chen, Wang, et al., 2021), ORF3A-VPS39 ([143]Chen,
   Zheng, et al., 2021; [144]Miao et al., 2021; [145]Zhang et al., 2021),
   ORF3A-VPS11 ([146]Chen, Wang, et al., 2021), N-G3BP1 ([147]Chen, Wang,
   et al., 2021; [148]Kruse et al., 2021), N-G3BP2 ([149]Chen, Wang,
   et al., 2021), ORF9B-TOMM70 ([150]Gao et al., 2021)). We identified 25
   interactions with 5 or more occurrences in PL-MS, three of which had
   literature confirmations: ORF3A-VPS39, ORF3A-CLCC1, and ORF9B-TOMM70
   ([151]Fig. 2C). Unexpectedly, the ORF9B-MAVS interaction was reported
   to be an indirect interaction ([152]Wu et al., 2021). We counted the
   number of viral-protein interactions per host protein and found that
   both the proportion and the frequency of such interactions were much
   larger in PL-MS than in AP-MS ([153]Fig. S1B). These results implied
   that the identified interactions may include both direct and indirect
   protein-protein interactions arising from spatial proximity to the
   bait protein.

   Assuming that interactions observed more frequently are more likely to
   be authentic, we empirically defined a minimum threshold of 2
   occurrences for AP-MS and 3 occurrences for PL-MS (without counting
   potential replicates in an individual study) to identify reliable
   interactions. Under these cutoffs, the 93.7% of AP-MS interactions
   with ≤1 occurrence and the 93.9% of PL-MS interactions with ≤2
   occurrences were removed. The remaining interactions were used to
   construct the protein-interactome of SARS-CoV-2 for further analysis.
   The AP-MS-supported protein-interactome (AP-interactome) comprised 298
   interactions involving 26 viral proteins and 234 host proteins
   ([154]Fig. 2D and [155]Supplementary Table 1). The PL-MS-supported
   protein-interactome (PL-interactome) comprised 1086 interactions
   involving 24 viral proteins and 538 host proteins ([156]Fig. S2 and
   [157]Supplementary Table 1).
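
   The occurrence-based filtering described above can be sketched as
   follows. This is a minimal illustration rather than the authors'
   pipeline: the function name `filter_by_occurrence`, the record layout
   (study ID, viral protein, host protein), and all toy records are
   assumptions. Only the cutoffs (2 for AP-MS, 3 for PL-MS) and the rule
   that replicates within one study count once come from the text.

```python
from collections import defaultdict

def filter_by_occurrence(records, min_occurrences):
    """Keep viral-host pairs reported by at least `min_occurrences` distinct studies.

    `records` is an iterable of (study_id, viral_protein, host_protein) tuples.
    Replicates inside one study count once because study IDs are kept in a set.
    """
    studies_per_pair = defaultdict(set)
    for study_id, viral, host in records:
        studies_per_pair[(viral, host)].add(study_id)
    return {
        pair: len(studies)
        for pair, studies in studies_per_pair.items()
        if len(studies) >= min_occurrences
    }

# Toy records with placeholder names (hypothetical, not taken from the datasets).
toy_records = [
    ("study_1", "VIRAL_A", "HOST_1"),
    ("study_1", "VIRAL_A", "HOST_1"),  # replicate in the same study, counted once
    ("study_2", "VIRAL_A", "HOST_1"),
    ("study_1", "VIRAL_B", "HOST_2"),
    ("study_3", "VIRAL_B", "HOST_2"),
    ("study_2", "VIRAL_C", "HOST_3"),  # seen in a single study, filtered out
]

ap_style = filter_by_occurrence(toy_records, min_occurrences=2)  # AP-MS cutoff
pl_style = filter_by_occurrence(toy_records, min_occurrences=3)  # PL-MS cutoff
```

   Storing study IDs in a set, rather than incrementing a counter, is one
   simple way to make sure that repeated measurements inside a single
   study do not inflate the occurrence count.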

2.3. Comparison between AP- and PL-interactomes and viral protein
localization

   Since PL-MS results may reflect different characteristics from AP-MS,
   we first counted the interactions shared between the AP-interactome
   and the PL-interactome. We found only a few shared interactions, both
   overall and for individual viral proteins ([158]Fig. 3A). Even for
   ORF3A, for which the AP-interactome and PL-interactome were of similar
   size, the proportion of shared interactions was small ([159]Fig. 3A).
   These results confirmed that AP-MS and PL-MS have different focuses,
   suggesting that the AP-interactome and PL-interactome should not simply
   be merged and need to be explored separately.
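
   The per-protein comparison of the two interactomes amounts to simple
   set arithmetic. The sketch below is illustrative only: the function
   name `overlap_by_viral_protein` and the placeholder protein names are
   assumptions, and the toy pairs are not taken from the datasets.

```python
def overlap_by_viral_protein(ap_pairs, pl_pairs):
    """Return {viral_protein: (ap_only, shared, pl_only)} counts of host partners.

    Both arguments are collections of (viral_protein, host_protein) pairs.
    """
    summary = {}
    viral_proteins = {v for v, _ in ap_pairs} | {v for v, _ in pl_pairs}
    for viral in sorted(viral_proteins):
        ap_hosts = {h for v, h in ap_pairs if v == viral}
        pl_hosts = {h for v, h in pl_pairs if v == viral}
        summary[viral] = (
            len(ap_hosts - pl_hosts),   # detected only by AP-MS
            len(ap_hosts & pl_hosts),   # detected by both methods
            len(pl_hosts - ap_hosts),   # detected only by PL-MS
        )
    return summary

# Toy interactomes with placeholder names (hypothetical).
ap_pairs = {("VIRAL_A", "HOST_1"), ("VIRAL_A", "HOST_2"),
            ("VIRAL_A", "HOST_3"), ("VIRAL_B", "HOST_4")}
pl_pairs = {("VIRAL_A", "HOST_1"), ("VIRAL_A", "HOST_5"),
            ("VIRAL_C", "HOST_6")}

overlap = overlap_by_viral_protein(ap_pairs, pl_pairs)
```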

Fig. 3.

   Comparison between viral protein AP-interactome and PL-interactome.

   (A) The number of interacting proteins of each viral protein identified
   in AP-interactome, PL-interactome, or both. (B–C) Localization analysis
   of viral proteins based on host proteins in AP-interactome (B) and
   PL-interactome (C). Previously validated proteins localizing in the
   membrane are marked in blue dotted boxes and proteins validated in this
   study are labeled with red dotted boxes. (D) Western blotting of
   expressed mCherry-tagged viral proteins (left) and eGFP-tagged marker
   proteins (right). (E-H) Representative fluorescence images showing
   co-localization between ORF7A (E)/ORF7B (F)/ORF8 (G)/ORF10 (H) (red)
   and LBR/SEC61B (green). The plots on the right showed the fluorescence
   intensity along the region of interest (ROI).

   Since AP-MS tends to reveal potential direct interactions between
   proteins, we performed protein domain analysis on the host proteins
   from the AP-interactome. We found that some viral proteins frequently
   interacted with host proteins containing the same type of domains
   ([162]Fig. S3A). For example, ORF3A and NSP7 could specifically
   interact with the small GTP-binding protein domain, while NSP6, ORF3A,
   M, and ORF7B all could interact with the cation-transporting P-type
   ATPase domain. These domain-interacting signatures may reflect the
   structural bases of those virus-host protein-protein interactions.

   Considering that two interacting proteins tend to have similar
   subcellular localizations, we next used the PL-interactome and
   AP-interactome to analyze the localization of viral proteins. We used
   the protein localization data from a BioID proximity map of the HEK293
   proteome, also named [163]humancellmap.org ([164]Go et al., 2021), in
   which cells are divided into 20 compartments and 4145 host proteins are
   mapped to individual compartments. We mapped the AP-interactome and
   PL-interactome of viral proteins to these 20 compartments and obtained
   their probable subcellular localizations ([165]Fig. 3B and C). The most
   significantly enriched localization of the PL-interactome was the
   nuclear outer membrane-ER membrane network ([166]Fig. 3C), while the
   most enriched one for the AP-interactome was the ER lumen
   ([167]Fig. 3B). The interactomes of NSP4, NSP6, M, ORF7A, and ORF7B
   were significantly enriched for the nuclear outer membrane-ER membrane
   network ([168]Fig. 3C), consistent with M being a well-known membrane
   protein and with recent findings that NSP4 and NSP6 are membrane
   proteins of double-membrane vesicles (DMVs) ([169]Ricciardi et al.,
   2022). These results suggest that ORF7A and ORF7B most likely localize
   at the membrane, especially ORF7A, which contains a potential
   transmembrane domain ([170]Samavarchi-Tehrani et al., 2020). In
   addition, the interactomes of ORF8 and, more weakly, ORF10 were
   specifically enriched for the term ER lumen ([171]Fig. 3B and C). These
   results are consistent with ORF8 containing a potential ER signal
   peptide ([172]Samavarchi-Tehrani et al., 2020) and with the recent
   finding that ORF8 and ORF10 colocalize with the ER ([173]Liu
   et al., 2021).
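
   One common way to score such compartment enrichment is a one-sided
   hypergeometric test. The text does not state which test was used, so
   the sketch below is an assumption, as are the function name
   `hypergeom_enrichment_p` and the example numbers for compartment
   size, interactome size, and hits; only the background of 4145 mapped
   host proteins comes from the text.

```python
from math import comb

def hypergeom_enrichment_p(background_size, compartment_size, interactome_size, hits):
    """One-sided hypergeometric p-value P(X >= hits).

    X is the number of compartment members among `interactome_size` proteins
    drawn without replacement from `background_size` proteins, of which
    `compartment_size` are annotated to the compartment.
    """
    total = comb(background_size, interactome_size)
    upper = min(compartment_size, interactome_size)
    tail = sum(
        comb(compartment_size, k)
        * comb(background_size - compartment_size, interactome_size - k)
        for k in range(hits, upper + 1)
    )
    return tail / total

# Hypothetical example: 8 of 20 interactors fall in a compartment that holds
# 300 of the 4145 mapped host proteins (300, 20, and 8 are made-up numbers).
p_value = hypergeom_enrichment_p(
    background_size=4145, compartment_size=300, interactome_size=20, hits=8
)
```

   In practice, one such p-value would be computed per viral protein and
   compartment, so a multiple-testing correction would normally follow;
   that step is omitted here.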

   To validate the localization data, we investigated the cellular
   localization of ORF7A, ORF7B, ORF8, and ORF10. We constructed
   mCherry-tagged fusion proteins, all of which yielded products near
   their theoretical sizes, except ORF8, which showed an additional band
   below its full-length product ([174]Fig. 3D left and [175]Figs. S3B–C
   left). We used eGFP-SEC61B and eGFP-LBR to mark the endoplasmic
   reticulum (ER) and nuclear envelope, respectively ([176]Fig. 3D right
   and [177]Figs. S3B–C right). We found that ORF7A, ORF7B, and ORF8
   co-localized with SEC61B and LBR, while ORF10 was more diffuse in the
   cytosol and partially co-localized with SEC61B ([178]Fig. 3E–H and
   [179]Figs. S4A–D). These results suggested that ORF7A, ORF7B, and ORF8
   can localize to the ER-nuclear membrane network, confirming our
   interactome-based localization analysis. However, the resolution of
   confocal microscopy was insufficient to distinguish the ER membrane
   from the ER lumen.

   Among SARS-CoV-2 proteins, ORF7B and ORF10 have not been well
   characterized, probably because their sizes (43 and 38 amino acids,
   respectively) are too small for most techniques. Understanding how
   ORF7B and ORF10 are localized to corresponding subcellular compartments
   would help to investigate their functions in the viral lifecycle.

2.4. A high-confidence viral protein network and its functional
characterization

   We further built a high-confidence viral protein-host protein
   interaction network from the AP-interactome and PL-interactome by
   requiring each interaction to satisfy at least one of the following
   criteria: 1) interactions between viral proteins and host complexes in
   the AP-interactome or PL-interactome; 2) interactions between viral
   proteins and host proteins with the same protein domain in the
   AP-interactome ([180]Fig. S3A); 3) interactions with 3 or more
   occurrences in the AP-interactome; 4) interactions with 3 or more
   occurrences in the PL-interactome that also appear in the
   AP-interactome. The resulting network contains 261 interactions between
   19 viral proteins and 152 host proteins ([181]Fig. 4).
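
   The four criteria can be expressed as a single predicate. The sketch
   below is a hedged illustration, not the authors' implementation: the
   `Interaction` record, its field names, and the toy candidates are
   assumptions introduced here, and the complex and domain annotations
   are taken as precomputed boolean flags.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Interaction:
    viral: str
    host: str
    ap_occurrences: int = 0            # number of AP-MS studies reporting the pair
    pl_occurrences: int = 0            # number of PL-MS studies reporting the pair
    in_ap_interactome: bool = False    # passed the AP-MS occurrence cutoff
    in_pl_interactome: bool = False    # passed the PL-MS occurrence cutoff
    host_in_complex: bool = False      # host protein belongs to an annotated host complex
    shares_domain_group: bool = False  # host protein is in a shared-domain group

def is_high_confidence(x: Interaction) -> bool:
    """Return True if the interaction meets at least one of the four criteria."""
    c1 = x.host_in_complex and (x.in_ap_interactome or x.in_pl_interactome)
    c2 = x.shares_domain_group and x.in_ap_interactome
    c3 = x.in_ap_interactome and x.ap_occurrences >= 3
    c4 = x.in_pl_interactome and x.pl_occurrences >= 3 and x.in_ap_interactome
    return c1 or c2 or c3 or c4

# Toy candidates with placeholder names (hypothetical).
candidates = [
    Interaction("VIRAL_A", "HOST_1", ap_occurrences=4, in_ap_interactome=True),
    Interaction("VIRAL_B", "HOST_2", pl_occurrences=3, in_pl_interactome=True,
                host_in_complex=True),
    Interaction("VIRAL_C", "HOST_3", ap_occurrences=2, in_ap_interactome=True),
    Interaction("VIRAL_D", "HOST_4", ap_occurrences=2, in_ap_interactome=True,
                shares_domain_group=True),
]

high_confidence = [(x.viral, x.host) for x in candidates if is_high_confidence(x)]
```

   Here the third candidate is dropped because it passes the AP-MS cutoff
   with only 2 occurrences and has no complex or domain support.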

Fig. 4.

   A compendium of high-confidence protein-protein interactions between
   SARS-CoV-2 and host. Hexagon yellow nodes represent viral proteins.
   Circular grey nodes represent host proteins. Edges indicate virus-host
   protein interactions that are supported by AP-MS (blue) or PL-MS (pink).
   Protein complexes (and cellular components) according to GO-term
   analysis and proteins sharing an identical domain are enclosed within
   rounded rectangles. Edge thickness represents the strength of
   interactions.

   In this high-confidence network, some interactions have evidence from
   crystal structures, such as NSP1-alpha DNA polymerase:primase complex
   ([184]Kilkenny et al., 2022) and ORF9B-TOMM70 complex ([185]Gao et al.,
   2021), while some interactions are supported by biochemical
   experiments, such as N-Stress granule ([186]Kruse et al., 2021) and
   ORF3A-HOPS complex ([187]Miao et al., 2021). Notably, many interesting
   interactions have not yet been studied in depth, including the
   NSP6-ATP synthase complex, NSP6-EMC, ORF7B-SNARE complex, and NSP16-CCC
   complex. The ATP synthase complex, also named Complex V, is the fifth
   component of the oxidative phosphorylation chain and catalyzes the
   phosphorylation of ADP to generate ATP ([188]Neupane et al., 2019). The
   ER membrane complex (EMC) is an insertase and directly mediates the
   insertion of transmembrane domains into the ER membrane ([189]Pleiner
   et al., 2020). The soluble N-ethylmaleimide-sensitive factor attachment
   protein receptor (SNARE) complex mediates membrane recognition and
   fusion ([190]Yoon & Munson, 2018), and the COMMD/CCDC22/CCDC93 (CCC)
   complex mediates membrane protein recycling in cargo transport
   ([191]Singla et al., 2019). These host complexes interacting with viral
   proteins may play important roles in SARS-CoV-2 infection and
   replication.

   Moreover, we found that the viral proteins in this high-confidence
   network mostly interact with host proteins in two biological processes:
   protein processing in the ER and vesicle-associated processes
   ([192]Supplementary Table 2). Furthermore, the KEGG analysis of
   AP-interactome and PL-interactome also revealed that viral proteins
   most significantly interact with host proteins related to protein
   processing in ER ([193]Figs. S5A–B). Interestingly, we found that some
   host proteins functioning in protein processing in ER might directly
   interact with multiple viral proteins ([194]Fig. S5C). These results
   suggest that protein processing in ER could be the major pathway
   targeted by viral proteins, as a disorder of this pathway can induce ER
   stress and the unfolded protein response, which often occurs after
   virus infection.

2.5. An integrated viral RNA-host protein interaction network

   For viral RNA-host protein interactions, we curated the processed
   RNA-interactome data from three ChIRP-MS assays, two RAP-MS assays, and
   one vRIC-MS assay ([195]Table 1). The counts of interactions identified
   in the assays mostly ranged from 100 to 150 ([196]Fig. 5A), and there
   were many overlapped interactions among these assays ([197]Fig. S6A).
   Requiring 3 or more occurrences, we identified 90 reliable RNA-protein
   interactions, a number close to the smallest count (104) among the
   original datasets ([198]Fig. 5B). We performed protein domain analysis
   for the 90 host proteins and found that most contain domains related
   to RNA binding, among which the RRM domain is the most enriched,
   being found in 41 proteins ([199]Fig. S6B). We constructed a network
   of SARS-CoV-2 RNA-interactome together with known protein-protein
   interactions and revealed several complexes of those proteins
   ([200]Fig. 5C and [201]Supplementary Table 1).

Fig. 5.

   Overview of SARS-CoV-2 RNA-host protein interactions.

   (A) The number of viral RNA-host protein interactions from different
   studies. (B) Pie chart showing the proportion of viral RNA-host protein
   interactions from different studies. (C) A compendium of SARS-CoV-2
   RNA-interactome. Central nodes represent SARS-CoV-2 RNAs. Ellipses
   represent host proteins. Edges represent viral RNA-protein interactions
   (pink) and protein-protein interactions (blue). Edge thickness
   indicates the affinity of interactions. Protein complexes (and cellular
   components) according to GO-term analysis are enclosed within sectors.
   (D) The dynamics of SARS-CoV-2 RNA-interactomes at different time
   points after viral infection. (E) Venn plot showing the overlap among
   the viral protein N-interactome, SG&PB proteome, and SARS-CoV-2
   RNA-interactome. (F) The enrichment ratio of SG proteins (G3BP1,
   IGF2BP1, and MOV10)-bound NSP12 and N RNAs in SARS-CoV-2 infected cells
   using RNA immunoprecipitation followed by qRT-PCR. The relative RNA
   levels were normalized with ACTB RNA and the enrichment ratios were
   normalized with IgG samples. (G) Co-immunoprecipitation between viral N
   protein and the SG marker proteins (G3BP1, IGF2BP1, and MOV10) in
   SARS-CoV-2 infected cells. (H) A competitive model of SARS-CoV-2 RNA
   and N protein for the assembly of stress granule.

   We observed that SARS-CoV-2 positive-strand RNA(s) interacted with
   proteins in the ribosomal subunits, eIF4F complex, and 48S
   preinitiation complex, which are involved in translation initiation or
   active translation. Interestingly, SARS-CoV-2 RNA(s) also interact with
   RNA stability-related proteins, such as the CRD-mediated stability
   complex and the RNA-induced silencing complex (RISC). Notably, the
   RNA(s) most significantly interact with stress granule proteins,
   suggesting that SARS-CoV-2 RNA may be localized in stress granules
   (SGs), which are cytosolic subcellular compartments where
   translation-stalled RNAs are localized ([204]Youn et al., 2019). In
   addition, the viral RNA-interacting proteins include PP1 phosphatase,
   hnRNPs, nuclear speckle proteins, and other RNA metabolism-related
   proteins. We further
   analyzed the overlapped proteins between SARS-CoV-2 RNA-interactome and
   protein-interactome, and found that 16 host proteins interacted with
   both viral RNAs and proteins ([205]Fig. S6C).

   We analyzed the dynamic changes of the SARS-CoV-2 RNA-interactome at
   different time points after infection using the time-course ChIRP-MS
   data from Flynn et al., collected at 24 h p.i. and 48 h p.i. in Huh7
   and VeroE6 cells. We found that almost all viral RNA-interacting
   proteins detected at 24 h p.i. were also captured at 48 h p.i. in both
   Huh7 and VeroE6 cells ([206]Fig. 5D). There were also more
   interacting proteins at 48 h p.i. than at 24 h p.i., especially in Huh7
   cells, where over half of the interacting proteins at 48 h p.i. were
   not observed at 24 h p.i. ([207]Fig. 5D). These results indicated that
   the SARS-CoV-2 RNA-interactome expanded with infection time and that
   many of the same interactions occurred at different stages and in
   different cell lines, suggesting their potential functional roles in
   viral infection.
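
   The two quantities discussed above, the share of early interactors
   that persist and the share of late interactors that are new, can be
   computed as below. This is a minimal sketch with made-up protein
   names; the function name `compare_time_points` is an assumption.

```python
def compare_time_points(early, late):
    """Return (retained, gained) fractions for two sets of interacting proteins.

    retained: fraction of early-time-point proteins also seen at the late point.
    gained:   fraction of late-time-point proteins not seen at the early point.
    """
    early, late = set(early), set(late)
    retained = len(early & late) / len(early) if early else 0.0
    gained = len(late - early) / len(late) if late else 0.0
    return retained, gained

# Toy protein sets (hypothetical names) for 24 h and 48 h post infection.
proteins_24h = {"P1", "P2", "P3", "P4"}
proteins_48h = {"P1", "P2", "P3", "P5", "P6", "P7", "P8", "P9"}

retained, gained = compare_time_points(proteins_24h, proteins_48h)
```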

   It was reported that the N protein could inhibit the stress granule
   assembly by competitively binding to G3BP1 and G3BP2 ([208]Zheng
   et al., 2021); and intriguingly, under non-viral conditions,
   transfection of SARS-CoV-2 RNA alone induced stress granule assembly
   ([209]Zheng et al., 2021). We thus further analyzed another more
   complete set of SG & P-body (PB)-related proteins (SG&PB) identified
   recently by in vivo BioID mediated proximity-mapping ([210]Youn et al.,
   2018) and data curation ([211]Youn et al., 2019). We found that N
   protein is the viral protein most significantly associated with SG&PB
   proteins ([212]Figs. S6D–E). The N protein interacts with 20 SG&PB
   proteins, accounting for 80% (20/25) of the N-interactome ([213]Fig. 5E
   bottom and [214]Supplementary Table 3). In particular, 18 of the 20
   proteins are core SG&PB proteins ([215]Fig. 5E top and
   [216]Supplementary Table 3), suggesting that N primarily interacts with
   the core proteins and supporting the recent report that N could
   disrupt SG assembly. The
   SARS-CoV-2 RNA interacted with 55 SG&PB proteins, accounting for 61%
   (55/90) of the RNA-interactome ([217]Fig. 5E bottom) and 28 of them
   were core proteins ([218]Fig. 5E top). Specifically, both N and viral
   RNA interacted with G3BP1 and G3BP2, which are markers of stress
   granules rather than P-bodies.

   To verify whether N and viral RNA interacted with SGs under SARS-CoV-2
   infection, we used G3BP1, IGF2BP1, and MOV10 as SG markers to perform
   RNA immunoprecipitation (RIP) and Co-IP experiments ([219]Fig. 5E). We
   found that under SARS-CoV-2 infection, G3BP1, IGF2BP1, and MOV10 all
   interacted with viral RNAs including NSP12 and N RNAs ([220]Fig. 5F),
   and the three SG core proteins also all interacted with the viral N
   protein ([221]Fig. 5G and [222]Figs. S6F–G). Notably, G3BP1, an
   essential factor in mediating SG formation, showed the strongest
   interaction with SARS-CoV-2 RNA. These results further supported that
   viral RNA could localize into stress granules under SARS-CoV-2
   infection. On this basis we proposed a model in which SARS-CoV-2 RNA
   induces SG assembly by co-binding multiple SG proteins, resulting in
   stalled viral RNA translation, whereas the N protein disrupts SG
   assembly by competitively binding SG core proteins, keeping viral RNA
   actively translated ([223]Fig. 5H).

2.6. Interactome-based identification of proviral/antiviral factors and
associated diseases

   The virus-interacting host proteins identified from the
   AP-interactome, PL-interactome, and RNA-interactome showed ubiquitous
   expression and no strong tissue specificity ([224]Figs. S7A–C).
   Interestingly, compared with the non-interactome control proteins, the
   interactome proteins had more phosphorylated and ubiquitinated
   residues ([225]Fig. S7D).

   To assess the effect of the interactome proteins on the viral
   lifecycle, we used available genome-wide CRISPR screening data on
   SARS-CoV-2. Integrating the Z-scores or fold-change values in 8
   datasets from 6 studies ([226]Baggen, Persoons, et al., 2021;
   [227]Biering et al., 2022; [228]Daniloski et al., 2021;
   [229]Schneider et al., 2021; [230]Wang et al., 2021; [231]Wei et al.,
   2021), we computed the mean enrichment scores of CRISPR screened genes
   ([232]Fig. 6A). Using non-interactome proteins as a background
   control, we identified potential proviral proteins with a mean
   enrichment score > 0.44 and antiviral proteins with a mean enrichment
   score < −0.54 at a false discovery rate of 0.05: 22 proviral vs. 24
   antiviral proteins in the AP-interactome, 43 vs. 44 in the
   PL-interactome, and 3 vs. 24 in the RNA-interactome ([233]Fig. 6A and
   B and [234]Supplementary Table 4).
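
   The scoring and thresholding step described above can be sketched as
   follows. This is a minimal illustration under stated assumptions, not
   the authors' pipeline: the function names, the toy inputs, and the use
   of empirical background quantiles to derive cutoffs are all
   hypothetical choices made for the example.

```python
from statistics import mean


def mean_enrichment(scores_by_dataset):
    """Average each gene's score over the datasets in which it was measured.

    scores_by_dataset: list of dicts mapping gene symbol -> score
    (hypothetical input layout, one dict per CRISPR screen).
    """
    genes = set().union(*(d.keys() for d in scores_by_dataset))
    return {g: mean(d[g] for d in scores_by_dataset if g in d) for g in genes}


def empirical_cutoffs(background_scores, alpha=0.05):
    """Assumed rule: cut alpha/2 of the background distribution in each tail."""
    ordered = sorted(background_scores)
    n = len(ordered)
    k = int(n * alpha / 2)
    return ordered[k], ordered[n - 1 - k]


def classify(mean_scores, lower, upper):
    """Split genes into proviral (score > upper) and antiviral (score < lower)."""
    proviral = sorted(g for g, s in mean_scores.items() if s > upper)
    antiviral = sorted(g for g, s in mean_scores.items() if s < lower)
    return proviral, antiviral
```

   With the cutoffs reported in the text, one would call
   classify(scores, lower=-0.54, upper=0.44) on the interactome proteins
   only; how the published cutoffs were estimated from the background is
   not reproduced here.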

Fig. 6.

   Functional characteristics of SARS-CoV-2 interactomes.

   (A) Density map showing the distribution of mean CRISPR screening
   enrichment scores of host proteins after SARS-CoV-2 infection.
   Background 1 represents proteins that were identified but removed as
   background interactions. Background 2 (non-interactome) represents all
   other human genes that were in neither Background 1 nor the SARS-CoV-2
   interactome. Dashed lines indicate the estimated 95% confidence
   interval of the mean enrichment scores. (B) Heatmap showing the
   CRISPR screening enrichment scores of host proteins after SARS-CoV-2
   infection. The numbers of proviral and antiviral proteins are marked.
   (C) Network of predicted proviral protein-drug interactions. Circular
   nodes represent viral proteins (blue), viral RNA (yellow), host
   proteins (grey), and drugs in DGIdb (red). Edges indicate viral
   protein/RNA-host protein interactions (blue) and host protein-drug
   interactions (green). (D) Two subnetworks identified by the network
   diffusion approach. Circular nodes represent seed nodes (yellow) and
   candidate nodes (red). The size of nodes indicates the network
   diffusion score. (E) Disease enrichment analysis of the host proteins
   in SARS-CoV-2 protein AP-interactome (red), PL-interactome (green), and
   RNA-interactome (blue).

   We next used drug-targeting information from the DGIdb database
   ([237]Cotto et al., 2018) to identify potential drugs that could target
   the proviral proteins defined above. Several proviral proteins have
   known drugs ([238]Fig. 6C). One example is the glycosylase GANAB, a
   reported proviral factor, whose drug MIGLUSTAT has shown a significant
   antiviral effect in cultured cells ([239]Casas-Sanchez et al., 2022).
   This agreement with published findings supports the reliability of our
   analysis. In addition, our results suggest that other drugs targeting
   the predicted proviral proteins merit further study.

   We further expanded the interactome network by applying a network
   diffusion algorithm and identified 44 additional host proteins
   associated with the viral interactome ([240]Supplementary Table 5).
   Notably, some proteins found by network diffusion have been reported
   as proviral factors, including RACK1 and EWSR1 ([241]Fig. 6D). RACK1
   was found to interact with multiple flavivirus NS1 proteins and showed
   a proviral effect for flaviviruses and SARS-CoV-2 ([242]Shue et al.,
   2021). EWSR1 was found to interact with NSP13, promoting its helicase
   activity, and showed a proviral effect for SARS-CoV-2 ([243]Zeng
   et al., 2022). These results suggested that combining the virus-host
   interactome with a network diffusion algorithm and known protein-drug
   data can be an effective strategy to identify and repurpose antiviral
   drugs.

   To understand the molecular features of those proviral and antiviral
   proteins, we performed Gene Ontology analysis separately for the viral
   protein-interactome (VP-interactome) and the RNA-interactome. In the
   VP-interactome, proviral proteins were preferentially associated with
   COPII-coated vesicle transport ([244]Fig. S8A), and antiviral proteins
   with mRNA transport ([245]Fig. S8B). In the RNA-interactome, proviral
   proteins were preferentially associated with tRNA splicing
   ([246]Fig. S9A), and antiviral proteins with mRNA translation and
   stability regulation ([247]Fig. S9B).

   COVID-19 has been shown to cause severe complications, and we asked
   whether the virus-host interactome could help to explain certain
   symptoms or diseases associated with COVID-19. We used the
   gene-disease associations in the DisGeNET database ([248]Pinero et al.,
   2017) and counted the frequency of interactome proteins in different
   diseases. We found that brain-related diseases were the most
   significantly enriched, such as spinocerebellar ataxia 17 and
   cerebellar atrophy ([249]Fig. 6E). According to a recent clinical
   report, the brains of COVID-19 patients underwent abnormal structural
   changes compared with those of other patients, such as reductions in
   grey matter thickness and global brain size ([250]Douaud et al., 2022).
   We further analyzed the RNA expression profiles of 59
   brain-disease-related interacting proteins in multiple human tissues
   using the GTEx portal ([251]Consortium, 2020). These proteins did not
   show an obvious brain-specific expression pattern ([252]Fig. S10A);
   however, in brain tissue their expression levels were significantly
   higher than those of other interacting proteins ([253]Fig. S10B),
   suggesting a potential link between SARS-CoV-2 and brain disease.
   Moreover, a recent study detected SARS-CoV-2 in olfactory brain areas
   of rhesus macaques at 7 days post infection ([254]Beckman et al.,
   2022). Taken together, we speculate that SARS-CoV-2 might contribute
   to brain disease via its interactome after infecting brain tissue.
   These results suggested the value of using the interactomes to
   recognize associated diseases or to understand the molecular
   mechanisms underlying specific complications.

   Together, we showed that virus-host interactomes could be valuable
   resources for discovering antiviral targets and drugs, and
   systematically mining the virus-host interactomes would help us better
   understand how SARS-CoV-2 affects host cells. We have created a web
   server AIMaP ([255]https://mvip.whu.edu.cn/aimap/) for users to easily
   explore the atlas of interactions between SARS-CoV-2 macromolecules and
   host proteins with variable thresholds.

3. Discussion

   Our study integrated all available datasets and constructed a
   comprehensive atlas of interactions between SARS-CoV-2 RNA/proteins
   and host proteins. Besides curating a larger number of datasets than
   similar recently published works ([256]Baggen, Vanstreels, et al.,
   2021; [257]Haas et al., 2021; [258]Kolinski et al., 2022;
   [259]Terracciano et al., 2021), we conducted more in-depth and
   systematic analyses of the molecular interactions. We compared the
   results from different studies by technique and identified a
   high-confidence interaction network ([260]Fig. 4). Combining the
   network with other data such as host protein localization and CRISPR
   screening, we made a series of findings by mining the interactome.
   However, the atlas merits further improvement. On the one hand, the
   confident interaction map is not yet complete for all viral proteins:
   our AP-interactome lacked interactions for NSP3, NSP11, and ORF9C, and
   the PL-interactome lacked information on NSP5, NSP7, NSP10, NSP11, and
   NSP13. This might be improved by re-analyzing the raw mass
   spectrometry data with uniform standards when the raw data become
   available. On the other hand, the interactions need to be validated,
   for example by mapping the direct interaction sites, although we
   identified enriched protein domains. Recently developed programs such
   as AlphaFold2 ([261]Bryant et al., 2022; [262]Jumper et al., 2021) and
   RoseTTAFold ([263]Baek et al., 2021) might be used to predict the
   interaction sites based on protein structural modeling.

   Even when an interaction between a pair of factors is known, it is
   still difficult to determine the biological consequence of the viral
   protein interacting with the host protein. One easy-to-understand
   pattern is that the viral protein sequesters the host protein and
   thereby disturbs its normal physiological function. We performed KEGG
   analysis of the host proteins interacting with the virus and found that
   protein processing in ER is enriched; disruption of this process would
   induce ER stress and the unfolded protein response. It would be worth
   examining whether SARS-CoV-2 induces such stress in COVID-19 patients.
   However, another pattern, in which the interaction promotes the
   function of the host protein, is also possible. For example, viral
   proteins preferentially interact with proteins related to
   vesicle-associated processes; such processes are critical for
   SARS-CoV-2 and may be hijacked by the virus. Our analysis validated
   that SARS-CoV-2 RNA and N protein interacted closely with stress
   granules (SGs) under SARS-CoV-2 infection, and provided candidate
   proteins that may play important roles in this process
   ([264]Supplementary Table 3). Further studies are needed to
   investigate how the RNA-interactome mediates the localization of
   SARS-CoV-2 RNA into SGs and how the N-interactome disassembles SGs.

   Our analysis of viral protein localization relied on host protein
   localization data generated in HEK293 cells, whereas some of the
   interactions in the interactome were identified in other cell lines.
   Because the locations of some host proteins might vary between cell
   lines, and because SARS-CoV-2 may hijack proviral proteins from their
   usual compartments, the predicted localizations of viral proteins
   should be treated as hypotheses to be inspected carefully or confirmed
   experimentally.

   During the revision of this manuscript, we noticed a new research
   article on the SARS-CoV-2 protein-interactome identified by AP-MS and
   high-throughput yeast two-hybrid (Y2H) screening, published in Nat
   Biotechnol on Oct. 10, 2022 ([265]Zhou et al., 2022). We curated it and
   integrated its protein-protein interactions into our AIMaP database to
   expand the SARS-CoV-2-human interactome network. We compared this
   viral protein-interactome with our integrated interactome from AP-MS
   ([266]Fig. S10C) and present the overlapping proteins in
   [267]Fig. S10D. We will continue to update the AIMaP database as new
   relevant data become available.

   We combined the atlas of virus-host molecular interactions with
   previous CRISPR screening results of SARS-CoV-2 and identified
   high-potential antiviral targets ([268]Supplementary Table 4). However,
   the CRISPR screens were conducted in cultured cells, which might not
   reflect human physiological status and may carry biases. Nevertheless,
   combining the interactome with network diffusion, GO-term functional
   analysis, and structure-based interaction prediction should help to
   identify and prioritize antiviral targets, and further experiments are
   necessary to understand their functions and mechanisms in viral
   infection.

4. Materials and methods

4.1. Fluorescent protein tagging and localization analysis

   SEC61B (ER membrane protein) was chosen as an endoplasmic reticulum
   marker and LBR (nuclear membrane protein) was chosen as a nuclear
   envelope marker. The marker ORFs were inserted in-frame at the
   C-terminus of eGFP in the pEGFP-C1 plasmid. The viral ORFs (ORF7A,
   ORF7B, ORF8, and ORF10) were inserted in-frame at the C-terminus of
   mCherry in the pmCherry-C1 plasmid. Plasmids encoding the fusion
   proteins were transfected into HeLa or 293T cells with Lipofectamine
   2000 (Invitrogen, 11668019) for 24 h. Protein expression was detected
   by western blotting with eGFP (Proteintech, 66002-1-Ig) and mCherry
   (Proteintech, 26765-1-AP) antibodies. For localization analysis, HeLa
   cells were cultured on coverslips, fixed with 4% PFA, and permeabilized
   with 0.1% Triton X-100. The cell nucleus was stained with DAPI, and
   protein localizations were detected by the intrinsic fluorescence of
   the fluorescent protein tags. Theoretical molecular weights of fusion
   proteins are:
   mCherry-ORF7A (369 amino acids, 41.9 kDa); mCherry-ORF7B (291 amino
   acids, 33.4 kDa); mCherry-ORF8 (369 amino acids, 42 kDa); mCherry-ORF10
   (286 amino acids, 32.6 kDa); eGFP-SEC61B (342 amino acids, 37.6 kDa);
   and eGFP-LBR (861 amino acids, 98.4 kDa).

4.2. Co-immunoprecipitation (Co-IP) and RNA immuno-precipitation (RIP)

   A clinical strain of SARS-CoV-2 (nCoV-2019BetaCoV/Wuhan/WIV04/2019)
   ([269]Zhou et al., 2020) was used to infect hACE2-293T cells at a
   multiplicity of infection (MOI) of 0.1 TCID50 unit/cell. At 24 h p.i.,
   infected cells were scraped from dishes and washed with ice-cold 1×
   PBS. The cell pellet from one 100 mm dish was resuspended in 2 mL
   RIP/IP lysis buffer (20 mM Tris-HCl pH 7.5, 150 mM NaCl, 5% NP-40, and
   freshly added protease inhibitor) and incubated on ice for 30 min. The
   suspensions were then sonicated for 7 cycles (30 s/30 s) in high-power
   mode (Bioruptor® Plus sonication device) at 4 °C and centrifuged at
   16,000 g for 30 min. The supernatants were then harvested for Co-IP and
   RIP assays: 0.3 mL of cell lysate was incubated with 1.5 μg of
   antibody for 2 h at room temperature, and 25 μL of washed protein G
   magnetic beads (Thermo Scientific) were then added. After a 1.5-h
   incubation at room temperature, the beads were washed three times with
   lysis buffer for 10 min each.

   For the Co-IP assay, beads were boiled with protein loading buffer
   followed by western blotting. For the RIP assay, beads were resuspended
   in proteinase K buffer (10 mM Tris-HCl pH 7.5, 10 mM EDTA, 0.5% SDS)
   containing proteinase K. After incubation at 37 °C for 30 min, RNA was
   extracted with TRIzol™ LS reagent (Invitrogen) and prepared for
   qRT-PCR. Primers include ACTB (F: tggagaaaatctggcaccacac, R:
   atcttctcgcggttggccttg); NSP12 (F: agaatagagctcgcaccgta, R:
   ctcctctagtggcggctatt); and N (F: caatgctgcaatcgtgctac, R:
   gttgcgactacgtgatgagg). Antibodies include IgG (Proteintech,
   10284-1-AP); G3BP1 (Proteintech, 66486-1-Ig); IGF2BP1 (Proteintech,
   22803-1-AP); MOV10 (Proteintech, 10370-1-AP); and N (made in Hu Lab).
   All experiments using infectious SARS-CoV-2 were performed in a
   biosafety level 3 (BSL-3) laboratory at the Wuhan Institute of
   Virology, Chinese Academy of Sciences.

4.3. Data resource and curation

   We manually retrieved and curated 25 studies associated with SARS-CoV-2
   on viral protein/RNA-host protein interaction. All data resources and
   relevant information (including method, cell line, tags, baits, and
   datasets) are summarized in [270]Table 1. As the interactome data were
   from various sources, we mapped the viral and host gene names using an
   NCBI resource ([271]ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene_info.gz,
   March 2022) to unify the names. In addition, we also manually collected
   and downloaded CRISPR screening data associated with SARS-CoV-2. The
   interactome of SARS-CoV-2 proteins in BioGRID
   ([272]https://thebiogrid.org/, BIOGRID-4.4.202) was downloaded and
   extracted ([273]Oughtred et al., 2021). The post-translational
   modifications (PTMs) datasets including phosphorylation, acetylation,
   ubiquitination, methylation, sumoylation, and O-glycosylation were
   downloaded from PhosphoSitePlus database at
   [274]https://www.phosphosite.org/homeAction ([275]Hornbeck et al.,
   2015). We extracted the localization information of host proteins from
   the Cell Map database at [276]https://humancellmap.org/ ([277]Go
   et al., 2021). Human protein sequences were downloaded from the UniProt
   database at [278]https://www.uniprot.org/ ([279]Uniprot, 2021). The
   gene expression profiles across various tissues were downloaded from
   the GTEx database at [280]https://www.gtexportal.org
   ([281]Consortium, 2020). Similarly, to
   facilitate the subsequent integrated analysis, we unified the gene
   names from various data sources using the above NCBI resource.
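
   The name-unification step can be illustrated with a short sketch. It
   assumes the tab-delimited column order of the NCBI gene_info file
   (tax_id, GeneID, Symbol, LocusTag, Synonyms, ...) with pipe-separated
   synonyms; the function names and the rule that official symbols take
   precedence over synonyms are assumptions for the example, not the
   authors' documented procedure.

```python
def build_symbol_map(gene_info_lines, tax_id="9606"):
    """Map upper-cased official symbols and synonyms to the official symbol."""
    official, synonyms = {}, {}
    for line in gene_info_lines:
        if line.startswith("#"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5 or fields[0] != tax_id:
            continue
        symbol, syn_field = fields[2], fields[4]
        official[symbol.upper()] = symbol
        if syn_field != "-":  # "-" marks an empty field in gene_info
            for syn in syn_field.split("|"):
                # keep the first symbol seen for an ambiguous synonym
                synonyms.setdefault(syn.upper(), symbol)
    # official symbols override synonyms when both claim the same key
    return {**synonyms, **official}


def unify(names, symbol_map):
    """Replace each name by its official symbol; unknown names pass through."""
    return [symbol_map.get(name.upper(), name) for name in names]
```

   Synonyms shared by several genes are resolved arbitrarily here (first
   occurrence wins); a real curation step would likely flag such cases
   for manual review.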

4.4. Weight of viral protein/RNA-host protein interactions

   To extract credible interactions, we computed a weight for each
   protein/RNA-protein interaction. The weight is the number of
   independent studies in which the interaction was identified. The
   viral protein-host protein interactions identified by at least two
   AP-MS studies or at least three PL-MS studies were defined as the
   AP-interactome and PL-interactome, respectively. The SARS-CoV-2
   RNA-host protein interactions identified by at least three studies
   were retained as the RNA-interactome.
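
   The weighting rule above amounts to counting distinct studies per
   interaction and applying a per-technique threshold. A minimal sketch
   follows; the record layout (study, bait, host) and the function names
   are hypothetical.

```python
from collections import defaultdict


def interaction_weights(records):
    """Weight of an interaction = number of distinct studies reporting it.

    records: iterable of (study, bait, host) tuples for one technique.
    """
    studies = defaultdict(set)
    for study, bait, host in records:
        studies[(bait, host)].add(study)  # a set ignores repeats within a study
    return {pair: len(s) for pair, s in studies.items()}


def filter_by_weight(weights, min_studies):
    """Keep interactions supported by at least min_studies studies."""
    return {pair: w for pair, w in weights.items() if w >= min_studies}


# Toy AP-MS records; thresholds from the text: 2 (AP-MS), 3 (PL-MS), 3 (RNA).
ap_records = [
    ("study1", "N", "G3BP1"),
    ("study2", "N", "G3BP1"),
    ("study2", "N", "G3BP1"),  # duplicate row within one study
    ("study1", "ORF8", "HOST_X"),
]
ap_interactome = filter_by_weight(interaction_weights(ap_records), min_studies=2)
```

   Counting studies rather than rows means that replicate datasets from
   the same study do not inflate the weight, which is one possible
   reading of "identified by different study groups".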

4.5. Gene Ontology (GO) and KEGG pathway enrichment analysis

   The interacting proteins of all viral proteins obtained from AP-MS and
   PL-MS were tested for enrichment of GO-terms (including biological
   process, cellular component, and molecular function) and KEGG pathways
   using the enricher function of gprofiler2 package v0.2.1 ([282]Kolberg
   et al., 2020) in R with default parameters. Significant GO-terms and
   KEGG pathways were identified at a p-value cutoff of 0.05. The
   enrichments of the high-confidence host proteins interacting with
   SARS-CoV-2 RNA were similarly tested for the three GO categories and
   the KEGG pathways.
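
   For readers unfamiliar with over-representation analysis, the
   statistical core of such an enrichment test can be sketched with a
   one-sided hypergeometric test. This is an illustration only: it does
   not call gprofiler2, it omits multiple-testing correction, and the
   function names and inputs are hypothetical.

```python
from math import comb


def hypergeom_pvalue(k, n, K, N):
    """P(X >= k): chance of seeing at least k annotated genes in a query
    of n genes drawn from a universe of N genes, K of which are annotated."""
    total = comb(N, n)
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


def enrich(query, term_to_genes, universe, cutoff=0.05):
    """Return (term, overlap, p-value) for terms with p-value below cutoff."""
    universe = set(universe)
    query = set(query) & universe
    results = []
    for term, genes in term_to_genes.items():
        genes = set(genes) & universe
        overlap = len(query & genes)
        if overlap == 0:
            continue
        p = hypergeom_pvalue(overlap, len(query), len(genes), len(universe))
        if p < cutoff:
            results.append((term, overlap, p))
    return sorted(results, key=lambda r: r[2])
```

   The choice of universe (all annotated genes versus all detected
   proteins) strongly affects the p-values, so it should be stated
   alongside any cutoff.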

4.6. Protein domain and localization analysis

   To understand the binding domains of viral proteins, we performed
   domain analysis for viral protein AP-interactome. We first extracted
   the protein sequences from the UniProt database. Then, we performed
   domain annotation on these host proteins using InterProScan software
   v5.55–88.0 ([283]Jones et al., 2014). Finally, we counted the number of
   host proteins sharing the same domains. Only those domains shared by at
   least two host proteins were used for further analysis. To understand
   the subcellular localizations of viral proteins, we counted the number
   of proteins that localized to the same subcellular compartments defined
   in the Cell Map database ([284]Go et al., 2021).
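
   The counting steps in this section are simple tallies. The sketch
   below shows the domain case, assuming domain annotations have already
   been parsed into a protein-to-domains mapping (a hypothetical layout;
   parsing InterProScan output is not shown). The same pattern would
   apply to subcellular compartments.

```python
from collections import Counter


def shared_domains(protein_to_domains, min_proteins=2):
    """Count host proteins per domain and keep domains shared by at least
    min_proteins proteins."""
    counts = Counter()
    for domains in protein_to_domains.values():
        counts.update(set(domains))  # count each domain once per protein
    return {d: c for d, c in counts.items() if c >= min_proteins}


# Toy annotations for the interactors of one viral protein
annotations = {
    "HOST_A": ["RRM", "RRM", "KH"],
    "HOST_B": ["RRM"],
    "HOST_C": ["Helicase"],
}
print(shared_domains(annotations))
```

   Deduplicating domains within a protein keeps repeated domains (such as
   tandem RNA-binding motifs) from being mistaken for sharing across
   proteins.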

4.7. Disease ontology enrichment analysis

   We performed disease enrichment of viral protein/RNA-interactome using
   the enricher function of the DOSE package v3.18.3 ([285]Yu et al., 2015)
   in R with default parameters, based on the DisGeNET database at
   [286]http://www.disgenet.org ([287]Pinero et al., 2017). Only enriched
   disease ontology terms with p-value < 0.05 were used.

4.8. Drug-protein interactions

   Host proteins in viral protein/RNA-interactome were examined for known
   chemical compound interactions through the Drug-Gene Interaction
   database ([288]Cotto et al., 2018) (DGIdb at [289]https://dgidb.org/,
   downloaded in Feb 2022).

4.9. Network diffusion analysis

   To systematically identify other host proteins associated with viral
   RNA/proteins beyond our identified interactome, we applied the network
   diffusion method (based on random walk) as implemented in the NetCore
   software ([290]Barel & Herwig, 2020). Briefly, for network diffusion
   with a restart procedure, we used the PPI network extracted from the
   human reference protein interactome mapping project (HuRI) at
   [291]http://www.interactome-atlas.org/ ([292]Luck et al., 2020). This
   network held approximately 53,000 binary protein-protein interactions
   and served as a reference map of human protein interactome. The
   SARS-CoV-2 viral protein-interactome was used as the source seed nodes,
   for which the signal diffusion weights were set to 1 (for other nodes
   set to 0). To exclude the node degree bias in the PPI network, we used
   the node coreness method to normalize the random walk matrix. In
   addition, to reduce the number of potentially false predictions, the
   restart probability was set to 0.8. The sub-networks containing seed
   nodes were identified using NetCore's semi-supervised approach.
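
   The diffusion idea can be illustrated with a plain random walk with
   restart on a toy undirected network. Note the simplifications: this
   sketch normalizes by node degree rather than by the node coreness used
   in NetCore, spreads the seed weight uniformly so that scores sum to 1,
   and does not perform NetCore's subnetwork identification. All names
   are hypothetical.

```python
def random_walk_with_restart(edges, seeds, restart=0.8, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - r) * W p + r * p0 until convergence.

    W is the degree-normalized (column-stochastic) adjacency matrix and
    p0 puts equal weight on every seed present in the network.
    """
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    nodes = sorted(neighbors)
    present = [s for s in seeds if s in neighbors]
    if not present:
        raise ValueError("no seed node is present in the network")
    p0 = {n: 0.0 for n in nodes}
    for s in present:
        p0[s] = 1.0 / len(present)
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {}
        for n in nodes:
            # each neighbor m passes on an equal share of its current score
            inflow = sum(p[m] / len(neighbors[m]) for m in neighbors[n])
            nxt[n] = (1 - restart) * inflow + restart * p0[n]
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Toy path network: SEED - B - C
scores = random_walk_with_restart([("SEED", "B"), ("B", "C")], seeds=["SEED"])
```

   A high restart probability such as 0.8 keeps most of the signal close
   to the seeds, so only nodes tightly connected to the interactome
   receive appreciable scores, consistent with the stated aim of
   reducing false predictions.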

4.10. Network generation and visualization

   The PPI networks associated with viral protein/RNA were visualized in
   Cytoscape v3.8.2 ([293]Shannon et al., 2003). Host physical
   interactions were extracted from the STRING database v11.5 at
   [294]https://cn.string-db.org/ ([295]Szklarczyk et al., 2021). The
   groupings by protein domain, localization, and protein complex or
   biological process were derived from the analyses of protein domains,
   localization, and GO-term enrichment, respectively.

Author contributions

   Guangnan Li: Conceptualization, Methodology, Investigation, Writing -
   Original Draft, Visualization; Zhidong Tang: Conceptualization,
   Methodology, Software, Formal analysis, Data Curation; Weiliang Fan:
   Software, Resources, Visualization; Xi Wang: Investigation; Li Huang:
   Validation; Yu Jia: Validation; Manli Wang: Resources; Zhihong Hu:
   Writing - Review & Editing, Resources, Funding acquisition; Yu Zhou:
   Conceptualization, Writing - Review & Editing, Supervision, Project
   administration, Funding acquisition.

Declaration of competing interest

   The authors declare that they have no conflict of interest.

Acknowledgment