Abstract The COVID-19 disease is an ongoing global health concern. Although vaccination provides some protection, people are still susceptible to re-infection. Ostensibly, certain populations or clinical groups may be more vulnerable. Factors causing these differences are unclear and whilst socioeconomic and cultural differences are likely to be important, human genetic factors could influence susceptibility. Experimental studies indicate SARS-CoV-2 uses innate immune suppression as a strategy to speed-up entry and replication into the host cell. Therefore, it is necessary to understand the impact of variants in immunity-associated human proteins on susceptibility to COVID-19. In this work, we analysed missense coding variants in several SARS-CoV-2 proteins and their human protein interactors that could enhance binding affinity to SARS-CoV-2. We curated a dataset of 19 SARS-CoV-2: human protein 3D-complexes, from the experimentally determined structures in the Protein Data Bank and models built using AlphaFold2-multimer, and analysed the impact of missense variants occurring in the protein–protein interface region. We analysed 468 missense variants from human proteins and 212 variants from SARS-CoV-2 proteins and computationally predicted their impacts on binding affinities for the human viral protein complexes. We predicted a total of 26 affinity-enhancing variants from 13 human proteins implicated in increased binding affinity to SARS-CoV-2. These include key-immunity associated genes (TOMM70, ISG15, IFIH1, IFIT2, RPS3, PALS1, NUP98, AXL, ARF6, TRIMM, TRIM25) as well as important spike receptors (KREMEN1, AXL and ACE2). We report both common (e.g., Y13N in IFIH1) and rare variants in these proteins and discuss their likely structural and functional impact, using information on known and predicted functional sites. Potential mechanisms associated with immune suppression implicated by these variants are discussed. Occurrence of certain predicted affinity-enhancing variants should be monitored as they could lead to increased susceptibility and reduced immune response to SARS-CoV-2 infection in individuals/populations carrying them. Our analyses aid in understanding the potential impact of genetic variation in immunity-associated proteins on COVID-19 susceptibility and help guide drug-repurposing strategies. Keywords: COVID-19, Human genetic variation, SARS-CoV-2: human protein interaction, Protein structure complex, Functional family, CATH database, Protein binding affinity prediction, Immunity Subject terms: Computational biology and bioinformatics, Protein structure predictions Introduction The COVID-19 pandemic has caused a major global health and socioeconomic burden since 2020. Many countries are still experiencing an intermittent rise in the number of infections due to the emergence of new Variants of Concern (VOCs) of SARS-CoV-2 and their sub-variants^[44]1. Although vaccines are now available, re-infection is common^[45]2. Various factors including ethnicity, age and clinical conditions have been proposed to be associated with an increased risk of infection^[46]3–[47]10. In addition, increasing reports indicate that human genetic variation is a contributing factor for increased susceptibility and disease severity^[48]11–[49]13. Potential drug targets include human host proteins which interact with SARS-CoV-2^[50]14,[51]15. In 2020, Krogan group identified a total of 332 human proteins that interact with SARS-CoV-2 proteins, using affinity purification followed by mass spectrometry (AP-MS)^[52]15. Subsequently, the presence of additional human proteins interactors of SARS-CoV-2 was also revealed by other studies based on techniques such as yeast two-hybrid assay, anti-tag coimmunoprecipitation, tandem affinity purification, pull down, structure-based studies (X-ray, NMR), etc.^[53]14,[54]16–[55]19 and are made available via dedicated protein-interaction resources such as IntAct ([56]https://www.ebi.ac.uk/intact/home)^[57]20 and BIOGRID^[58]21. These studies indicate that interactor proteins in humans participate in a wide range of biological processes/pathways, including innate and adaptive immune pathways, lipid metabolism, cell adhesion, mRNA processing, among others^[59]14,[60]15,[61]19,[62]22. Innate immune suppression is known as one of the key characteristics of infections by SARS-CoV-2 and its VOCs. SARS-CoV-2 VOCs (namely Alpha, Beta, Gamma, Delta, and Omicron) are reported to exhibit increased interferon resistance as compared to the wild-type suggesting evasion of innate immunity is a driving force for SARS-CoV-2 evolution^[63]23,[64]24. Furthermore, inborn variation in immunity-associated genes is reported to trigger susceptibility to COVID-19^[65]25,[66]26. For example, rare variants in the Toll-like receptor 7 gene have been associated with increased severity and susceptibility of the COVID-19^[67]12. Likewise, various studies including those by the GeNOMICC-ISARIC consortium suggests association of both common and rare variants with increased severity of COVID-19^[68]27–[69]32. Though spike-ACE2 binding is the key entry mechanism used by SARS-CoV-2 for cell entry and infection, recent experimental studies indicate that SARS-CoV-2 also uses innate immune suppression as a strategy to speed-up entry and replication in the host cell^[70]33,[71]34. Interactions such as SARS-CoV-2:ORF9b-human:TOMM70 and SARS-CoV-2:NSP1-human:NUP98 have been associated with innate immune evasion^[72]14,[73]34,[74]35. Further experimental studies have suggested involvement of other human protein interactors of SARS-CoV-2 associated with the immunity-associated pathways (e.g., IFIH1, ISG15, IFIT2)^[75]33,[76]34,[77]36. In the case of SARS-CoV-2, spike-ACE2 is the most studied protein complex, where the impact of emerging variants in spike as well as natural human population variants in ACE2 have been computationally predicted by the Barton group and others^[78]37–[79]40. Some of these predicted variants in ACE2 and spike proteins, have also been validated experimentally^[80]37,[81]40. In this study, we focus on the impact of variants in immune-associated proteins and novel spike receptors (such as AXL and Kremen1) in humans on their binding to SARS-CoV-2 proteins. Such analyses could explain novel mechanisms by which SARS-CoV-2 proteins interfere with natural immune pathways and disrupt the system in humans. Computationally predicted (docking-based) complexes for a subset of interactions identified from the Krogan study, are made available via several resources^[82]41,[83]42. The Beltrao group has designed a resource called Mutfunc, which has predicted impacts of all possible single-amino acid substitutions in SARS-CoV-2 proteins, using known 3D structure data ([84]http://sars.mutfunc.com/home^[85]43). Similarly, the Ascher group developed the COVID-3D resource, to analyse > 11,000 SARS-CoV-2 variants and predicted the impact of missense variants using spike-ACE2 complex^[86]39. More recently, a powerful AlphaFold2-based protein structure prediction method has been developed for modelling protein–protein complexes which facilitates modelling of interactions between SARS-CoV-2 and human proteins which have yet to be experimentally characterized, and with improved accuracy compared to other approaches^[87]44–[88]48. In this study, we analysed the impact of missense coding variants in human and viral proteins occurring at protein–protein interfaces, using a curated dataset of 19 immune-associated SARS-CoV-2:human protein 3D-complexes, obtained from the Protein Data Bank^[89]49 and models built using AlphaFold2-multimer^[90]44,[91]46. For the human proteins, we obtained population variants from various databases including gnomAD (^[92]50 [93]https://gnomad.broadinstitute.org) and GenomeAsia 100K^[94]51. For the viral proteins, we obtained mutation data from ViralZone ([95]https://viralzone.expasy.org/), CoV-Glue ([96]https://cov-glue.cvr.gla.ac.uk/) and COG-UK ([97]https://sars2.cvr.gla.ac.uk/cog-uk/)^[98]52–[99]54. The impact of variants on binding affinity of the complexes was computationally predicted using a state-of-the-art program (mCSM-PPI2)^[100]55. The structural and functional impact of the predicted affinity-enhancing variants was analysed in the context of proximity to known functional sites such as protein–protein interfaces, ligand or substrate -binding sites and predicted sites identified using conserved positions in Functional Families in the CATH database (i.e., CATH-FunFams)^[101]56–[102]58. CATH-FunFams represent functionally coherent groups i.e., members of a CATH-FunFam have high structural and functional similarity^[103]56. Finally, the human proteins implicated in enhanced SARS-CoV-2 binding are mapped onto protein networks to understand biological pathways/processes associated with the network modules. We then studied whether these proteins are associated with CATH-FunFams that are enriched in small molecules or drugs from ChEMBL^[104]59,[105]60. In summary, we analysed the impact of missense coding variants occurring at protein–protein interfaces in a total of nineteen 3D complexes of human proteins and SARS-CoV-2 interactors. A total of 26 affinity-enhancing variants from 13 human proteins (namely TOMM70, IFIH1, IFIT2, ISG15, RPS3, PALS1, NUP98, AXL, ARF6, KREMEN1, TRIMM, TRIM25 and ACE2) were predicted to enhance binding affinity to their interacting proteins in SARS-CoV-2. Our study sheds light on affinity-enhancing variants in immunity-associated proteins; their frequencies in gnomAD populations; their impact on protein structure and function and the populations more likely to be susceptible to COVID-19 infection. We provide computational evidence that the predicted affinity-enhancing variants in human proteins could promote binding to SARS-CoV-2 proteins, instead of their natural protein partners or substrates in immune pathways, thereby hampering the normal antiviral activity and leading to increased susceptibility. Protein Functional families associated with three proteins (IFIH1, AXL and ARF6) are associated with small molecule inhibitors and their potential applications in drug-repurposing in discussed. Materials and methods Compilation of interactors associated with SARS-CoV-2 immunity The dataset of human proteins interacting with SARS-CoV-2 proteins, was compiled using the COVID-19 UniProtKB resource (^[106]61; [107]https://covid-19.uniprot.org/uniprotkb?query=*) and IntAct database (^[108]20; [109]https://www.ebi.ac.uk/intact/home). IntAct provides a COVID-19 dataset of SARS-CoV-2: human protein interactors which is based on interactions reported from experimental studies. For every protein–protein interaction, IntAct assigns an MIscore (ranges from 0 to 1) based on (i) the type of the experimental detection method (ii), the number of associated publications and (iii) the interaction types (such as direct association, physical association). The IntAct recommended MIscore threshold of 0.45 was used to exclude low-confidence interactions^[110]62. Thus, the dataset of a total of 536 high-confidence (MI-score ≥ 0.45) interactions are considered for subsequent analyses. We further filtered immunity-associated human proteins by mapping specific GO terms associated with immunity using UniProt (i.e., GO:0002250 GO:0002218, GO:0002376, GO:0045087, GO:0045089, GO:0060337, GO:0050776 and GO:0006955)^[111]61,[112]63; and by mapping the UniProt IDs to InnateDB database ([113]http://innatedb.sahmri.com/^[114]64. We also curated available literature-based evidence specifying a COVID-19-associated immunological role associated with interacting pairs of proteins in dataset. Functional families in the CATH database and conserved functional sites The CATH database provides a hierarchical structural classification of protein domains into Class (C), Architecture (A), Topology (T) and Homologous Superfamily (H). In CATH, protein domains are classified into superfamilies where there is strong evidence of an evolutionary relationship via structure and sequence similarity^[115]65,[116]66. Within each superfamily, sequences are sub-classified using an agglomerative clustering protocol followed by an entropy-based segregation of functionally coherent subgroups known as Functional Families (CATH-FunFams)^[117]56. The conserved sites obtained from CATH-FunFams have been shown to be enriched in known protein functional sites^[118]56,[119]57. For the dataset of shortlisted human proteins and their SARS-CoV-2 interactors, we identified CATH-FunFams, and subsequently conserved sites as follows: Identification of CATH-FunFams We scanned the sequences of human and their interacting SARS-CoV-2 proteins against HMMs of FunFams in CATH v4.3 ([120]https://www.cathdb.info/), using HMMsearch (E-value < 1e^−3)^[121]67. We then processed the output of HMMsearch using cath-resolve-hits, an in-house tool built to obtain the best non-overlapping set of domain matches^[122]67,[123]68. Identification of conserved sites using CATH-FunFams Using the matching CATH-FunFams, we identified conserved sites using Scorecons, an entropy-based method^[124]69. The multiple sequence alignment (MSA) program, namely MAFFT is used to construct an MSA from seed sequences within a FunFam^[125]68. The Scorecons program is then applied to each MSA to determine an overall measure of sequence diversity called Diversity of Positions score (DOPs). DOPs captures the amount of diversity in an MSA by considering all the different conservation scores, and their frequencies, and provides a value from 0 (i.e., zero diversity) and 100 (i.e., high diversity). Only MSAs with a DOPs score ≥ 70 were considered for further analyses. The Scorecons program also provides the degree of conservation of each position in the MSA. Thus, for each column in a CATH-FunFam based MSA, the Scorecons program provides a conservation score ranging from 0 (i.e., not conserved) to 1 (i.e., completely conserved). The sites belonging to alignment positions with Scorecons-based score ≥ 0.90 are used for analyses and are referred to as Scorecons90 in this manuscript. In addition to Scorecons method by Valdar, we used other conservation scoring methods such as Shenkin^[126]70 and Thompson^[127]71 (available from Jalview^[128]72 ([129]https://www.jalview.org/) for detecting conservation scores. We used the consensus among the two methods for predicting conserved sites (score ≥ 0.90). Compilation of missense coding variants in human and SARS-CoV-2 proteins Human protein variants For the human genes, missense coding variants from canonical transcripts were obtained from the Genome Aggregation Database (gnomAD, v2.1.1; the recommended version for coding region analyses) ([130]https://gnomad.broadinstitute.org/)^[131]50. We compiled ancestry (i.e., ethnic population) information available from gnomAD, using VarSite^[132]73. GnomAD provides ancestry for the following populations: African/African American (afr), American Admixed/Latino (amr), Amish (ami), Ashkenazi Jewish (asj), East Asian (eas), South Asian (sas), Finnish (fin) and Non-Finnish European (nfe). If individuals did not unambiguously cluster with any of these populations in a principal component analysis (PCA), gnomAD classifies them as "other" (oth). GnomAD v2 also provides sub-continental information for the East Asian cohort (Japanese, Koreans) and Non-Finnish European (Bulgarian, Estonian, Swedish, North-Western European, Southern European) populations. SARS-CoV-2 protein variants For each of the associated interactor proteins in SARS-CoV-2, a non-redundant set of mutations in strains of SARS-CoV-2 (including VOCs and Variants of Interest) was compiled from resources such as ViralZone^[133]54, COG-UK (^[134]52 [135]https://sars2.cvr.gla.ac.uk/cog-uk/), and CoV-Glue ([136]https://cov-glue-viz.cvr.gla.ac.uk/)^[137]53. Functional site data from CATH-FunVar (Functional Variation) protocol All gnomAD missense variants in the interactor human genes were processed using a modified version of the in-house FunVar protocol to identify variants occurring near known and predicted functional sites. Predicted sites comprise CATH-FunFam-based conserved residues, i.e., highly conserved residues identified by the Scorecons program (i.e., Scorecons90 sites, as described above). Known functional sites were compiled from existing resources and include: known ligand and nucleic acid binding sites from BioLip^[138]74; protein–protein interface (PPI) residues from PDBSum^[139]75; catalytic sites from M-CSA and VarMap^[140]76,[141]77 and annotated functional sites in UniProt resource. Spatial proximity of each gnomAD variant to these sites was found by mapping each variant to a representative domain from the corresponding FunFam in CATH v4.3. Functional sites were similarly mapped to FunFam representative domains, allowing detection of variants occurring on or near (within 5 Å) of any of the functional sites. For each variant, the Grantham score ([142]https://gist.github.com/danielecook/501f03650bca6a3db31ff3af2d413 d2a) was calculated to identify variants having a significant change in physico-chemical properties (such as volume, polarity), as compared to that of wild-type residues. Finally, each variant was assigned a simple functional impact score (from 1 to 5) by counting each of the impacts—scoring 1 for each of: high Grantham score; variant is a catalytic site; variant lies on a known functional site; variant near (5 Å) a known site; variant is on a conserved predicted (Scorecons90) site. Additionally, CATH-FunVar reports impact scores from CADD^[143]78 and SIFT^[144]79. Three-dimensional (3D) structures of complexes 3D structures of complexes are available for 10 interactions as follows—human:TOMM70-SARS-CoV-2:ORF9b [PDB ID: 7KDT], human:ISG15-SARS-CoV-2:PLpro [7RBS], human:RPS2-SARS-CoV-2:NSP1 [6ZMT], hRPS3-SARS-CoV-2:NSP1 [6ZMT], human:Ubiquitin-SARS-CoV-2:Plpro [7RBR], human:APOA1-SARS-CoV-2:ORF3a, human:PALS1-SARS-CoV-2:E, human:NUP98-SARS-CoV-2:ORF6 [7VPH] and humanRAE1-SARS-CoV-2:ORF6 [7VPH] and human:ACE2-SARS-CoV-2:spike [wild-type (6M0J, 7A95); Alpha (7EDJ), Beta (7V7Z), Gamma (7V83), Delta (7V89), Omicron (7T9K), BA.1 (7XO6), BA.2 (7XB0, 7XO8), BA.3 (7XB1)]. For the remaining interactions with no available structures of complexes in the PDB, we predicted models using AlphaFold2-ptm and AlphaFold2-multimer(v1)^[145]44,[146]46, as described below. Modelling complexes using AlphaFold2-ptm and AlphaFold2-multimer Prior to modelling protein–protein complexes, we excluded some interactions from the modelling procedure due to the following reasons—we excluded proteins of very short lengths such as ORF3b (22aa residues long) and proteins for which high-quality models are not built by AlphaFold2 ([147]https://alphafold.ebi.ac.uk/). We modelled the remaining complexes using the AlphaFold2-ptm and alphafold-multimer(v1) protocols, which were made available in March, 2022 (^[148]44,[149]46,[150]80; [151]https://github.com/sokrypton/ColabFold). We built models using both the AlphaFold2_ptm and AlphaFold2-multimer(v1) methods and then selected a model from one of these methods, whichever had the best interface quality (see “[152]Results”, Table [153]1). Table 1. Summary of human: SARS-CoV-2 protein complexes used for the study. SARS-CoV-2 protein Human protein interactor Source of Complex Interface pLDDT score Interface PAE score PIZSA Interface stability Z-score (≥ 1.5 = stable) PRODIGY ΔG (kcal/mol) ORF9b TOMM70 PDB [ID:7KDT] n/a n/a 2.507 − 17.4 NSP1 RPS2 PDB [ID: 6ZMT] n/a n/a 2.283 − 4.6 NSP1 RPS3 PDB [ID: 6ZMT] n/a n/a 2.805 − 4.5 Spike-RBD ACE2 PDB [ID:6M0J] n/a n/a 2.582 − 11.9 ORF6 NUP98 PDB [ID: 7VPH] n/a n/a 2.780 − 6.1 ORF6 RAE1 PDB [ID: 7VPH] n/a n/a 2.161 − 8.8 ORF3a APOA1 PDB [ID: 7KJR] n/a n/a 2.569 − 15.6 E PASL1 PDB[ID:7M4R] n/a n/a 1.685 − 6.6 PLPro ISG15 PDB[ID:7RBS] n/a n/a 2.046 − 13.4 PLPro UBB PDB[ID:7RBR] n/a n/a 2.424 − 10.4 PLPro IFIT2 AlphaFold2-ptm 72.82 9.86 2.460 − 11.3 PLPro IFIH1 AlphaFold2-multimer(v1) 83.00 10.00 2.375 − 10.1 N TRIM25 AlphaFold2-ptm 93.15 2.87 2.24 − 9.1 NSP7 SCRB1 AlphaFold2-multimer(v1) 83.87 4.27 2.144 − 8.0 Spike-RBD KREMEN1 AlphaFold2-ptm 81.02 4.74 2.148 − 10.2 NSP15 ARF6 AlphaFold2-ptm 82.61 7.20 2.063 − 7.5 NSP14 TRIMM AlphaFold2-ptm 81.24 7.58 1.974 − 8.8 Spike-NTD AXL AlphaFold2-ptm 75.31 8.55 2.321 − 10.6 ORF7b UN93B AlphaFold2-ptm 72.77 9.45 2.488 − 5.1 [154]Open in a new tab The table lists 3D complexes used in this study. We outline the source, quality metrics such as Interface pLDDT (≥ 70), Interface PAE (≤ 10), PIZSA Interface stability Z-score (≥ 1.5 indicates stable interface) and PRODIGY binding energy (ΔG ≤ − 4.3 kcal/Mol, cutoff chosen based on experimental binding energy values reported in the PRODIGY^[155]82. The literature supporting biochemical evidence of protein interaction is also cited in Supplementary file [156]2. For modelled complexes, we first built models using the AlphaFold2 (ptm method) and AlphaFold2-multimer(v1) methods and then selected the model from one of these methods which had the best interface quality. High-confidence models were chosen where models have overall pLDDT (predicted local difference distance test) ≥ 70 as well as pTM-Score (predicted TM-score) ≥ 70^[157]44,[158]46. We further filtered complexes on the basis of the interface quality i.e., interface-pLDDT (i.e., those with < 70 were excluded) and interface-PAE (predicted alignment error > 10 were excluded) and by manually inspecting the domain interface regions (i.e., we excluded models where we observed erroneous overlapping/entangled interface). We performed additional quality checks such as verifying the interface stability score using the PIZSA method (which calculates protein interaction Z-Score; ≥ 1.5 indicates stable interface)^[159]81 and predicted binding affinity of the complexes by the PRODIGY (PROtein binDIng enerGY prediction) method^[160]82. Using the 3D-structure of the protein–protein complex as an input, the PRODIGY server ([161]https://wenmr.science.uu.nl/prodigy/) predicts the binding affinity of the complex and provides the predicted value of the binding free energy (ΔG) in kcal/mol. PRODIGY is benchmarked using an experimental dataset comprising 144 non-redundant protein–protein complexes with known 3D structures (of both bound and unbound). The resultant high-confidence models predicted in this study along with their quality metrics are given in “[162]Results” section, Table [163]1. Extraction of interface residues using complexes After the selection of the complexes, we extracted interface residues as follows: Directly-contacting (DC) residues For the PDB complexes, we extracted directly-contacting (DC) interface residues made available by PDBSum^[164]75. PDBSum calculates the protein–protein interfaces using the NACCESS program ([165]http://www.bioinf.manchester.ac.uk/naccess/). For the modelled complexes, we extracted interface residues by selecting residues from interacting chains with heavy atom distances ≤ 4 Å)^[166]83,[167]84. Human protein missense variants in gnomAD that occur in DC interface residues are referred to as ‘DC-variant’ residues. Secondary shell residues Residues that occur within 5 Å from the DC interface are considered to be residues in the secondary shell and variants at these residue positions are likely to influence binding^[168]38,[169]85 and are referred to as ‘DCSS-variants’. Predicting the impact of variants in human and viral proteins on the binding affinity of the complexes For the filtered dataset of complexes (experimental and predicted), we applied the mCSM-PPI2 program^[170]55 to identify human and viral missense variants which could significantly impact binding-affinity of interacting proteins. We analysed the impact of missense variants reported in gnomAD that occur at directly contacting interface (DC-variant) and secondary shell (DCSS-variant) residues. The mCSM-PPI2 program, developed by the Ascher lab, was shown to be the top-performing method when compared to 26 other methods in CAPRI (round 26) blind tests^[171]55. mCSM-PPI2 is based on a graph-based structural signature framework with evolutionary information, inter-residue non-covalent interaction networks analysis plus computed energetic terms, providing an optimised overall prediction performance. It was used to predict change in binding affinity (i.e., mCSM-PPI2 ΔΔG^Affinity in kcal/mol) for each mutation. A positive mCSM-PPI2 ΔΔG^Affinity score (> 0 kcal/mol) indicates that the mutation is stabilising the interaction whereas a negative ΔΔG^Affinity score (< 0 kcal/mol) indicates a destabilising effect^[172]55. Where available, we compiled evidence from experimental studies reporting experimental mutagenesis and binding affinity kinetic assays^[173]14,[174]37 to choose our ΔΔG thresholds. For example, in the case of spike-ACE2 complex, the mutations K26R in ACE2 and S477N in the spike protein are reported to increase the binding affinity of the spike-ACE2 complex using kinetic assays^[175]37,[176]38, and these are predicted to have mCSM-PPI2 ΔΔG^Affinity of 0.12 kcal/mol and 0.5 kcal/mol respectively. Likewise, experimental alanine scanning mutagenesis studies in the ORF9b-TOMM70 complex indicates that a E477A mutation in TOMM70 and a S53A mutation in ORF9b significantly reduced the binding affinity of the interaction^[177]14, and these are predicted to have a mCSM-PPI2 ΔΔG^Affinity of < − 0.5 kcal/mol. Therefore, we used the confidence cut-offs of ΔΔG^Affinity ≤ − 0.5 kcal/mol (for destabilising; affinity-reducing) and ≥ 0.5 kcal/mol (for stabilising; affinity-enhancing), to analyse the structural impact of gnomAD variants in this study, while we also provide a catalogue of other mutations predicted to have scores ranging from 0 to 0.490 kcal/mol (see “[178]Results” section). Through in silico saturation mutagenesis, we further confirmed that the mCSM-PPI2 method does not show bias towards predicting positive ΔΔG^Affinity scores and corresponds well to observed changes in amino acid properties for mutated residues. To evaluate the reliability (reproducibility) of the binding affinity values predicted by mCSM-PPI2, we computed 95% confidence interval for affinity-enhancing variants (ΔΔG^Affinity ≥ 0.5 kcal/mol). We generated 10 different versions for each of the SARS-CoV-2:Human protein complex structure, using GalaxyrefineComplex software^[179]86, which is specifically designed for structure refinement of protein–protein complexes. Its refinement protocol involves both an initial local energy minimization and a 1.2-ps molecular dynamic (MD) relaxation with a 4-fs time step. Thus, incorporation of MD relaxation step allows some physical space for atomic fluctuations, facilitating further conformational sampling. Particularly, relaxation of the input complex structure is driven by side-chain repacking of interfacial residues, which ensures overall conformational changes. Side-chain repacking and MD relaxation is repeated 22 times (please refer to Heo et al.^[180]86, for details). Analysis of predicted affinity-enhancing variants As the affinity-enhancing variants could be associated with increased risk of susceptibility/infection, we closely examined their 3D structural impact on molecular interactions and characterised these variants in the context of known and predicted functional sites (see FunVar section above) and using UniProt site annotations. Variants are mapped on 3D-complexes and visualized using UCSF Chimera^[181]83. We further analysed their associated population allele frequencies in gnomAD as well as other databases such as Indigenomes^[182]87, SweGen^[183]88, GenomeAsia 100K^[184]51, jMORP^[185]89 and the NIH-funded research hub called All of Us (^[186]90 [187]https://databrowser.researchallofus.org/). Finally, for affinity-enhancing variants we analysed pathogenicity scores by mutpred2 (score ≥ 0.50), CADD (score ≥ 20) and SIFT (deleteriousness score ranges from 0 to 0.05). MutPred2 is a machine learning-based method, which uses structural and functional features (e.g. secondary structure, allostery, binding site data, etc.) to provide pathogenicity score. SIFT relies on sequence conservation information and considers position and the type of amino acid change. CADD combines multiple sources of information (genomic features, biochemical activity, and scores from other predictors and applies a machine learning based scoring system. These methods were chosen as the study by Pejaver group showed that Mutpred2 performs better than other predictors while CADD and SIFT were the second and third best performing tools^[188]91. We used Dynamut2^[189]92, to analyse impact of affinity-enhancing variants on protein stability in human proteins ([190]https://biosig.lab.uq.edu.au/dynamut2/). Network mapping and enrichment analysis Identification of network modules The human proteins containing affinity-enhancing variants with putative impact on SARS-CoV-2 binding, were mapped to ConsensusPathDB (CPDB) and STRING database (STRINGdb) protein–protein interaction networks^[191]93,[192]94. Interactions from STRINGdb were filtered to include only those with confidence values ≥ 0.2 to optimize signal-to-noise ratio. A module detection algorithm (M1) was applied to the network using the MOdularising NEtwork Toolbox that adopts a multiresolution approach to combine optimization algorithms to improve modularity^[193]95. Enrichment analyses Pathway enrichment analysis was performed using g:profiler ([194]https://biit.cs.ut.ee/gprofiler/gost^[195]96), a public webserver using Ensembl and Ensembl Genomes to identify significantly enriched terms in Gene Ontology (GO) biological processes, KEGG and Reactome databases^[196]97–[197]99. Pathways were ranked by significance (p < 0.01) and the most significant term per database for the protein or each module was taken as the representative pathway. Where necessary, Ensembl identifiers with the most GO annotations were selected according to the default g:profiler function. Values and pathways for STRINGdb and CPDB modules were retained. Identification of druggable functional families in CATH database The human proteins with affinity-enhancing variants, were mapped to the CATH functional families (CATH-FunFams) that are linked with small molecule information from ChEMBL^[198]59,[199]60. For the associated human proteins, the druggability score of the protein–protein interface region was analysed using the CavityPlus and CASTp programs^[200]100,[201]101. Where the protein–protein complex is significantly predicted to contain druggable cavity, we performed docking using AutoDockTools^[202]102.The ligand molecule was extracted from the PubChem database^[203]103. The predicted druggable cavity site by CavityPlus was used to dock the drug molecule. The methodology used in the study is summarised in Fig. [204]1. Figure 1. [205]Figure 1 [206]Open in a new tab Flow-chart of the methodology used in this study. The dataset of SARS-CoV-2: human protein interactors is analysed using variants from gnomAD (in case of human proteins). The in-house CATH-FunVar (Functional Variation) pipeline was used to extract all annotations on known and predicted functional sites. For the missense coding variants from gnomAD, the impact of binding affinity of the complex was analysed using mCSM-PPI2 program. Affinity-enhancing variants were analysed using functional sites, and population data from gnomAD. The human proteins predicted to contain affinity-enhancing variants are used for pathway enrichment and identification of CATH functional families (CATH-FunFams) linked with small molecules/drugs from ChEMBL. Results Dataset of SARS-CoV-2: human protein interactors, 3D-complexes and missense variants As described in Methods, out of the 536 high-confidence (MIscore ≥ 0.45) protein–protein interactions, we curated 94 human proteins involved in SARS-CoV-2 infection and immunity by identifying those with immunity-associated GO terms. For the curated dataset of 94 human proteins, we compiled information from multiples sources: the interactor protein in SARS-CoV-2, IntAct MIscore^[207]20, UniProt accession IDs^[208]61, immune-associated GO terms, literature evidence linked to the interaction in IntAct, literature-based evidence for COVID-19 immune association, CRISPR-association (obtained from BIOGRID-ORCS; [209]https://orcs.thebiogrid.org/^[210]21) and gene-expression (obtained from SARS-COVIDB; [211]https://sarscovidb.org/^[212]104). This information is provided in the Supplementary files [213]1 and [214]2. In total, the 94 human proteins were associated with 110 SARS-CoV-2:human interactions (Supplementary file [215]2). Experimental 3D structures for 10 interactions were available in the PDB, with a further 9 high-quality models predicted using AlphaFold2-multimer/ptm method, as summarised in Table [216]1. Thus, a total of 19 protein 3D structural complexes were used for subsequent analyses. For the human proteins from this curated dataset of 19 complexes (Table [217]1), we analysed a total of 468 missense variants from gnomAD; that occur at directly contacting (DC) residues in the interface of the complexes (DC-variants, 131 in total) as well as those that occur within 5 Å from the DC residues i.e., secondary shell (DCSS-variants, 337 in total). A total of 26 variants from 13 human proteins, are predicted to significantly enhance binding affinity to their SARS-CoV-2 protein partners by the mCSM-PPI2 program (ΔΔG^Affinity ≥ 0.5 kcal/mol), as given in Table [218]2. Table 2. 13 human proteins with variants predicted to have significant impact on binding affinity to SARS-CoV-2 interactor proteins (by mCSM-PPI2 program^[219]55). Name of the complex: human (SARS-CoV-2 protein) Affinity-enhancing variants in human proteins (ΔΔG^Affinity ≥ 0.5 kcal/mol) (stabilizing mutations) Affinity-reducing variants in human proteins (ΔΔG^Affinity ≤ − 0.5 kcal/mol) (de-stabilizing mutations) Other affinity-enhancing variants in human proteins (0 < ΔΔG^Affinity < 0.49 kcal/mol) hTOMM70 (ORF9B) V556L, K576R, A591T, V514I, A483T I412T, Q477G, H515Q, L584M, I 554T, D589N, L111P, F155L, G530S, D397N A582V hISG15 (PLPro) L121Q L60V, G128V, L28M, F122L, P130T, E127G, M23V, N151D, L85P, M23T, R155W,N151K, R155Q S21N, Q55R, Q118R,V148L,A30V,D56Y, Q31H hIFIH1 (PLpro) Y13N, S16L – L120F, F107S, L121F, M95 hIFIT2 (PLpro) L373F, K221E, A319S, A319T, L373F, Y383F N300K, R376Q, R406S, L375V, R376P, Q384K, L323P, G398V, F380L Q381H, R292K, P190Q, A391V hRPS3 (NSP1) V164I, I99F R106C, L86H, Y34H, V153A E81Q, A52T, I223V, A114P, hNUP98-hRAE1(ORF6) NUP98: T190S – NUP98: K185R, I162V RAE1: A275V, S311L hARF6 (NSP15) L166F I29V, L33V, Y78C N56H, D68H, I96F hTRIM25 (N) A466T V470M, F615L C475Y, P490L, I457V, A471S, K458E, C506S hTRIMM (NSP14) I105F – Q160R, H223Y, K146R, A144E, I13F hAXL (spike-NTD) V38M L54Il, Q57K A79E hKremen1 (spike-RBD) Y66H, V189I Y98C, C200Y, C86Y, E187K, V93M, E187K, V93M, H196L, Y167C, Y107H, G166D, W108R – hACE2 (spike-RBD) G326E* G352V, D355N, L585P A501T*, K26R*, S19P*, K26E* hPALS1(E) L321F – – [220]Open in a new tab The first column lists the name of complexes in the following format—human protein (SARS-CoV-2 interactor protein). For every complex, the variants in human proteins that are predicted to enhance binding affinity with ΔΔG^Affinity score ≥ 0.5 kcal/mol are indicated in bold. The variants that are predicted to increase binding affinity but with lower scores ranging from 0 < ΔΔG < 0.49 kcal/mol are provided in the last column. *Indicates that the ACE2 variants reported in our study are also noted to enhance binding affinity in previous studies^[221]38,[222]40. To further validate the reproducibility of the binding affinity scores by the mCSM-PPI2, we calculated 95% confidence interval (CI) values for all 26 variants. For all of the 26 variants we noted that either both or the upper CI value is above ≥ 0.5 kcal/mol, thus confirming the reproducibility of the results (please see details in Supplementary Table [223]1). For the affinity-enhancing variants (ΔΔG^Affinity scores ≥ 0.5 kcal/mol) from the 13 human proteins (Table [224]2), we analysed their impact on protein structure and function and their allele frequency distribution related to distinct populations in the gnomAD database and other databases including GenomeAsia 100K (see methods). The structural impact of 26 affinity-enhancing human variants on the local stability of human protein structure was also analysed using Dynamut2 program. The majority of the variants are observed to cause a mild destabilizing impact on the stability on the protein, while only two variants were predicted cause stronger impacts on stability (with < − 2 kcal/mol) (see Supplementary Table [225]2). However, neither of these are associated with diseases in ClinVar. According to gnomAD most of the predicted affinity-enhancing variants are rare in human populations, while a few affinity-enhancing variants in IFIH1 and ISG15 are observed to be common (see figure in Supplementary file [226]3). For the 13 human proteins, affinity-enhancing variants were analysed in the context of proximity to known and predicted functional sites (Supplementary file [227]4). The potential mechanisms associated with suppression of the immune system implicated by these variants are discussed in the following section, using some complexes to illustrate our approach. Structure–function analyses of predicted affinity-enhancing variants Impact of human TOMM70 variants on SARS-CoV-2: ORF9b binding TOMM70 protein is one of the major human import receptors in the translocase of the outer membrane (TOM) complex. It recognizes and mediates the translocation of mitochondrial preproteins from the cytosol into the mitochondria in a chaperone(HSP90)-dependent manner^[228]105. It is involved in activation of the innate immune system [GO:0002218] and interacts with ORF9b, which is a key viral innate immune antagonist in SARS-CoV-2^[229]14,[230]105,[231]106. A study by the Krogan group revealed that TOMM70 is a high-confidence interactor of SARS-CoV-2 ORF9b indicating that binding of ORF9b to the C-terminal domain of TOMM70 is associated with suppression of the innate immune response^[232]33. We analysed the impact of missense variants using the experimental structure of this complex (PDB ID: 7KDT^[233]14). Affinity-enhancing variants: structural-function impact and population distribution Three DC (directly contacting)-variants in TOMM70 (V556L, K576R, A591T) and two DCSS-variants (V514I, A483T) were predicted to significantly increase affinity (ΔΔG^Affinity ≥ 0.5 kcal/mol) (Table [234]3 and Fig. [235]2). Table 3. Details of variants in human:TOMM70 that impact binding to SARS-CoV-2:ORF9b. Variant ID rsID Missense variant in gnomAD mCSM-PPI2-prediction (kcal/mol) Functional site annotation GnomAD populations [and allele frequency] 3-100086895-C-T rs1306989616 p.Val556Leu 0.905 DC_PPI African/AA [0.00007] 3-100084508-T-C rs1370508158 p.Lys576Arg 0.618 DC_ PPI South Asian [0.00003] 3-100084464-C-T rs756063544 p.Ala591Thr 0.599 DC_PPI Latino/AA [0.00008] AFR [0.00004] Other [0.0001] 3-100087892-C-T rs957967770 p.Val514Ile 0.562 DCSS_ PPI Latino/AA [0.0001] European (non-Finnish [0.00003] 3-100091455-C-T rs770985289 p.Ala483Thr 0.499 DCSS_ PPI European (non-Finnish) [0.00002] Latino/AA [0.00002] 3-100092482-A-G rs1266227924 p.Ile412Thr − 1.054 DC_ PPI African/AA [0.00008] 3-100091472-T-C rs750026792 p.Glu477Gly − 1.014 DC_ PPI South Asian [0.000032] 3-100105815-A-G rs765666545 p.Leu111Pro − 1.056 near_ligsite European (non-Finnish) [0.000008] 3-100084470-C-T rs753612125 p.Asp589Asn − 0.534 DCSS_PPI African/AA[0.0000615]; South Asian [0.00003] 3-100086973-C-T rs775872768 p.Gly530Ser − 0.545 – South Asian [0.00003] 3-100093900-C-T rs762230068 p.Asp397Asn − 0.536 – African/AA [0.0001] European (non-Finnish) [0.000008] [236]Open in a new tab For each variant, SNP ID (i.e. rsID), amino acid mutation, mCSM-PPI2 prediction (ΔΔG^Affinity), functional site annotations and population in gnomAD (in which these variants are present) are indicated. Functional sites are abbreviated as follows- DC_PPI: directly contacting interface site, DCSS_PPI, secondary shell residue from the DC interface site, Ligsite: indicates occurrence in a known ligand binding site and near_ligsite indicates sites that are proximal to a known ligand (HSP90) binding site. Figure 2. [237]Figure 2 [238]Open in a new tab The structure of human TOMM70 in complex with SARS-CoV-2 ORF9b [PDB ID: 7KDT]. (A) ORF9b-TOMM70 complex: SARS-CoV-2 ORF9b (blue) interacts with the C-terminal domain of human TOMM70 (tan). ORF9b binds at the substrate-binding pocket in TOMM70. The structural locations of variants (with ΔΔG^Affinity ≥ 0.5 kcal/mol) are shown in red. These include three DC-variants (V556L, K576R, A591T) and two DCSS-variants (V514I, A483T). (B) The structural impact of DC-variants on atomic interactions at the interface, are shown. HP: Hydrophobic (green), H-bond: hydrogen bond (red), VDW: Van-der-Waals (blue), Polar (orange). (C) Effects of affinity-enhancing variants in TOMM70 are shown in detail (figures in 2C source: mCSM-PPI2^[239]55). The variant V556L has the highest predicted change in the binding affinity (ΔΔG^Affinity score of 0.905 kcal/mol). V556 is a directly contacting (DC) residue at the ORF9b:TOMM70 interface. The wild type V556 in TOMM70 forms hydrophobic bonds with A68 and F69 in ORF9b, and a polar interaction with T72. The mutant V556L gains additional hydrophobic bonds with A68 and F69, resulting in increased predicted affinity for SARS-CoV-2: ORF9b. The DCSS-variant at residue position 483 lies close to (i.e., within 5 Å) the phosphorylation site in SARS-CoV-2: ORF9b i.e., S53, which is important for binding of ORF9b to TOMM70. The formation of TOMM70:ORF9b complex is regulated via phosphorylation at S53^[240]107. A483 also interacts with other DC residues in TOMM70. The A483T mutation strengthens this interaction, and this mutation is predicted to be strongly pathogenic by mutpred2 (score = 0.90). Figure [241]2 summarizes impact of all affinity-enhancing variants and their structural impact on atomic interactions at the interface. Most affinity-increasing variants in TOMM70 are observed in the African American and American population, but with rare allele frequency i.e., less than 1% (Table [242]3). Most of the affinity-enhancing variants have an impact on atomic interactions of residues in the interface and are also predicted to have significant impact by various programs including CADD (score > 20), SIFT (predicted as deleterious) and predicted to be pathogenic by mutpred2 (score > 0.5), as shown in Fig. [243]2B. In must be noted that African American and Asian populations are also observed to carry affinity-reducing variants in TOM70 that are predicted to decrease binding affinity to SARS-CoV-2:ORF9b (by mCSM-PPI2), thus likely confer resistance to the COVID-19 infection (Table [244]3). Amongst these, two interface variants (I412T and E477G) are predicted to have strong destabilising impact on binding (< − 1 kcal/mol). Interestingly, the residue E477 is known to be the key residue for ORF9b binding (via binding to Serine 53 in ORF9b [ref.^[245]14]), thus substantiating the potential effect of E477G variant in abolishing the binding. Impact of human IFIH1 variants on SARS-CoV-2: PLPro binding PLpro (papain-like cysteine protease) in SARS-CoV-2 is involved in a wide range of important functions, such as viral polyprotein chain processing, dysregulation of host inflammatory responses, and impairing the type I interferon (IFN-1) antiviral immune responses^[246]108. SARS-CoV-2:PLpro plays a key role in innate immune suppression in humans by interacting with various host substrates such as ISG15, IFIH1 and IFIT2, and others^[247]109,[248]110. Therefore, the SARS-CoV-2:PLpro is a hot spot for designing protein–protein interactor inhibitors^[249]110. IFIH1 (also known as MDA5) is a cytoplasmic innate immune receptor. IFIH1 is a pattern-recognition receptor which binds to viral RNAs and suppresses translation initiation^[250]111. IFIH1-binding to viral RNA is known to induce type I interferon response by triggering activation of antiviral immunological genes including IFN-alpha, IFN-beta and pro-inflammatory cytokines^[251]111–[252]113. SARS-COV-2 employs PLpro to bind and block the activation of an IFIH1-dependent cascade of antiviral responses^[253]112. PLpro is suggested to bind CARD domains of IFIH1^[254]109,[255]112 and hence we modelled a complex of PLpro and IFIH1 using AlphaFold2-multimer (Fig. [256]3). Figure 3. [257]Figure 3 [258]Open in a new tab IFIH1-PLpro complex built using AlphaFold2-multimer. (A) Human IFIH1 is shown in tan and SARS-CoV-2:PLpro is shown in blue. The structural locations of affinity-enhancing variants (ΔΔG^Affinity ≥ 0.5 kcal/mol) are shown in red. These include two DC-variants (S16L, Y13N) (B,C) Effects of affinity-enhancing variants in TOMM70 are shown in detail (figures in 3B and 3C source: mCSM-PPI2 [55]). HP: Hydrophobic (green), H-bond: hydrogen bond (red), VDW: Van-der-Waals (blue), Polar (orange). (D) The structural impact of DC-variants on atomic interactions at the interface, are shown. Affinity-enhancing variants: structural impact and population distribution Two affinity-enhancing variants in IFIH1 are predicted—both Y13N and S16L, are DC residues (Fig. [259]3). According to gnomAD, Y13N is a common variant [allele frequency (AF) > 1%] in East Asians. This is in accordance with GenomeAsia 100K, which indicates Y13N is a common variant in Northeast Asian i.e., in Japanese (AF = 0.01) and Korean populations (AF = 0.003). Upon mutation Y13N, hydrophobic interactions between aromatic rings of Y13 in wild-type IFIH1 and of P245, P246 and Y262 in viral PLpro are replaced by a stronger hydrogen bond between the N13 of the mutant IFIH1 and the P246 of PLpro (Fig. [260]3). The other variant significantly impacting SARS-CoV-2:PLpro binding affinity is S16L, observed to occur at a rare frequency in Europeans (non-Finnish). The side chain oxygen atom on residue S16 of IFIH1 forms two weak polar interactions with Y266 of PLpro whereas the leucine side chain in the S16L IFIH1 mutant interacts more strongly with Y266 (Fig. [261]3). Leucine side chain forms enhanced hydrophobic contacts and polar interaction with neighbouring residues Y262 in PLpro and Y13 in IFIH1 (Fig. [262]3). Impact of human ISG15 variants on SARS-CoV-2: PLPro binding ISG15 (Interferon stimulated gene 15) plays a key role in the innate immune response to viral infection via a process known as ISGylation upon activation by type I interferons or by viral/bacterial infections. ISGylation (ISG15 modification) is a process whereby ISG15 protein covalently binds to other protein substrates^[263]108,[264]114. The ISGylation process acts as an antiviral defence mechanism against SARS-CoV-2 and several other RNA viruses. SARS-CoV-2 PLpro binds to ISG15 and blocks the ISGylation^[265]112,[266]114,[267]115. ISG15 binds to PLpro via the LRGG motif (154–157 residues). Affinity-enhancing variants: structural impact and population distribution Two ISG15 coding variants at DC positions (S21N, L121N) are predicted to enhance the binding affinity of the ISG15-PLpro complex (Table [268]4, Fig. [269]4). Table 4. Details of variants in human:ISG15 that impact binding to SARS-CoV-2:PLpro. rsIDs Mutation Mcsm-PPI2 (kcal/mol) Functional site annotations Populations in gnomAD and [allele frequency] rs748715915 p.Leu121Gln 0.57 DC_PPI Near_Scorecons90 Near_LRGG motif NFE-SWEDISH [0.00003834] rs143888043 p.Ser21Asn 0.401 DC_PPI AFR [0.01648] *also supported by GenomeAsia 100K AMR[0.0008475], NFE [0.00007794], SAS[0.00009810], other[0.0008333] rs1477663018 Leu85Pro − 1.002 DCSS_PPI, Scorecons90 NFE-North-Western [0.00002404] rs1160940574 p.Met23Thr − 1.017 DC_PPI NFE-North-Western [0.00002387] rs761507082 p.Arg155Trp − 1.166 DC_PPI LRGG motif site AMR[0.0004058] ASJ[0.0001011], SAS[0.00003275], rs774241776 p.Asn151Lys − 1.286 DC_PPI Near_ Scorecons90 Other [0.0001647] rs750338976 p.Arg155Gln − 2.061 DC_PPI LRGG motif site NFE[0.00001808] [270]Open in a new tab For each variant, SNP ID, amino acid mutation, mCSM-PPI2 prediction (ΔΔG^Affinity), functional site annotations and enrichment in specific population in gnomAD are indicated. Functional sites are abbreviated as follows- DC_PPI: directly contacting interface site, DCSS_PPI, secondary shell residue, ligsite: known ligand binding site, Near_ Scorecons90: proximity to conserved sites predicted using Scorecons (score ≥ 0.90). Figure 4. [271]Figure 4 [272]Open in a new tab SARS-CoV-2: PLpro-human ISG15 complex (7RBS). (A) Human ISG15 (blue) in complex with SARS-CoV-2: PLpro (tan). The affinity-enhancing variants L121N and S21N are indicated in red. (B) The table summarizes the impact of affinity-enhancing mutations (wild type vs mutant) on atomic interactions at the interface. Atomic interactions associated with L121Q and S21N are shown in (C). HP: Hydrophobic (green), H-bond: hydrogen bond (red), VDW: Van-der-Waals (blue), Polar (orange); Figures in 4(C) are generated using mCSM-PPI2^[273]55. L121Q, one of the direct contact residues in the interface, is predicted to enhance the ISG15: SCoV2 PLpro binding affinity (mCSM-PPI2 ΔΔG^Affinity = 0.57 kcal/mol). This variant occurs within 5 Å from the key LRGG motif (the PLpro recognition site) and forms direct interactions with R153, which is adjacent to this motif, which is therefore likely to have an impact on PLpro-binding. Analyses of proximity to conserved sites detected using Scorecons indicates that the variant lies in the structural neighbourhood (5 Å) of five conserved residues (at positions 117, 120, 122, 123 and 152; with Scorecons90) of which one (W123) is also predicted to be an allosteric site (score: 0.896, predicted using Ohm^[274]116). This substitution is only observed in the Swedish population at rare frequency. The second DC-variant S21N is annotated in ClinVar (ID: 475283, benign) and is associated with Mendelian susceptibility to mycobacterial diseases (also known as Immunodeficiency 38 disease)^[275]117. The variant is associated with severe clinical disease upon infection with weakly virulent mycobacteria (including Mycobacterium bovis and Bacille Calmette-Guerin vaccines)^[276]118,[277]119. The S21N variant is predicted to moderately increase affinity (with the borderline mCSM-PPI2 ΔΔG^Affinity score of 0.401 kcal/mol) and is a common variant (allele frequency > 1%) found in the African population, as supported by multiple population databases including gnomAD [allele frequency (AF): 0.01648], GenomeAsia 100K (allele frequency 0.043269) and All of Us (allele frequency 0.014). In the case of affinity-reducing (protective) variants (mCSM-PPI2 ΔΔG^Affinity ≤ − 0.5 kcal/mol), four variants (R155Q, R155Q, N151D, L85P, M23T) were predicted with significantly reduced binding affinity (ΔΔG^Affinity < − 1.0 kcal/mol). Most of these occur only in Non-Finnish Europeans and one predominantly in the American population (R155W). Two variants (R155Q, R155W) occur within the known PLpro-recognition motif in ISG15 (LRGG motif formed by 154–157 residues)^[278]114. Thus, individuals carrying these mutations may be at lower risk of compromised immunity mediated by ISG15: PLpro binding. Impact of human IFIT2 variants on SARS-CoV-2: PLPro binding IFIT2 (Interferon-induced protein with tetratricopeptide repeats 2) is an RNA-binding protein, and binding of RNA is known to be important for antiviral activity of IFIT2 [127]. Biochemical studies indicate that human IFIT2 binds to SARS-CoV-2:PLpro^[279]109, however an experimental structure of the complex is not available. The modelled complex in our study, indicates SARS-CoV-2: PLpro binds at a channel-like region at the C-terminus of IFIT2, which is a known RNA-binding region (formed by K37, R184, K255, R259, R291, and K410) [128]. Three PLpro-interacting residues in IFIT2 (i.e., R259, K410 and R291) are involved in RNA-binding (depicted in Fig. [280]5). Figure 5. [281]Figure 5 [282]Open in a new tab The model of SARS-CoV-2:PLpro in complex with human:IFIT2, and mapping of affinity-enhancing variants (A) Mapping of the affinity-enhancing variants (red) K221E, A319S/T, L373F, Y383F onto the IFIT2 (cornflower blue)-SARS-CoV-2:PLpro (tan) complex. (B) PLpro interacts with IFIT2 at a site which partially overlaps with a known RNA-binding region (yellow). Affinity-enhancing variants: structural impact and population distribution Four IFIT2 DCSS-variants (L373F, K221E, A319S and A319T) significantly affect the binding affinity (mCSM-PPI2 ΔΔG^Affinity ≥ 0.5 kcal/mol) (see Table [283]5). Three of these- A319S, A319T and L373F are also predicted to be pathogenic by CADD (score ≥ 20) and SIFT. Two DCSS-variants in IFIT2 namely, K221E and A319(S/T) are predominant in South Asian and African/AA populations, respectively. The residues (K221E, A319S and A319T and Y383F) directly interact with residues (R259 and R291) known to be involved in the RNA-binding in IFIT2 (Fig. [284]5B). Thus, mutations in IFIT2 protein which increase binding to SARS-CoV-2:PLpro are likely to hinder the binding of IFIT2 to RNA, and thus the normal antiviral mechanism of IFIT2. Table 5. Details of affinity-enhancing variants in human IFIT2 (SARS:CoV:2-PLpro interacting protein). Variant ID rsID Mutation mCSM-PPI2 (kcal/mol) Functional site annotations Populations in gnomAD [ Allele frequency] 10-91066830-C-T rs761596604 p.Leu373Phe 0.911 DCSS_PPI South Asian [0.00003268] Asian: 0.00004 (All of Us) 10-91066374-A-G rs1305223950 p.Lys221Glu 0.616 DCSS_PPI Near_ligsite Near_Scorecons90 South Asian [0.00003268] Asian: 0.00002 (All of Us) 10-91066668-G-T rs775027680 p.Ala319Ser 0.532 DCSS_PPI Near_ligsite Near_Scorecons90 African/African American [0.00006460] 10-91066668-G-T rs775027680 p.Ala319Thr 0.516 DCSS_PPI Near_ligsite Near_ Scorecons90 African/African American [0.00006460] East Asian: 0.00007130 Latino/Admixed American [0.00007008] Swedish [0.00003832] 10-91066861-A-T rs751422356 p.Tyr383Phe 0.449 Near_ligsite European NFE [0.00001556] [285]Open in a new tab For each variant, SNP ID, amino acid mutation, mCSM-PPI2 prediction, functional site annotations and enrichment in specific population in gnomAD are indicated. Functional sites are abbreviated as follows- DC_PPI: means directly contacting interface site, DCSS_PPI, secondary shell residue, ligsite: indicates known ligand binding site (i.e. RNA binding site in IFIT2) and near_ligsite indicates sites that are proximal to a known ligand binding site. near_Scorecons90: proximity to conserved sites predicted using Scorecons (score > 0.90). Thus, our analyses of 3D complexes, affinity-enhancing variants, and functional sites identifies some affinity enhancing variants that could promote the binding of host immune proteins to SARS-CoV-2 proteins, thereby reducing the binding to their natural protein partners. This could mediate reduced immune responses in certain individuals. Impact of variants in SARS-CoV-2 proteins In addition to human protein variants, we also analysed the impact of viral protein variants, in Spike-RBD, Spike-NTD, PLPro, ORF9b, ORF7b, ORF6, ORF3a, NSP7, NSP15, NSP14, NSP1, N and E (using experimental/predicted complexes listed in Table [286]1). The impact of a total of 212 viral variants occurring at protein–protein interface region was analysed using mCSM-PPI2. The majority of the variants are observed to have ‘moderate’ impact on binding to their human interactors, i.e. predicted binding affinity (ΔΔG^Affinity) values range from 0 to 0.49 kcal/mol for affinity-enhancing variants (see supplementary Table [287]3). However, a few variants in SARS-CoV-2 proteins such as NSP14 (L6074F and N6054I), PLpro (L1774F) and Spike (S477N) are significantly predicted to be affinity-enhancing (mCSM-PPI2 ΔΔG^Affinity ≥ 0.5 kcal/mol). The impact of the variant spike-RBD S477N on enhanced ACE2-binding affinity is recently confirmed by experimental assays^[288]37. Among the five VOC’s Omicron has evolved to contain highest number of mutations in the spike protein. The phylogenetic tree of all VOC’s and VOI’s was built from the Spike protein sequences, to better understand the pattern of emergence of Omicron and Omicron subvariants (Please see phylogenetic tree in Supplementary Fig. [289]1). Interestingly, a comprehensive study by Martin et al. suggest that S477N is common in all Omicron sub-variants and is under positive selection pressure^[290]120. The impact on binding affinity to the ACE2 receptor, thus may provide a selective advantage to increase transmissibility in the host. Pathway enrichment analyses We examined the biological processes and signalling pathways associated with the 13 human proteins containing affinity-enhancing variants (Table [291]2) using three well-established pathway enrichment databases: Gene Ontology (GO), KEGG and Reactome (Table [292]6). The most significantly enriched terms were related to immune functions, including viral life cycle (GO biological process) and Influenza A (KEGG). The top enriched Reactome pathway was SARS-CoV-2 Infection, confirming the association with this panel of proteins identified in earlier analyses. Table 6. Top significant GO, KEGG^[293]98 and Reactome pathways associated with 13 SARS-CoV-2-interacting human proteins (containing predicted affinity-enhancing variants). Term Pathway Adj. p-value REAC:R-HSA-9694516 SARS-CoV-2 infection 2.59 × 10^–8 GO:1903900 Regulation of viral life cycle 3.19 × 10^–5 KEGG:05164 Influenza A 0.0019 [294]Open in a new tab Since many biological processes are governed by functional modules which are highly interconnected sub-networks of protein–protein interactions, identifying functional modules containing the 13 human proteins provides a method for gaining further insights into their biological functions. This approach can also offer insights into specific mechanisms contributing to SARS-CoV-2 pathology. To this end, we used two state-of-the-art modularity detection algorithms and applied them to two well regarded network datasets: STRING, and ConsensusPathDB. We then identified which modules contain any of the 13 proteins and performed pathway enrichment analysis on each of the modules using GO (biological processes), KEGG and Reactome. The enrichment analysis was used to assign to each of the 13 proteins the appropriate biological processes/pathways. For a full list of proteins in each module, see Supplementary files [295]5 and [296]6. We constructed a protein–protein interaction network for all these human proteins and their associated partner viral proteins (please see Supplementary file [297]7). Unsurprisingly, many of the 13 proteins were associated with immune response to viral infection, including interferon-inducible genes IFIT2 and IFIH1, and ISG15 involved in modulating viral replication. Induction by type I interferons (α/β), which activate other immune cells, was also associated with the transmembrane protein TRIM25. Several symptoms, particularly in severe or long COVID-19 cases, are associated with mitochondrial dysfunction, including cytokine storms^[298]121, which is a key pathway associated with TOMM70 associated with mitochondrion organisation. It has also been found to be involved in interferon regulation^[299]122. SARS-CoV-2 viral proteins replicate in the cytoplasm via translation on ribosomes, hence supporting our identification of several pathways involved in cytoplasmic or ribosomal transport, specifically NUP98, RAE1, ribosomal protein RPS3, TOMM70 and TRIM25. Other genes are involved in cell processes that are hijacked by the virus during infection. For example, ARF6 was identified as involved in exocytosis, but is used by the virus to infiltrate and infect the cell^[300]123. Overall, the findings of our functional analysis highlight some key mechanisms involved in the viral response to COVID-19. Functional families associated with small molecule inhibitors For the 13 human proteins containing predicted affinity-enhancing variants, we analysed their associated functional families in CATH (CATH-FunFams) to inspect whether any of the homologous proteins in the FunFam were linked to known small molecule inhibitors. These small molecules, which may include drug-like molecules or approved drugs, were identified using the ChEMBL database (see Table [301]7). Table 7. Human proteins with CATH-FunFams containing small molecule inhibitors. Protein CATH superfamily and functional family (CATH-FunFam) ID Associated entry in ChEMBL Representative 3D-structure IFIH1 1.10.533.10 CATH-FunFam ID 77 CHEMBL4739862 [CHEMBL4594258, phase 2] 7DNI [PDB] AXL 2.60.40.10 CATH-FunFam ID 810 CHEMBL4895 [CHEMBL3301622, approved] CHEMBL4879451 5VXZ [PDB] ARF6 3.40.50.300 CATH-FunFam ID 286 CHEMBL5987 CHEMBL1075274 3LVR [PDB] [302]Open in a new tab Three human proteins (IFIH1, AXL and ARF6) are observed to be linked with small molecule inhibitors from ChEMBL. The associated entries in ChEMBL are indicated. The representative 3D structure for each functional families in CATH is indicated. If the representative is not available in CATH, the representative from PDB is shown. Next, we used CavityPlus to detect druggability of the protein–protein interface formed by these three proteins. CavityPlus provided strong confidence for prediction for druggability of IFIH1: PLpro interface and ARF6:NSP15-interface, and medium confidence for AXL:NTD interface (see supplementary file [303]8). These observations are also supported by the CASTp prediction (see Supplementary file [304]9). These results are indicative of potential applications of their associated inhibitors for designing drugs targeting these protein–protein interactions. For example, IFIH1 is associated with Selgantolimod in ChEMBL (ID: CHEMBL4594258), which is in phase 2 clinical trial. Selgantolimod is known to be a Toll Like Receptor 8 Agonist, which increases immune responses in chronic Hepatitis B patients^[305]124. Analyses using CavityPlus provided a strong prediction score for the presence of two cavities that occur at the IFIH1-PLpro interface (the topmost cavity with score of 4177.0, Supplementary File [306]8). These cavities do not interfere with CARD oligomerization or its protein partner MAVS, which is required for IFIH1- induced interferon response, thus substantiating the potential use of this molecule in designing therapeutics against SARS-CoV-2 infection. Docking of this ligand with IFIH1-PLpro, provides support for binding at this cavity (^[307]125; see Supplementary File [308]10). Likewise, cavities in the interfaces of NTD-AXL and ARF6-NSP15 could likely be interesting targets for drug design and further experimental assays are required to substantiate their application for drug repurposing or as starting points for structure-based design of novel compounds. Discussion We analysed ~ 20% of SARS-CoV-2 immunity associated interactions using structural data of the protein–protein complexes. Structural analyses helped in detecting affinity-enhancing variants in human immunity-associated proteins and in evaluating their possible impact on SARS-CoV-2:human protein complexes and COVID-19 susceptibility. Information on the proximity of these variants to functional sites in the 3D structure can give insights into potential mechanisms associated with the suppression of normal functioning of these immune proteins, thereby affecting COVID-19 susceptibility. We applied a structural bioinformatics approach to analyse the impact of missense variants from human and viral proteins, using 19 SARS-CoV-2: human protein structural complexes (obtained from PDB or built using AlphaFold2-multimer/ptm). We analysed 468 coding variants in human proteins occurring at protein–protein interfaces. A total of 26 affinity-enhancing variants from 13 human proteins were predicted to significantly enhance SARS-CoV-2 binding. A majority of these human proteins were involved in key immune pathways and associated with antiviral activity against SARS-CoV-2, including Interferon stimulating genes (ISG15, IFIT2); important receptors (such as IFIH1, TOMM70); proteins involved in nucleocytoplasmic shuttling of viral mRNA (NUP98 and RAE1), proteins involved in cellular translation machinery (RPS2 and RPS3) and cell entry receptors via spike-binding (ACE2, KREMEN1 and AXL). Among these 13 proteins, experimental assays have been performed on spike-ACE2, substantiating the role of the predicted spike affinity-enhancing variants in COVID-19 susceptibility and transmission^[309]37,[310]38, while variants in the remaining proteins are reported for the first time in this study. The modelling of complexes using AlphaFold2-multimer/ptm increased the structural coverage of the complexes and helped to provide structural insights into the mechanisms of SARS-CoV-2 proteins binding to human host proteins. We propose that affinity-enhancing variants in key-immunity associated human proteins could promote their binding to SARS-CoV-2 proteins, competing with human protein partners or substrates in immune pathways, and this in turn, may have an impact on COVID-19 susceptibility. This finding is in line with previous experimental studies on SARS-CoV-2: human interactions that affect natural immune pathways^[311]126,[312]127. For example, Li et al.^[313]127 suggest that binding of SARS-CoV-2: ORF6 to human: NUP98-RAE1 complex competitively inhibits mRNA binding (to NUP98-RAE1), which is essential for its immune function. Likewise, overexpression of SARS-CoV-2: Nucleoprotein protein is observed to be associated with the attenuation of RIG-I-mediated interferon production via binding to TRIM25 and thus interrupting the interaction between TRIM25 and RIG-I (which is its natural protein partner)^[314]126. The SARS-CoV-2:human protein interactions associated with these 13 human proteins could be attractive targets for drug design that target the protein–protein interfaces. In particular, SARS-CoV-2: PLpro is known to be a promising target for designing protein–protein interaction inhibitors^[315]128. In our analyses, we modelled SARS-CoV-2:PLpro interactions with ISG15, IFIH1 and IFIT2. Interestingly, PLpro interactor protein IFIH1 is associated with drug-associated functional family in CATH. Likewise, we observed druggable CATH-FunFam for AXL and ARF6 proteins (interacting with Spike-NTD and NSP15 in SARS-CoV-2, respectively). The domain relatives within CATH-FunFams exhibit highly conserved drug binding sites and have the potential to be the druggable entities within drug targets, as shown in^[316]60. Thus, further studies targeting such immune proteins and PLpro-mediated interactions would be helpful. We suggest monitoring both common and rare variants in human proteins, that are predicted to cause significant impact and thus likely to be associated with disease pathogenicity or susceptibility. Two affinity-enhancing variants – one from IFIH1 (Y13N) and one from ISG15 (S21N), are observed to occur at > 1% allele frequency i.e. common in East Asian and African populations, respectively. These affinity-enhancing variants exhibit distinct allele frequency distributions across gnomAD populations and thus it is likely that they exhibit distinct COVID-19 susceptibility. For example, the variant Y13N in IFIH1 is common mutation found only in East Asians. The S21N in ISG15 is common in African population and less common in other populations such as Middle Eastern, Admixed American, European (non-Finnish) populations and not present in any other populations in gnomAD. Thus, affinity-enhancing variants involved in increased affinity to SARS-CoV-2 binding, could manifest population-scale differences in causing susceptibility to the disease via alteration to natural immune pathways in humans. Most of the variants identified in our study have allele frequencies (< 1%). A growing number of studies support the key role of rare variants in causing susceptibility/severity to COVID-19^[317]12,[318]26–[319]32. Thus, apart from common variants in East Asian and African population, rare variants predicted from various other populations are predicted to be affinity-enhancing, which could affect COVID-19 susceptibility and cause increased risk of infection in individuals carrying them. We also provide a catalogue of affinity-reducing variants from 12 proteins (Table 2), particularly in ISG15 and TOMM70. These variants are observed to significantly reduce binding affinities of the human proteins to their SARS-CoV-2 partners and some of which are observed to occur at key functional sites/motif region that are important for binding to SARS-CoV-2 proteins, thus their likely role in resistance to COVID-19. Earlier data showed a good correlation between predicted affinity-reducing variants and experimental observations^[320]38. Thus, affinity-reducing variants reported in this study could provide an explanation for why some individuals in specific populations are less likely to experience SARS-CoV-2 associated immune evasion. Whilst we observed only two common variants in certain specific ethnic groups, occurrence of certain rare affinity-enhancing variants could also lead to increased susceptibility in individuals carrying them. Though the data suggest some role of genetic variation in COVID-19 susceptibility, the role of social and environmental factors should also be studied. In addition, our study is based on computational prediction of changes in the binding affinity. Experimental data is available for variants in only a limited number of complexes such as spike-ACE2 and ORF9b-TOMM70. Future experimental studies would be helpful for validating the impacts of proposed affinity-enhancing mutations in other immune proteins. Conclusions We used structural bioinformatics approaches to predict human and viral protein variants affecting COVID-19 susceptibility and to suggest repurposing of therapeutics based on CATH functional family data associated with small molecules/drugs. A total of 26 affinity-enhancing variants are reported in our study and we discuss their structural impact in the context of functional sites using 3D structures of SARS-CoV-2: human complexes. The protocol designed in this study could be extended to analyse other protein interactions as more structures are experimentally determined and more powerful tools for protein structure prediction emerge. Our approach could be helpful in future studies not only for COVID-19 but also other emerging infectious diseases. Supplementary Information [321]Supplementary Information.^ (14.9MB, zip) Acknowledgements