Abstract
Next-generation sequencing enables simultaneous analysis of hundreds of human genomes associated with a particular phenotype, for example, a disease. These genomes naturally contain extensive sequence variation, ranging from single-nucleotide variants (SNVs) to large-scale structural rearrangements. To establish a functional connection between genotype and disease-associated phenotypes, one needs to distinguish disease drivers from neutral passenger variants. Functional annotation based on experimental assays is feasible only for a limited number of candidate mutations, so alternative computational tools are needed. One possible approach to annotating mutations functionally is to consider their spatial location relative to functionally relevant sites in three-dimensional (3D) structures of the harboring proteins. This approach is impeded by the lack of available protein 3D structures; complementing experimentally resolved structures with reliable computational models is an attractive alternative. We developed a structure-based approach to characterizing comprehensive sets of non-synonymous single-nucleotide variants (nsSNVs) of three kinds: those associated with cancer, those associated with non-cancer diseases, and those that are putatively functionally neutral. We searched experimentally resolved protein 3D structures for potential homology-modeling templates for the proteins harboring the corresponding mutations. We found such templates for all proteins with disease-associated nsSNVs, and for 51% and 66% of proteins carrying common polymorphisms and annotated benign variants, respectively. Many mutations caused by nsSNVs can be found in protein–protein, protein–nucleic acid or protein–ligand complexes. Correction for the number of available templates per protein reveals that protein–protein interaction interfaces are not enriched in either cancer nsSNVs or nsSNVs associated with non-cancer diseases. Whereas cancer-associated mutations are enriched in DNA-binding proteins, they are rarely located directly in DNA-interacting interfaces. In contrast, mutations associated with non-cancer diseases are in general rare in DNA-binding proteins, but enriched in DNA-interacting interfaces in these proteins. All disease-associated nsSNVs are overrepresented in ligand-binding pockets, and nsSNVs associated with non-cancer diseases are additionally enriched in the protein core, where they probably affect overall protein stability.
Introduction
Human genetic variation ranges from neutral polymorphisms to disease susceptibility variants and pathogenic mutations with high penetrance.^1 A single individual may carry up to 3 × 10^6 single-nucleotide variants (SNVs) and up to 3 × 10^5 insertions and deletions,^2 but even in disease-affected individuals only a few variants of this continuum are expected to be causal, with the rest being neutral. Data on genetic variants that underlie certain disease phenotypes are accumulated in specific databases, for example, ClinVar,^3 which currently contains >160 000 unique variant records pertaining to 27 261 genes. However, even a strong mutation-phenotype association by itself provides no insight into the mechanistic changes to protein function and/or structure that are caused by the mutation. These changes can result in protein instability or misfolding, or in perturbations of interaction energy if the affected protein is involved in protein–protein, protein–nucleic acid or protein–ligand interactions.
Computational analysis of the available three-dimensional (3D) structures of human proteins shows that disease-causing missense (non-synonymous) mutations often result in significant alteration of the amino-acid residue properties and disruption of non-covalent bonding.^4 In contrast, functionally neutral variants tend to be located at the protein surface and to be less conserved than expected at random.^5,6 Anecdotal data are available on the involvement of disease-associated missense SNPs in protein–protein interactions (PPI), as reviewed elsewhere.^7,8,9 A large-scale analysis confirms that disease-related mutations are frequently overrepresented on PPI interfaces.^10 Several computational methods have been developed to assess the impact of non-synonymous single-nucleotide variants (nsSNVs) on protein function, with SIFT^11 and PolyPhen-2^12 among the most commonly used. Some methods take into account protein sequence-based phylogenetic information pertaining to the mutation,^11,13 whereas others rely on a combination of protein structural information, functional parameters and phylogenetic information derived from multiple sequence alignments.^14,15,16,17,18 The specific contribution of structural parameters to prediction performance has long been debated.^12,17 Numerous tools have been constructed to assess potential changes caused by SNVs in protein 3D structure. The SNPeffect database,^18 for example, ignores the conservation profiles of SNVs and relies on predicted structural features (aggregation, amyloidogenicity, stability) and on domain and catalytic site annotations. There are also tools that predict the energetic impact of a mutation on the stability of a protein or protein complex.^19,20,21,22,23,24 A thorough comparison of these methods and a discussion of their limitations can be found in references 17 and 25. dSysMap^26 and