Abstract

   Alzheimer’s disease (AD) is a neurodegenerative disorder, and
   represents the most common cause of dementia. In this study, we
   performed several different analyses to detect loci involved in
   development of the late onset AD in the Russian population. DNA samples
   from 472 unrelated subjects were genotyped for 63 SNPs using iPLEX
   Assay and real-time PCR. We identified five genetic loci that were
   significantly associated with LOAD risk for the Russian population
   (TOMM40 rs2075650, APOE rs429358 and rs769449, NECTIN rs6857, APOE ε4).
   The results of the analysis based on comparison of the haplotype
   frequencies showed two risk haplotypes and one protective haplotype.
   The GMDR analysis demonstrated three significant models as a result: a
   one-factor, a two-factor and a three-factor model. A protein–protein
   interaction network with three subnetworks was formed for the 24
   proteins. Eight proteins with a large number of interactions are
   identified: APOE, SORL1, APOC1, CD33, CLU, TOMM40, CNTNAP2 and CACNA1C.
   The present study confirms the importance of the APOE-TOMM40 locus as
   the main risk locus of development and progress of LOAD in the Russian
   population. Association analysis and bioinformatics approaches detected
   interactions both at the association level of single SNPs and at the
   level of genes and proteins.

   Keywords: Alzheimer’s disease, APOE, case–control study, GWAS, SNP

1. Introduction

   Alzheimer’s disease (AD) is a neurodegenerative disorder and the most
   common cause of dementia. The disease is characterized by the loss of
   neurons and synaptic connections in the cortex and certain subcortical
   areas. AD is clinically characterized by decline in progressive
   executive function, perceptual speed, short-term memory and other
   cognitive capacities [[38]1]. Degradation of cognitive functions has a
   major impact on an individual’s quality of life. According to the World
   Alzheimer Report 2018, there are over 50 million people suffering from
   AD or other dementias in the world [[39]2]. Each year, approximately
   5–7 million new cases of AD are registered in the geriatric population
   [[40]3].

   Two forms of AD are traditionally distinguished: early onset (EOAD,
   onset <65 years old) and late-onset (LOAD, onset >65 years old). EOAD
   accounts for 1–6% of AD cases. These cases are familial and are
   associated with classical Mendelian patterns of inheritance. The
   amyloid precursor protein (APP), presenilin1 (PSEN1), and presenilin2
   (PSEN2) genes are primarily implicated in EOAD. As for the later onset
   form of the disease, both genetic and environmental factors are
   considered to impact the disease risk profile. LOAD constitutes
   approximately 95% of AD cases [[41]4]. Heritability for LOAD is
   estimated to be between 58% and 79%, on the basis of a twin study
   design [[42]5]. The genetic complexity of LOAD should not be
   underestimated. Due to general efforts, it has become clear that
   pathogenesis of LOAD is multi-faceted and is not limited to one set of
   simple molecular interactions.

   Common neurological and psychiatric disorders, including AD,
   schizophrenia, bipolar disorder, Parkinson’s disease are now the
   subject of intensive studies. Over the last few decades great progress
   in researching of susceptibility genetic loci to common mental and
   neurological diseases has been made using new technologies and methods:
   genome-wide association studies (GWAS), large meta-analysis,
   whole-exome/genome sequencing (GWAS, meta-analysis, NGS).

   Alzheimer’s disease and schizophrenia are two common diseases of the
   brain with significant differences in neuropathology, etiology and
   symptoms. Data on the common links in the pathogenesis of AD and
   schizophrenia are accumulating [[43]6]. Thus, statistical evidence has
   shown that AD and schizophrenia have a specific molecular background.
   Transcriptome studies have found a similar pattern of gene expression
   in the superior temporal gyrus in AD and schizophrenia [[44]7].
   Recently, GWASs have identified many genomic loci characterized by
   small effect sizes associated with LOAD: CLU, CR1, PICALM, BIN1, ABCA7,
   CD33 and others [[45]8,[46]9,[47]10]. It is important to keep in mind
   that genetic association studies have often produced false findings,
   and thus need validation in multiple sample sets. On the other hand, it
   is known that most GWAS SNPs in the associated loci with disease cannot
   be explained on the basis of known pathological mechanisms.
   Next-generation sequencing has assisted the identification of rare
   susceptibility modifying alleles in APP, TREM2, and PLD3 [[48]11].
   However missing heritability for LOAD remains extensive. While great
   progress has been made with respect to understanding the genetic
   landscape of LOAD, the connection between the genes and variants
   promoting to the risk of the disease remains unclear. The data
   accumulation and the use of new bioinformatics approaches including
   those implemented in publicly available resources can help with this
   problem. The biological roles in metabolic pathways of all newly
   identified variants, as well as functionally related genes, can be
   assessed using ALIGATOR, KEGG, GO, STRING, GeneMANIA
   [[49]12,[50]13,[51]14,[52]15,[53]16]. Exploring potentially new protein
   interactions in common mental and neurological diseases using
   bioinformatic strategies presents potential interest. Genetic risk
   factors could be analyzed and integrated in terms of biological
   pathways and functions to better understand the contribution of
   genetics to the general background of neurological and mental diseases.
   Protein interaction data, derived from the GWAS database, can be used
   to build protein–protein interaction (PPI) networks from genes
   associated with AD, schizophrenia and cognitive performance. PPI
   network analysis could be an effective and powerful approach to
   identify potential biological pathways or key genes involved in the
   link of neurological and mental diseases.

   In addition to the complexities of LOAD associated with numerous
   interconnected environmental and genetic factors, the genetic
   architecture of LOAD differs between ethnic groups. Our prior study
   estimated genetic diversity of sixteen native populations of North
   Eurasia using a panel of genetic markers significantly associated with
   AD and Schizophrenia [[54]17].

   Here we follow up on our prior research in a new cohort of the AD group
   and controls. In the present study, we aimed to analyze 63 of the
   previously reported statistically significant SNP markers associated
   with AD in approximately 450 Russian samples from well-characterized
   LOAD case–control set. In addition, our focus was on the identification
   of the molecular and functional pathways of these SNP markers using
   various statistical and bioinformatics resources.

2. Materials and Methods

2.1. Samples

   This study involved 185 unrelated LOAD patients (mean age 72.15 ± 7.87
   years) recruited and evaluated at the Department of Neurology and
   Neurosurgery (Siberian State Medical University, Tomsk, Russia).
   Subjects were included in the study after a diagnosis of LOAD according
   to the International Classification of Diseases, Tenth Revision,
   Clinical Modification (ICD-10-CM), the diagnostic and statistical
   manual of mental disorders, fourth edition, text revision (DSM-IV TR)
   [[55]18] and the NINCDS-ADRDA Alzheimer’s Criteria (the National
   Institute of Neurological and Communicative Diseases and Stroke and the
   Alzheimer’s Disease and Related Disorders Association) [[56]19].
   Structural imaging based on magnetic resonance is a part of the
   clinical assessment of patients with suspected AD. The presence of
   focal symmetrical medial temporal atrophy has predictive value for AD.
   Our study included the cases with MRI images showing the characteristic
   atrophy of the hippocampus. Cases had a minimum age at disease onset of
   65 years. Control group consisted of 287 unrelated cognitively normal
   elderly Russian participants (mean age 71.8 ± 5.7 years). The same
   procedures for the control group were performed as for the case group
   to exclude the LOAD diagnosis. Accordingly, history, neurologic
   examinations, magnetic resonance images of brain volumes were examined
   for the control group. Additionally, the MoCA test was conducted for
   the control group to exclude mild cognitive impairment. The MoCA was
   developed to be more sensitive to mild cognitive impairment in
   geriatric populations than other screeners (like the Mini Mental State
   Examination-MMSE). MoCA scores ranged between 0 and 30 points, and the
   higher scores indicated the better cognitive performance. In our study,
   the control group included people with the MoCA score ≥ 23. Controls
   were recruited and evaluated at the Nebbiolo Center for Clinical Trials
   (Tomsk).

   All of the studied individuals, cases and controls were of the same
   ethnic (Russians) and geographical origins, living in the Tomsk region
   of the Russian Federation. [57]Table 1 presents the main demographic
   parameters of the studied groups. The study protocol was approved by
   the Scientific Ethics Committee of the Research Institute of Medical
   Genetics (Tomsk National Research Medical Center of the Russian Academy
   of Sciences) (protocol number 2017/108). Individuals or legal guardians
   signed a written informed consent form after the study objectives and
   procedures had been explained.

Table 1.

   Demographic parameters of the studied groups.
   Parameter Patients with LOAD,
   n = 185 Control Group,
   n = 287
   Gender: Women 120 (64.86%) 200 (69.69%)
   Gender: Men 65 (35.14%) 87 (30.31%)
   Mean age 72.15 ± 7.87 71.8 ± 5.70
   Race Caucasoid Caucasoid
   Population Russians Russians
   [58]Open in a new tab

2.2. Genotyping

   Genomic DNA was extracted from the peripheral venous blood using
   phenol–chloroform extraction method. For this study, two multiplex
   panels of SNP markers that showed repeated association with cognitive
   performance, AD or/and schizophrenia in the GWAS were formed (see
   [59]Table 2). Genetic markers were selected based on certain and clear
   criteria. First of all, we selected loci reaching a genomic value (p <
   5.00 × 10^−7) for the phenotype (cognitive functions, AD and/or
   schizophrenia in GWAS). The next requirement concerned the type of
   marker: only SNPs were selected. The minor allele frequency (MAF)
   should have been ≥ than 5% in at least one population group of the
   HapMap project or the 1000 Genomes project. Selected markers should
   form multiplexes using the Assay Design Suite v2.0 genotyping tool
   ([60]https://agenacx.com/online-tools/, accessed on 6 August 2021). All
   patients and controls were genotyped for 62 SNPs using iPLEX Assay
   following the recommended protocol by the manufacturer (Agena
   Bioscience™, San Diego, CA, USA). In addition to 62 SNPs, genotyped by
   MALDI/TOF mass spectrometry, rs7412 of APOE was genotyped by real-time
   PCR using the TaqMan^® SNP Genotyping Assay (Applied Biosystems, Foster
   City, CA, USA).

Table 2.

   Frequencies of the 58 SNPs analyzed for LOAD.
   Gene. SNP ID Allele ^a Functional Consequence
   (NCBI) Position (GRCh38) Minor Allele Frequency MAF
   Cases Controls HapMap
   CHD6 rs1010304 G/A intron variant 20:41473007 0.06 0.05 0.06
   NCAPD3 rs1031381 T/C intron variant 11:134218788 0.40 0.43 0.42
   MPC2 rs10489202 T/G intron variant 1:167933841 0.21 0.20 0.22
   CSMD1 rs10503253 A/C intron variant 8:4323322 0.23 0.27 0.26
   CCDC60 rs11064768 G/A intron variant 12:119380704 0.08 0.07 0.07
   NT5C2 rs11191580 C/T intron variant 10:103146454 0.09 0.09 0.11
   LOC105378889-PRMT6 rs12125971 T/C intergenic variant 1:106921021 0.08
   0.08 0.07
   LOC101928778-LOC105371627 rs12140439 A/C intergenic variant 1:177753772
   0.29 0.30 0.34
   TENM4 rs12290811 A/T intron variant 11:79372576 0.12 0.12 0.14
   LUZP2 rs12361953 G/T intron variant 11:24589584 0.14 0.14 0.15
   CADM2 rs12494658 C/T intron variant 3:85825326 0.24 0.30 0.28
   SNX29 rs12922317 G/A intron variant 16:11983775 0.40 0.35 0.35
   LOC105373605 rs12989701 A/C intron variant 2:127130409 0.10 0.15 0.15
   CTNNA2 rs13034462 G/T intron variant 2:79892368 0.03 0.04 0.05
   BRD1 rs138880 C/A intron variant 22:49824963 0.21 0.21 0.23
   DCHS2 rs1466662 A/T intron variant 4:154426241 0.35 0.35 0.69
   CLU rs1532278 T/C intron variant 8:27608798 0.44 0.39 0.38
   TOMM40 rs157580 G/A intron variant 19:44892009 0.34 0.36 0.35
   LOC730100 rs1606974 A/G intron variant 2:51646461 0.08 0.06 0.07
   NKAPL rs1635 T/G missense variant 6:28259826 0.07 0.06 0.05
   LSM1 rs16887244 G/A intron variant 8:38173827 0.23 0.21 0.21
   POM121L2 rs16897515 A/C missense variant 6:27310241 0.09 0.11 0.12
   CNTN4 rs17194490 T/G intron variant 3:2506102 0.18 0.15 0.14
   ARHGAP31 rs17203055 G/A intron variant 3:119365484 0.11 0.10 0.12
   TCF4 rs17512836 C/T intron variant 18:55527730 0.008 0.02 0.05
   CADM2 rs17518584 C/T intron variant 3:85555773 0.34 0.32 0.30
   TCF4 rs17594526 T/C intron variant 18:55391007 0.01 0.02 0.05
   GPR89P-
   TRV-AAC1-5 rs17693963 C/A intergenic variant 6:27742386 0.06 0.06 0.08
   TOMM40 rs2075650 G/A intron variant 19:44892362 0.26 0.28 0.16
   CLU rs2279590 T/C intron variant 8:27598736 0.44 0.40 0.39
   ZNF365 rs2393895 C/A intron variant 10:62579087 0.23 0.23 0.20
   CSMD1 rs2616984 G/A intron variant 8:4625619 0.33 0.31 0.29
   DNAH11 rs368331 G/A intron variant 7:21703356 0.07 0.06 0.06
   FBXO40 rs3772130 G/A intron variant 3:121625293 0.20 0.25 0.25
   STK24 rs3783006 C/G intron variant 13:98458955 0.46 0.48 0.47
   CR1 rs3818361 A/G intron variant 1:207611623 0.33 0.25 0.25
   CD33 rs3826656 G/A intron variant 19:51223357 0.24 0.24 0.23
   APOE rs429358 C/T missense variant 19:44908684 0.22 0.10 0.13
   ACSM1 rs433598 T/C intron variant 16:20668884 0.35 0.37 0.34
   APOC1 rs4420638 G/A 500B Downstream Variant 19:44919689 0.18 0.14 0.18
   CNTN4 rs4629318 A/G intron variant 3:2851590 0.15 0.15 0.14
   CDON rs472926 G/A intron variant 11:126035363 0.16 0.17 0.17
   CACNA1C rs4765905 C/G intron variant 12:2240418 0.32 0.37 0.37
   TENM4 rs530965 T/C intron variant 11:79354056 0.49 0.47 0.49
   CSMD2 rs544991 T/C intron variant 1:33723829 0.31 0.27 0.31
   PICALM-
   RNU6-560P rs561655 G/A intergenic variant 11:86089237 0.30 0.35 0.33
   CHD6 rs6129846 T/C intron variant 20:41478674 0.06 0.05 0.06
   NRXN3 rs6574433 G/A intron variant 14:78319816 0.45 0.42 0.42
   CR1 rs6656401 A/G intron variant 1:207518704 0.31 0.23 0.24
   NECTIN2 rs6857 T/C 3 Prime UTR Variant 19:44888997 0.28 0.19 0.17
   NECTIN2 rs6859 A/G intron variant 19:44878777 0.50 0.42 0.44
   LOC105375630 rs7004633 G/A intron variant 8:88748082 0.21 0.19 0.20
   RELN rs7341475 A/G intron variant 7:103764368 0.16 0.19 0.16
   LOC105373605 rs7561528 A/G intron variant 2:127132061 0.31 0.31 0.34
   APOE rs769449 A/G intron variant 19:44906745 0.17 0.08 0.10
   KLHL1 rs7984606 C/A intron variant 13:69881529 0.003 0.004 0.05
   NKAIN2 rs9491140 T/C intron variant 6:124370091 0.32 0.32 0.31
   APOE rs7412 T/C missense variant 19:44908822 0.08 0.07 0.07
   [61]Open in a new tab

   ^a Minor allele/major allele.

2.3. Statistical Analyses

   Two SNPs (SORL1 rs11218343, TCF4 rs1261117) were excluded from the
   final analysis because they showed genotype call rates < 87%.
   Per-marker genotype call rates were higher than 95% for all the rest of
   the markers. The genotype distributions of 3 loci (LUZP2 rs1021261,
   CNTNAP2 rs10273775, GRIN2B rs2160519) were not consistent with the
   Hardy–Weinberg equilibrium in the control group. Therefore, these three
   polymorphisms were excluded from further analyses.

   Pearson’s chi-squared test was performed to verify the Hardy–Weinberg
   equilibrium of each SNP in LOAD patients and controls. The genetic
   variants in the control group which were not in the Hardy–Weinberg
   equilibrium (p < 0.05) were excluded from the analysis. Allele and
   genotype frequencies of each SNP were compared between LOAD cases and
   controls using the chi-squared (χ^2) test. The odds ratios (OR) and 95%
   confidence intervals (CIs) were calculated to assess the relative risk.
   The Benjamini and Hochberg false discovery rate method was used for
   multiple testing corrections [[62]20]. A p value threshold of 0.05 was
   used to determine significance.

   The linkage disequilibrium (LD) between SNP pairs in the genomic region
   of APOE- TOMM40 locus was quantified using Haploview version 4.2
   software [[63]21]. Haplotype frequencies were determined using the EM
   algorithm. The LD block structure was determined using the Solid Spine
   method provided by the Haploview 4.2. The degree of genetic linkage
   between the 8 SNPs in study groups was estimated as Lewontin’s
   coefficient D’ and Pearson’s correlation coefficient r^2, where no
   color (D’ = 0) indicates that LD is weak or nonexistent and the black
   (D’ = 1) indicates that there exists strong pairwise LD between SNPs
   ([64]Figure 1).

Figure 1.

   [65]Figure 1
   [66]Open in a new tab

   The structure of linkage disequilibrium in the genomic region of the
   NECTIN2-TOMM40-APOE locus: (a) tThe structure of LD for the cases; (b)
   the structure of LD for the controls. Linkage disequilibrium was
   measured by the D’ statistic: black—a strong link (D’ = 1, LOD > 2),
   grey—a significant link (D’ < 1, LOD > 2), white—poor link (D’ < 1, LOD
   < 2). A D’ value of 100 indicates a complete LD between 2 markers.

   Generalized multifactor dimensionality reduction (GMDR) analysis was
   carried out for the detection and characterization of gene–gene
   interactions. This is a generalized MDR framework based on the score of
   a generalized linear model [[67]22]. The best interaction model was
   selected on the basis of maximum testing balanced accuracy (TBA) and
   cross-validation consistency (CVC). Permutation (p) results were
   considered to be statistically significant at the 0.05 level. In our
   study, we used open-source software GMDR
   ([68]http://sourceforge.net/projects/gmdr, accessed on 5 August 2021).

   The protein–protein interaction network of the studied proteins was
   constructed with the online analysis tool STRING v11.0
   ([69]http://www.string-db.org/, accessed on 5 August 2021). The
   interactions include indirect (functional) and direct (physical)
   associations, which are derived from four sources: experiments, genomic
   context, co-expression (conserved), and previous knowledge. Biological
   processes, cellular components and molecular functions were analyzed by
   using a freely available system PANTHER (Released 2020-12-18,
   [70]http://www.pantherdb.org/, accessed on 5 August 2021). STRING and
   GO analyses included 41 genes. SNPs of these genes were genotyped in
   the present study ([71]Table 2). The analyses did not include markers
   from intergenic regions.

3. Results

   The minor allele frequencies of 58 SNPs in LOAD patients and in
   controls are presented in a [72]Table 2. Almost all the allele and
   genotype frequencies for the investigated loci were within the range
   reported in other populations of the European descent in the 1000
   Genomes project
   ([73]https://www.ncbi.nlm.nih.gov/variation/tools/1000genomes/,
   accessed on 5 August 2021), with the exception of three genetic
   variants. The G allele of rs2075650 at TOMM40 gene has higher frequency
   (28%) in the Russian population in comparison with other populations of
   the European origin in the 1000 Genomes project (16%). The minor allele
   C of rs7984606 at KLHL1 gene has lower frequency (0.4%) in the Russian
   population than other populations of European descent in the 1000
   Genomes project (1%). The allele A of DCHS2 rs1466662 has lower
   frequency (35%) in the Russian population than other populations of
   European origin in the 1000 Genomes project (69%).

   The statistical power of the study was estimated for the Genetic
   Association Study (GAS) Power Calculator
   ([74]http://csg.sph.umich.edu/abecasis/gas_power_calculator/, accessed
   on 5 August 2021) assuming an odds ratio (Genotype Relative Risk) of
   1.5, allele frequencies 18% in AD cases, and a significance level of
   0.05. The sample size of this study was estimated to have 80% power to
   detect a positive association for 36 variants with 18% frequency in AD
   cases. Power to detect an odds ratio of 1.5 or greater was 80% for six
   of the nine SNPs in [75]Table 2, showing non-significant results after
   correction for multiple testing (FDR) for three studied variants:
   FBXO40 rs3772130, CR1 rs6656401, NECTIN2 rs6859. The reporting of
   variants with low frequency which are not sufficiently powered is
   justified not only by detecting associations with the disease but by
   estimating allele frequencies of these markers in cases and controls
   belonging to specific population.

   Out of the 58 markers tested, nine showed significant association to
   LOAD ([76]Table 3). Two variants demonstrated marginal p values for the
   allele and/or genotype frequency difference between the LOAD and
   control subjects: FBXO40 rs3772130 and NECTIN2 rs6859. After correction
   for multiple testing the results remained statistically significant for
   five genetic variants in three genes: TOMM40 rs2075650, APOE ε4,
   rs429358 and rs769449, NECTIN rs6857 ([77]Table 3). All of these
   markers were located in the same region of the chromosome 19.

Table 3.

   Association of statistically significant genetic markers with LOAD in
   the Russian population.
   Gene Variant Minor Allele OR (95% CI) Major Allele OR (95% CI) p Value
   p Value Corrected
   APOE4 *
   (ε3/ε4) rs429358
   rs7412 ε4 2.88 (1.95–4.24) ε3 0.35 (0.24–0.51) 5 × 10^−8 8.62 × 10^−4
   APOE * rs769449 A 2.44 (1.62–3.66) G 0.41 (0.27–0.62) 1 × 10^−5 2.5 ×
   10^−3
   APOE * rs429358 C 2.53 (1.75–3.67) T 0.39 (0.27–0.57) 5 × 10^−7 1.7 ×
   10^−3
   TOMM40 * rs2075650 G 1.67 (1.21–2.29) A 0.60 (0.44–0.82) 2 × 10^−3 3.4
   × 10^−3
   NECTIN2 * rs6857 T 1.65 (1.21–2.26) C 0.61 (0.44–0.83) 2 × 10^−3 4.3 ×
   10^−3
   CR1 rs3818361 A 1.5 (1.12–2.00) G 0.67 (0.5–0.89) 6 × 10^−3 5.2 × 10^−3
   CR1 rs6656401 A 1.46 (1.09–1.96) G 0.68 (0.51–0.92) 0.01 6 × 10^−3
   NECTIN2 rs6859 A 1.34 (1.03–1.74) G 0.75 (0.57–0.97) 0.03 6.8 × 10^−3
   FBXO40 rs3772130 G 0.72 (0.52–0.99) A 1.39 (1.01–1.91) 0.04 7.7 × 10^−3
   [78]Open in a new tab

   OR odds ratio, CI confidence interval, p values corrected based on
   Benjamini and Hochberg method, * Statistically significant results
   after multiple comparison correction.

   Due to the fact that it is important for the LD analysis that the
   markers are in the same chromosome region, we included five SNPs in the
   LD analysis, which in our study were significantly associated with
   LOAD: TOMM40 rs2075650, APOE rs7412, rs429358 and rs769449, NECTIN
   rs6857. All of these markers were located in the same region of
   chromosome 19 (the APOE-TOMM40 region). Markers TOMM40 rs157580, NECTIN
   rs6859, APOC1 rs4420638 were genotyped, but were not significantly
   associated with the LOAD in our study. These markers were also analyzed
   using the LD analysis, as they were also located in the APOE-TOMM40
   region of chromosome 19, like the other 5 markers. The theoretically
   possible number of haplotypes was 256. A total of 44 haplotypes were
   detected in our samples. Twenty haplotypes were revealed in the control
   group, while 24 haplotypes were found in the cases. Fourteen identical
   haplotypes were observed in both groups. Identical haplotypes and their
   frequencies are presented in a [79]Table 4.

Table 4.

   Identical haplotypes identified in both groups.
   N rs, Gene Haplotype Frequencies (Cases/Controls)
   %
   rs6859
   NECTIN2 rs6857
   NECTIN2 rs157580
   TOMM40 rs2075650
   TOMM40 rs769449
   APOE rs429358
   APOE rs7412
   APOE rs4420638
   APOC1
   1 G C G A G T C A 21.4/26.3
   2 A C A A G T C A 15.7/16.4
   3 G C A A G T C A 14.0/20.9
   4 A T A G A C C G 11.0/6.4
   5 A T A G G T C A 9.1/9.9
   6 A C G A G T C A 7.1/5.5
   7 G C G A G T T A 5.0/3.1
   8 G C A A G T T A 2.8/3.0
   9 A C A A G T C G 2.3/2.7
   10 G T A G A C C G 1.4/1.0
   11 G T A A G C C G 1.1/0.6
   12 G C A A A C C G 0.7/0.6
   13 A T A A G C C G 0.7/0.8
   14 G C A A G T C G 0.5/1.0
   [80]Open in a new tab

   [81]Figure 2 demonstrates haplotypes with frequencies above 1% in the
   case and the control groups.

Figure 2.

   [82]Figure 2
   [83]Open in a new tab

   Haplotype frequencies for the LOAD patients and the control group. *
   Statistically significant differences between the cases and the
   controls (p < 0.05).

   The number of main haplotypes (frequency above 10%) was four in the
   control group and three in LOAD patients ([84]Figure 2). Results of the
   analysis based on comparison of the haplotype frequencies showed that
   risk haplotypes ATAGACCG (χ^2 = 7.313, p = 0.0068; OR 1.90, 95% CI
   1.19–3.05) and ATAGACCA (χ^2 = 14.589, p = 0.0001; OR 17.43, 95% CI
   2.24–135.62) were significantly associated with LOAD. Moreover, we
   identified one protective haplotype GCAAGTCA (χ^2 = 6.822, p = 0.009;
   OR 0.63, 95% CI 0.44–0.89) generated from these eight SNPs. All of the
   above associations remain significant after permutation test (p <
   0.05).

   The structure of LD among eight SNPs in the APOE-TOMM40 locus for the
   cases and controls is shown in [85]Figure 1. Analysis of LD in cases
   and controls demonstrated that these groups were characterized by
   different structures of haplotype blocks at APOE-TOMM40 locus
   ([86]Figure 1a,b). In controls, the first block consisted of four SNPs
   (rs6859, rs6857, rs157580, rs2075650) and the second included three
   SNPs (rs769449, rs429358, rs7412), while in the cases, the first block
   comprised four SNPs (rs6857, rs157580, rs2075650, rs769449) and the
   second block was made up of two closely located SNPs (rs429358,
   rs7412). Notably, this difference demonstrates the high level of LD in
   case group between rs429358 and rs7412 from the APOE gene, which are
   formed the APOE isoforms (ε2, ε3, ε4). The haplotype block structure
   for control group demonstrates the blocks that include genes:
   Block1-NECTIN2-TOMM40, Block2–APOE.

   Although not every studied variant in genes was significantly
   associated with AD by univariate (case–control) analysis, there were
   gene–gene interactions among variants using generalized multifactor
   dimensionality reduction (GMDR) method. GMDR combines genotypes into
   “high-risk” and “low-risk” groups in order to reduce multidimensional
   data into only one dimension. In our study, we analyzed all possible
   combinations of 44 genes both in cases and in control subjects. There
   were three significant models as a result of the GMDR analysis:
   one-factor, two-factor and three-factor model ([87]Table 5).

Table 5.

   Gene–gene interaction models by GMDR analysis.
   Best Interaction Model TBA ^#  CVC  p Value OR ^# (95% CI) ^#
            APOE           0.59  6/10  0.0006   2.81 (1.54–5.13)
        TCF4, APOE         0.66  10/10 <0.0001  4.04 (2.27–7.17)
      CLU, TCF4, APOE      0.66  10/10 <0.0001 5.71 (3.03–10.78)
   [88]Open in a new tab

   CVC—cross validation consistency, TBA—testing balanced accuracy, ^#
   Values rounded up to 2 decimal places.

   Among these results, we selected two models (a two-factor and a
   three-factor) as the best SNP combinations, based on the best Testing
   Balanced Accuracy (TBA) in data and their high cross-validation
   consistency (CVC) value. [89]Figure 3 shows the score distributions in
   the best models. Seven genotype combinations with an increased risk of
   developing LOAD and only one combination of low-risk genotypes were
   found for the two-factor model ([90]Figure 3a). The combination of
   low-risk genotypes showed statistically significant differences between
   the studied groups: TCF4 rs17594526 “CC” and APOE rs429358 “TT” (p <
   0.001; OR = 0.25; 95% CI: 0.14–0.45). Fourteen genotype combinations
   with an increased risk of developing LOAD and five combinations of
   low-risk genotypes were found for the three-locus model ([91]Figure
   3b). Since we did not have all genotypes for the three-locus model in
   GMDR analysis, we designated the data in this model as preliminary. The
   one-locus model was computed for APOE rs429358 and the empirical p
   value for prediction error using permutation testing was 0.0006,
   indicating the interactions among the three variants contributed to a
   higher risk of AD than did single variant alone ([92]Table 5).

Figure 3.

   [93]Figure 3
   [94]Open in a new tab

   Gene–gene interactions among TCF4 (rs17594526), APOE (rs429358) and CLU
   (rs1532278) loci in AD and healthy subjects: (a) two-factor model; (b)
   three-factor model. The score distribution of AD subjects (left black
   bar in boxes) and control subjects (right black bar in boxes) is shown
   for each genotype combination. High-risk genotype combinations are
   represented by dark gray shade cells, while light gray shade cells
   represent low-risk genotype combinations. Cells with no shading or
   white cells represent genotype combination for which no data is
   observed.

   In the present study, we conducted a protein–protein network
   interaction (PPI) search with proteins of AD, schizophrenia and
   cognitive performance collected via the GWAS database using STRING
   (v11.0). This exploratory bioinformatic analysis demonstrated a
   potential network of interaction between proteins connected to AD,
   schizophrenia and cognitive performance. PPI network analysis of 41
   studied genes showed statistically significant connectivity among
   proteins. Interaction enrichment had p-value < 1.0 × 10^−16 for our
   gene set. This means that such enrichment demonstrates that the
   proteins are at least partly biologically connected, as a group. The
   PPI network contained 24 nodes and 39 edges ([95]Figure 4). The STRING
   indicated that the proteins formed three subnetworks of interconnected
   gene products ([96]Figure 4). Each subnetwork had a central protein.
   The first subnetwork consisted of 10 proteins with APOE in the center:
   CD33, CR1, CLU, RELN, SORL1, NECTIN2 (PRVL2), TOMM40, APOC, MPC2. The
   second subnetwork included 6 proteins with CACNA1C as the center
   component: GRIN2B, TENM4, CSMD1, NT5C2, POM121L2. In addition, the
   third protein subgroup had 8 nodes with CTNNA2 in the center part:
   FBXO40, CNTN4, CTNNA2, CDON, CAMD2, TCF4, CHD6. Proteins with a large
   number of interactions were identified (with edges from 8 to 5). The
   eight genes with strong connections (“multiple interactions”) were
   APOE, SORL1, APOC1, CD33, CLU, TOMM40, CNTNAP2 and CACNA1C ([97]Figure
   4). The largest number of connections was established for the APOE
   protein (8). This large number of interactions is not surprising, given
   the high known effect of the APOE gene for phenotypic modifications for
   several medical traits [[98]23,[99]24,[100]25]. In addition to proteins
   with a large number of interactions, there are other proteins that can
   connect or act as bridges between subnetworks. The four “bridge” genes
   were APOE, RELN, CNTNAP2 and GRIN2B. It should be noted that APOE and
   CNTNAP2 represent both types of genes (“multiple interactions” and
   “bridge”).

Figure 4.

   [101]Figure 4
   [102]Open in a new tab

   Protein–protein interaction network of studied genes. The nodes and
   edges represent the proteins (genes) and their interactions,
   respectively. The PPI network contained 24 nodes and 39 edges. The
   STRING indicated that the proteins formed three subnetworks of
   interconnected gene products. Each subnetwork had a central protein:
   APOE, CACNA1C, CTNNA2. All these three subsystems are connected
   together by RELN as a bridge.

   To evaluate molecular function, cellular component, and biological
   process of our gene set we used bioinformatics resource Gene Ontology
   (GO, [103]http://geneontology.org, accessed on 5 August 2021) databases
   via PANTHER (The Protein ANalysis THrough Evolutionary Relationships,
   [104]http://www.pantherdb.org, accessed on 5 August 2021). This system
   includes correction for multiple testing for all GO categories, thus
   providing statistically significant evidence for their involvement in
   disease susceptibility. The studied gene set was enriched in three
   categories: molecular function (cell adhesion molecule binding),
   cellular component (neuron part), and biological process (nervous
   system development). Pathway enrichment analysis of genes using STRING
   is presented in [105]Table 6.

Table 6.

   Significantly enriched pathways of studied genes.
   Pathway ID Pathway Description Observed Gene Count False Discovery Rate
   Matching Proteins in Your Network (Labels)
   GO:0022008 neurogenesis 15 0.00028
   APOE,CDON,CLU,CNTN4,CNTNAP2,CTNNA2,KLHL1,LSM1,NRXN3,RELN,SORL1,STK24,TC
   F4,TENM4,ZNF365
   GO:0007155 cell adhesion 11 0.00036
   CADM2,CD33,CDON,CNTN4,CNTNAP2,CTNNA2,DCHS2,NRXN3,PVRL2,RELN,TENM4
   GO:0007399 nervous system development 17 0.00036
   APOE,CDON,CLU,CNTN4,CNTNAP2,CTNNA2,DNAH11,GRIN2B,KLHL1,LSM1,NRXN3,RELN,
   SORL1,STK24,TCF4,TENM4,ZNF365
   GO:0048699 generation of neurons 14 0.00036
   APOE,CDON,CNTN4,CNTNAP2,CTNNA2,KLHL1,LSM1,NRXN3,RELN,SORL1,STK24,TCF4,T
   ENM4,ZNF365
   GO:0032989 cellular component morphogenesis 10 0.00044
   CLU,CNTN4,CNTNAP2,CTNNA2,NRXN3,PVRL2,RELN,STK24,TENM4,ZNF365
   GO:0050767 regulation of neurogenesis 10 0.00044
   APOE,CDON,CNTN4,LSM1,RELN,SORL1,STK24,TCF4,TENM4,ZNF365
   GO:0048666 neuron development 10 0.00045
   APOE,CNTN4,CNTNAP2,CTNNA2,KLHL1,NRXN3,RELN,STK24,TENM4,ZNF365
   GO:0031175 neuron projection development 9 0.00058
   APOE,CNTN4,CNTNAP2,CTNNA2,KLHL1,NRXN3,RELN,STK24,ZNF365
   GO:0000902 cell morphogenesis 9 0.00061
   CLU,CNTN4,CNTNAP2,CTNNA2,NRXN3,RELN,STK24,TENM4,ZNF365
   GO:0032990 cell part morphogenesis 8 0.00064
   CNTN4,CNTNAP2,CTNNA2,NRXN3,PVRL2,RELN,STK24,ZNF365
   GO:0007417 central nervous system development 10 0.00084
   CDON,CLU,CNTN4,CNTNAP2,CTNNA2,GRIN2B,KLHL1,RELN,TENM4,ZNF365
   GO:0007611 learning or memory 6 0.00089
   APOE,CNTNAP2,DNAH11,GRIN2B,NRXN3,RELN
   [106]Open in a new tab

4. Discussion

   AD is the most common form of age-related neurodegenerative disease. To
   study such complex diseases, it is necessary to apply various types of
   approaches. In this study, we performed several different analyses to
   excavate loci involved in development of the late onset AD in the
   Russian population. First of all, we focused on the potential
   associations of SNPs previously identified in GWAS. Since not only AD
   is characterized by a decrease in cognitive function, we also included
   GWAS loci for other reported traits in our analysis, such as
   schizophrenia, unipolar depression, bipolar disorder, attention deficit
   hyperactivity disorder, autism spectrum disorder and others.

   Similar study designs have been reported in other papers for various
   disorders: AD and Parkinson’s disease, AD and ischemic stroke,
   schizophrenia and bipolar disorder, and others [[107]26,[108]27]. A
   recent study demonstrated an unbiased link between polygenic risk for
   schizophrenia and a lower risk of psychosis in AD [[109]28]. The most
   significant associations with LOAD were observed in the APOE region on
   chromosome 19 (p < 5 × 10^−8). Outside the APOE region, the most
   significant association was found in CR1 gene on chromosome 1
   (chr1:207611623, rs3818361, G/A, p = 6 × 10^−3). This gene is one of
   the ten highest risk factors for developing AD. The CR1 gene encodes
   the complement receptor 1 (CR1), which is one of the regulators of
   complement activity. The CR1 might prevent excessive complement
   activation. The presence of CR1 on erythrocytes is noted as an
   important component in protecting tissues against immune-complex
   deposition and following disease, such as AD [[110]29].

   After correction for multiple testing (the Benjamini and Hochberg false
   discovery rate method), only the APOE region genetic markers were
   remained statistically significant. Interestingly, according to our
   data, the G allele frequency (28%) of rs2075650 at TOMM40 gene in our
   Russian sample sets is significantly higher than in other populations
   not only of European descent (13–16%), but also of the Asian population
   (10–12%). Salakhov et al. [[111]30] found that TOMM40 gene rs2075650
   was associated with low-density lipoprotein cholesterol levels in
   healthy men group of the Russian population (Kemerovo, Russia), and G
   allele frequency in the total group was 21%. It should be noted that,
   most of the genetic variants contributing to risk of developing AD are
   located in non-coding regions and, presumably, they may influence as a
   controller of the functional activity of closely located genes that
   have an activating effect on transcription [[112]31]. We can suggest
   that there may be some additional genetic signal in this locus that is
   impact to risk AD. The study results supplement the previous data on
   the AD risk development genetic markers in the Russian population
   [[113]32].

   Within the chromosomal region 19q12-q13.33, there is an accumulation of
   SNPs that are probably critical for beginning and development of AD. A
   huge number of publications devoted to the role of this region in the
   progress of AD. In addition, although APOE was identified as a
   susceptibility factor for LOAD over 25 years ago [[114]33], it is still
   not clear how the ε4 allele contributes to disease risk. Some
   observations indicate an independent effect of SNPs of the
   NECTIN2-TOMM40-APOE locus [[115]34], while other studies showed that
   there are general effects of several SNPs [[116]35]. Our data
   demonstrated that in both the cases and the controls rs157580 of TOMM40
   gene is in high pairwise LD with the SNPs rs2075650 and rs769449 (D’ =
   100). However, LOAD-affected individuals are characterized by higher
   pairwise LD between neighboring SNPs compared with unaffected
   individuals ([117]Figure 1). Notably, the haplotype block structure for
   cases demonstrates that Block2 included rs429358 and rs7412 from the
   APOE gene, which are formed the APOE isoforms (ε2, ε3, ε4). Our study
   showed that the haplotypes that include the ε3 allele of APOE and the A
   allele of TOMM40 gene rs2075650 are associated with reduced risk for
   LOAD compared with haplotypes that include the ε4 allele of APOE and
   the G allele of rs2075650. In a recently published study on
   heritability of extreme longevity in human populations, it was found
   that the ε4 allele of APOE is associated with substantial reduction in
   the chance for extreme longevity [[118]36].

   Haplotypes that showed an association with a decrease in extreme
   longevity in Sebastiani et al. in our study demonstrated a significant
   association with an increased risk of developing LOAD. The possible
   combinations of loci that may reduce the risk of LOAD development were
   obtained by researching gene–gene interactions with GMDR. Approximately
   two-fold reduced risk for LOAD was observed for the combination of CLU
   rs1532278 “CC”, TCF4 rs17594526 “CC”, APOE rs429358 “TT” genotypes in
   the three-locus model. In the model combining TCF4 and APOE, genotype
   combination CC/TT could similarly be considered to protective to LOAD.
   Notably, our data presented intergenic interaction between the three
   loci (CLU, TCF4, APOE), demonstrating no significant associations in
   our replicative analysis. Transcription factor 4 gene (TCF4) plays an
   important role in nervous system development. TCF4, a basic
   helix–loop–helix transcription factor, is broadly expressed and is
   critical for normal brain development. The gene has been identified as
   the cause of Pitt-Hopkins syndrome (PTHS), and it has been implicated
   in various other neuropsychiatric diseases, including schizophrenia,
   autism and depression [[119]37,[120]38,[121]39].

   The STRING database aims to combine all known and predicted
   associations between proteins, including both functional associations
   and physical interactions. In the present study, we used this
   exploratory bioinformatic analysis to predict a possible PPI network
   between neurological and mental diseases with using GWAS database. The
   network shows possible PPIs between associated genes of AD,
   schizophrenia and cognitive performance. A protein–protein interaction
   network with three subnetworks was formed for 24 proteins. A more
   detailed analysis of subnetworks detected the traits of protein
   grouping for each of them. The first group (with the APOE) is
   characterized by the maximum number of edges (21). Proteins of this
   subnetwork are related by common biological processes, such as
   regulation (positive and negative) of various processes and molecule
   transport, as well as such common molecular function as binding. There
   were no significant GO associations for the other protein subgroup (the
   CACNA1C as a center node), but these proteins were connected through
   publication studies with psychiatric diseases, such as schizophrenia,
   bipolar disorder, depression, and others [[122]40,[123]41,[124]42]. The
   grouping trait in the third subnetwork (with the CNTNAP2 in the center)
   is the biological processes related with the neuron development and
   differentiation, as well as with the regulation of these processes. In
   addition, all these three subsystems are connected together by the
   Reelin as a bridge. Reelin is a large signaling protein that is engaged
   in a cascade of cytoplasmic events that control the migration of
   neurons during brain development and it is essential for the correct
   development and plasticity of the cerebral cortex and regulate synaptic
   plasticity, neurotransmission and memory in the adult brain.
   Presumably, Reelin plays a significant role both in AD and in
   psychiatric disorders [[125]43,[126]44,[127]45].

   The PPI networks have limited clinical predictive value, but discover
   hypothetical new paths of interaction. The forecasted PPI network needs
   careful interpretation and warrants preclinical and clinical
   validations. Nevertheless, this network gives new facts that will be
   helpful to new studies of general molecular background between
   neurological and mental diseases.

5. Conclusions

   In summary, genes and proteins affecting cognitive function are of
   special interest in the search of genetic composition of AD. The vast
   majority of cases of AD are viewed as sporadic results from the complex
   interaction of genetic risk factors and unknown environmental
   conditions. The present study confirmed the importance of the
   APOE-TOMM40 locus as the main risk locus of development and progress of
   LOAD. Association analysis and bioinformatics approaches detected
   interactions both at the association level of single SNPs and at the
   level of genes and proteins. Strong evidence for association with LOAD
   was seen for the APOE-4 polymorphism, as well as for rs429358, rs769449
   and rs2075650, using various approaches: replicative association
   analysis, LD analysis, GMDR analysis and protein–protein interaction
   analysis.

   Currently, priority approaches are aiming not only to search for genes,
   but also to establish their functional effect. This provides the
   possibility of using the progress made in identifying the biological
   effects of risk alleles to recognize new protective mechanisms against
   more rapid cognitive decline in AD. Identifying biological pathways
   that increase risk may be useful in the future for the treatment of
   cognitive and behavioral disorders. Nevertheless, the current findings
   are a step forward in in this direction. We believe that the
   identification of additional risk loci would be useful from the
   development of predictive approaches of AD.

Author Contributions

   A.B., K.V., A.M. and V.S. contributed conception and design of the
   study; N.Z., I.Z., O.M. and L.M. organized the database; A.B. performed
   the statistical analysis; A.B. wrote the first draft of the manuscript;
   V.S. reviewed and corrected draft of the manuscript. All authors have
   read and agreed to the published version of the manuscript.

Funding

   This research was funded by the Russian Science Foundation, grant
   number 16-14-00020.

Institutional Review Board Statement

   The study was conducted according to the guidelines of the Declaration
   of Helsinki, and approved by the Scientific Ethics Committee of the
   Research Institute of Medical Genetics (Tomsk National Research Medical
   Center of the Russian Academy of Sciences) (protocol number 2017/108).

Informed Consent Statement

   Informed consent was obtained from all subjects involved in the study.

Conflicts of Interest

   The authors declare no conflict of interest.

Footnotes

   Publisher’s Note: MDPI stays neutral with regard to jurisdictional
   claims in published maps and institutional affiliations.

References