Abstract Dyslipidemia represents a strong and independent risk factor for cardiovascular disease. Plasma cholesterol, such as total cholesterol (TC), low density lipoprotein cholesterol (LDL-C), and high density lipoprotein cholesterol (HDL-C), is the common indicator of diagnosing dyslipidemia. Here based on 382 Chinese twin pairs, we explored the magnitude of genetic impact on TC, HDL-C, LDL-C variation and further searched for genetic susceptibility loci for them using genome-wide association study (GWAS). The ACE model was the best fit model with additive genetic parameter (A) accounting for 26.6%, common or shared environmental parameter (C) accounting for 47.8%, unique/non-shared environmental parameter (E) accounting for 25.6% for the variance in HDL-C. The AE model was the best fit model for TC (A: 61.4%; E: 38.6%) and LDL-C (A: 65.5%; E: 34.5%). While no SNPs reached the genome-wide significance level (P < 5 × 10^−8), 8, 14, 9 SNPs exceeded the suggestive significance level (P < 1 × 10^−5) for TC, HDL-C, LDL-C, respectively. The promising genetic regions for TC, HDL-C, LDL-C were on chromosome 11 around rs7107698, chromosome 5 around rs12518218, chromosome 2 around rs10490120, respectively. Gene-based analysis found 1038, 1033 and 1090 genes nominally associated with TC, HDL-C, LDL-C (P < 0.05), especially FAF1, KLKB1 for TC, KLKB1 for HDL-C, and NTRK1, FAF1, SNTB2 for LDL-C, respectively. The number of common related genes among TC, HDL-C and LDL-C was 71, including FAF1, KLKB1, etc. Pathway enrichment analysis discovered known related pathways-zinc transporters, metal ion SLC transporters for TC, cell adhesion molecules CAMs, IL-6 signaling for HDL, FC epsilon RI signaling pathway, NFAT pathway for LDL, respectively. In conclusion, the TC and LDL-C level is moderately heritable and the HDL-C level is lowly heritable in Chinese population. The genomic loci, functional genes and pathways are identified to account for the heritability of plasma cholesterol level. Our findings provide important insights into plasma cholesterol molecular physiology and expect future research to replicate and validate our results. Keywords: cholesterol, dyslipidemias, genetics, GWAS, heritability, HDL, LDL, twins Introduction Dyslipidemia represents a strong and independent risk factor for cardiovascular disease ([35]1), a leading cause of death worldwide ([36]2, [37]3). Besides, many studies indicate that dyslipidemia can increase the risk of preeclampsia, colorectal cancer, diabetic macular edema and polycystic ovary syndrome ([38]4–[39]7). The prevalence of dyslipidemia is high and is increasingly prevalent in China ([40]8, [41]9). Plasma cholesterol, such as total cholesterol (TC), low density lipoprotein cholesterol (LDL-C), high density lipoprotein cholesterol (HDL-C), is a common indicator of diagnosing dyslipidemia. Therefore, exploring factors affecting plasma cholesterol homeostasis is a crucial step toward providing early prevention and therapeutic targets for dyslipidemia. The plasma cholesterol level is mediated by a combination of genetic and environmental factors. At present, the heritability for plasma cholesterol level has been estimated in several studies, with the heritability of TC, HDL-C, LDL-C level ranging from 0 to 89% ([42]10–[43]13), 22 to 93% ([44]13–[45]16), and 22 to 91% ([46]10, [47]13, [48]16, [49]17), respectively. Additionally, several genome-wide association studies (GWAS) have attempted to find susceptible genetic loci located in the corresponding gene affecting plasma cholesterol level. A GWAS conducted by Middelberg RP et al. ([50]18) for plasma cholesterol found genes LPL, LIPC, CETP associated with HDL-C level and CELSR2, APOB, TOMM40 with LDL-C level. Another two GWAS studies identified SRGAP2, HOXC13 and CD47, DUSP4 that were associated with HDL-C and TC level, respectively ([51]19, [52]20). However, the known genetic variations only explain a small proportion of the genetic contribution and many potential genetic genes and loci remain to be discovered. Besides, allele frequencies, life style and environmental contributions differ between Chinese and other ethnic populations. Twin samples will have a higher power in genetic study, especially in human complex diseases ([53]21). Here based on 382 Chinese twin pairs, we explored the genetic effect on TC, HDL-C, LDL-C variation and further searched for genetic susceptibility loci for these traits using GWAS ([54]22). Materials and methods Twin samples collection and phenotypic measurement The twin samples were collected from Qingdao Twin Registry and the details can be found in the literatures ([55]23, [56]24). Twins who were pregnancy and lactation, as well as incomplete co-twin pairs were excluded. Twin pairs that taking cholesterol-lowering drugs also were excluded. The study included 382 twin pairs for heritability analysis and 139 dizygotic (DZ) twin pairs for GWAS with a mean age of 51.6 ± 7.7. All twin samples undertook the blood sampling and a physical examination after a 10–12 h overnight fast and completed a questionnaire. The zygosity of same sex and blood type was identified by using 16 multiple short tandem sequence repeat DNA markers ([57]25, [58]26). According to the standard procedure using an automatic biochemical analyzer (Hitachi 7600; Hitachi, Tokyo, Japan), we tested participants' fasting blood cholesterol sample in the Qingdao Diabetes Hospital. Friedewald equation was used to calculate the LDL-C level: LDL-C (mmol/L) = TC–HDL-C–(TG/2.2) (mmol/L) ([59]27). The Regional Ethics Committee of the Qingdao CDC Institutional Review Boards approved this study and the ethical principles followed the Helsinki Declaration. All subjects signed the written informed consent. Genotyping, imputation, and quality control Genome-wide SNPs were genotyped using the Illumina's InfiniumOmni2.5Exome-8v1.2 Bead-Chip platform. We performed stringent genotype quality control procedures: locus missing (< 0.05), minor allele frequency (MAF > 0.05), call rate (>0.98), and Hardy-Weinberg Equilibrium (HWE > 1 × 10^−4). The final number of SNPs included in the subsequent GWAS analysis was 1,365,181. We used the IMPUTE2 ([60]28) software to impute un-typed SNPs using the LD information from 1,000 Genomes Project Phase 3 reference panel ([61]29) (CHS N = 171 and CHB N = 142). We used R^2 > 0.6, MAF > 0.05 and HWE > 1 × 10^−4 to filter the imputed data, and the 7,405,822 SNPs were used to explore the association with plasma cholesterol level. Statistical analysis Heritability Data preparation and statistical description were performed with SPSS version 22.0. We used the structural equation models (SEM) to evaluate the genetic variance components with Mx software[62]^1. The Blom's formula was used to guarantee the normal distribution or approximate normal distribution because of the deviation of the distribution of all indicators. Pearson's product-moment correlation coefficient was used to calculate intra-class phenotypic correlations. The correlation in MZ twins was significantly higher than DZ twins, reflecting the importance of genetic effects in plasma cholesterol levels. The total phenotypic variance could be decomposed to additive genetic variance (A), common or shared environmental variance (C), and unique/non-shared environmental variance (E). The full ACE model was firstly fitted, the likelihood-ratio χ^2-test was applied to test whether the contributions of A or C to the model had statistical significance by comparing the full model (ACE) and their nested models (CE and AE). The Akaike information criterion (AIC), which is equal to the goodness-of-fit χ^2-value minus twice the degrees of freedom, was used to indicate the parsimony of each model and a lower AIC indicated a better fit. Heritability (h^2) was calculated in the best-fitting model based on the ratio of additive genetic variation to total phenotypic variation, with adjusting for the effect of age, sex and education level. We calculated the power of twin pairs for additive genetic influences by Mx software, and the calculation results showed that the power of our heritability analysis was above 90%. GWAS SNPs-based analysis We used the genome-wide efficient mixed-model association (GEMMA) to test the association between plasma cholesterol level and SNP genotypes after adjusting for the following covariates: sex, age, and education level ([63]30). We used Quantile-quantile (Q-Q) and Manhattan plots to illustrate overall significance level (P < 5 × 10^−8) and suggestive level (P < 1 × 10^−5) autosomal and chromosomal X SNPs ([64]31). In addition, we analyzed the enrichment of cell-type enhancers for the typed GWAS results of the regulatory domains outside the coding regions by using online HaploReg v4.1 software[65]^2 ([66]32, [67]33). SNPs with P < 1 × 10^−5 were selected as query SNPs, and an uncorrected P < 0.05 for enrichments of cell-type enhancers were reported. Gene-based analysis The gene-based analysis integrated all SNPs within a gene to increase the signal or strength of association, which correcting for linkage disequilibrium (LD) and gene size. The gene-based test was implemented in Versatile Gene-based Association Study-2 (VEGAS2)[68]^3 which uses 1000 Genomes data to simulate the correlations of SNPs across the autosomes and chromosome X. ([69]34, [70]35). We used SNPs from “1000G East ASIAN Population”. The genome-wide significant gene for the association was defined as P < 2.63 × 10^−6 (0.05/19,001) as 19,001 genes being evaluated. pathway enrichment analysis We used PASCAL to compute pathway-scored ([71]36–[72]38). In this approach, genetic markers SNPs were first mapped to genes, and the association scores of all genes in the pathway were computed. We then combined the genes scores of the same pathways to calculate the pathway scores. We used chi-squared or empirical score to evaluate pathway enrichment of high-scoring (possibly fused) genes, avoiding any standard binary enrichment tests with inherent P-value thresholds. Pathways and their corresponding gene annotation were obtained from KEGG, Reactome, and Biocarta (as defined in MSigDB)[73]^4. Results Heritability The final sample consisted of 382 twin pairs for heritability analysis and 139 DZ twin pairs for GWAS analysis with a mean age of 51.6 ± 7.7. The mean value ± SD of TC, HDL-C, LDL-C level for all subjects was 4.9 ± 1.2 umol/L, 1.5 ± 0.5 umol/L, 2.8 ± 0.9 umol/L, respectively (Supplemental Table [74]1). MZ twin correlations for TC (rMZ = 0.61, 95%CI: 0.52–0.68), HDL-C (rMZ = 0.74, 95%CI: 0.68–0.79), LDL-C (rMZ = 0.65, 95%CI: 0.57–0.72) level were all larger than DZ twin correlations (rDZ = 0.35, 95%CI: 0.21–0.47; rDZ = 0.61, 95%CI: 0.50–0.70; rDZ = 0.35, 95%CI: 0.21–0.46), respectively, indicating the existence of genetic effects (Supplemental Table [75]2). The full ACE model was first determined and then the likelihood ratio test and AIC was applied to choose the nested models. Finally, the ACE model was the best fit model with A accounting for 26.6% (95% CI: 7.9–48.9), C accounting for 47.8% (95% CI: 26.3–64.6), E accounting for 25.6% (95% CI: 20.8–31.6) for the variance in HDL-C level. The best fit model for TC level was AE model with A accounting for 61.4% (95% CI: 53.2–68.3), E accounting for 38.6% (95% CI: 31.7–46.8) and for LDL-C level was also AE model with A accounting for 65.5% (95% CI: 57.9–71.8), E accounting for 34.5% (95% CI: 28.2–42.2) (Table [76]1). Table 1. Model fit and proportion of variance for TC, HDL-C, LDL-C level accounted by genetic and environmental parameters. Variable Model A (95%CI) C (95%CI) E (95%CI) −2LL df AIC χ^2 P TC ACE 51.8 (23.6–67.9) 9.0 (0–33.7) 39.2 (32.0–47.9) 1996.9 757 482.9 CE – – 49.5 (41.5–56.8) 50.5 (43.2–58.5) 2010.3 758 494.3 13.4 2.49E−04 AE[77]^* 61.4 (53.2–68.3) – – 38.6 (31.7–46.8) 1997.4 758 481.4 0.4 5.13E−01 HDL-C ACE[78]^* 26.6 (7.9–48.9) 47.8 (26.3–64.6) 25.6 (20.8–31.6) 1763.6 757 249.6 CE – – 69.4 (63.7–74.4) 30.6 (25.6–36.3) 1771.7 758 255.7 8.1 4.48E−03 AE 74.9 (69.5–79.4) – – 25.1 (20.6–30.5) 1778.5 758 262.5 14.9 1.15E−04 LDL-C ACE 61.5 (34.9–71.7) 3.8 (0–27.4) 34.7 (28.3–42.7) 2028.1 760 508.1 CE – – 50.9 (43.0–58.0) 49.1 (42.0–57.0) 2049.4 761 527.4 21.2 4.06E−06 AE[79]^* 65.5 (57.9–71.8) – – 34.5 (28.2–42.2) 2028.2 761 506.2 0.09 7.71E−01 [80]Open in a new tab ^* , The best fitted model; −2LL, −2 Log Likelihood; df, degree of freedom; χ^2, difference of χ^2-value; P, χ^2 test in model fitting; A, additive genetic influence; AIC, Akaike's information criterion; C, common or shared environmental influence; E, unique or non-shared environmental influence; HDL-C, high density lipoprotein cholesterol; LDL-C, low density lipoprotein cholesterol; TC, total cholesterol. The content discussed in detail were in bold. GWAS SNPs-based analysis TC level A sample of 139 DZ twin pairs including 1,365,181 qualified SNPs was included for the present GWAS. The Q-Q plot about TC level illustrated the relationship between the observed and expected GWAS P-values (Supplemental Figure [81]1). The genomic inflation factor (λ-statistic = 1) revealed no evidence of inflation of the test statistics due to population stratification. And weak association was shown due to the slight deviation in the upper right tail from the null distribution. No SNPs reached the genome-wide significance level (P < 5 × 10^−8) as illustrated in Manhattan plot (Supplemental Figure [82]2). However, there were 8 SNPs exceeding the threshold for suggestive significance level (P < 1 × 10^−5) (Supplemental Table [83]3). The strongest association SNP was rs7107698 (P = 2.29 × 10^−6). As the locus zoom plots illustrated, one chromosomal loci 11p15.4 showed nominal association with TC level (Supplemental Figure [84]3). Four SNPs (P = 2.29 × 10^−6 to 2.45 × 10^−6) were located at AMPD3 gene on chromosome 11p15.4. By HaploReg v4.1, three cell-type specific enhancers (uncorrected P < 0.05) of primary neutrophils from peripheral blood (P = 0.04), ovary (P = 0.04) and NHDF-Ad adult dermal fibroblast primary cells (P = 0.01) were identified for TC level (Supplemental Table [85]4). HDL-C level The Q-Q plot about HDL-C level was shown in Figure [86]1. None of the SNPs reached the genome-wide significance level (P < 5 × 10^−8) as illustrated in Manhattan plot (Figure [87]2). However, there are 14 SNPs exceeding the threshold for suggestive significance level (P < 1 × 10^−5) (Table [88]2). The strongest association SNP was kgp6737496 (rs199929635) (P = 7.10 × 10^−7). Chromosomal loci 5q14.1 showed suggestive association with HDL-C level, which including kgp6737496 (rs199929635), rs12518218, rs7729225 near LOC101929154 genes (Figure [89]3). Figure 1. [90]Figure 1 [91]Open in a new tab Quantile-quantile plot for quality control check and visualizing crude association for genome-wide association study of HDL-C level. The x-axis shows the –log10 of expected P-values of association from chi-square distribution and the y-axis shows the –log10 of P-values from the observed chi-square distribution. The black dots represent the observed data, and the red line is the expectation under the null hypothesis of no association. Figure 2. [92]Figure 2 [93]Open in a new tab Manhattan plot for genome-wide association study of HDL-C level. The x-axis shows the numbers of autosomes and the X chromosome, and the y-axis shows the –log10 of P-values for statistical significance. The dots represent the SNPs. None of the SNPs reached the genome-wide significance level (P < 5 × 10^−8). Table 2. The summary of SNPs with P < 1 × 10^−5 for association with HDL-C in typed GWAS data. SNP Chr band CHR BP P-value Closest genes or genes Official full name kgp6737496 (rs199929635) 5q14.1 5 77,142,788 7.10E-07 LOC101929154 Uncharacterized LOC101929154 rs28402213 4q24 4 106,517,449 7.51E-07 ARHGEF38 Rho guanine nucleotide exchange factor 38 rs6468909 8q22.3 8 105,227,985 1.65E-06 RIMS2 Regulating synaptic membrane exocytosis 2 rs72685070 8q22.3 8 105,250,237 1.71E-06 RIMS2 Regulating synaptic membrane exocytosis 2 rs12518218 5q14.1 5 77,161,489 2.22E-06 LOC101929154 Uncharacterized LOC101929154 rs241178 8p21.1 8 28,626,418 2.67E-06 INTS9 Integrator complex subunit 9 rs56207115 12q13.13 12 52,994,896 3.08E-06 KRT72 Keratin 72 rs10939012 4p15.2 4 24,897,032 4.43E-06 CCDC149 Coiled-coil domain containing 149 rs10053012 5q35.1 5 171,522,275 5.71E-06 STK10 Serine/threonine kinase 10 rs12414709 10p13 10 17,041,083 5.90E-06 CUBN Cubilin rs12511068 4p15.2 4 24,896,658 6.04E-06 CCDC149 Coiled-coil domain containing 149 rs7729225 5q14.1 5 77,142,829 6.05E-06 LOC101929154 Uncharacterized LOC101929154 kgp10999245 (rs201647698) 8q22.3 8 105,243,123 6.36E-06 RIMS2 Regulating synaptic membrane exocytosis 2 rs17345993 10p13 10 17,032,885 9.56E-06 CUBN Cubilin [94]Open in a new tab kgp, 1,000 Genomes Project; CHR, chromosome. The content discussed in detail were in bold. Figure 3. [95]Figure 3 [96]Open in a new tab Regional association plot showing signal around chromosomal loci of 5q14.1 for genome-wide association study of HDL-C level. LDL-C level The Q-Q plot about LDL-C level was shown in Supplemental Figure [97]4. None of the SNPs reached the genome-wide significance level (P < 5 × 10^−8) as illustrated in Manhattan plot (Supplemental Figure [98]5). However, there are 9 SNPs exceeding the threshold for suggestive significance level (P < 1 × 10^−5) (Supplemental Table [99]5). The strongest association SNP was rs10490120 (P = 1.11 × 10^−6). At chromosomal loci 2p16.3, four SNPs rs10490120, rs4953640, rs74263479, rs17037869 were positioned closest to FSHR gene that was involved in follicle stimulating hormone receptor (Supplemental Figure [100]6). HUES6 cells (P = 0.01), HUES64 cells (P = 0.01) and iPS-18 cells (P = 0.01) were confirm as cell-type specific enhancers for LDL-C level (Supplemental Table [101]6). Imputation To maximize the identification of new risk variants, we imputed typed SNPs using 1,000 Genomes Project Phase 3 as the reference panel. Manhattan plots for all post-imputation variants showed none evidence of genome-wide significance level (P < 5 × 10^−8) (Figure [102]2; Supplementary Figures [103]2, [104]5). However, there were 39, 55, and 59 SNPs exceeding the threshold for suggestive significance level (P < 1 × 10^−5) for TC, HDL-C, and LDL-C, respectively. The strongest associations were rs77348447, rs79775842, rs56047090 for TC, HDL-C, LDL-C, respectively (Supplemental Tables [105]7–[106]9). We also compared our post-imputation results with 34,421 East Asians lipids GWAS meta-analysis results[107]^5 And 32 SNPs located in genes RGS5, OSBPL10, ADGRB3, PSMB7, SRSF8 for TC level, 50 SNPs located in genes KAZN, LOC105378657, LOC105373529, LOC105373941, BIN3, LINC02153, LDHAL6CP for HDL-C level, 39 SNPs located in genes VPS13D, IGF2BP2, PSMB7, CA10, NECTIN2, APOE, APOC1, TOMM40 for LDL-C level could be replicated (Supplemental Table [108]10). Gene-based analysis In the gene-based analysis, none genes were found to achieve genome-wide significance level and the number of genes that were nominally associated with TC, HDL-C, LDL-C level was 1,038, 1,033, and 1,090, respectively (P < 0.05). The top 20 genes were ranked by their P values (TC: Supplemental Table [109]11, HDL-C: Table [110]3, LDL-C: Supplemental Table [111]12). LOC101929154, RIMS2, INTS9, KRT72, CCDC149, CUBN genes for HDL-C level and AMPD3, ABR genes for TC level had already been shown in the suggestive level SNPs-based analysis, whereas the other genes were novel. Table 3. The top 20 genes from VEGAS2 gene-based analysis showing the strongest association with HDL-C level (P < 0.05) in typed GWAS data. Chr Gene Numbers of SNPs Start position Stop position Gene-based test statistic Gene P-value Top-SNP Top-SNP P-value 11 RRP8 9 6,621,143 6,624,880 57.80 4.00E-06 rs17834692 1.58E-04 1 RLF 23 40,627,040 40,706,593 164.27 1.20E-05 rs16827079 2.88E-05 4 MGARP 7 140,187,316 140,201,492 54.60 2.30E-05 rs3208941 2.04E-05 12 SLC26A10 5 58,013,692 58,019,934 45.55 2.90E-05 rs923828 1.79E-04 5 CTXN3 15 126,984,712 126,994,322 99.49 3.80E-05 rs248709 4.16E-04 3 UROC1 39 126,200,007 126,236,616 243.37 4.10E-05 rs1091553 1.99E-04 4 KLKB1 19 187,148,671 187,179,625 113.75 7.10E-05 rs1912826 1.31E-04 17 CBX4 6 77,806,954 77,813,213 54.62 8.60E-05 rs73422123 7.25E-05 19 ACPT 7 51,293,671 51,298,481 58.32 1.00E-04 rs55735528 5.48E-04 20 CSTL1 10 23,420,321 23,425,567 52.62 1.30E-04 rs3746737 3.40E-04 1 ISG15 5 948,846 949,919 36.63 1.80E-04 rs116002608 8.77E-05 19 C19orf48 13 51,300,949 51,308,110 112.72 1.90E-04 rs4801853 2.61E-04 1 TMCO2 2 40,713,572 40,717,365 17.20 2.10E-04 rs61200654 3.36E-05 4 NDUFC1 4 140,211,070 140,223,705 38.88 2.20E-04 rs12642647 2.49E-05 2 SOWAHC 5 110,371,910 110,376,564 43.04 2.40E-04 rs6726252 2.79E-04 19 ZNF776 7 58,258,163 58,269,527 45.19 3.10E-04 rs35919456 2.11E-04 1 CD101 24 117,544,371 117,579,173 106.37 3.30E-04 rs1555793 1.20E-04 10 IFIT1B 3 91,137,812 91,144,962 18.81 4.00E-04 rs10887951 7.36E-04 1 HECTD3 6 45,468,219 45,477,027 42.86 4.20E-04 rs7541207 2.66E-04 12 FBXO21 10 117,581,584 117,628,300 90.53 4.40E-04 rs2279766 5.40E-04 [112]Open in a new tab Chr, chromosome. The content discussed in detail were in bold. The number of common nominally associated genes (P < 0.05) between TC VS HDL-C level, TC VS LDL-C level, HDL-C VS LDL-C level was 195, 504, 91, respectively. And 71 common genes were found among the three traits (Supplemental Table [113]13). Pathway enrichment analysis PASCAL program was used in the pathway enrichment analysis. The number of pathways that were nominally associated with TC, HDL-C, and LDL-C level was 816, 529, and 813, respectively (P < 0.05). The top 20 pathways were ranked by their emp-P-values (TC: Supplemental Table [114]14; HDL-C: Table [115]4; LDL-C: Supplemental Table [116]15). The Q-Q plots were shown in Supplemental Figures [117]7–[118]9. Pathway analysis revealed several biological processes significantly associated with TC level: zinc transporters, metal ion SLC transporters, ERKS ARE inactivated, MAPK targets nuclear events mediated by map kinases, ERK MAPK targets, transport of glucose and other sugars bile salts and organic acids metal ions and amine compounds, amino acid transport across the plasma membrane, etc. Table 4. The top 20 pathway results-KEGG, Reactome, and Biocarta (emp-P < 0.05) using PASCAL program for HDL-C level in typed GWAS data. Pathway chisq-P emp-P -log(chisqP) –log(empP) REACTOME_CELL_SURFACE_INTERACTIONS_AT_THE_VASCULAR_WALL 7.11E-04 2.94E-04 3.14818 3.53165 BIOCARTA_HER2_PATHWAY 4.39E-04 4.35E-04 3.35736 3.36151 KEGG_CELL_ADHESION_MOLECULES_CAMS 6.25E-04 4.44E-04 3.20421 3.35262 REACTOME_IL_6_SIGNALING 6.27E-04 4.66E-04 3.20281 3.33161 KEGG_TIGHT_JUNCTION 2.91E-03 6.20E-04 2.53649 3.20761 BIOCARTA_AHSP_PATHWAY 8.78E-04 8.80E-04 3.05652 3.05552 REACTOME_BILE_SALT_AND_ORGANIC_ANION_SLC_TRANSPORTERS 9.23E-04 9.30E-04 3.03463 3.03152 REACTOME_TRANSPORT_OF_GLUCOSE_AND_OTHER_SUGARS_BILE _SALTS_AND_ORGANIC_ACIDS_METAL_IONS_AND_AMINE_COMPOUNDS 9.23E-04 9.50E-04 3.03463 3.02228 REACTOME_SIGNALING_BY_ILS 1.35E-03 1.17E-03 2.87021 2.93181 BIOCARTA_CHREBP2_PATHWAY 1.13E-03 1.30E-03 2.94743 2.88606 REACTOME_INTRINSIC_PATHWAY 1.51E-03 1.38E-03 2.82086 2.86012 REACTOME_FORMATION_OF_FIBRIN_CLOT_CLOTTING_CASCADE 1.51E-03 1.48E-03 2.82086 2.82974 REACTOME_NEPHRIN_INTERACTIONS 4.77E-03 1.82E-03 2.32139 2.73993 BIOCARTA_IL6_PATHWAY 2.04E-03 1.96E-03 2.69087 2.70774 KEGG_NATURAL_KILLER_CELL_MEDIATED_CYTOTOXICITY 7.95E-03 2.15E-03 2.09938 2.66756 REACTOME_CHONDROITIN_SULFATE_BIOSYNTHESIS 2.36E-03 2.31E-03 2.62634 2.63639 KEGG_SULFUR_METABOLISM 2.36E-03 2.34E-03 2.62634 2.63078 KEGG_GLYCOSAMINOGLYCAN_BIOSYNTHESIS_CHONDROITIN_SULFATE 2.36E-03 2.39E-03 2.62634 2.6216 REACTOME_APC_C_CDH1_MEDIATED_DEGRADATION_OF_CDC20_AND_OTHER_ APC_C_CDH1_TARGETED_PROTEINS_IN_LATE_MITOSIS_EARLY_G1 5.66E-03 2.61E-03 2.24755 2.58336 KEGG_LEUKOCYTE_TRANSENDOTHELIAL_MIGRATION 2.96E-03 2.62E-03 2.52912 2.5817 [119]Open in a new tab chisq-P, Chi-square p-value. Chi-squared method (gene-score p-value were ranked and transformed to a uniform distribution, these values were then transformed by a chi-square quantile function, and summed). The content discussed in detail were in bold. emp-P, empirical p-value. Empirical sampling method (gene-scores are transformed with chi-square quantile function and summed, then Monte Carlo estimate of the p-values were obtained by sampling random sets of the same size). Several biological pathways significantly related to HDL-C levels emerged: cell adhesion molecules CAMs, IL-6 signaling, tight junction, bile salt and organic anion SLC transporters, transport of glucose and other sugars bile salts and organic acids metal ions and amine compounds, signaling by ILS, IL6 pathway, natural killer cell mediated cytotoxicity, sulfur metabolism, chondroitin sulfate biosynthesis, glycosaminoglycan biosynthesis chondroitin sulfate, leukocyte transendothelial migration, etc. The pathways associated with LDL-C involved: FC epsilon RI signaling pathway, NFAT pathway, metabolism of amino acids and derivatives, ERBB signaling pathway, IL1R pathway, HIV infection, etc. Discussion Heritability Our investigation was based on 382 twin pairs to explore the heritability of TC, HDL-C, LDL-C level, and confirmed the genetic variants underlying these trait by GWAS. The correlation in MZ twins was significantly higher than DZ twins, which reflected the existence of genetic variance in TC, HDL-C, LDL-C levels (Supplemental Table [120]2). The ACE model was the best fit model with A accounting for 26.6%, C accounting for 47.8%, E accounting for 25.6% for the variance in HDL-C level. The best fit model for TC level was AE model with A accounting for 61.4%, E accounting for 38.6% and for LDL-C level was also AE model with A accounting for 65.5%, E accounting for 34.5% (Table [121]1). The magnitude of the additive genetic components (A) for TC level obtained here was in line with several previous twin studies ([122]15, [123]16, [124]27, [125]39–[126]47). HDL-C level is close to some other twin studies ([127]11, [128]14, [129]27, [130]48, [131]49). LDL-C level is similar to the others ([132]27, [133]42, [134]43, [135]45, [136]47, [137]50). GWAS SNP-based analysis TC level. While no genome-wide significant SNPs were found in GWAS for TC level, one SNP rs7107698 on chromosome 11 was found as promising genetic regions. The AMPD3 gene around the rs7107698 has been linked to the TC level. Adenosine monophosphate deaminase 3 (AMPD3) is an enzyme that catalyzes the hydrolytic deamination of adenosine monophosphate to inosine monophosphate and participates in the pathway of adenylate catabolic ([138]51). The activation of AMP deaminase leads to the increase of uric acid and the production of mitochondrial oxidative stress, which further stimulates the formation of de novo lipogenesis and the activation of ATP-citrate lyase and the synthesis of long-chain saturated fatty acids and eventually affects TC level ([139]52, [140]53). Additionally, three highly correlated SNPs (rs4909928, rs12184411, kgp6520322) were detected within AMPD3 gene, showing suggestive evidence of association with TC level. HDL-C level. This cluster of suggestive SNPs (kgp6737496, rs12518218, rs7729225) is linked to LOC101929154, a non-coding RNA. Although we have not fully elucidated the biological function of this ncRNA, mutations in base pairs may affect their regulatory domain to regulate HDL-C levels. Another gene CUBN including suggestive SNPs rs12414709 and rs17345993 has been linked to HDL-C metabolism ([141]54). Cubilin (CUBN) encodes high affinity HDL-C and lipid-poor apoA-I endocytosis receptor, which participate in the renal clearance of filterable apolipoprotein AI/HDL-C ([142]55, [143]56). LDL-C level SNPs rs10490120, rs4953640, rs74263479, rs17037869 were included in the FSHR gene. Follicle stimulating hormone receptor (FSHR) may interact with FSH in human hepatic tissue, which can elevate LDL-C level by blocking the expression of LDLR ([144]57, [145]58). Another KCNK9 around the suggestive SNP rs13251143 also has evidence for dyslipidemia. A great deal of studies has shown that KCNK9 gene is associated with obesity, HDL-C, adiponectin levels and aldosterone production ([146]59–[147]61). All the above factors are related to dyslipidemia ([148]62–[149]64). As for cell-type enhancers, the relationship between TC level and ovary has been studied. Women with polycystic ovary syndrome had higher TC and LDL-C level, lower HDL-C level ([150]65, [151]66). Many studies have showed that Estrogen Receptor 1 (ESR1) is associated with TC, HDL-C, LDL-C level ([152]67). Imputation Genotype imputation substantially increases available SNPs for analysis in GWAS. Substantive SNPs introduced by imputation reached the suggestive level associated with cholesterol. SNPs rs17578959 and rs28845526 were located in the PID1 gene, which served as the regulator of the LDLR-related protein 1 (LRP1) function and controlled the processing of postprandial lipoproteins ([153]68). Solute Carrier Family 13 Member 1 (SLC13A1) gene can regulate the cholesterol level by participating in transport of glucose and other sugars, bile salts and organic acids, metal ions and amine compounds pathways. As additional replication, we cross-referenced our post-imputation results with 34,421 East Asians lipids GWAS meta-analysis results (Supplemental Table [154]10). A list of SNPs could be replicated, especially the SNPs located in the relevant genes, eg. RGS5, OSBPL10, SRSF8 for TC level, KAZN, LOC105378657, LOC105373529, LOC105373941, BIN3, LINC02153, LDHAL6CP for HDL-C level, VPS13D, IGF2BP2, PSMB7, CA10, NECTIN2, APOE, APOC1, TOMM40 for LDL-C level. Gene-based analysis TC level Fas Associated Factor 1 (FAF1) can inhibit the activation of NF-kB by interfering with the assembly of IêB kinase (IKK) complex and impeding the nuclear translocation of NF-kB RelA in stimulation-dependent manner. NF-kB signaling pathway is closely related to inflammation ([155]69–[156]71). Epidemiology investigations have revealed that inflammation is associated with a risk of dyslipidemia ([157]72, [158]73). Lamina C et al. found that Kallikrein B1 (KLKB1) was associated with apolipoprotein A-IV concentrations in a genome-wide association meta-analysis. Apolipoprotein A-IV acting as a major component of HDL-C and chylomicron particles participates in reverse cholesterol transport ([159]74). HDL-C level Kallikrein B1 (KLKB1) has been studied in a genome-wide association meta-analysis on apolipoprotein A-IV concentrations. Apolipoprotein A-IV acting as a major component of HDL-C and chylomicron particles participates in reverse cholesterol transport ([160]74). LDL-C level Neurotrophic Receptor Tyrosine Kinase 1 (NTRK1) encodes the neurotrophic tyrosine kinase receptor (NTKR) family members, which are related to classical mitogen-activated protein kinases (MAPK) signaling pathways ([161]75). Further, MAP signaling pathways participated in lipid metabolism ([162]76). Syntrophin Beta 2 (SNTB2) suggestes a role in ERK and SR-BI level, and sphingomyelin metabolism in obesity and have been proved to affect the activity of ABCA1 by stabilizing ABCA1 protein, which affect the catabolism of LDL-C level ([163]77–[164]79). Fas Associated Factor 1 (FAF1) also plays a key role in regulating the LDL-C level ([165]69–[166]71). Pathway enrichment analysis TC level Pathway analysis revealed several biological processes significantly associated with TC level: zinc transporters, metal ion SLC transporters, immune system, regulation of transport. Other novel pathways especially top 20 may also be interesting potential candidates for future research and validation. ([167]1) Zinc serves as a catalytic or structural cofactor for many different proteins and participates in the formation and function of enzyme. One study in young Finns (6–18 years) reported that serum Zinc was positively correlated with TC level ([168]80). ([169]2) Six SLC gene families encode proteins which mediate transport of metals. Transition metal ion such as vanadium, iron, can influence the serum lipid and lipoprotein profiles ([170]81, [171]82). ([172]3) ERKS ARE inactivated, MAPK targets nuclear events mediated by map kinases and ERK MAPK targets participate in the immune system pathway. Immune system is also associated with cholesterol level. Acute phase conditions and immune disorders promote the decrease of HDL-C and change the composition and size of HDL-C ([173]83). ([174]4) Transport of glucose and other sugars bile salts and organic acids metal ions and amine compounds and amino acid transport across the plasma membrane participate in regulation of transport. Several genes (GAS6, GPLD1, PRKCE, WNK1, ESR1, RAC2, FAF1, FFAR1, RNASEL, GCK, SLC30A8, RARRES2, TLR5, TLR1, ACSL3, ABCA7, Th17) enriching in pathway of regulation of transport have been clarified the association with cholesterol level. HDL level Several biological pathways significantly related to HDL-C levels include: cell adhesion molecules CAMs, tight junction, chondroitin sulfate biosynthesis, sulfur metabolism, immune system, regulation of transport. (1) An increased CAMs expression may be a mechanism that decreases plasma HDL-C level ([175]84). (2) Zonulin, a physiological mediator of tight intercellular junctions reversibly regulates intestinal permeability to reduce the level of HDL-C ([176]85). (3) Animal experiments have shown that chondroitin-6-sulfate can reduce plasma LDL-C ([177]86). (4) Sulfur-containing compounds are generally associated with an unfavorable lipid profile ([178]87). (5) IL-6 signaling, signaling by ILS, IL6 pathway, natural killer cell mediated cytotoxicity, leukocyte transendothelial migration constitute the immune system to regulate cholesterol levels. (6) Bile salt and organic anion SLC transporters, transport of glucose and other sugars bile salts and organic acids metal ions and amine compounds participate in regulation of transport. LDL level The pathways associated with LDL-C were: immune system, metabolism of amino acids and derivatives, ERBB signaling pathway, HIV infection. (1) FC epsilon RI signaling pathway, NFAT pathway, IL1R pathway constitute the immune system to regulate cholesterol levels. (2) Metabolism of amino acids and derivatives participates in the formation and function of enzyme to regulate cholesterol level. (3) ERBB4 activates sterol regulatory element binding protein-2 (SREBP-2) to enhance LDL-C uptake and cholesterol biosynthesis ([179]88). (4) A cross-sectional epidemiological study in China showed that mean LDL-C was lower in HIV-positive than HIV-negative subjects ([180]89). Strengths and limitations There are two advantages in this study. First, our results were based on the twin data of TC, HDL-C, LDL-C level. Associated individuals, such as twin pairs, will confer increased power in genetic association analysis due to their genetic association ([181]21). And the power of our heritability analysis was above 90%, which suggested that the heritability findings were credible. Second, compared with most GWAS results from European populations, we sampled GWAS from the Qingdao twin population. Our GWAS results of TC, HDL-C, LDL-C level in East Asian population provided a basis for future molecular biology investigations. This study has two limitations that are noteworthy. First, our study had a relatively small sample size due to the challenges of recruiting and identifying qualified twin pairs. So it should be noted that the power of our analysis might not be sufficient to detect association between SNPs, pathways and cholesterol levels, and our results needed to be furtherly confirmed. Second, our study provided a lot of suggestive results, but no statistically significant results. However, many variants and genes had been confirmed, other novel variants and genes may also be interesting potential candidates for future research and validation. Conclusions In brief, we have verified genetic impact on TC, HDL-C, LDL-C variation through twin modeling. The promising genetic regions for TC, HDL-C, LDL-C level were on chromosome 11 around rs7107698, chromosome 5 around rs12518218, chromosome 2 around rs10490120, respectively. There are 8, 14, 9 SNPs exceeding the threshold for suggestive significance level for TC, HDL-C, LDL-C level, respectively. In the gene-based analysis, the number of genes that was nominally associated with TC, HDL-C, LDL-C level was 1,038, 1,033, and 1,090, respectively. Although our findings need to be replicated and validated, the data reported here could represent a useful reference for GWAS results of TC, HDL-C, LDL-C level in East Asian population and provide a basis for future molecular biology investigations. Data availability The SNPs datasets of this study have been deposited in the European Variation Archive (EVA) (Accession No. [182]PRJEB23749). Author contributions HL and DZ designed the study. HD and CX collected samples and phenotypes. HL and CZ assisted in sample data and sequencing data management. HL and WW analyzed the sequencing data and interpreted the analysis results. HL and DZ drafted the manuscript, XT and HD participated in the discussion, and WW, CX, and CZ revised it. All the authors read the manuscript and agreed to publish. All the authors agreed to be responsible for all aspects of the work. Conflict of interest statement The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest. Glossary Abbreviations DZ dizygotic GWAS genome-wide association study MZ monozygotic TC total cholesterol VEGAS2 Versatile Gene-based Association Study-2. ^1[183]http://www.vcu.edu/mx ^2[184]http://archive.broadinstitute.org/mammals/haploreg/haploreg.php ^3[185]https://vegas2.qimrberghofer.edu.au/ ^4[186]http://software.broadinstitute.org/gsea/msigdb ^5[187]https://blog.nus.edu.sg/agen/summary-statistics/ Funding. This study was supported by the grants from the National Natural Science Foundation of China (31371024), the EFSD/CDS/Lilly Programme award (2013). Supplementary material The Supplementary Material for this article can be found online at: [188]https://www.frontiersin.org/articles/10.3389/fendo.2018.00677/full #supplementary-material Supplemental Figure 1 Quantile-quantile plot for quality control check and visualizing crude association for genome-wide association study of TC level. The x-axis shows the –log10 of expected P-values of association from chi-square distribution and the y-axis shows the –log10 of P-values from the observed chi-square distribution. The black dots represent the observed data, and the red line is the expectation under the null hypothesis of no association. [189]Click here for additional data file.^ (2.2MB, TIF) Supplemental Figure 2 Manhattan plot for genome-wide association study of TC level. The x-axis shows the numbers of autosomes and the X chromosome, and the y-axis shows the –log10 of P-values for statistical significance. The dots represent the SNPs. None of the SNPs reached the genome-wide significance level (P < 5 × 10^−8). [190]Click here for additional data file.^ (3.3MB, TIF) Supplemental Figure 3 Regional association plot showing signal around chromosomal loci of 11p15.4 for genome-wide association study of TC level. [191]Click here for additional data file.^ (308.8KB, TIF) Supplemental Figure 4 Quantile-quantile plot for quality control check and visualizing crude association for genome-wide association study of LDL-C level. The x-axis shows the –log10 of expected P-values of association from chi-square distribution and the y-axis shows the –log10 of P-values from the observed chi-square distribution. The black dots represent the observed data, and the red line is the expectation under the null hypothesis of no association. [192]Click here for additional data file.^ (2.1MB, TIF) Supplemental Figure 5 Manhattan plot for genome-wide association study of LDL-C level. The x-axis shows the numbers of autosomes and the X chromosome, and the y-axis shows the –log10 of P-values for statistical significance. The dots represent the SNPs. None of the SNPs reached the genome-wide significance level (P < 5 × 10^−8). [193]Click here for additional data file.^ (3.3MB, TIF) Supplemental Figure 6 Regional association plot showing signal around chromosomal loci of 2p16.3 for genome-wide association study of LDL-C level. [194]Click here for additional data file.^ (6.6MB, TIF) Supplemental Figure 7 Quantile-quantile plot for PASCAL pathways results of TC level. The x-axis shows the -log10 of expected P-values of association from chi-square distribution and the y-axis shows the –log10 of P-values from the observed chi-square distribution. The black dots represent the observed data, and the red line is the expectation under the null hypothesis of no association. [195]Click here for additional data file.^ (4.2MB, TIF) Supplemental Figure 8 Quantile-quantile plot for PASCAL pathway results of HDL-C level. The x-axis shows the –log10 of expected P-values of association from chi-square distribution and the y-axis shows the –log10 of P-values from the observed chi-square distribution. The black dots represent the observed data, and the red line is the expectation under the null hypothesis of no association. [196]Click here for additional data file.^ (4.3MB, TIF) Supplemental Figure 9 Quantile-quantile plot for PASCAL pathway results of LDL-C level. The x-axis shows the –log10 of expected P-values of association from chi-square distribution and the y-axis shows the –log10 of P-values from the observed chi-square distribution. The black dots represent the observed data, and the red line is the expectation under the null hypothesis of no association. [197]Click here for additional data file.^ (4.2MB, TIF) Supplemental Table 1 Descriptive statistics for subjects in all sample and GWAS sample. [198]Click here for additional data file.^ (22.5KB, DOCX) Supplemental Table 2 Descriptive statistics for subjects in all sample and GWAS sample phenotypic correlation coefficients (95% CI) with covariates' effects in MZ and DZ twin pairs. [199]Click here for additional data file.^ (24.6KB, DOCX) Supplemental Table 3 The summary of SNPs with P < 1 × 10^−5 for association with TC in typed GWAS data. [200]Click here for additional data file.^ (19.3KB, DOCX) Supplemental Table 4 Query SNP enhancer summary for TC level in typed GWAS data. [201]Click here for additional data file.^ (33.6KB, DOCX) Supplemental Table 5 The summary of SNPs with P < 1 × 10^−5 for association with LDL-C in typed GWAS data. [202]Click here for additional data file.^ (18.9KB, DOCX) Supplemental Table 6 Query SNP enhancer summary for LDL-C level in typed GWAS data. [203]Click here for additional data file.^ (31.6KB, DOCX) Supplemental Table 7 The summary of SNPs with P < 1 × 10^−5 for association with TC in GWAS imputation. [204]Click here for additional data file.^ (25.9KB, DOCX) Supplemental Table 8 The summary of SNPs with P < 1 × 10^−5 for association with HDL-C in GWAS imputation. [205]Click here for additional data file.^ (29.2KB, DOCX) Supplemental Table 9 The summary of SNPs with P < 1 × 10^−5 for association with LDL-C in GWAS imputation. [206]Click here for additional data file.^ (30.2KB, DOCX) Supplemental Table 10 The comparison between our imputation results of genome-wide association study and other East Asian studies. [207]Click here for additional data file.^ (46.2KB, DOCX) Supplemental Table 11 The top 20 genes from VEGAS2 gene-based analysis showing the strongest association with TC level in typed GWAS data. [208]Click here for additional data file.^ (21.3KB, DOCX) Supplemental Table 12 The top 20 genes from VEGAS2 gene-based analysis showing the strongest association with LDL-C level in typed GWAS data. [209]Click here for additional data file.^ (21.3KB, DOCX) Supplemental Table 13 Common genes among TC, HDL-C and LDL-C level (P < 0.05). [210]Click here for additional data file.^ (22.4KB, DOCX) Supplemental Table 14 The top 20 pathway results-KEGG, Reactome, and Biocarta (emp-P < 0.05) using PASCAL program for TC level in typed GWAS data. [211]Click here for additional data file.^ (23.7KB, DOCX) Supplemental Table 15 The top 20 pathway results-KEGG, Reactome, and Biocarta (emp-P < 0.05) using PASCAL program for LDL-C level in typed GWAS data. [212]Click here for additional data file.^ (22.9KB, DOCX) References