Abstract

Background

   DNA methylation, an epigenetic modification, can be affected by
   environmental factors and thus regulate gene expression levels that can
   lead to alterations of certain phenotypes. Network analysis has been
   used successfully to discover gene sets that are expressed differently
   across multiple disease states and suggest possible pathways of disease
   progression. We applied this framework to compare DNA methylation
   levels before and after lipid-lowering medication and to identify
   modules that differ topologically between the two time points,
   revealing the association between lipid medication and these
   triglyceride-related methylation sites.

Methods

   We performed quality control using beta-mixture quantile normalization
   on 463,995 cytosine-phosphate-guanine (CpG) sites and deleted
   problematic sites, resulting in 423,004 probes. We identified 14,850
   probes that were nominally associated with triglycerides prior to
   treatment and performed weighted gene correlation network analysis
   (WGCNA) to construct pre- and posttreatment methylation networks of
   these probes. We then applied both WGCNA module preservation and
   generalized Hamming distance (GHD) to identify modules with topological
   differences between the pre- and posttreatment. For modules with
   structural changes between 2 time points, we performed
   pathway-enrichment analysis to gain further insight into the biological
   function of the genes from these modules.

Results

   Six triglyceride-associated modules were identified using pretreatment
   methylation probes. The same 3 modules were not preserved in
   posttreatment data using both the module-preservation and the GHD
   methods. Top-enriched pathways for the 3 differentially methylated
   modules are sphingolipid signaling pathway, proteoglycans in cancer,
   and metabolic pathways (p values < 0.005). One module in particular
   included an enrichment of lipid-related pathways among the top results.

Conclusions

   The same 3 modules, which were differentially methylated between pre-
   and posttreatment, were identified using both WGCNA module-preservation
   and GHD methods. Pathway analysis revealed that triglyceride-associated
   modules contain groups of genes that are involved in lipid signaling
   and metabolism. These 3 modules may provide insight into the effect of
   fenofibrate on changes in triglyceride levels and these methylation
   sites.

Background

   Epigenetic changes are biochemical modifications in chromosomes that do
   not alter the DNA sequence [[43]1]. DNA methylation is one such
   epigenetic process implicated in human disease. DNA methylation
   involves the addition of a methyl group to DNA, which usually occurs at
   cytosine-phosphate-guanine (CpG) dinucleotides in the promoter region
   or within genes [[44]1]. It is known to regulate gene expression levels
   by changing the chromatin structure, thereby preventing transcription
   factors from binding to the gene promoter, which can lead to
   alterations of phenotypes [[45]2]. DNA methylation can be modulated by
   external factors, such as smoking or exposure to toxins [[46]3]. As
   such, epigenetic information is considered to be fundamental in
   understanding the interaction between the human genome and the
   environment.

   Recent research has unveiled the potential involvement of DNA
   methylation on the regulation of fasting blood lipids [[47]4–[48]6].
   One way to visualize interactions and changes in the DNA methylation
   profile is to construct methylation networks and compare their
   topology. Structural changes resulting from external stimuli can be
   detected with network comparison algorithms that can identify
   subnetworks that are either preserved or structurally different.
   Several papers have performed network-based methods to identify
   trait-related modules [[49]7–[50]9] and examined the preservation of
   such modules, either between different tissues or different data sets.
   For example, Horvath et al. [[51]7] conducted weighted gene correlation
   network analysis (WGCNA) to examine the effect of aging on DNA
   methylation modules in humans and reported a robustly defined
   age-related comethylation module that is present in multiple human
   issues including blood and brain. Rickabaugh et al. [[52]8] also found
   a preserved methylation module that was associated with age and HIV-1
   status in 2 different HIV data sets via WGCNA. However, to our best
   knowledge, there are no previous studies using network-based approaches
   to identify triglyceride-associated modules and assess topological
   differences for each module between 2 time points. We investigated the
   topological differences of triglyceride-associated methylation networks
   constructed before and after the administration of fenofibrate.

Methods

Quality control

   Data from GAW20 was used and included epigenetic and pharmacogenomics
   data for 188 unique families [[53]10]. A total of 995 individuals with
   pretreatment DNA methylation profiles at visit 2 and 530 individuals
   with posttreatment DNA methylation profiles at visit 4 (with 446
   individuals overlapping during the 3-week treatment period) were used
   to construct, separately, a pre- and posttreatment network. Because of
   systemic differences in the range of Infinium I and Infinium II probe
   methylation values, we performed beta-mixture quantile normalization
   (BMIQ) using the bmiq function from wateRmelon package in R on a total
   of 463,995 methylation probes to adjust the beta values of Type II
   probes to align with the distribution of Type I probes [[54]11,
   [55]12]. Probes with single nucleotide polymorphisms (SNPs) under the
   actual CpG sites, SNPs at the nucleotide right next to a CpG site, or
   cross-reactive probes that have a target sequence similar to another
   location in the genome are potentially problematic, and were excluded
   from analysis [[56]13]. A total of 423,004 pretreatment CpG sites were
   used in the analysis.

Network

   We were interested in DNA methylation changes caused by fenofibrate. To
   build the networks, we limited the methylation probes to those that are
   nominally associated (p < 0.05) with the log-transformed triglyceride
   level from visit 2, resulting in 14,850 probes from pretreatment data.
   If we limited our study to those probes that are significantly
   associated (false discovery rate [FDR] < 0.05), we might lose many
   informative probes that only show nominal significance without
   considering their potential interactions with other probes used in the
   network constructions.

   To isolate the treatment effect from other covariates, we built the
   networks from pre- and posttreatment residual methylation values. We
   performed linear mixed models using the lmekin function in the coxme
   package in R for each of the 14,850 pretreatment CpG sites, adjusting
   for family relatedness, the first 10 principal components accounting
   for residual batch effects, and the covariates age, sex, field center,
   and smoking status [[57]14]. Similarly, we obtained residuals for
   posttreatment methylation for the same set of CpG sites. Samples with
   more than half missing residuals and CpG sites with variance smaller
   than 1 × 10^− 10*max (abs [residuals of all probes among all samples])
   were removed. We applied hierarchical clustering to samples to detect
   sample outliers at a branch cut height of 7 using the hclust function
   from the stats package in R. Overall, 47 probes and 3 samples were
   excluded in both pre- and posttreatment methylation data.

   We used the residuals to construct pretreatment and posttreatment
   networks using WGCNA, which uses pairwise correlations between the
   variables to create a weighted network [[58]15]. We chose soft
   thresholding power β according to the approximate scale-free topology
   criterion and computed the adjacency matrix by raising co-expression
   similarities by power β [[59]16]. We chose a power of β = 3 as it was
   the lowest power at which the scale-free topology had a good fit of R^2
   greater than 0.9.

   To minimize the effects of noise and spurious associations, the
   adjacency matrix was transformed into the topological overlap matrix,
   which reflects the relative interconnectedness between a pair of
   methylation probes, and the corresponding dissimilarity matrix was
   calculated by subtracting the adjacency matrix from 1. We performed
   hierarchical clustering of the dissimilarity matrix and a dynamic
   tree-cut algorithm to identify modules in the pretreatment network,
   with a minimum module size of 30 to avoid small modules and pass more
   informative results to the subsequent pathway analysis [[60]17,
   [61]18]. Transformation of the adjacency matrix and network
   construction were performed using the blockwiseModules function from
   WGCNA package in R [[62]15].

   We tested whether pretreatment methylation modules changed after
   treatment more than we would expect by chance using the
   module-preservation method and the generalized Hamming distance (GHD)
   method. The module-preservation statistic is comprised of many factors
   that describe the density of the network and network connectivity.
   These statistics are aggregated and were used to calculate the z-score,
   Z[summary,] for each preservation measure [[63]19]. A Z[summary] > 10
   indicated strong evidence of module preservation, whereas a Z[summary]
   between 2 and 10 indicated weak evidence of preservation. If Z[summary]
   was < 2, we concluded there was no evidence of module preservation.
   Module changes were also ranked according to a rank statistic
   medianRank, which is also composed of statistics related to a network’s
   density and connectivity, but is less sensitive to module size
   according to Langfelder et al. [[64]19]. We used both statistics to
   determine whether or not a certain module was preserved between pre-
   and posttreatment. Module preservation analyses were conducted using
   the modulePreservation function from the WGCNA package in R [[65]15].
   GHD is a distance metric that uses one-step topological overlap as the
   edge weights to account for its direct neighboring structure around the
   pair of nodes. Formal hypothesis testing can be done by finding the
   null distribution using permutation, but it is computationally
   inefficient. In a scale-free network, GHD permutation distribution is
   approximately normally distributed, ie,
   [MATH: <msub><mi mathvariant="italic">GHD</mi><mi>π</mi></msub><mfenced
   close=")" open="("
   separators=","><msub><mi>A</mi><mi>π</mi></msub><mi>B</mi></mfenced><mo
   >~</mo><mi>N</mi><mfenced close=")" open="("
   separators=","><msub><mi>μ</mi><mi>π</mi></msub><msubsup><mi>σ</mi><mi>
   π</mi><mn>2</mn></msubsup></mfenced> :MATH]
   [[66]20]. We further standardized the GHD value so it follows a
   standard normal distribution and evaluated its significance against the
   normal quantiles.

Pathway analysis

   After we compared the pre−/posttreatment module pairs, we performed
   pathway-enrichment analysis for module pairs that had topological
   change between the 2 time points to determine whether the genes from
   the modules are associated with a certain biological function. Pathway
   analysis was done using the gometh function in missMethyl library in R,
   which conducts hypergeometric tests to determine if the gene sets from
   the modules contain more genes than expected in a particular pathway
   [[67]21]. We used IlluminaHumanMethylation450kanno.ilmn12.hg19 to get
   annotations and tested 319 pathways in the Kyoto Encyclopedia of Genes
   and Genomes [[68]22].

Results

Module assignment

   Six subnetworks, whose module sizes are 210, 177, 123, 112, 91, and 44,
   were identified in the pretreatment network, while 14,049 probes
   remained outside of these modules. Each module was assigned a color:
   pink, blue, brown, yellow, green, and red that are in descending order
   of module size.

Module preservation analysis

   In Fig. [69]1, methylation probes assigned to blue, pink, and yellow
   modules clearly grouped together in the posttreatment network,
   indicating fairly good preservation for these 3 modules. As seen in
   Fig. [70]2, pink, blue, and yellow modules also had a Z[summary] > 10
   and were ranked at first, second, and third, respectively, when
   applying medianRank statistic, revealing their good preservation in the
   posttreatment network. Green and brown modules were tied for fifth
   place, and their Z[summary] statistic values were < 10, so it is likely
   these 2 modules were not preserved between pre- and posttreatment. The
   red module had a Z[summary] of 1.234, which was < 2, and was ranked
   fourth in terms of medianRank statistic.

Fig. 1.

   [71]Fig. 1
   [72]Open in a new tab

   Clustering dendrogram of pre- and posttreatment methylation probe
   network, together with assigned module colors (pink, blue, yellow, red,
   brown, and green)

Fig. 2.

   [73]Fig. 2
   [74]Open in a new tab

   The medianRank and Z[summary] statistics of module preservation of
   pretreatment modules in posttreatment modules (y-axis) versus module
   size (x-axis)

GHD

   Unlike WGCNA, the GHD method requires a (0,1)-adjacency matrix. We
   obtained such a matrix by thresholding the absolute value of the
   correlation matrix obtained from WGCNA. We used a threshold of 0.2,
   which results in a nearly scale-free network. It is thought that
   biological networks are scale-free, which are characterized by a small
   number of highly connected nodes known as hubs that hold the network
   together [[75]16]. Under the null hypothesis, pretreatment and
   posttreatment modules are independent, which means that there is
   topological difference between the two modules. For each pre–post
   module pair, we calculated the GHD statistic, the mean, and the
   variance, and computed the corresponding Z score and the p value.

   We used Bonferroni-corrected p values to account for multiple testing,
   ie, a significance threshold of 0.05/6 = 0.00833. Table [76]1 shows
   that the pink, blue, and yellow module pairs were not independent,
   which implies that the pre- and posttreatment modules did not have
   topological differences. Conversely, the green, brown, and red modules
   show a differential topological structure between pre- and
   posttreatment networks. Of note, the red module has a borderline p
   value of 0.009, which is very close to the Bonferroni corrected alpha
   of 0.00833, suggesting that it is nearly significant in having no
   topological difference between pre- and posttreatment.

Table 1.

   Results of independence test of each module pair between pre- and
   posttreatment networks
   Modules  GHD  Z score      p Value
   Pink    0.025 − 11.774 5.307 × 10^−32
   Blue    0.092 −9.384   6.350 × 10^−21
   Yellow  0.059 −8.106   5.225 × 10^−16
   Red     0.016 2.609    0.009
   Brown   0.041 0.953    0.340
   Green   0.006 0.462    0.643
   [77]Open in a new tab

   Null: pre- and posttreatment modules are dependent (there are
   structural changes); alternative: pre- and posttreatment modules are
   independent (no structural changes)

   We found that rankings of 6 modules from GHD and medianRank statistic
   were similar (ranked first to sixth for module-preservation level:
   pink, blue, yellow, red, brown, and green) by comparing the results
   from module-preservation and GHD methods. Although its Z[summary]
   showed the weakest preservation, the red module had a borderline p
   value of showing no topological difference and was ranked fourth;
   further investigation may be necessary to evaluate module preservation
   of red module.

Visualization

   As mentioned previously, the red module required further investigation
   to determine if there is any topological difference between pre- and
   posttreatment. We examined the structural changes for the red module by
   making network plots. We used the same threshold of 0.2 as in the first
   step of the GHD method, and converted each cell of the adjacency matrix
   from absolute value of the correlation coefficient into a value of 0 or
   1. If the dichotomous correlation coefficient between 2 probes was 1,
   we drew an edge between them.

   In the 2 network plots in Fig. [78]3, there was an apparent difference
   of network structure between pre- and posttreatment of the red module.
   The strong relationships between most of the methylation probes in the
   red module disappeared in the posttreatment network. The network plots
   of the red module were similar to those of the green and brown modules,
   indicating that the red module was also differentially methylated
   between pre- and posttreatment. Further investigation into the probes
   is needed to determine if the changes were the result of a drug effect.
   One such potential investigation would be to perform a
   pathway-enrichment analysis with these modules to determine if
   methylation probes can be linked to genes related to lipid signaling.

Fig. 3.

   Fig. 3
   [79]Open in a new tab

   Pretreatment and posttreatment network plots of red module

Pathway analysis

   For the 3 differentially methylated modules (green, brown, red), we
   performed pathway-enrichment analysis to test for overrepresentation.
   Table [80]2 shows the results of the pathway analysis for the red,
   green, and brown modules. Compared to the brown and green modules, the
   red module is made up of more genes that are involved in lipid
   signaling and metabolism, which points to overrepresentation in the
   gene list in the red module. For example, LPCA1 and AGPAT3 are known to
   be involved in encoding enzymes that play a role in phospholipid
   metabolism [[81]23]. This implies that fenofibrate had a bigger effect
   on triglyceride levels through methylation probes in the red module,
   leading to a striking structural change between the 2 time points as is
   shown in Fig. [82]3.

Table 2.

   Result of pathway analysis for the red, green, and brown modules
                        Pathway                      # DE    p Value
   Red module
    Sphingolipid signaling pathway                   5    1.81 × 10^−32
    Glycerophospholipid metabolism                   4    0.00016
    Phospholipase D signaling pathway                4    0.0008
    Cyclic adenosine monophosphate signaling pathway 4    0.00149
    Choline metabolism in cancer                     3    0.00561
    Ether lipid metabolism                           2    0.01523
    Glycerolipid metabolism                          2    0.03437
   Green module
    Metabolic pathway                                6    0.00471
    Epstein-Barr virus infection                     3    0.1168
    Notch signaling pathway                          2    0.1168
    Pyrimidine metabolism                            2    0.1492
    Basal cell carcinoma                             2    0.1492
   Brown module
    Proteoglycans in cancer                          6    0.00208
    Salmonella infection                             4    0.00208
    Terpenoid backbone biosynthesis                  3    0.00208
    Pathways in cancer                               7    0.00239
    Gonadotropin-releasing hormone signaling pathway 4    0.00239
   [83]Open in a new tab

   Pathway indicates the KEGG pathway being tested; # DE is the number of
   genes that are differentially expressed in our module; and the false
   discovery rate is the p value for overrepresentation

Discussion

   By performing network analyses on triglyceride-associated DNA
   methylation, we found 2 modules (green and brown) that are
   differentially methylated between pre- and posttreatment based on the
   results of both 2 statistics (Z[summary] and medianRank) in the WGCNA
   module-preservation method and the GHD method. As the red module has a
   borderline p value of showing no topological difference and was ranked
   fourth, we created network plots for pre- and posttreatment,
   separately, and observed that it appeared to be differentially
   methylated. Therefore, 3 modules (red, green, and brown) were found to
   have topological difference between pre- and posttreatment.

   Comparison between module-preservation and GHD methods should also be
   considered. Rankings of 6 modules from GHD and medianRank statistic
   were similar (ranked first to sixth for module preservation level:
   pink, blue, yellow, red, brown, and green). However, the Z[summary]
   statistic showed slightly different results, especially for the red
   module, meaning the red module has the weakest preservation in terms of
   Z[summary], but it was at the middle level when applying the GHD and
   medianRank statistic. This difference between results of Z[summary] and
   medianRank statistic may be a result of Z[summary]’s sensitivity to
   module size. Because the red module has a relatively small module size
   of 44, its lack of preservation based on low Z[summary] value may not
   be as credible as other large modules [[84]19]. Thus, further
   investigation using network visualization was necessary to evaluate the
   preservation level of red module.

   Some limitations of WGCNA and 2 network comparison methods surfaced
   during the analysis. Edge direction was not considered in any of the
   statistics, possibly resulting in loss of information regarding
   relationships between methylation probes. It would be more telling if
   we could use a two-step or even more general topological overlap rather
   than one-step topological overlap to include more information about the
   neighboring structure around a pair of methylation probes to define the
   GHD statistic, although it may not be straightforward to prove
   normality of the sampling distribution under the null.

Conclusions

   We explored the relationship between triglyceride-associated DNA
   methylation and fenofibrate using WGCNA to construct pre- and
   posttreatment co-methylation networks and detect topological
   differences between the 2 networks. Using both module-preservation and
   GHD methods, we found 3 modules that were differentially methylated in
   posttreatment. Enrichment analysis of these 3 modules revealed that
   some of these modules are made up of genes that are involved in
   phospholipid metabolism, which may provide insight into the effect of
   fenofibrate on changes in methylation and triglyceride levels. Although
   we cannot conclude that fenofibrate induced the epigenetic alterations,
   our network-based approach seems promising for detecting
   treatment-induced changes in comethylation.

Acknowledgments