Abstract

Background

   Bisulfite sequencing (BS-seq) has become a standard technology to
   profile genome-wide DNA methylation at single-base resolution. It
   allows researchers to conduct genome-wise cytosine methylation analyses
   on issues about genomic imprinting, transcriptional regulation,
   cellular development and differentiation. One single data from a BS-Seq
   experiment is resolved into many features according to the sequence
   contexts, making methylome data analysis and data visualization a
   complex task.

Results

   We developed a streamlined platform, TEA, for analyzing and visualizing
   data from whole-genome BS-Seq (WGBS) experiments conducted in the model
   plant Arabidopsis thaliana. To capture the essence of the genome
   methylation level and to meet the efficiency for running online, we
   introduce a straightforward method for measuring genome methylation in
   each sequence context by gene. The method is scripted in Java to
   process BS-Seq mapping results. Through a simple data uploading
   process, the TEA server deploys a web-based platform for deep analysis
   by linking data to an updated Arabidopsis annotation database and
   toolkits.

Conclusions

   TEA is an intuitive and efficient online platform for analyzing the
   Arabidopsis genomic DNA methylation landscape. It provides several ways
   to help users exploit WGBS data.

   TEA is freely accessible for academic users at:
   [38]http://tea.iis.sinica.edu.tw.

Background

   Genomic DNA methylation is a long observed phenomenon. It was first
   described in prokaryotes and was found as a defense system of the
   genome against foreign DNA invasion (e.g., R-M system, [[39]1]). In
   eukaryotes, it is known to play roles in regulating gene activity
   [[40]2–[41]4]. DNA methylation can be a stable change, such as genomic
   imprinting in diploid organisms to label the parental origin of the
   genome constituents in the zygote or to regulate gene dosage in sexual
   dimorphic chromosomes. Recent studies reveal that genomic DNA
   methylation can be changed dynamically; working together with histone
   codes, DNA methylation is recognized as an epigenetic marker, i.e., to
   alter genome activity without changing the sequence context
   [[42]5–[43]7].

   5-methylcytosine (5 mC) is the best-characterized methylation type in
   genomic DNA. It can be classified into three different sequence
   contexts by the neighboring bases, CG, CHG, and CHH (H = A, C, or T).
   In plants, CG methylation is maintained by DNA methyltransferase 1
   (MET1), the plant homolog of DNMT1 [[44]8–[45]12]. The plant-specific
   DNA methyltransferase chromomethylase 3 (CMT3) activity promotes CHG
   methylation, which is linked to the histone H3 lysine 9 (H3K9)
   dimethylation condition [[46]13–[47]15]. Asymmetric CHH methylation is
   mediated by two DNA methyltransferases: chromomethylase 2 (CMT2), and
   domains rearranged methyltransferase 2 (DRM2) [[48]16]. CMT2 mainly
   functions at the pericentromeric and long transposable element region,
   and DRM2 mediates CHH methylation through the RNA-directed DNA
   methylation (RdDM) pathway [[49]17, [50]18]. The fact that methylation
   of cytosine in different sequence contexts is maintained/modified by
   different enzymes and pathways implies complex regulation of genomic
   methylation status and attracts researchers interested in various
   subjects.

   Bisulfite sequencing (BS-Seq) has become a standard technology to
   profile genome- wide DNA methylation at single-base resolution.
   Briefly, genomic DNA is treated with sodium bisulfite before
   high-throughput next generation sequencing (NGS). Bisulfite
   modification converts non-methylated cytosines to uracils (read as T in
   the sequencing reaction) while methylated cytosines remain unchanged.
   The frequency of C/C + T at each C is calculated from reads mapped to
   the position to give the cytosine methylation measurement at each C in
   the genome. Tools have been developed to deal with whole-genome BS-Seq
   (WGBS) data, such as BiSeq [[51]19], BSmooth [[52]20], DMAP [[53]21],
   methPipe [[54]22], methylKit [[55]23], methylPipe [[56]24], methylSig
   [[57]25], MOABS [[58]26], radMeth [[59]27], and WBSA [[60]28]. Most of
   these analysis tools focus on identifying differentially methylated
   regions (DMRs) and defining differentially methylated genes (DMGs) as
   genes with DMRs in or near the gene body.

   Arabidopsis thaliana is a popular model plant. It is a selfing (or
   self-fertilization) species with a small and completely sequenced
   genome from the standard strain Col-0. The genome of A. thaliana is
   comprised of five nuclear chromosomes plus mitochondrial and
   chloroplast DNA. The present genome build, TAIR10, is 135 Mb.
   Arabidopsis has been used in a wide range of studies which are
   collected in a centralized resource [[61]29]. One special interest in
   studying plant genomes is the relationship between DNA methylation and
   transposable element silencing. In plants, most DNA methylation occurs
   at transposable elements and other repetitive DNA sequences. The DNA
   methylation pathway is described as a powerful tool for flowering
   plants to silence these high amounts of transposon parasites in their
   genomes [[62]30]. Recently, a comprehensive survey on the Arabidopsis
   methylome published a whole series of BS-Seq data from a panel of gene
   silencing mutants at single-nucleotide resolution [[63]31]. Reanalyzing
   the openly accessible methylome data can serve as a good starting point
   for bench scientists to conceive of new studies; however, it is often
   hampered by a ceiling on data-handling skills. Various features
   according to the sequence contexts and locations need to be resolved
   and compared. The workflow for a BS-Seq data analysis process is more
   complicated than a transcriptome workflow.

   In this paper, we present an intuitive data analysis platform for
   Arabidopsis whole-genome bisulfite sequencing (WGBS) data, TEA (The
   Epigenomic platform for Arabidopsis, [64]http://tea.iis.sinica.edu.tw).
   To crop the essence of genome methylation status and to meet the
   efficiency for performing analysis online, we introduce a
   straightforward method for measuring genome methylation level in each
   sequence context by gene. This method is implemented in an in-house
   program (EpiMolas.jar) to process BS-Seq mapping results from CGmap by
   BS-Seeker2 [[65]32] and CX_report by Bismark [[66]33] into a small,
   tab-delimited data file, mtable. The mtable upload process triggers a
   website deployment by linking data files to gene annotation databases
   (TAIR10, KEGG, GO etc.) and to versatile data analysis and display
   modules in TEA. Using this user-friendly interface, the summary of gene
   and promoter methylation levels among experiment conditions are
   retrieved and analyzed easily.

Methods

The measurement of methylation level

   The methylation landscape in TEA is based on the gene, in which DNA
   methylation levels in the promoter and gene body are estimated from the
   WGBS data. The DNA methylation level for individual cytosines is
   estimated as Equation (1.1):
   [MATH: <mi mathvariant="normal">the</mi><mspace
   width="0.25em"></mspace><mi mathvariant="normal">D</mi><mi
   mathvariant="normal">N</mi><mi mathvariant="normal">A</mi><mspace
   width="0.25em"></mspace><mi
   mathvariant="normal">methylation</mi><mspace
   width="0.25em"></mspace><mi mathvariant="normal">level</mi><mspace
   width="0.25em"></mspace><mi mathvariant="normal">f</mi><mi
   mathvariant="normal">o</mi><mi mathvariant="normal">r</mi><mspace
   width="0.25em"></mspace><mi mathvariant="normal">individual</mi><mspace
   width="0.25em"></mspace><mi mathvariant="normal">cytosine</mi><mspace
   width="0.25em"></mspace><mi>i</mi><mo>=</mo><mi
   mathvariant="italic">Ci</mi><mo>=</mo><mspace
   width="0.25em"></mspace><mfrac><mrow><mo>#</mo><mspace
   width="0.25em"></mspace><mi mathvariant="italic">read</mi><mspace
   width="0.25em"></mspace><mi>C</mi></mrow><mrow><mo>#</mo><mspace
   width="0.25em"></mspace><mi
   mathvariant="italic">readC</mi><mo>+</mo><mi
   mathvariant="italic">readT</mi></mrow></mfrac> :MATH]
   1.1

   Calculating the average promoter or gene body methylation levels within
   the promoter or gene body is then the average C [i] within the range by
   Equation (1.2):
   [MATH: <mi mathvariant="normal">Average</mi><mspace
   width="0.25em"></mspace><mi mathvariant="normal">D</mi><mi
   mathvariant="normal">N</mi><mi mathvariant="normal">A</mi><mspace
   width="0.25em"></mspace><mi
   mathvariant="normal">methylation</mi><mspace
   width="0.25em"></mspace><mi mathvariant="normal">level</mi><mspace
   width="0.25em"></mspace><mi mathvariant="normal">in</mi><mspace
   width="0.25em"></mspace><mi mathvariant="normal">promoter</mi><mspace
   width="0.25em"></mspace><mi mathvariant="normal">or</mi><mspace
   width="0.25em"></mspace><mi mathvariant="normal">gene</mi><mspace
   width="0.25em"></mspace><mi mathvariant="normal">body</mi><mspace
   width="0.25em"></mspace><mo>=</mo><mfrac><mstyle
   displaystyle="true"><munder><mo
   stretchy="true">∑</mo><mrow><mi>i</mi><mo>∈</mo><mi>X</mi></mrow></mund
   er><mrow><msub><mi>c</mi><mi>i</mi></msub></mrow></mstyle><mstyle
   displaystyle="true"><munder><mo
   stretchy="true">∑</mo><mrow><mi>i</mi><mo>∈</mo><mi>X</mi></mrow></mund
   er><mrow><mn>1</mn></mrow></mstyle></mfrac> :MATH]
   1.2

   X = promoter or gene body

   We count the number of reads mapped to each C with a minimum threshold
   of four, making an average methylation percentage of at least five C
   sites of each sequence context type, and give six measurements (i.e.,
   pmt-CG, pmt-CHG, pmt-CHH, gene_CG, gene_CHG, gene_CHH) of a gene. Whole
   Genome Bisulfite sequencing is actually considered the gold standard
   approach for profiling genome-wide DNA methylation [[67]34], and has
   become the standard profiling method in major epigenome consortiums,
   such as NIH Roadmap [[68]35], ENCODE [[69]36], Blueprint [[70]37], and
   IHEC [[71]38]. The major sources of error originate from DNA sequencing
   errors and incomplete bisulfite conversion. The former is now well
   controlled, with the advancement of NGS, and is generally <0.01%. The
   latter is usually >99.5% in a stable lab practice. When a region such
   as a promoter or gene body is analyzed, TEA requires at least five
   cytosines to be included in order that any potential bias from
   individual cytosines can be minimized.

   The methylation level is thus a normalized score from 0 (all observed
   sites are unmethylated) to 1 (all observed sites are methylated), or
   “NaN” for genes that do not have sufficient reads/sites to calculate
   the index. The overall calculation process is shown in Fig. [72]1.

Fig. 1.

   Fig. 1
   [73]Open in a new tab

   Flowchart of methylation level extraction in EpiMolas.jar. EpiMolas.jar
   accepts output from several types of BS-seq mapping reports and
   generates six methylation profiles in three contexts, CG/CHG/CHH, and
   two locations, promoter/gene body, based on a gene annotated file

   The method is scripted in Java to process BS-Seq mapping results, e.g.,
   *.CGmap from BS SEEKER2 [[74]32] or CXreport.txt from Bismark [[75]33].
   This program EpiMolas.jar (download from link in
   [76]http://tea.iis.sinica.edu.tw/tea/mtable.html) also requires the
   Arabidopsis gene annotation file (gtf) to calculate the methylation
   measurement file mtable. mtable is a tab-delimited pure text file,
   indexed by gene identifiers and six measurements of methylation levels.
   It is the upload format for the TEA server. Processes for executing
   EpiMolas.jar to generate mtable are described in TEA online help.

System implementation

   TEA (The Epigenomic platform for Arabidopsis) is constructed using LAPP
   system architecture (Ubuntu 14.04, Apache 2.04, PostgreSQL 9.1, and PHP
   5.1) with bootstrap 3 CSS framework ([77]http://getbootstrap.com/) to
   provide an intuitive user experience. The whole system runs in a
   virtual machine (CPUs of 2.27 GHz, four cores, 16 GB RAM and 500 GB
   storage) on the cloud infrastructure of the Institute of Information
   Science, Academia Sinica, Taiwan. The TEA annotation database includes
   genome structure and gene annotation for A. thaliana. It based on
   Ensembl (Ensembl Plants release 32) and function-rich annotations like
   GO terms from Gene Ontology (Aug. 2016) and KEGG Pathways (Apr. 2016).
   Data retrieving, integration and real-time calculation in the analysis
   process are implemented in scripts written in Java (OpenJDK 7), Python
   (version 2.7), and PHP (version 5.1). Graphical visualization used HTML
   Canvas and SVG library to provide a high level of data interactivity.
   Biodalliance genome browser (version 0.13.7) [[78]39] and Circos
   software package (version 0.69) [[79]40] were integrated for browsing
   the genome structure and for accessing a gene list of chromosome
   coordinates, respectively.

Heat map for 2D presentation in color

   The heat map plot is implemented in java with jquery (version 2.1.4)
   and d3.js libraries (version 3.5.17, [80]https://d3js.org/). Tree
   topologies of the dataset grouping and the methylation level of the
   subjects (promoter or gene body for a selected C context) are
   calculated by the SINGLE_LINKAGE in EUCLIDIAN_ DISTANCE method. We
   added tooltip functions to enhance the legibility as well as to access
   source data intuitively. For example, we introduced
   CanvasRenderingContext2D.drawImage with d3 to zoom and pan the heat map
   on the gene name (the Y axis on a heat map) or on the dataset label
   (the X axis on a heat map). Users can select a subset of genes from the
   heat map by mouse clicking on the gene name and saving the list for the
   module “Gene List Analysis.”

Venn diagram for all possible logical relationships

   For showing and counting all possible logical relationships between a
   finite collection of different sets, we implemented a Venn plot
   generator to render a diagram. It is a pure java script developed by
   our team to generate SVG without dependency on jquery or other
   libraries. This plot is also an interactive visualization tool to
   assist in subset selection without sophisticated Boolean operation. The
   implemented Venn plot function can generate comparison results for up
   to four selected sets. All of the possible relationships among
   different gene sets can be saved as a new gene list to module “Gene
   List Analysis” for deciphering deeper biological meanings.

Gene Ontology terms and KEGG pathway enrichment calculation

   Functional enrichment analysis pipelines are built in TEA for detecting
   overrepresented GO terms and KEGG entities in the given gene list
   compared with a random sampling from a whole-genome background.
   Customized P-value is an applicable cutoff for displaying enriched GO
   terms or KEGG pathways. The gene set enrichment score is measured using
   the hypergeometric test in Equation (2):
   [MATH: <mi mathvariant="normal">p</mi><mspace
   width="0.25em"></mspace><mfenced close=")" open="("><mrow><mi>x</mi><mo
   stretchy="true">|</mo><mi>N</mi><mo>,</mo><mi>m</mi><mo>,</mo><mi>n</mi
   ></mrow></mfenced><mo>=</mo><mspace
   width="0.25em"></mspace><mfrac><mrow><mfenced close=")"
   open="("><mtable columnalign="center"><mtr columnalign="center"><mtd
   columnalign="center"><mi>m</mi></mtd></mtr><mtr
   columnalign="center"><mtd
   columnalign="center"><mi>x</mi></mtd></mtr></mtable></mfenced><mfenced
   close=")" open="("><mtable columnalign="center"><mtr
   columnalign="center"><mtd
   columnalign="center"><mi>N</mi><mo>−</mo><mi>m</mi></mtd></mtr><mtr
   columnalign="center"><mtd
   columnalign="center"><mi>n</mi><mo>−</mo><mi>x</mi></mtd></mtr></mtable
   ></mfenced></mrow><mfenced close=")" open="("><mtable
   columnalign="center"><mtr columnalign="center"><mtd
   columnalign="center"><mi>N</mi></mtd></mtr><mtr
   columnalign="center"><mtd
   columnalign="center"><mi>n</mi></mtd></mtr></mtable></mfenced></mfrac>
   :MATH]
   2

   where x is the number of genes in the test set in a particular
   annotation class (e.g., a given GO term), m is the gene number of the
   test set, N is the number of total genes being annotated in the whole
   sample space (i.e., the genome), n is the number of the genome in the
   given annotation class.

Demo datasets

   The demo datasets are from Zhong et al. [[81]41], which includes five
   previously published WGBS datasets, [82]GSM881756, [83]GSM1193638,
   [84]GSM981015, [85]GSM981017, and [86]GSM981040 [[87]31, [88]42]. All
   of these bisulfite sequencing reactions were carried out in 50-mer
   single-end format using an Illumina HiSeq2000. Raw read data files were
   downloaded from SRA/NCBI and were preprocessed with Cutadapt [[89]43]
   to remove TruSeq adaptors. Cleaned reads were mapped to TAIR10 genome
   using BS Seeker2 [[90]32] and Bismark [[91]33]. The output files
   (*.CGmap /BS Seeker2 and CX_report.txt) were then converted to
   methylation data files (*.mtable) using EpiMolas.jar as described
   previously.

Results

Deploying the working platform

   A working platform in TEA starts with a data uploading process. Users
   should make their mtable first (described previously). A step-by-step
   process guides users to define and check the overall data-to-TEA
   database mapping conditions in real time. The data deployment will take
   a few minutes to complete. Measurements on promoter regions of RNA
   genes (rRNAs, pre-tRNA, snRNAs, snoRNAs, miRNAs, and other RNA genes)
   and transposable genes (TE) are not included on the TEA-derived
   website. The whole process of TEA is illustrated in Fig. [92]2.
   Data-uploading details can be viewed through the demo tutorial on “New
   Submission”. Unregistered users can upload the data to TEA and run
   “have a trial” to get the analysis in a dynamic URL for 1-month access.

Fig. 2.

   Fig. 2
   [93]Open in a new tab

   The workflow of TEA from raw reads to data visualization and deep
   analysis. a Align BS-seq read to the reference genome. b Generate
   mtable from CGmap, the output of BS Seeker2, or CX_report, the output
   of Bismark. c The main portal of TEA for data submission. d Data
   visualization with Venn diagram, boxplot, histogram, heatmap, and
   Circos plot

   A summary of data mapping condition and usable measurements for each
   dataset and each methylation index are logged in the data website
   “home”. User should check this summary first. Unexpected values, such
   as a low gene id mapping rate or a low percentage of analyzable
   genes/promoters, are warning signs of problems in the data, for
   example, improper files were used in the data preprocessing steps or a
   low read throughput were used for resolving the methylome.

   The basic data display view in TEA is the gene-centric page containing
   gene information, the methylation profile on CG, CHG, and CHH contexts
   of the gene (gene body) and the promoter region (−2500 to +500), an
   embedded genome browser, and gene function annotation (GO terms and
   KEGG pathways). The neighboring gene function allows users to browse
   the genes located up to a 5-Mb flanking range.

Data modules

   TEA provides several ways to help users exploit their uploaded
   high-throughput data (Fig. [94]3). The data modules include the word
   search function on the annotation table, the gene ID list search
   function, the quantitative analysis functions on the differences
   (subtraction) or a threshold (cutoff) of a methylation measurement, and
   the canonical view in pathways. The gene list from all of these data
   retrieval steps can be kept as a gene list for later uses.
     * Find Genes by Text
       In the module Full Text Search, users can find genes by Gene ID,
       gene symbol, gene description, and KEGG pathway description from
       the integrated annotation table in TEA. Further constraints on the
       gene biotypes and chromosome location can be applied to refine the
       search simultaneously. In the module Import Genelist query, users
       can paste or upload a file of a list in TAIR10/ ensembl gene IDs or
       gene symbols and obtain the matches. In the module KEGG GlobalView,
       users can browse genes on pathway maps. Items in a given map can be
       saved as a gene list for further analysis.
     * Find Genes by Values
       Module DMGs (differentially methylated genes) is a pairwise
       comparison workflow for two data pools (experimental conditions) to
       which a single or multiple datasets can be assigned. Through a
       customized and flexible parameter setting, genes that fulfill the
       criteria are selected and can be further constrained on particular
       gene biotypes and chromosome locations. Module mC Threshold is used
       for selecting genes above or below a cutoff in at least one or
       among all of the selected datasets. These two modules provide
       different ways to access methylation landscapes in six different
       sequence contexts.
     * Gene List Analysis
       Gene list items are derived from text search, id search, or a value
       comparison process and are kept for users’ needs. With the aid of
       the data displaying approach “show Venn diagram”, overlapping of
       elements (i.e., genes) in different gene lists is easily solved.
       Further manipulation on Boolean algebra, such as union,
       intersection, or other combinations, can be carried out on the
       interactive diagram to generate a new gene list.

Fig. 3.

   Fig. 3
   [95]Open in a new tab

   Six data modules in TEA. The full text search module is to search for
   genes of interest in the keyword search from the context of gene ID,
   gene symbol, gene description, and KEGG description. The DMGs module is
   to select differentially methylated genes (DMGs) on the basis of
   customized criteria. The mc Threshold module is to select DMGs by a
   cutoff value. The Import Genelist module is to upload lists of genes of
   interest. The KEGG Global View module is to display which genes are
   involved in each category of KEGG pathways. The gene list analysis
   module is to view the gene set from different analytic approaches
   including Venn diagram, heatmap, Circos plot, GO terms, and KEGG
   pathway enrichment analysis

Data displaying approaches

   We designed several ways to deepen the view of a gene list including
   “Show gene list”, “Plot on the location of genes on chromosome map”
   “Calculate GO term enrichment”, “Calculate KEGG pathway enrichment”,
   and “Draw heatmap with 2D clustering”. As mentioned in previous
   paragraphs, genes can be selected for different reasons via different
   data modules. Using “show gene lists”, the selected genes can be saved
   as a GeneList item for later uses. The approach “Show Venn Diagram” is
   only available in module GeneList Analysis to help on sub-list
   selection.

Demo case: BS-Seq data reanalysis

   To help users become familiar with TEA, we provide a demo dataset from
   Zhong et al. (2015) entitled “DOMAINS REARRANGED METHYLTRANSFERASE3
   controls DNA methylation and regulates RNA polymerase V transcript
   abundance in Arabidopsis” [[96]41]. Briefly, this study focused on
   elucidating the function of DRM3 (domain rearranged methyltransferase
   3) in the RdDM pathway, which contains a catalytically inactive enzyme
   domain, but is required for de novo DNA methylation in vivo. The major
   conclusions from this study are: 1. DRM3 has moderate effects on global
   DNA methylation and 2. DRM3 interacts with Pol V, implying regulatory
   roles of Pol V- involved RNA-directed DNA methylation.

   Five BS-Seq libraries were prepared from 3-week old leaf for detecting
   genome-wide methylation status under four conditions: two biological
   replicates in the control group and one dataset for each of the three
   gene silencing mutants (drm2, drm3, and nrpe1/Pol V catalytic domain).
   We first downloaded the raw read files from SRA/ NCBI, trimmed and
   mapped the reads to the Arabidopsis reference genome and produced the
   mtable file for each dataset. The five mtable files were uploaded to
   TEA and joined to TEA annotation database to build a data analysis
   website ([97]http://tea.iis.sinica.edu.tw/tair10_demo_new/).

   Firstly, we examined the summary of the five datasets. The percentage
   on gene id mapping and the usable measurements were acceptable. The
   methylation measurement data distribution showed a higher level of CG
   methylation in both gene body and promoter regions, agreeing with the
   general idea that CG is the major methylation type in the A. thaliana
   genome. Moreover, mutant drm2 had the most obvious effect (decreased
   methylation level) on the overall CHH methylation status in both gene
   body and promoter regions, whereas the drm3 mutant had the mildest
   effect. We further examined the gene body CHH methylation level using a
   criterion of gene body CHH methylation level difference ≥0.15 from the
   control group (Col-1 + Col-2). Gene lists from the three comparisons
   performed in the “DMGs” module were analyzed in the module “GeneList
   Analysis” using the Venn diagram and heatmap plot approaches with 2D
   clustering (Fig. [98]4). The control-to-drm3 (Ctl_drm3) DMGs number
   were the smallest set. The heatmap-2D clustering result indicated that
   the profile of DMGs selected by gene body CHH level were alike in drm2
   and nrpe1 (polV).

Fig. 4.

   Fig. 4
   [99]Open in a new tab

   A reanalysis of the case study dataset. a Using the criterion of
   differentially methylated (Δ = 0.15) on the methylation index CHH-gene,
   DMGs were selected from drm2, drm3, and nrpe1 (polV) mutants in
   comparison with the control group. b The left panel is a Venn diagram
   to show the overlap of genes among the three DMG sets. A resizable 2D
   heat map plot of the union set indicates the profile pattern
   (similarity) among the five datasets

Discussion

   The report of a BS-Seq mapping tool is often a table to count mappings
   on each C base in the reference genome, numerating the “C” and “C + T”
   events with or without the sequence context notation. Although the
   mapping info has been degenerated a lot from the original alignment
   result, the file size of the mapping report is still hard to handle in
   a web-based analysis tool. The measurement introduced in this study is
   a further reduction on the methylation landscape from the single-base
   level to the gene level. Each measurement is an average of five
   observations (a particular C context) and each observation is based on
   at least four events (mapped reads). A deviation of 0.1 in the
   measurement reflects a change in the methylation state of 10% of the Cs
   in the observation. Although we cannot tell whether the changes are
   concentrated in a few sites or more dispersed among the observed sites,
   using this straightforward approach, we reanalyzed a published dataset
   and observed the same trends in the methylation landscapes caused by
   gene silencing.

   DMRs identified from different studies are not easily inferred as
   equivalent because the methods applied are not guaranteed to be
   compatible to each other. Therefore, a reanalysis of published datasets
   is necessary for reusing data. It is cost-effective to perform data
   mining before conducting a new experiment. In a more practical
   scenario, researchers can try to find a compatible dataset from some
   metadata depository, such as SRA/NCBI, TAIR, or MPSS [[100]44] and
   conduct the analysis together with their experimental data to increase
   the power of the data. An interesting aspect of reanalyzing openly
   accessed data is to dig out some novel findings not mentioned in the
   article, because either the authors were not interested in them, or the
   findings were not covered in the article’s scope.

   In summary, we present TEA, a streamlined WGBS analysis platform with
   versatile analysis and display modules. It provides a straightforward
   methodology to explore the methylation status in different sequence
   contexts. The openly accessible TEA provides a dynamic URL for
   one-month access. Further extension of this work will include more
   model organisms in the platform and more sophisticated and robust
   models to select DMGs. To alleviate the burden of upstream BS-seq data
   alignment, we are going to integrate the preprocessing step into the
   Galaxy platform.

Conclusions

   TEA is a user-friendly platform for WGBS analysis. It is fast and
   efficient to select the DMGs because it shrinks the input methylation
   data from the single-base level to a gene-level methylation profile. It
   provides several ways to help users exploit and discover their uploaded
   high-throughput data. It can also facilitate data sharing among
   cooperators. An unregistered usage is available for creating a working
   platform with a dynamic URL for one-month access. TEA is freely
   available for academic users. We welcome researchers to ask for
   cooperation, keeping the data on a password-controlled access website
   or an open-access website.

Acknowledgements