library(kableExtra)
library(knitr)
library(jpeg)
library(png)

Results

Pilot Study QC (section 3.1)

In order to assess the reliability of the EM-seq kit, a pilot study was performed as outlined in section 2.2. To assess the performance of the EM-seq kit, we performed multiple quality control analyses. We first extracted the average CpG methylation from specific contigs as well as the spike-in controls for use in a quality control assessment analysis. These included chromosome 1, the mitochondrial chromosome, lambda phage and pUC19 plasmid.

Percentage CpG methylation (section 3.1.1)

We expect the chromosome 1 CpG sites to be intermediately methylated. As a comparative species, zebrafish chromosomal CpG methylation is ~80% (Goll and Halpern 2011) and we therefore expect our guppy samples to be relatively close to this. We also expect that the chromosome 1 CpG methylation percentages would not significantly differ between each of the samples as these experimental light conditions are unlikely to alter total methylation percentages significantly. The chromosome 1 CpG methylation percentage (Figure XA) ranged from 65% and 69%. This is relatively low variation between the samples and close to the zebrafish’s ~80% chromosomal CpG methylation.

The pUC19 was included in this pilot study as the positive control. The plasmid was supplied fully methylated at the CpG sites and we therefore expect its CpG methylation to be very high (>95%). We expect the pUC19 plasmid and lambda phage DNA methylation to not differ significantly between the samples as the same quantity and source of lambda phage and plasmid DNA were spiked into each of them. The pUC19 plasmid CpG methylation percentage (Figure XB) for each of our samples was >95% and there was low variation among the samples (~3%).

Lambda phage was included in this pilot study as the negative control and therefore we would expect CpG methylation here to be very low (<5%). The lambda phage DNA CpG methylation percentage (Figure XC) for each of our samples was significantly <5%. The variability among the samples for this metric was also low (~0.6%).

We expect low CpG methylation (<5%) on the mitochondrial chromosome as we know that very little to no methylation exists here. As essentially 0% methylation exists at this site we expect very little variation between samples. The CpG methylation within the Mitochondrial chromosome was very low (<1%) and there was minimal variation among the samples (~0.4%) (Figure XD).

Within the chromosomal, lambda phage and mitochondrial chromosome (Figure A, C and D) the T11-F-N sample had greater (Lambda phage and mtDNA) or lower (chromosomal) percentage of CpG methylation than the other samples. This may suggest a minor problem with the sample quality and/or the profiling of it. However, overall the results found in each of these quality control metrics provide evidence of the high quality methylation data generated with the NEBnext enzymatic methyl-seq kit.

knitr::include_graphics(rep("figures/CpGmethylationPilotStudyCollage.jpg"))
Figure X - CpG methylation percentage present in each sample of the pilot study. (A) Bar chart displaying the CpG methylation present in chromosome 1 of each sample. (B) Bar chart displaying the CpG methylation present in the spiked in pUC19 plasmid DNA in each of the samples. (C) Bar chart displaying the CpG methylation present in the spiked in lambda phage DNA in each of the samples. (D) Bar chart displaying the CpG methylation present in the mitochondrial chromosome in each of the samples.

Figure X - CpG methylation percentage present in each sample of the pilot study. (A) Bar chart displaying the CpG methylation present in chromosome 1 of each sample. (B) Bar chart displaying the CpG methylation present in the spiked in pUC19 plasmid DNA in each of the samples. (C) Bar chart displaying the CpG methylation present in the spiked in lambda phage DNA in each of the samples. (D) Bar chart displaying the CpG methylation present in the mitochondrial chromosome in each of the samples.

Insert Length (section 3.1.2)

Suzuki et al. (2018) state that when performing whole genome bisulfite sequencing using NovaSeq, they expect insert lengths to be on average <210 bp in size. In 150-bp paired-end sequencing, a >300 bp insert length is optimal as to be cost and time efficient. For most of our samples, median insert length was ~300 bp (Figure XX), this is an improvement on the expected 210 bp length. The sample T11-F-N results once again deviated from the other samples and had a median insert size of ~210, lower than optimal but meeting our expectation based on the literature.

knitr::include_graphics(rep("figures/Pilotstudyinsertlengths.png"))
Figure X - Violin plot displaying the insert lengths in base pairs within each sample of the pilot study. The ends of the black line under each violin indicate the interquartile range and the white indentation within the black line indicates the median.

Figure X - Violin plot displaying the insert lengths in base pairs within each sample of the pilot study. The ends of the black line under each violin indicate the interquartile range and the white indentation within the black line indicates the median.

Read Length (section 3.1.3)

Because 150-bp paired-end sequencing was performed, we have the same read length expectations as the pilot study. The mean read length was ~150.5 bp for the forward reads of each sample and ~149.5 for the reverse reads of each sample (Figure X). Little variation existed between the samples. These results are evidence of high quality sequencing ### Read Length (section 3.1.3)

We expect the average read length to be ~150 bp. This read length derives the most bp coverage from each read while not unnecessarily lowering the quality of the reads. This is therefore the most efficient read length. Most of our samples’ average read lengths (figure x) were between 147-149 which is close to our optimum length. However, the sample T11-F-N was ~142 bp, less than optimal.

knitr::include_graphics(rep("figures/Pilotstudyreadlength.png"))
Figure X - Figure displaying the mean read length in base pairs for each sample pair in the pilot study. The samples labeled “pair2” are the forward reads and the samples labeled “pair1” are the reverse reads.

Figure X - Figure displaying the mean read length in base pairs for each sample pair in the pilot study. The samples labeled “pair2” are the forward reads and the samples labeled “pair1” are the reverse reads.

CpH methylation (section 3.1.4)

For our chromosome 1, pUC19 plasmid, lambda phage and mitochondrial chromosome, CpH methylation is expected to be very low (<5%) as CpH sites are methylated at very low percentages. The chromosome 1 CpH methylation (Figure XA) was very low (<1.5%) meeting our minimum acceptable criteria. T11-F-N was higher than the other samples at ~1.5%. The pUC19 plasmid DNA CpH methylation (Figure XB) was low (<2%), however T11-F-O was 6%. This is higher than our expectation, however considering that this sample had only 32 reads, this is likely only due to an extremely low sample size. The same explanation can be used to answer why T09-F-O had 0% CpH methylation, as there were only 8 reads of the pUC19 plasmid from this sample. The lambda phage DNA CpH methylation (Figure XC) was low (<1.5%), meeting our minimum acceptable criteria. T11-F-N was again greater than the other samples at ~1.5%. The mitochondrial chromosome CpH methylation (figure XD) was low (<1.5%) meeting our minimum acceptability criteria but T11-F-N was again greater than the other samples at ~1.5%.

knitr::include_graphics(rep("figures/CpHmethylationpilotstudycollagewithlabels.jpg"))
Figure X - CpH methylation percentage present in each sample. (A) Bar chart displaying the CpH methylation present in chromosome 1 of each sample. (B) Bar chart displaying the CpH methylation present in the spiked in pUC19 plasmid DNA in each of the samples. (C) Bar chart displaying the CpH methylation present in the spiked in lambda phage DNA in each of the samples. (D) Bar chart displaying the CpH methylation present in the mitochondrial chromosome in each of the samples.

Figure X - CpH methylation percentage present in each sample. (A) Bar chart displaying the CpH methylation present in chromosome 1 of each sample. (B) Bar chart displaying the CpH methylation present in the spiked in pUC19 plasmid DNA in each of the samples. (C) Bar chart displaying the CpH methylation present in the spiked in lambda phage DNA in each of the samples. (D) Bar chart displaying the CpH methylation present in the mitochondrial chromosome in each of the samples.

Total/Duplicate Reads (section 3.1.5)

The expectation for the total reads of each sample is to be not greater than 28 million divided by 6 (4,666,667). This is due to the capacity of the MiniSeq sequencing system. We also expect relatively little variation among the samples. Each of our samples had less than 4,666,667 total reads with minimal variation among the samples. The percentage of duplicate reads is expected to be low (<5%) as the pilot study contained a very low number of reads in each sample (XA) when compared to the necessary amount to perform whole genome methylation profiling. As the sequencing depth rises, the expected number of duplicate reads also rises. The percentage of duplicate reads (Figure XB) for each sample was low (<2%) as expected.

knitr::include_graphics(rep("figures/Totalandduplicatereadscollagelabeled.jpg"))
Figure X - (A) Bart chart displaying the total number of reads for each sample. (B) Bar chart displaying the percentage of these reads which are duplicates for each sample.

Figure X - (A) Bart chart displaying the total number of reads for each sample. (B) Bar chart displaying the percentage of these reads which are duplicates for each sample.

Wasted Bases Analysis (section 3.1.6)

An article entitled “Data quality of whole genome bisulfite sequencing on Illumina platforms” (Raine et al. 2018) provides a table of percentage yield for 25 different library preparations/sequencing techniques. The majority of these had only 45-55% yield and the greatest percentage yield was 75%. Using these examples as a benchmark we could consider our 79.9% yield (Figure x) from our wasted bases analysis very high. We also expect consistency between our samples as each of them have a similar number of total reads (and therefore duplicate reads) and the same sequencing and library preparation techniques were used on each sample. Each of our samples had consistent percentage yield with only a ~2-3% deviations between the samples (figure x)

knitr::include_graphics(rep("figures/PilotStudyWastedBasesAnalysisCollage.png"))
Figure X - Percentage of total wasted bases at each data processing step. (A) A pie chart of all reads in the pilot study. (B) A stacked bar chart displaying the breakdown for each sample.

Figure X - Percentage of total wasted bases at each data processing step. (A) A pie chart of all reads in the pilot study. (B) A stacked bar chart displaying the breakdown for each sample.

Main data QC (section 3.2)

We performed a main data QC analysis as outlined in section 2.2 to ensure that our methylation profiling results were reliable. Our expectations for many of the Main data QC results do not deviate from the corresponding pilot study expectations, refer to section 3.1 to see these expectations.

Percentage CpG methylation (section 3.2.1)

The Chr1, pUC19, lambda phage and mitochondrial chromosome CpG methylation pilot study and main study expectations are identical. All of the Chr1 samples had a similar CpG methylation percentage and ranged from ~72% to ~75% (figure XA). The pUC19 plasmid CpG methylation percentage (Figure XB) for each of our samples was >97% and there was minimal variation among the samples (~2%). The lambda phage DNA CpG methylation percentage (Figure XC) for each of our samples was <1%. The variability among the samples for this metric was also low (~0.1%). The CpG methylation within the Mitochondrial chromosome was very low (<0.3%) and there was minimal variation among the samples (<0.1%) (Figure XD).

knitr::include_graphics(rep("figures/MDCpGMethylationCollage.png"))
Figure X - CpG methylation percentage present in each sample of the main data. (A) Bar chart displaying the CpG methylation present in chromosome 1 of each sample. (B) Bar chart displaying the CpG methylation present in the spiked in pUC19 plasmid DNA in each of the samples. (C) Bar chart displaying the CpG methylation present in the spiked in lambda phage DNA in each of the samples. (D) Bar chart displaying the CpG methylation present in the mitochondrial chromosome in each of the samples.

Figure X - CpG methylation percentage present in each sample of the main data. (A) Bar chart displaying the CpG methylation present in chromosome 1 of each sample. (B) Bar chart displaying the CpG methylation present in the spiked in pUC19 plasmid DNA in each of the samples. (C) Bar chart displaying the CpG methylation present in the spiked in lambda phage DNA in each of the samples. (D) Bar chart displaying the CpG methylation present in the mitochondrial chromosome in each of the samples.

Insert Length (section 3.2.2)

Like the pilot study, the main study was also performed using 150-bp paired-end sequencing and therefore the insert size expectations do not differ from those of the pilot study. With the exception of Clear2F-R3 all of the samples had a median insert length >300 bp. Clear2F-R3 had slightly <300 bp (Figure X). These results are an improvement on the suzuki et al. (2018) expectations.

knitr::include_graphics(rep("figures/MDInsertlength.png"))
Figure X - Violin plot displaying the insert lengths in base pairs within each sample of the main study. The ends of the black line under each violin indicate the interquartile range and the white indentation within the black line indicates the median.

Figure X - Violin plot displaying the insert lengths in base pairs within each sample of the main study. The ends of the black line under each violin indicate the interquartile range and the white indentation within the black line indicates the median.

Read Length (section 3.2.3)

Because 150-bp paired-end sequencing was performed, we have the same read length expectations as the pilot study. The mean read length was ~150.5 bp for the forward reads of each sample and ~149.5 for the reverse reads of each sample (Figure X). Little variation existed between the samples. These results are evidence of high quality sequencing.

knitr::include_graphics(rep("figures/MDreadlength.png"))
Figure X - Figure displaying the mean read length in base pairs for each sample pair in the main study. The samples labeled “pair2” are the forward reads and the samples labeled “pair1” are the reverse reads.

Figure X - Figure displaying the mean read length in base pairs for each sample pair in the main study. The samples labeled “pair2” are the forward reads and the samples labeled “pair1” are the reverse reads.

Total/Duplicate Reads (section 3.2.4)

A large number of reads for each sample is required in order to have satisfactory coverage of the genome when performing whole genome methylation sequencing. The guppy genome is 731,622,281 bp in length (https://m.ensembl.org/Poecilia_reticulata/Info/Annotation#assembly) and The US National Institutes of Health (NIH) Roadmap Epigenomics Project recommends 30x coverage of the human genome when considering all replicates in order to properly perform whole genome bisulfite sequencing (Michael J Ziller et al. 2014). Considering our reads are 150 bp in length on average and we need 30x coverage, we expect to have >146,324,456.2 total reads per light condition (731,622,281x30/150). Considering all of our samples have >70,000,000 reads per replicate, they each have >210,000,000 reads per light condition and therefore more than adequate coverage of the guppy genome. As there is only one foundation sample it must have >146,324,456.2 total reads. This sample has 151,547,956 total reads which is also more than adequate coverage of the guppy genome.

knitr::include_graphics(rep("figures/MDReadscollage.png"))
Figure X - (A) Bart chart displaying the total number of reads for each sample. (B) Bar chart displaying the percentage of these reads which are duplicates for each sample.

Figure X - (A) Bart chart displaying the total number of reads for each sample. (B) Bar chart displaying the percentage of these reads which are duplicates for each sample.

Wasted Bases Analysis (section 3.2.5)

The percentage yield expectations do not deviate from the pilot study percentage yield expectations. As we achieved a far greater sequencing depth in the main study, a reduction in percentage yield is expected. The 72.6% yield is less than the 79.9% yield of the pilot study but still very high compared to other studies which involve whole genome bisulfite sequencing. This indicates that the NEBnext kit’s reduction in DNA degradation aids in producing a greater yield than would usually be acquired performing bisulfite sequencing. ClearR1 and LilacR3 had a significantly higher percentage of Unmapped/Duplicate reads than the other samples. The number of duplicate reads for each of the samples was consistent and therefore the problem was due to difficulty in mapping the reads to the reference genome.

knitr::include_graphics(rep("figures/MDwastedbasescollage.png"))
Figure X - Percentage of total wasted bases at each data processing step. (A) A pie chart of all reads in the pilot study. (B) A stacked bar chart displaying the breakdown for each sample.

Figure X - Percentage of total wasted bases at each data processing step. (A) A pie chart of all reads in the pilot study. (B) A stacked bar chart displaying the breakdown for each sample.

Principal Component Analysis and Correlation Heatmaps (section 3.3)

If the experimental conditions resulted in significant changes in the guppy methylome that outweigh the effects of biological variability then we would expect to see separation of the biological conditions and the foundation population from one another on the PCA plot. We would also expect to see greater correlation within the light conditions than between than light conditions. This would be represented by colour distinctions between light conditions on the heatmap and clustering of the light condition replicates outside of the heatmap.

For both the CpG site (figure x.A) and tile methylation (figure x.B) we see clumping of most samples. On the CpG PCA plot clear1 and lilac 3 are distanced from the rest of the samples whereas on the tile PCA plot clear1 and foundation are separated from the rest of the samples.

knitr::include_graphics(rep("figures/PCAcollage.png"))
Figure X - PCA plots created using CpG methylation data. (A) A PCA plot created using CpG methylation data at the CpG site scale. (B) A PCA plot created using CpG methylation data at the tile scale.

Figure X - PCA plots created using CpG methylation data. (A) A PCA plot created using CpG methylation data at the CpG site scale. (B) A PCA plot created using CpG methylation data at the tile scale.

The correlation relationships displayed on each of the heatmaps (figure x.A, x.B, x.C and x.D) were similar with small distinctions. All of the heat maps displayed close relationships between the green replicates. The heatmaps overall displayed little consistency in the relationships between the lilac replicates. They also displayed little consistency between the clear replicates. In all four of the heatmaps clear1 had little correlation to any of the other samples. In the CpG site correlation heatmaps (figure x.A and x.C) the foundation sample was closely related to the others. However, in the tile correlation heatmaps (figure x.B and x.D) The foundation had little correlation to any of the other samples.

knitr::include_graphics(rep("figures/Correlationcollage.png"))
Figure X - Correlation heatmaps generated using CpG methylation data. (A) A Pearson correlation heatmap created using CpG methylation data at the CpG site scale. (B) A Pearson correlation heatmap created using CpG methylation data at the tile scale. (C) (A) A Spearman correlation heatmap created using CpG methylation data at the CpG site scale. (D) A Spearman correlation heatmap created using CpG methylation data at the tile scale.

Figure X - Correlation heatmaps generated using CpG methylation data. (A) A Pearson correlation heatmap created using CpG methylation data at the CpG site scale. (B) A Pearson correlation heatmap created using CpG methylation data at the tile scale. (C) (A) A Spearman correlation heatmap created using CpG methylation data at the CpG site scale. (D) A Spearman correlation heatmap created using CpG methylation data at the tile scale.

Differential Methylation Analysis (section 3.4)

The differential methylation analysis was performed to understand if the experimental conditions resulted in differential methylation in the genes or promoter regions of genes. This differential methylation is likely to alter the expression levels of identified genes. As the DNA was extracted from the brain and eyes of the fish we expect to see significantly differentiated tiles associated with genes that are expressed in the eyes and central nervous system. It is also expected that these genes can be in some way linked to the experimental conditions, meaning that the alteration of expression levels confer a sexual selective advantage under altered light conditions.

dmrseq (section 3.4.1)

preanodmrs <- readRDS("/mnt/data/aaron/projects/guppy-methylation/Thesis/figures/Tables/preanodmrs.rds")
kbl(preanodmrs, caption = "Table 2 - displays the ten most differentiated regions from each light condition comparison. The column LightComp is the light conditions which were compared in the analysis, location is either the chromosome or unmapped contig on which the dmr is located, start and end are the base pairs on the genome which the dmr starts and ends at. The column width is the size of the dmr in bps, L is the number of CpGs within the dmr, beta is the methylation difference between light groups, pval is the pvalue associated with the dmr and qval is the adjusted p-value associated with the dmr. ", ) %>%
  kable_styling(bootstrap_options = c("striped", "hover"))
Table 2 - displays the ten most differentiated regions from each light condition comparison. The column LightComp is the light conditions which were compared in the analysis, location is either the chromosome or unmapped contig on which the dmr is located, start and end are the base pairs on the genome which the dmr starts and ends at. The column width is the size of the dmr in bps, L is the number of CpGs within the dmr, beta is the methylation difference between light groups, pval is the pvalue associated with the dmr and qval is the adjusted p-value associated with the dmr.
LightComp location start end width L beta pval qval
CvG LG21 15968848 15969922 1075 69 -0.5126778 3.17e-05 0.7193775
CvG LG13 18160946 18161700 755 75 0.4804793 4.01e-05 0.7193775
CvG LG8 27368425 27368499 75 22 -0.3598625 4.12e-05 0.7193775
CvG KK215287.1 628700 629071 372 42 -0.3577019 4.22e-05 0.7193775
CvG KK215308.1 92510 92874 365 65 -0.3030945 4.44e-05 0.7193775
CvG LG2 18092802 18093635 834 53 -0.3978501 4.54e-05 0.7193775
CvG LG5 2971274 2971545 272 61 0.3785743 6.44e-05 0.7193775
CvG KK215283.1 934758 935340 583 78 0.3357201 8.45e-05 0.7193775
CvG LG10 32171882 32172301 420 46 0.4390605 9.08e-05 0.7193775
CvG LG13 18052556 18054100 1545 81 -0.3413481 9.40e-05 0.7193775
CvL LG19 10333007 10333692 686 71 0.5767663 1.00e-06 0.1114369
CvL LG21 14273013 14273749 737 81 0.4647484 8.00e-06 0.4457476
CvL KK215293.1 186354 186845 492 61 0.3911403 4.00e-05 0.9103295
CvL LG13 22015858 22016352 495 82 0.4397952 6.60e-05 0.9103295
CvL KK215283.1 1376211 1376895 685 104 0.3613973 6.60e-05 0.9103295
CvL LG17 1251474 1251938 465 61 -0.4012925 7.10e-05 0.9103295
CvL LG8 18982718 18983377 660 60 -0.3746378 7.40e-05 0.9103295
CvL LG11 14181818 14182182 365 58 -0.3570840 8.10e-05 0.9103295
CvL LG2 16355666 16355981 316 76 0.3183476 1.01e-04 0.9103295
CvL LG4 15409098 15410303 1206 85 -0.3823658 1.07e-04 0.9103295
GvL LG11 28461237 28461741 505 108 -0.2773737 7.10e-06 0.6538511
GvL KK215283.1 1376121 1376899 779 156 0.3291269 1.79e-05 0.7492043
GvL LG20 17269425 17270374 950 73 0.4755714 2.50e-05 0.7492043
GvL LG1 22240537 22241922 1386 78 0.3515985 4.64e-05 0.7492043
GvL LG2 28494008 28495445 1438 65 0.4626823 5.48e-05 0.7492043
GvL LG13 18160946 18161831 886 95 -0.3970642 5.60e-05 0.7492043
GvL KK215287.1 944627 944828 202 36 -0.4584372 6.07e-05 0.7492043
GvL LG11 10041760 10042135 376 75 0.3701602 6.55e-05 0.7492043
GvL KK215306.1 101373 102087 715 68 0.4324903 7.62e-05 0.7749346
GvL LG5 2971274 2971545 272 58 -0.3990748 9.29e-05 0.7879743

The differential methylation analysis run using dmrseq yielded no significantly differentially methylated regions (dmrs) (table 2). The most differentiated regions for the clear versus green comparison were associated with a q-value of 0.719. The most differentiated region for the clear versus lilac comparison was associated with a q-value of 0.111 which was the closest to a statistically significant region. The most differentiated region for the green versus lilac comparison was associated with a q-value of 0.654.

CvGanodmrs <- readRDS("/mnt/data/aaron/projects/guppy-methylation/Thesis/figures/Tables/CvGanodmrs.rds")
CvLanodmrs <- readRDS("/mnt/data/aaron/projects/guppy-methylation/Thesis/figures/Tables/CvLanodmrs.rds")
GvLanodmrs <- readRDS("/mnt/data/aaron/projects/guppy-methylation/Thesis/figures/Tables/GvLanodmrs.rds")

kbl(CvGanodmrs, caption = "Table 3 - displays the ten most differentiated regions from the clear versus green comparison which could be annotated to the promoter regions of genes. The column LightComp is the light conditions which were compared in the analysis, location is the chromosome on which the dmr is located, start and end are the base pairs on the genome which the dmr starts and ends at. The column width is the size of the dmr in bps, L is the number of CpGs within the dmr, beta is the methylation difference between light groups, pval is the pvalue associated with the dmr and qvalue is the adjusted p-value associated with the dmr. The column gene_id is the ensembl ID of the gene associated with the dmr and gene_name is the name of the gene associated with the dmr. " ) %>%
  kable_styling(bootstrap_options = c("striped", "hover"))
Table 3 - displays the ten most differentiated regions from the clear versus green comparison which could be annotated to the promoter regions of genes. The column LightComp is the light conditions which were compared in the analysis, location is the chromosome on which the dmr is located, start and end are the base pairs on the genome which the dmr starts and ends at. The column width is the size of the dmr in bps, L is the number of CpGs within the dmr, beta is the methylation difference between light groups, pval is the pvalue associated with the dmr and qvalue is the adjusted p-value associated with the dmr. The column gene_id is the ensembl ID of the gene associated with the dmr and gene_name is the name of the gene associated with the dmr.
LightComp location start end width L beta pvalue qvalue gene_id gene_name
1 CvG LG5 2971274 2971545 272 61 0.3785743 0.0000644 0.7193775 ENSPREG00000022867 asb14a
5 CvG LG11 10641658 10642177 520 56 0.4028755 0.0001320 0.7193775 ENSPREG00000001606 uncharacterised
6 CvG LG4 4685116 4685638 523 91 0.2888801 0.0001394 0.7193775 ENSPREG00000003007 uncharacterised
7 CvG LG1 17896682 17897364 683 78 -0.4086131 0.0001700 0.7193775 ENSPREG00000004582 uncharacterised
8 CvG LG14 9176317 9177886 1570 117 -0.2718633 0.0002366 0.7193775 ENSPREG00000021264 ppm1nb
9 CvG LG5 561500 562079 580 58 -0.3328806 0.0003844 0.7193775 ENSPREG00000021434 GRM7
11 CvG LG11 28397337 28398020 684 79 -0.2993888 0.0004034 0.7193775 ENSPREG00000009968 meaf6
12 CvG LG18 3020876 3021389 514 56 -0.3258269 0.0004151 0.7193775 ENSPREG00000000773 uncharacterised
13 CvG LG18 3020876 3021389 514 56 -0.3258269 0.0004151 0.7193775 ENSPREG00000000773 uncharacterised
14 CvG LG9 29010586 29010947 362 37 -0.3274617 0.0004151 0.7193775 ENSPREG00000010392 tbc1d10aa
kbl(CvLanodmrs, caption = "Table 4 - displays the ten most differentiated regions from the clear versus lilac comparison which could be annotated to the promoter regions of genes. The column LightComp is the light conditions which were compared in the analysis, location is the chromosome on which the dmr is located, start and end are the base pairs on the genome which the dmr starts and ends at. The column width is the size of the dmr in bps, L is the number of CpGs within the dmr, beta is the methylation difference between light groups, pval is the pvalue associated with the dmr and qvalue is the adjusted p-value associated with the dmr. The column gene_id is the ensembl ID of the gene associated with the dmr and gene_name is the name of the gene associated with the dmr. " ) %>%
  kable_styling(bootstrap_options = c("striped", "hover"))
Table 4 - displays the ten most differentiated regions from the clear versus lilac comparison which could be annotated to the promoter regions of genes. The column LightComp is the light conditions which were compared in the analysis, location is the chromosome on which the dmr is located, start and end are the base pairs on the genome which the dmr starts and ends at. The column width is the size of the dmr in bps, L is the number of CpGs within the dmr, beta is the methylation difference between light groups, pval is the pvalue associated with the dmr and qvalue is the adjusted p-value associated with the dmr. The column gene_id is the ensembl ID of the gene associated with the dmr and gene_name is the name of the gene associated with the dmr.
LightComp location start end width L beta pvalue qvalue gene_id gene_name
7 CvL LG11 14181818 14182182 365 58 -0.3570840 0.000081 0.9103295 ENSPREG00000007449 uncharacterised
8 CvL LG11 14181818 14182182 365 58 -0.3570840 0.000081 0.9103295 ENSPREG00000007449 uncharacterised
9 CvL LG16 852906 853242 337 59 -0.3099585 0.000184 0.9103295 ENSPREG00000015724 uncharacterised
10 CvL LG16 852906 853242 337 59 -0.3099585 0.000184 0.9103295 ENSPREG00000015724 uncharacterised
11 CvL LG16 852906 853242 337 59 -0.3099585 0.000184 0.9103295 ENSPREG00000015724 uncharacterised
12 CvL LG4 4685053 4685638 586 95 0.3158647 0.000311 0.9103295 ENSPREG00000003007 uncharacterised
13 CvL LG1 18070269 18071303 1035 89 -0.2763986 0.000340 0.9103295 ENSPREG00000005128 slc25a4
14 CvL LG23 1376882 1377608 727 61 -0.4336691 0.000489 0.9103295 ENSPREG00000022413 ppfia2
15 CvL LG7 18259264 18260110 847 28 0.5276226 0.000491 0.9103295 ENSPREG00000003617 uts2a
18 CvL LG7 17085665 17086479 815 101 0.3494188 0.000607 0.9261309 ENSPREG00000001520 zgc:198371
kbl(GvLanodmrs, caption = "Table 5 - displays the ten most differentiated regions from the green versus lilac comparison which could be annotated to the promoter regions of genes. The column LightComp is the light conditions which were compared in the analysis, location is the chromosome on which the dmr is located, start and end are the base pairs on the genome which the dmr starts and ends at. The column width is the size of the dmr in bps, L is the number of CpGs within the dmr, beta is the methylation difference between light groups, pval is the pvalue associated with the dmr and qvalue is the adjusted p-value associated with the dmr. The column gene_id is the ensembl ID of the gene associated with the dmr and gene_name is the name of the gene associated with the dmr. " ) %>%
  kable_styling(bootstrap_options = c("striped", "hover"))
Table 5 - displays the ten most differentiated regions from the green versus lilac comparison which could be annotated to the promoter regions of genes. The column LightComp is the light conditions which were compared in the analysis, location is the chromosome on which the dmr is located, start and end are the base pairs on the genome which the dmr starts and ends at. The column width is the size of the dmr in bps, L is the number of CpGs within the dmr, beta is the methylation difference between light groups, pval is the pvalue associated with the dmr and qvalue is the adjusted p-value associated with the dmr. The column gene_id is the ensembl ID of the gene associated with the dmr and gene_name is the name of the gene associated with the dmr.
LightComp location start end width L beta pvalue qvalue gene_id gene_name
1 GvL LG11 28461237 28461741 505 108 -0.2773737 0.0000071 0.6538511 ENSPREG00000010182 tfap2e
8 GvL LG5 2971274 2971545 272 58 -0.3990748 0.0000929 0.7879743 ENSPREG00000022867 asb14a
10 GvL LG17 4434405 4435108 704 67 -0.4174846 0.0001357 0.7879743 ENSPREG00000004307 uncharacterised
11 GvL LG17 4434405 4435108 704 67 -0.4174846 0.0001357 0.7879743 ENSPREG00000004307 uncharacterised
12 GvL LG4 19716877 19717452 576 71 0.3020933 0.0002084 0.7879743 ENSPREG00000020177 ppat
13 GvL LG21 16656091 16656658 568 92 -0.2600441 0.0002131 0.7879743 ENSPREG00000015620 entpd5b
14 GvL LG22 8861308 8862494 1187 107 0.3227734 0.0003298 0.9737459 ENSPREG00000015260 uncharacterised
15 GvL LG1 17897041 17897428 388 79 0.2553471 0.0003655 0.9928849 ENSPREG00000004582 uncharacterised
16 GvL LG17 4281596 4283884 2289 138 0.2363804 0.0005251 0.9998670 ENSPREG00000003984 MYOC
17 GvL LG18 4887976 4888902 927 70 -0.3148876 0.0005537 0.9998670 ENSPREG00000004844 uncharacterised

The dmrs that were identified were annotated to the promoter regions of genes (table 3, 4 and 5). The most differentiated regions from the lilac versus clear comparison were not able to be annotated to any promoter regions. The region with the lowest q-value which was annotated to a gene was associated with a q-value of 0.654 and was from the green versus lilac comparison. This means we did not have any regions close to being deemed differentially methylated by dmrseq that were annotated to a gene.

methylkit (section 3.4.2)

CpG site level

methylkitCvGdfR <- readRDS("/mnt/data/aaron/projects/guppy-methylation/Thesis/figures/Tables/methylkitCvGdfR.rds")
kbl(methylkitCvGdfR, digits = 150, ,caption = "Table 6 - displays the fifteen most differentiated CpG sites from the clear versus green comparison. The column LightComp is the light conditions which were compared in the analysis, location is either the chromosome or unmapped contig on which the CpG site is located, basepair is the basepair location on the genome which the CpG site is located. The column pval is the pvalue associated with the CpG site, qvalue is the adjusted p-value associated with the CpG site and meth.diff is the methylation difference between light groups.") %>%
  kable_styling(bootstrap_options = c("striped", "hover"))
Table 6 - displays the fifteen most differentiated CpG sites from the clear versus green comparison. The column LightComp is the light conditions which were compared in the analysis, location is either the chromosome or unmapped contig on which the CpG site is located, basepair is the basepair location on the genome which the CpG site is located. The column pval is the pvalue associated with the CpG site, qvalue is the adjusted p-value associated with the CpG site and meth.diff is the methylation difference between light groups.
LightComp location basepair pvalue qvalue meth.diff
14011068 CvG LG4 1847635 1.510121e-15 2.819262e-08 69.49153
14011071 CvG LG4 1847677 6.272684e-15 5.855272e-08 71.81818
14011018 CvG LG4 1846766 9.610321e-15 5.980539e-08 73.61963
14011020 CvG LG4 1846768 1.055213e-13 4.924975e-07 67.64706
14011073 CvG LG4 1847679 3.361816e-13 1.046036e-06 69.15584
14011075 CvG LG4 1847691 3.317211e-13 1.046036e-06 65.26210
14011017 CvG LG4 1846756 6.059313e-13 1.616029e-06 62.43094
14011016 CvG LG4 1846735 8.313745e-13 1.940128e-06 59.77654
14011077 CvG LG4 1847697 1.443677e-12 2.994685e-06 65.58559
14011014 CvG LG4 1846719 4.063471e-12 7.586140e-06 60.12658
8691557 CvG LG19 4233614 2.875857e-08 4.880882e-02 60.09818
3610880 CvG LG12 13448520 5.349979e-08 8.323281e-02 -61.90476
5207011 CvG LG14 17323571 6.150734e-08 8.832981e-02 50.00000
6685875 CvG LG16 15182433 7.754818e-08 9.651705e-02 -55.77342
12677287 CvG LG23 2943357 7.286739e-08 9.651705e-02 59.44056
methylkitCvLdfR <- readRDS("/mnt/data/aaron/projects/guppy-methylation/Thesis/figures/Tables/methylkitCvLdfR.rds")
kbl(methylkitCvLdfR, digits = 150, ,caption = "Table 7 - displays the fifteen most differentiated CpG sites from the clear versus lilac comparison. The column LightComp is the light conditions which were compared in the analysis, location is either the chromosome or unmapped contig on which the CpG site is located, basepair is the basepair location on the genome which the CpG site is located. The column pval is the pvalue associated with the CpG site, qvalue is the adjusted p-value associated with the CpG site and meth.diff is the methylation difference between light groups." ) %>%
  kable_styling(bootstrap_options = c("striped", "hover"))
Table 7 - displays the fifteen most differentiated CpG sites from the clear versus lilac comparison. The column LightComp is the light conditions which were compared in the analysis, location is either the chromosome or unmapped contig on which the CpG site is located, basepair is the basepair location on the genome which the CpG site is located. The column pval is the pvalue associated with the CpG site, qvalue is the adjusted p-value associated with the CpG site and meth.diff is the methylation difference between light groups.
LightComp location basepair pvalue qvalue meth.diff
7721640 CvL LG17 28853681 1.798548e-10 0.003278925 -56.81818
8503892 CvL LG19 4233614 8.650817e-10 0.007885634 65.96774
11916736 CvL LG22 9603310 4.190343e-08 0.254646567 73.91304
231578 CvL KK215289.1 44015 1.152627e-07 0.420269993 48.78205
11130041 CvL LG21 5357179 1.035762e-07 0.420269993 -50.00000
325243 CvL KK215296.1 299902 1.744510e-07 0.454344073 52.17391
8778728 CvL LG19 15379719 1.573263e-07 0.454344073 -62.66667
6104680 CvL LG15 28566314 2.404266e-07 0.459360945 57.14286
8987858 CvL LG19 22845041 2.966095e-07 0.459360945 -50.23847
10246492 CvL LG2 44850219 2.568048e-07 0.459360945 -59.52381
11230076 CvL LG21 9433707 3.023610e-07 0.459360945 56.72269
14527798 CvL LG5 1744620 2.175832e-07 0.459360945 -67.03297
8175112 CvL LG18 14883427 3.366633e-07 0.472130294 -57.67974
3791209 CvL LG12 24547758 3.809787e-07 0.496114701 64.53804
1368825 CvL LG1 23778631 9.371466e-07 0.504339931 48.00000
methylkitGvLdfR <- readRDS("/mnt/data/aaron/projects/guppy-methylation/Thesis/figures/Tables/methylkitGvLdfR.rds")
kbl(methylkitGvLdfR, digits = 150, ,caption = "Table 8 - displays the fifteen most differentiated CpG sites from the green versus lilac comparison. The column LightComp is the light conditions which were compared in the analysis, location is either the chromosome or unmapped contig on which the CpG site is located, basepair is the basepair location on the genome which the CpG site is located. The column pval is the pvalue associated with the CpG site, qvalue is the adjusted p-value associated with the CpG site and meth.diff is the methylation difference between light groups." ) %>%
  kable_styling(bootstrap_options = c("striped", "hover"))
Table 8 - displays the fifteen most differentiated CpG sites from the green versus lilac comparison. The column LightComp is the light conditions which were compared in the analysis, location is either the chromosome or unmapped contig on which the CpG site is located, basepair is the basepair location on the genome which the CpG site is located. The column pval is the pvalue associated with the CpG site, qvalue is the adjusted p-value associated with the CpG site and meth.diff is the methylation difference between light groups.
LightComp location basepair pvalue qvalue meth.diff
15775362 GvL LG4 1846766 5.358976e-19 1.121417e-11 -69.27181
15775428 GvL LG4 1847697 4.085825e-16 4.274991e-09 -68.91892
15775425 GvL LG4 1847679 1.170641e-15 6.124197e-09 -69.78610
15775427 GvL LG4 1847691 1.160716e-15 6.124197e-09 -68.38710
15775423 GvL LG4 1847677 2.302746e-15 9.637436e-09 -71.92571
15775363 GvL LG4 1846768 7.446573e-12 2.597112e-05 -56.01915
15775361 GvL LG4 1846756 1.483015e-11 4.433360e-05 -52.01427
15775419 GvL LG4 1847635 5.926712e-11 1.550277e-04 -56.33363
15775358 GvL LG4 1846719 1.834129e-10 4.264546e-04 -51.03567
15775360 GvL LG4 1846735 7.537840e-10 1.577366e-03 -48.66543
13288679 GvL LG21 20288461 1.359249e-08 2.585782e-02 45.71429
13652786 GvL LG22 8050874 1.705089e-08 2.973386e-02 50.00000
14854930 GvL LG3 2944320 4.672240e-08 7.520856e-02 51.85185
8247430 GvL LG17 4774357 5.842523e-08 8.732888e-02 59.65157
734635 GvL KK215575.1 1095 6.462642e-08 9.015801e-02 -44.18605

Table 5 displays the eleven significantly differentiated CpG sites and four CpG sites which were not significantly differentiated found by methylkit for the clear versus green comparison. The top ten sites were all in the same region on chromosome 4. Table 7 displays the two significantly differentiated CpG sites and thirteen CpG sites which were not significantly differentiated found by methylkit for the clear versus lilac comparison. Table x displays the twelve significantly differentiated CpG sites and three CpG sites which were not significantly differentiated found by methylkit for the green versus lilac comparison. The significantly differentiated green versus lilac sites were the same sites found in the green versus clear comparison.

CvGpromoterschrdfR <- readRDS("/mnt/data/aaron/projects/guppy-methylation/Thesis/figures/Tables/CvGpromoterschrdfR.rds")
kbl(CvGpromoterschrdfR, digits = 150, ,caption = "Table 9 - displays the ten most differentiated CpG sites from the clear versus green comparison that could be annotated to the promoter regions of genes. The column LightComp is the light conditions which were compared in the analysis, location is the chromosome on which the CpG site is located, basepair is the basepair location on the genome which the CpG site is located. The column pval is the pvalue associated with the CpG site, qvalue is the adjusted p-value associated with the CpG site and meth.diff is the methylation difference between light groups. The column gene_id is the ensembl ID of the gene associated with the CpG site and gene_name is the name of the gene associated with the CpG site. ") %>%
  kable_styling(bootstrap_options = c("striped", "hover"))
Table 9 - displays the ten most differentiated CpG sites from the clear versus green comparison that could be annotated to the promoter regions of genes. The column LightComp is the light conditions which were compared in the analysis, location is the chromosome on which the CpG site is located, basepair is the basepair location on the genome which the CpG site is located. The column pval is the pvalue associated with the CpG site, qvalue is the adjusted p-value associated with the CpG site and meth.diff is the methylation difference between light groups. The column gene_id is the ensembl ID of the gene associated with the CpG site and gene_name is the name of the gene associated with the CpG site.
LightComp location basepair pvalue qvalue meth.diff gene_id gene_name
1195612 CvG LG4 1847635 1.510121e-15 2.819262e-08 69.49153 ENSPREG00000021490 nasp
1195618 CvG LG4 1847677 6.272684e-15 5.855272e-08 71.81818 ENSPREG00000021490 nasp
1195545 CvG LG4 1846766 9.610321e-15 5.980539e-08 73.61963 ENSPREG00000021490 nasp
1195547 CvG LG4 1846768 1.055213e-13 4.924975e-07 67.64706 ENSPREG00000021490 nasp
1195622 CvG LG4 1847679 3.361816e-13 1.046036e-06 69.15584 ENSPREG00000021490 nasp
1195626 CvG LG4 1847691 3.317211e-13 1.046036e-06 65.26210 ENSPREG00000021490 nasp
1195544 CvG LG4 1846756 6.059313e-13 1.616029e-06 62.43094 ENSPREG00000021490 nasp
1195543 CvG LG4 1846735 8.313745e-13 1.940128e-06 59.77654 ENSPREG00000021490 nasp
1195630 CvG LG4 1847697 1.443677e-12 2.994685e-06 65.58559 ENSPREG00000021490 nasp
1195541 CvG LG4 1846719 4.063471e-12 7.586140e-06 60.12658 ENSPREG00000021490 nasp
CvLpromoterschrdfR <- readRDS("/mnt/data/aaron/projects/guppy-methylation/Thesis/figures/Tables/CvLpromoterschrdfR.rds")
kbl(CvLpromoterschrdfR, digits = 150, ,caption = "Table 10 - displays the ten most differentiated CpG sites from the clear versus lilac comparison that could be annotated to the promoter regions of genes. The column LightComp is the light conditions which were compared in the analysis, location is the chromosome on which the CpG site is located, basepair is the basepair location on the genome which the CpG site is located. The column pval is the pvalue associated with the CpG site, qvalue is the adjusted p-value associated with the CpG site and meth.diff is the methylation difference between light groups. The column gene_id is the ensembl ID of the gene associated with the CpG site and gene_name is the name of the gene associated with the CpG site. ") %>%
  kable_styling(bootstrap_options = c("striped", "hover"))
Table 10 - displays the ten most differentiated CpG sites from the clear versus lilac comparison that could be annotated to the promoter regions of genes. The column LightComp is the light conditions which were compared in the analysis, location is the chromosome on which the CpG site is located, basepair is the basepair location on the genome which the CpG site is located. The column pval is the pvalue associated with the CpG site, qvalue is the adjusted p-value associated with the CpG site and meth.diff is the methylation difference between light groups. The column gene_id is the ensembl ID of the gene associated with the CpG site and gene_name is the name of the gene associated with the CpG site.
LightComp location basepair pvalue qvalue meth.diff gene_id gene_name
804720 CvG LG2 449260 1.316960e-06 0.5043399 -62.37037 ENSPREG00000006354 zgc:152904
910884 CvG LG20 421215 6.352613e-07 0.5043399 56.98925 ENSPREG00000005117 uncharacterised
1371208 CvG LG6 24055348 1.261452e-06 0.5043399 53.33333 ENSPREG00000008036 kti12
818069 CvG LG2 5310879 2.756394e-06 0.5646001 56.09244 ENSPREG00000012054 uncharacterised
1568297 CvG LG9 3178904 2.976812e-06 0.5646001 -44.82759 ENSPREG00000016442 adora2aa
543482 CvG LG15 26659727 3.680161e-06 0.5846745 -45.45455 ENSPREG00000009249 ATE1
543483 CvG LG15 26659727 3.680161e-06 0.5846745 -45.45455 ENSPREG00000009249 ATE1
705655 CvG LG18 6177030 4.198931e-06 0.5901297 43.33333 ENSPREG00000006766 uncharacterised
1448486 CvG LG7 23175036 5.187829e-06 0.5901297 45.00000 ENSPREG00000017615 gid8b
1448487 CvG LG7 23175036 5.187829e-06 0.5901297 45.00000 ENSPREG00000017615 gid8b
GvLpromoterschrdfR <- readRDS("/mnt/data/aaron/projects/guppy-methylation/Thesis/figures/Tables/GvLpromoterschrdfR.rds")
kbl(GvLpromoterschrdfR, digits = 150, ,caption = "Table 10 - displays the ten most differentiated CpG sites from the green versus lilac comparison that could be annotated to the promoter regions of genes. The column LightComp is the light conditions which were compared in the analysis, location is the chromosome on which the CpG site is located, basepair is the basepair location on the genome which the CpG site is located. The column pval is the pvalue associated with the CpG site, qvalue is the adjusted p-value associated with the CpG site and meth.diff is the methylation difference between light groups. The column gene_id is the ensembl ID of the gene associated with the CpG site and gene_name is the name of the gene associated with the CpG site. ") %>%
  kable_styling(bootstrap_options = c("striped", "hover"))
Table 10 - displays the ten most differentiated CpG sites from the green versus lilac comparison that could be annotated to the promoter regions of genes. The column LightComp is the light conditions which were compared in the analysis, location is the chromosome on which the CpG site is located, basepair is the basepair location on the genome which the CpG site is located. The column pval is the pvalue associated with the CpG site, qvalue is the adjusted p-value associated with the CpG site and meth.diff is the methylation difference between light groups. The column gene_id is the ensembl ID of the gene associated with the CpG site and gene_name is the name of the gene associated with the CpG site.
LightComp location basepair pvalue qvalue meth.diff gene_id gene_name
1338222 CvG LG4 1846766 5.358976e-19 1.121417e-11 -69.27181 ENSPREG00000021490 nasp
1338320 CvG LG4 1847697 4.085825e-16 4.274991e-09 -68.91892 ENSPREG00000021490 nasp
1338321 CvG LG4 1847697 4.085825e-16 4.274991e-09 -68.91892 ENSPREG00000021490 nasp
1338314 CvG LG4 1847679 1.170641e-15 6.124197e-09 -69.78610 ENSPREG00000021490 nasp
1338315 CvG LG4 1847679 1.170641e-15 6.124197e-09 -69.78610 ENSPREG00000021490 nasp
1338318 CvG LG4 1847691 1.160716e-15 6.124197e-09 -68.38710 ENSPREG00000021490 nasp
1338319 CvG LG4 1847691 1.160716e-15 6.124197e-09 -68.38710 ENSPREG00000021490 nasp
1338310 CvG LG4 1847677 2.302746e-15 9.637436e-09 -71.92571 ENSPREG00000021490 nasp
1338311 CvG LG4 1847677 2.302746e-15 9.637436e-09 -71.92571 ENSPREG00000021490 nasp
1338223 CvG LG4 1846768 7.446573e-12 2.597112e-05 -56.01915 ENSPREG00000021490 nasp

Both the clear versus green (table 9) and green versus lilac (table 11) comparisons contained the same ten differentiated CpG sites, which were all annotated to the same nasp gene. There were no significantly differentiated CpG sites from the clear versus lilac (table 10) comparison which could be annotated to promoter regions.

Tile level

LightComp <- c('CvG','CvG','CvG','CvL','CvL','CvL','GvL','GvL','GvL','CvG','CvG','CvG','CvL','CvL','CvL','GvL','GvL','GvL','CvG','CvG','CvG','CvL','CvL','CvL','GvL','GvL','GvL')
LightComp
##  [1] "CvG" "CvG" "CvG" "CvL" "CvL" "CvL" "GvL" "GvL" "GvL" "CvG" "CvG" "CvG"
## [13] "CvL" "CvL" "CvL" "GvL" "GvL" "GvL" "CvG" "CvG" "CvG" "CvL" "CvL" "CvL"
## [25] "GvL" "GvL" "GvL"
MethDirection <- c('up','down','total','up','down','total','up','down','total','up','down','total','up','down','total','up','down','total','up','down','total','up','down','total','up','down','total')
Annotation <- c('N/A','N/A','N/A','N/A','N/A','N/A','N/A','N/A','N/A','gene','gene','gene','gene','gene','gene','gene','gene','gene','promoter','promoter','promoter','promoter','promoter','promoter','promoter','promoter','promoter')
Tiles <- c('16999','39913','56912','19497','30189','49686','31399','20391','51790','9745','23903','33648','11337','17795','29132','18666','11776','30442','3037','5836','8873','3285','4499','7784','4718','3417','8135')
SigMethTiles <- data.frame(LightComp, MethDirection, Annotation, Tiles)
kbl(SigMethTiles, caption = "Table 12 - Table displaying the number of significantly methylated tiles for each light group comparison, direction and annotation combination") %>%
  kable_styling(bootstrap_options = c("striped", "hover"))
Table 12 - Table displaying the number of significantly methylated tiles for each light group comparison, direction and annotation combination
LightComp MethDirection Annotation Tiles
CvG up N/A 16999
CvG down N/A 39913
CvG total N/A 56912
CvL up N/A 19497
CvL down N/A 30189
CvL total N/A 49686
GvL up N/A 31399
GvL down N/A 20391
GvL total N/A 51790
CvG up gene 9745
CvG down gene 23903
CvG total gene 33648
CvL up gene 11337
CvL down gene 17795
CvL total gene 29132
GvL up gene 18666
GvL down gene 11776
GvL total gene 30442
CvG up promoter 3037
CvG down promoter 5836
CvG total promoter 8873
CvL up promoter 3285
CvL down promoter 4499
CvL total promoter 7784
GvL up promoter 4718
GvL down promoter 3417
GvL total promoter 8135

A larger number of differentially methylated tiles were identified (table 12) when compared to the dmrseq identified regions and the methylkit identified CpG sites.

preannotiles <- readRDS("/mnt/data/aaron/projects/guppy-methylation/Thesis/figures/Tables/preannotiles.rds")
kbl(preannotiles, digits = 150, ,caption = "Table 13 - displays the ten most differentiated tiles for each light condition comparison. The column LightComp is the light conditions which were compared in the analysis, location is either the chromosome or unmapped contig on which the tile is located, star and end are the locations on the genome which the tile starts and ends. The column pval is the pvalue associated with the tile, qvalue is the adjusted p-value associated with the tile and meth.diff is the methylation difference between light groups.") %>%
  kable_styling(bootstrap_options = c("striped", "hover"))
Table 13 - displays the ten most differentiated tiles for each light condition comparison. The column LightComp is the light conditions which were compared in the analysis, location is either the chromosome or unmapped contig on which the tile is located, star and end are the locations on the genome which the tile starts and ends. The column pval is the pvalue associated with the tile, qvalue is the adjusted p-value associated with the tile and meth.diff is the methylation difference between light groups.
LightComp location start end pvalue qvalue meth.diff
CvG LG4 1847001 1848000 9.714622e-131 4.885614e-125 19.239478
CvG LG4 1846001 1847000 1.196917e-89 3.009727e-84 19.897694
CvG LG19 1188001 1189000 1.774652e-81 2.974987e-76 -22.272263
CvG KK215608.1 11001 12000 1.425147e-63 1.791814e-58 10.634437
CvG LG13 18161001 18162000 3.721408e-59 3.743092e-54 20.536203
CvG LG21 14273001 14274000 6.434827e-53 5.393601e-48 17.569509
CvG LG1 9073001 9074000 3.550053e-47 2.550528e-42 14.606176
CvG LG21 1055001 1056000 2.372031e-46 1.491158e-41 14.585360
CvG LG1 17897001 17898000 1.654721e-42 9.246461e-38 -6.231157
CvG KK215291.1 217001 218000 1.040687e-37 5.233756e-33 16.920822
CvL LG16 509001 510000 3.213330e-96 1.635845e-90 -25.985845
CvL LG21 1055001 1056000 8.837088e-72 2.249397e-66 19.699209
CvL LG19 1188001 1189000 1.236241e-66 2.097822e-61 -20.545544
CvL LG19 10333001 10334000 5.976181e-65 7.605902e-60 22.885670
CvL LG9 23623001 23624000 3.595176e-58 3.660472e-53 20.839098
CvL LG21 14273001 14274000 1.515368e-52 1.285742e-47 18.421604
CvL LG2 20995001 20996000 4.065070e-48 2.956357e-43 -20.266832
CvL KK215350.1 22001 23000 1.997522e-45 1.271126e-40 24.997453
CvL KK215283.1 1376001 1377000 2.043225e-44 1.155741e-39 13.087834
CvL LG7 18259001 18260000 8.559718e-41 4.357590e-36 23.291524
GvL LG4 1847001 1848000 7.503577e-141 3.921253e-135 -19.016257
GvL LG4 1846001 1847000 8.017760e-104 2.094979e-98 -19.309795
GvL KK215283.1 1376001 1377000 1.012781e-55 1.764212e-50 13.265035
GvL LG16 509001 510000 1.433189e-54 1.872407e-49 -17.403727
GvL LG13 18161001 18162000 1.900156e-51 1.985985e-46 -17.695603
GvL LG21 538001 539000 5.180641e-40 4.512205e-35 -12.608880
GvL LG18 21720001 21721000 2.190222e-39 1.635109e-34 14.869760
GvL LG14 4148001 4149000 2.644311e-35 1.727345e-30 -17.151052
GvL LG17 3558001 3559000 8.543377e-35 4.960707e-30 19.287516
GvL LG1 17897001 17898000 2.744822e-33 1.434401e-28 4.758557
geneannotiles <- readRDS("/mnt/data/aaron/projects/guppy-methylation/Thesis/figures/Tables/geneannotiles.rds")
kbl(geneannotiles, digits = 150, ,caption = "Table 14 - Displays the ten most differentiated tiles for each light condition comparison that could be annotated to genes. The column LightComp is the light conditions which were compared in the analysis, location is the chromosome on which the tile is located, star and end are the locations on the genome which the tile starts and ends. The column pval is the pvalue associated with the tile, qvalue is the adjusted p-value associated with the tile and meth.diff is the methylation difference between light groups. The column gene_id is the ensembl ID of the gene associated with the tile and gene_name is the name of the gene associated with the tile.") %>%
  kable_styling(bootstrap_options = c("striped", "hover"))
Table 14 - Displays the ten most differentiated tiles for each light condition comparison that could be annotated to genes. The column LightComp is the light conditions which were compared in the analysis, location is the chromosome on which the tile is located, star and end are the locations on the genome which the tile starts and ends. The column pval is the pvalue associated with the tile, qvalue is the adjusted p-value associated with the tile and meth.diff is the methylation difference between light groups. The column gene_id is the ensembl ID of the gene associated with the tile and gene_name is the name of the gene associated with the tile.
LightComp location start end pvalue qvalue meth.diff gene_id gene_name
CvG LG19 1188001 1189000 1.774652e-81 2.974987e-76 -22.272263 ENSPREG00000017528 SHANK1
CvG LG13 18161001 18162000 3.721408e-59 3.743092e-54 20.536203 ENSPREG00000001265 NA
CvG LG1 9073001 9074000 3.550053e-47 2.550528e-42 14.606176 ENSPREG00000000312 slx4
CvG LG21 1055001 1056000 2.372031e-46 1.491158e-41 14.585360 ENSPREG00000013853 hivep2a
CvG LG1 17897001 17898000 1.654721e-42 9.246461e-38 -6.231157 ENSPREG00000004582 NA
CvG LG18 21720001 21721000 1.092368e-36 4.994240e-32 -15.394290 ENSPREG00000014753 aimp1b
CvG LG18 21720001 21721000 1.092368e-36 4.994240e-32 -15.394290 ENSPREG00000014824 NA
CvG LG19 10333001 10334000 1.033326e-35 4.330611e-31 16.787633 ENSPREG00000002350 QRFPR
CvG LG22 14521001 14522000 7.000381e-34 2.514704e-29 -31.913947 ENSPREG00000003390 NA
CvG LG9 23623001 23624000 8.468147e-34 2.839163e-29 14.478395 ENSPREG00000003004 homer1b
CvL LG16 509001 510000 3.213330e-96 1.635845e-90 -25.985845 ENSPREG00000014798 NA
CvL LG21 1055001 1056000 8.837088e-72 2.249397e-66 19.699209 ENSPREG00000013853 hivep2a
CvL LG19 1188001 1189000 1.236241e-66 2.097822e-61 -20.545544 ENSPREG00000017528 SHANK1
CvL LG19 10333001 10334000 5.976181e-65 7.605902e-60 22.885670 ENSPREG00000002350 QRFPR
CvL LG9 23623001 23624000 3.595176e-58 3.660472e-53 20.839098 ENSPREG00000003004 homer1b
CvL LG2 20995001 20996000 4.065070e-48 2.956357e-43 -20.266832 ENSPREG00000014863 ephx2
CvL LG7 18259001 18260000 8.559718e-41 4.357590e-36 23.291524 ENSPREG00000003617 uts2a
CvL LG11 28461001 28462000 1.078697e-40 4.992221e-36 -13.623576 ENSPREG00000010167 ncdn
CvL LG21 538001 539000 5.735034e-40 2.432998e-35 -13.783047 ENSPREG00000012615 crnkl1
CvL LG21 538001 539000 5.735034e-40 2.432998e-35 -13.783047 ENSPREG00000012722 NA
GvL LG16 509001 510000 1.433189e-54 1.872407e-49 -17.403727 ENSPREG00000014798 NA
GvL LG13 18161001 18162000 1.900156e-51 1.985985e-46 -17.695603 ENSPREG00000001265 NA
GvL LG21 538001 539000 5.180641e-40 4.512205e-35 -12.608880 ENSPREG00000012615 crnkl1
GvL LG21 538001 539000 5.180641e-40 4.512205e-35 -12.608880 ENSPREG00000012722 NA
GvL LG18 21720001 21721000 2.190222e-39 1.635109e-34 14.869760 ENSPREG00000014753 aimp1b
GvL LG18 21720001 21721000 2.190222e-39 1.635109e-34 14.869760 ENSPREG00000014824 NA
GvL LG14 4148001 4149000 2.644311e-35 1.727345e-30 -17.151052 ENSPREG00000009387 NA
GvL LG14 4148001 4149000 2.644311e-35 1.727345e-30 -17.151052 ENSPREG00000009393 zgc:112437
GvL LG17 3558001 3559000 8.543377e-35 4.960707e-30 19.287516 ENSPREG00000002699 NA
GvL LG1 17897001 17898000 2.744822e-33 1.434401e-28 4.758557 ENSPREG00000004582 NA
promoterannotiles <- readRDS("/mnt/data/aaron/projects/guppy-methylation/Thesis/figures/Tables/promoterannotiles.rds")
kbl(promoterannotiles, digits = 150, ,caption = "Table 16 - Displays the ten most differentiated tiles for each light condition comparison that could be annotated to the promoter regions of genes. The column LightComp is the light conditions which were compared in the analysis, location is the chromosome on which the tile is located, star and end are the locations on the genome which the tile starts and ends. The column pval is the pvalue associated with the tile, qvalue is the adjusted p-value associated with the tile and meth.diff is the methylation difference between light groups. The column gene_id is the ensembl ID of the gene associated with the tile and gene_name is the name of the gene associated with the tile.") %>%
  kable_styling(bootstrap_options = c("striped", "hover"))
Table 16 - Displays the ten most differentiated tiles for each light condition comparison that could be annotated to the promoter regions of genes. The column LightComp is the light conditions which were compared in the analysis, location is the chromosome on which the tile is located, star and end are the locations on the genome which the tile starts and ends. The column pval is the pvalue associated with the tile, qvalue is the adjusted p-value associated with the tile and meth.diff is the methylation difference between light groups. The column gene_id is the ensembl ID of the gene associated with the tile and gene_name is the name of the gene associated with the tile.
LightComp location start end pvalue qvalue meth.diff gene_id gene_name
CvG LG4 1847001 1848000 9.714622e-131 4.885614e-125 19.239478 ENSPREG00000021468 gpc5b
CvG LG4 1847001 1848000 9.714622e-131 4.885614e-125 19.239478 ENSPREG00000021490 nasp
CvG LG4 1846001 1847000 1.196917e-89 3.009727e-84 19.897694 ENSPREG00000021468 gpc5b
CvG LG4 1846001 1847000 1.196917e-89 3.009727e-84 19.897694 ENSPREG00000021490 nasp
CvG LG1 9073001 9074000 3.550053e-47 2.550528e-42 14.606176 ENSPREG00000000312 slx4
CvG LG1 17897001 17898000 1.654721e-42 9.246461e-38 -6.231157 ENSPREG00000004582 uncharacterised
CvG LG1 17897001 17898000 1.654721e-42 9.246461e-38 -6.231157 ENSPREG00000004594 lgi2b
CvG LG22 14521001 14522000 7.000381e-34 2.514704e-29 -31.913947 ENSPREG00000003390 uncharacterised
CvG LG5 2971001 2972000 5.312984e-29 1.113321e-24 13.827373 ENSPREG00000022867 asb14a
CvG LG21 16656001 16657000 5.936152e-29 1.148219e-24 11.441065 ENSPREG00000015620 entpd5b
CvL LG2 20995001 20996000 4.065070e-48 2.956357e-43 -20.266832 ENSPREG00000014846 zgc:194275
CvL LG7 18259001 18260000 8.559718e-41 4.357590e-36 23.291524 ENSPREG00000003617 uts2a
CvL LG11 28461001 28462000 1.078697e-40 4.992221e-36 -13.623576 ENSPREG00000010182 tfap2e
CvL LG21 538001 539000 5.735034e-40 2.432998e-35 -13.783047 ENSPREG00000012722 uncharacterised
CvL LG14 4148001 4149000 8.688214e-40 3.402312e-35 -21.043219 ENSPREG00000009393 zgc:112437
CvL LG11 27368001 27369000 1.892756e-31 4.588411e-27 13.365924 ENSPREG00000008406 fabp10b
CvL LG15 10977001 10978000 1.506553e-29 3.195656e-25 -11.297619 ENSPREG00000009538 grid1a
CvL LG21 23640001 23641000 3.073884e-29 6.018677e-25 -19.150044 ENSPREG00000001261 gabrr2b
CvL LG21 525001 526000 2.428540e-28 4.121079e-24 -14.582528 ENSPREG00000012371 uncharacterised
CvL LG21 525001 526000 2.428540e-28 4.121079e-24 -14.582528 ENSPREG00000012615 crnkl1
GvL LG4 1847001 1848000 7.503577e-141 3.921253e-135 -19.016257 ENSPREG00000021468 gpc5b
GvL LG4 1847001 1848000 7.503577e-141 3.921253e-135 -19.016257 ENSPREG00000021490 nasp
GvL LG4 1846001 1847000 8.017760e-104 2.094979e-98 -19.309795 ENSPREG00000021468 gpc5b
GvL LG4 1846001 1847000 8.017760e-104 2.094979e-98 -19.309795 ENSPREG00000021490 nasp
GvL LG21 538001 539000 5.180641e-40 4.512205e-35 -12.608880 ENSPREG00000012722 uncharacterised
GvL LG14 4148001 4149000 2.644311e-35 1.727345e-30 -17.151052 ENSPREG00000009393 zgc:112437
GvL LG17 3558001 3559000 8.543377e-35 4.960707e-30 19.287516 ENSPREG00000002699 uncharacterised
GvL LG1 17897001 17898000 2.744822e-33 1.434401e-28 4.758557 ENSPREG00000004582 uncharacterised
GvL LG1 17897001 17898000 2.744822e-33 1.434401e-28 4.758557 ENSPREG00000004594 lgi2b
GvL LG5 2971001 2972000 5.782685e-32 2.518285e-27 -13.468644 ENSPREG00000022867 asb14a

A large number of tiles associated with extremely low q-values (table 13) were identified. A large proportion of these were annotated both within genes (table 14) and within the promoter regions (table 15) of genes. The function of these genes associated with these extremely low q-values were investigated.

The extremely differentiated tiles are mostly evenly spread across the entirety of the genomes (figure 16). When observing the clear versus green RCircos plot (figure 16.A), it appears as though a large number of significantly differentiated tiles are located across chromosome 16. However, only 5 differentiated tiles out of the 75 chosen to be displayed in this plot are present on chromosome 16. This problem was therefore likely caused by a bug in the RCircos package.

knitr::include_graphics(rep("figures/RCircoscollage.png"))
Figure x contains three RCircos plots which show the location and level of differentiation of the 75 most differentially methylated tiles. (A) RCircos plot generated from clear versus green comparison. (B) RCircos plot generated from clear versus lilac comparison. (C) RCircos plot generated from green versus lilac comparison.

Figure x contains three RCircos plots which show the location and level of differentiation of the 75 most differentially methylated tiles. (A) RCircos plot generated from clear versus green comparison. (B) RCircos plot generated from clear versus lilac comparison. (C) RCircos plot generated from green versus lilac comparison.

There exists a number of genes which contain many significantly differentiated tiles in their promoter regions (figure 15). Due to the presence of this differential methylation, these genes are likely to have their expression levels altered. Therefore, to explore if these genes may confer a sexually selected advantage their functions were investigated further.

knitr::include_graphics(rep("figures/Promotergenetilescollage.png"))
Figure 15 contains three bar charts which each display the twenty genes with the most differentially methylated tiles annotated to their promoter region which are in the negative direction and the twenty genes with the most differentially methylated tiles annotated to their promoter region which are in the positive direction from each light condition comparison. (A) Bar chart created using clear versus green comparison. (B)  Bar chart created using clear versus lilac comparison. (C) Bar chart created using green versus lilac comparison.

Figure 15 contains three bar charts which each display the twenty genes with the most differentially methylated tiles annotated to their promoter region which are in the negative direction and the twenty genes with the most differentially methylated tiles annotated to their promoter region which are in the positive direction from each light condition comparison. (A) Bar chart created using clear versus green comparison. (B) Bar chart created using clear versus lilac comparison. (C) Bar chart created using green versus lilac comparison.

Enrichment analysis (section 3.5)

If the changes in light conditions resulted in methylation derived alterations to gene expression we expect to find biological pathways/functions significantly enriched that can be linked to the experimental conditions. Biological pathways that have been significantly or reasonably close to significantly enriched that are linkable to the experimental conditions are notable results. Notable results of this kind will be highlighted in this results section and explored further in the discussion.

mitch (section 3.5.1)

mitchresults <- readRDS("/mnt/data/aaron/projects/guppy-methylation/Thesis/figures/Tables/mitchresults.rds")
kbl(mitchresults, digits = 5, ,caption = "Table 16 - displays the ten most enriched biological pathways/functions for each light group compariosn. The column LightComp is the light conditions which were compared in the analysis, pathway/function is the ontological pathway code associated with the the pathway as well as the name of the pathway. The column pANOVA is the pvalue associated with the pathway, p.adjustANOVA is the adjusted p-value associated with the pathway. ") %>%
  kable_styling(bootstrap_options = c("striped", "hover"))
Table 16 - displays the ten most enriched biological pathways/functions for each light group compariosn. The column LightComp is the light conditions which were compared in the analysis, pathway/function is the ontological pathway code associated with the the pathway as well as the name of the pathway. The column pANOVA is the pvalue associated with the pathway, p.adjustANOVA is the adjusted p-value associated with the pathway.
LightComp pathway/function pANOVA p.adjustANOVA
CvG GO:0051087 chaperone binding 0.00017 0.17158
CvG GO:0005833 hemoglobin complex 0.00038 0.18963
CvG GO:0006289 nucleotide-excision repair 0.00071 0.20944
CvG GO:0030509 BMP signaling pathway 0.00085 0.20944
CvG GO:0004879 nuclear receptor activity 0.00168 0.33188
CvG GO:0001947 heart looping 0.00402 0.57700
CvG GO:0005525 GTP binding 0.00414 0.57700
CvG GO:0043231 intracellular membrane-bounded organelle 0.00497 0.57700
CvG GO:0043065 positive regulation of apoptotic process 0.00525 0.57700
CvG GO:0008033 tRNA processing 0.00615 0.59074
CvL GO:0033179 proton-transporting V-type ATPase, V0 domain 0.00090 0.88100
CvL GO:0016042 lipid catabolic process 0.00234 0.88100
CvL GO:0007218 neuropeptide signaling pathway 0.00268 0.88100
CvL GO:0005184 neuropeptide hormone activity 0.00537 0.91821
CvL GO:0005901 caveola 0.00583 0.91821
CvL GO:0030516 regulation of axon extension 0.00650 0.91821
CvL GO:0006417 regulation of translation 0.00651 0.91821
CvL GO:0060027 convergent extension involved in gastrulation 0.00904 0.95684
CvL GO:0000902 cell morphogenesis 0.00954 0.95684
CvL GO:0043005 neuron projection 0.01354 0.95684
GvL GO:0050661 NADP binding 0.00072 0.54754
GvL GO:0021522 spinal cord motor neuron differentiation 0.00143 0.54754
GvL GO:0002224 toll-like receptor signaling pathway 0.00245 0.54754
GvL GO:0005096 GTPase activator activity 0.00249 0.54754
GvL GO:0007420 brain development 0.00277 0.54754
GvL GO:0032580 Golgi cisterna membrane 0.00519 0.79384
GvL GO:0042742 defense response to bacterium 0.00652 0.79384
GvL GO:0016712 oxidoreductase activity, acting on paired donors, with incorporation or reduction of molecular oxygen, reduced flavin or flavoprotein as one donor, and incorporation of one atom of oxygen 0.00655 0.79384
GvL GO:0006281 DNA repair 0.00803 0.79384
GvL GO:0007264 small GTPase mediated signal transduction 0.00830 0.79384

The mitch class sorting enrichment analysis (table 16) yielded no significantly enriched biological pathways for any of the three light group comparisons. The closest to a significant result was the biological pathway chaperone binding which was associated with an adjusted p-value of 0.172 when comparing clear versus green light conditions. GTP binding was associated with a q-value of 0.577 and was seventh most enriched pathway (when comparing p-values associated with pathways) out of a list of 1441 pathways in the clear versus green analysis. The clear versus lilac analysis yielded the least enriched results. Brain development was associated with a q-value of 0.548 and was the fifth most enriched pathway in the green versus lilac analysis. GTPase activator activity was also associated with a q-value of 0.548 and was the fourth most enriched pathway in the green versus lilac analysis.

Cluster profiler (section 3.5.2)

The most interesting results were extracted from the cluster profiler over representation analysis and were used to create two tables. One displayed numerous significantly enriched biological pathways (table 16) and the other displayed pathways which were not significantly enriched but still interesting results (table 17).

Significantly enriched

LightComp <- c('CvG','CvL','GvL','GvL','GvL','GvL','GvL','GvL','GvL','GvL','GvL')
LightComp
##  [1] "CvG" "CvL" "GvL" "GvL" "GvL" "GvL" "GvL" "GvL" "GvL" "GvL" "GvL"
MethDirection <- c('all','up','up','all','all','all','all','all','all','all','all')
Qvalselection <- c('<0.001','<0.5','<0.5','<0.5','<0.5','<0.001','<0.001','<0.001','<0.001','<0.001','<0.001')
Pathway <- c('Regulation of GTPase','nucleus','multicellular organism development','multicellular organism development','DNA-binding transcription factor activity, RNA polymerase II-specific','fatty acid transmembrane transporter activity','long-chain fatty acid metabolic process','long-chain fatty acid-CoA ligase activity','fatty acid transport',' very long-chain fatty acid-CoA ligase activity','bile acid biosynthetic process')
Qvalue <- c('0.0491','0.0446','0.0039','0.0009','0.0085','0.0043','0.0043','0.0043','0.0043','0.0043','0.0404')
SigMethTiles <- data.frame(LightComp, MethDirection, Qvalselection, Pathway, Qvalue)
kbl(SigMethTiles, caption = "Table 17 - Table displaying the significantly enriched biological pathways/functions and the q-value associated with these pathways/functions for each light group comparison, methylation direction and q-value selection combination") %>%
  kable_styling(bootstrap_options = c("striped", "hover"))
Table 17 - Table displaying the significantly enriched biological pathways/functions and the q-value associated with these pathways/functions for each light group comparison, methylation direction and q-value selection combination
LightComp MethDirection Qvalselection Pathway Qvalue
CvG all <0.001 Regulation of GTPase 0.0491
CvL up <0.5 nucleus 0.0446
GvL up <0.5 multicellular organism development 0.0039
GvL all <0.5 multicellular organism development 0.0009
GvL all <0.5 DNA-binding transcription factor activity, RNA polymerase II-specific 0.0085
GvL all <0.001 fatty acid transmembrane transporter activity 0.0043
GvL all <0.001 long-chain fatty acid metabolic process 0.0043
GvL all <0.001 long-chain fatty acid-CoA ligase activity 0.0043
GvL all <0.001 fatty acid transport 0.0043
GvL all <0.001 very long-chain fatty acid-CoA ligase activity 0.0043
GvL all <0.001 bile acid biosynthetic process 0.0404

The biological pathway, regulation of GTPase, was significantly enriched in the clear versus green analysis in which all tiles associated with a q-value of <0.001 were selected. This biological pathway was associated with a q-value of 0.049. The clear versus lilac analysis in which only those tiles which were methylated in the positive direction and associated with a q-value of <0.05 were selected yielded one significantly enriched pathway. The pathway nucleus was associated with a q-value of 0.046.

The green versus lilac analysis in which only those tiles which were methylated in the positive direction and associated with a q-value of <0.05 were selected yielded one significantly enriched pathway. The pathway multicellular organism development was associated with a q-value of 0.0039. This same light condition analysis in which all tiles associated with a q-value of <0.05 were selected yielded two significant results. multicellular organism development was associated with a q-value of 0.0039 and DNA-binding transcription factor activity, RNA polymerase II-specific was associated with a q-value of 0.0085.

The green versus lilac analysis in which all those tiles which were associated with a q-value of 0.001 yielded six significantly enriched pathways. fatty acid transmembrane transporter activity, long-chain fatty acid metabolic process, long-chain fatty acid-CoA ligase activity, fatty acid transport and very long-chain fatty acid-CoA ligase activity were all associated with a q-value of 0.0043. bile acid biosynthetic process was associated with a q-value of 0.0404.

Other notable results

LightComp <- c('CvG','CvG','CvL','CvL','GvL','GvL')
LightComp
## [1] "CvG" "CvG" "CvL" "CvL" "GvL" "GvL"
MethDirection <- c('up','up','up','up','all','down')
Qvalselection <- c('<0.5','<0.5','<0.5','<0.001','<0.5','<0.5')
Pathway <- c('Regulation of cell migration','Neural crest cell','Retina morphogenesis in camera-type eye','melanocyte differentiation','GTPase activator activity','embryonic camera-type eye development')
Pvalue <- c('0.0039','0.0059','0.0006','0.0056','0.0001','0.0129')
Qvalue <- c('0.7950','0.7950','0.3619','0.5707','0.0723','0.5483')
No.Pathway <- c('4','5','2','2','3','15')
SigMethTiles <- data.frame(LightComp, MethDirection, Qvalselection, Pathway, Pvalue, Qvalue, No.Pathway)
kbl(SigMethTiles, digits = 150, caption = "Table 18 - Table displaying the non-significantly enriched biological pathways/functions that are interesting, the p-value and q-value associated with these pathways/functions and the pathway's rank in enrichment out of a list of 1441 pathways for each light group comparison, methylation direction and q-value selection combination") %>%
  kable_styling(bootstrap_options = c("striped", "hover"))
Table 18 - Table displaying the non-significantly enriched biological pathways/functions that are interesting, the p-value and q-value associated with these pathways/functions and the pathway’s rank in enrichment out of a list of 1441 pathways for each light group comparison, methylation direction and q-value selection combination
LightComp MethDirection Qvalselection Pathway Pvalue Qvalue No.Pathway
CvG up <0.5 Regulation of cell migration 0.0039 0.7950 4
CvG up <0.5 Neural crest cell 0.0059 0.7950 5
CvL up <0.5 Retina morphogenesis in camera-type eye 0.0006 0.3619 2
CvL up <0.001 melanocyte differentiation 0.0056 0.5707 2
GvL all <0.5 GTPase activator activity 0.0001 0.0723 3
GvL down <0.5 embryonic camera-type eye development 0.0129 0.5483 15

The clear versus green analysis in which only those tiles which were methylated in the positive direction associated with a q-value of <0.05 were selected had two notable results. Regulation of cell migration was associated with a q-value of 0.7950 and was the fourth most enriched pathway out of 1441 pathways in this analysis. Neural crest cell development was also associated with a q-value of 0.7950 and was the fifth most enriched pathway.

The clear versus lilac analysis in which only those tiles which were methylated in the positive direction and associated with a q-value of <0.05 were selected yielded one other notable result. Retina morphogenesis in camera-type eye was associated with a q-value of 0.3619 and was the second most enriched pathway in this analysis. The clear versus lilac analysis in which only those tiles which were methylated in the positive direction and associated with a q-value of <0.001 were selected yielded one other notable result. melanocyte differentiation was associated with a q-value of 0.5707 and was the second most enriched pathway.

The green versus lilac analysis in which all those tiles which were associated with a q-value of <0.05 were selected yielded one other notable result. GTPase activator activity was associated with a 0.0723 q-value and was the third most enriched pathway in this analysis. The green versus lilac analysis in which only those tiles which were methylated in the negative direction and were associated with a q-value of <0.05 were selected yielded one other notable result. embryonic camera-type eye development was associated with a q-value of 0.5483 and was the 15 most enriched pathway for this analysis.

Identified tiles (section 3.6)

As charts were generated for a very large number of genes, not all could be presented in this thesis. Some of those which were associated with extremely significant or many significantly differentiated tiles and some those which were linked to the experimental conditions were selected. The rest of the plots can be obtained by running the script found at https://github.com/aaronsk7/guppy-methylation/blob/main/methylkitresultsanalysis.Rmd.

As the nasp gene was identified as significant at both the CpG site scale (table 9 and 11) and the tile scale (table x and x) this is one gene which was investigated further (figure x). The gpc5b gene was identified as significant at the tile scale and shared the same two tiles as the nasp gene in their promoter regions. The first tile (figure 16.A) had ~10% greater mean methylation in the green1, green3 and foundation population than the other populations. The second tile (figure 17.A) had similar results to the first tile. The lilac3 population had a ~6% greater mean methylation in the second tile and the populations green2, lilac1, lilac2, clear1, clear2 and clear3 had a ~2% greater mean population in the second tile. Differences in methylation percentage distributions are observable for both tiles in the violin plot (figure 16.B and 17.B) and beeswarm plot that are in line with the mean methylation percentage results (figure 16.C and 17.C).

knitr::include_graphics(rep("figures/naspcollage.png"))
Figure 16 - series of charts displaying CpG methylation percentage for each sample at the tile which starts at the basepair 1847001 on chromosome 4 and is within the gpc5b and nasp gene promoter region. (A) Barchart (B)  Violin plot (C) Beeswarm plot.

Figure 16 - series of charts displaying CpG methylation percentage for each sample at the tile which starts at the basepair 1847001 on chromosome 4 and is within the gpc5b and nasp gene promoter region. (A) Barchart (B) Violin plot (C) Beeswarm plot.

knitr::include_graphics(rep("figures/gpc5bcollage.png"))
Figure 17 - series of charts displaying CpG methylation percentage for each sample at the tile which starts at the basepair 1846001 on chromosome 4 and is within the gpc5b and nasp gene promoter region. (A) Barchart (B)  Violin plot (C) Beeswarm plot.

Figure 17 - series of charts displaying CpG methylation percentage for each sample at the tile which starts at the basepair 1846001 on chromosome 4 and is within the gpc5b and nasp gene promoter region. (A) Barchart (B) Violin plot (C) Beeswarm plot.

The most significant tile that could be annotated to the promoter region of a gene in the clear versus lilac comparison was annotated to the gene zgc:153964, the name of which has since been updated to ptk6b. This gene was investigated further. All three lilac populations and the clear1 population had a significantly reduced mean methylation percentage at <55% when compared to the others. The clear2 and clear3 populations had the greatest mean methylation at >75%. Differences in methylation percentage distributions are observable in the violin plot (figure 18.B) and beeswarm plot that are in line with the mean methylation percentage results (figure 18.C).

knitr::include_graphics(rep("figures/ptk6bcollage.png"))
Figure 18 - series of charts displaying CpG methylation percentage for each sample at the tile which starts at the basepair 20995001 on chromosome 2 and is within the zgc:153964 or ptk6b gene promoter region. (A) Barchart (B)  Violin plot (C) Beeswarm plot.

Figure 18 - series of charts displaying CpG methylation percentage for each sample at the tile which starts at the basepair 20995001 on chromosome 2 and is within the zgc:153964 or ptk6b gene promoter region. (A) Barchart (B) Violin plot (C) Beeswarm plot.

A tile which was significantly deferentially methylated and found in the promoter region of a gene, hmx1, which was linked the the biological pathway “retina morphogenesis in camera-type” was investigated further.

knitr::include_graphics(rep("figures/hmx1collage.png"))
Figure 19 - series of charts displaying CpG methylation percentage for each sample at the tile which starts at the basepair 14443001 on chromosome 1 and is within the hmx1 gene promoter region. (A) Barchart (B)  Violin plot (C) Beeswarm plot.

Figure 19 - series of charts displaying CpG methylation percentage for each sample at the tile which starts at the basepair 14443001 on chromosome 1 and is within the hmx1 gene promoter region. (A) Barchart (B) Violin plot (C) Beeswarm plot.