Abstract Background Over the course of its intraerythrocytic developmental cycle (IDC), the malaria parasite Plasmodium falciparum tightly orchestrates the rise and fall of transcript levels for hundreds of genes. Considerable debate has focused on the relative importance of transcriptional versus post-transcriptional processes in the regulation of transcript levels. Enzymatically active forms of RNAPII in other organisms have been associated with phosphorylation on the serines at positions 2 and 5 of the heptad repeats within the C-terminal domain (CTD) of RNAPII. We reasoned that insight into the contribution of transcriptional mechanisms to gene expression in P. falciparum could be obtained by comparing the presence of enzymatically active forms of RNAPII at multiple genes with the abundance of their associated transcripts. Results We exploited the phosphorylation state of the CTD to detect enzymatically active forms of RNAPII at most P. falciparum genes across the IDC. We raised highly specific monoclonal antibodies against three forms of the parasite CTD, namely unphosphorylated, Ser5-P and Ser2/5-P, and used these in ChIP-on-chip type experiments to map the genome-wide occupancy of RNAPII. Our data reveal that the IDC is divided into early and late phases of RNAPII occupancy evident from simple bi-phasic RNAPII binding profiles. By comparison to mRNA abundance, we identified sub-sets of genes with high occupancy by enzymatically active forms of RNAPII and relatively low transcript levels and vice versa. We further show that the presence of active and repressive histone modifications correlates with RNAPII occupancy over the IDC. Conclusions The simple early/late occupancy by RNAPII cannot account for the complex dynamics of mRNA accumulation over the IDC, suggesting a major role for mechanisms acting downstream of RNAPII occupancy in the control of gene expression in this parasite. Electronic supplementary material The online version of this article (doi:10.1186/1471-2164-15-959) contains supplementary material, which is available to authorized users. Background The malaria parasite Plasmodium falciparum undergoes a 48 h life cycle from the moment of red blood cell invasion through to the production and release of mature progeny. In the course of this intraerythrocytic developmental cycle (IDC), the mRNA level for many genes rises and falls once at a point that correlates with the time its protein product is needed. Such results have led to the proposal of a “just in time” model of plasmodial gene expression in which mRNAs accumulate just as their products are required during the IDC [[41]1] Many factors affect mRNA levels, including transcriptional initiation, transcriptional elongation, mRNA processing, mRNA export, and mRNA stability. While control of Plasmodium gene expression at the level of transcription has been demonstrated [[42]2–[43]5], several recent studies provide evidence that post-transcriptional mechanisms must play a major role as well. For example, P. falciparum shows a widespread chromatin opening and histone H2A.Z recruitment in the intergenic regions throughout the IDC. Although this modification has been associated with actively transcribed chromatin in other species, in Plasmodium falciparum this histone variant is also recruited early to genes whose transcripts do not appear until much later [[44]6–[45]8]. A similar disconnect is seen with component of the basal transcriptional machinery, TBP and TFIIE, which are broadly recruited to Plasmodium genes regardless of corresponding transcript abundance [[46]9]. Moreover, nuclear run-ons correlated active transcription of some selected genes with the levels of their transcripts across the IDC. While some loci showed clear positive correlation between transcriptional activity and mRNA abundance, others revealed striking discrepancies strongly indicative of post-transcriptional regulation [[47]10] and consistent with similar discordances seen in humans [[48]11]. Additionally, transcript stability has been demonstrated to vary by an average of six fold between ring and schizont stages, and is correlated with a progressive loss of RNA degrading enzymes [[49]12]. Last, parasites deficient in the post-transcriptional regulator CAF1 display major shifts in peaks of mRNA accumulation [[50]13]. Such findings are consistent with major post-transcriptional control during the IDC. During transcription, many steps of mRNA synthesis and processing are integrated through the C-terminal domain (CTD) of the largest subunit of RNAPII (RPB1). A hallmark of RPB1 in most eukaryotes is the presence of a repeating heptapeptide motif in the CTD [[51]14]. In most eukaryotes, the heptad repeat has the consensus sequence YSPTSPS and is present in many copies ranging from 52 in humans to 26 in Saccharomyces cerevisiae. Moreover, the number and sequence of repeats is highly invariant within and even between species. By contrast, the CTD of Plasmodium spp. differs by the inclusion of a lysine at position 7 of the heptad repeat (YSPTSPK), contains fewer repeats and shows much greater variability in repeat number between and within Plasmodium species [[52]15, [53]16]. Among other modifications, the serine residues at positions 2 and 5 (and 7 in many organisms), can be phosphorylated and intensive effort has gone into trying to understand the role of these modifications in gene expression. Much attention has focused on the functional consequences of an unphosphorylated CTD, mono-phosphorylation at position 5 (Ser5-P), and di-phosphorylation at positions 2 and 5 (Ser2/5-P). A long-held model proposes that the enzymatic activity of RNAPII is determined by the phosphorylation status of the CTD. In this model, RNAPII bearing a hypophosphorylated CTD is enzymatically inert, while Ser5-P is required for RNAPII to initiate transcription, and Ser2/5-P then confers the elongating and highly processive activity of RNAPII [[54]14]. However, a revised model suggests that the phosphorylation status of the CTD may simply be a correlative marker of RNAPII activity [[55]17]. Thus, while the exact functions of phosphorylation events at the CTD are a matter of debate, there is a strong consensus that the presence of Ser5-P and especially Ser2/5-P are marks of transcriptionally active polymerase. We have exploited the phosphorylation state of the RNAPII CTD to assess the engagement of most P. falciparum genes with the transcriptional machinery across the IDC. This allowed us to assess the extent to which RNAPII occupancy correlates with the mRNA accumulation. Our data indicate that P. falciparum genes are divided into two classes depending on whether peak RNAPII binding occurs early or late during the IDC. When comparing RNAPII occupancy to mRNA abundance, we identified subsets of genes with high occupancy and relatively low RNA levels and vice versa. We find that genes at which RNAPII is detected early (8 to 32 hours post-infection (hpi)) are shorter in length, have more introns and shorter first exons when compared to genes at which RNAPII is detected late (40-48 hpi). Furthermore, we find distinct RNAPII occupancy patterns for the early and late genes along the gene length. Comparison of RNAPII occupancy with various histone modifications identifies chromatin changes that precede, accompany and follow RNAPII recruitment. Together our results show that occupancy by presumed active forms of RNAPII cannot account for the highly orchestrated pattern of mRNA accumulation and loss over the IDC. We suggest that post-initiation mechanisms such as transcriptional elongation and mRNA stability may play predominant roles in determining the dynamics of mRNA abundance during infection of red blood cells. Results Production of monoclonal antibodies specific for three forms of the CTD of the large subunit of RNAPII Steps in the generation and use of monoclonal antibodies in this study are summarized in Figure [56]1A. To examine the genome-wide distribution of RNAPII, we raised highly specific monoclonal antibodies against three forms of the P. falciparum RPB1 CTD: unmodified CTD, Ser5-P-bearing CTD, and CTD having the Ser2/5-P double phosphorylation. The phosphorylated forms of RPB1 have been associated with RNAPII molecules that are functionally engaged with bound genes. Unphosphorylated heptad repeats may be associated with inert forms of the polymerase, but could also be present as one or more copies within a CTD that also contains phosphorylated motifs. The specificity of the antibodies was confirmed in several ways. First, the antibodies showed high specificity in ELISA by serial dilution (data not shown) and, second, by competition with the immunizing peptide in a peptide competition assay (Figure [57]1B-D). These results establish that each antibody specifically recognizes its cognate peptide and not the other two. Third, the appropriate sized band for P. falciparum RBP1 is also observed on Western blot of nuclear lysates following immunoprecipitation using each of the three monoclonal antibodies (Figure [58]1E). A well-characterized commercially available antibody (8WG16) that recognizes unphosphorylated and Ser5-P CTD [[59]18] likewise detects a similarly sized band corresponding to RPB1 in a Western blot of cell lysate. Although all antibodies (including 8WG16) detect additional bands, stringent washing leaves a unique band corresponding to the expected molecular weight of RPB1 as the major species detected by the αSer2/5-P antibody (indicated by W* in Figure [60]1E). Fourth, IP followed by mass spectrometry revealed that both the αSer5-P and αSer2/5-P antibodies precipitated RPB1 as well as two additional subunits of RNAPII, namely RPB2 and RPB3, with high confidence. No RNAPII subunits were precipitated by a control IgG (Additional file [61]1: Table S1). Together, these results show that these antibodies are specific for the CTD of RPB1 in P. falciparum and are suitable for use in the chromatin immunoprecipitation assay (ChIP). Figure 1. Figure 1 [62]Open in a new tab Validation of monoclonal antibodies against RNA polymerase II. (A) Schematic of the work flow. P. falciparum samples were harvested every 8 h across the 48 h life-cycle at time points (TP) 1 through TP6. The same culture was used for triplicate ChIP-on-chip and transcriptome analyses. (B–D) Peptide competition assays. Monoclonal antibodies were raised against peptides corresponding to the unmodified heptad repeat (anti-CTD), the heptad repeat monophosphorylated at position 5 (αSer5-P) and diphosphorylated at positions 2 and 5 (αSer2/5-P). Each monoclonal antibody was pre-incubated with each of the three immunogenic peptides derived from the RNAPII CTD. Reactivity against the immunizing peptide was then tested by ELISA. The concentration of competing peptide in μg/ml is given to the right of each graph. (E) Parasite nuclear protein lysate from an asynchronous culture was immunoprecipitated (IP) with each of the three monoclonal antibodies raised in this study or a non-specific IgG control. The precipitates were separated by 6% SDS-PAGE and Western blot (W) performed with the indicated antibodies. The appropriately sized band for P. falciparum RPB1 (279 KDa) was observed with all three antibodies (arrowhead). Two panels are shown for antibody α-Ser2/5-P and represent the same Western blot following a moderate and more extensive (W*) series of washes. Note that the only significant specific band is of the correct molecular mass for RPB1. No specific bands are detected by the α-CTD and α-Ser2/5 antibodies following IP with control IgG. The commercial anti-CTD antibody 8WG16 detects a band of the correct molecular mass for P. falciparum RPB1 along with several faster running species. The positions of the marker bands in KDa are given to the right of the panels they describe. RNA polymerase II shows a bi-phasic binding distribution across the P. falciparumIDC We next used these P. falciparum anti-RNAPII antibodies in ChIP-on-chip analysis to map the position and enzymatic status of RNAPII across the entire P. falciparum genome over the 48 h of the IDC at 8 h intervals. The commercial anti-CTD antibody, 8WG16, was also used in parallel. We used custom-made microarrays spotted with 70-mer oligonucleotides covering both coding and intergenic regions [[63]19, [64]20]. Following hybridization and analysis of the microarrays, we selected all those probes exhibiting a significantly oscillating profile with p <0.05 and ≥1.5 fold change across the life cycle. This cut-off resulted in the retention of a subset of probes for each of the three antibodies: 2,936 probes with the αCTD (unphosphorylated) antibody; 4,462 probes with the αSer5-P antibody; and 4,255 probes with the αSer2/5-P antibody. We addressed the temporal order within the dataset by creating a phaseogram of RNAPII occupancy. The genome-wide profiles reveal a striking periodicity involving the majority of genes (Figure [65]2A). Most genes can be divided in two groups: those that bind RNAPII early during the IDC and those that bind RNAPII late, regardless of the phosphorylation state of the CTD. This division into early versus late is most dramatic for the Ser2/5-P form of RNAPII which is the elongating form of the polymerase. The data show a striking switch between T4 (32 hpi) and T5 (40 hpi) such that the Ser2/5-P form binds maximally to one large fraction of genes early in the IDC until T4 (1,600 genes), but then switches to a largely mutually exclusive set of genes late in infection during T5 and T6 (1,359). Only 11% of genes are present in both categories due to differential signals from probes within the same gene. A similar observation is made for data sets obtained with the other two antibodies against the unmodified CTD and the Ser5-P; however, the switch between early and late binding profiles for these two forms occurs earlier between T3 (24 hpi) and T4 (32 hpi). Differential occupancy of the different forms of RNAPII argues against a bias inherent to the experimental procedure or the microarrays themselves. Importantly, ChIP-on-chip with the commercial 8WG16 antibody likewise revealed a segregation of early vs late signals for the majority of probes in a pattern resembling those obtained with the α-CTD and α-Ser5-P antibodies (Additional file [66]2: Figure S1A). This is consistent with the demonstration that 8WG16 recognizes the unmodified and Ser5-P forms of the heptad repeat [[67]18]. Hierarchical clustering of the datasets also highlights the distinct distribution of RNAPII in just two prominent groups, one dominated by loci where RNAPII binds early in the IDC and another dominated by loci where the binding is late (Additional file [68]2: Figure S1A and Additional file [69]3: Figure S2). The timings of the switches are identical to those revealed in the phaseograms. Figure 2. Figure 2 [70]Open in a new tab ChIP-on-chip reveals two classes of genes distinguished by recruitment of RNAPII either early or late in the IDC. (A) Three phaseograms (blue-yellow scale) representing temporal occupancy patterns for the three forms of RNAPII detected in this study: unphosphorylated CTD, Ser5-P and Ser2/5-P. The data were filtered to include only those loci where RNAPII shows an oscillating profile with p <0.05 and a fold change ≥1.5 across the life cycle. The yellow-blue color scale represents the mean-centered log transformed ratios of ChIP/input values across the IDC (T1 to T6). Two prominent probe clusters binding the Ser2/5-P form of RNAPII early vs late in the IDC are marked with orange rectangles. The phaseogram on the far right (green-red scale) depicts the phase of peak gene expression for the P. falciparum transcriptome across the IDC. (B-C) ChIP-qPCR confirms the presence of two classes of genes distinguished by enrichment in RNAPII binding either early or late in the IDC. Using immunoprecipitated DNA, quantitative real time PCR was carried out on representative genes chosen from the early (B) and late (C) categories to validate the ChIP occupancy profiles of Ser2/5-P RNAPII across the IDC. Primers for real time PCR were designed to amplify 200–300 bp regions around the respective probe on the microarray. The x-axis gives time points T1 to T6 and the y-axis shows the fold change. Solid lines represent real time PCR profiles, whereas dashed lines give the microarray (ChIP-on-chip) profile at the same probe region. The fine dotted line represents the mRNA levels across the six time points determined by hybridization of cDNA to the microarrays. Error bars show standard deviation of triplicate determinations. See also Additional files [71]2, [72]3 and [73]4: Figures S1, S2 and S3. To confirm results obtained by microarray, we performed ChiP-qPCR at each of the six time points for six genes that bind the elongating (Ser2/5-P) form of RNAPII early in the IDC and four that bind late. Although qPCR reveals a greater dynamic range than the microarray analysis, the results show that all early genes acquire peak RNAPII recruitment early in the IDC at either time point 3 or 4 (24 and 32 hpi), while late genes do not display maximal RNAPII binding until time point 6 (48 hpi) (Figure [74]2B,C). These findings strongly validate the results of ChIP-on-chip and confirm the presence of two gene classes distinguished by early or late enrichment in RNAPII binding over the course of the IDC. Relationship between transcriptional activity and mRNA abundance To characterize the relationship between the genome-wide binding of distinct forms of RNAPII and mRNA abundance, we calculated the Pearson’s Correlation Coefficient (PCC or r) between the dynamic RNAPII occupancy (p <0.05 and 1.5 fold change) and the expression levels of corresponding mRNAs. We established two subsets (Figure [75]3). In one subset, increased RNAPII occupancy (yellow) was positively correlated with high mRNA levels (red). In the other subset, the correlation was negative, meaning that there was either high mRNA expression without detectable RNAPII binding, or that we observed RNAPII binding without corresponding mRNA expression. With respect to the Ser2/5-P isoform of RNAPII, 1,740 probes were highly positively correlated with gene expression (right panel), while 1,200 probes were strongly negatively correlated (left panel) (Figure [76]3C). Similar observations were also made with the unmodified CTD and Ser5-P isoforms of RNAPII (Figure [77]3A,B) as well as with the 8WG16 commercial antibody (Additional file [78]2: Figure S1B). Approximately a third of the probes showed no significant correlation. The above results thus reveal four subsets of genes with respect to RNAPII occupancy and mRNA abundance: genes which bind RNAPII early and are either positively or negatively correlated with mRNA levels (early/positive and early/negative), and genes which bind RNAPII late (late/positive and late/negative). Figure 3. Figure 3 [79]Open in a new tab Relationship between RNAPII occupancy and mRNA abundance. Correlation between the genome-wide detection of unmodified CTD (panel A), Ser5-P (panel B) and Ser2/5-P (panel C) forms of RNAPII and mRNA levels was calculated using Pearson’s correlation coefficient (PCC) between the mean-centered profile of all probes with p <0.05 and the corresponding gene transcripts. The data have been presented as bar graphs with bins of PCC ranging from −1 to +1 on the x-axis plotted against the number of probes falling into each bin on the y-axis. The phaseograms in the right-hand panels represent probes which are highly positively correlated with gene expression (r ≥0.4), and in the left-hand panels display probes showing negative correlation with gene expression (r ≤ -0.4). One-third of the probes do not show any correlation. The order of the probes represented in the phaseograms for RNAPII binding is the same as those used in the mRNA phaseogram. On the basis of this analysis, we identify four subsets of genes with respect to the correlation between RNAPII binding and mRNA abundance. These are early/positive (RNAPII binds early in the IDC and is positively correlated with mRNA levels), early/negative and late/positive and late/negative. We assessed the genomic characteristics of RNAPII-Ser2/5-P-associated early and late genes. We mapped the genes that showed either early or late binding of RNAPII Ser2/5-P onto the 14 P. falciparum chromosomes. All chromosomes were seen to bear potential clusters (Additional file [80]4: Figure S3); however, closer examination revealed that genes within such clusters do not share a common expression profile, arguing against any functional organization and consistent with a previous study suggesting that P. falciparum lacks chromosomal domains with common expression profiles [[81]1]. We mapped several genomic descriptors known to associate with gene expression regulation in human [[82]21], including gene length, GC content, number of introns and intron length. We found that genes binding RNAPII Ser2/5-P at early times (T1 to T4) are short with lower GC content by comparison to late genes (T5 and T6) which are significantly longer (Additional file [83]5: Figure S4A, B). In addition, the early genes had a greater number of introns when compared to late genes (Additional file [84]5: Figure S4C; see below). The presence of introns has been linked to higher transcriptional efficiency [[85]22]. We assessed the length of the first exon and found that late genes have longer first exons when compared to early genes (Additional file [86]5: Figure S4D). Thus, early-transcribed genes are shorter in length, have more introns and shorter first exons. Short gene length and short first exons are properties of highly expressed genes in human cells [[87]23, [88]24]. Differential distribution of RNAPII along the transcription unit distinguishes genes binding RNAPII early vs late in the IDC According to one model of CTD function, phosphorylation of Ser5 residues predominates near the promoter, whereas toward the end of the gene RNAPII is extensively phosphorylated on Ser2 residues [[89]25, [90]26]. To explore the phosphorylation status of the CTD at different positions along the transcription unit, we examined the distribution of the three forms of RNAPII along the gene for each of the four classes, namely early and late binders that positively correlated with the mRNA levels (Figure [91]4A, right panels) and early and late binders that negatively correlated with mRNA levels (Figure [92]4A, left panels). We observed very distinct profiles for genes that show a positive correlation between mRNA levels and peak RNAPII binding either early or late in the IDC. Promoter regions in P. falciparum are predicted to lie between 300 and 1650 bp upstream of the ATG [[93]27]. RNAPII accumulates in the promoter region of early binders, whereas for late binders RNAPII is enriched in the gene body (Figure [94]4A, right panel). Likewise, the Ser2/5-P form of RNAPII was concentrated within upstream presumptive promoter regions (–1,000 to –500) of genes expressed early and negatively correlating with mRNA levels (Figure [95]4A, left panel). On the other hand, the unmodified (CTD) and Ser5-P forms did not obviously follow this distribution. A goodness-of-fit test assigns a significant p value (<0.01) to the difference in RNAPII distribution (all three forms) between early and late binders (Figure [96]4B,C) suggesting that the transcriptional process in early vs late phases may be distinguished by mechanistic differences. However, it is possible that this has more to do with the transcription of shorter vs longer genes or the coupling of the splicing machinery to the CTD, since “late” genes are longer and have fewer introns, as noted above. Figure 4. Figure 4 [97]Open in a new tab Distribution of RNAPII along the gene length and enriched pathways. (A) The plots represent the average distribution along the gene of probes (with p <0.05) associated with each form of RNAPII. The start codon is taken as “0”. Data are organized into bins ranging from -500 bp upstream of the ATG to +4000 bp downstream and plotted against the percentage of total probes falling into each bin. The position of a probe along the x axis is the average of the positions of all probes within a given bin. While all genes are aligned at the ATG (“0”), they terminate at a wide range of positions downstream with an average at +2.5 kb. The plots on the right show the distribution of early/positive and late/positive loci (r >0.4), while those on the left plot the distribution of early/negative and late/negative loci (r <0.4). (B-C) The plots represent the average distribution of the genetic loci associated with each form of RNAPII as a function of gene length. Position distributions were plotted as histograms for each gene group with intervals of 500 bp from -2 kb to +5 kb. Histograms were plotted to display the binding positions in 500 bp intervals from -2 kb to 5 kb flanking the translational start codon at 0. A goodness-of-fit test was performed within each bin to calculate the probability of that region being equally bound by RNAPII for different gene groups. Regions with binding bias by group were assigned p <0.05 using the chi-square test. Data were assessed in this manner for (B) early/positive and late/positive and (C) early/negative and late/negative subsets. (D) Functionally enriched pathways (p <0.05) for each of the four gene groups. Colors refer to the three distinct databases that were used for pathway enrichment analysis. To determine whether the observed differential recruitment of RNAPII at early vs. late times of infection also reflects functional gene classes, we mapped early and late genes to known pathways and determined which pathways are over-represented in each group by using three pathway databases, namely Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG) and Malaria Parasite Metabolic Pathway (MPMP) (Figure [98]4D). We found that early genes were significantly enriched at loci associated with growth (ribosome structure and maturation), metabolism (starch and sucrose metabolism, purine and pyrimidine metabolism), and transcription and splicing. As expected, late genes were significantly enriched in proteins involved in host-parasite interactions and pathogenesis (merozoite invasion and motility, merozoite surface proteins) and cytoskeleton organization and biogenesis. Relationship between RNAPII occupancy and chromatin modifications Our group and others have shown that chromatin influences gene expression in P. falciparum [[99]6, [100]19, [101]28–[102]34]. To examine the relationship between RNAPII occupancy and chromatin modification, we calculated the PCC (r) between dynamic RNAPII occupancy of unmodified CTD, Ser5-P and Ser2/5-P and 13 histone modifications determined in our previous study [[103]19]. We identified 5 histone marks including H3K9ac, H3K56ac, H4ac4, H4K12ac and H4K20me1 which showed clear positive correlation with Ser2/5-P RNAPII binding at high statistical confidence (Figure [104]5). All these histone modifications, in addition to H3K4me3, associate with transcriptionally active protein coding genes and are described as active marks [[105]30, [106]31, [107]33], and references therein.