Abstract
The genome sequence available for different Plasmodium species is a
valuable resource for understanding malaria parasite biology. However,
comparative genomics on its own cannot fully explain all the
species-specific differences which suggests that other genomic aspects
such as regulation of gene expression play an important role in
defining species-specific characteristics. Here, we developed a
comprehensive approach to measure transcriptional changes of the
evolutionary conserved syntenic orthologs during the intraerythrocytic
developmental cycle across six Plasmodium species. We show significant
transcriptional constraint at the mid-developmental stage of Plasmodium
species while the earliest stages of parasite development display the
greatest transcriptional variation associated with critical functional
processes. Modeling of the evolutionary relationship based on changes
in transcriptional profile reveal a phylogeny pattern of the Plasmodium
species that strictly follows its mammalian hosts. In addition, the
work shows that transcriptional conserved orthologs represent potential
future targets for anti-malaria intervention as they would be expected
to carry out key essential functions within the parasites. This work
provides an integrated analysis of orthologous transcriptome, which
aims to provide insights into the Plasmodium evolution thereby
establishing a framework to explore complex pathways and drug discovery
in Plasmodium species with broad host range.
Abbreviations: IDC, intraerythrocytic developmental cycle; P.
falciparum, PF, Plasmodium falciparum; P. vivax, PV, Plasmodium vivax;
P. knowlesi, PK, Plasmodium knowlesi; P. berghei, PB, Plasmodium
berghei; P. chabaudi, PC, Plasmodium chabaudi; P. yoelii, PY,
Plasmodium yoelii; PCC, Pearson correlation coefficient; Ds, Divergent
score; Ph, Phase; PBS, Phosphate-buffered saline; K[a], non-synonymous
site; K[s], synonymous site; hr, hour; SRCC, Spearman Rank Correlation
Coefficient; AU, approximately unbiased; BP, bootstrap probability;
MPM, Malaria Parasite Metabolic Pathway
Keywords: Comparative transcriptomics, Plasmodium species, Evolution,
Microarray, Transcriptome, Drug targets
Highlights
* •
Comparison of variations in mRNA abundance across six different
Plasmodium species.
* •
Transcriptional conservation and divergence of Plasmodium syntenic
orthologs.
* •
Pattern of Plasmodium transcriptome evolution are established.
* •
Transcriptionally conserved orthologs represent attractive
intervention targets.
Malaria remains a major public health concern despite global efforts in
the fight against this disease. The intraerythrocytic stage of the
malaria parasites is currently in the spotlight for anti-malarial
intervention and vaccine targets. The primary goal of this study is to
generate a comprehensive and directly comparable transcriptome dataset
across multiple Plasmodium species originating from different hosts. We
establish that specific pathways and intraerythrocytic stages are more
transcriptionally diverged than others, reflecting transcriptional
evolutionary diversity. We further propose a panel of transcriptionally
conserved genes as potential drug targets.
1. Introduction
Protozoan belonging to the Plasmodium species are obligate
intracellular parasites that display substantial developmental
complexity during their life cycle in the vertebrate hosts and mosquito
vectors. The development of single nucleated parasite cells into
multi-nucleated schizonts through several rounds of mitosis closely
resembles embryonic development of multicellular organisms with a
majority of genes changing their expression during this period
([43]Piras et al., 2014, [44]Quint et al., 2012, [45]Bozdech et al.,
2003, [46]Irie and Kuratani, 2011). The intraerythrocytic developmental
cycle (IDC) exhibits a tightly regulated transcriptional cascade in
which essentially every gene in the genome is targeted to a specific
stage of the parasite development. This precise control of gene
expression ultimately governs critical functional processes for
Plasmodium species to thrive within the host erythrocytes ([47]Bozdech
et al., 2003, [48]Le Roch et al., 2003). We have previously reported
that orthologous expression between two human Plasmodium species,
Plasmodium falciparum and Plasmodium vivax, showed transcriptional
divergence in 30% of the genes, some of which could be linked to known
functional differences between these two species ([49]Bozdech et al.,
2008). This suggested that while there is high conservation of the
overall pattern of genome activity during the IDC of the two Plasmodium
species, there is transcriptional variation of a subset of genes that
enable the parasites to adapt to the individual host niches presumably
defined by variations in nutrients, metabolites, as well as other
cellular components ([50]Kafsack and Llinas, 2010, [51]Srivastava et
al., 2015). However, no studies so far have addressed the extent of
transcriptional diversity of Plasmodium species from distinct host
erythrocytes. Here, we developed a comprehensive analysis of the IDC
transcriptional profiles; from uniform data processing to extensive
orthology annotation, which allow us to directly compare variations in
mRNA abundance across six different Plasmodium species and
significantly extends previous work using pairwise comparison
([52]Bozdech et al., 2008). The Plasmodium conserved syntenic orthologs
show significant transcriptional divergence at the earliest stages of
parasite development, with transcriptional phylogeny pattern that
strictly follows the Plasmodium species mammalian hosts. Furthermore,
changes in the expression of key putative transcriptional regulators
are implicated in the transcriptional diversity. Critically the work
also provides the tools for the identification of new and so far
uncharacterized drug targets, as the orthologs that show a conserved
transcription pattern across the species are likely to carry out
critical conserved functions in all Plasmodium species.
2. Materials and methods
2.1. Sample collection for microarray analysis
2.1.1. Rodent malaria parasites
All studies involving mice were approved by the institutional animal
care and use committee (IACUC) of the Nanyang Technological University,
Singapore. Male BALB/c mice 6–7 weeks old, bred specific pathogen free
(SPF) at the Nanyang Technological University Animal Resource Facility,
were infected with either cryopreserved stocks of parasites or by
syringe passage from a pre-existing infected mouse. Mice were infected
by intraperitoneal injections of Plasmodium berghei ANKA, Plasmodium
chabaudi AS or Plasmodium yoelii 17 × parasitized erythrocytes and
parasitaemia and parasite stages were monitored by thin blood smears
stained with Giemsa. For P. berghei and P. yoelii infection, mice were
terminal bleed and the stage-specific parasitized erythrocytes were
separated via Nycodenz density gradient. The ring stage interface was
isolated, washed and subjected to ex vivo culture, which was then
collected every 2 hr over the course of 24 hr over a complete IDC
life-cycle. Mice infected with P. chabaudi were terminal bled every
2 hr under anesthesia over the course of 24 hr. Blood was collected and
filtered through Plasmodipur filters (Eurodiagnostica, Netherlands) to
remove white blood cells and then washed with PBS. The washed blood was
flash-frozen in liquid nitrogen and stored at − 80 °C until further
use.
2.2. RNA extraction, cDNA preparation and DNA microarray hybridization
Total RNA was isolated using a standard protocol using
trizol/chloroform extraction as described by ([53]Bozdech et al.,
2003). For preparation of the target DNA for microarray hybridization,
Switch Mechanism at the 5′ end of Reverse Transcription (SMART) PCR
approach was employed ([54]Petalidis et al., 2003). Thereafter, cDNA
was synthesized by using the reverse transcriptase (PowerScript,
Clontech BD) for 2 hr at 42 °C. This was followed by PCR amplification
with Taq Polymerase (NEB) and the resulting PCR product was purified
using MiniElute DNA purification kit (Qiagen). The purified DNA was
labeled with fluorescent dye Cy5 (Amersham). A reference pool
comprising of equal mass of total RNA samples representing all
developmental stages of the parasite was prepared and labeled with Cy3
(Amersham). The microarray hybridization was carried at 65 °C in the
automated hybridization station (MAUI, USA). In these two channels
competitive hybridizations, RNA from each time point was labeled by Cy5
and hybridized against a reference RNA pool labeled with Cy3. Data
acquired were analyzed by GenePix Pro software (Axon Instruments USA).
2.3. Microarray data processing and analysis
2.3.1. Reannotation of oligos
All the oligonucleotides used in this study for the rodent malaria
parasites microarray were designed by OligoRankPick as previously
described ([55]Hu et al., 2007). The rodent malaria parasites
microarray contained 13,224 60-mer oligos. The unprocessed microarray
hybridization spots for P. falciparum, P. vivax and P. knowlesi were
derived from previously published data; P. falciparum Dd2 strain
([56]Foth et al., 2011), P. vivax smru1 strain ([57]Bozdech et al.,
2008) and P. knowlesi PkHa strain ([58]Lapp et al., 2015). The
oligonucleotides used in the current and previous microarray studies
were blasted against the genomes of the respective Plasmodium species
from PlasmoDB release 8.2. As a result, a total of 5276 P. falciparum
genes, 4700 P. knowlesi genes, 5017 P. vivax genes, 6251 P. yoelii
genes, 4486 P. chabaudi genes and 4401 P. berghei genes were uniquely
represented on the corresponding microarray datasets.
2.3.2. Normalize and data filtering
All microarray hybridization spots obtained from all six Plasmodium
species were subjected to “normexp” background correction followed by
LOWESS (locally weighted scatterplot smoothing) normalization within
each array and quantile normalization between arrays using Limma
package of R. Log[2] ratios of Cy5 over Cy3 intensities were calculated
for each spot to represent expression value of a particular probe
except those with signal intensity < 1.5 times the background intensity
for both Cy5 and Cy3 fluorescence. For each gene, the expression value
was estimated as the average of all probes representing it. Overall,
4750 (90% of genes designed on the microarray) P. falciparum genes,
4670 (99%) P. knowlesi genes, 4884 (97%) P. vivax genes, 5486 (88%) P.
yoelii genes, 3990 (89%) P. chabaudi genes and 3787 (86%) P. berghei
genes display expression profiles with one or zero missing value across
each IDC life cycle. These “processed” microarray expression dataset
was used for subsequent analysis. The raw and processed microarray data
for P. yoelii, P. berghei and P. chabaudi have been deposited into Gene
Expression Omnibus (GEO accession number: [59]GSE80015).
2.3.3. Phaseogram
The expression profile of each gene is modeled using sine function
which has been described in detail ([60]Lapp et al., 2015). Briefly,
the formula is E(t) = A × sin (ωt - α) + C.
where E(t) is the log[2] ratio sample/reference control at the t time
point of sample collection, A is the amplitude of expression profile
across life cycle, C is the vertical offset of profile from zero, ω is
the angular frequency given by 2π/h and h is the length of a complete
IDC duration expressed in hours (details see below), and α is the
horizontal offset of profile from zero. α was projected into an
interval ranging from 0 to 2π with, π/2 of α representing the peak
expression at early ring stage matching to 0. The converted α was
subsequently used as phaseogram (Ph) to indicate the IDC timing where
gene expression peaks. Ph for each gene was sorted from early to late
IDC for complete visualization of genes expressed in the complete IDC.
2.3.4. Estimation of IDC duration and adjustment of phaseogram
The timing for complete IDC varies between the six Plasmodium species
and the time points collected for each sample were different.
Therefore, h, which is the length of complete IDC duration was
optimized and projected within 44 hr to 54 hr for P. falciparum, P.
vivax and 20 hr to 36 hr for P. knowlesi and the other three rodent
strains to determine the best fit sine function model ([61]Lapp et al.,
2015). As a result, the optimized h is 48 hr for P. falciparum, 49 hr
for P. vivax, 29 hr for P. knowlesi, 27 hr for P. yoelii, 28 hr for P.
chabaudi genes and 29 hr P. berghei. To minimize the effect of
asynchronous parasites between species samples, the phaseogram was
adjusted for P. vivax, P. knowlesi, P. yoelii, P. chabaudi and P.
berghei to P. falciparum reference. For example, for P. vivax
phaseogram adjustment, the best matching time points with P. falciparum
for each P. vivax time point was estimated by the highest Spearman Rank
Correlation Coefficient (SRCC) values between global transcription
profiles of syntenic orthologs. Next, the average shift of time using
those best matching time point pairs with SRCC < 0.2 was calculated,
which resulted in 25 time points for each gene per species on a sine
wave model.
2.3.5. Delta phase, ΔPh
The dissimilarity of transcription profiles between two genes is given
by:
[MATH: ∆Phab=minPha-Phb2π-Pha-Phb
mfenced>,a≠b. :MATH]
Where ∆ Ph is the distance of Ph expression timing between gene a and
gene b, Ph[a] and Ph[b] is the expression timing of gene a and b
respectively.
2.3.6. Medoid gene and Ds value
The metric of transcription divergence among multiple genes, given by
Ds, is the average ∆ Ph of each gene to the medoid gene. K-medoids
clustering methods was applied, where K equal to 1, and took the genes
in question as a cluster. The medoid gene of this cluster was
determined based on the dissimilarity matrix, which contained ∆ Ph of
all gene pairs. Ds value is the average dissimilarity of this cluster
based on its medoid. Calculation was conducted using the package
‘cluster’ of R.
2.3.7. Detection of outlier gene and species
For a syntenic orthologs, outlier of one species is defined as the
ortholog gene with the most divergent expression timing or Ph compare
to the other species member within an ortholog group. To detect the
outlier for each ortholog group, we construct the dissimilarity matrix
of species based on ∆ Ph. Outlier is define as the ortholog gene of a
particular species which maximally contributes to the sum of ∆ Ph of
that dissimilarity matrix.
2.3.8. Dendrogram of transcriptome relationship
Dendrogram was constructed by applying Ward hierarchical clustering
method based on the dissimilarity matrix containing the distance of
each pair of species, ∆ Ph. The distance of two species is defined as;
[MATH: DAB=∑i=0n
∆PhiAB2
mrow>,A≠B :MATH]
where D(A,B) is the distance of species A and B, n is the total number
of syntenic orthologous gene which is equal to 2312 and ∆ Ph[i] (A,B)
if the distance of expression timing between the ith gene of species A
and its orthologous gene in species B. The dendrogram cluster was
subjected to 100 times bootstrapping to estimate the percentage of
approximately unbiased (AU) p-value and bootstrap probability (BP).
2.3.9. K[a]/K[s] ratio
K[a]/K[s] ratio is the ratio of the number of nonsynonymous
substitutions per non-synonymous site (K[a]) to the number of
synonymous substitutions per synonymous site (K[s]). Selective pressure
of protein-coding genes between six Plasmodium species was estimated by
calculating K[a]/K[s] ratio for each pair of syntenic orthologs for a
total of 2312 orthologs. The syntenic orthologs were aligned using
ClustalW ([62]Larkin et al., 2007) and K[a]/K[s] were calculated using
package ‘seqinr’ of R ([63]Charif and Lobry, 2007). Dendrogram
indicating evolution relationship between species was constructed based
on the dissimilarity matrix in which the distance of two species was
defined by the mode of K[a] values.
2.3.10. Functional enrichment analysis
Functional enrichment analysis was carried out by identifying syntenic
orthologs sets that are significantly over-represented in studied genes
list by comparing to annotated gene sets from P. falciparum MPM
database. Hypergeometric test was applied to calculate the level of
significance of indicated orthologous gene sets from the MPM database.
3. Results
3.1. Lack of conservation of specific ortholog expression in Plasmodium
species' life cycle
Despite the long evolutionary time scale of about 100 million years
([64]Escalante and Ayala, 1995, [65]Escalante and Ayala, 1994), the
overall genome organization and content across the sequenced Plasmodium
species is highly conserved, with about 4000 conserved syntenic genes
located within the central core regions of the 14 chromosomes ([66]Hall
et al., 2002, [67]Carlton et al., 2008a, [68]Pain et al., 2008,
[69]Gardner et al., 2002, [70]Hall et al., 2005). On the other hand,
species-specific differences are attributed to the variant multigene
families that reside predominantly at the subtelomeric regions. In this
study, we generated the IDC transcriptomes for the rodent malaria
parasites P. berghei, P. yoelii and P. chabaudi and compared these with
the previously generated data for the human malaria species P.
falciparum ([71]Foth et al., 2011), P. vivax ([72]Bozdech et al.,
2008), and the simian malaria species P. knowlesi ([73]Lapp et al.,
2015) (Supplementary Table 1, Supplementary Fig. 1a). The IDC
transcriptional profiles of each gene were first best fitted onto a
sine-wave function as described in the experimental procedure section
([74]Fig. 1a, Supplementary Fig. 1b). This allowed us to define several
parameters that reflect the temporal aspects of gene expression and
thus make direct species-to-species comparisons. The quality of the
newly generated data was further verified by comparisons with the
recently published P. berghei RNA-seq data showing high levels of
correlation (Supplementary Fig. 1c) ([75]Otto et al., 2014). Overall,
3787 to 5486 genes (including species-specific genes) exhibited
specific IDC transcriptional profiles in the six Plasmodium species
([76]Table 1). In contrast, between 29 and 1436 genes are unique to
individual species with specific transcriptional profiles ([77]Table 1,
Supplementary Fig. 1d), where the majority of them are genes restricted
to subtelomeric ends and are involved in antigenic variation,
host-parasite interactions, cytoadherence and erythrocyte aggregation
for all six species, similar to previous genomic studies ([78]Kooij et
al., 2005, [79]Kyes et al., 2001, [80]Dzikowski et al., 2006,
[81]Galinski and Corredor, 2004). Interestingly, while the distribution
of peak time expression remains relatively constant for the syntenic
orthologs, the species-specific genes show remarkable variability
(Supplementary Fig. 1d).
Fig. 1.
[82]Fig. 1
[83]Open in a new tab
Smoothed transcriptome and comparative correlation transcriptomic
analyses of P. falciparum, P. vivax, P. knowlesi, P. berghei, P. yoelii
and P. chabaudi.
(a) Overall intraerythrocytic developmental cycle (IDC) transcriptome
profiling for P. falciparum (4750 genes), P. vivax (4884 genes), P.
knowlesi (4670 genes), P. yoelii (5486 genes), P. berghei (3787 genes),
and P. chabaudi (3990 genes). The phaseograms were generated from the
log2 expression ratio and each profile was median centered. The
phaseograms also include the expression of 2312 syntenic orthologous
genes present in all 6 Plasmodium species. (b) Histograms showed
overall distribution of Pearson correlation coefficients (PCCs)
calculated from the smoothed transcriptome dataset between P. knowlesi,
P. chabaudi, P. berghei and P. yoelii using P. falciparum as reference
species. (c) Venn diagram analysis of co-expressed genes with PCC
scores of ≥ 0.5–1.00 in different Plasmodium species. The numbers in
brackets beside each species pairs (Pf vs Pv, Pf vs Pk, Pf vs Pb, Pf vs
Pc, Pf vs Py) represent total number of genes with PCCs of ≥ 0.50. The
numbers inside the Venn diagram represent total number of overlapped
orthologous genes between the species pair.
Table 1.
Summary table of number of genes that are syntenic orthologs and with
specific transcriptional profile across all six Plasmodium species.
Microarray data analysis
__________________________________________________________________
Pf Pk Pv Py Pb Pc
Reannotation of oligos for microarray data analysis 5276 4700 5017 6251
4401 4486
After data normalization and filtering (related to [84]Fig. 1) 4750
4670 4884 5486 3787 3990
Syntenic orthologs and species-species specific genes
Syntenic orthologs present in at least one or more species (OrthoMCL
DB) 4148 4286 4275 3577 3708 3783
Syntenic orthologs present in all six species (OrthoMCL DB) 3374 3374
3374 3374 3374 3374
Syntenic orthologs present in at least one or more species with
specific transcriptional profile 3175 3182 3190 3161 2951 3025
Syntenic orthologs present in all six species with specific
transcriptional profile 2312 2312 2312 2312 2312 2312
Species-specific genes with specific transcriptional profile
(non-orthologs, non-syntenic) 84 245 457 1436 29 136
[85]Open in a new tab
Previous studies using Pearson correlation coefficient (PCC) to compare
P. falciparum to P. vivax had indicated that PCC between 1 and 0.5
represent strongly correlated pattern, 0.5 to − 0.2 represent a
moderate timing shift in gene expression, while PCC < − 0.2 indicate
dramatic changes of gene expression along the IDC ([86]Bozdech et al.,
2008) (Supplementary Fig. 2). Using the same approach, we compared all
orthologous gene pairs with P. falciparum as the reference (Pf-Pk,
Pf-Pc, Pf-Pb and Pf-Py) and showed high correlations between 50 and
65%, intermediate to low correlations for 18% to 23% and
anti-correlation for 18% to 26% ([87]Fig. 1b). P. yoelii appears most
divergent from P. falciparum with 725 (23.1%) and 815 (25.9%) gene
pairs with low and anti-correlations, respectively ([88]Fig. 1b). The
overall distribution of highly correlated orthologous genes (PCC
1–0.50) showed only 474 genes conserved among the six species ([89]Fig.
1c). Even when using any group of five out of the six species the total
number of conserved genes would increase by only 104 to 341 genes
([90]Fig. 1c). This suggests that only a relatively small subset of
genes remained transcriptionally conserved throughout Plasmodium
evolution while the majority underwent (some level of) transcriptional
diversification. The results also indicate a relatively limited
constraint on transcriptional timing across the syntenic orthologs
providing a considerable flexibility for species-specific adaptation
among Plasmodium species.
3.2. Pattern of evolution in Plasmodium transcriptional regulation
To compare the temporal character of expression between orthologous
genes, we define the time-shift (delta phase, ∆ Ph) relationship
between the ortholog pairs derived from their IDC gene expression
profiles. ∆ Ph between the species' orthologs for each gene expression
profile was first projected onto a polar coordinate system with timing
of gene expression being depicted by values ranging from 0 to 2π
(Supplementary Fig. 3). Phase adjustment, Ph was performed by
offsetting the peak of gene expression timing to the peak gene
expression time point at the early ring-stage of development
([91]Bozdech et al., 2008) (Supplementary Table 2). This method takes
advantage of a uniform periodic character of the gene expression
profiles that allows much more efficient and representative
normalization of gene expression profiles compared to PCC-based
similarity or Euclidean distance tests generally used to study other
organisms ([92]Pereira et al., 2009, [93]Glazko and Mushegian, 2010).
In total, we established ∆ Ph for the 2312 syntenic ortholog groups
with IDC expression datasets for all six Plasmodium species. K-means
clustering for the ∆ Ph values revealed eight distinct clusters that
represent the overall evolutionary pattern of transcriptional
regulation in Plasmodium ([94]Fig. 2a). The eight clusters consist of
gene groups that are (i.) fully conserved in all six species; (ii.)
show no similarity in gene expression in any species pair; (iii.) genes
with diverse expression in one of the species while similar in the
other five. In particular, (i) there are 818 genes whose
transcriptional timing have been maintained throughout evolution
([95]Fig. 2a), representing 35.4% of the 2312 syntenic orthologs. This
transcriptionally conservation suggests a crucial role of these genes
in Plasmodium biology. On the other hand, (ii) there are 309 genes that
are completely devoid of any transcriptional conservation among the
Plasmodium species. Although the role(s) of these genes is generally
unknown, this large diversity suggests their involvement in highly
dynamic evolutionary processes in Plasmodium. Finally, (iii) there are
six clusters ranging from 136 to 357 genes, in which one Plasmodium
species shows a diversion from the other five ([96]Fig. 2a). To further
clarify these findings we calculated the overall transcriptional
divergence (Ds score) among orthologs as the average ∆ Ph between each
species gene to the medoid species gene ([97]Fig. 2b). When including
all 6 species, about 50% of the ortholog groups have a Ds6 value of
< 0.75 (conserved transcription). This can be significantly increased
to approximately 80% if one (Ds5) or two (Ds4) outlier gene(s) with the
most divergent Ph are removed ([98]Fig. 2b, Supplementary Table 2).
This supports the model that the majority of transcriptional diversion
occurred in individual single species diverting gene expression from a
putative ancestral transcriptional profile. Finally, the ∆ Ph distance
measure separates the Plasmodium species into two distinct clades that
delineates precisely along the mammalian host species ([99]Fig. 2c).
This suggests that transcriptome-based phylogeny is shaped by the
adaptation of the different Plasmodium species in their respective
mammalian host erythrocytes; the human P. falciparum and P. vivax, the
rhesus macaque P. knowlesi and the rodent Plasmodium species.
Fig. 2.
[100]Fig. 2
[101]Open in a new tab
Delta phase (ΔPh) and Ds as the measurement of transcriptional
divergence.
(a) ΔPh is calculated based on the absolute difference of Ph or peak in
gene expression timing between any two orthologs in two different
species. Heatmap shows the syntenic orthologs with K-means clustering
(100 runs) of ΔPh of species pair P. vivax; Pv, P. knowlesi; Pk, P.
chabaudi; Pc, P. berghei; Pb and P. yoelii; Py with reference to P.
falciparum; Pf. Yellow represents highly divergent genes with large
delta phase, while black represents highly conserved genes with small
delta phase compare to P. falciparum. (b) Ds score is the average ΔPh
between each orthologs to the medoid ortholog measuring the
transcription divergence of a gene across multiple species (see
method). Overall frequency distribution and cumulative proportion of Ds
measured from 2312 syntenic orthologous genes present in all six
Plasmodium species (Ds6), any five Plasmodium species (Ds5) or any four
Plasmodium species (Ds4). (c) Hierarchical clustering of the six
Plasmodium species based on the IDC phaseogram using Wards algorithm of
clustering and dissimilarity matrix defined by ∆ Ph (see methods).
Numbers adjacent to the branch points are percentage of approximately
unbiased (AU), P-value (in red) and bootstrap probability (BP) (in
green).
3.3. Sequence and transcription-based evolution in Plasmodium species
In Plasmodium, diversity of genomic sequence is considered to be a
major factor of speciation, in particular involving species-specific
gene families encoding of factors of host-parasite interaction
([102]Frech and Chen, 2011). In the next step we wished to evaluate the
evolutionary role of transcription in the context of genome sequence
diversion. For that, we first estimated the number of non-synonymous
substitutions per non-synonymous site (K[a]) and the number of
synonymous substitutions per synonymous site (K[s]) for the syntenic
orthologous gene groups for each species pair ([103]Fig. 3a). The
distribution of K[a]/K[s] ratio reflects the level of selection
pressure on protein-coding genes throughout the evolution of Plasmodium
parasites species ([104]Fig. 3a), with the rodent malaria parasites
showing a greater difference in the change of protein-coding genes
within the species group. One of the speculated factors of the rodent
malaria parasites evolution is the geographical aspect of adaptation to
their hosts that subsequently led to significant changes in the coding
regions within the rodent species compared to their primate
counterparts. Accordingly, the overall transcriptome in the rodent
Plasmodium species exhibits a greater divergence compared to the
primate-infecting species ([105]Fig. 3b). This suggests broader
diversification of the rodent-infecting species driven by both sequence
diversity and transcriptional divergence. We next assembled a
dendrogram based on non-synonymous K[a] ratio, which reflects the
phylogenetic relationships among the Plasmodium species based on the
sequence homology. Here, P. vivax and P. knowlesi are more closely
related than P. falciparum and P. berghei, which forms a separate
cluster together with P. yoelii and slightly more distant P. chabaudi
([106]Fig. 3c). This is consistent with the previously constructed
phylogenetic topology based on partial mitochondrial genomes
([107]Escalante et al., 1998, [108]Carlton et al., 2008b) as well as
genes encoding surface antigens ([109]Weedall et al., 2008). The
sequence-based phylogenetic tree(s) are different from the
transcriptional differences that is reflective of its mammalian host
lineage (as mentioned above) ([110]Fig. 2c). The sequence-based
relationships seem to be driven by other (than host) factors such as AT
content of the genome (with P. falciparum having a much higher AT
content than P. knowlesi and P. vivax) and possibly reticulocyte
preferences ([111]Craig et al., 2012).