Abstract

   Pathogen genomic epidemiology has the potential to provide a deep
   understanding of population dynamics, facilitating strategic planning
   of interventions, monitoring their impact, and enabling timely
   responses, thereby supporting efforts to control and eliminate
   parasitic tropical diseases. Plasmodium vivax, responsible for most
   malaria cases outside Africa, shows high genetic diversity at the
   population level, driven by factors like sub‐patent infections, a
   hidden reservoir of hypnozoites, and early transmission to mosquitoes.
   While Latin America has made significant progress in controlling
   Plasmodium falciparum, it faces challenges with residual P. vivax. To
   characterize genetic diversity and population structure and dynamics,
   we have analyzed the largest collection of P. vivax genomes to date,
   including 1474 high‐quality genomes from 31 countries across Asia,
   Africa, Oceania, and America. While P. vivax shows high genetic
   diversity globally, Latin American isolates form a distinctive
   population, which is further divided into sub‐populations and
   occasional clonal pockets. Genetic diversity within the continent was
   associated with the intensity of transmission. Population
   differentiation exists between Central America and the North Coast of
   South America, vs. the Amazon Basin, with significant gene flow within
   the Amazon Basin, but limited connectivity between the Northwest Coast
   and the Amazon Basin. Shared genomic regions in these parasite
   populations indicate adaptive evolution, particularly in genes related
   to DNA replication, RNA processing, invasion, and motility – crucial
   for the parasite's survival in diverse environments. Understanding
   these population‐level adaptations is crucial for effective control
   efforts, offering insights into potential mechanisms behind drug
   resistance, immune evasion, and transmission dynamics.

   Keywords: genomic epidemiology, natural selection, parasitology,
   phylogeography, Plasmodium vivax, population dynamics
     __________________________________________________________________

   Studying 1474 high‐quality genomes of P. vivax, the main cause of
   malaria outside Africa, reveals high global genetic diversity across 31
   countries. Latin American P. vivax isolates form a unique population
   with sub‐populations and genetic changes linked to regional
   adaptation of the parasites to their hosts and to different
   environmental challenges. Understanding these population‐level
   adaptations is crucial
   for effective control efforts, providing insights into potential
   mechanisms behind drug resistance, immune evasion, and transmission
   dynamics in the fight against parasitic tropical diseases.

1. INTRODUCTION

   In an era characterized by rapid environmental changes, urbanization,
   and increasing human‐animal interactions, the dynamics of infectious
   diseases are evolving at an unprecedented pace. Large‐scale programs
   are dedicated to controlling or eliminating infectious diseases with
   the greatest global health impact, with many of these efforts focused
   on neglected tropical diseases (NTDs). While NTDs encompass fungal,
   viral, and bacterial infections, the majority are caused by parasites,
   particularly protozoa and helminths. Vector‐borne parasitic diseases
   such as malaria, trypanosomiasis, leishmaniasis, and filariasis cause
   the greatest incidence and mortality globally (Cholewiński
   et al., [40]2015; GBD 2019 Child and Adolescent Communicable Disease
   Collaborators, [41]2023; Pearce & Tarleton, [42]2014).

   Effective control of NTDs relies on the ability to monitor changes in
   pathogen populations, ensuring that interventions stay on track toward
   elimination goals and enabling targeted resource allocation. However,
   conventional monitoring techniques face challenges in many
   disease‐endemic countries, where diagnostic tools are often limited.
   This task becomes increasingly difficult as disease prevalence
   decreases. Genomic epidemiology can provide a deep
   understanding of parasite population dynamics, enabling strategic
   planning of control interventions, monitoring of their effects, and
   raising of alerts if necessary (Kwiatkowski, [43]2015), and can hence
   support disease eradication efforts by providing actionable knowledge (Cotton
   et al., [44]2018; Gardy & Loman, [45]2018; Grad & Lipsitch, [46]2014;
   World Health Organization, [47]2022a).

   Genetic data are most extensively used for diseases caused by
   prokaryotes and viruses (Gardy & Loman, [48]2018). Phylodynamic tools
   used in viral and bacterial genomics capture both epidemiological
   changes and evolutionary history, due to the high mutation rates in
   these pathogens and measurable genetic changes within the time frame of
   an outbreak or epidemic (Drummond et al., [49]2003; Duchêne
   et al., [50]2016; Grenfell et al., [51]2004). However, in pathogens
   with a lower mutation rate and frequent recombination, such as
   eukaryotic parasites, inferring transmission events is more challenging
   (Archie et al., [52]2009; Prugnolle & De Meeûs, [53]2008). The
   application of genomic epidemiology for these parasitic diseases has
   lagged, hindered by the complexity of the parasite's life cycle and the
   greater size of its genome. Genetic diversity is influenced by various
   factors such as the parasite's life history, population dynamics, and
   recent changes in population size. It is crucial to have a comprehensive
   understanding of pathogen populations and an accurate assessment of
   their population structure over time to accurately evaluate the
   effectiveness of control interventions (Cotton et al., [54]2018). This
   information allows for a better understanding of inbreeding patterns
   and gene flow that can inform the development of improved strategies
   for controlling current populations.

   While population genetics of several parasite species have been
   analyzed using microsatellite regions, the rapid innovation and
   decreasing cost of whole‐genome sequencing make it the ideal tool,
   since genome‐wide data have more resolution and are more comparable
   between populations and pathogens, eliminating the need for validated
   and standardized marker panels. For many key parasitic diseases,
   essential genomic resources like annotated reference genomes are
   already available. Genome‐wide data can provide insights into the
   sudden emergence and spread of new pathogen genotypes, reveal recent
   strong selection on certain genome regions, and, when signs of a
   significant bottleneck are detected, show how populations evolve in
   response to treatment and control interventions. An example is the
   identification
   of emerging drug resistance in the malaria parasite Plasmodium
   falciparum (Miotto et al., [55]2015).

   Malaria, caused by Plasmodium parasites, contributes to a very high
   disease burden with an estimated 247 million malaria cases in 84
   malaria‐endemic countries (World Health Organization, [56]2022b).
   However, in several countries across the world where control efforts
   have reduced overall malaria cases, there has been an increase in the
   proportion of Plasmodium vivax (Price et al., [57]2020). Moreover,
   substantial reductions in P. vivax prevalence over 5–10 years in
   several locations have not consistently resulted in changes in
   population structure (Feachem et al., [58]2010; Kattenberg
   et al., [59]2020; Neafsey et al., [60]2012; Waltmann et al., [61]2018).
   P. vivax accounts for 18.0% to 71.5% of malaria cases outside Africa,
   with the highest proportion in the Americas, and this region
   contributes approximately 0.2% of global malaria cases (World Health
   Organization, [62]2022b). Venezuela, Colombia, Brazil, and Peru are the
   top four countries contributing the highest number of cases (79%) in
   the region (World Health Organization, [63]2022b). In contrast to other
   co‐endemic regions of the world, P. falciparum is less common except in
   specific regions like Colombia's Pacific coast (Rodríguez
   et al., [64]2011). Additionally, Plasmodium malariae infections are
   under‐detected in the region, despite evidence of their presence, and
   zoonotic transmission of Plasmodium brasilianum and Plasmodium simium
   between non‐human primates and humans is a concern (Recht
   et al., [65]2017). Many countries in Latin America have made strong
   progress in malaria control, reducing the malaria burden from 1.5 to
   0.6 million cases between 2000 and 2021 (World Health
   Organization, [66]2022b). However, high transmission areas remain
   predominantly concentrated in the Amazon rainforest regions,
   disproportionally affecting indigenous and remote communities. In 2021,
   Venezuela, Colombia, Brazil, and Peru were in the top four countries
   contributing the most P. vivax cases (79%) in the region (World Health
   Organization, [67]2022b).

   Genomic diversity in malaria parasites is generated through a
   combination of de novo mutations during asexual replication and sexual
   recombination within the mosquito vector. Plasmodium parasites have a
   high recombination rate, and frequent infections with multiple
   genetically distinct clones, especially in the case of P. vivax (Nkhoma
   et al., [68]2020; Siegel & Rayner, [69]2020). In addition, parasite
   genomes are polymorphic, with a diversity of phenotypic characteristics
   that impact disease severity (Neafsey et al., [70]2021). P. vivax often
   displays a higher genetic diversity than P. falciparum, due to key
   biological factors including frequent subpatent (i.e., detectable by
   molecular methods but not by field diagnostics) and asymptomatic
   infections, along with a hidden reservoir of hypnozoites leading to a
   larger number of complex infections (Olliaro et al., [71]2016;
   Sattabongkot et al., [72]2004). The asymptomatic infections and
   hypnozoites contribute to this parasite's resilience and facilitate its
   spread and gene flow across large regions, jeopardizing the
   effectiveness of local and targeted elimination strategies (Angrisano &
   Robinson, [73]2022; Auburn et al., [74]2021; Ferreira
   et al., [75]2022). Other factors contributing to the high genetic
   diversity of P. vivax are its longer history of association with
   humans, larger effective population size, and fewer population
   bottlenecks (Cornejo & Escalante, [76]2006; Hupalo et al., [77]2016;
   Neafsey et al., [78]2012; Noviyanti et al., [79]2015; Rougeron
   et al., [80]2022). Finally, sexual stages of P. vivax parasites appear
   early in the infection, facilitating effective transmission to
   mosquitoes before treatment, even at low‐level parasitemia, making the
   disease more difficult to eliminate (Bousema & Drakeley, [81]2011;
   Sattabongkot et al., [82]2004).

   In Latin America, the analysis of mitochondrial genomes has previously
   shown that the combined effects of geographical population structure
   and the relatively low incidence of P. vivax malaria have resulted in
   patterns of low local but high regional genetic diversity (Taylor
   et al., [83]2013). In this study, we take a population genomic approach
   to investigate the spatiotemporal dynamics of P. vivax in this
   region, using genome‐wide data identified through literature and
   supplemented with data from our own studies (n = 163). Using
   high‐resolution genome‐wide SNP variants (1,477,945 SNPs in the core
   genome) of these P. vivax isolates, we first compare the Latin American
   P. vivax genomes (n = 399) to P. vivax genomes from around the world
   (n = 1075). Next, we investigate the population structure, admixture,
   relatedness and gene flow, and signatures of positive selection to study
   local adaptations of the parasites. With this study, we investigate if
   and how the declining and more heterogeneous transmission is impacting
   P. vivax population structure in this relatively recently expanded
   population and discuss the factors driving diversity and population
   structure in this ecologically diverse region. Not only is this
   informative for malaria control and elimination strategies, but it can
   also identify targets and key pathways important for P. vivax survival.

2. MATERIALS AND METHODS

2.1. Sequencing data

   Based on an exhaustive literature search on PubMed, publications
   describing new P. vivax genomes were identified until October 2022, and
   the corresponding sequencing data were downloaded from the Sequence
   Read Archive (SRA) of NCBI (Adam et al., [84]2022; Auburn
   et al., [85]2016, [86]2019; Benavente et al., [87]2021; Brashear
   et al., [88]2020; Buyon et al., [89]2020; Chan et al., [90]2012; Chen
   et al., [91]2017; Cowell et al., [92]2018; Daron et al., [93]2021; De
   Meulenaere et al., [94]2023; de Oliveira et al., [95]2017, [96]2020;
   Delgado‐Ratto et al., [97]2016; Dharia et al., [98]2010; Flannery
   et al., [99]2015; Hester et al., [100]2013; Hupalo et al., [101]2016;
   Kattenberg et al., [102]2022; Neafsey et al., [103]2012; Pearson
   et al., [104]2016; Popovici et al., [105]2018). Additionally, a set of
   new P. vivax genome sequencing data produced in the context of this
   study was added to the list of genomes (n = 163, originating from
   Peru, Brazil, Vietnam, Eritrea, Ethiopia, Burundi, Mauritania, Somalia,
   and Sudan); these have been described in more detail elsewhere (De
   Meulenaere et al., [106]2022, [107]2023; Kattenberg et al., [108]2022).
   Briefly, DNA was extracted from leukocyte‐depleted red blood cells,
   whole blood or dried blood using the QIAamp DNA Blood Mini Kit (Qiagen,
   Germany) following the manufacturer's protocol as previously reported (De
   Meulenaere et al., [109]2022, [110]2023; Kattenberg et al., [111]2022).
   Parasite species was identified and quantified by a qPCR targeting Pv
   mtCOX1 (Gruenberg et al., [112]2018) using a standard curve of light
   microscopy quantified control isolates. Sequencing libraries were
   generated using Nextera XT DNA Sample Prep Kit (Illumina), or using
   commercial sequencing services as previously described (De Meulenaere
   et al., [113]2022, [114]2023; Kattenberg et al., [115]2022). Details of
   all P. vivax genomes used in this study can be found in Table [116]S1,
   including metadata and accession numbers. The downloaded genomes
   contained monkey‐adapted P. vivax strains that were removed from
   population genetic analyses.

2.2. Ethics

   Secondary use of all samples for sequencing and analysis of P. vivax
   isolates was approved through the Institutional Review Board of the
   Institute of Tropical Medicine Antwerp (protocols 1417/20 and 1345/19),
   the ethics committee at the University Hospital of Antwerp (protocol
   B3002020000016 and B300201523588), and Universidad Peruana Cayetano
   Heredia (UPCH; Lima, Peru) (protocol 101898).

2.3. Variant detection

   Sequencing reads were first aligned using BWA version 0.7.17 to the
   human reference genome obtained from the Genome Reference Consortium
   Human Build 38 patch release 13 (GRCh38.p13). Reads not mapped in
   proper pairs to the human reference genome were extracted using
   samtools version 1.10 (flag -F 2), and subsequently aligned to the P.
   vivax PvP01 reference genome from PlasmoDB (version 46) using BWA.
   Duplicate reads were removed with Picard's MarkDuplicates (version
   2.22.4). Variant detection was performed using the Genome Analysis
   ToolKit (GATK) version 4.1.4.1, first running the HaplotypeCaller
   command in GVCF mode for individual chromosomes. GVCF
   files were merged using GenomicsDBImport, followed by genotyping
   using GenotypeGVCFs, resulting in one vcf file per chromosome. The vcf
   files were filtered according to GATK best practices: (1) SNPs were
   filtered out when having a QualByDepth (QD) value lower than 2, a
   variant quality score (QUAL) lower than 30, a StrandOddsRatio (SOR)
   higher than 3, a FisherStrand (FS) value higher than 60, or a Root Mean
   Square mapping Quality (MQ) lower than 40; (2) indels were filtered out
   when having a QD value lower than 2, a QUAL lower than 30, an FS value
   higher than 200, or a ReadPosRankSum value lower than −20. Finally, for
   most downstream analyses the core genome (14 chromosomes, excluding
   subtelomeric regions, low‐complexity domains, and the apicoplast and
   mitochondrial sequences) (Pearson et al., [117]2016) was selected using
   the BCFtools query command, and samples with less than 50% of the
   genome covered at least 5‐fold were excluded from analysis.
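
   As an illustration of the hard filters listed above, the thresholds can
   be written as a small predicate. This is a minimal Python sketch, not
   the GATK invocation used in the study: the class and function names are
   hypothetical, the attribute names simply mirror the GATK annotation
   keys, and it assumes that failing any single criterion removes the
   site.

```python
from dataclasses import dataclass


@dataclass
class VariantAnnotations:
    """Site-level annotations; names mirror GATK keys (illustrative only)."""
    kind: str                       # "SNP" or "INDEL"
    qual: float                     # variant quality score (QUAL)
    qd: float                       # QualByDepth
    fs: float                       # FisherStrand
    sor: float = 0.0                # StrandOddsRatio (checked for SNPs only)
    mq: float = 60.0                # RMS mapping quality (checked for SNPs only)
    read_pos_rank_sum: float = 0.0  # ReadPosRankSum (checked for indels only)


def fails_hard_filter(v: VariantAnnotations) -> bool:
    """Return True if the site would be filtered out.

    Assumption: the criteria are alternatives, so one failing value is enough.
    """
    # Thresholds shared by SNPs and indels.
    if v.qd < 2 or v.qual < 30:
        return True
    if v.kind == "SNP":
        return v.sor > 3 or v.fs > 60 or v.mq < 40
    if v.kind == "INDEL":
        return v.fs > 200 or v.read_pos_rank_sum < -20
    raise ValueError(f"unknown variant kind: {v.kind!r}")
```

   For example, an indel with FS = 100 passes this check, whereas a SNP
   with the same FS value is removed because the SNP threshold is 60.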

2.4. Population structure analysis

   Principal Component Analysis (PCA) was performed using PLINK software
   version 2.0 (Chang et al., [118]2015). First, only biallelic SNPs with
   MAF > 0.005 were selected, and linkage disequilibrium (LD) pruning was
   performed on the vcf file encompassing all variants in the core genome
   using PLINK, followed by PCA analysis using the first 20 principal
   components. PCA results were plotted in R using the ggplot2 library.
   Starting from the LD pruned dataset, admixture analysis was performed
   with the ADMIXTURE software version 1.3.0 (Alexander
   et al., [119]2009). The optimal number of populations was determined by
   running ADMIXTURE for a range of K‐values (i.e., number of populations)
   from 2 to 50. This involved a 10‐fold cross‐validation, and selection
   of the K‐value for the number of populations with the lowest
   cross‐validation error. For phylogenetic analysis, the vcf file was
   first converted to PHYLIP format using the vcf2phylip.py script
   (Ortiz, [120]2019). Phylogenetic trees were then constructed using RAxML,
   with P. knowlesi defined as outgroup, using the GTR + G evolutionary
   model and 100 bootstrap replicates (Kozlov
   et al., [121]2019). The phylogenetic tree was visualized using the
   ggtree library in R. Nucleotide diversity was determined by sliding
   across the genome in 500‐bp windows over all LD‐pruned SNPs of the core
   genome using VCFtools (Danecek et al., [122]2011). The multiplicity of
   infection was assessed by estimating Wright's inbreeding
   coefficient (F_WS) as a measure of the within‐host parasite
   diversity using the getFws command as implemented in the moimix package
   in R (Lee & Bahlo, [123]2016). Infections with F_WS ≥ 0.95 were
   considered to contain clonal (single strain) parasites, while samples
   with F_WS < 0.95, indicating within‐host diversity, were considered
   to contain multiple genetically distinct parasite strains.
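
   The clonality call described above can be sketched as follows. This is
   a simplified Python illustration and not the moimix implementation: it
   assumes the within‐host statistic takes the form 1 − H_w/H_s, where
   H_w is the heterozygosity within one infection (from read counts) and
   H_s is the heterozygosity of the whole population, and it pools all
   SNPs instead of binning them by allele frequency. All function and
   variable names are hypothetical; only the 0.95 threshold comes from the
   text.

```python
def heterozygosity(alt_freq: float) -> float:
    """Expected heterozygosity at a biallelic site with the given allele frequency."""
    return 2.0 * alt_freq * (1.0 - alt_freq)


def within_host_index(ref_counts, alt_counts, population_alt_freqs) -> float:
    """Simplified within-host index: 1 - sum(H_w) / sum(H_s) over covered SNPs."""
    h_within = 0.0
    h_population = 0.0
    for ref, alt, pop_freq in zip(ref_counts, alt_counts, population_alt_freqs):
        depth = ref + alt
        if depth == 0:
            continue  # no reads at this SNP in this sample
        h_within += heterozygosity(alt / depth)
        h_population += heterozygosity(pop_freq)
    if h_population == 0:
        raise ValueError("no polymorphic, covered SNPs to compare against")
    return 1.0 - h_within / h_population


def classify_infection(index: float, threshold: float = 0.95) -> str:
    """Apply the threshold from the text: >= 0.95 clonal, otherwise multiple strains."""
    return "clonal" if index >= threshold else "multiple strains"
```

   A sample whose reads support a single allele at every SNP scores 1.0
   and is called clonal; a sample with balanced read support for both
   alleles at sites that are common in the population scores close to 0.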

2.5. IBD relatedness and selection analysis

   Shared ancestry and relatedness between isolates were estimated using
   identity‐by‐descent (IBD). PED and MAP files were created using
   VCFtools from an LD‐pruned vcf dataset of the full genome (core +
   (sub)telomeric and low complexity regions of the 14 chromosomes)
   filtered on a MAF of 0.001 based on the frequency in all included 1474
   genomes. IBD‐sharing between pairs of samples, using all 399 samples
   from LAM, was calculated using the isoRelate package in R, which can
   analyze IBD in haploid recombining microorganisms in the presence of
   multiclonal infections (Danecek et al., [124]2011; Henden
   et al., [125]2018). Genetic distance was calculated using an estimated
   mean map unit size from Plasmodium chabaudi of 13.7 kb/centimorgan (cM)
   (Martinelli et al., [126]2005; Rovira‐Vallbona et al., [127]2021). We
   set the thresholds of IBD at the minimum number of SNPs (n = 20) and
   length of IBD segments (5000 bp) reported to reduce false‐positive
   calls using an error of 0.001. IBD has been shown to be superior to
   probabilistic models such as STRUCTURE for understanding the
   relatedness and interconnectivity of parasite populations (Henden
   et al., [128]2018; Taylor et al., [129]2019; Wesolowski
   et al., [130]2018). Networks of IBD‐sharing (>10% of the genome shared)
   between individuals were created using the igraph package in R, and the
   cumulative level of IBD‐sharing between isolates in countries in the
   network was plotted as a connection map with Scimago graphica
   (Hassan‐Montero et al., [131]2022) and used as a measure of
   connectivity between countries.
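
   The thresholds in this paragraph can be made concrete with a short
   sketch. The Python code below is not the isoRelate or igraph workflow
   used in the study; it only assumes that IBD segments have already been
   detected and are available as records with start and end positions and
   a SNP count (a hypothetical layout), that segments for a pair do not
   overlap, and that the stated minima are inclusive. It converts physical
   to genetic distance at 13.7 kb/cM, drops short segments, and lists
   pairs sharing more than 10% of the genome.

```python
from dataclasses import dataclass

KB_PER_CM = 13.7         # map unit size taken from the text
MIN_SNPS = 20            # minimum SNPs per IBD segment
MIN_LENGTH_BP = 5000     # minimum IBD segment length
RELATED_FRACTION = 0.10  # network edges require >10% of the genome IBD


@dataclass(frozen=True)
class IbdSegment:
    """One detected IBD segment between two samples (hypothetical record)."""
    sample_a: str
    sample_b: str
    start_bp: int
    end_bp: int
    n_snps: int


def bp_to_cm(position_bp: float) -> float:
    """Convert a physical position to genetic distance using a uniform rate."""
    return position_bp / (KB_PER_CM * 1000)


def keep_segment(seg: IbdSegment) -> bool:
    """Apply the SNP-count and length minima (assumed inclusive)."""
    return seg.n_snps >= MIN_SNPS and (seg.end_bp - seg.start_bp) >= MIN_LENGTH_BP


def fraction_shared(segments, genome_length_bp: int) -> dict:
    """Sum retained segment lengths per unordered pair, as a genome fraction."""
    totals = {}
    for seg in segments:
        if not keep_segment(seg):
            continue
        pair = tuple(sorted((seg.sample_a, seg.sample_b)))
        totals[pair] = totals.get(pair, 0) + (seg.end_bp - seg.start_bp)
    return {pair: length / genome_length_bp for pair, length in totals.items()}


def related_pairs(fractions: dict) -> list:
    """Pairs that would be connected in the IBD-sharing network."""
    return sorted(pair for pair, frac in fractions.items() if frac > RELATED_FRACTION)
```

   The resulting list of pairs is what a graph library would take as its
   edge list; summing the shared fractions by country of origin gives a
   cumulative measure comparable to the one plotted on the connection map.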

   For the samples from Latin America, the proportion of pairs of isolates
   sharing IBD and the significance of IBD‐sharing were calculated
   using the isoRelate package in R, for all samples together and
   subdivided by population (based on country), as a measure of positive
   selection.

2.6. Pathway enrichment analysis – GO terms

   Gene Ontology (GO) categories were sourced from PlasmoDB release 46,
   with each gene being associated with one or more GO categories. To
   analyze a list of specific genes, a gene set enrichment analysis was
   conducted utilizing the hypergeometric distribution, which assesses the
   statistical significance of the overlap between a gene list and the
   assigned GO categories based on their respective counts.
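
   A hypergeometric enrichment test of this kind can be sketched in a few
   lines of Python. The text does not state the exact form of the test, so
   this example assumes the usual one‐sided over‐representation test (the
   probability of seeing at least the observed overlap by chance) and
   applies no multiple‐testing correction. The function and parameter
   names are hypothetical.

```python
from math import comb


def enrichment_p_value(n_genes_total: int, n_genes_in_category: int,
                       n_genes_in_list: int, n_overlap: int) -> float:
    """Upper-tail hypergeometric probability P(X >= n_overlap).

    X is the number of category genes found when n_genes_in_list genes are
    drawn without replacement from n_genes_total annotated genes.
    """
    max_overlap = min(n_genes_in_category, n_genes_in_list)
    n_outside = n_genes_total - n_genes_in_category
    total_draws = comb(n_genes_total, n_genes_in_list)
    # math.comb returns 0 when k > n, so impossible terms drop out on their own.
    tail = sum(
        comb(n_genes_in_category, k) * comb(n_outside, n_genes_in_list - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total_draws
```

   With 10 annotated genes, 3 of them in a category, and a list of 2 genes
   that both fall in that category, the function returns 3/45, the chance
   of drawing two category genes in two draws.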

3. RESULTS

3.1. P. vivax genomic data summary

   Based on a literature search including manuscripts published before
   October 2022, we identified 1311 high‐quality publicly shared P. vivax
   genomes. Raw sequencing data were downloaded and all genomes were
   combined, including in‐house sequenced P. vivax genomes (n = 163)
   originating from Peru, Brazil, Vietnam, and imported cases in
   Belgium from travelers and migrants (De Meulenaere et al., [132]2022,
   [133]2023; Kattenberg et al., [134]2022).

   A total of 1474 high‐quality P. vivax genomes (Table [135]S1), coming
   from 36 countries in Asia (n = 878), Americas (n = 399), and Africa
   (n = 197), and collected between 2000 and 2019, were retained after
   removing samples with less than 50% of the genome covered at least
   5‐fold (Figure [136]1). The median sequencing coverage over the PvP01
   reference genome including only retained isolates was 26‐fold (range
   1–763). After alignment and variant calling, a total of 2,435,842 high
   quality genetic variants were identified (1,983,976 SNPs and 451,866
   Indels), with a total of 1,836,935 variants in the core genome region
   (1,477,945 SNPs and 358,990 indels).

FIGURE 1.

   Origin of Plasmodium vivax genomes per country included in the
   analysis. The size of each dot is proportional to the number of samples
   in the genome dataset, and the colors indicate the country. Dots are
   plotted at the centre of the country (as defined by the ggmap package
   in R).

3.2. Global population structure

   Plasmodium vivax genomes were grouped in regional populations
   (following classifications from Adam et al., [138]2022): Africa (AFR,
   including isolates from all countries in sub‐Saharan Africa, and
   returning travelers with history of travel to these countries), Eastern
   South East Asia (ESEA, including isolates from Cambodia, Laos,
   Thailand, Vietnam, and the China‐Myanmar border), Latin America (LAM,
   which includes isolates from Mexico, Central and South America), Middle
   South East Asia (MSEA, including isolates from Malaysia and The
   Philippines), Oceania (OCE, including isolates from the island of New
   Guinea [i.e., Papua New Guinea and part of Indonesia]), Western Asia
   (WAS, which includes Afghanistan, Bangladesh, India, Iran, Pakistan,
   and Sri Lanka). To investigate genetic clustering of P. vivax
   populations in these regions we used the biallelic SNPs as input for
   PCA and phylogenetic analysis. Both analyses (PCA + tree) reveal the
   presence of three major clusters consistent with their geographical
   origin (Figure [139]2a,b). Isolates from ESEA + MSEA form a
   differentiated cluster in the vicinity of isolates from OCE. Isolates
   from AFR cluster close to isolates from WAS; however, these two regions
   are clearly separated in the fourth principal component of the PCA
   (Figure [140]S1) and form separate clades in the tree (Figure [141]2b).
   Isolates from LAM form a distinct cluster and clade in the PCA and
   tree, respectively. Together with the nucleotide diversity
   (Figure [142]S2), these results indicate a high genetic diversity within
   the global P. vivax population as a whole, with a structuring of
   populations by geographical region.

FIGURE 2.

   Global Plasmodium vivax phylogeny, admixture, and population structure.
   (a) Principal component analysis based on the LD‐pruned biallelic SNPs
   using PLINK2, showing the first two principal components. The samples
   (dots) are colored according to the originating population (here
   region). (b) Phylogenetic tree based on the LD‐pruned biallelic SNPs
   using RAxML, with P. knowlesi defined as outgroup. The phylogenetic
   tree was visualized without the outgroup to improve clarity of the P.
   vivax branches in the figure. (c) Admixture proportions for K = 10
   populations using the ADMIXTURE software; the small bar on top shows
   the region of origin (AFR = Africa, ESEA = Eastern South East Asia,
   LAM = Latin America, MSEA = Middle South East Asia, OCE = Oceania,
   WAS = Western Asia).

   Admixture analysis estimated ten (K = 10) geographically distinct
   ancestral populations (Figure [144]2c). All genomes from AFR, WAS, and
   OCE were predicted to belong predominantly to a single shared ancestry
   within each region, while genomes from the LAM, ESEA, and MSEA regions
   each belong to distinct subpopulations (i.e., ancestral populations
   within a region, Figure [145]2c). Admixture (predicted ancestry from more
   than one cluster) is mostly observed between subpopulations within a
   region (e.g., in LAM and ESEA), and rarely between regions, with the
   exception of the admixture observed in AFR with WAS.

   In the phylogenetic tree, isolates from WAS form two separate clades,
   with the upper cluster containing isolates from India (Figure [146]2b).
   This separate subpopulation could not be confirmed in the admixture
   analysis that estimated one ancestral cluster in this region
   (Figure [147]2c). Therefore, while Indian isolates might be distinct
   from other isolates in WAS, all P. vivax isolates from this region
   share a common ancestry. The highest amount of admixture between
   isolates is observed between the three subpopulations in LAM (mixed
   ancestry proportions to K7 and K10 and to a lesser extent K4),
   indicating a shared ancestry or gene flow between these subpopulations
   (Figure [148]2c).

3.3. Population structure in Central and South America

   To investigate shared ancestry of P. vivax in Latin America at a finer
   geographic resolution, the population genomic analyses were repeated
   including only isolates from this region (n = 399). Results from both
   the PCA and the phylogenetic tree indicated clustering on a country
   level (Figure [149]S3).

   The high degree of admixture in LAM noted in the global comparison is
   confirmed in this analysis and constitutes, for a large part, admixed
   samples within Brazil and admixture between populations from Brazil and
   Peru (Figure [150]3a). Eleven ancestral clusters (K = 11) within LAM
   were estimated (Figure [151]3a), and these populations are structured
   geographically by country or at specific locations within a country
   (Figure [152]S3). In addition, admixture is observed between isolates
   from Colombia, Mexico, and Panama with mixed ancestry from multiple
   populations across LAM. Country specific ancestral populations are seen
   in Mexico (K7), Panama (K6), Colombia (K5), Brazil (K1 and K9), and
   Peru (K3 and K11). In addition, some populations are seen in multiple
   countries, such as isolates from Mexico and Panama that share ancestry
   with a population predominantly observed in Colombia (K4). While our
   dataset contains isolates sampled at different time periods, and
   populations are seen in multiple years (Figure [153]3b), we observed
   some distinct populations at specific locations, such as the Madre de
   Dios population (K3) in Peru, the K5 population in Tierralta in
   Colombia, and isolates from Manaus in Brazil (K9) (Figure [154]S3).

FIGURE 3.

   Spatio‐temporal population dynamics in Latin America. Admixture
   analysis of Plasmodium vivax samples from LAM, using K = 11
   populations. (a) Bar plot with admixture proportions of each sample for
   each ancestry cluster; the small bar on top shows the country of
   origin for each sample. (b) Each sample is assigned to one ancestry
   cluster based on the highest membership probability to that population
   in the admixture analysis. Pie charts represent the number of samples
   from each cluster in that country and year.

   Temporal analysis (Figure [156]3b) shows that the K10 sub‐population,
   which is predominant in Brazil across most years, is later also observed
   in other countries in the Amazon Basin (in 2018 in Guyana, and in 2019 in
   Peru in a region relatively close to the border with Brazil), and in
   two isolates in Panama from 2007 (Figure [157]S3). Two additional
   populations are seen in Brazil that are predominant in Peru (K2 and
   K8).

3.4. Gene flow across LAM

   The connectivity between P. vivax populations in Latin American
   countries was assessed by measuring to what extent the parasite
   populations are genetically related. Pairwise IBD between all samples
   within and between countries was used as a measure of connectivity and
   parasite gene flow. From the 93,528 possible pairwise combinations of
   the 399 isolates from LAM, 1812 pairs of isolates (1.9%) had
   moderate‐to‐high relatedness (sharing 10–100% of their genome IBD).
   Among those, 638 pairs had high relatedness (more than 50% IBD, i.e.,
   sibling or clonal pairs).

   As expected, the majority of the related pairs (sharing 10–100% of
   their genome IBD) were observed within country (Figure [158]4 and
   Table [159]1), with observed relatedness between the different
   ancestral populations previously identified in Brazil (K1, K9, and K10)
   and Peru (K2, K8, K11) (Figure [160]S4).

FIGURE 4.

   Plasmodium vivax IBD‐based connectivity between countries in Latin
   America. Connectivity network of inferred IBD between P. vivax samples
   from Latin American countries. Edges connecting countries represent
   cumulative IBD sharing between parasite pairs from those countries that
   share at least 10% of their genomes (numbers of sample pairs are shown
   in Table [162]1). 10% IBD‐sharing means that for these parasites at
   least 10% of their genomes descended from a common ancestor without
   intervening recombination, indicating distant to close relatedness.
   Node colors indicate the country of origin of the P. vivax genomes, and
   nodes were plotted on the map at the known latitude and longitude of
   the collection sites by district or, if unknown, at the respective
   country's capital (for example, in Guyana).

TABLE 1.

   Number of Plasmodium vivax sample pairs with IBD (at ≥10% or ≥50% IBD)
   in pairwise analysis within and between samples from Latin American
   countries.
   Country Samples Nr pairs with >50% IBD Nr pairs with >10% IBD Nr of
   possible pairs % pairs with >50% IBD % pairs with >10% IBD Study
   references