Abstract Pathogen genomic epidemiology has the potential to provide a deep understanding of population dynamics, facilitating strategic planning of interventions, monitoring their impact, and enabling timely responses, and thereby supporting control and elimination efforts of parasitic tropical diseases. Plasmodium vivax, responsible for most malaria cases outside Africa, shows high genetic diversity at the population level, driven by factors like sub‐patent infections, a hidden reservoir of hypnozoites, and early transmission to mosquitoes. While Latin America has made significant progress in controlling Plasmodium falciparum, it faces challenges with residual P. vivax. To characterize genetic diversity and population structure and dynamics, we have analyzed the largest collection of P. vivax genomes to date, including 1474 high‐quality genomes from 31 countries across Asia, Africa, Oceania, and America. While P. vivax shows high genetic diversity globally, Latin American isolates form a distinctive population, which is further divided into sub‐populations and occasional clonal pockets. Genetic diversity within the continent was associated with the intensity of transmission. Population differentiation exists between Central America and the North Coast of South America, vs. the Amazon Basin, with significant gene flow within the Amazon Basin, but limited connectivity between the Northwest Coast and the Amazon Basin. Shared genomic regions in these parasite populations indicate adaptive evolution, particularly in genes related to DNA replication, RNA processing, invasion, and motility – crucial for the parasite's survival in diverse environments. Understanding these population‐level adaptations is crucial for effective control efforts, offering insights into potential mechanisms behind drug resistance, immune evasion, and transmission dynamics. 
Keywords: genomic epidemiology, natural selection, parasitology, phylogeography, Plasmodium vivax, population dynamics

Studying 1474 high‐quality genomes of P. vivax, the main cause of malaria outside Africa, reveals high global genetic diversity across 31 countries. Latin American P. vivax isolates form a unique population with sub‐populations and genetic adaptations linked to regional adaptation of the parasites to their hosts and different environmental challenges. Understanding these population‐level adaptations is crucial for effective control efforts, providing insights into potential mechanisms behind drug resistance, immune evasion, and transmission dynamics in the fight against parasitic tropical diseases.

1. INTRODUCTION

In an era characterized by rapid environmental changes, urbanization, and increasing human‐animal interactions, the dynamics of infectious diseases are evolving at an unprecedented pace. Large‐scale programs are dedicated to controlling or eliminating infectious diseases with the greatest global health impact, with many of these efforts focused on neglected tropical diseases (NTDs). While NTDs encompass fungal, viral, and bacterial infections, the majority are caused by parasites, particularly protozoa and helminths. Vector‐borne parasitic diseases such as malaria, trypanosomiasis, leishmaniasis, and filariasis cause the greatest incidence and mortality globally (Cholewiński et al., 2015; GBD 2019 Child and Adolescent Communicable Disease Collaborators, 2023; Pearce & Tarleton, 2014). Effective control of NTDs relies on the ability to monitor changes in pathogen populations, ensuring that interventions stay on track toward elimination goals and enabling targeted resource allocation. 
However, conventional monitoring techniques face challenges in many disease‐endemic countries, where diagnostic tools are often limited. This task becomes increasingly difficult as disease prevalence decreases. Genomic epidemiology, by contrast, can provide a deep understanding of parasite population dynamics, enabling strategic planning of control interventions, monitoring their effects, and raising alerts if necessary (Kwiatkowski, 2015), and hence support disease eradication efforts by providing actionable knowledge (Cotton et al., 2018; Gardy & Loman, 2018; Grad & Lipsitch, 2014; World Health Organization, 2022a). Genetic data are most extensively used for diseases caused by prokaryotes and viruses (Gardy & Loman, 2018). Phylodynamic tools used in viral and bacterial genomics capture both epidemiological changes and evolutionary history, because these pathogens have high mutation rates and accumulate measurable genetic changes within the time frame of an outbreak or epidemic (Drummond et al., 2003; Duchêne et al., 2016; Grenfell et al., 2004). In pathogens with a lower mutation rate and frequent recombination, such as eukaryotic parasites, inferring transmission events is more challenging (Archie et al., 2009; Prugnolle & De Meeûs, 2008). The application of genomic epidemiology to these parasitic diseases has lagged, hindered by the complexity of the parasites' life cycles and the greater size of their genomes. A parasite's genetic diversity is influenced by various factors such as its life history, population dynamics, and recent changes in population size. A comprehensive understanding of pathogen populations and an accurate assessment of their population structure over time are crucial for evaluating the effectiveness of control interventions (Cotton et al., 2018). 
This information allows for a better understanding of inbreeding patterns and gene flow that can inform the development of improved strategies for controlling current populations. While the population genetics of several parasite species have been analyzed using microsatellite regions, the rapid innovation and decreasing cost of whole‐genome sequencing make it the ideal tool, since genome‐wide data have more resolution and are more comparable between populations and pathogens, eliminating the need for validated and standardized marker panels. For many key parasitic diseases, essential genomic resources like annotated reference genomes are already available. Genome‐wide data can provide insights into the sudden emergence and spread of new pathogen genotypes, reveal recent strong selection on certain genome regions, and show how populations evolve in response to treatment and control interventions, for instance when signs of a significant bottleneck are detected. An example is the identification of emerging drug resistance in the malaria parasite Plasmodium falciparum (Miotto et al., 2015).

Malaria, caused by Plasmodium parasites, contributes to a very high disease burden with an estimated 247 million malaria cases in 84 malaria‐endemic countries (World Health Organization, 2022b). However, in several countries across the world where control efforts have reduced overall malaria cases, there has been an increase in the proportion of Plasmodium vivax (Price et al., 2020). Moreover, substantial reductions in P. vivax prevalence over 5–10 years in several locations have not consistently resulted in changes in population structure (Feachem et al., 2010; Kattenberg et al., 2020; Neafsey et al., 2012; Waltmann et al., 2018). P. vivax accounts for 18.0% to 71.5% of malaria cases outside Africa, with the highest proportion in the Americas, and this region contributes approximately 0.2% of global malaria cases (World Health Organization, 2022b). 
Venezuela, Colombia, Brazil, and Peru are the top four countries contributing the highest number of cases (79%) in the region (World Health Organization, 2022b). In contrast to other co‐endemic regions of the world, P. falciparum is less common except in specific regions like Colombia's Pacific coast (Rodríguez et al., 2011). Additionally, Plasmodium malariae infections are under‐detected in the region, despite evidence of their presence, and zoonotic transmission of Plasmodium brasilianum and Plasmodium simium between non‐human primates and humans is a concern (Recht et al., 2017). Many countries in Latin America have made strong progress in malaria control, reducing the malaria burden from 1.5 to 0.6 million cases between 2000 and 2021 (World Health Organization, 2022b). However, high transmission areas remain predominantly concentrated in the Amazon rainforest regions, disproportionately affecting indigenous and remote communities. In 2021, Venezuela, Colombia, Brazil, and Peru were the top four countries contributing the most P. vivax cases (79%) in the region (World Health Organization, 2022b).

Genomic diversity in malaria parasites is generated through a combination of de novo mutations during asexual replication and sexual recombination within the mosquito vector. Plasmodium parasites have a high recombination rate, and infections with multiple genetically distinct clones are frequent, especially in the case of P. vivax (Nkhoma et al., 2020; Siegel & Rayner, 2020). In addition, parasite genomes are polymorphic, with a diversity of phenotypic characteristics that impact disease severity (Neafsey et al., 2021). P. vivax often displays a higher genetic diversity than P. 
falciparum, due to key biological factors including frequent subpatent (i.e., detectable by molecular methods but not by field diagnostics) and asymptomatic infections, along with a hidden reservoir of hypnozoites leading to a larger number of complex infections (Olliaro et al., 2016; Sattabongkot et al., 2004). The asymptomatic infections and hypnozoites contribute to this parasite's resilience and facilitate its spread and gene flow across large regions, jeopardizing the effectiveness of local and targeted elimination strategies (Angrisano & Robinson, 2022; Auburn et al., 2021; Ferreira et al., 2022). Other factors contributing to the high genetic diversity of P. vivax are its longer history of association with humans, larger effective population size, and fewer population bottlenecks (Cornejo & Escalante, 2006; Hupalo et al., 2016; Neafsey et al., 2012; Noviyanti et al., 2015; Rougeron et al., 2022). Finally, sexual stages of P. vivax parasites appear early in the infection, facilitating effective transmission to mosquitoes before treatment, even at low‐level parasitemia, making the disease more difficult to eliminate (Bousema & Drakeley, 2011; Sattabongkot et al., 2004). In Latin America, the analysis of mitochondrial genomes has previously shown that the combined effects of geographical population structure and the relatively low incidence of P. vivax malaria have resulted in patterns of low local but high regional genetic diversity (Taylor et al., 2013).

In this study, we take a population genomic approach to investigate the spatiotemporal dynamics of P. vivax in this region, using genome‐wide data identified through literature and supplemented with data from our own studies (n = 163). Using high‐resolution genome‐wide SNP variants (1,477,945 SNPs in the core genome) of these P. vivax isolates, we first compare the Latin American P. vivax genomes (n = 399) to P. 
vivax genomes from around the world (n = 1075). Next, we investigate the population structure, admixture, relatedness and gene flow, and signatures of positive selection to study local adaptations of the parasites. With this study, we investigate if and how the declining and more heterogeneous transmission is impacting P. vivax population structure in this relatively recently expanded population and discuss the factors driving diversity and population structure in this ecologically diverse region. Not only is this informative for malaria control and elimination strategies, but it can also identify targets and key pathways important for P. vivax survival.

2. MATERIALS AND METHODS

2.1. Sequencing data

Based on an exhaustive literature search on PubMed, publications describing new P. vivax genomes were identified until October 2022, and the corresponding sequencing data were downloaded from the Sequence Read Archive (SRA) of NCBI (Adam et al., 2022; Auburn et al., 2016, 2019; Benavente et al., 2021; Brashear et al., 2020; Buyon et al., 2020; Chan et al., 2012; Chen et al., 2017; Cowell et al., 2018; Daron et al., 2021; De Meulenaere et al., 2023; de Oliveira et al., 2017, 2020; Delgado‐Ratto et al., 2016; Dharia et al., 2010; Flannery et al., 2015; Hester et al., 2013; Hupalo et al., 2016; Kattenberg et al., 2022; Neafsey et al., 2012; Pearson et al., 2016; Popovici et al., 2018). Additionally, a set of new P. vivax genomes sequenced in the context of this study was added to the list (n = 163, originating from Peru, Brazil, Vietnam, Eritrea, Ethiopia, Burundi, Mauritania, Somalia, and Sudan); these have been described in more detail elsewhere (De Meulenaere et al., 2022, 2023; Kattenberg et al., 2022). 
Briefly, DNA was extracted from leukocyte‐depleted red blood cells, whole blood or dried blood using the QIAamp DNA Blood Mini Kit (Qiagen, Germany) following the manufacturer's protocol as previously reported (De Meulenaere et al., 2022, 2023; Kattenberg et al., 2022). Parasite species was identified and quantified by a qPCR targeting Pv mtCOX1 (Gruenberg et al., 2018) using a standard curve of light microscopy quantified control isolates. Sequencing libraries were generated using the Nextera XT DNA Sample Prep Kit (Illumina), or using commercial sequencing services as previously described (De Meulenaere et al., 2022, 2023; Kattenberg et al., 2022). Details of all P. vivax genomes used in this study can be found in Table S1, including metadata and accession numbers. The downloaded genomes contained monkey‐adapted P. vivax strains that were removed from population genetic analyses.

2.2. Ethics

Secondary use of all samples for sequencing and analysis of P. vivax isolates was approved through the Institutional Review Board of the Institute of Tropical Medicine Antwerp (protocols 1417/20 and 1345/19), the ethics committee at the University Hospital of Antwerp (protocols B3002020000016 and B300201523588), and Universidad Peruana Cayetano Heredia (UPCH; Lima, Peru) (protocol 101898).

2.3. Variant detection

Sequencing reads were first aligned using BWA version 0.7.17 to the human reference genome obtained from the Genome Reference Consortium Human Build 38 patch release 13 (GRCh38.p13). Reads not mapped in proper pairs to the human reference genome were extracted using samtools version 1.10 (flag -F 2), and subsequently aligned to the P. vivax PvP01 reference genome from PlasmoDB (version 46) using BWA. Duplicate reads were removed with Picard's MarkDuplicates (version 2.22.4). 
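
As a hedged illustration of the host‐read removal step above, the sketch below mimics how a `-F 2` filter behaves. In the SAM format, flag bit 0x2 marks a read as mapped in a proper pair, and the samtools `-F` option excludes any read that has one of the given bits set. The function name and the example flag values are hypothetical and are not taken from the authors' pipeline.

```python
PROPER_PAIR = 0x2  # SAM flag bit: read mapped in a proper pair


def passes_exclude_filter(flag: int, exclude_mask: int = PROPER_PAIR) -> bool:
    """Return True if a read with this SAM flag survives an -F style filter,
    i.e. none of the bits in exclude_mask are set in the flag."""
    return (flag & exclude_mask) == 0


# Hypothetical example flags (illustrative values only).
example_flags = {
    99: "paired, proper pair, mate reverse strand, first in pair",
    77: "paired, read unmapped, mate unmapped, first in pair",
    65: "paired, first in pair, not a proper pair",
}

# Reads that would be carried forward to the alignment against PvP01.
kept = [flag for flag in example_flags if passes_exclude_filter(flag)]
```

Under these assumptions, only the reads that did not align to the human genome as a proper pair are carried forward, which is how host‐derived reads are depleted before parasite variant calling.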
Variant detection was performed using the Genome Analysis ToolKit (GATK) version 4.1.4.1, first running the HaplotypeCaller command in GVCF mode for individual chromosomes. GVCF files were merged using GenomicsDBImport, followed by genotyping using GenotypeGVCFs, resulting in one vcf file per chromosome. The vcf files were filtered according to GATK best practices: (1) SNPs were filtered out when having a QualByDepth (QD) value lower than 2, a variant quality score (QUAL) lower than 30, a StrandOddsRatio (SOR) higher than 3, a FisherStrand (FS) value higher than 60, or a Root Mean Square mapping Quality (MQ) lower than 40; (2) indels were filtered out when having a QualByDepth value lower than 2, a variant quality score lower than 30, a FisherStrand (FS) value higher than 200, or a ReadPosRankSum value lower than −20. Finally, for most downstream analyses the core genome (14 chromosomes, excluding subtelomeric regions, low‐complexity domains, and the apicoplast and mitochondrial sequences) (Pearson et al., 2016) was selected using the BCFtools query command, and samples with less than 50% of the genome covered at least 5‐fold were excluded from analysis.

2.4. Population structure analysis

Principal Component Analysis (PCA) was performed using PLINK software version 2.0 (Chang et al., 2015). First, only biallelic SNPs with MAF > 0.005 were selected, and linkage disequilibrium (LD) pruning was performed on the vcf file encompassing all variants in the core genome using PLINK, followed by PCA using the first 20 principal components. PCA results were plotted in R using the ggplot2 library. Starting from the LD‐pruned dataset, admixture analysis was performed with the ADMIXTURE software version 1.3.0 (Alexander et al., 2009). The optimal number of populations was determined by running ADMIXTURE for a range of K‐values (i.e., number of populations) from 2 to 50. 
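
The hard‐filter thresholds listed in section 2.3 can be sketched as simple predicates. This is a minimal illustration only: it assumes that each criterion is applied independently (a variant failing any one of them is removed) and that the annotations of one variant are available as a dictionary keyed by the usual GATK annotation names (QD, QUAL, SOR, FS, MQ, ReadPosRankSum). The function names and the example record are hypothetical.

```python
from typing import Mapping


def snp_fails_hard_filter(ann: Mapping[str, float]) -> bool:
    """True if a SNP fails any of the SNP thresholds from section 2.3."""
    return (
        ann["QD"] < 2.0        # QualByDepth
        or ann["QUAL"] < 30.0  # variant quality score
        or ann["SOR"] > 3.0    # StrandOddsRatio
        or ann["FS"] > 60.0    # FisherStrand
        or ann["MQ"] < 40.0    # RMS mapping quality
    )


def indel_fails_hard_filter(ann: Mapping[str, float]) -> bool:
    """True if an indel fails any of the indel thresholds from section 2.3."""
    return (
        ann["QD"] < 2.0
        or ann["QUAL"] < 30.0
        or ann["FS"] > 200.0
        or ann["ReadPosRankSum"] < -20.0
    )


# Hypothetical annotation record (illustrative values only).
example_snp = {"QD": 25.1, "QUAL": 812.0, "SOR": 0.9, "FS": 1.2, "MQ": 59.8}
keep_example = not snp_fails_hard_filter(example_snp)
```

In practice such filters are usually expressed as filter expressions passed to the variant caller's own filtering tools rather than re‐implemented by hand; the sketch is only meant to make the direction of each threshold explicit.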
The ADMIXTURE runs used 10‐fold cross‐validation, and the K‐value with the lowest cross‐validation error was selected as the number of populations. For phylogenetic analysis, the vcf file was first converted to PHYLIP format using the vcf2phylip.py script (Ortiz, 2019). Phylogenetic trees were then constructed using RAxML, with P. knowlesi defined as outgroup, using the GTR + G evolutionary model and a bootstrapping value of 100 (Kozlov et al., 2019). The phylogenetic tree was visualized using the ggtree library in R. Nucleotide diversity was determined by sliding across the genome in 500‐bp windows over all LD‐pruned SNPs of the core genome using VCFtools (Danecek et al., 2011). Multiplicity of infection was assessed by estimating Wright's inbreeding coefficient (F_WS) as a measure of the within‐host parasite diversity, using the getFws command as implemented in the moimix package in R (Lee & Bahlo, 2016). Infections with F_WS ≥ 0.95 were considered to contain clonal (single strain) parasites, while samples with F_WS < 0.95, indicating within‐host diversity, were considered to contain multiple genetically distinct parasite strains.

2.5. IBD relatedness and selection analysis

Shared ancestry and relatedness between isolates were estimated using identity‐by‐descent (IBD). PED and MAP file formats were created using VCFtools from an LD‐pruned vcf dataset of the full genome (core + (sub)telomeric and low complexity regions of the 14 chromosomes) filtered on a MAF of 0.001 based on the frequency in all included 1474 genomes. IBD‐sharing between pairs of samples, using all 399 samples from LAM, was calculated using the isoRelate package in R, which can analyze IBD in haploid recombining microorganisms in the presence of multiclonal infections (Danecek et al., 2011; Henden et al., 2018). 
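
The within‐host diversity rule from section 2.4 (an F_WS cut‐off of 0.95) can be sketched as follows. This is a hedged illustration that assumes per‐sample F_WS estimates have already been computed elsewhere (for example with moimix in R) and loaded into a dictionary; the sample identifiers, values, and function names are hypothetical.

```python
from typing import Dict

CLONAL_THRESHOLD = 0.95  # cut-off stated in section 2.4


def classify_infection(fws: float, threshold: float = CLONAL_THRESHOLD) -> str:
    """Label an infection as clonal (F_WS >= threshold) or multi-strain."""
    return "clonal" if fws >= threshold else "multi-strain"


def multi_strain_fraction(fws_by_sample: Dict[str, float]) -> float:
    """Fraction of samples classified as containing multiple strains."""
    if not fws_by_sample:
        raise ValueError("no samples provided")
    labels = [classify_infection(v) for v in fws_by_sample.values()]
    return labels.count("multi-strain") / len(labels)


# Hypothetical F_WS estimates (illustrative values only).
example_fws = {"sample_A": 0.99, "sample_B": 0.95, "sample_C": 0.80}
example_labels = {s: classify_infection(v) for s, v in example_fws.items()}
```

Note that a sample sitting exactly on the threshold is treated as clonal, matching the "≥ 0.95" wording of the rule.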
Genetic distance was calculated using an estimated mean map unit size from Plasmodium chabaudi of 13.7 kb/centimorgan (cM) (Martinelli et al., 2005; Rovira‐Vallbona et al., 2021). We set the thresholds of IBD at the minimum number of SNPs (n = 20) and length of IBD segments (5000 bp) reported to reduce false‐positive calls using an error of 0.001. IBD has been shown to be superior to probabilistic models such as STRUCTURE for understanding the relatedness and interconnectivity of parasite populations (Henden et al., 2018; Taylor et al., 2019; Wesolowski et al., 2018). Networks of IBD‐sharing (>10% of the genome shared) between individuals were created using the igraph package in R, and the cumulative level of IBD‐sharing between isolates in countries in the network was plotted as a connection map with SCImago Graphica (Hassan‐Montero et al., 2022) and used as a measure of connectivity between countries. For the samples from Latin America, the proportion of pairs of isolates sharing IBD, as well as significance of IBD‐sharing, was calculated using the isoRelate package in R for all samples together and subdivided by population, based on country, as a measure of positive selection.

2.6. Pathway enrichment analysis – GO terms

Gene Ontology (GO) categories were sourced from PlasmoDB release 46, with each gene being associated with one or more GO categories. To analyze a list of specific genes, a gene set enrichment analysis was conducted utilizing the hypergeometric distribution, which assesses the statistical significance of the overlap between a gene list and the assigned GO categories based on their respective counts.

3. RESULTS

3.1. P. vivax genomic data summary

Based on a literature search including manuscripts published before October 2022, we identified 1311 high‐quality publicly shared P. vivax genomes. Raw sequencing data were downloaded and all genomes were combined, including in‐house sequenced P. 
vivax genomes (n = 163) originating from Peru, Brazil, Vietnam, and imported cases in Belgium from travelers and migrants (De Meulenaere et al., 2022, 2023; Kattenberg et al., 2022). A total of 1474 high‐quality P. vivax genomes (Table S1), coming from 36 countries in Asia (n = 878), the Americas (n = 399), and Africa (n = 197), and collected between 2000 and 2019, were retained after removing samples with less than 50% of the genome covered at least 5‐fold (Figure 1). The median sequencing coverage over the PvP01 reference genome including only retained isolates was 26‐fold (range 1–763). After alignment and variant calling, a total of 2,435,842 high‐quality genetic variants were identified (1,983,976 SNPs and 451,866 indels), with a total of 1,836,935 variants in the core genome region (1,477,945 SNPs and 358,990 indels).

FIGURE 1. Origin of Plasmodium vivax genomes per country included in the analysis. The size of each dot is proportional to the number of samples in the genome dataset, and the colors indicate the country. Dots are plotted at the centre of the country (as defined by the ggmap package in R).

3.2. Global population structure

Plasmodium vivax genomes were grouped in regional populations (following classifications from Adam et al., 2022): Africa (AFR, including isolates from all countries in sub‐Saharan Africa, and returning travelers with history of travel to these countries), Eastern South East Asia (ESEA, including isolates from Cambodia, Laos, Thailand, Vietnam, and the China‐Myanmar border), Latin America (LAM, which includes isolates from Mexico, Central and South America), Middle South East Asia (MSEA, including isolates from Malaysia and The Philippines), Oceania (OCE, including isolates from the island of New Guinea [i.e., Papua New Guinea and part of Indonesia]), and Western Asia (WAS, which includes Afghanistan, Bangladesh, India, Iran, Pakistan, and Sri Lanka). 
To investigate genetic clustering of P. vivax populations in these regions we used the biallelic SNPs as input for PCA and phylogenetic analysis. Both analyses (PCA + tree) reveal the presence of three major clusters consistent with their geographical origin (Figure 2a,b). Isolates from ESEA + MSEA form a differentiated cluster in the vicinity of isolates from OCE. Isolates from AFR cluster close to isolates from WAS; however, these two regions are clearly separated in the fourth principal component of the PCA (Figure S1) and form separate clades in the tree (Figure 2b). Isolates from LAM form a distinct cluster and clade in the PCA and tree, respectively. Together with the nucleotide diversity (Figure S2), these results indicate a high genetic diversity within the global P. vivax population as a whole, with a structuring of populations by geographical region.

FIGURE 2. Global Plasmodium vivax phylogeny, admixture, and population structure. (a) Principal component analysis based on the LD‐pruned biallelic SNPs using PLINK2, showing the first two principal components. The samples (dots) are colored according to the originating population (here region). (b) Phylogenetic tree based on the LD‐pruned biallelic SNPs using RAxML, with P. knowlesi defined as outgroup. The phylogenetic tree was visualized without the outgroup to improve clarity of the P. vivax branches in the figure. (c) Admixture proportions for K = 10 populations using the ADMIXTURE software, with the region of origin shown in the small bar on top (AFR = Africa, ESEA = Eastern South East Asia, LAM = Latin America, MSEA = Middle South East Asia, OCE = Oceania, WAS = Western Asia).

Admixture analysis estimated ten (K = 10) geographically distinct ancestral populations (Figure 2c). 
All genomes from AFR, WAS, and OCE were predicted to belong predominantly to a single shared ancestry within each region, while genomes from the LAM, ESEA, and MSEA regions each belong to distinct subpopulations (i.e., ancestral populations within a region, Figure 2c). Admixture (predicted ancestry to more than one cluster) is mostly observed between subpopulations within a region (e.g., in LAM and ESEA), and rarely between regions, with the exception of the admixture observed between AFR and WAS. In the phylogenetic tree, isolates from WAS form two separate clades, with the upper cluster containing isolates from India (Figure 2b). This separate subpopulation could not be confirmed in the admixture analysis, which estimated one ancestral cluster in this region (Figure 2c). Therefore, while Indian isolates might be distinct from other isolates in WAS, all P. vivax isolates from this region share a common ancestry. The highest amount of admixture between isolates is observed between the three subpopulations in LAM (mixed ancestry proportions to K7 and K10 and to a lesser extent K4), indicating a shared ancestry or gene flow between these subpopulations (Figure 2c).

3.3. Population structure in Central and South America

To investigate shared ancestry of P. vivax in Latin America at a finer geographic resolution, the population genomic analyses were repeated including only isolates from this region (n = 399). Results from both the PCA and the phylogenetic tree indicated clustering on a country level (Figure S3). The high degree of admixture in LAM noted in the global comparison is confirmed in this analysis and consists, for a large part, of admixed samples within Brazil and admixture between populations from Brazil and Peru (Figure 3a). Eleven ancestral clusters (K = 11) within LAM were estimated (Figure 3a), and these populations are structured geographically by country or at specific locations within a country (Figure S3). 
In addition, admixture is observed between isolates from Colombia, Mexico, and Panama with mixed ancestry from multiple populations across LAM. Country‐specific ancestral populations are seen in Mexico (K7), Panama (K6), Colombia (K5), Brazil (K1 and K9), and Peru (K3 and K11). In addition, some populations are seen in multiple countries, such as isolates from Mexico and Panama that share ancestry with a population predominantly observed in Colombia (K4). While our dataset contains isolates sampled at different time periods, and populations are seen in multiple years (Figure 3b), we observed some distinct populations at specific locations, such as the Madre de Dios population (K3) in Peru, the K5 population in Tierralta in Colombia, and isolates from Manaus in Brazil (K9) (Figure S3).

FIGURE 3. Spatio‐temporal population dynamics in Latin America. Admixture analysis of Plasmodium vivax samples from LAM, using K = 11 populations. (a) Bar plot with admixture proportions of each sample for each ancestry cluster, with the country of origin for each sample shown in the small bar on top. (b) Each sample is assigned to one ancestry cluster based on the highest membership probability to that population in the admixture analysis. Pie charts represent the number of samples from each cluster in that country and year.

Temporal analysis (Figure 3b) shows that the K10 sub‐population that is predominant in Brazil across most years is later also observed in other countries in the Amazon Basin (2018 in Guyana, and in 2019 in Peru in a region relatively close to the border with Brazil), and in two isolates in Panama from 2007 (Figure S3). Two additional populations are seen in Brazil that are predominant in Peru (K2 and K8).

3.4. Gene flow across LAM

The connectivity between P. vivax populations in Latin American countries was assessed by measuring to what extent the parasite populations are genetically related. 
Pairwise IBD between all samples within and between countries was used as a measure of connectivity and parasite gene flow. Of the 93,528 possible pairwise combinations of the 399 isolates from LAM, 1812 pairs of isolates (1.9%) had moderate‐to‐high relatedness (sharing 10–100% of their genome IBD). Among those, 638 pairs had high relatedness (more than 50% IBD, i.e., sibling or clonal pairs). As expected, the majority of the related pairs (sharing 10–100% of their genome IBD) were observed within country (Figure 4 and Table 1), with observed relatedness between the different ancestral populations previously identified in Brazil (K1, K9, and K10) and Peru (K2, K8, K11) (Figure S4).

FIGURE 4. Plasmodium vivax IBD‐based connectivity between countries in Latin America. Connectivity network of inferred IBD between P. vivax samples from Latin American countries. Edges connecting countries represent cumulative IBD sharing between parasite pairs from those countries that share at least 10% of their genomes (numbers of sample pairs are shown in Table 1). 10% IBD‐sharing means that for these parasites at least 10% of their genomes descended from a common ancestor without intervening recombination, indicating distant to close relatedness. Node colors indicate the country of origin of the P. vivax genomes, and nodes were plotted on the map with known latitude and longitude of collection sites by district or, if unknown, at the respective country's capital (for example, in Guyana).

TABLE 1. Number of Plasmodium vivax sample pairs with IBD (at ≥10% or ≥50% IBD) in pairwise analysis within and between samples from Latin American countries.

Country Samples Nr pairs with >50% IBD Nr pairs with >10% IBD Nr of possible pairs % pairs with >50% IBD % pairs with >10% IBD Study references