Abstract
Simple Summary
Pigs are important in agriculture as they produce animal-based protein
for human consumption. The analysis of selection signatures has
implications for the maintenance and utilization of genetic diversity
and can reveal genes associated with phenotypic traits, either as a
result of natural or of artificial selection. Pig populations are
poorly characterised in South Africa. Hence, studies aimed at
evaluating genetic distinctiveness and pig breed diversity will
contribute to developing a rational plan for population conservation
programs among other applications.
Abstract
South Africa boasts a diverse range of pig populations, encompassing
intensively raised commercial breeds, as well as indigenous and village
pigs reared under low-input production systems. The aim of this study
was to investigate how natural and artificial selection have shaped the
genomic landscape of South African pig populations sampled from
different genetic backgrounds and production systems. For this purpose,
the integrated haplotype score (iHS), as well as cross population
extended haplotype homozygosity (XP-EHH) and Lewontin and Krakauer’s
extension of the Fst statistic based on haplotype information (HapFLK)
were utilised. Our results revealed several population-specific
signatures of selection associated with the different production
systems. The importance of natural selection in village populations was
highlighted, as the majority of genomic regions under selection were
identified in these populations. Regions under natural and artificial
selection causing the distinct genetic footprints of these populations
also allow for the identification of genes and pathways that may
influence production and adaptation. In the context of intensively
raised commercial pig breeds (Large White, Kolbroek, and Windsnyer),
the identified regions included quantitative loci (QTLs) associated
with economically important traits. For example, meat and carcass QTLs
were prevalent in all the populations, showing the potential of village
and indigenous populations’ ability to be managed and improved for such
traits. Results of this study therefore increase our understanding of
the intricate interplay between selection pressures, genomic
adaptations, and desirable traits within South African pig populations.
Keywords: genetic signatures, iHS, XP-EHH, HapFLK, pigs, gene
enrichment analyses
1. Introduction
Pigs are one of the most important livestock species worldwide. In
January 2020, the world population of pigs was estimated to be 677.6
million [[34]1]. They are key for livelihoods, food security, and
economic growth, especially in developing countries where they survive
under harsh environments and provide for resource-limited households
[[35]2]. Besides providing proteins for humans, pigs are also used as
model animals for research on human diseases [[36]2].
Wild hog species (also referred to as wild pigs) include the warthog
(Phacochoreus Africanus), pig deer (Babyrousa Babyrussa), and the pygmy
hog (Porcula Salvania), with only wild boar (Sus scrofa) having been
domesticated [[37]3,[38]4]. Changes in the phenotypic characteristics
between domestic and wild pigs are highly noticeable and were driven by
natural and artificial selection [[39]5]. Independent domestication
events from local wild boar in Europe and Asia gave rise to European
and East Asian pigs [[40]6,[41]7]. As a result of strong artificial
selection, there is considerable genetic distance between European and
Asian domestic pigs [[42]3,[43]6,[44]7]. While commercial lines of
European pigs are characterised by an extended body length and lean
growth, East Asian domestic pigs have good fat deposition and high
reproductive performance [[45]8,[46]9,[47]10].
In the absence of a reproductive barrier between East Asian and
European domestic pigs, hybridisation between East Asian and European
and later American pig breeds has been successfully used to increase
pig production [[48]11,[49]12,[50]13]. Previous studies clearly
demonstrated a hybrid origin of the European Large White breed with
Asian pigs [[51]14]. The hybridisation of domesticated pigs with wild
boars on European farms has also been used to increase reproduction and
genetic diversity in inbred commercial pig lines [[52]15]. Village and
smallholder pigs that are farmed predominantly under free-range
production systems allow for gene flow and introgression, since
hybridisation occurs with wild pigs (e.g., warthogs, wild boars and
bush pigs) [[53]16,[54]17]. Although hybridisation between domesticated
and wild pigs can increase production, these events may also have a
negative impact on pig production [[55]18]. For example, it has been
suggested that the outbreak of classical swine fever (CSF) is related
to wild and domestic pigs mixing in free-range production systems
[[56]19,[57]20].
There is sparse and unreliable information with regards to the history
of pig populations in South Africa and other regions of Africa
[[58]21,[59]22,[60]23]. Indigenous breeds most likely originated from
domestic pigs that spread from sub-Saharan Africa to South Africa via
the Nile Corridor [[61]22]. Commercial pig breeds from Europe and
America were also introduced in South Africa for commercial farming in
the 1600s by European settlers [[62]21,[63]22,[64]23]. While the
commercial pig breeds are known for their high performance (e.g.,
litter size, high growth rate, and meat and carcass quality)
[[65]24,[66]25,[67]26], indigenous breeds are well adapted to harsh
South African environmental conditions. For example, the indigenous
Windsnyer has longer black hair and a thinner epidermis for increased
heat tolerance that will shield it against extreme climatic conditions
[[68]27,[69]28]. The positive characteristics of indigenous and local
populations (e.g., heat tolerance and disease resistance) are valuable
and need to be characterised and conserved as they are also important
to the livelihood of subsistence and small-scale farmers.
In 2013, it was estimated that South Africa had 38,500 commercial farms
and 2 million smallholder farmers [[70]29]. While commercial pig
farmers practise controlled breeding and intense artificial selection
for key production traits, small-scale and village farming landscapes
are characterised by poorly organised and indiscriminate crossbreeding
[[71]30]. Commercial farmers mainly use European pig breeds, while pig
farmers in rural areas mainly use indigenous breeds (e.g., Kolbroek and
Windsnyer). However, rural farmers are increasingly shifting away from
indigenous breeds towards the use of commercial exotic breeds
[[72]18,[73]26]. Crossbreeding between exotic and indigenous breeds
have been used to improve performance and production, as well as to
increase tolerance and/or resistance to disease and parasites and
animals that are hardy and adapted to survive under harsh local
conditions [[74]27,[75]31,[76]32,[77]33].
Adaptation and domestication processes, as well as breed development,
can lead to the emergence of signatures of selection in the genomes of
pig populations [[78]34]. Signatures of selection have been identified
in pig populations associated with important traits, such as adaptation
to high altitudes [[79]35], muscle growth [[80]36], and body size
[[81]10]. Genomic sequences of domestic and wild pigs have been
observed to be predominantly similar, except in regions under strong
selection pressure [[82]10]. Various authors have reported on selection
in domestic pigs for disease resistance, tolerance, and productivity
[[83]37,[84]38,[85]39]. Information on selection signatures is valuable
and can be used in management strategies to improve production and
adaptability. A large-scale analysis of the genetic diversity and
structure of South African pig populations relative to global
populations (e.g., pigs from South America, Europe, United States, and
China) [[86]40] points towards a population that has been shaped by
complex evolutionary forces including domestication and continuous
interactions between domestic and wild populations. This includes
natural and artificial selection under the different production systems
as a result of the need to adapt and survive the prevailing climatic
conditions, low-input production systems, and diseases.
Many statistical methods are available to identify selection
signatures. This includes the integrated haplotype score (iHS) that
allows for the detection and characterisation of genomic regions that
have experienced selection within a population [[87]41]. The iHS
identifies regions where a selected allele has risen in frequency
quickly due to positive selection, resulting in longer haplotypes
around the selected variant. Although iHS methods apply statistical
corrections to control for confounding factors such as population
structure, demographic history, or genetic drift, it is still sensitive
to population structure, potentially leading to false positives. The
cross-population extended haplotype homozygosity (XP-EHH) method takes
into account differences between two populations [[88]42]. Both tests
have been shown to have high power in detecting selection signatures
even in small sample sizes [[89]43,[90]44]. Moreover, the XP-EHH
statistic identifies population-specific genetic signatures, by
identifying regions where specific alleles or haplotypes have undergone
recent positive selection, leading to their rapid increase in frequency
in one population but not in another [[91]45]. This comparison helps in
differentiating between regions affected by local adaptation and those
influenced by hitchhiking or a demographic history shared by multiple
populations. The third approach, hapFLK, involves Lewontin and
Krakauer’s extension of the Fst statistic based on haplotype
information [[92]43,[93]46]. This test measures differences in
haplotype frequencies between populations, while accounting for their
hierarchical structure, enabling the capturing of population-specific
genetic signatures, even in scenarios with limited sample sizes
[[94]44,[95]47]. As a result, the hapFLK method is a powerful tool for
identifying signatures of selection, even in the presence of
bottlenecks and migration, while limiting the effects of hitchhiking.
The aim of this research was to identify and characterise genomic
regions that display signatures of natural and artificial selection in
South African pig populations. For this purpose, commercial, village,
indigenous, wild and Vietnamese potbelly pig populations that were
previously genotyped were included [[96]40]. To improve the statistical
power for detecting the selection signatures, we used the iHS, XP-EHH,
and hapFLK approaches. Specifically, the iHS was used to identify and
characterise signatures of selection in each of the populations, while
XP-EHH was used to identify and characterise selection signatures
between different pairs of populations. The hapFLK statistical method
was used to identify and characterise selection signatures between the
multiple populations (i.e., including all the pig populations).
2. Materials and Methods
2.1. Animal Samples, Genotyping, and Quality Control
In total, 234 animals that were previously genotyped were used in this
study [[97]40]. This included 60 pigs from commercial farms represented
by the Large White (LWT), South African Landrace (SAL) and Duroc (DUR)
breeds, 40 indigenous pigs represented by Kolbroek (KOL) and Windsnyer
(WIN) breeds, as well as 91 village and non-descript pig populations.
The latter were obtained from villages in the Eastern Cape (Alfred Nzo,
ALN and O.R. Tambo, ORT) and the Limpopo (Capricorn, CAP and Mopani,
MOP) districts. In addition, 5 Vietnamese Potbelly pigs (VIT) from the
Johannesburg Zoo and 38 wild pigs represented by the warthog (WAT),
wild boar (WBO) and bush pig (BSP) were collected from various game
reserves.
The animals were genotyped using PorcineSNP60 v2 BeadChip (Illumina,
San Diego, CA, USA) containing 62,163 SNPs with an average gap of 43.4
kb [[98]40]. Markers with a call rate lower than 85% and not physically
mapped to the S. scrofa 11.2 genome assembly were discarded using
Golden Helix SNP Variation Suite (SVS) version 8.8.1. Markers with a
minor allele frequency (MAF) lower than 2%, and those that deviated
from the Hardy–Weinberg equilibrium (p-value < 0.0001) were also
excluded. BEAGLE (version 5.1) was used to phase the autosomal genome
using 30 iterations of the phasing algorithm on a 5 Mb chromosomal
region and sample haplotype pairs for each individual per iteration for
all the data sets used.
All of the pig populations ([99]Table 1) were included in the analyses
to detect selection signatures, including the populations consisting of
fewer than 10 individuals (BSP, WBO, and VIT). Based on the population
diversity and structure results [[100]40], the indigenous pig
populations (WIN and KOL) were grouped together (IND, n = 40), the
village populations sampled in Limpopo (MOP and CAP) were grouped
together (LIM, n = 52), and those in the Eastern Cape (ORT and ALN)
were grouped together (EC, n = 39) in the XP-EHH and hapFLK analysis.
Table 1.
Summary of the sampled pig populations.
Category Population Code N
Village Mopani MOP 27
Village Capricorn CAP 25
Village Oliver Reginald Tambo ORT 22
Village Alfred Nzo ALN 17
Commercial Large White LWT 20
Commercial SA Landrace SAL 20
Commercial Duroc DUR 20
Indigenous Kolbroek KOL 20
Indigenous Windsnyer Type WIN 20
Asian Vietnamese Potbelly VIT 5
Wild Wild boar WBO 4
Wild Wild boar WAT 31
Wild Wild boar BSP 3
[101]Open in a new tab
2.2. Detection of Signatures Using iHS
The iHS was used to screen for non-overlapping regions within a
population under positive selection. Plink 1.9 was used to exclude
duplicate SNPs and to recode all genotypes using the --allele1234
script. Plink format map and ped files were converted into the
fastPHASE format using the recode fastphase script. This generated a
fastphase.inp file that was used in fastPHASE software 1.4.8. This
software was then used to estimate missing genotypes and unobserved
haplotypes from unphased data for each chromosome. This then created an
input file, the fastphase_hapguess_switch.out file, which was used to
calculate the iHS. Once phasing was completed, the iHS was calculated
on individual sites for possible signatures using the rehh software in
the R environment. The absolute unstandardised iHS (uniHS) was
identified as the log ratio iHH^A ancestral to the derived iHH^D allele
for each SNP [[102]41]. As the standardised iHS scores are roughly
distributed normally with mean = 0 and standard deviation = 1, regions
with an average iHS score of 3 (three standard deviations above the
mean) or above with at least five SNPs ≤ 100 kb were considered
candidate regions for selection. Manhattan plots were generated in the
R package qqman 2.3.
2.3. Detection of Signatures Using XP-EHH
Selective sweeps between populations were detected using XP-EHH, which
makes it possible to find selected regions using the genetic distance
between adjacent SNPs based on the of EHH model [[103]42]. The
cross-population EHH (XP-EHH) statistic is similar to Rsb (Robertsonian
selection bias) and compares one population to the other using
haplotypes [[104]45]. Sabeti et al. [[105]42] defined XP-EHH as
standardised (unXP-EHH), identified as a mean (unXP-EHH) and standard
deviation (unXP-EHH) for each given SNP. The argument was set to pop1,
identified as
[MATH:
pXP<
mo>−EHHright :MATH]
relative to pop2 as
[MATH:
pXP<
mo>−EHHleft :MATH]
. to find regions associated with each population. XP-EHH used the same
phased file as the iHS did and therefore the iHS was firstly calculated
for each population using the rehh package [[106]47] in R. Regions with
an average XP-EHH score of 3 (three standard deviations above the mean)
or above with at least five SNPs ≤ 100 kb were considered candidate
regions for selection. Manhattan plots were generated in the R package
qqman.
2.4. Detection of Signatures Using HapFLK
To reveal genetic differentiations in genomic regions subjected to
selection from multiple populations, the HapFLK method was employed.
This test accounts for the haplotype structure of the population whilst
using polymorphic SNPs in ancestral populations. Reynolds distances
were calculated using HapFLK 1.3.0 software
([107]https://forge-dga.jouy.inra.fr/projects/hapflk) and then
converted into a kinship matrix with the HapFLK package in RStudio. A
FastPHASE cross-validation procedure was used to determine haplotype
diversity [[108]46]. In total, 20 clusters with 30 maximisation
iterations on a per chromosome basis were used to calculate the HapFLK
statistic. A standard normal distribution was calculated at each SNP
using p-values. Selected regions were identified using a p-value ≤
0.001 [[109]48]. For this study, the indigenous breed Kolbroek was used
as an outgroup.
2.5. Annotation and Function Analyses of Identified Genomic Regions
The candidate regions identified using the three different methods
(i.e., the iHS, XP-EHH, and HapFLK) were annotated for genes,
quantitative trait loci (QTLs) and functional pathways. For this
purpose, BioMart on the Ensembl gene database website was used to
annotate genes at particular genome coordinates for all selected
regions (release 89). A candidate region search and identification was
performed within 1 Mb to the left and right of statistically
significant SNPs. The current pig genome S. scrofa 11.2 assembly was
used to extract gene symbols. Web-based Panther was employed for
functional and pathway enrichment analysis. A false discovery rate
(FDR) < 0.10 was used to assess the significance of enriched pathways.
The pig QTL (Release 46) database was used to align candidate genes to
an available QTL.
3. Results
3.1. Detection of Signatures within a Population Using iHS
After quality check, 27,422 SNPs in total were retained for further
analysis. The iHS method was used to detect a positive selection within
a population, and identified potential genomic regions in all 13
populations included in this study ([110]Figure 1; [111]Table S1:
Supplementary Material File S1).
Figure 1.
[112]Figure 1
[113]Open in a new tab
Manhattan plot of the genome-wide distribution of the selection of
signatures detected via the iHS across the 18 chromosomes (indicated in
different colours) for the village (A–D), indigenous (E,F), commercial
(G–I), Vietnamese potbelly (J), and wild pig (K–M) populations.
The number of regions identified differed greatly between the different
populations ranging from 87 for CAP ([114]Figure 1C) to 4 for BSP
([115]Figure 1L) ([116]Table S2: Supplementary Material File S2). The
low numbers of regions identified in the BSP and VIT populations
([117]Figure 1J,L) likely reflect their small sample sizes. Most
selection regions were identified among the village pigs, which
included the CAP (87 regions), ALN (40 regions), ORT (71 regions), and
MOP (68 regions) populations ([118]Figure 1A–D). Fewer selection
regions were identified within the indigenous pig population (KOL 17;
WIN 22; [119]Figure 1H,I). In contrast to the wild boar population
(WBO, 32 regions) ([120]Figure 1M), fewer regions were identified for
the warthog (WAT, 10 regions) and bush pig (BSP, 4 regions) populations
([121]Figure 1L,M). While a high number of regions were also identified
for the LWT (34 regions) and SAL (31 regions) commercial pig
populations, 12 regions were identified for the DUR population
([122]Figure 1G–I).
The regions displaying significant selection were distributed on
different chromosomes, harbouring genes associated with different
traits ([123]Table S2: Supplementary Material File S2). For example,
the region on chromosome 13 (145.36 Mbp) displaying the strongest
selection signal (iHS score of 24.09) in the WBO population overlapped
with a GHRL gene related to weight gain, as well as several QTLs
associated with the feed conversion ratio, age at slaughter, average
backfat thickness, and average daily gain, amongst others. ([124]Figure
1M; [125]Table 2). Other regions identified using included QTLs
associated with intramuscular fat content that included the regions on
chromosome 1 (ORT and CAP), chromosome 2 (LWT), chromosome 4 (KOL),
chromosome 5 (WBO), chromosome 8 (WIN), chromosome 12 (ORT), and
chromosome 14 (CAP), as well as QTLs associated with the number of
teats that included the regions on chromosome 7 (ALN, DUR), chromosome
8 (ORT), chromosome 16 (MOP), chromosome 14 (CAP), chromosome 2 (LWT,
WBO), and chromosome 14 (LWT) ([126]Table 2).
Table 2.
Within-population list of genomic regions under selection and candidate
genes detected using the iHS method.
Populations Chr Start End Gene QTLs
ALN 2 25247505 25378565 TRIM44 Spinal Curvature
7 95982739 96254303 DPF3 Teat number
8 31921587 32301975 APBB2 Stearic acid content
13 29753308 29775371 PTH1R Front leg conformation, Hind leg
conformation, Hip structure
ORT 1 162364055 162737492 NEDD4L Intramuscular fat content
2 87678811 87866856 DMGDH Litter weight piglets born alive
8 80290715 80669404 NR3C2 Loin muscle, Teat number
12 53979215 54055377 PIK3R5 Intramuscular fat content
12 27957732 28617585 CA10 Hemaglobin
12 44617499 44660271 VTN Body depth, Drip loss, Hind leg conformation,
Hip structure, pH 24 h post-mortem (loin), ph 45 min post-mortem
MOP 4 71045333 71318030 NKAIN3 Body weight (birth)
8 31921587 32301975 APBB2 Stearic acid content
16 27537885 27631130 SELENOP Meat colour a
16 32142034 32148335 PELO Obesity index, Teat number
16 32336292 32437103 ITGA2 Body weight (5 weeks)
CAP 1 183915339 184140122 SAMD4A Intramuscular fat content
12 27416337 27436160 NME1 Conductivity 24 h post-mortem (loin), Cooking
loss, Loin muscle depth, Loin weight
12 27957732 28617585 CA10 Hemoglobin
14 101123381 101254725 LIPA Average daily gain, Front leg weight,
HDL/LDL ratio, Litter size, Loin muscle area, Monounsaturated fatty
acid content, Oleic acid content, Skin thickness, Sperm concentration,
Teat number
14 103992037 104106576 IDE e.g., Abdominal fat percentage, Age at
puberty, Age at slaughter, Average backfat thickness, Average daily
gain, Backfat at first rib, Backfat at last rib, Backfat between 3rd
and 4th last ribs, Body depth, Body height
14 105036770 105044765 RBP4 Litter size, Total number born alive
14 12437515 12563544 EXTL3 Fat androstenone level
8 76482022 76699351 FBXW7 Body mass index
8 31921587 32301975 APBB2 Stearic acid content
LWT 2 108255930 108536881 PAM Intramuscular fat content, Loin
percentage, Maternal infanticide, Teat number
7 26860140 26990017 LRRC1 Meat colour L
9 48120802 48286278 TECTA Backfat between 3rd and 4th last ribs
13 177365717 179013542 ROBO2 Feed efficiency, Linolenic acid content
14 69200215 70938204 CTNNA3 Teat number
DUR 6 137595524 138010444 SLC44A5 Diameter of muscle fibers
7 27389874 27895570 KHDRBS2 Loin muscle area, Loin muscle depth, Teat
number
KOL 4 61628299 61716546 JPH1 Intramuscular fat content
14 79352396 80106258 KCNMA1 Meat colour b
WIN 5 18718158 18723843 TARBP2 Backfat between 3rd and 4th last ribs
8 127731903 128953331 CCSER1 Backfat between 3rd and 4th last ribs,
Intramuscular fat content
VIT 2 79766349 80141293 COL23A1 Front foot size, Hip structure
4 79687359 79847281 PRKDC Feed conversion ratio
6 79849687 79958271 HSPG2 Days to 113 kg, Marbling
14 79352396 80106258 KCNMA1 Meat colour b
WAT 18 31027031 31125465 MDFIC Fat androstenone level
WBO 2 15791451 15819138 F2 e.g., Age at slaughter, Average backfat
thickness, Average daily gain, Backfat at last rib, Backfat at rump,
Backfat thickness between 3rd and 4th rib, Body weight (end of test),
Body weight (weaning)
5 4769801 4849334 SHISAL1 Intramuscular fat content
5 6997444 7014422 POLR3H Fat androstenone level
13 66316436 66452917 GHRL Age at slaughter, Average daily gain, Daily
feed intake, Days to 100 kg, Feed intake, Loin weight, Marbling
14 21754463 22121051 SPOCK3 Fat androstenone level
[127]Open in a new tab
3.2. Detection of Selection of Signatures between Populations Using XP-EHH
Several regions that displayed significant evidence of selection were
detected between pairs of populations using XP-EHH ([128]Figure 2;
[129]Table S3: Supplementary Material File S3). The numbers of regions
displaying significant evidence of selection differed greatly between
the paired populations. A high number of regions were identified
between the commercial population (DUR) paired with the village (EC, 38
regions and LIM, 19 regions), indigenous (IND, 13 regions), and
commercial (LWT, 23 regions) populations. Although a high number of
regions were identified between the warthog population (WAT) paired
with the village (LIM, 34 regions), commercial (LWT, 14 regions), and
indigenous (IND, 10 regions) populations, fewer regions were identified
between the WAT paired with the wild boar (WBO, 5 regions) population.
This was also true for Vietnamese potbelly pigs (VIMs), with a high
number of regions identified between VIM paired with the village (LIM,
14 regions and EC, 10 regions) and indigenous (IND, 11 regions)
populations, while only one region was identified between VIM and the
commercial (LWT, 1 regions) populations and VIMs paired with the wild
boar (WBO, 5 regions) population.
Figure 2.
[130]Figure 2
[131]Open in a new tab
Manhattan plot of the genome-wide distribution of the selection of
signatures between populations detected via XP-EHH across the 18
chromosomes (indicated in different colours) for DUR (A–G), WAT (H–L),
and VIT (M–Q) populations.
Several QTLs and genes occurred in the genomic regions identified using
the XP-EHH method ([132]Table 3, [133]Table S4: Supplementary Material
File S4). The strongest signal (XP-EHH score of 6.91) was observed for
VIT_LIM on chromosome 9 ([134]Figure 2N). Even though this region was
not associated with known QTLs, several regions identified with the
XP-EHH method were linked with QTLs associated with important traits.
For example, the regions on chromosome 1 (166.17 Mbp) and chromosome 6
(80.64 Mbp) identified for the commercial population (DUR) paired with
the village populations (EC and LIM) are linked with QTLs associated
with reproduction, while the region on chromosome 2 (113.82 Mbp)
identified in the commercial population (DUR) paired with the village
population (EC) and commercial population (LWT) is linked with QTLs
associated with meat and carcass quality traits. The region identified
on chromosome 1 (193.82 Mbp) detected in the wild boar population (WBO)
paired with the Vietnamese potbelly pig (VIT) and commercial population
(DUR) is linked to QTLs associated with key reproduction traits such as
litter size, maternal infanticide, plasma droplet rate, semen volume,
sperm concentration, sperm motility, and total number born alive.
Table 3.
Selected regions and candidate genes detected between pairs of
populations using the XP-EHH method.
Populations Chr Start End Gene QTLs
DUR_EC 1 166173135 166310972 ITGA11 Obesity index, Teat number
2 113774412 114206930 FER Abdominal circumference, Average backfat
thickness, Average daily gain, Backfat at last lumbar, Biceps brachii
weight, Body height, Body weight (3 weeks), Carcass weight (hot),
Double-bond index
6 80649143 80843567 EPHB2 Litter weight total
DUR_LIM 1 166173135 166310972 ITGA11 Obesity index, Teat number
6 80649143 80843567 EPHB2 Litter weight total
DUR_IND 18 40820644 41409087 PDE1C Backfat at rump
18 42030510 42046184 GHRHR Backfat at last rump, Carcass length,
Fat-cuts percentage
DUR_LWT 2 113774412 114206930 FER e.g., Abdominal circumference,
Arachidonic acid content, Aspartate aminotransferase activity, Average
backfat thickness, Average daily gain, Backfat at last lumbar, Backfat
at mid-back, Backfat at rump, Backfat at tenth rib
7 27389874 27895570 KHDRBS2 Loin muscle area, Loin muscle depth, Teat
number
7 30708332 30724045 SNRPC Loin muscle area, Loin muscle depth
7 30731461 30802735 UHRF1BP1 Femur length, Hip bone length, Humerus
length, Tibia length, Ulna length
7 30812920 30995361 ANKS1A Femur length, Hip bone length, Humerus
length, Tibia length, Ulna length, Galt score (front), Loin muscle
area, Loin muscle depth
7 31586555 31603617 ARMC12 Facial morphology
7 31722990 31792904 SLC26A8 Facial morphology, Femur length, Humerus
length, Tibia length, Ulna length
14 111834168 111914304 PAX2 Monounsaturated fatty acid to saturated
fatty acid ratio, Oleic acid to stearic acid ratio, Palmitoleic acid to
palmitic acid ratio, Stearic acid content
DUR_VIT 2 72326714 72391648 VAV1 Average daily gain, Backfat between
3rd and 4th last rib, Birth weight variability, Body weight (end of
test), Conductivity 45 min post-mortem, Fat androstenone level,
Intramuscular fat content, Time in feeder per day, pH 24 h postmortem
(ham), pH 45 min postmortem
DUR_WBO 1 193722164 193906565 ESR2 Front leg conformation, Gait score
(overall), Hind leg conformation, Litter size, Maternal infanticide,
Plasma droplet rate, Semen volume, Sperm concentration, Sperm motility,
Total number born alive
WAT_EC 1 254683885 254703225 AMBP Conductivity 24 h post-mortem (loin),
pH 24 h postmortem (ham), pH 24 h post-mortem (loin), pH 45 min
postmortem
3 39957269 39957727 NPW Lean meat percentage
8 37530815 37809761 CORIN Platelet count
8 37797875 37875540 NFXL1 Mean corpuscular hemoglobin content, Mean
corpuscular volume
8 47473787 47601359 RXFP1 Red blood cell count
8 71520275 71554766 PPEF2 Platelet distribution width
8 71573783 71603338 NAAA Platelet distribution width
8 72543620 72679173 SEPTIN11 Teat number
8 73502664 73958083 FRAS1 Teat number
WAT_LIM 1 254683885 254703225 AMBP Conductivity 24 h post-mortem
(loin), pH 24 h postmortem (ham), pH 24 h post-mortem (loin), pH 45 min
postmortem
8 71520275 71554766 PPEF2 Platelet distribution width
8 71573783 71603338 NAAA Platelet distribution width
14 113414264 113429936 PSD Intramuscular fat content, Oleic acid to
stearic acid ratio
14 113464174 113478699 MFSD13A Oleic acid content, Oleic acid to
stearic acid ratio, Stearic acid content
14 113480197 113498740 ACTR1A Oleic acid content, Stearic acid content
9 45400464 45435916 TMPRSS4 Backfat between 3rd and 4th last ribs
WAT_LWT 4 105804586 105845725 CSDE1 Intramuscular fat content
4 105868897 105893771 AMPD1 Juiciness score, Overall impression,
sensory panel, Tenderness score
WAT_WBO 7 89120822 89168270 MAX Meat colour b, Teat number, maximum per
side
VIT_EC 18 29895878 29936233 TES Average daily gain, Backfat between 3rd
and 4th last rib, Birth weight variability, Body weight (end of test),
Conductivity 45 min post-mortem, Fat androstenone level, Intramuscular
fat content, Time in feeder per day, pH 24 h postmortem (ham), pH 45
min postmortem, Teat number
VIT_LIM 2 72326714 72391648 VAV1 Average daily gain, Backfat between
3rd and 4th last rib, Birth weight variability, Body weight (end of
test), Conductivity 45 min post-mortem, Fat androstenone level,
Intramuscular fat content, Time in feeder per day, pH 24 h postmortem
(ham), pH 45 min postmortem
18 29895878 29936233 TES Average daily gain, Backfat between 3rd and
4th last rib, Birth weight variability, Body weight (end of test),
Conductivity 45 min post-mortem, Fat androstenone level, Intramuscular
fat content, Time in feeder per day, pH 24 h postmortem (ham), pH 45
min postmortem, Teat number
VIT_IND 18 29895878 29936233 TES Average daily gain, Backfat between
3rd and 4th last rib, Birth weight variability, Body weight (end of
test), Conductivity 45 min post-mortem, Fat androstenone level,
Intramuscular fat content, Time in feeder per day, pH 24 h postmortem
(ham), pH 45 min postmortem, Teat number
18 31027031 31125465 MDFIC Fat androstenone level
VIT_WBO 1 193722164 193906565 ESR2 Front leg conformation, Gait score
(overall), Hind leg conformation, Litter size, Maternal infanticide,
Plasma droplet rate, Semen volume, Sperm concentration, Sperm motility,
Total number born alive
[135]Open in a new tab
3.3. Detection of Selection of Signatures between Populations Using HapFLK
Across all populations, regions displaying significant (p-value ≤
0.001) evidence of selection were identified on chromosomes 5 and 6
using KOL as an outgroup ([136]Figure 3).
Figure 3.
[137]Figure 3
[138]Open in a new tab
Manhattan plot for signature of selection of South African pig
populations detected via HapFLK across 18 chromosomes (indicated in
different colours).
In total, 5924 segments displaying significant (p-value < 0.10)
evidence of selection were associated with 1179 genes ([139]Table S5:
Supplementary Material File S5). The regions on chromosomes 5 and 6
were linked with QTLs associated with intramuscular fat content, litter
size, number of teats, as well as age at slaughter, meat to fat ratio,
and body weight ([140]Table 4).
Table 4.
Genomic regions under selection detected via HapFLK methods in South
African pigs.
Chr Start End Gene QTLs
5 44381009 44558744 FAR2 Feed conversion ratio
5 56817606 57005114 EPS8 Fat androstenone level
5 64519186 65002098 VWF Litter size
5 66665263 66838922 PRMT8 Teat number
5 28300950 28605120 SRGAP1 Ear area
5 29695839 29863599 MSRB3 Ear area
5 97092883 97148601 SLC6A15 Time in feeder per day
5 33858572 34033939 CCT2 Feed conversion ratio
5 34067992 34218513 MYRFL Feed conversion ratio
5 34660029 34794321 PTPRB Feed conversion ratio
5 36274364 36658703 TRHDE Feed conversion ratio
6 97342474 97429364 GNAL Age at puberty, Arachidic acid content,
Average backfat thickness, Average daily gain, Backfat at last lumbar,
Backfat at last rib, Backfat at rump, Backfat at tenth rib, Body weight
(16 days), Carcass weight (hot), ear area, Feed conversion ratio, Lean
meat percentage, Loin muscle area, Loin muscle depth, Oleic acid
content, Oleic acid to stearic acid ratio, PH for longissmus dorsi,
Stearic acid content, Teat number, Vertebra number, Androstenone
laboratory
6 108115886 108227748 CABLES1 Average daily gain, Backfat at rump
6 108548837 108805693 LAMA3 Average daily gain
6 75696484 75755892 PADI2 Fat androstenone level
6 112396721 112628432 CDH2 Average daily gain, Intramuscular fat
content, Lean meat percentage, Obesity index
6 117631227 118334105 NOL4 Average backfat thickness
6 79849687 79958271 HSPG2 Days to 113 kg, Marbling
6 125890708 126043161 PIK3C3 Average backfat thickness, Average daily
gain, Intramuscular fat content, Loin muscle area
6 80649143 80843567 EPHB2 Litter weight (total)
6 137595524 138010444 SLC44A5 Diameter of muscle fibers
6 43933759 44069330 GPI Average backfat thickness, Body weight (5
weeks), Intramuscular fat content, Osteochondrosis score
6 46442857 46470075 ZNF570 Lean meat percentage
14 47946396 48040822 LIMK2 Fat androstenone level, Melanoma
susceptibility
14 122777130 122828051 ACSL5 Fat androstenone level
14 123343694 123546417 TCF7L2 Carcass weight (hot), Number of visits to
feeder per day
14 133460167 133544018 CHST15 Teat number
14 124760398 125010892 ABLIM1 Fat androstenone level, Intramuscular fat
content
15 100868469 101211242 ANKRD44 Skin thickness
15 101623818 101957516 PLCL1 Skin thickness
15 118335925 118452296 XRCC5 Average backfat thickness, Conductivity 24
h post-mortem (loin), Cooking loss, Fat weight (total), Lean meat
percentage, Loin muscle area, Loin muscle depth, Loin weight, PH for
longissmusdorsi, Subcutanous fat area, pH 24 h postmortem (ham), pH 24
h post-mortem (ham), pH 24 h post-mortem (loin)
15 79935994 80025850 SP3 Cooking loss, Meat colour b, Shear force,
Thawing loss
16 32130235 32304592 ITGA1 Obesity index, Teat number
16 32336292 32437103 ITGA2 Body weight (5 weeks)
16 33126494 33565542 ARL15 Loin muscle area
18 34006688 34906268 IMMP2L Age at puberty
18 40820644 41409087 PDE1C Backfat at rump
18 42030510 42046184 GHRHR Backfat at last rib, Carcass length,
Fat-cuts percentage
18 51387836 51802945 HECW1 Teat number
[141]Open in a new tab
3.4. Genes Identified Using Different Signatures of Selection Methods
The iHS, XP-EHH, and HapFLK methods allowed the detection of the same
genomic regions on chromosomes 5 and 6. The iHS method detected the
region on chromosome 5 in the ORT, CAP, WIN, KOL, LWT, and MOP
populations, while the region on chromosome 6 was detected in DUR, ALN,
ORT, and CAP. The XP-EHH method detected the region on chromosome 5
between the DUR_WBO pairing, while the method detected the region on
chromosome 6 between the DUR_EC, DUR_WAT, and DUR_LIM pairings.
The region on chromosome 5 detected in the DUR_WBO pairing encoded
NECAP1 and KCNJ3 genes. The GO terms reported for DUR_WBO included the
regulation of ion transmembrane transport, the clathrin vesicle coat,
voltage-gated potassium channel activity, the plasma membrane,
vesicle-mediated transport, ligand-gated ion channel activity, and
potassium transmembrane transport ([142]Table S6: Supplementary
Material File S6). These regions also included genes linked to
important signalling pathways, namely the G-protein signalling pathway,
the GABA-B_receptor_II_signalling pathway, and the muscarinic
acetylcholine receptor 2 and 4 signalling pathway ([143]Table S7:
Supplementary Material File S7).
The genes located in the region on chromosome 6 included EPHB2,
EPB41L3, METTL4, EPHA8, LYPLA2, FUCA1, PNRC2, SRSF10, MYOM3, SLC16A12,
PANK1, and PCGF5. It also included important GO terms linked to the
cellular response to follicle-stimulating hormone stimuli, the fucose
metabolic process, the regulation of cell growth, growth factor
binding, carboxylic ester hydrolase activity, and palmitoyl-(protein)
hydrolase activity ([144]Table S6: Supplementary Material File S6).
Important pathways were found to be related to the dopamine
receptor-mediated signalling pathway and Coenzyme A biosynthesis
([145]Table S7: Supplementary Material File S7).
4. Discussion
To date, this is the first study identifying signatures of selection in
South African pig populations from different genetic backgrounds. We
included animals from commercial farms and villages, as well as
indigenous and wild roaming pigs. The adaptation footprints across
these genomic landscapes were evaluated using within and
cross-population selection statistics. Although these methods accounted
for small population samples, their statistical power was diminished by
the small sample sizes used for bush pig, wild boar and Vietnamese
potbelly pig populations [[146]49]. Nevertheless, a number of genomic
regions containing significant evidence of population-specific
selection signatures were detected in the case of of wild boar, which
we explored further in this study.
The population genomic approach utilised in this study allowed for the
identification of genomic regions under natural selection, such as the
indigenous Kolbroek and Windsyner, as well as in the wild boar.
Specifically, the region on chromosome 5 included a putative KCNJ3
gene, which may be associated with an udder structure in cattle that is
typically important for production efficiencies as well as animal
health and welfare [[147]50]. This region also encoded genes involved
in signalling pathways such as muscarinic acetylcholine receptors that
are G protein-coupled receptors (GPCRs) playing a key role in
regulating many fundamental functions (e.g., motor control, temperature
control, control of inflammation, cell growth, and cell proliferation,
as well as control of the airways, gastrointestinal and urinary tracts,
cardiovascular system, the central nervous system, and eye) [[148]51].
The region on chromosome 13 under natural selection in the wild boar
population encoded a putative GHRL gene known to regulate growth and
development in pigs [[149]52,[150]53]. The identification of these
regions thus provides an opportunity to elucidate the genetic basis of
the adaptive evolution of local wild and indigenous pig populations in
the future including larger sample sizes.
Signatures of selection identified in the commercial pig populations
included regions associated with traits such as meat and carcass
quality. This is expected, as the Large White, Durocs and South African
Landrace pigs are bred for meat production [[151]5,[152]10]. Because of
this strong artificial selection and because the internal mechanism is
the selection of genes, genes in these regions associated with meat and
carcass quality included CORIN on chromosome 8, TMPRRSS4 on chromosome
9, SLC44A5 on chromosome 6, APBB2 on chromosome 8, TECTA on chromosome
9, LIPA and IDE on chromosome 14, and ITGA2 on chromosome 16. The DECR1
gene on chromosome 4 is associated with cholesterol levels amongst
other meat quality and growth traits. Regions associated with meat and
carcass quality were also identified among the indigenous breeds. For
example, the indigenous Kolbroek and Windsnyer breeds included the JPH1
gene observed on chromosome 4, which has previously been linked to meat
and carcass quality in pigs [[153]54,[154]55]. Furthermore, Hoffman et
al. [[155]56] observed that meat from Kolbroek pigs can be processed
into bacon, ham, and chops. This shows that indigenous breeds can also
be identified with traits despite their slow growth rate.
Among the genomic regions displaying signatures of selection, some were
associated with fatness, an important economic trait in pig farming
[[156]57]. For example, ITGA11 on chromosome 1 is associated with an
obesity index that determines fat deposition in pigs and other animals
[[157]40]. The genomic regions (chromosomes 5 and 6) identified with
all three the statistics were linked with QTLs associated with
intra-muscular fat content, meat to fat ratio, and body weight. Regions
that are associated with excess fat deposition when fed improved diets
[[158]36,[159]58] present an opportunity to genetically improve meat
quality in these breeds. A study by Jung et al. [[160]59] and Ren et
al. [[161]60] observed that consumers preferred lean pork with high
intramuscular fat content. As a result, commercial breeds displayed
lower fat levels compared to European and Chinese breeds
[[162]61,[163]62]. While commercial breeds (e.g., Large White, Duroc,
and Landrace) have low levels of fat tissue, European breeds (e.g.,
Iberian and Mangalica pigs) and Chinese breeds are predisposed to
accumulate excess amounts of adipose tissue [[164]62,[165]63,[166]64].
Hoffman et al. [[167]65] reported consumers’ preference towards meat
with a higher lean percentage. Wild boars have low intra-muscular fat
and are categorised under game meat that has high protein and iron and
that is considered healthier than ordinary pork or beef meat
[[168]66,[169]67,[170]68]. Since pig breeds vary when it comes to fat
tissue deposition with heritability levels being around 0.5
[[171]61,[172]62], obesity indices and intra-muscular fat can be used
as potential tools for selecting animals with desirable meat and
carcass qualities. For example, SCPEP1 identified in this study
regulates body fat content and correlates with intra-muscular fat
deposition in pigs [[173]69].
Genomic regions displaying signatures of selection were associated with
reproduction traits such as litter size and total number born alive
from a sow, semen volume, sperm concentration, sperm motility, etc. For
example, PIK3R5 is one of the genes identified in the O.R. Tambo
population that influences litter size at birth and the number of
piglets born alive [[174]70], which is important as pigs differ greatly
in litter size. For example, the wild boar sows an average if 6.6
litters [[175]71] per year versus an average of 14 to 15.3 litters per
sow in Large White breeds [[176]72,[177]73], while indigenous breeds
such as Kolbroek average at 810 piglets [[178]74]. Nowadays, the pig
industry in Europe has been yielding 18–20 litters per sow [[179]75].
This high litter number has a negative implication on the physiological
tolerance for both sows and litters. The good mothering ability and
hardiness of sows ensure high survival rates for the litters.
Commercial breeds have an advantage being raised in the intensive
production system. Several genomic regions contain QTLs associated with
the number of teats on chromosomes 1, 2, 5, 6, 14, 16, and 18. The
number of teats is an important trait as it ensures that piglets have
adequate access to milk from the sow. The number of teats can have
effects on the weaning weight of a piglet and a smaller number of teats
in a sow reduces piglets’ survival rate [[180]76]. In commercial breeds
such as Large White and Duroc, a sow can have as many as 19 teats
[[181]77]. Makhanya [[182]78] reported the number of teats to be an
average of 10 in indigenous Kolbroek pigs. Various studies have shown
that the number of teats is an essential morphological and reproductive
trait that has been under selection for many generations in the pig
industry [[183]77,[184]79].
A high number of regions that display significant selection were
detected in the South African village pig populations. This is similar
to what was previously seen for cattle, where Van Hossou et al.
[[185]80] also reported a higher number of selection signatures in
admixed West African cattle populations in Benin. The presence of more
selection signatures in village pig populations compared to that in
other populations can be attributed to several factors. One possible
explanation is that genetic diversity may provide a broader pool of
genetic variants for selection to act upon, resulting in a higher
number of selection signatures. Several genes related to health and
resistance to parasites were identified in the village populations,
which is well in line with the sturdy nature of this breed. This
included the APBB2 gene present in regions under selection in the
Alfred Nzo, Mopani, and Capricorn populations, which was shown to
regulate inflammatory responses during infection with porcine
reproductive and respiratory syndrome virus, which is a major
respiratory pathogen of pigs [[186]71]. The LIPA gene under selection
in the Capricorn population may be involved in the response to wounds
and inflammations, as well as in the molecular genetic mechanisms
affecting fecundity in sheep [[187]72]. Village pigs are well adaptable
to local harsh conditions, and this makes them important genetic
resources that provide new diversity for the improvement of commercial
lines.
Another explanation for the high number of regions is the admixture
between village pigs and commercial pigs, which could allow for an
improvement in economic traits such as reproduction, growth, and
carcass traits among village pigs. Crossbreeding with commercial pigs
has allowed for the introduction of genetic variants that are
advantageous for these traits in not only village pigs but also
indigenous pigs. For example, regions displaying selection signatures
included genes for meat and carcass quality in pigs in the village
(e.g., SCPEP1 and SAMD4A) and indigenous (e.g., JPH1) populations
[[188]54,[189]55,[190]69]. The admixed genomes that result from the
interbreeding of previously isolated populations can carry genetic
signatures that resemble signals of positive selection. Therefore, the
possibility that some of the genomic selection signatures identified
here stem from historical admixture (i.e., they represent the “ghosts”
of introgression) and not recent adaptive events could not be
discounted [[191]81]. Although further research would be needed to
distinguish these types of signatures in all of the populations
examined, the genetic remnants of past genetic exchange in admixed
genomes may represent a valuable source of variation for further
selection and/or adaptation [[192]81].
5. Conclusions
This study identified several regions displaying significant signatures
of selection, which are the result of natural and artificial
directional selection events that have contributed to the adaptation of
breeds to different environments and production systems of these pig
populations. These signatures of selection allowed for the
identification of the genomic regions and evolutionary processes that
have shaped the populations and affect important phenotypic traits.
These included traits related to reproduction, production, health, and
meat and carcass quality. Meat and carcass QTLs were prevalent in all
the populations, showing the potential of village and indigenous
populations’ ability to be managed and improved for such traits. Our
findings also confirm that genetic resources from villages and wild
pigs are important for research as they are not influenced by selection
when compared to commercial breeds. Additionally, as BeadChip, used in
this study, may not be dense enough to fully understand the signatures
between domestic and wild pigs, further research based on larger
population sizes is required.
Acknowledgments