Abstract
Background
   The cattle introduced by European conquerors during the Brazilian
   colonization period were exposed to a process of natural selection in
   different types of biomes throughout the country, leading to the
   development of locally adapted cattle breeds. In this study,
   whole-genome re-sequencing data from indicine and Brazilian locally
   adapted taurine cattle breeds were used to detect genomic regions under
   selective pressure. Within-population and cross-population statistics
   were combined separately in a single score using the de-correlated
   composite of multiple signals (DCMS) method. Putative sweep regions
   were revealed by assessing the top 1% of the empirical distribution
   generated by the DCMS statistics.
Results
   A total of 33,328,447 biallelic SNPs with an average read depth of
   12.4X passed the hard filtering process and were used to access
   putative sweep regions. Admixture has occurred in some locally adapted
   taurine populations due to the introgression of exotic breeds. The
   genomic inbreeding coefficient based on runs of homozygosity (ROH)
   concurred with the populations’ historical background. Signatures of
   selection retrieved from the DCMS statistics provided a comprehensive
   set of putative candidate genes and revealed QTLs disclosing cattle
   production traits and adaptation to the challenging environments.
   Additionally, several candidate regions overlapped with previous
   regions under selection described in the literature for other cattle
   breeds.
Conclusion
   The current study reported putative sweep regions that can provide
   important insights to better understand the selective forces shaping
   the genome of the indicine and Brazilian locally adapted taurine cattle
   breeds. Such regions likely harbor traces of natural selection
   pressures by which these populations have been exposed and may
   elucidate footprints for adaptation to the challenging climatic
   conditions.
   Keywords: Bos taurus indicus, Bos taurus taurus, Signatures of
   selection, Local adaptation, Next-generation sequencing
Background
   The first cattle herds were brought to Brazil by Portuguese conquerors
   in 1534 during the Brazilian colonization period [[43]1]. These cattle
   have undergone to a process of natural selection for more than
   450 years in a wide range of ecosystems throughout the country [[44]2].
   Natural selection in a remarkably diverse set of environments together
   with recurring events of breed admixture led to the development of
   locally adapted cattle breeds, i.e. Curraleiro Pé-Duro, Pantaneiro,
   Crioulo Lageano, Caracu, and Mocho Nacional [[45]3]. By the end of the
   nineteenth century, the increasing demand for food supply triggered the
   imports of exotic and more productive breeds of indicine origin [[46]3,
   [47]4]. As a consequence, a reduction in locally adapted cattle breed
   populations has occurred to such an extent that nowadays, most of them
   are threatened with extinction [[48]3, [49]5].
   Brazilian locally adapted cattle breeds have been subjected to strong
   environmental pressures and faced several difficulties including hot,
   dry or humid tropical climate conditions, scarce food availability,
   diseases, and parasite infestations without any significant selective
   pressure imposed by man [[50]2]. Influenced by the environment and
   shaped by natural selection, these animals acquired very particular
   traits to thrive in distinct ecosystems, which has presumably left
   detectable signatures of selection within their genomes. In this
   regard, Brazilian locally adapted cattle breeds represent an important
   genetic resource for the understanding of the role of natural selection
   in diverse environments, providing new insights into the genetic
   mechanisms inherent to adaptation and survivorship [[51]6]. Although
   their productivity is much lower compared to highly-specialized breeds
   under intensive production systems [[52]7, [53]8], great efforts have
   been made to improve our knowledge of locally adapted breeds [[54]5,
   [55]9, [56]10] and their use in crossbred schemes.
   According to Utsunomiya et al. [[57]11], signatures of selection
   studies should strongly focus on small local breeds given their
   endangered status and the putative importance of their genomes in
   unraveling footprints of selection by elucidating genes and structural
   variants underlying phenotypic variation. Advances in molecular
   genetics and statistical methodologies together with the availability
   of whole-genome re-sequencing has notably improved the accuracy to
   disentangle the effects of natural and artificial selection in the
   genome of livestock [[58]12–[59]14]. However, despite the recent
   achievements in high-throughput sequencing, studies to detect positive
   selection in endangered Brazilian locally adapted cattle breeds are
   incipient. Previous studies on such breeds have mainly focused on
   population structure and genetic diversity using Random Amplified
   Polymorphic DNA (RAPD), pedigree data, microsatellite, and
   Single-Nucleotide Polymorphism (SNP) arrays [[60]15–[61]19].
   In this study, we report for the first-time signatures of selection
   derived from whole-genome re-sequencing data in three Brazilian locally
   adapted taurine cattle breeds as well as in one indicine breed.
   Potential biological functions of the genes screened within the
   putative candidate regions were also examined to better elucidate the
   phenotypic variation related to adaptation shaped by natural selection.
Results
Data
   DNA samples from 13 Gir (GIR), 12 Caracu Caldeano (CAR), 12 Crioulo
   Lageano (CRL), and 12 Pantaneiro (PAN) re-sequenced to 15X genome
   coverage were used. An average alignment rate of 99.59% was obtained.
   After SNP calling and filtering, a total of 33,328,447 SNPs distributed
   across all 29 autosomes were retained for subsequent analyses with an
   average read depth of 12.37X (9.57 ~ 17.52X).
Variant annotation and enrichment
   Of the total SNPs identified (n = 33,328,447 SNPs), most of them were
   located in intergenic (67.17%) and intronic (25.85%) regions
   (Additional file [62]1). A total of 1,065,515 (3.19%) variants were
   located in the 5-kb regions upstream from genes, and 928,061 (2.78%) in
   the 5-kb regions downstream from genes. Several variants with high
   consequence on protein sequence were identified, including splice
   acceptor variant (n = 471), splice donor variant (n = 481), stop gained
   (n = 1111, stop lost (n = 58), and start lost (n = 208). According to
   SIFT scores, 24,159 variants (23,428 missense, 578 splice region, and
   143 start lost) were classified as deleterious.
   Following variant annotation, we further investigated the gene content
   within the predicted variants to cause relevant biological functions. A
   total of 1189 genes were described within variants with high
   consequence on protein sequence and 7373 genes within those causing a
   deleterious mutation based on the SIFT score. Functional enrichment
   analysis revealed several gene ontology (GO) terms and one Kyoto
   Encyclopedia of Genes and Genomes (KEGG) pathway overrepresented
   (p < 0.01) for the set of genes previously described
   (Additional files [63]2 and [64]3), however, none of them have been
   associated with the traits/phenotypes that could be affected by the
   natural selection which those breeds have been subjected to.
Population structure
   The population structure among breeds was dissected by analyzing the
   first two principal components, which accounted for roughly 20% of the
   genetic variability and divided the populations into three clusters
   (Fig. [65]1a). A clear separation could be observed between indicine
   (Bos taurus indicus) and locally adapted taurine (Bos taurus taurus)
   populations. Within the taurine populations, the greatest overlap of
   genetic variation was observed between CRL and PAN breeds. Despite
   clustering together, the analysis of molecular variance (AMOVA)
   revealed genetic differentiation between those two breeds (p < 0.001,
   Additional file [66]4), indicating that all four breeds could be
   considered as genetically independent entities. Further, when analyzing
   the first two principal components encompassing the locally adapted
   taurine cattle breeds (Fig. [67]1b), an evident separation could be
   observed between CAR and the remaining two populations. The analysis
   also distinguished CRL from PAN, agreeing with the AMOVA results.
Fig. 1.
   [68]Fig. 1
   [69]Open in a new tab
   Principal components analysis (PCA) scores plot with variance explained
   by the first two principal components in brackets. a PCA scores for the
   four breeds (Caracu Caldeano – CAR, Crioulo Lageano – CRL, Gir – GIR,
   and Pantaneiro - PAN. b PCA scores for the locally adapted taurine
   cattle breeds (Caracu Caldeano – CAR, Crioulo Lageano – CRL, and
   Pantaneiro – PAN)
   Admixture analysis was performed to further estimate the proportions of
   ancestry (K) in each population (Fig. [70]2). The lowest
   cross-validation error (0.387) was observed for K = 2, revealing the
   presence of two main clusters differentiating the locally adapted
   taurine populations from the indicine population. Within the taurine
   populations, the CAR breed did not show admixed ancestry while CRL and
   PAN breeds showed 77% of taurine and 23% of indicine ancestry on
   average. When K = 3 was assumed, CRL samples revealed evidence of
   admixed ancestry from other breeds, whereas PAN samples were quite
   homogeneous, with little indication of introgression from other breeds.
   CAR and GIR breeds displayed a greater uniformity and did not reveal
   major signs of admixture of other breeds, being consistent with K = 2.
Fig. 2.
   [71]Fig. 2
   [72]Open in a new tab
   Population structure inferred by using the ADMIXTURE software. Each
   sample is denoted by a single vertical bar partitioned into K colors
   according to its proportion of ancestry in each of the clusters.
   Ancestral contributions for K = 2 and K = 3 are graphically represented
Genomic inbreeding
   Descriptive statistics for runs of homozygosity-based inbreeding
   coefficients (F[ROH]) are shown in Table [73]1. The average inbreeding
   coefficients did not differ significantly (p < 0.05) among breeds, with
   the exception of CAR animals. It is worth to highlight that these
   animals also displayed the smallest inbreeding variability among all
   breeds, supported by the lowest coefficient of variation.
Table 1.
   Descriptive statistics of runs of homozygosity-based inbreeding
   coefficient (F[ROH]) for Gir (GIR), Crioulo Lageano (CRL), Caracu
   Caldeano (CAR), and Pantaneiro (PAN) cattle breeds
        Breed        Mean  Median Minimum Maximum Coefficient of variation (%)
   Gir             0.040^b 0.038  0.020   0.060   29.37
   Crioulo Lageano 0.036^b 0.028  0.017   0.082   53.69
   Caracu Caldeano 0.138^a 0.140  0.121   0.153   8.63
   Pantaneiro      0.045^b 0.042  0.022   0.096   43.56
   [74]Open in a new tab
   Means sharing a common letter within a column were not significantly
   different (p < 0.05) from one another
Selective sweeps
   A total of 499 putative sweep regions encompassing 221 genes were
   identified from the top 1% of the empirical distribution generated by
   the within-population de-correlated composite of multiple signals
   (DCMS) statistic [[75]20] (Fig. [76]3, Additional file [77]5). For the
   cross-population DCMS statistic, the top 1% of the empirical
   distribution revealed 503 putative sweep regions comprehending 242
   genes (Additional file [78]6). The Bos taurus autosome (BTA) 3
   displayed the highest number of putative sweep regions for the
   within-population DCMS statistic (n = 33), while BTA11 did for the
   cross-population DCMS statistic (n = 67). The functional importance of
   the annotated genes was assessed by performing GO and KEGG pathway
   enrichment analysis separately for each DCMS statistic and its
   respective retrieved gene list. No overall significant enrichment of
   any particular GO nor KEGG was found after adjusting the p-values for
   False Discovery Rate [[79]21].
Fig. 3.
   [80]Fig. 3
   [81]Open in a new tab
   Whole-genome signatures of selection for the within-population DCMS
   statistic (outer circle) and cross-population DCMS statistic (inner
   circle). The x-axis shows the window position along the chromosome, and
   the y-axis the DCMS value associated with such window. Reds dots
   correspond to the top 1% of the empirical distribution generated by the
   DCMS statistics
   Five genomic regions overlapped between the candidate sweep regions of
   the within-population and cross-population DCMS statistics
   (BTA4:101600000–101,650,000, BTA5:3700000–3,750,000,
   BTA9:98650000–98,700,000, BTA11:22300000–22,350,000, and
   BTA11:53900000–53,950,000). When inspecting in detail, the region on
   BTA4:101600000–101,650,000 harbored two quantitative trait locus (QTL)
   with functions related to the bovine respiratory disease [[82]22] and
   body condition score [[83]23]. The remaining four regions have not been
   associated with any QTL in cattle so far, however, they were found to
   be in close vicinity (~ 15 to 237 kb) with specific QTLs for beef
   cattle production traits. Such QTLs included body weight at yearling,
   calving ease, body weight gain, and marbling score [[84]24–[85]26].
   Further, among the five overlapping candidates sweep regions, only the
   one on BTA9 was found to harbor a gene, the PRKN.
Selective sweeps and runs of homozygosity
   Shared genomic regions harboring several protein-coding genes were
   identified between runs of homozygosity (ROH) hotspots and the putative
   sweep regions retrieved from the DCMS statistics (Table [86]2). ROH
   hotspots for each breed are described in Additional file [87]7. For the
   shared regions disclosed when considering the within-population DCMS
   statistic, the ones located on BTA1:8300000–8,350,000 and
   BTA1:41600000–41,650,000 coincided with a QTL for somatic cell score
   [[88]27] and maturity rate [[89]28], respectively. It is noteworthy to
   underscore that despite not displaying any overlapping QTL, the region
   on BTA8:15700224–15,700,228 was described nearby (~ 99 kb) a QTL for
   tick resistance [[90]29], and those on BTA21:6550000–6,600,000 and
   BTA21:63250000–63,300,000 were very close (< 14 kb) to QTLs for
   reproductive-related traits [[91]30, [92]31]. When considering the
   cross-population DCMS statistic, the candidate regions overlapped
   previously identified QTLs formerly implicated in dairy-related
   [[93]35–[94]37, [95]39] and body-related (weight [[96]24], energy
   content [[97]34], and conformation [[98]35]) traits. Further, several
   QTLs associated with body conformation and growth [[99]23, [100]24,
   [101]33], reproductive-related traits [[102]28, [103]32], and coat
   texture [[104]38] were described to be in very close proximity (~ 18.98
   to 88.38 kb).
Table 2.
   Gene annotation and reported QTLs for the shared genomic regions
   between runs of homozygosity (ROH) hotspots and the putative sweep
   regions retrieved from the within-population and cross-populations DCMS
   statistics
   BTA^1 Start End Genes QTL^2
   Within-population DCMS statistic x ROH
    1 8,300,000 8,350,000 – Somatic cell score [[105]27]
    1 41,600,000 41,650,000 EPHA6, ARL6 Maturity rate [[106]28]
    1 112,250,000 112,300,000 KCNAB1 –
    8 15,800,000 15,850,000 – Tick resistance [[107]29]
    15 35,365,655 35,399,999 OTOG –
    15 35,400,001 35,450,000 – –
    18 34,718,675 34,750,000 CDH16, RRAD –
    21 6,550,000 6,600,000 ADAMTS17 Calving ease [[108]30]
    21 63,250,000 63,300,000 VRK1 Interval to first estrus after calving
   [[109]31]
   Cross-population DCMS statistic x ROH
    3 77,250,000 77,300,000 – Body condition score [[110]23]
    5 31,800,000 31,850,000 – Body weight (yearling) [[111]24], Conception
   rate [[112]32]
    5 38,761,637 38,761,745 YAF2
    7 57,050,000 57,100,000 – Rump angle [[113]33]
    11 67,450,000 67,500,000 ANTXR1, GFPT1 Body weight (yearling)
   [[114]24], Body energy content [[115]34]
    11 67,700,000 67,749,999 – –
    11 67,750,001 67,800,000 NFU1 –
    11 68,550,000 68,600,000 PCYOX1 –
    14 52,900,000 52,914,848 – Maturity rate [[116]28]
    15 10,150,000 10,200,000 – –
    15 10,900,000 10,950,000 – Calving ease (maternal) [[117]35], Daughter
   pregnancy rate [[118]35], Foot angle [[119]35], Milk fat percentage
   [[120]35], Milk fat yield [[121]35], Net merit [[122]35], Length of
   productive life [[123]35], Milk protein percentage [[124]35], Milk
   protein yield [[125]35], Calving ease [[126]35], Somatic cell score
   [[127]35]
    20 38,000,000 38,050,000 RANBP3L, NADK2 Milk protein percentage
   [[128]36], Milk protein yield [[129]37], Milk yield [[130]37], Coat
   texture [[131]38]
    21 200,000 250,000 – –
    25 1,345,564 1,350,000 NME3, MRPS34 Milk fat yield [[132]39]
   [133]Open in a new tab
   ^1 BTA: Bos taurus autosome; ^2 QTLs within the candidate genomic
   regions are highlighted in bold. Non-bold QTLs were the closest and
   most suitable candidate QTL for the given candidate region
Overlap with candidate regions under positive selection in other cattle
populations
   Several putative sweep regions identified from the top 1% of the
   empirical distribution generated by the within-population and
   cross-population DCMS statistics were in agreement with previous
   research on signatures of selection in cattle (Additional files [134]8
   and [135]9, respectively). Such studies included indigenous African and
   Spanish [[136]6, [137]40–[138]43], native [[139]44–[140]46],
   tropical-adapted [[141]6, [142]47–[143]49], Chinese [[144]49, [145]50],
   and commercial beef and dairy [[146]13, [147]41, [148]49,
   [149]51–[150]54] cattle breeds. For the five genomic regions identified
   overlapping in between the DCMS statistics, the one on
   BTA9:98650000–98,700,000 matched with a previous study on cattle breeds
   selected for dairy production [[151]54]. Besides, common signals found
   between ROH hotspots and the within-population and cross-population
   DCMS statistics were also supported by previously published data on
   signatures of selection [[152]6, [153]41, [154]43, [155]44, [156]46,
   [157]50, [158]53] (Additional files [159]10 and [160]11, respectively).
Discussion
Population structure
   The segregation between indicine and taurine cattle populations
   described in both principal component and admixture analysis (K = 2)
   reflects the divergence and evolutionary process started roughly two
   million years ago [[161]55, [162]56]. As a result of the domestication
   process and selective breeding over time, the cattle can be classified
   into temperate (Bos taurus taurus or taurine) and tropical (Bos taurus
   indicus or indicine) based on the common adaptive and evolutionary
   traits they have acquired [[163]57]. Within the Brazilian locally
   adapted taurine breeds, the principal component analysis (PCA)
   indicates the highest relatedness between CRL and PAN breeds and their
   divergence from the CAR breed may be explained by the European cattle
   type introduced in Brazil during the colonization period [[164]58].
   These results were similar to those obtained using RAPD [[165]17] and
   microsatellites [[166]19]. Portuguese purebred cattle brought to Brazil
   belonged to three different bloodlines: Bos taurus aquitanicus, Bos
   taurus batavicus, and Bos taurus ibericus. In this regard, CRL and PAN
   breeds descended from a common ancestral pool and have their origin in
   breeds from Bos taurus ibericus cattle, while the CAR cattle is derived
   from the Bos taurus aquitanicus cattle [[167]17]. Further, the
   divergence within the locally adapted cattle breeds may be a result of
   artificial selection events over time since the CAR cattle have been
   selected for milk production for the past 100 years, while CRL and PAN
   started recently to be artificially selected.
   Levels of introgression of indicine genes in taurine breeds described
   herein are consistent with previous studies on Brazilian locally
   adapted taurine breeds [[168]16, [169]17, [170]19]. This gene flow
   reinforces the concept that the import of exotic breeds at the
   beginning of the twentieth century [[171]3] led to the miscegenation of
   the locally adapted breeds due to crossbreeding practices, resulting
   nearly in their extinction [[172]4]. In this regard, the CRL breed
   experienced some introduction of Nellore (Bos taurus indicus) genes for
   a short period in the eighties [[173]17], which can be visualized when
   assuming K = 2 and K = 3. Concurring with our findings, Egito et al.
   [[174]19] also revealed that CRL and PAN animals were the closest to
   the indicine cattle among four Brazilian locally adapted cattle breeds,
   displaying the highest frequency of indicine gene introgression. A
   cytogenetic analysis study on the PAN cattle also revealed absorbing
   crosses with the indicine cattle [[175]59]. In addition, the absence of
   admixture patterns in CAR individuals has been previously described by
   Campos et al. [[176]16] and Egito et al. [[177]21]. The homogeneity of
   such population most likely reflects its formation process and the
   objective of selection for dairy traits since 1893 [[178]60], which may
   have distinguished them from other locally adapted taurine breeds when
   taking into consideration the genetic structure integrity.
Genomic inbreeding
   As already stated, the Brazilian locally adapted cattle breeds nearly
   disappeared between the late 19th and beginning of the twentieth
   century, and most of them are nowadays threatened with extinction
   [[179]3, [180]5]. It is worth to stress out that the CAR cattle are an
   exception, and they can be considered as an established breed [[181]5,
   [182]61]. In this regard, animals comprising our dual purpose cattle
   populations, which were exploited for meat production in former times
   [[183]62], are nowadays mainly used in animal genetic resources
   conservation programs (in situ and ex situ) and as a germplasm
   reservoir to preserve the genetic variability [[184]4, [185]63].
   Different from the dual-purpose cattle populations, the dairy
   populations are no longer considered endangered, and such animals have
   been selected for milk production traits in the southeastern region of
   Brazil since 1893 (CAR, [[186]60]) and the early nineties (GIR,
   [[187]64]).
   Most of the locally adapted cattle breeds in Brazil developed from a
   narrow genetic base, and in such cases, inbreeding can increase over
   generations and reduce genetic variability [[188]65]. Despite their
   population background, CRL and PAN animals displayed low F[ROH]
   estimates, concurring with heterozygosity estimates (Results not
   shown). Decreased levels of inbreeding and high genetic variability
   have been previously described for both breeds, probably resulting from
   a slight selection pressure and herd management focused on maintaining
   genetic diversity by using a male:female relationship larger than usual
   [[189]19]. Egito et al. [[190]15] attributed such results to the
   formation of new PAN herds from 2009 onwards while Pezzini et al.
   [[191]18] associated it with the diversification in the use of CRL
   sires. Further, Egito et al. [[192]19] stated that CRL and PAN cattle
   were the most diverse population with the highest mean allelic richness
   among four locally adapted cattle breeds investigated. Such results are
   consistent with F[ROH] estimates found in this current work, reflecting
   mild selection pressure in our dual-purpose cattle populations together
   with rationale mating decisions and herd management taken by the
   breeders and associations.
   The highest F[ROH] found for the CAR population most likely reflects
   its history of selective breeding for milk-related traits from a
   limited genetic base and the occurrence of a population decrease in the
   sixties, as discussed by Egito et al. [[193]19]. According to Marras et
   al. [[194]66], it is not unusual to disclose a higher sum of ROH in
   dairy than in beef populations. In this regard, the reduction of
   genetic variability through the increase of autozygosity in dairy
   breeds can be explained by the intense artificial selection with the
   use of a relatively small number of proven sires [[195]67]. Despite
   being also specialized for milk-related traits, it is not surprising
   that the GIR population did not show as high F[ROH] levels as did CAR.
   Previous studies have also shown low inbreeding rates for the GIR
   cattle considering pedigree-based inbreeding coefficient [[196]68,
   [197]69] and F[ROH] [[198]70, [199]71]. A trend in the decrease of
   inbreeding has been previously described [[200]68, [201]70], and it
   happens along with the establishment of the Brazilian Dairy Gir
   Breeding Program (PNMGL) and the Gir progeny testing. Presumptively,
   these two concomitant events led to the dissemination of the breed,
   allowing formerly closed herds to start using semen of proven sires,
   increasing the overall genetic exchange and reducing the average
   inbreeding over time.
Candidate regions under positive selection
   After combining the top 1% putative sweep regions retrieved from the
   within-population and cross-population DCMS statistics, five candidate
   regions harboring two QTLs and only one protein-coding gene were
   identified. Such results allowed us to highlight the body condition
   score QTL [[202]23] on BTA4:101600000–101,650,000, which can be defined
   as the amount of metabolized energy stored in fat and muscle of a live
   animal [[203]72]. During periods of energy shortage, key hormones
   expression and tissue responsiveness adjust to increase lipolysis to
   meet energy requirements and maintain physiological equilibrium
   [[204]73, [205]74]. Regulation and coordination of energy partitioning
   and homeostasis is a challenge to sustainable intensification of cattle
   productivity in the tropics. The variation in the animal’s nutritional
   and energetic balance may explain the observed variability in
   performance between animals in different environments [[206]75].
   Negative energy balance most likely reduce energy expenditure,
   impairing reproductive performance [[207]76], and increasing the
   susceptibility to infections [[208]77]. As formerly described, the
   Brazilian locally adapted cattle breeds faced several environmental
   pressures to thrive in the tropics under harsh environmental
   conditions, suggesting that animals that were able to minimize the
   mobilization of adipose tissue reserves in response to the energy
   deficit might have conferred fitness advantage than the average
   individual in the given population.
   The PRKN (also known as PARK2) was the only annotated gene identified
   in between the DCMS statistics, and its functions have been associated
   with adipose metabolism and adipogenesis [[209]78]. Remarkably, it is
   considered a strong positional candidate for adiposity regulation in
   chicken [[210]79].
   We also explored common signals between ROH hotspots and the top 1%
   putative sweep regions retrieved from both DCMS statistics to increase
   the power of signals. Among the genes identified when considering the
   within-population DCMS statistic, we revealed the presence of two
   interesting genes that have been described to have effects on
   temperament (EPHA6) [[211]80] and body size (ADAMTS17) [[212]81] in
   cattle. Further, one gene associated with temperament (ANTXR1)
   [[213]82] was also highlighted when considering the cross-population
   DCMS statistic.
   In tropical and subtropical regions, cattle productivity depends not
   only on the inherent ability of animals to grow and reproduce but also
   on their ability to overcome environmental stressors that impact
   several aspects of cattle production [[214]83]. In cattle, stress
   responsiveness has been associated with cattle behavior, more
   specifically, temperament. Temperament can adversely affect key
   physiological processes involved in cattle growth, reproduction, and
   immune functions [[215]84]. Studies have shown that non-temperamental
   cattle tend to gain weight faster [[216]85–[217]87], spend more time
   eating [[218]87], and have a higher dry matter intake and average daily
   gain [[219]85, [220]88] than temperamental cattle. Further, studies
   have discussed the negative impacts of temperamental animals on
   immune-related functions (reviewed by [[221]84]). Two reasons might
   explain those genes associated with temperament located on ROH hotspots
   overlapping regions on BTA1:41600000–41,650,000 and
   BTA11:67450000–67,500,000. The first reason is that such genes likely
   reflect levels of introgression of indicine genes in locally adapted
   taurine cattle breeds, as confirmed by admixture analysis. Bos
   taurus indicus and their crosses have been reported to be more
   temperamental than Bos taurus taurus cattle when reared under similar
   conditions [[222]89]. The second reason is that the locally adapted
   taurine cattle breeds were able to overcome environmental stressors
   through natural selection over time and could prosper in such harsh
   tropical environment.
   The ADAMTS17 gene, described enclosing a ROH hotspot overlapping region
   on BTA21:6550000–6,600,000, is a well-known candidate gene with a major
   impact on body size [[223]81, [224]90, [225]91]. Much has been
   discussed about the relationship between body size and environmental
   adaptation. Variations in body size may be explained as an adaptive
   response to climate and/or can be driven by changes in feed resources
   and seasonal influences [[226]92, [227]93]. In this regard, large body
   size animals can better tolerate austere conditions, having advantages
   under cold stress as well as in the use of abundant forage resources
   [[228]94]. On the other hand, smaller animals exhibit better adaptation
   to warmer and dry climates [[229]95–[230]97] and are more efficient for
   grazing under seasonal and scarce forage resources [[231]98]. Based on
   morphological measurements, it should be noted that the indicine and
   Brazilian locally adapted taurine cattle breeds are small to
   medium-sized breeds. Both GIR, CRL, and PAN have reduced body size and
   lightweight, in which females exhibit an average adult live weight of
   418 kg [[232]99], 430 kg [[233]100], and 298 kg [[234]101],
   respectively. CAR animals have a greater body size among the locally
   adapted cattle breeds, with females displaying an average live weight
   of 650 kg [[235]102].
   Two intersecting QTLs associated with productivity traits usually
   favored in commercial breeds (somatic cell score and maturity rate
   QTLs) were found in ROH hotspots overlapping regions when considering
   the within-population DCMS statistic. Among the QTLs identified when
   considering the cross-population DCMS statistic, the one associated
   with body energy content [[236]34] must be highlighted given its
   importance in energy partitioning and homeostasis, as previously
   discussed. Additionally, several remarkably QTLs neighboring the
   candidate regions intervals were identified. These QTLs have been
   associated with different biological functions linked to local
   environment adaptation, such as parasite vector resistance (tick
   resistance QTL), reproductive-related traits (calving ease, interval to
   first estrus after calving, conception and maturity rate QTLs), body
   conformation and morphology traits (body condition score, body weight
   at yearling, rump angle QTLs), and coat color (coat texture QTL).
   The genes and QTLs identified within the candidate regions provide a
   hint about the selective forces shaping the genome of the indicine and
   Brazilian locally adapted taurine cattle breeds. Such selective forces
   were described to be likely associated with adaptation to a challenging
   environment and environmental stressors. Further, several QTLs
   identified nearby the candidate regions intervals were also associated
   to a lesser extent with beef cattle production traits, while others
   with various biological functions presumably linked to selection to
   environmental resilience as well.
Overlap with candidate regions under positive selection in other cattle
populations
   The greatest number of the putative sweep regions identified from the
   top 1% of the within-population DCMS statistic overlapped with
   candidate regions under positive selection previously reported in five
   cattle breeds selected for dairy production [[237]54], comprehending
   roughly 22% (n = 52) of the overlapping regions. For the top 1% of the
   cross-population DCMS statistic, the greatest number was described for
   native cattle breeds from Siberia, eastern and northern Europe
   [[238]46], totaling nearly 17% (n = 50) of the overlapping regions.
   Remarkably, in both statistics, the majority of the shared signals
   within those reported in the literature was found associated with
   specialized cattle breeds (i.e. dairy and beef). We also identified
   signatures of selection within those reported in the literature shared
   by breeds showing different production selection within the same
   candidate region. According to Gutiérrez-Gil et al. [[239]103], such
   genomic regions may reflect selection for general traits such as
   metabolic homeostasis, or they might disclose the pleiotropic effects
   of genes on relevant traits underlying specialized cattle breeds.
   The greater number (seven out of 11) of the putative sweep regions
   shared between ROH hotspots and the top 1% putative sweep regions
   retrieved from both DCMS statistics overlapped with regions previously
   described on local and native cattle breeds [[240]41, [241]43, [242]44,
   [243]46]. Such results allow us to assume that the same selective
   forces are most likely acting across these populations, and such
   regions might have been shaped by selection events rather than genetic
   drift or admixture events.
   It is noteworthy to underscore that the regions under positive
   selection for other cattle populations reported herein were mainly
   obtained through medium and high-density SNP arrays. SNP genotyping
   arrays suffer from SNP ascertainment bias, and it strongly influences
   population genetic inferences (reviewed by Lachance and Tishkoff
   [[244]104]). Besides, some scan methodologies based on site frequency
   spectrum and population differentiation may be more likely to
   ascertainment bias than others [[245]105, [246]106], compromising the
   power of the tests and may yielding to flawed results [[247]107] when
   compared to those obtained from whole-genome re-sequencing data.
Conclusions
   By using whole-genome re-sequencing data, we identified candidate sweep
   regions in indicine and Brazilian locally adapted taurine cattle
   breeds, of which the latter have been exposed to a process of natural
   selection for several generations in extremely variable environments.
   The signatures of selection across the genome could provide important
   insights for the understanding of the adaptive process and the
   differences in the breeding history underlying such breeds. Our
   findings suggest that admixture has occurred in some locally adapted
   taurine populations due to the introgression of exotic breeds, and the
   stratification results revealed the genetic structure integrity of the
   dairy populations sampled in this study. Candidate sweep regions, most
   of which overlapped with or were nearby reported QTLs and candidate
   genes closely linked to cattle production traits and environmental
   adaptation. Putative sweep regions together with ROH hotspots also
   provided valuable shreds of evidence of footprints for adaptation to
   the challenging climatic conditions faced by the breeds. The candidate
   sweeps regions and the gene list retrieved from them can improve our
   understanding of the biological mechanisms underlying important
   phenotypic variation related to adaptation to hostile environments and
   selective pressures events to which these breeds have undergone.
   Furthermore, the study provides complementary information which could
   be used in the implementation of breeding programs for the conservation
   of such breeds.
Methods
Samples, sequencing, and raw data preparation
   Sequencing analysis was based on data from 13 Gir (Bos taurus indicus,
   dairy production use), 12 Caracu Caldeano (Bos taurus taurus, dairy
   production use), 12 Crioulo Lageano (Bos taurus taurus, dual purpose
   use), and 12 Pantaneiro (Bos taurus taurus, dual purpose use) animals.
   The studied breeds can be classified into two groups: (i) indicine
   breeds represented by the Gir (GIR) cattle; and (ii) locally adapted
   taurine cattle breeds encompassing Caracu Caldeano (CAR), Crioulo
   Lageano (CRL), and Pantaneiro (PAN) cattle. Animals were sampled from
   three Brazilian geographical regions, including the south (CRL),
   southeast (GIR and CAR), and mid-west (PAN) (Additional file [248]12).
   DNA was extracted from semen samples that were collected from GIR bulls
   and blood samples from the remaining breeds. The semen straws were
   acquired from three commercial artificial insemination centers
   (American Breeders Service (ABS), Cooperatie Rundvee Verbetering (CRV),
   and Alta Genetics) and the DNA samples from the Animal Genetics
   Laboratory (AGL) at EMBRAPA Genetic Resources and Biotechnology
   (Cenargen, Brasília-DF, Brazil). Paired-end whole-genome re-sequencing
   with 2 × 100 bp reads (CRL) and 2 × 125 bp reads (GIR, CAR, and PAN)
   was performed on the Illumina HiSeq2500 platform with an aimed average
   sequencing depth of 15X.
   Pair-end reads were aligned to the Bos taurus taurus genome assembly
   UMD 3.1 using Burrows-Wheeler Alignment MEM (BWA-MEM) tool v.0.7.17
   [[249]108] and converted into a binary format using SAMtools v.1.8
   [[250]109]. Polymerase chain reaction (PCR) duplicates were marked
   using Picard tools ([251]http://picard.sourceforge.net, v.2.18.2). For
   downstream processing, GATK v.4.0.10.1 [[252]110–[253]112] software was
   used. Base quality score recalibration was performed using a SNP
   database (dbSNP Build 150) retrieved from the NCBI [[254]113] followed
   by SNP calling using the HaplotypeCaller algorithm. To remove
   unreliable SNP calls and reduce the false discovery rate, hard
   filtering steps were applied on the variant call. Insertions and
   deletions polymorphism (Indels) and multi-allelic SNPs were filtered
   out, and then hard filtering was applied for clustered SNPs (> 5 SNPs)
   in a window size of 20 bp. An outlier approach was used and values
   above 14.44 (highest 5%) for Fisher strand test were removed. The same
   was applied for the highest and lowest 2.5% values for base quality
   rank sum test (− 2.26 and 3.04), mapping quality rank sum test (− 2.46
   and 1.58), read position rank sum test (− 1.64 and 2.18), and read
   depth (267 and 883). Variants with a mapping quality value lower than
   30 (0.1% error probability) were also removed from the call set. SNPs
   that passed the filtering process and located on autosomal chromosomes
   were retained for subsequent analysis.
Variant annotation and predicted functional impacts
   A functional annotation analysis of the called variants was performed
   to assess their possible biological impact using the Variant Effect
   Predictor (VEP, [[255]114]) together with the Ensembl cow gene set 94
   release. Variants are categorized according to their consequence impact
   on protein sequence as high, moderate, low, or modifier (more severe to
   less severe). Variants with high consequence on protein sequence (i.e.
   splice acceptor variant, splice donor variant, stop gained, frameshift
   variant, stop lost, and start lost) were selected for further
   assessment. The impact of amino acid substitutions on protein function
   were predicted using the sorting intolerant from tolerant (SIFT) scores
   implemented on VEP tool, and variants with SIFT scores lower than 0.05
   were considered as deleterious to protein function.
   Database for Annotation, Visualization, and Integrated Discovery
   (DAVID) v6.8 tool [[256]115, [257]116] was used to identify
   overrepresented GO terms and KEGG pathways using the list of genes
   retrieved from the variants classified with high consequence on protein
   sequence and as deleterious, and the Bos taurus taurus annotation file
   as a background. The p-values were adjusted by False Discovery Rate
   [[258]21], and significant terms and pathways were considered when
   p < 0.01.
Population differentiation analysis
   A PCA implemented with a custom R script was used to examine the
   genetic structure of the four breeds. AMOVA [[259]117] was also
   implemented to test for genetic differentiation among breeds. Such
   method consists in assessing population differentiation using molecular
   markers together with a pairwise distance matrix, and it can easily
   incorporate additional hierarchical levels of population structure.
   AMOVA computations were conducted using the ‘amova’ function in R
   package pegas [[260]118]. The analyses were based on pairwise squared
   Euclidean distances using the ‘dist’ function implemented in R
   [[261]119] and the statistical significances were tested by
   permutations (n = 1000). Additionally, the software ADMIXTURE v1.3
   [[262]120] was used to reveal admixture patterns among breeds by
   measuring the proportion of individual ancestry from different numbers
   of hypothetical ancestral populations (K). Linkage disequilibrium (LD)
   pruning for admixture analysis was performed on PLINK v1.90 software
   [[263]121] to remove SNP with a R^2 value greater than 0.1 with any
   other SNP within a 50-SNP sliding window. The optimal number of K was
   defined based on the cross-validation error value (K = 1 to 5)
   implemented in ADMIXTURE.
Genomic inbreeding coefficient estimation
   Genomic inbreeding coefficients based on runs of homozygosity (F[ROH])
   were estimated for every animal according to the genome autozygotic
   proportion described by McQuillan et al. [[264]122]:
   [MATH: FROHi=SROHiLGEN :MATH]
   where
   [MATH: SROHi :MATH]
   is the sum of ROH across the genome for the i^th animals and L[GEN] is
   the total length of the autosomes covered by SNPs. L[GEN] was taken to
   be 2511.4 Mb based on the Bos taurus taurus genome assembly UMD 3.1.
   ROH were identified in every individual using PLINK v1.90 [[265]121]
   software in non-overlapping sliding windows of 50 SNPs. The minimum
   length of a ROH was set to 500 kb. A maximum of three SNPs with missing
   genotypes and three heterozygous SNPs were admitted in each window, as
   discussed by Ceballos et al. [[266]123]. Tukey’s post-hoc test
   [[267]124] was used to identify significant pairwise comparisons
   (p < 0.05).
Selective sweeps detection
   Four statistical methods were implemented to detect genomic regions
   under selective pressure. Cross-population methods encompassed the
   Wright’s fixation index (F[ST]) and the Cross-Population Extended
   Haplotype Homozygosity (XPEHH). Within-population methods included the
   Composite Likelihood Ratio (CLR) statistic and the integrated Haplotype
   Score (iHS).
   F[ST] [[268]125] was calculated between all six pairwise combinations
   of the four breeds with custom R scripts as follows:
   [MATH: FST=p¯1−p¯−∑cipi1−pip¯1−p¯ :MATH]
   where
   [MATH: p¯ :MATH]
   is is the average frequency of an allele in the total population, p[i]
   is the allele frequency in the i^th population, and c[i] is the
   relative number of SNPs in the i^th population. F[ST] scores were then
   averaged in non-overlapping sliding windows of 50 kb. SweepFinder2
   software [[269]126] was used to calculate the CLR statistic [[270]127]
   within each breed in non-overlapping sliding windows of 50 kb across
   the genome. The ancestral allele information was assessed from a cattle
   reference allele list retrieved from Rocha et al. [[271]128]. The CLR
   analysis was performed considering only SNPs containing the ancestral
   allele information (n = 11,260,629 SNPs). The iHS [[272]129] and XP-EHH
   [[273]130] statistics were calculated using the program selscan v1.2.0a
   [[274]131] with default parameters. Within each population, haplotype
   phasing was performed using Beagle 5.0 [[275]132] and the genetic
   distances were determined by assuming that 1 Mb ≈ 1 centiMorgan (cM).
   The iHS scores were calculated within each breed and XP-EHH between all
   six pairwise combinations of the four breeds. The unstandardized iHS
   and XP-EHH scores were standard normalized using the script norm with
   default parameters, as provided by selscan. Absolute iHS and XP-EHH
   values were averaged in non-overlapping sliding windows of 50 kb. To
   compute the iHS statistic, the same subset of SNPs (n = 11,260,629
   SNPs) applied in the CLR statistic was used, however, without
   considering any ancestral allele information. Independent results for
   each statistical method and population implemented herein are presented
   in Additional file [276]13.
   Selective sweeps detection can be enhanced by combining multiple
   genome-wide scan methodologies, benefiting from advantageous
   complementarities among them together with the increase in the
   statistical power [[277]20, [278]133–[279]136]. Further, combining
   within-population statistics from multiple breeds may decrease
   false-positive signals that arise due to population stratification
   (reviewed by Hellwege et al. [[280]137]). Accordingly,
   within-population and cross-population statistics were combined
   separately in a single score using the DCMS statistic [[281]20]. The
   DCMS statistic was calculated for each 50 kb window using the MINOTAUR
   package [[282]138] and the empirical p-values of each statistic were
   derived from a skewness normal distribution with an appropriate
   one-tailed test (Additional file [283]14). Candidate sweep regions
   under selection were revealed by assessing the top 1% of the empirical
   distribution generated by the DCMS statistics.
   Candidate regions identified herein were compared with previous regions
   under selection described in the literature for other cattle breeds.
   Overlap analysis was carried out using the Bioconductor package
   GenomicRanges [[284]139].
Selective sweeps and runs of homozygosity
   Candidate sweep regions revealed from the top 1% of the empirical
   distribution generated by the DCMS statistics were intersected with ROH
   hotspots to identify common signals between both methodologies. ROH
   formerly identified to estimate F[ROH] were applied, and ROH hotspots
   were determined by selecting segments shared by more than 50% of the
   samples within each breed.
   Overlap analysis was performed separately for each DCMS statistic using
   the Bioconductor package GenomicRanges [[285]139].
Functional annotation of the candidate regions
   Genes were annotated within the candidate sweep regions using the cow
   gene set Ensembl release 94 fetched from the Biomart tool [[286]140].
   BEDTools [[287]141] was used to identify overlaps between the retrieved
   gene set list and the putative sweep regions. DAVID v6.8 tool
   [[288]115, [289]116] was used to identify overrepresented GO terms and
   KEGG pathways using the list of genes from the putative sweep regions
   and the Bos taurus taurus annotation file as a background. The p-values
   were adjusted by False Discovery Rate [[290]21], and significant terms
   and pathways were considered when p < 0.01. QTLs retrieved from the
   CattleQTL database [[291]142] were overlapped with the candidate sweep
   regions using BEDtools [[292]141].
Supplementary information
   [293]12864_2020_7035_MOESM1_ESM.xlsx^ (9.7KB, xlsx)
   Additional file 1. Distribution of the functional consequences of the
   called variants (n = 33,328,447 SNPs) using the Variant Effect
   Predictor (VEP) tool.
   [294]12864_2020_7035_MOESM2_ESM.xlsx^ (24.1KB, xlsx)
   Additional file 2. Gene Ontology (GO) terms and Kyoto Encyclopedia of
   Genes and Genomes (KEGG) pathways analysis enriched (p < 0.01) based on
   variants with high consequence on protein sequence set of genes
   [295]12864_2020_7035_MOESM3_ESM.xlsx^ (35.3KB, xlsx)
   Additional file 3. Gene Ontology (GO) terms and Kyoto Encyclopedia of
   Genes and Genomes (KEGG) pathways analysis enriched (p < 0.01) based on
   deleterious variants (SIFT score < 0.05) set of genes
   [296]12864_2020_7035_MOESM4_ESM.docx^ (12.3KB, docx)
   Additional file 4. Analysis of Molecular Variance results.
   [297]12864_2020_7035_MOESM5_ESM.xlsx^ (27.9KB, xlsx)
   Additional file 5. Annotated candidate sweep regions for the
   within-population statistic retrieved from the top 1% of the empirical
   distribution generated by the DCMS statistic.
   [298]12864_2020_7035_MOESM6_ESM.xlsx^ (26.8KB, xlsx)
   Additional file 6. Annotated candidate sweep regions for the
   cross-population statistic retrieved from the top 1% of the empirical
   distribution generated by the DCMS statistic.
   [299]12864_2020_7035_MOESM7_ESM.xlsx^ (26.1KB, xlsx)
   Additional file 7. Runs of homozygosity (ROH) hotspots for Gir (GIR),
   Caracu Caldeano (CAR), Crioulo Lageano (CRL), and Pantaneiro (PAN)
   cattle breeds.
   [300]12864_2020_7035_MOESM8_ESM.xlsx^ (25.6KB, xlsx)
   Additional file 8. Overlapping of the putative sweep regions identified
   from the top 1% of the within-population DCMS statistic with candidate
   regions under positive selection previously reported in other cattle
   populations.
   [301]12864_2020_7035_MOESM9_ESM.xlsx^ (27.2KB, xlsx)
   Additional file 9. Overlapping of the putative sweep regions identified
   from the top 1% of the cross-population DCMS statistic with candidate
   regions under positive selection previously reported in other cattle
   populations.
   [302]12864_2020_7035_MOESM10_ESM.xlsx^ (10.2KB, xlsx)
   Additional file 10. Overlapping between ROH hotspots and the top 1% of
   the within-population DCMS statistic with the candidate regions under
   positive selection previously reported in other cattle populations.
   [303]12864_2020_7035_MOESM11_ESM.xlsx^ (10.7KB, xlsx)
   Additional file 11. Overlapping between ROH hotspots and the top 1% of
   the cross-population DCMS statistic with the candidate regions under
   positive selection previously reported in other cattle populations.
   [304]12864_2020_7035_MOESM12_ESM.pdf^ (97.9KB, pdf)
   Additional file 12. Brazilian geographical regions of the four cattle
   breeds sampled in the study (Adapted from
   [305]https://pt.wikipedia.org/wiki/Ficheiro:Brazil_Labelled_Map.svg).
   [306]12864_2020_7035_MOESM13_ESM.pdf^ (2.4MB, pdf)
   Additional file 13. Manhattan plot of the independent results for each
   selective sweep statistical method and population.
   [307]12864_2020_7035_MOESM14_ESM.pdf^ (237.7KB, pdf)
   Additional file 14. Histogram and quantile-quantile (Q-Q) plots of
   statistical scores calculated for all four methods derived from a
   skewness normal distribution.
Acknowledgements