Abstract

   In goat breeds, the domestication followed by artificial selection for
   economically important traits have shaped genetic variation within
   populations, leading to the fixation of specific alleles for specific
   traits. This led to the formation and evolution of many different
   breeds specialised and raised for a particular purpose. However, and
   despite the intensity of artificial selection, natural selection
   continues acting, possibly leaving a more diluted contribution over
   time, whose traces may be more difficult to capture. In order to
   explore selection footprints as response of environmental adaptation,
   we analysed a total of 993 goats from four transboundary goats breeds
   (Angora, Boer, Nubian and Saanen) genotyped with the SNP chip 50 K
   using outlier detection, runs of homozygosity and haplotype-based
   detection methods. Our results showed that all methods identified
   footprints on chromosome 6 (from 30 to 49 Mb) for two specific
   populations of Nubian goats sampled in Egypt. In Angora and Saanen
   breeds, we detected two selective sweeps using HapFLK, on chromosome 21
   (from 52 to 55 Mb) and chromosome 25 (from 1 to 5 Mb) respectively. The
   analysis of runs of homozygosity showed some hotspots in all breeds.
   The overall investigation of the selected regions detected combining
   the different approaches and the gene ontology exploration revealed
   both novel and well-known loci related to adaptation, especially for
   heat stress. Our findings can help to better understand the balance
   between the two selective pressures in commercial goat breeds providing
   new insights on the molecular mechanisms of adaptation.

   Subject terms: Evolution, Genetics

Introduction

   Capra hircus is one of the most important worldwide farmed species and
   its domestication dates back to the early Neolithic era (ca. 11,000
   YBP) in the Fertile Crescent^[40]1. Goats have been selected during
   centuries for different traits (milk, meat, wool or leather) show high
   resistance to stress and have a great ability to adapt to various
   agro-climatic conditions^[41]2. This quick adaptation implies several
   changes in physiology, morphology, behaviour, phenotype, and at the
   basis of all, in genetics. To analyse these genetic changes and the
   footprints that they left on the genome, the genome-wide scanning
   technology using SNP Arrays is a powerful and efficient tool widely
   used^[42]3 since several decades. For large transboundary breeds,
   genomic information has been used to identify candidate regions for
   traits of commercial interest and for application in breeding (genomic
   selection), investigating mainly the effect of artificial selection on
   the genome. In small local breed, genomic information has been
   exploited to investigate the ability to respond to environmental
   changes and challenges. This subdivision is generally based on the
   assumption that strong selection pressure applied to commercial breeds
   lead to negative impacts on their ability to adapt in comparison with
   traditional breeds, due to stronger connection to their original
   environments^[43]4. The role of the natural selection in shaping the
   genetic architecture of the highly selected, transboundary breeds has
   not yet been investigated. Little is known about how transboundary
   breeds have adapted to a wide range of different environments and
   management conditions. Indeed, natural selection continues acting,
   possibly leaving smaller but detectable contributions. To investigate
   this issue, it is important to first accurately choose an ideal model
   which go through strong anthropogenic selection over centuries and also
   to transboundary transport. Different methodologies that can be applied
   to detect selection signatures^[44]5, generally based on the comparison
   of statistics on genotypes at intra-populations versus
   inter-populations level. Two main categories of statistics have been
   developed at (1) intra-population (2) and inter-populations level. The
   first one is based on site frequency spectrum (SFS), linkage
   disequilibrium (LD) and reduced local variability. The second one
   focuses on single site differentiation and haplotype-based
   differentiation. Each of these approaches includes associated
   statistics and specific bioinformatic tools.

   Objective of the study was to reveal breed-specific selection
   signatures linked to environmental variables and thus to identify loci
   potentially relevant for adaptation in commercial breeds. Our results
   will contribute to advancing knowledge on climate-driven adaptive
   evolution and to better understand the molecular mechanisms involved in
   this process. Moreover, results may find application in selective
   breeding and conservation management programs.

Results

   Publicly available genotypes of four commercial goat breeds were used
   to reveal breed-specific selection signature linked to environmental
   variable. Combining three different statistical methods, we detected
   several polymorphisms that revealed loci potentially affecting
   adaptation to agro-climatic conditions.

Population relationships, clustering and outlier variant detection

   Figure [45]1 presents the PCA results where we can see that all
   populations clustered according to their geographic origin. The Nubian
   breed shows a clear separation not only between the two geographic
   areas (Argentina and Egypt) but also within the NBN_EGCH population
   (Fig. [46]1, A1) with the first two principal components that explain
   about 13% of genetic diversity (Fig. [47]1, A2). In Angora breed, all
   populations constitute well-defined clusters (Fig. [48]1, B1 and B2)
   except the Argentine population. Finally, in the Boer and Saanen breeds
   there are no well-defined clusters, even if it is possible to highlight
   a weak subdivision across them (Fig. [49]1, C1 and D1). The percentage
   of variance explained by the first two principal components indicates a
   low genetic diversity (Boer = 5% and Saanen = 8%, Fig. [50]1, C2 and
   D2).

Figure 1.

   [51]Figure 1
   [52]Open in a new tab

   Distribution of samples in relation to their scores on the first and
   second principal components obtained after principal component analysis
   for the four breeds. Each point represents a single individual and for
   each breed a different colour was assigned. The legend that explains
   the correspondence between breeds and colour is in the lower right
   corner. (A) In the Nubian breed, after separation due to a
   subpopulation from Egypt, the percentage of variance explained up to
   13% (eigenvector 1, X-axis, 8% and eigenvector 2, Y-axis, 5%). (B) In
   the Angora breed, the two components explained 8.6% of total variation
   (eigenvector 1, X-axis, 5.1% of variation and eigenvector 2, Y-axis,
   3.5%). (C) In the Boer breed, the total percentage of variance is 5.1%
   (eigenvector 1, X-axis, 3% and eigenvector 2, Y-axis, 2.1%). (D) In the
   Saanen breed, the variance was a total of 8% (eigenvector 1, X-axis,
   4.6% and eigenvector 2, Y-axis, 3.4%).

   The second step of the analysis with PCAdapt identified several
   polymorphisms as outliers and putative signs of local adaptation. The
   number of outliers as well as the corresponding chromosomes and the
   genes falling into the putative genomic region under selection are
   summarised in Table [53]1 for all breeds. We found a total of twelve
   outliers in Angora, and three in both Saanen and Boer breeds remaining
   after Bonferroni’s correction (Table [54]1, Fig. [55]2).

Table 1.

   List of candidate gene retrieved inside the genomic regions included in
   the interval of 1 Mb on both sides of each outlier SNP detected by
   PCAdapt.
   Breed Chr Position Putative genes
   Nubian 2 41557749 ZDBF2, ADAM23, METTL21A, FASTKD2, CREB1, KLF7, CPO,
   MDH1B, DYTN, FAM237A, EEF1B2, NDUFS1, GPR1, NRP2, PARD3B
   3

   100143884

   101377443
   RNF115, NUDT17, PIAS3, ANKRD35, ITGA10, RBM8A, LIX1L, ANKRD34A,
   POLR3GL, TXNIP, HJV, H2BC18, H2AC21, BOLA1, SV2A, MTMR11, OTUD7B,
   VPS45, PLEKHO1, ANP32E, CA14, APH1A, CIART, MRPS21, PRPF3, RPRD2,
   TARS2, ECM1, ADAMTSL4, MCL1, ENSA, HORMAD1, CTSS, CTSK, ARNT, SETDB1,
   CERS2, ANXA9, MINDY1, PRUNE1, BNIPL, CDC42SE1, MLLT11, GABPB2, SEMA6C,
   TNFAIP8L2, LYSMD1, SCNM1, TMOD4, VPS72, PIP5K1A, PSMD4, ZNF687, PI4KB,
   RFX5, SELENBP1, PSMB4, POGZ, TUFT1, SNX27, CELF3, RIIAD1, MRPL9, OAZ3,
   TDRKH, LINGO4, RORC, C2CD4D, THEM5, THEM4, TCHHL1
   4 1257240 PTPRN2, HSP40, UBE3C, MNX1, NOM1, LMBR1, RNF32
   5

   3006775

   42494415

   80765343

   116979586
   CCDC91, PTHLH, REP15, PPFIBP1, TRHDE, KLHL42, CCDC91, PTPRB, CNOT2,
   ABCD2, CNOT2, CPNE8, KCNMB4, KIF21A, MYRFL, PTPRB, ATXN10, WNT7B,
   MIRLET7A, MIRLET7B, PPARA, CDPF1, PKDREJ, TTC38, TRMU, CELSR1, TBC1D22A
   6

   1951800

   14358997

   29025606

   30831884

   34602486

   37401587

   38473627

   38696985

   39295173

   39985135

   40745285

   40875771

   41727639

   43160722

   46214946

   49627140

   52110326

   52236308
   BMPR1B, PDLIM5, HPGDS, SMARCAD1, ATOH1, GRID2, CCSER1, MMRN1, SNCA,
   GPRIN3, TIGD2, FAM13A, NAP1L5, PYURF, HERC5, HERC6, PPM1K, ABCG2, PKD2,
   SPP1, MEPE, IBSP, LAP3, MED28, FAM184B, DCAF16, NCAPG, LCORL, SLIT2,
   PACRGL, KCNIP4, ADGRA3, GBA3, PPARGC1A, DHX15, SOD3, CCDC149, LGI2,
   SEPSECS, PI4K2B, ZCCHC4, ANAPC4, SLC34A2, SEL1L3, RBPJ, CCKAR, TBC1D19,
   STIM2, SNORA70
   Angora 1 108667531 RARRES1, LXN, VEPH1, PTX3, GFM1, SHOX2, IQCJ, MFSD1,
   MLF1
   2 82677792 KYNU, GTDC1
   5 8209663 PPP1R12A, SYT, NAV3, PAWR
   6

   71967098

   71996981
   SRD5A3, KDR, NMU, THEM165, PDCL2, CLOCK, EXOC1L, CEO135, CRACD, AASDH,
   ARL9, THEGL, HOPX, NOA1, POLR2B
   7 56050271 KCTD16, NRC1, FGF1
   9

   5706994

   14534276
   MEI4, IRAK1BP1, PHIP, LCA5, SH3BGRL2, CLVS2
   13 29008963 CDNF, HSPA14, SNORD22, MEIG1, DCLRE1C, ACBD7, RPP38, NMT2,
   FAM171A1, ITGA8, MINDY3
   18 61158064 Zinc Finger Protein Family
   20 60102747 DNAH5
   21 31490029 HYKK, PSMA4, CHRNA5, CHRNA3, CHRNB4, UBE2Q2, NRG4, FBXO22,
   TMEM266, ETFA, ISL2, RCN2, SCARPER, PSTPIP1, TSPAN3, PEAK1, LINGO1
   Boer 3 89901645 RAP1A, INKA2, DDX20, KCND3, ST7L, CAPZA1, CTTNBP2NL,
   WNT2B, MOV10, RHOC, PPM1J
   26 36351768 HELLS, TBC1D12, NOC3L, PLCE1, SLC35G1, LGI1, FRA10AC1,
   PDE6C, RBP4, FFAR4, CEP55, MYOF, CYP26A1, CYP26C1, EXOC6, HHEX, KIF11
   27 40094623 ANGPT2, MCPH1
   Saanen 3 71857518 DNTTIP2, GCLM, ABCA4, ARHGAP2, ABCD3, SLC44A3, CNN3,
   ALG14, TLCD4, RWDD3
   11 19476181 CRIM1, FEZ2, VIT, STRN, HEATR5B, GPATCH11, EIF2AK2,
   SULT6B1, CEBPZ, NDUFAF7, PRKD3, QPCT, CDC42EP3, RMDN2, CYP1B
   14 59228427 PENK, SDR16C5, CHCHD7, PLAG1, MOS, LYN, TGS1, TMEM68, XKR4,
   RP1, SOX17, RGS20, ATP6V1H
   [56]Open in a new tab

Figure 2.

   [57]Figure 2
   [58]Open in a new tab

   Circular Manhattan plot of outliers SNP detected with PCadapt analysis.
   One different colour is assigned to each breed: coral for Saanen,
   purple for Nubian, green for Boer and blue for Angora. The red dashed
   line indicates the threshold of significance of 0.05%. Every point is a
   SNP and with amplified the significant ones.

   In the Nubian breed we observed the highest number of putative markers
   under selection even after the Bonferroni correction (Supplementary
   Table [59]2 and Fig. [60]3). Since we found a strong overlap across
   three different analyses of a large, potential genomic region under
   selection in the CH6, we decided to re-analyse the Nubian dataset
   excluding this chromosome. After comparisons with the modified
   analysis, the number of outliers confirmed by both analysis with
   PCAdapt in Nubian breed was eight (into the CH2, CH3, CH4 and CH5,
   Table [61]2).

Figure 3.

   [62]Figure 3
   [63]Open in a new tab

   Frequency of Runs of Homozygosity for each class of length. Histograms
   are built with different colours for each breed (A = Nubian,
   B = Angora, C = Boer and D = Saanen) and every population is indicated
   with a different shade of the same colour.

   Supplementary Figs. [64]1–[65]4 show the Admixture analysis and the
   Supplementary Fig. [66]5 the cross-validation errors for each breed.
   These results agreed with the PCA and, together with the pairwise F[ST]
   values, are in agreement with^[67]1 (Supplementary Table [68]1). In
   particular, both Admixture and F[ST] confirmed the quite strong genetic
   structure found in the Nubian breed (Supplementary Table [69]1 and
   Supplementary Fig. [70]3).

Runs of Homozygosity

   The Supplementary Figs. [71]7, [72]8, [73]9 and [74]10 have the
   Manhattan plots for the four breeds and for each population. In the
   Nubian breed, based on estimation of the genomic inbreeding coefficient
   (FROH), it is evident that both populations from Egypt had a higher
   level of inbreeding compared with the population from Argentina (0.13
   and 0.14 for NBN_EGCH and NBN_EGCH1, and 0.11 for NBN_ARCH,
   Supplementary Table [75]3). Looking at the distribution of ROHs per
   class (Fig. [76]3A) the NBN_EGCH group revealed a higher amount of
   longer ROH (8–16 and > 16 Mb) compared to NBN_EGCH1 (4–8 Mb) and
   NBN_ARCH (2–4 Mb).

   The Nubian populations from Egypt showed a high incidence of variants
   in ROH creating a peak on CH6, with a percentage of variants that
   overcomes 75% (Supplementary Fig. [77]9). The NBN_EGCH also showed
   another remarkable peak on CH25 and NBN_EGCH1 on CH18. We found similar
   patterns of homozygosity considering the FROH (Fig. [78]4A) and the
   percentage of ROH per chromosome (Fig. [79]5A).

Figure 4.

   [80]Figure 4
   [81]Open in a new tab

   Distribution of genomic inbreeding coefficient (FROH) or ROH-based
   inbreeding per chromosome and for each breed. Every bar represents a
   chromosome and a different colour is associated with a population for
   every breed. A = Nubian, B = Angora, C = Boer and D = Saanen.

Figure 5.

   [82]Figure 5
   [83]Open in a new tab

   Percentage of Runs Of Homozygosity. The bars exhibit the frequency of
   ROH per chromosome identified in every population for the four breeds.
   A different colour is assigned to each population within every breed.
   A = Nubian, B = Angora, C = Boer and D = Saanen.

   Finally, we observed the presence of several hotspots of homozygosity
   (ROH islands) occurring mainly in both Egyptian populations but also
   one in the NBN_ARCH population (Fig. [84]6, Supplementary Table [85]4).

Figure 6.

   [86]Figure 6
   [87]Open in a new tab

   Graphical representation of the proportion of the genome covered by ROH
   Islands. One different colour is assigned to each breed and the order
   of breeds is based on the abundance of ROH Islands for a better
   visualization. From the most external to the centre: Boer (light
   green), Angora (light blue), Saanen (light orange) and Nubian (light
   pink).

   In the Angora breed, both African populations showed the same pattern
   of homozygosity, where the sum and the mean of ROH for each individual
   were relatively high (Supplementary Figs. [88]11 and [89]12). On the
   contrary, the population from Argentina had very short ROH whereas the
   population from France displayed an intermediate situation.
   Interestingly, the France population exhibited the highest FROH value
   on CH21 in respect to the rest of populations (Fig. [90]4B) and
   ANG_ZACH had the highest value of genomic inbreeding coefficient
   (FROH = 0.22, Supplementary Table [91]3 and Fig. [92]4B). The analysis
   by classes of length revealed an interesting result for ANG_ARCH and
   ANG_ZACH populations which show several homozygous segments within the
   4–8 and 8–16 Mb classes (Fig. [93]3B). Only the ANG_MGCH population had
   some hotspots characterised by homozygous segments of about 2 Mb
   (Fig. [94]6 and Supplementary Table [95]3). The percentage of ROH per
   chromosome (Fig. [96]5B) presented a similar pattern for all Angora
   populations. In the Boer breed, the genomic inbreeding coefficient
   ranged from 0.22 to 0.1 (Supplementary Table [97]3) where BOE_AUCH,
   BOE_NZCH, BOE_CHCH and BOE_USCH had the highest values. These results
   were confirmed when we look at the sum and mean of ROH (Supplementary
   Figs. [98]11 and [99]12) and the distribution per length class, where
   it is evident that several segments exceed the 16 Mb in length
   especially in the BOE_CHCH population (Fig. [100]3C). In general,
   similar patterns of homozygosity are found in all Boer populations
   considering the FROH (Fig. [101]4C) and the percentage of ROH per
   chromosome (Fig. [102]5C). In the Saanen breed, the populations from
   Switzerland, Tanzania and Argentina showed the highest value of FROH,
   ranging from 0.12 to 0.14 (Supplementary Table [103]3) and in general
   we found few ROH in all groups (Supplementary Table [104]4,
   Fig. [105]6) and only few hotspots in SAA_KECH population
   (Supplementary Table [106]4). The distribution by length class showed
   several segments < 2 Mb in the SAA_ARCH, SAA_CHCH and SAA_FRCH
   populations, while segments that exceed 16 MB are found in the SAA_TZCH
   population (Fig. [107]3D). A similar pattern of homozygosity
   considering the FROH (Fig. [108]4D) was found within all Saanen
   populations, however SAA_TZCH had the highest value of genomic
   inbreeding coefficient (FROH > 0.2) on CH23, CH24, CH25 and CH28.
   Regarding the percentage of ROH per chromosome (Fig. [109]5D), SAA_KECH
   showed a different pattern compared with the other Saanen populations.
   Another interesting finding is related to the abundance of hotspots
   present in the genome of Boer populations. In particular, we found the
   same genomic region in the CH6, ranging from 85 to 86 Mb in all groups
   and an additional region ranging from 80 to 82 Mb was absent only in
   BOE_UGCH and BOE_ZWCH. Furthermore, we discovered several long ROH
   islands in other chromosomes shared by some populations or exclusive of
   a particular population that were not discovered in previous
   studies^[110]6 (Fig. [111]6 and Supplementary Table [112]4).

   For instance, a ROH ranging from 21 to 25 Mb on CH13 is shared only by
   the BOE_NZCH and BOE_CHCH populations, and another ROH of 12 Mb on the
   same chromosome is exclusive to the BOE_AUCH population. This stretched
   segment was found to partially overlap several shorter segments found
   in all population excluding the African ones. Other regions on CH3, CH7
   and CH8 (Supplementary Table [113]4 and Fig. [114]6) were shared or
   partially overlapped in BOE_AUCH, BOE_CHCH, BOE_USCH and BOE_NZCH, and
   some segments are partially overlapped with ROH islands found in a
   previous study^[115]6.

Selective sweeps with HapFLK

   The HapFLK analysis detected two significant selective sweeps. The
   first one was a region of about 5 Mb mapping on CH25 and ranging from 1
   and 5 Mb, in the Saanen breed (FDR < 0.01), whereas the second one was
   of about 3 Mb, spanning between 52 and 55 Mb on CH21 in the Angora
   breed (FDR < 0.02). For the remaining two breeds, there were no
   significant regions after FDR correction (Fig. [116]7).

Figure 7.

   [117]Figure 7
   [118]Open in a new tab

   Circular Manhattan plot of selective sweeps detected with HapFLK. One
   different colour is the same to each breed: coral for Saanen, purple
   for Nubian, green for Boer and blue for Angora. The red dashed line
   indicates the threshold of significance of 0.05%. The grey dashed line
   indicates the two significant peaks in the CH25 (Saanen) and CH21
   (Angora).

   However, it is necessary to remark that, while some of the peaks did
   not achieve the statistical threshold of 0.05, some of them showed a
   co-localisation with selective sweeps identified in previous studies
   using the same populations^[119]6. The corresponding putative selective
   sweeps and the genes falling into these genomic regions are summarised
   in Table [120]2 for the Angora and Saanen breeds. The analysis revealed
   several novel and well-known genes that are associated to adaptation.

Table 2.

   List of candidate gene retrieved inside the genomic regions included in
   the interval of 1 Mb on both sides of selective sweep detected by
   HapFLK.
   Breed Chr Genomic region in Mb Putative genes
   ANGORA 21 50–56

   LRFN5, C14orf28, KLHL28, TOGARAM1, PRPF39, FKBP3

   FANCM, MIS18BP1, TGM5, TGM7, LCMT2, ADAL, ZSCAN29, TUBGCP4, TP53BP1,
   5S_rRNA,

   MAP1A, PPIP5K1, STRC

   CATSPER2, PDIA3, U6, ELL3

   SERF2, SERINC4, HYPK, MFAP1

   WDR76, FRMD5, GPR68
   SAANEN 25 1–5

   UNKL, C16orf91, CCDC154, CLCN7 PTX4, TELO2, IFT140

   TMEM204, CRAMP1, JPT2, MAPK8IP3, NME3, MRPS34

   SPSB3, NUBP2, IGFALS, HAGH, FAHD1, MEIOB, HS3ST6

   RPL3L, MSRB1, NDUFB10, RNF151, TBL3, NOXO1, GFER

   SYNGR3, ZNF598, SLC9A3R2, NTHL1, TSC2, PKD1, RAB26, TRAF7, CASKIN1,
   MLST8, BRICD5, PGP, E4F1, DNASE1L2, ECI1, ABCA3, CCNF, TEDC2, NTN3,
   TP6V0C, PDPK1, KCTD5, PRSS22, PRSS33, SRRM2, FLYWCH1, KREMEN2, PAQR4,
   PKMYT1, CLDN6, TNFRSF12A

   HCFC1R1, THOC6, BICDL2, MMP25, ZSCAN10, NF205, ZNF213, ZNF263, TIGD7,
   ZNF75A, OR2C1, ZNF174

   NAA60, C16orf90, CLUAP1, NLRC3, SLX4, DNASE1, TRAP1, CREBBP, ADCY9,
   SRL, TFAP4, GLIS2, VASN

   DNAJA3, NMRAL1, HMOX2, CDIP1, UBALD1, MGRN1

   NUDT16L1, ANKS3, SEPTIN12, ROGDI, GLYR1, UBN1

   PPL, SEC14L5, NAGPA, C16orf89
   [121]Open in a new tab

   Comparing the results of the three programs used in this study, we can
   highlight only one overlapping genomic region presumably under
   selection on CH6 of Nubian breed. It is worth highlighting that in the
   Nubian breed there were sixteen outlier SNPs falling into the putative
   genes under selection. These gene were HSP40 on CH4, PTPRB on CH5, and
   ZCCHC4, PPARGC1A, LGI2, SEPSECS, IBSP, CCSER1 (5 SNP), LCORL (2 SNP)
   and KCNIP4 (2 SNP) on CH6. However, we found positional coincidences
   between one outlier SNP and the ROH island on CH3 of Boer breed,
   spanning from 79 to 95 Mb and shared by AUCH, NZCH, USCH and TZCH
   populations.

Candidate gene identification and functional analysis

   Candidate genes within the two Mb intervals of the putative selected
   regions were retrieved with the Ensembl BioMart tool. The obtained
   lists were further analysed using relevant literature for verifying if
   there were genes associated with environmental adaptation (Table
   [122]4). In all four breeds, several loci are involved in metabolism
   and adipogenesis, as well as feed intake, immune response and growth or
   which expression is affected by the availability of food. More genes
   indirectly related to adaptation are discovered when we checked the two
   large hotspots common amongst all Boer populations and summarised in
   Table [123]3.

Table 3.

   List of candidate genes retrieved inside the genomic regions included
   in the interval of 1 Mb on both sides of ROH islands detected by
   detectRUNS.
   Breed Chr Genomic region in Mb Putative genes
   BOER 6 82–88 STAP1, UBA6, GNRHR, TMPRSS11D, TMPRSS11A, TMPRSS11F,
   TMPRSS11E, YTHDC1, UGT2A2, SULT1B1, ODAM, CABS1, AMTN, AMBN, ENAM,
   JCHAIN, UTP3, RUFY3, GRSF1, MOB1B, DCK, SLC4A4
   13 20–35 PLXDC2, NEBL, SKIDA1, MLLT10, DNAJC1, COMMD3, SPAG6, PIP4K2A,
   ARMC3, MSRB2, PTF1A, OTUD1, KIAA1217, ARHGAP21, PRTFDC1, ENKUR, GPR158,
   MYO3A, GAD2, FZD8, OPTN, MCM10, UCMA, PHYH, SEPHS1, BEND7, FRMD4A,
   CDNF, HSPA14, SNORD22, DCLRE1C, MEIG1, ACBD7, RPP38, NMT2, FAM171A1,
   ITGA8, MINDY3, PTER, C1QL3, RSU1, CUBN, TRDMT1, VIM, ST8SIA6, HACD1,
   STAM, TMEM236, MRC1, SLC39A12, CACNB2, NSUN6, EPC1, KIF5B, ARHGAP12,
   ZEB1, ZNF438, SVIL, JCAD, MTPAP, MAP3K8
   [124]Open in a new tab

   All biological process terms with P values < 0.1, including the number
   of genes assigned to these terms are summarized in Table [125]4. The
   extensive analysis of the identified GO terms revealed that the
   identified candidate genes have been associated with diverse biological
   functions, such the transmission of nervous signals and metabolic
   processes, all of them playing a role in basic functions of the
   organism probably in response to environmental pressure.

Table 4.

   Gene ontology terms significantly associated with biological processes,
   molecular functions, and cellular components for Angora, Boer, Nubian
   and Saanen breeds.
   Breed Category GO:term Ovis aries Bos taurus Homo Sapiens Gene name
   P value P value P value
   Angora Biological process *GO:0018149—peptide cross—linking 0.0221
   0.0251 – TGM7, TGM5
   GO:0035095—behavioural response to nicotine 0.0002 0.0001 0.00002
   CHRNA3, CHRNB4, CHRNA5
   GO:0035094—response to nicotine 0.0037 0.0016 –
   GO:0098655—cation transmembrane transport – 0.0018 –
   GO:0007274—neuromuscular synaptic transmission – 0.0038 –
   GO:0050877—nervous system process – – 0.0009
   GO:0007271—synaptic transmission cholinergic – 0.0038 0.0002
   GO:0060084—synaptic transmission involved in micturition 0.0112 0.0105
   0.026 CHRNA3, CHRNB4
   GO:0006940—regulation of smooth muscle contraction 0.0296 0.0245 0.0001

   CHRNB4, NMU

   CHRNA3^+
   GO:0045944—positive regulation of transcription from RNA polymerase II
   promoter 0.0841 – PPP1R12A, SHOX2, PHIP, NR3C1, FGF1
   GO:0007165—signal transduction – 0.0531 0.0005

   LINGO1, CHRN4 CHRNA3,

   PPP1R12A, CHRNA5

   CDNF^+

   FGF1^+

   NR3C1^+

   PPP1R12A^+
   Cellular components GO:0005892—acetylcholine-gated channel complex
   0.0076 0.0013 0.00008 CHRNA3, CHRNB4 CHRNA5
   GO:0045211—postsynaptic membrane 0.0013 0.0050 0.0004 CHRNA3, CHRNB4,
   CHRNA5, KCTD16
   GO:0005783—endoplasmic reticulum 0.0514 0.0273 0.002 RCN2, SRP72,
   SRD5A3, KDR, ITGA8, CDNF^+

   GO: 0070161

   anchoring junction
   – – 0.00004 CHRNA3, CHRNA5, CHRNB4, KDR, KCTD16, SYT1
   GO:0042734—presynaptic membrane 0.0626 0.0851 – SYT1, KCTD16
   GO:0030054—cell junction 0.0675 0.0360 – CHRNA3, CHRNB4, CHRNA5, SYT1
   Molecular Functions *GO:0003810—protein-glutamine gamma-glutamyl
   transferase activity 0.0098 0.0101 – TGM7, TGM5
   GO: 0005892-acetylcholine-gated cation-selective channel activity – –
   0.0001 CHRNA3, CHRNB4, CHRNA5
   GO:0015464—acetylcholine receptor activity 0.0031 0.0006 0.0001
   GO:0004889—acetylcholine-activated cation-selective channel activity
   0.0064 0.0010 –
   Boer Biological process GO:0034653—retinoic acid catabolic process –
   0.0072 0.003

   CYP26A1, CYP26C1

   CYP26A1, CYP26C1
   GO:0006766-vitamin metabolic process – – 0.004
   GO: 0048387—negative regulation of retinoic acid receptor signalling
   pathway – – 0.005
   GO:0016525—negative regulation of angiogenesis – 0.0833 – HHEX ANGPT2
   GO:0045444—fat cell differentiation – 0.0998 – FFAR4, NOC3L
   GO:0006898—receptor-mediated endocytosis^+ 0.0427 – – CUBN, MRC1
   Cellular components GO:0005615—extracellular space 0.0688 0.0563 0.02
   LGI1, MOV10, RBP4, ANGPT2, WNT2B
   GO: 0010008 -endosome membrane – – 0.009

   CUBN

   FFAR4

   MRC1
   Molecular Functions GO:0008401—retinoic acid 4-hydroxylase activity –
   0.0045 0.006 CYP26A1, CYP26C1
   GO:0001972—retinoic acid binding – 0.0136 0.0130
   GO:0003682—chromatin binding – 0.0793 – HELLS, HHEX, NOC3L
   Nubian Biological Process GO:0051603—proteolysis involved in cellular
   protein catabolic process 0.0095 0.0194 0.016 PSMB4, CTSK, CTSS
   GO:0043486—histone exchange 0.0245 0.0481 – VPS72, ANP32E
   RNA splicing – – 0.0001

   CELF

   DHX15

   PPARGC1A

   PRPF3

   SCNM1
   GO:0033235—positive regulation of protein stimulation 0.0425 0.0481 –
   PIAS3, ARNT
   GO:0007283—spermatogenesis 0.0596 – – TDRKH, CELF3, HORMAD1, OAZ3
   GO:0030574—collagen catabolic process 0.0602 – – CTSK, CTSS
   GO:0008637—apoptotic mitochondrial changes 0.0718 – – NDUFS1, MCL1
   GO:0008380—RNA splicing 0.0833 0.0605 0.001

   RBM8A CELF3 SCNM1

   DHX15^+

   PPARGCA^+

   PRPF3^+
   GO:0006631—fatty acid metabolic process 0.0890 – 0.006

   THEM4, PPARA

   THEM5^+

   SNCA^+
   GO:0000209—protein polyubiquitination – 0.0831 – KLHL42, RNF115 UBE3C
   GO:0031290—retinal ganglion cell axon guidance 0.037 0.0410 – SLIT2,
   BMPR1B
   GO:0060079—excitatory postsynaptic potential 0.044 – – GRID2, SNCA
   GO:0001503—ossification 0.063 – – IBSP, SPP1
   GO:0034599—cellular response to oxidative stress 0.068 – – PPARGC1A,
   SNCA
   GO:0031214—biomineral tissue development – 0.0386 0.005 SPP1, MEPE,
   IBSP^+
   Cellular components GO:0005737—cytoplasm 0.0399 0.0415 0.0007

   PIAS3, PTPRR, RBM8A SETDB1, TUFT1,

   CELF3, PRUNE1, OTUD7B, ARNT,

   PTHLH, PSMB4, PSMD4, CNOT2, PRPF3, POGZ,

   TXNIP, TNFAIP8L2, HORMAD1, S100A11
   – – 0.0007

   PPARGC1A,

   ENSA

   HPGDS

   HHEX

   OAZ3

   PLEKHO1

   PKD2

   THRDE
   – 0.0223 0.0007 SLC34A2, KCNIP4, NCAPG, RBPJ, SOD3, MED28, HERC5,
   SEPSECS, DHX15, LAP3, SLIT2, HERC6, SNCA
   GO:0032587—ruffle membrane – 0.0364 – THEM4, PIP5K1A, PLEKHO1
   GO:0016020—membrane – 0.0407 – IBSP, NCAPG, PKD2, BMPR1B, MED28, SNCA
   GO:0015629—actin cytoskeleton 0.0231 0.0325 – NCAPG, PDLIM5, SNCA
   GO:0005615—extracellular space 0.0511 – – IBSP, SPP1, SLIT2, SOD3, SNCA
   Molecular Functions GO:0016290—palmitoyl-CoA hydrolase activity 0.0324
   0.0374 – THEM5 THEM4
   GO:0003700—transcription factor activity sequence specific DNA binding
   – 0.0259 0.007 GABPB2, KLF7, ARNT, CREB1, MNX1, PPARA, RFX5, MYRFL
   GO:0019212—phosphatase inhibitor activity – 0.0031 0.02 ANP32E, ENSA
   GO:0043394—proteoglycan binding 0.0451 – 0.0004 CTSK, CTSS, SLIT2^+
   GO:0008270—zinc ion binding 0.0602 – 0.002

   ZDBF2, RNF32

   PIAS3, RNF115

   CPO, SETDB1, RORC OTUD7B, PPARA TRHDE, DYTN,

   KLF7^+

   SNCA^+
   GO:0003677—DNA binding 0.0915 0.0494 0.0002

   KLF7, SETDB1

   RFX5, POGZ, OTUD7B MYRFL, CERS2

   ARNT, PPARA

   VPS72

   RORC^+

   CREB1^+

   HHEX^+

   MNX1^+

   PPARGC1A^+

   RBPJ^+
   GO:0005509—calcium ion binding 0.0075 0.0194 –

   HPGDS, MMRN1, KCNIP4, SLIT2, PKD2, SNCA

   STIM2, KCNIP4
   GO:0051219—phosphoprotein binding – 0.0142 – PKD2, SNCA
   GO:0005267—potassium channel activity – 0.0280 – KCNIP4, PKD2
   Saanen Biological process * GO:0006308—DNA catabolic process – 0.0283 –
   DNASE1, DNASE1L2
   Retinal metabolic process – – 0.0006
   * GO:0016567—protein ubiquitination – 0.0455 0.003 SPSB3, TRAF7, CCNF,
   RNF151
   * GO:0006308—DNA catabolic process 0.0123 0.0123 0.01 DNASE1, DNASE1L2
   GO:0035556—intracellular signal transduction 0.0560 – – LYN, PRKD3
   ARHGAP29
   GO:0042574—retinal metabolic process – 0.0193 0.00067 SDR16C5 CYP1B1
   GO:0042572—retinol metabolic process – 0.0345 –
   GO:0045494—photoreceptor cell maintenance – 0.0551 – RP1, ABCA4
   GO: 0007603 -phototransduction visible light – – 0.01 RP1, ABCA4
   GO:0071407—cellular response to organic cyclic compound – 0.0625 –
   RGS20, CYP1B1
   GO:0009636—response to toxic substance – 0.0621 – EIF2AK2, CYP1B1
   GO:0001750—photoreceptor outer segment – 0.0820 –

   RP1

   ABCA4
   Cellular component GO:0005634—nucleus 0.047 0.0331 0.000001 MGRN1,
   CCNF, TSC2, PKMYT1, PKD1, NMRAL1, NUBP2, TFAP4, NTHL1, DNAJA3, UBN1,
   ZSCAN10, CDIP1, MSRB1
   GO:0005829 -cytosol – – 0.002

   ABCD3

   CDC42EP3

   DNAJA3

   LYN

   MOS

   CCNF

   TSC2

   TELO2

   ZNF174
   * GO:0031931—TORC1 complex – 0.0352 – MLST8, TELO2
   Molecular Functions GO:0005524—ATP binding 0.0473 0.0473 0.01 LYN, MOS,
   ABCD3, PRKD3, ABCA4, EIF2AK2
   GO:0042626—ATPase activity coupled to transmembrane movement of
   substances 0.0547 – – ABCD3 ABCA4
   GO:0005096—GTPase activator activity – 0.0579 – CDC42EP3, RGS20,
   ARHGAP29
   * GO:0046872—metal ion binding – 0.0085 0.000003

   ZNF263, ZNF174, FAHD1, ZNF75A, NUBP2, GLIS2, NTHL1, ZSCAN10, ZNF205,
   E4F1, HAGH, ZNF213, MSRB1

   DNAJA3^+

   ARHGAP29^+

   CDIP1^+

   MGRN1^+

   PKMYT1^+

   RNF151^+
   * GO:0003700—transcription factor activity sequence-specific DNA
   binding – 0.0240 – ZNF263, ZNF174, TFAP4, ZSCAN10, E4F1, ZNF205, ZNF213
   [126]Open in a new tab

   *Genes from HapFLK analysis; ^+genes present only Homo sapiens
   annotation; In bold: genes present only in Bos Taurus annotation.

Discussion

   Environmental factors are one of the forces influencing agricultural
   and the livestock sectors. Animals exposed to stressful environments
   exhibit various adaptive mechanisms, such as behavioural,
   physiological, endocrine, cellular, metabolic and biochemical for
   minimising the stressful conditions. Thus, adaptation is the natural
   strategy to ensure both welfare and efficiency. The action of selection
   leaves signs along the genome as responses to environmental and
   anthropogenic pressures that can be revealed using specific
   methodologies and bioinformatic tools. We applied three complementary
   methods for detecting a wider range of candidate genes that can be
   further investigated. The analysis with PCAdapt revealed, in Boer and
   Saanen breeds, a few markers potentially under selection for
   environment adaptation, whereas Angora and Nubian breeds showed several
   outliers distributed along different chromosomes.

   Our findings led to the identification of target genes related to
   adaptation and more specifically to response at the temperature stress,
   energy homeostasis, photoperiod, immune/inflammatory response,
   reproductive and production traits. The responses to stress include not
   only reactions to extreme cold and hot temperatures but also the
   ability to adapt to harsh environmental conditions, such as
   poor-quality forage or water scarcity. In African indigenous chickens,
   the TOGARAM1locus, involved in the assembly of non-motile cilia and
   thus essential for cellular signal transduction, was found affected by
   heat-shock^[127]7. In fact, heat can induce a rapid loss of these
   important organelles^[128]8, deducing that this gene may play an
   important adaptive role in alleviating this effect in high temperature
   conditions. PDIA3 regulates cell growth and death according to oxygen
   concentrations and this gene was implicated in the thermal
   acclimatisation process in ovine liver tissue^[129]9, and in sperm–egg
   fusion in sheep and cashmere goats^[130]10. These genes were detected
   in Angora goat populations on CH13 near to the SNP outlier and within
   the selective sweep on CH21. The heat-shock protein 40 (HSP40) and the
   heat shock protein family A (HSP70) member 14 (HSPA14) belong to the
   heat-shock proteins (HSP) family, involved in cellular responses and
   for protein homeostasis and survival under stress conditions. In
   particular, HSP70 gene has been linked with heat tolerance and higher
   milk production in cattle^[131]11. These findings indicate the putative
   effects of selective pressure on this gene family favouring animals
   with better thermotolerance, performance and stress resilience^[132]12.
   TRPA1 is a member of the transient receptor potential (TRP) superfamily
   of ion channels. Studies in mice suggested that TRPA1 channels mediate
   cold temperature sensing in mammalian vagal sensory neurons, evoking
   major protective reflexes and thermoregulatory responses^[133]13. This
   gene was found in the putative selective sweep on the CH25 of Saanen
   goat: in this breed, populations are from different climatic areas
   (continental and temperate, following the Köppen–Geiger Climate
   Classification; Table [134]5), indicating a plausible association with
   thermal stress and cold adaptation. Another interesting candidate gene
   is TRHDE, a gene implicated in energy homeostasis, body temperature
   regulation^[135]14, in particular adaptation to hot arid environments
   in goats^[136]15 and high-altitude in Ethiopian sheep^[137]16.

Table 5.

   Breed and population code, country and number of samples used in the
   study. We added the Köppen–Geiger Climate Classification for further
   considerations.
   Breed Population code Country Number of animals Total Köppen–Geiger
   climate classification
   NUBIAN NBN_ARCH Argentina 15 99 Cwa
   NBN_EGCH Egypt 64 BWh
   NBN_EGCH1 Egypt 20 BWh
   ANGORA ANG_ARCH Argentina 285 366 BWk
   ANG_FRCH France 26 Cfc
   ANG_MGCH Madagascar 7 BSh
   ANG_ZACH South Africa 48 BSk
   BOER BOE_AUCH Australia 61 332 BSh
   BOE_NZCH New Zealand 14 Cfb
   BOE_CHCH Switzerland 189 Dfb
   BOE_UGCH Uganda 5 Aw/As
   BOE_USCH USA 34 Cfb
   BOE_TZCH Tanzania 4 Aw/As
   BOE_ZWCH Zimbabwe 25 Cwc
   SAANEN SAA_ARCH Argentina 18 196 Cwa
   SAA_ITCH Italia 24 Cfa
   SAA_FRCH France 56 Cfc
   SAA_KECH Kenia 2 Cfc
   SAA_SWCH Switzerland 44 Dfb
   SAA_TZCH Tanzania 19 Cwc
   SAA_RU* Russia 33 Dfb
   [138]Open in a new tab

   *This population is not from the AdaptMap dataset. The Köppen–Geiger
   Climate Classification Map ranged from the year 1980 to 2016. Legend of
   the Map: Aw = Tropical/Dry winter As = Tropical/Dry summer;
   BWh = Arid/desert/hot; BWk = Arid/desert/cold; BSh = Arid/Steppe/Hot;
   BSk = Arid/Steppe/Cold; Cfc = Temperate/No dry season/Cold summer;
   Cfb = Temperate/No dry season/warm summer; Cwc = Temperate/Dry
   winter/Cold summer; Cwa = Temperate/Dry winter/Hot summer;
   Cfa = Temperate/No dry season/Hot summer; Cfc = Temperate/No dry
   season/Cold summer; Dfb = Continental/No dry season/Warm summer.

   From
   [139]https://en.wikipedia.org/wiki/K%C3%B6ppen_climate_classification.

   In this study, we identified several genes that are involved in the
   lipid metabolism, adipogenesis and feed intake, directly or indirectly
   related to energy balance. CLVS2 participates in regulation of foetal
   development in cattle that underlie the effects of early maternal
   nutrient restriction^[140]17. Interestingly, if we compare the Nubian
   populations from Egypt and Argentina, we can observe a clear
   discrepancy on the resources available in terms of food and water,
   since Egypt is a country characterised by a hot and dry climate. The
   same observation applies to Angora, as the populations from Argentina
   and South Africa, sampled in an arid zone, shared a common result in
   the analysis of ROH, with a greater number of medium-large homozygote
   segments (4–8 Mb), and thus suggesting a certain degree of selection
   that is occurred not recently. The CCSER1 locus was previously
   associated with the feed efficiency in beef cattle^[141]18 and in
   sheep^[142]19. It is worth highlighting that this gene lies close to
   well-known genes associated to body size, growth and height and it
   falls within the large genomic region we identified in Nubian. Our
   findings are confirmed by previous studies that reported a strong
   positive selection around the ABCG2, SPP1, LAP3, NCAPG, LCORL, PKD2,
   IBSP, and MEPE genes in domestic goats and sheep^[143]19.

   In all breeds, we also pinpointed genes under selection for altitude
   adaptation, like DCLRE1C, FANCM and PPP1R12A in the Angora population,
   MCPH1 and ANGPT2 in the Boer group, TRHDE and IBSP in the Nubian and
   TRAP1 (as discussed above), CEBPZ, HMOX2, NMRAL1 in Saanen. HMOX2,
   involved in hypoxia response and the neighbouring NMRAL1, involved in
   synthesis of nitric oxide, are thought to be contributors to adaptation
   to high altitude in humans^[144]20,[145]21. Edea and co-workers^[146]22
   observed PPP1R12A to be associated with high altitude adaptation in
   Ethiopian sheep, and previous studies have already demonstrated that
   hypoxia increased phosphorylation of this gene^[147]23.

   One of the most important and predictable environmental variations is
   seasonality in temperate zones, based on photoperiodism over the
   year^[148]24. Two out of four breeds studied here (Angora and Saanen)
   showed several candidate genes linked to physiological adjustments
   driven by photoperiodism. For example, CLOCK is one of the most
   important genes that controls circadian rhythms by regulating various
   physiological functions including sleep, body temperature, blood
   pressure, endocrine, cardiovascular and immune systems^[149]25. The
   CLOCK gene also has an impact on energy metabolism influencing the
   rhythms of feeding behaviour^[150]26. In the Angora breed, we found the
   KDR gene that is related to coat colour, and that falls into the same
   genomic segment that contains other genes like SRD5A3, TMEM165, PDCL2,
   EXOC1L, CEP135, SCFD2, FIP1L1, LNX1, PDGFRA, CLOCK, NMU and EXOC1 found
   under selection in Reggiana cattle^[151]27.

   Stress can affect the immune system by inducing alteration of
   inflammatory processes and the animal’s inflammatory response is a
   survival mechanism to cope with pathogenic or non-pathogenic
   challenges^[152]28. Oxidative stress is considered an imbalance between
   oxidant and antioxidant status and considered one of the key factors
   causing the weakening of immune system in animals that have undergone
   heat stress. Macrophages and neutrophils play an important role in
   innate immunity by producing nitric oxide. NMRAL1, a candidate gene in
   Saanen breed, is related to the synthesis of nitric oxide and maybe
   could play a role in the activation of inflammatory processes. Several
   studies suggest that exposure to heat results in oxidative stress, thus
   promoting cytotoxicity^[153]29 and cellular damage^[154]30. It is
   remarkable how we detected, again in Saanen breed and as previously
   mentioned, the gene HMOX2 that is involved in the antioxidant response
   like its homologous HMOX1 gene that has been reported to play a role in
   "tissue tolerance"—the ability to resist pathogens, inflammation, or
   oxidative stress-mediated damage during infection or inflammation in
   humans^[155]31. This intersection amongst oxidative imbalance, immune,
   and physiological responses has been already described in
   sheep^[156]28.

   In the selective sweep of CH21 (Angora breed), we found several
   interesting genes like FKBP and LRFN5. LRFN5 is involved in immune
   system in cattle^[157]32. It is worth noting that this gene maps inside
   a QTL region identified in sheep and involved in scrapie infection, a
   disease of the nervous system^[158]33. In the same genomic region of
   CH21, we also found MAP1A, that allows the maintenance and
   restructuring of adult neurons^[159]34 and maps inside a QTL affecting
   classical scrapie incubation time in a population of
   scrapie-infected^[160]35.

   The thermal environment is the largest single stressor affecting the
   efficiency of animal production systems. Some evidence from field
   studies in sheep^[161]36 highlighted that the physiological and
   behavioural adaptations that allow animals to maintain homeothermy,
   negatively impact their growth, welfare and reproduction. Therefore, it
   is not surprising that our analysis revealed within the selected
   regions several genes that correlated to reproduction traits like
   fertility and productive performances, including growth and
   development. The expression of NR3C1 was explored in the ovine
   uterus^[162]37,[163]38 discovering the crucial role of endometrial
   functions during early pregnancy in sheep. The effects of environmental
   stressors are also evident in male reproductive performances.
   Testicular thermoregulation is imperative to produce healthy viable
   spermatozoa^[164]39. We found in our study SCAPER, SEPTIN12, RODGI,
   selected in the Saanen group whereas HORMAD1, TDRKH, CELF3 and OAZ3 in
   the Nubian group, all genes related to spermatogenesis and fertility in
   mice^[165]40 and humans^[166]41.

   It has been reported that a reduction in wool fiber diameter is a
   consequence to deteriorating food quality and availability^[167]42.
   Based on our results, we observed selection signatures in the FGF1
   gene, a member of the fibroblast growth factor family and involved in
   the growth and development of various tissues and organs. FGF1 was also
   the target gene of a miRNA that had an effect on growth and development
   of hair follicles in sheep^[168]43.

   Our results showed several novel and established genes that are
   correlated with milk, meat and growth traits (development, body size
   and height). Amongst them, the most important were retrieved in the
   selective sweep on the CH6 of Nubian populations, that includes CCSER1,
   LAP3, MED28, FAM184B, DCAF16, NCAPG, LCORL, SLIT2, PACRGL, KCNIP4,
   PPARGC1A, these loci were described in cattle^[169]44,
   sheep^[170]19,[171]45 and goats^[172]46.

   The large hotspot retrieved on CH6 and shared by all populations
   investigated, contained genes associated to reproduction and immune
   resistance. GnRHR regulates the production of gametes and gonadal
   hormones and it is important for reproduction control in buffalo,
   cattle and goats^[173]47–[174]49. Interestingly, in a review
   investigating the evolution of GnRHR family genes and its receptors,
   the following genes surrounding the mammalian GnRHR1 (STAP1, UBA6,
   GnRHR, TMPRSS11D, TMPRSS11A, TMPRSS11F, TMPRSS11E and YTHDC1) and
   retrieved in our analysis, are conserved in human, mouse and other
   vertebrates^[175]50 suggesting that they can affect the same trait.

   In the Gene Ontology (GO) analysis, all biological processes are
   related with neurological functions and the nervous system in Angora
   breed. The functional annotation exacerbates neurological pathways
   involving behavioural acetylcholine-mediated responses. Acetylcholine
   (ACh) is the neurotransmitter used for muscular activation and all
   biological processes converge to cholinergic transmission. These
   chemical signals act on regulation of smooth muscle contraction
   (GO:0006940) and as a component of presynaptic (GO:0042734) and
   postsynaptic (GO:0045211) membranes. In fact, the key genes in these
   pathways are CHRNA3, CHRNB4, CHRNA5, which are nicotinic acetylcholine
   receptors. Researchers demonstrated that mild hypoxia decreased ACh
   synthesis and the amino acid metabolism^[176]51. In this breed, we
   found genes related to hypoxia and in general to adaptation to harsh
   environments, thus suggesting that the nervous system regulates many
   processes that can affect the efficiency in maintaining homeostasis.
   Three out of four populations included in the Angora dataset were
   sampled in arid cold and with desert or steppe (Argentina and South
   Africa) and hot (Madagascar) environments– thus exposed to extreme
   conditions, whereas the French population is the only one that comes
   from a temperate climate.

   The Boer, Saanen and Nubian groups shared a GO associated with retinoic
   acid activity pathway. Retinol is vitamin A, a fat-soluble compound
   that is required for vision, cellular proliferation and
   differentiation. Studies in cattle demonstrated that it regulates
   intramuscular adipose tissue and muscle development^[177]52. Retinol
   metabolism pathway is also involved in feed efficiency in
   livestock^[178]53 and in normal immunologic function^[179]54. In the
   Boer breed, there is another interesting GO regarding angiogenesis. As
   we discussed above, angiogenesis is involved in some high-altitude
   adaptation responses.

   Boer populations belonging to this dataset come from many different
   climatic zones, with a wide range of environmental variations; for
   example, the population from Switzerland originated from a sample site
   with continental climate that exposed individuals to different
   temperatures in winter and summer, whereas the population from
   Australia is exposed to hot weather. Intriguingly WNT2B, that is a
   potential target gene in wool follicle development, showed a footprint
   of selection in this breed that is not farmed for this purpose,
   suggesting that it could be related to the local adaptation of some
   populations to the temperature regime. If additional studies would
   verify that a selective pressure is acting on this locus in this breed,
   it could be a further confirmation that natural selection continues
   affecting and leaving detectable traces.

   In Saanen populations, the GO revealed also photoreceptor outer
   segment/photoreceptor cell maintenance processes that together with
   retinol acid activity pathway can lead us to hypothesize that some part
   of the genome is triggering mechanisms to the protection/maintenance of
   cells belonging to the visual system, and maybe adapting it to a new
   and variable condition of light. Moreover, in this breed, we see
   several genes related to DNA repair and oxidative stress that are also
   related to solar radiation.

   In Nubian breed, the GO results showed links with energetic metabolism,
   protein, and fatty acids synthesis regulation, but also catabolic
   processes (proteins and collagen) and cellular response to oxidative
   stress. The phosphatase inhibitor activity is a remarkable finding,
   because this impedes the target enzyme activity, avoiding the protein
   and cellular lysis. Consequently, as the protein phosphatases, it
   negatively regulates the HPSs proteins^[180]55. Inhibitors of this
   protein can prevent cell and protein damage in response to thermal
   stress. The two populations from Egypt are subjected to many stressful
   factors, in particular to thermal stress, since they were sampled in
   arid, desertic zones with very hot temperatures, whereas the population
   from Argentina comes from a temperate climate, thus the local
   adaptation to bio-climatic conditions is evident.

   Although our objective was not to compare the effectiveness of each
   program used to carry out the analysis for discovering genomic regions
   under selection, it has been possible to see a general agreement on the
   evident clues of the adaptative processes that synergically activate a
   complex gene network. In our study, we found well known loci that have
   been identified in previous studies in goats as well as novel genes
   that showed implications for biogeographical adaptation described in
   other species, in particular on other ruminants. Most of these studies
   focused on local or indigenous populations, thus highlighting a
   probable population-specific selection footprint. Detecting regions
   under selection is a complex task, and this is reflected from the
   intricate connections amongst genes and biological processes. Taken
   together, our findings indicated that natural selection operated and
   continues acting in commercial goat breeds despite human intervention.
   Moreover, they provided evidence of selection that may be specific to
   one or few populations (local adaptation), and this information could
   be useful to identify both causal variants that are involved in a
   particular phenotype or important adaptive traits and the affected
   genes. Further investigating the detected genes will shed light on the
   complex mechanisms involved in the adaptation process, and provide
   information on putative favourable variants. Such information could be
   use in selection/conservation programs, also via new breeding
   technologies.

Materials and methods

   Since the aim of this work was to detect loci that are under natural
   selection in artificially selected goat breeds, we addressed this issue
   choosing the follow four commercial and transboundary breeds: Angora,
   Boer, Saanen and Nubian. Each of them is known to be selected for a
   specific productive trait (wool, meet, milk and dual-purpose,
   respectively) and were transported over centuries in different
   countries, thus exposed to multiple environmental variables with
   respect to their original countries. Considering these characteristics,
   the four breeds studied meet our goal.

Sampling, genotyping and quality control

   Figure [181]8 describes the workflow followed for detecting genetic
   signatures in our dataset. Genotypic data were gathered for goat breeds
   with a worldwide distribution. A total of 993 individuals belonging to
   four commercial breeds with a worldwide distribution were included in
   the analysis: Angora (n = 366), Boer (n = 332), Nubian (n = 99) and
   Saanen (n = 163) breeds from AdaptMap project
   ([182]http://www.goatadaptmap.eu/^[183]56) and 33 genotypes of Russian
   Saanen goat^[184]57. All individuals were previously genotyped with the
   Illumina GoatSNP50 BeadChip^[185]3. The raw dataset was updated to the
   latest goat genome map (ARS1.2) and the quality control was carried out
   using Plink v1.9110^[186]58 (Table [187]1) excluding SNPs unmapped or
   mapped into the sex chromosomes, SNPs with minor allele
   frequency < 0.05%, markers that failure the Hardy–Weinberg test at a
   specified significance threshold of 1 × 10^−6, and SNP with call
   rate < 95%. Since we investigate breed-specific selection signatures
   related to adaptation, this procedure was repeated for all the four
   datasets, yielding a total of 44,655, 46,124, 44,800 and 47,325 for
   Angora, Boer, Nubian and Saanen, respectively. A first PCA analysis
   carried out with SNPrelate^[188]59 R package to explore the genetic
   structure Egyptian Nubian samples revealed a strong population
   divergence between individuals. Thus, we split the Nubian in two
   subpopulations: EGCH and EGCH1 (Table [189]1). Further analyses of this
   breed were done considering three populations in the Nubian dataset.

Figure 8.

   [190]Figure 8
   [191]Open in a new tab

   Graphical representation of workflow followed for detecting genetic
   signatures for adaptation.

   For better understand the genetic background of the four breeds, we
   used Arlequin 3.5.2 program^[192]60 to calculate the pairwise F[ST] and
   the Admixture 1.3^[193]61 for the clusters analysis testing a number of
   clusters (K) equal to the number of populations composing each breed
   plus 3.

Data analysis

   With the goal of leveraging the potential of the methods in capturing
   signals for regions under selection, we combined three complementary
   statistics with specific programs that can be used to calculate
   them^[194]62: Runs Of Homozygosity (ROH), FST-outliers detection and
   HapFLK methods were applied. ROH analysis compares genomic data within
   populations, and it is based on the detection of reduced local
   variability. The last two methods rely on the degree of differentiation
   due to locus-specific allele frequencies among populations and can be
   grouped into Single site (FST-outliers) and haplotype-based
   differentiation (HapFLK), respectively. To carry out these analyses
   DetectRUNS 0.9.4 package (R core 4.1), PCAdapt^[195]63 and HapFLK
   program v1.3^[196]64 were used.

Runs of homozygosity

   ROH are defined as two contiguous identical by descent (IBD) stretches
   of homozygous genotypes of a common ancestor present in an individual
   and inherited from both of its parents. The identification and
   characterization of ROH allow to reveal the population structure as
   well as footprints of natural and/or anthropogenic selection^[197]62.
   This analysis was carried out by using the R package DetectRUNS 0.9.4
   package (R core v4.1) applying the “sliding windows” function and with
   the following setting: windowSize = 15, threshold = 0.1, minSNP = 15,
   ROHet = FALSE, maxOppWindow = 1, maxMissWindow = 1, maxGap = 10^6,
   minLengthBps = 1,000,000, minDensity = 1/10,000. We identified both ROH
   (length per class of ROH, total length per chromosome and sum at
   individual level, the frequency of SNP in each segment and visualising
   the homozygous segments per classes of length) and ROH islands
   (frequency of ROH at population level) ROH islands were plotted for all
   breeds using Biocircus^[198]65 package in R v.1.3.1073 (R core team
   2020).

F[ST] outliers detection

   PCAdapt is a R package that uses statistical tools for outlier
   detection based on Principal Component Analysis (PCA). Briefly, this
   program tests how much each variant is associated with population
   structure, assuming that outlier variants are indicative of local
   adaptation. We determined the optimal number of PCs as recommended by
   Luu and co-workers^[199]63 using the graphical PCAdapt function and
   keeping PCs that correspond to eigenvalues to the left of the lower
   straight line in the screeplot (Supplementary Fig. [200]6) according to
   “Cattell’s rule”, that were 10 for all breeds. The P-values associated
   to the outlier variants were corrected with Bonferroni with a threshold
   of 0.05.

HapFLK analysis

   With the HapFLK program v1.3 the loci under selection are revealed by
   comparing the genetic differentiation amongst the analysed populations
   with respect to the neutral drift model identifying genomic regions or
   loci showing deviations from neutrality (selective sweeps). The
   analysis was performed using the scripts available at
   [201]https://forge-dga.jouy.inra.fr/projects/hapflk. The number of k
   (haplotype clusters) that better fits our data and estimated using the
   cross-validation procedure included in the fastPHASE software
   of^[202]66 was 35 for all breeds. The hapFLK statistic was computed as
   an average of 30 EM iterations to fit the Linkage Disequilibrium (LD)
   model. The P values obtained using the “Scaling_chi2_hapflk.py” script
   available at [203]https://forge-dga.jouy.inra.fr/documents, were
   corrected for multiple comparisons using the false-discovery rate (FDR)
   method in R and SNPs (with a P value ≤ 0.05)were considered
   significant. Graphical representations of the Manhattan plots of the
   significant outliers and the selective sweeps retrieved with PCAdapt
   and HapFLK were done using CMplot package in R v.1.3.1073
   ([204]https://github.com/YinLiLin/R-CMplot).

Searching for candidate genes and pathways related to adaptation

   The next step was to compare results from the three methodologies and
   to verify if identify genomic regions overlapped. Then, a screening
   within 1 Mb downstream and upstream of each significant marker was
   applied to pinpoint positional candidate genes, using Ensemble BioMart
   Capra hircus ARS1 data mining tool
   ([205]https://m.ensembl.org/info/data/biomart/; Capra_hircus—Ensembl
   genome browser 108). Loci were investigated for each breed, focusing on
   previous studies about selection signatures mainly in goat, but also in
   other livestock species like sheep and cattle, since that the
   annotation of some genes in goat is still lacking or poor. Pathway
   enrichment analysis was performed to explore possible pathways involved
   in environmental adaptation. The genes identified from Ensemble BioMart
   were stored to perform a functional annotation using Ovis aries, Bos
   taurus and Homo sapiens databases by DAVID v6.8^[206]67.

Supplementary Information

   [207]Supplementary Information 1.^ (96.1KB, pdf)
   [208]Supplementary Information 2.^ (160.2KB, pdf)
   [209]Supplementary Information 3.^ (22KB, pdf)
   [210]Supplementary Information 4.^ (93KB, pdf)
   [211]Supplementary Information 5.^ (64.8KB, pdf)
   [212]Supplementary Information 6.^ (2.5MB, pdf)
   [213]Supplementary Information 7.^ (86.9KB, docx)

Acknowledgements