Graphical abstract graphic file with name fx1.jpg [41]Open in a new tab Highlights * • The genomes of chickens worldwide carry a wealth of variation information * • Human-driven selection accelerates the shaping of genomic patterns of domestic chickens * • Genetic variants in IGF1, SMC1B are potent drivers for chicken body size and fertility * • Novel genes are promising targets influencing sperm storage capacity in layer chickens __________________________________________________________________ Biological sciences; Genetics; Poultry genetics Introduction Chicken is one of the most widely distributed animals in the world.[42]^1 It plays an indispensable role in human daily life by providing abundant meat and eggs. There is multiple evidence that suggest that the ancestor of modern chickens is the red jungle fowl (RJF),[43]^2^,[44]^3^,[45]^4^,[46]^5 and the domesticated chickens are traced back to 10,000 years ago.[47]^6 With the enhancement of human-driven selection, chickens in different regions are artificially selected to form different phenotypes and physiological characteristics to suit human needs, such as the adaptability to the gelidity of Tibetan chickens, the wide environmental tolerance of local domestic chickens, the aggressiveness of fighting cocks, and the high-yielding performance of commercial chickens. According to the United Nations Food and Agriculture Organization statistics, there are more than 1,600 breeds of domestic chickens in the world. The abundant breed resources are great advantages to support chicken being an excellent animal model for exploring the changes in genomic characteristics and genetic mechanisms that contribute to the diversity of phenotypic traits in the progress of domestication. With the advent of next-generation sequencing technology, whole-genome resequencing has been widely applied to the field of chicken genome research in recent years, such as the genetics of adaptation,[48]^7^,[49]^8^,[50]^9 the evolution of origin,[51]^5 and the patterns of genetic diversity.[52]^10^,[53]^11^,[54]^12^,[55]^13 Although many studies have reported genomic variations affecting complex phenotypic traits in chickens,[56]^14^,[57]^15 a thorough understanding of complex phenotypes is lacking due to different domestication histories and challenges to perform a more comprehensive population analysis. What is more, it is still unknown how changes in genomic characteristics shape the phenotypic diversity. In this study, we performed whole-genome resequencing of 100 commercial Jinghong layer chickens from China at a sequencing depth of approximately 16.24×, combined with those of 377 chickens from 24 breeds obtained from public databases to build up the sample size to 477 chickens. We utilize the resequencing data derived from 477 chicken individuals to identify the candidate genes and beneficial alleles contributing to chicken domestication including the subsequent genetic improvement. We propose that body size and egg production are the prominent outcomes of chicken domestication and subsequent genetic improvement, which were manipulated beyond a few key loci with critical changes and large genetic effects. We also revealed a genome-wide differential landscape of sperm storage capacity first, which can be considered as a potential economic trait for egg layers in the future. The potential genetic basis revealed in this report of these traits and the mechanisms can provide a microscopic view of the evolution of the genome and the genesis of phenotypic polymorphism. Results Sequencing and genomic variant calling In this study, we performed whole-genome resequencing of 100 commercial Jinghong layer chickens from China at a sequencing depth of approximately 16.24× and obtained a whole-genome resequencing dataset of 377 chickens derived from 24 chicken breeds from public databases to build up the sample size to 477 chickens. The 25 chicken breeds were mainly located in Asia, Europe, and America([58]Figure 1A). A total of 7.4 Tb clean data were generated in this study, with an average of 14.8× depth and 99.28% genome coverage per chicken ([59]Tables 1 and [60]S2). The Single-nucleotide polymorphism (SNP)/Insertion and Deletion (InDel) calling revealed a total of 23,504,766 SNPs and 3,289,782 InDels ([61]Tables S3 and [62]S4), distributed across the genome with detailed variants and marker density as shown in [63]Figure S1. After quality control, we detected the number of SNPs in each population ranged from 3.1 to 11.2 million, and ultimately identified 8,914,523 high-quality SNPs and 968,861 InDels ([64]Table 1) among 477 individuals. The high-quality SNPs consist of 387,755 novel SNPs and 8,678,756 autosomal SNPs. Most of the SNPs across the whole genome of chicken were located in intron regions (4,317,273), with only 1.32% (117,669) located in exonic regions which included 27,414 missense variants and 90,255 synonymous variants ([65]Table S3). In addition, a total of 24,700 deletions, 2,098 duplications, 228 inversions, and 1 insertion were detected in this analysis. Figure 1. [66]Figure 1 [67]Open in a new tab Geographic distribution and overall population structure landscape of chickens (A) Geographical distribution of 25 chicken breeds. (B) PCA plot of the first two principal components (PCs) for the 477 chickens. (C) Maximum-likelihood phylogenetic tree between 477 individuals. (D) Decay of linkage disequilibrium (LD) in 25 breeds, with one line per breed. (E) Population genetic structure of the 477 chickens. The length of each colored segment represents the proportion of the individual genome inferred from ancestral populations (K = 15). The other population structure information (K = 2–15) was listed in [68]Figure S5. Table 1. Summary information of chicken whole genome resequencing Breeds Abbreviation N male female depth (x) SNPs Indels Black Cochin BC 10 10 0 4.06 3573837 399581 Baier Yellow BEH 6 0 6 35.57 6888513 1178934 Brown Layer BL 20 20 0 8.15 5079218 591695 Broiler A BRA 20 11 9 11.41 8615308 990852 Broiler B BRB 20 10 10 11.71 7882862 907655 Dulong DLC 10 5 5 15 11173815 1328642 Daweishan DWS 28 13 15 6.32 4421491 549280 Light Brahma FT 20 17 3 4.97 3182747 361726 Hetian HT 6 0 6 38.62 7168607 1219559 High Bodyweight Selected HWS 20 9 11 41.61 3626693 443555 Jinghong JH 100 0 100 16.24 5307918 621015 Langshan LS 20 12 8 7.8 7431609 871461 Low Bodyweight Selected LWS 20 5 15 41.06 3938210 480138 Luxi Gamecock LX 10 6 4 10.38 10636079 1213606 Liyang LY 20 12 8 8.18 10033953 1163149 Red Jungle Fowl RJF 35 22 13 5.18 8786311 993005 Tibet Chickens TC 15 14 1 20.2 7987215 831838 Tulufan Gamecock TLF 6 6 0 37.82 7134685 1235187 Wuhua Yellow WHY 12 5 7 21.27 9336950 1066855 White Leghorn WLG 10 0 10 8.61 5285599 639234 White Layer WL 20 20 0 8.65 3965364 472399 Xinghua XH 9 6 3 10.26 7887132 1150718 Yuanbao YB 24 11 13 6.88 4311707 474371 Beijing You YOU 10 6 4 10.17 9165644 1051810 Yunyang Da YY 6 0 6 35.59 6910300 1184562 Allsample 477 220 257 14.75 8914523 968861 [69]Open in a new tab Phylogenetic relationship, population genetic structure and genomic diversity Principal component analysis (PCA) showed the different genetic relationships among 477 individuals ([70]Figures 1B and [71]S2). Of those, the samples of each breed were clustered in one cluster, and the genetic characteristics of the Chinese domestic chicken and the RJF were closer in terms of clustering relationships. Maximum-likelihood (ML) phylogenetic tree ([72]Figure 1C) and ADMIXTURE analysis ([73]Figures 1E and [74]S5) further uncovered different population structures between domestic local chicken and commercial chicken. On the basis of the phylogenetic and PCA analyses, six groups were clearly distinguished, including RJF, Chinese local domesticated chickens, American local domesticated chickens, layers, broilers and, Virginia chicken lines (High Bodyweight Selected [HWS] and Low Bodyweight Selected [LWS]), which were also supported by the ADMIXTURE analysis. To investigate the genetic diversity among different chicken populations, we found that most of the domesticated Chinese local chickens showed higher levels of diversity than RJF ([75]Figure S3), whereas two dwarf chickens (Yuanbao and Daweishan) showed lower levels of diversity RJF, which is consistent with the previously reported results.[76]^16 It is notable that the lowest levels of diversity were observed in American domestic chickens, followed by Virginia chicken lines and commercial chickens. Correspondingly, the weakest linkage disequilibrium (LD) decay level was observed in Virginia chicken lines, followed by commercial chickens, domestic chickens, and RJF chickens ([77]Table S5 and [78]Figure 1D). In addition, the population genetic differentiation index F[ST] values among each population were estimated to range from 0.03 to 0.46 ([79]Figure S6). These results indicate that domestication and subsequent artificial selection are the two main drivers to establish chicken genomic diversity. Moreover, the Virginia chicken lines were assessed to have the relatively highest F[ST] compared to other groups, suggesting that high-intensity artificial selection will accelerate the rate of population differentiation and contribute to population diversity. Selective sweep analysis implies the direction of modern genetic improvement To detect selective sweeps affecting the phenotypes of chickens during artificial and natural selection, the composite likelihood ratio (CLR) and the integrated haplotype score (iHS) test were performed to analyze 13 populations based on autosomal SNPs in this study ([80]Figure 2A). A total of 204.3 Mb genomic regions of 13 populations were identified as potential selection signatures, ranging from 12.11 Mb to 19.21 Mb per chicken line ([81]Table S8). The candidate genes were further annotated among the selected regions, and numbered from 340 to 846 in each breed, respectively. As shown in [82]Figure S7, there were few selective genes shared among populations, including the populations with similar breeding direction. There were almost no identical selected genes between domestic chickens and laying chickens, which indicates the rich genetic diversity among different chicken breeds. By enriching the candidate genes in the strong selective sweep regions of each population, we found that these genes were mainly enriched in signaling pathways and biological processes related to growth, metabolism, and reproduction, such as animal organ development, cell differentiation, mitogen-activated protein kinases (MAPK) signaling pathway, and oocyte meiosis ([83]Figures 2B, 2C, [84]Tables S9, [85]S10, and [86]S11). These results suggest that many candidate regions and genes potentially contribute causally to phenotypic changes during selective breeding. It also indicates that the main direction of selection in the process of chicken domestication and breeding is to improve growth and reproductive capacity. Figure 2. [87]Figure 2 [88]Open in a new tab Selection signal and enrichment analysis of 13 chicken breeds (A) Genomic landscape of selection signal in different breeds. From inner to outer, the circle indicates signature of selection from each statistics, CLR, |iHS|, and chromosome scheme, respectively. (B) Gene Ontology (GO) annotation analysis of selected genes in different breeds. This figure shows GO terms significantly annotated in at least three breeds. (C) Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis of selected genes in different breeds. This image shows the top 20 signaling pathways simultaneously enriched in at least four breeds. The complete list of bioinformatics analysis results was shown in [89]Tables S9 and [90]S10. Genetic loci associated with chicken body size Chicken body size is a complex trait depending on body length, shank length, breast width, breast depth, and body weight. In this study, we collected data on phenotypic values of body length, shank length, and body weight which are typical for different breeds ([91]Table S1). To simplify the measurement metrics, we performed a correlation analysis of the three phenotypic values as shown in [92]Figure 3A. The correlation between body weight and body length, body weight and shank length, and shank length and body slope length was 0.83, 0.79, and 0.82, respectively, with significant correlations among the three indices (p < 0.05). Among the three indicators, the body weight phenotypic data being relatively well recorded were chosen to represent the body size trait in this analysis. Figure 3. [93]Figure 3 [94]Open in a new tab Genome-wide scan and GWAS analysis affecting chicken body size (A) Correlation analysis of three indicators (body length, shank length, and body weight) of chicken body size. The shadow represents the 95% confidence interval, p < 0.05. (B) Genome-wide scan for positively selected genes in different body-size chickens. (C) Signatures of recent selection on IGF1 in different body-size chickens. (D) Allele pattern of IGF1 in different body-size chickens. (E) Genome-wide association analysis of body size using SNPs and InDels. (F) Manhattan plot around the SOX5 gene. (G) LD heatmap around the SOX5 gene. Here, two merged groups based on the body weight phenotype were applied to reveal the selection signatures associated with the body size by comparing genomics analysis. The distribution of genomic regions with significant differences in F[ST] and cross-population extended haplotype homozygosity (XP-EHH) values is shown in [95]Figure 3B. Combining the annotation of the quantitative trait locus (QTL) database, a total of 259 genomic regions including 782 genes are potentially related to the body size in chicken ([96]Tables S12 and [97]S13). On the genome-wide scale, the most significant regions were located near the IGF1 gene on chromosome 1 ([98]Figure 3C), which has been reported in several articles to be associated with growth and development in chickens.[99]^16^,[100]^17 Haplotype analysis revealed the significant differences in the haplotypes of IGF1 gene between large and small body size, with some loci showing completely opposite genotypes between the two populations ([101]Figure 3D). These loci also showed different patterns between the two weight-selected populations ([102]Figure S8), including a locus located within the IGF1 being significantly associated with body weight trait in the GWAS analysis. These results imply that IGF1 is a candidate gene with high confidence to influence chicken body size. We also identified 15 other candidate genes associated with body size trait ([103]Table 2), such as LCORL associated with body size in mammals,[104]^18 FMN1 associated with limb development,[105]^19 and TMEM263 associated with the dwarf trait in chickens.[106]^20 In addition, the comparative genome analysis based on Structure Variantion (SV) allele frequency between the two merged groups based on the body weight phenotype, including HWS and LWS, also reveals a series of genomic differentiation regions. Among them, a total of 9 deletions were found to be highly correlated with body size, including IGF2BP1, KCNJ6, and SATB2 ([107]Figure S9 and [108]Table S16). Table 2. A list of candidate genes related to body size obtained by GWAS or comparative genomic analyses Position Score/p-value(method) Candidate Gene Function 1:23662850-23926311 0.140(F[ST]); −3.097(XP-EHH); KCND2 Sheep body size[109]^21 1:53635743-53648531 0.156(F[ST]); 2.224(XP-EHH); TMEM263 Chicken body size[110]^20 1:55281097-55330373 0.295(F[ST]); 3.207(XP-EHH); 5.84E-10(GWAS); IGF1 Chicken body size[111]^16 1:55372179-55373646 0.232(F[ST]); 3.862(XP-EHH); PMCH Growth rate of chickens[112]^22 1:55597280-55669104 0.159(F[ST]); 3.098(XP-EHH); MYBPC1 Muscle development of chickens[113]^23 1:65833774-66173568 5.57E-28(GWAS) SOX5 Cartilage development in animals[114]^24 3:14058563-14184797 0.142(F[ST]); 2.902(XP-EHH); PLCB4 Pig body size[115]^25 4:53577145-53603062 0.150 (F[ST]); 2.561(XP-EHH); FGF2 Early muscle growth in chickens[116]^26 4:75821282-75869811 0.161 (F[ST]); 3.083(XP-EHH); LCORL Affecting the body size of mammals[117]^18 5:30598320-30751690 0.177 (F[ST]); 3.794(XP-EHH); FMN1 Limb development in chickens[118]^19 5:31437974-31606534 0.175(F[ST]); 4.166(XP-EHH); MEIS2 Chicken body weight[119]^27 5:38669193-38684139 0.181(F[ST]); 2.534(XP-EHH); TGFB3 Early muscle growth in chickens[120]^28 7:24024963-24051622 0.147(F[ST]); 2.637(XP-EHH); GYPC Muscle development of chickens[121]^29 9:10461236-10501942 0.134(F[ST]); 2.604(XP-EHH); ATP1B3 Early embryonic development of animals[122]^30 18:458826-477172 0.148(F[ST]); 2.300(XP-EHH); MYH1A Early muscle growth in chickens[123]^31 18:564255-586932 0.148(F[ST]); 2.300(XP-EHH); MYH1E Muscle development of chickens[124]^29 18:808100-1027820 0.136(F[ST]); 2.756(XP-EHH); MYOCD Chicken body weight[125]^32 [126]Open in a new tab Further genome-wide association study indicates that a total of 161 SNPs and 29 InDels were identified to be significantly associated with body size trait in chickens ([127]Figure 3E, [128]Tables S14, and [129]S15). Of those, the SNPs/InDels close to the SOX5 gene on chromosome 1 were shown to be the most significant signals ([130]Figure 3E). By analyzing the LD in this region, we found a higher degree of LD within the SOX5 gene than in the upstream region ([131]Figure 3G), and SOX5 is closely associated with cartilage development.[132]^33 Therefore, this gene may influence body size in chickens by regulating skeletal development. Genetic loci associated with chicken egg production Similarly, a total of 138 genomic regions containing 443 genes were identified as the selection signatures for chicken egg production based on the annual egg production of chickens ([133]Figure 4A, [134]Tables S17, and [135]S18). Of those, we observed that the selection signature on chromosome 1 with a significant difference between high and low populations is close to the SMC1B gene. In addition, Genome wide association study (GWAS) analysis also indicated that a locus within this region was significantly associated with egg production ([136]Figure 4B), and the haplotypes in this region differed significantly between high and low-egg-production populations. We observed relatively complete haplotypes in the egg population and similar haplotype patterns among three egg populations (JH, WL, and WLG), while their genotype distribution was irregular in the low egg production population ([137]Figure 4C). These results suggest that this region, including SMC1B, may have been selected in the process of egg production improvement. Other candidate genes such as WDR25, ESR1, and SLIT2 have also been reported to be associated with egg production and oocyte development ([138]Table 3). Figure 4. [139]Figure 4 [140]Open in a new tab Genome-wide scan and GWAS analysis affecting chicken egg production (A) Genome-wide scan for positively selected genes in different body-size chickens. (B) Signatures of recent selection and SNP Manhattan plot around SMC1B gene in different egg-production chickens. (C) Allele pattern around SMC1B in different egg-production chickens. (D) Genome-wide association analysis of egg production using SNPs and InDels. (E) Manhattan plot around the SLC27A6 gene. (F) LD heatmap in SLC27A6 gene. Table 3. A list of candidate genes related to egg production obtained by GWAS or comparative genomic analyses Position Score/p-value(method) Candidate Gene Function 1:30463434-30606675 0.257(F[ST]); 5.455(XP-EHH); NELL2 Egg production of chickens[141]^34 1:67659344-67921693 0.239(F[ST]); 5.852(XP-EHH); ITPR2 Egg quality of chickens[142]^15 1:70362862-70398886 0.321(F[ST]); 6.005(XP-EHH); SMC1B Oocyte development in mice[143]^35 1:119273945-119465002 0.242(F[ST]); 5.516(XP-EHH); POLA1 Egg production of chickens[144]^36 1:179980747-180021121 0.259(F[ST]); 5.885(XP-EHH); LATS2 Follicle development in chickens[145]^37 2:111223762-111224811 0.291(F[ST]); 5.693(XP-EHH); MOS Follicle development in chickens[146]^38 2:111237295-111284428 0.305(F[ST]); 5.422(XP-EHH); PLAG1 Egg production of chickens[147]^39 3:49053965-49241576 7.99E-09(GWAS); ESR1 Egg production of chickens[148]^40 4:74981753-75225786 0.254(F[ST]); 5.775(XP-EHH); SLIT2 Ovarian development in chickens[149]^41 4:75726455-75920712 0.264(F[ST]); 5.752(XP-EHH); NCAPG Oviduct development or eggshell quality in chickens[150]^42 5:48778158-48850687 0.291(F[ST]); 5.480(XP-EHH); WDR25 Egg production of chickens[151]^34 6:21525301-21528367 0.296(F[ST]); 5.301(XP-EHH); CYP26A1 Oviductal secretion in chickens[152]^43 6:31489421-31522011 0.277(F[ST]); 6.681(XP-EHH); PLPP4 Egg production of chickens[153]^44 12:14398620-14690794 0.243(F[ST]); 5.548(XP-EHH); MAGI1 Egg production of geese[154]^45 Z:13428369-13558411 3.28E-08(GWAS); GHR Regulation of fertility levels in animals[155]^46 Z:45721101-45760913 2.91E-09(GWAS); SLC27A6 Lipid metabolism[156]^47 Z:55161716-55172718 3.91E-09(GWAS); TAL2 Fertility of cattle[157]^48 [158]Open in a new tab As a supplement, a total of 84 SNPs, and 17 InDels were identified to be significantly associated with chicken egg production using GWAS analysis ([159]Tables S19 and [160]S20). We found that the SNP/InDel variants associated with egg production were mainly positioned on chromosome Z ([161]Figures 4D and 4E). A series of genes, such as SLC27A6, UNC13B, FKTN, and TAL2, were further revealed based on both types of significant variants ([162]Table 3). By analyzing the LD in the region containing the SLC27A6 gene ([163]Figure 4F), we found a high degree of LD within the gene, consisting of a mutation TG/T (rs734362696) with a G deletion which showed a homogeneous genotype in both WL and WLG layer chickens. The result suggested that the mutation of SLC27A6 may increase the egg production of chickens. In addition, 15 deletion fragments were found to be closely related to egg production traits in the SV allele frequency analysis, including TANGO2, IGF2R, MRPS10, and other genes ([164]Figure S10 and [165]Table S21). Candidate genes affecting sperm storage capacity Sperm storage capacity in chickens is important and valuable for exploring the specific mechanisms of sperm storage in birds, which has not been reported at the genomic level due to the difficulty of phenotypic measurements. In this analysis, a trait-specific selective sweep analysis was applied to reveal the potential genetic mechanism of sperm storage capacity. On basis of three phenotypic gradient difference population pairs, a total of 3780 genetic loci with gradient differences were considered as the potent selection signatures for sperm storage capacity ([166]Figures S11, [167]S14, and [168]Table S23). Of those, 358 regions with haplotype differences were detected by the XP-EHH method ([169]Figure 5A and [170]Table S24). The functional annotation further detected 59 mutations which were located in exons, including 10 missense mutations and 49 synonymous mutations. The missense mutations were located in the gene regions of ALCAM, ZC3H12D, AKAP12, GSDMA, THRA, WIPF2, CDC6, TNS4, FKBP8, and ENSGALG00000054634, respectively, which are mostly associated with immune defense. Integrating the results of both differential haplotypes and loci ([171]Figures 5A and [172]S11), we annotated 5 candidate genes (TDRP, ALCAM, ARL6, PFKP, and ZPBP2), and a candidate region with significant haplotype differences between the ALCAM and ARL6 genes located on chromosome 1, consisting of a dense number of loci with gradient differential variation and an A/G missense mutation on the ALCAM gene that changed the amino acid from His to Arg ([173]Figures 5B and 5C). In addition, haplotype patterns of the genes ALCAM and ARL6 showed substantial differences between the highest phenotypic subgroup (H3) and the lowest phenotypic subgroup (L3) ([174]Figures 5D and 5E). Figure 5. [175]Figure 5 [176]Open in a new tab Visualization of trait-specific selection signatures for chicken sperm storage capacity (A) Genome-wide distribution of trait-specific signatures of selection detected by XP-EHH across all autosomes in Jinghong layer chickens. (B) Manhattan plot for Allele frequency difference (AFD) and F[ST] score around the ALCAM gene. (C) Manhattan plot for AFD and F[ST] score around the ARL6 gene. (D) Allele pattern for ALCAM gene in Jinghong layer chickens with high or low sperm storage capacity. (E) Allele pattern for ARL6 gene Jinghong layer chickens with high or low sperm storage capacity. The colored dots indicate the trait-specific selection signatures. The black (green, red) dots indicate the selection signatures identified in first (second and third) population pair. A complex molecular network regulating improvement traits in domestic chickens In this study, we noted that the body size and reproduction traits were shaped in the process of chicken genetic improvement. After enrichment analysis, we found that the candidate genes affecting egg production were significantly enriched in oocyte meiosis, peroxisome proliferators-activated receptor (PPAR) signaling pathway, and retinol metabolism pathways. We have noticed that some candidate genes were significantly enriched for biological processes related to reproduction or energy metabolism such as circadian rhythm regulation, intracellular cell transduction, gene expression regulation, and retinoic acid metabolism ([177]Figure S12). In addition, candidate genes affecting body size were enriched to pathways such as Wnt signaling pathway, MAPK signaling pathway, focal adhesion, and calcium signaling pathway. Gene Ontology (GO) clusters were primarily enriched in biological processes mainly including cell differentiation, signal transduction, and transcriptional regulation ([178]Figure S13). Many candidate genes are simultaneously enriched in signaling pathways that affect growth and reproduction, suggesting pleiotropy of the selected genomic regions. Here, we provide a preliminary construction of a signaling pathway network affecting complex economic traits in domestic chickens ([179]Figure 6). Several candidate genes, such as IGF1, MAP2K4, PLCB4, and PPP3CA, may play important roles both in the reproductive ability and growth development of chickens. The other genes, such as RAC1, TGFB3, and FZD10 may play major roles in chicken growth, whereas genes such as SMC1B, ITPR2, and SLIT2 mainly affect the reproductive ability of chickens. All the candidate genes constitute a complex molecular network to regulate the growth, development, and reproductive behavior of chickens. Figure 6. [180]Figure 6 [181]Open in a new tab Schematic mechanisms of genetic adaptation signaling pathways for body size and reproduction in domestic chickens The candidate genes found in chicken body size are shown in green. The candidate genes found in chicken egg production are shown in red. The candidate genes both detected in chicken egg production and body size are shown in purple. The candidate signaling pathways are shown in blue. Dotted arrows indicate an indirect effect. Selection signals caused the changes of evolutionary relationship in small genomic regions To investigate the effect of selection signals on the evolutionary relationships within genes of different breeds, the maximum likelihood evolutionary trees were constructed based on autosomal and selected gene regions, respectively. The phylogenetic pattern in the tree for the autosomal region was consistent with the known evolutionary history of chicken with strong bootstrap support ([182]Figure 7A). However, in the tree for the IGF1 and SMC1B gene regions, the branches of the evolutionary tree showed a significant discrepancy. In the IGF1 gene-region tree, chickens with a large body size were clustered on a single branch based on the fixed allele frequencies and the similar haplotype modules, but the evolutionary pattern among chickens with a small body size is not clear ([183]Figure 7B). In the SMC1B gene tree, three laying breeds with high-egg production were clustered into one branch due to the fixed allele frequencies and consistent haplotype modules, leaving another egg-laying breed diverged to other branches due to the different haplotypes ([184]Figure 7C). Similarly, the clustering relationship of layer chickens with low-egg production is not clear as their allele frequencies are still in a dynamic process of change. Figure 7. [185]Figure 7 [186]Open in a new tab The maximum-likelihood trees for different genomic regions (A) The evolutionary tree was built based on autosomal data. (B) The evolutionary tree and allele frequency heatmap constructed on IGF1 gene. (C) The evolutionary tree and allele frequency heatmap constructed on SMCB1 gene. The colors in the sectors and rectangles represent the different phenotypes used in this study. Red represents large-bodysize domestic chickens, blue represents small-bodysize domestic chickens, yellow represents high-egg production domestic chickens, green represents low-egg production domestic chickens. (D) Evolutionary analysis and putative causal variant(rs735797093) of the NEDD4 gene. To further investigate the contribution of selection signals from larger evolutionary perspectives of birds, we thus annotated the SNPs within selected regions. In total, 6 missense variants of five genes were found with high absolute allele frequency differences (ΔAF >0.8; [187]Table S26) within chicken egg-production selection regions. The NEDD4 gene has the highest absolute allele frequency differences SNP (rs735797093), which encodes an amino acid that is not fully conserved in birds. Interestingly, the position is conserved in the ancient species of platypus and poultry-related birds ([188]Figure 7D). The evolutionary branching of the gene in birds also shows that there are differences at the micro-molecular level in the birds that were domesticated by humans, and the mutant locus occupies a high allele frequency in laying hens, suggesting this locus is highly associated with reproductive capacity. Discussion In this study, genome-wide scanning analyses of 25 breeds for population and phenotypic traits were performed to assess their contribution to genomic diversity in domestic chickens and changes in the genetic background of complex traits during domestication. Here, we primarily focus on three important commercial traits, including body size, egg laying, and sperm storage capacity, that have important economic significance on farm production. As expected, this study further proved a polygenic basis for these commercial traits in the process of genetic improvement. Theoretically, most of the commercial traits are categorized into quantitative traits, which are regulated by multi-genes. Our results indicate that the common selection signatures among different breeds are much less than breed-specific selection signatures, which is observed even in the breeds domesticated with a similar purpose ([189]Figure S7). However, we noted that although there are very few overlapping selection signatures across different breeds, some Go terms (or pathways) associated with growth trait were still enriched, such as the terms of growth plate cartilage development and the pathway of oocyte meiosis ([190]Figures 2B and 2C). These results suggest that both the polygenic basis of the commercial traits and the rich genetic diversity were developed in the chicken genome during long-term domestication. Compared to other commercial traits, body size shows a much broader diversity across different chicken breeds and is related to both egg-laying and meat-production traits. In general, the small body size of layers is beneficial to reduce the required maintenance, and the large body size of broilers is primed for meat production. Here, we not only reveal some established genes associated with body size such as IGF1 and IGF2BP1[191]^14^,[192]^16^,[193]^49 but also found several novel candidate genes, such as SOX5, SATB2, and KCNJ6 that have not been reported in the previous research on body size. In this study, the SOX5 gene was significantly associated with body size in the variation of both SNP and InDel types through genome-wide association study. The established function of SOX5 is to affect the pea-comb phenotype of chickens, with its causal mutation being a copy number variation in the first intron of SOX5.[194]^50 However, no regions of copy number variation associated with body size were found within this gene ([195]Figure S15). Therefore, we exclude the possibility that the copy number variation of the SOX5 gene is connected with body size diversity. SOX5 is reported to participate in cartilage formation and nervous system development.[196]^51 The mutations in SOX5 result in developmental delays and malformations in humans, suggesting an important role for SOX5 in animal growth and development. We assume that the SOX5 gene may indirectly influence body size in domestic chickens by interacting with or regulating other genes. Further fine mapping of egg production found that the ATP1B3 gene was detected in both GWAS and selective sweep analysis. This gene activates the Na/K-ATPase to translocate ions across the plasma membrane. It is closely associated with the regulation of osmotic pressure in mammalian neural and muscle function and early embryonic development.[197]^30 Similarly, it is jointly annotated to KIAA0930 both in GWAS and selective sweep analysis, which is associated with lung cancer in humans,[198]^52 and unclear reproduction-related roles in animals. The SMC1B gene is 50 kb downstream of the KIAA0930 and located in a candidate region affecting egg production, as evidenced in mice with oocyte development, and malformation of sterility due to the mutations in the coding sequence.[199]^35 In selective sweep analysis, the peak with the highest statistics associated with egg production was located between positions 48.46–48.84 Mb on chromosome 5, which is consistent with the region detected by Zhao using genome-wide association analysis, containing YY1, and WDR25 genes involved in oocyte and reproductive development.[200]^34 These results not only provide a validation of previous studies but also reveal a series of novel candidate genes of the egg-production traits. In addition, successful sperm-egg fusion is crucial and one of the most important events to maintain the species. Most animals in nature reproduce by internal fertilization, and the reproductive events of mating, insemination, fertilization, and delivery are often not synchronized in time, requiring a “sperm storage” mechanism to ensure that spermatozoa are remained temporarily in the female’s reproductive tract and are released in time for fertilization after ovulation. This process not only facilitates the selection of better quality sperm but also helps the sperm to be able to bind to the egg at the right time and place for successful fertilization. There are many factors that affect sperm storage, including genetic, environmental, and physiological metabolism. The complex sperm storage structure of chickens is of great interest to humans. There are only a few articles reported on transcripts with differential sperm storage capacity of chickens,[201]^53^,[202]^54 it remains elusive at the genomic level. In this study, we report for the first time the genome-wide distribution of loci affecting chicken sperm storage capacity. The further exploration of detailed genetic mechanisms manipulating sperm storage capacity is valuable for deciphering the diversity of biological reproduction and environmental adaptations beyond chickens. From a larger evolutionary and ecological perspective, chicken is landfowl, one of the earliest evolutionary branches of the bird family, which diverged from other birds to form a separate branch about 70 million years ago.[203]^55 During its long evolutionary history, the domestic chicken genome has undergone numerous breaks and recombination as well as mutations so that we can detect the occurrence of tens of millions of mutational events. Changes in these small fragments of genomic regions are the main reason for shaping the phenotype of domestic chickens. As shown in [204]Figure 7D, the missense locus harboring in the putative candidate gene of egg production is a novel mutation, which may occur after the domestication of RJF. However, this mutation locus is not fully conserved in other birds. Although this reduces the probability of this locus as a causal variation, this is also a microcosm of changes in genomic characteristics during the long evolutionary process of birds. It was a long way of chicken domestication from RJF to commercial lines. To meet the commercial demands, chickens have been subjected to human-driven selection leading to remarkable phenotypic changes in morphological characteristics and production performance. Our results indicate that artificial selection, as an enabler of phenotypic differentiation, starts from the shaping of small fragments of the genomic region and eventually determines the evolutionary direction of the population, such as the convergence of genomic patterns around the IGF1 and SMC1B gene ([205]Figures 7B and 7C). It’s no doubt that the genetic regulation mechanism and process behind adaptive evolution are extremely complex. Although we found a series of complex molecular regulatory networks collectively contribute to chicken domestication and its subsequent improvement ([206]Figure 6), the changes in animal phenotypes are not simply caused by the accumulation of genetic variations. On a larger scale, the interaction between various genes, the adaptability of individuals to their environment, and the competitiveness of species in the biosphere are also worth considering in future studies. Finally, we also noted that there are potential limitations in this study, for example, the accuracy of phenotypic recording and the effectiveness of variation detection and utilization. To address the possibly inadequate recording of complex traits in public databases, several previous studies tried to use phenotypic classifications instead of phenotypic values and achieved plausible results.[207]^56^,[208]^57^,[209]^58 Comparing with phenotypic classifications, the mean value of female records and male records may better reflect the true distribution and phenotypic variance for quantitative traits. Therefore, the phenotype was defined as the mean value of female records and male records for each breed in this study to reduce the bias of the results as well as achieve similar effects with phenotypic classification data. To test this hypothesis, we performed GWAS analysis of the two types of variation for the body weight trait using phenotypic classifications and phenotypic mean values, respectively. As expected, the Pearson correlation test showed that the results of the two methods were significantly correlated (r = 0.88, p < 2.2e-16; r = 0.7, p < 2.2e-16) ([210]Figure S16). In addition, the calling of SNP and indel variants both showed high accuracy, but the detection of SV using short-read sequencing is still a challenge to be improved with its accuracy.[211]^59 The emergence of pan-genomics has improved the efficiency of SV detection. For example, Wang et al.[212]^14 used pan-genomics to find potential SVs affecting body size in chickens and Zhou et al.[213]^60 also found that the SVs were major candidates for tomato phenotypes captured by graph pan-genomics. The related studies have contributed to our understanding of the genetic background of complex traits in chickens. With the improvement of read length and accuracy of third-generation sequencing technology, the pan-genome will serve as an important tool for further exploring the genetic basis of complex phenotypes observed in chickens, as well as other animals and plants. In summary, this study reports on the current phylogenetic relationships, population genetic structure, and genomic diversity of 25 breeds of chickens. The fine mapping of body size, egg production, and sperm storage capacity shed light on a polygenic basis of phenotypic diversity during domestication and subsequent improvement. The outcome of integrated analysis suggests that there may be an interesting genetic connection between body size and egg-laying traits at the level of molecular regulatory network. The abundant information produced in this study provides an important reference for functional genomics studies, as well as genome evolution from a micro-evolutionary perspective, and is also a valuable basis to further discover the mechanisms driving genome evolution in livestock and poultry. Our results and genome sequencing data presented in this study provide rich resources for functional genomics studies related to economic traits. Limitations of the study This research was performed to investigate the genomic diversity of domestic chickens and the genomic characteristics changes of complex traits during domestication. Although genome-wide scanning analyses of 25 breeds were employed, a broader global range of domestic chicken samples will help to observe the genetic basis of phenotypic diversity more systematically. Secondly, the accuracy of SV detection is usually dependent on the sequencing depth and the length of the sequenced reads. The short fragments used in this study and the discard of half of the samples with low depth (<10×) will reduce the identification of SV events. In addition, the breed phenotypic information of the chickens used for GWAS analysis depends on records rather than our actual measurements, which may reduce the power of the association study. STAR★Methods Key resources table REAGENT or RESOURCE SOURCE IDENTIFIER Deposited data __________________________________________________________________ Jinghong sequence reads This study GenBank: PRJNA940098 Black Cochin sequence reads Guo et al.[214]^61 GenBank: PRJNA552722 Light Brahma sequence reads Guo et al.[215]^61 GenBank: PRJNA552722 High Bodyweight Selected chicken sequence reads Guo et al.[216]^61 GenBank: PRJNA552722 Low Bodyweight Selected chicken sequence reads Guo et al.[217]^61 GenBank: PRJNA552722 Red Jungle Fowl sequence reads Wang et al.[218]^5 [219]http://bigd.big.ac.cn/chickensd/ White Leghorn sequence reads Wang et al.[220]^5 [221]http://bigd.big.ac.cn/chickensd/ Yuanbao sequence reads Wang et al.[222]^5 [223]http://bigd.big.ac.cn/chickensd/ Brown Layer sequence reads Qanbari et al.[224]^8 GenBank: PRJEB30270 Broiler A sequence reads Qanbari et al.[225]^8 GenBank: PRJEB30270 Broiler B sequence reads Qanbari et al.[226]^8 GenBank: PRJEB30270 White Layer sequence reads Qanbari et al.[227]^8 GenBank: PRJEB30270 Dulong sequence reads Wang et al.[228]^62 GenBank: PRJNA559932 Daweishan sequence reads Guo et al.[229]^63 GenBank: PRJNA597842 Langshan sequence reads Guo et al.[230]^63 GenBank: PRJNA597842 Liyang sequence reads Guo et al.[231]^63 GenBank: PRJNA597842 Wuhua Yellow sequence reads Weng et al.[232]^12 GenBank: PRJNA624239 Baier Yellow sequence reads Luo et al.[233]^11 GenBank: PRJNA627080 Hetian sequence reads Luo et al.[234]^11 GenBank: PRJNA627080 Luxi Gamecock sequence reads Luo et al.[235]^11 GenBank: PRJNA627080 Tulufan Gamecock sequence reads Luo et al.[236]^11 GenBank: PRJNA627080 Xinghua sequence reads Luo et al.[237]^11 GenBank: PRJNA627080 Beijing You sequence reads Luo et al.[238]^11 GenBank: PRJNA627080 Yunyang Da sequence reads Luo et al.[239]^11 GenBank: PRJNA627080 Tibet sequence reads Li et al.[240]^64 GenBank: PRJNA306389 __________________________________________________________________ Software and algorithms __________________________________________________________________ SRAToolkit v2.10.8 Kodama et al.[241]^65 [242]https://hpc.nih.gov/apps/sratoolkit.html Trimmomatic v0.39 Bolger et al.[243]^66 [244]http://www.usadellab.org/cms/index.php?page=trimmomatic BWA v0.7.17 Li et al.[245]^67 [246]https://sourceforge.net/projects/bio-bwa/ SAMtools v1.11 Li et al.[247]^68 [248]https://sourceforge.net/projects/samtools/files/samtools/ GATK v4.1.9 McKenna et al.[249]^69 [250]https://gatk.broadinstitute.org/hc/en-us VCFtools v0.1.16 Danecek et al.[251]^70 [252]http://vcftools.sourceforge.net/ SnpEff v5.0c Cingolani et al.[253]^71 [254]http://snpeff.sourceforge.net/ Manta v1.6.0 Chen et al.[255]^72 [256]https://github.com/Illumina/manta/ Delly v0.8.7 Rausch et al.[257]^73 [258]https://github.com/dellytools/delly svimmer v0.1 Eggertsson et al.[259]^74 [260]https://github.com/DecodeGenetics/svimmer SURVIVOR v1.0.6 Jeffares et al.[261]^75 [262]https://github.com/fritzsedlazeck/SURVIVOR PLINK v1.90 Purcell et al.[263]^76 [264]https://zzz.bwh.harvard.edu/plink/ Beagle v5.0 Browning et al.[265]^77 [266]http://faculty.washington.edu/browning/beagle/beagle.html IQ-TREE v2.1.4 Minh et al.[267]^78 [268]http://www.iqtree.org FigTree v1.4.3 [269]http://tree.bio.ed.ac.uk/software/figtree/ [270]http://tree.bio.ed.ac.uk/software/figtree/ ADMIXTURE v1.3.0 Alexander et al.[271]^79 [272]https://dalexander.github.io/admixture/ PopLDdecay v3.41 Zhang et al.[273]^80 [274]https://github.com/BGI-shenzhen/PopLDdecay SweeD v4.0.0 Pavlidis et al.[275]^81 [276]https://github.com/alachins/sweed GEMMA v0.98.3 Zhou et al.[277]^82 [278]https://github.com/genetics-statistics/GEMMA [279]Open in a new tab Resource availability Lead contact Further information and requests for resources should be directed to and will be fulfilled by the lead contact, Chunyan Mou ([280]chunyanmou@swu.edu.cn). Materials availability This study did not generate new unique reagent. Experimental model and subject details Animals The sampling site located at Beijing Huadu Yukou Poultry Industry (Beijing, China), a total of 300 Jinghong layer chickens were kept in the same environment (independent rearing of same cages in the laying house) until 56 weeks of age, all chicken were kept in a suitable environment with sufficient water and food. These chickens were all female and in healthy condition at the time of sampling (56 weeks). All animal experimental procedures were conducted under protocols (No. 5 proclaim of the Standing Committee of Hubei People’s Congress) approved by the Standing Committee of Hubei People’s Congress and the ethics committee of Huazhong Agricultural University in China. Method details Sample acquisition We recorded the average number of days post-insemination until the last fertile egg in each individual as an indicator of sperm storage capacity. The chickens used for sequencing were characterized by an extreme phenotype for sperm storage capacity. According to the phenotypic records, 50 chickens of each of the highest and lowest phenotypes were selected. Blood samples were collected from the wings of all chicken using vacuum tubes containing ethylenediaminetetraacetic acid (EDTA) at 56 weeks of age. These blood samples were used for DNA extraction. DNA extraction, whole-genome resequencing and genomic data collection Genomic DNA was extracted from 100 Jinghong blood samples by the SDS-Proteinase K method. The concentration and quality of the extracted DNA were quantified using the ND-2000 spectrophotometer and agarose gel electrophoreses, respectively. In total, 100 JingHong genomes were sequenced using Illumina Hiseq 4000 platform to an average depth of ∼16X per individual, with pair-end read lengths of 150 bp. We downloaded previously published genomes from NCBI Sequence Read Archive (SRA, [281]http://www.ncbi.nlm.nih.gov/sra/) and the European Nucleotide Archive (ENA, [282]https://www.ebi.ac.uk/ena). Finally, we used 477 whole-genome sequences for the analyses in the present study ([283]Table S1). Phenotype definition Four phenotypes were defined by the breed characteristic features in our study, which included shank length, body length, body weight, and annual egg production. Of those, body weight, body length and shank length are three indicators to measure the body size. The characteristic features of breeds were mainly from Annals of Chinese Poultry Genetic Resources, published studies of breed genetic resources and official websites for chicken breeds. For each phenotype, breeds with no records of characteristic features were marked as missing data (NA) ([284]Table S1). Sequence alignment, SNP/InDel-calling and annotation At first, the raw data was converted to fastq files using SRAToolkit (Version 2.10.8),[285]^65 and the fastq files were trimmed by removing adapters and low-quality reads using Trimmomatic (Version 0.39).[286]^66 Sequence quality was assessed by FastQC ([287]http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc/). After trimming, the clean reads were aligned to the chicken reference genome GRCg6a ([288]http://ftp.ensembl.org/pub/release-101/fasta/gallus_gallus/dna/) with the Burrows-Wheeler Aligner (BWA) (Version 0.7.17)[289]^67 using the default parameters. Mapping results were then converted into the BAM format and sorted with SAMtools (Version 1.11).[290]^68 After sorting, we performed SNP calling using Genome Analysis Toolkit (GATK) (Version 4.1.9),[291]^69 with all individuals in each dataset simultaneously, including MarkDuplicates, FixMateInformation and BaseRecalibration programs. The variants dataset was from ENSEMBL ([292]http://ftp.ensembl.org/pub/release-101/variation/vcf/gallus_gallu s/). Then, each individual variant was detected using HaplotypeCaller commands in GATK. SNPs and InDels were genotyped for all individuals together using the joint calling method and further filtered using VariantFiltration command in GATK package. The SNPs filter condition of VariantFiltration was “QUAL <30.0 || QD < 2.0 || MQ < 40.0 || FS > 60.0 || SOR >3.0 || MQRankSum < −12.5 || ReadPosRankSum < −8.0” and remove the sites with more than three SNP in the 10bp window “-cluster 3 -window 10”. The InDels filter condition of VariantFiltration was “QD < 2.0 || FS > 200.0 || SOR >10.0 || MQRankSum < −12.5 || ReadPosRankSum < −8.0”. Finally, VCFtools (Version 0.1.16)[293]^70 was used to filter SNPs and InDels with the parameters “--min-meanDP 5 --min-alleles 2 --max-alleles 2”. SnpEff (Version 5.0c) was used to annotate the SNPs and InDels to specific genomic regions according to the ENSEMBL chicken annotations ([294]http://ftp.ensembl.org/pub/release-101/gtf/gallus_gallus/).[295]^ 71 We keep the variants on autosomes and Z chromosome for further analysis. In total, 23,504,766 SNPs and 3,289,782 InDels were identified from our data set ([296]Figure S1, [297]Tables S3 and [298]S4). At the same time, we also counted the basic information of the raw sequencing data, including the number of reads, matching rate, sequencing quality, CG content and sequencing depth, and corrected the sex information of the samples according to the difference of reads on sex chromosomes in bam files ([299]Table S2). Structural variant calling We only kept samples with depth >10× to identity SVs, a totally 223 samples were used in this study. SVs such as deletions (DEL), inversions (INV), duplications (DUP), and insertions (INS) were identified through the Manta v.1.6.0[300]^72 and Delly v.0.8.7.[301]^73 For the manta pipeline, each individual SVs were called with default parameters, and all the SVs with PASS flag were kept in this step, lastly, all labeled "PASS" individual variant call format (VCF) files were merged using svimmer v.0.1[302]^74 to get final manta calls. For the delly pipeline, each individual SVs were called with default option, then the results were merged using delly merge with options of “-m 50 -n 1000000 -c -p -b 1000 -r 0.8”, an additional delly call step with default parameters was performed using the merged SVs as reference to genotype the individual SVs. The SURVIVOR v.1.0.6[303]^75 was implemented to detect the overlapping SVs identified by the two approaches with the command line: (SURVIVOR merge sample_files 1000 2 1 1 0 50 sample_merge.vcf). Data quality control of SNPs and InDels We used PLINK software (Version 1.90)[304]^76 to perform quality control of sequencing data using two strategies: (i) Population-wide quality control based on strict criteria to ensure that as much genomic sequence information as possible is utilized from all populations: SNPs and InDels with call rates <95% and minor allele frequencies (MAF) < 0.05 were excluded. (ii) Single-breed quality control based on general criteria: SNPs and InDels with call rates <90% and MAF < 0.05 were excluded. Population genetics analysis At first, we retained high quality autosomal SNPs using VCFtools, and the missing genotypes were imputed using Beagle (Version 5.0).[305]^77 In addition, we used PLINK with the parameters “--indep-pairwise 50 5 0.5” to minimize the effects of SNPs from regions with extensive LD. Finally, we got 2,958,430 SNPs for the phylogeny and population genetics analysis. A Maximum-likelihood tree was constructed by IQ-TREE (Version 2.1.4)[306]^78 and visualized using the FigTree (Version 1.4.3) ([307]http://tree.bio.ed.ac.uk/software/figtree/). For the principal component analysis (PCA), we used PLINK with the parameters “-pca 477” to calculate all principal components (PCs). Genetic structure clustering was performed using ADMIXTURE (Version 1.3.0) program[308]^79 by assuming the number of ancestral populations (K) from 2 to 25. The results were visualized using the R script (available in [309]https://github.com/speciationgenomics/scripts/blob/master/plotADMI XTURE.r). For autosomal genome SNPs data, we calculated the genetic diversity (θπ) using VCFtools with the parameters “-window-pi 40000 --window-pi-step 20000”. We used VCFtools to calculate the Fixation index (F[ST]) for all pairs of groups with sliding window of 40 kb and sliding step of 20 kb, and constructed a neighbor-joining tree based on the F[ST] value. Linkage disequilibrium (LD) decay was measured by calculating correlation coefficients (r2) for all pairs of SNPs within 1000 kb using PopLDdecay (Version 3.41).[310]^80 Selective sweep analysis Detection of positive selection Here, only the 13 populations with a sample size of at least 20 were chose to detect positive selection. In this analysis, two complementary methods, the CLR and iHS were used. SweeD (Version 4.0.0)[311]^81 software was employed to calculate the CLR with a grid size of 20 kb. The unstandardized iHS was calculated using the rehh R package with sliding window of 40 kb and sliding step of 20 kb.[312]^83 Regions ranked 99th percentile in the distribution of the values in any two of the methods were defined as putative selective sweeps. Identifying selection signatures for body size and egg production In this analysis, we divided 23 breeds according to the body weight phenotype values of each population, where FT, BC, BRA and BRB with the highest body weight phenotype values were defined as large-body size group and YB, DWS, DLC, RJF with the lowest body weight phenotype values were defined as small-body size group. Similarly, JH, BL, WL and WLG with the highest annual egg production phenotypic values were defined as high egg production groups, RJF, DWS, DLC and LX with the lowest annual egg production phenotypic values were defined as low egg production groups. Then, we performed pairwise F[ST] and XP-EHH scans across the whole genome with 40 kb sliding window and 20 kb sliding step in large body size vs. small body size, and high egg production vs. low egg production. Finally, we filtered out windows with less than 10 SNPs, the regions both in the top 5% of F[ST] and XP-EHH (including the top 5% of positive and negative values) were identified as potential selection regions. Based on the chicken QTL database, we selected genes annotated to overlapping selected regions of QTL related to growth, body size or reproduction and egg quality, respectively, as candidate genes for downstream analysis. Revealing specific selection signature for sperm storage capacity For the sperm storage capacity trait, we used a trait-specific selection signal detection method, the basic steps are as follows: (i) ranking the phenotypic value; (ii) according to the ranking result, the first 20 samples and the last 20 samples were selected to form the highest difference gradient pair, (iii) the first 35 samples and the last 35 samples formed the medium difference gradient pair, (iv) the first 50 samples and the last 50 samples formed the normal difference gradient pair ([313]Table S22). F[ST] and AFD[314]^84 were used to detect differential loci in three phenotypic gradient differential population pairs for sperm storage capacity. From normal to highest difference gradient population pair, the top 1% of SNPs in which the F[ST] and AFD statistic values exhibited a gradient change were defined as the sperm storage capacity traits’ specific selection loci. In addition, we also calculated the XP-EHH statistic to detect differential selection regions according to a sliding window of 40 kb steps of 20 kb. Of those, the top 1% of SNPs local in XP-EHH statistic values exhibited a gradient change were defined as the sperm storage capacity traits’ specific selection regions. Genome-wide association study The correlations between the SNPs/InDels and the traits were tested using mixed linear models in GEMMA software (Version 0.98.3).[315]^82 The kinship matrix is calculated using the standardization method. The effect of population stratification was corrected by adjusting the first five PCs as derived from the whole-genome SNPs. For body weight phenotype, we used average of the standard breed (male + female average). The phenotypic values for egg production are recorded as the annual egg production of the standard breed. In this study, Individuals within a breed have the same phenotypic values, missing values are represented by −9. The significance threshold for the GWAS was defined using the Bonferroni correction method. The suggestive genome-wide association significance threshold was defined as 0.05/N, where N represents the number of genome variants for each independent GWAS. The quantile-quantile and Manhattan plot were generated with the qqman R package (Version 0.1.2).[316]^85 Candidate gene analysis We annotated genes associated with the candidate SNPs, InDels, or genomic regions using the chicken reference assembly Gallus_gallus_GRCg6a. The functions of the candidate genes were consulted based on the annotations in the NCBI ([317]http://www.ncbi.nlm.nih.gov). For the target genes, we analyzed the haplotype blocks within the gene in different chicken breeds and compared the distribution of the selection signature statistics in the 2 Mb interval around the genes. In addition, we performed GO and KEGG pathway analyses on the candidate genes using the KOBAS online website.[318]^86 Categories with the threshold of p value <0.05 was considered as significantly enriched terms and pathways. Finally, significantly enriched signaling pathways or candidate genes that were reported in the previously published article were used to construct a signaling network affecting domestic chicken production. Acknowledgments